Message ID | 20250103010002.619062-1-cel@kernel.org (mailing list archive) |
---|---|
State | Under Review |
Series | [v1,1/2] Revert "SUNRPC: Reduce thread wake-up rate when receiving large RPC messages" |
On Thu, 2025-01-02 at 20:00 -0500, cel@kernel.org wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
>
> I noticed that a handful of NFSv3 fstests were taking an
> unexpectedly long time to run. Troubleshooting showed that the
> server's TCP window closed and never re-opened, which caused the
> client to trigger an RPC retransmit timeout after 180 seconds.
>
> The client's recovery action was to establish a fresh connection
> and retransmit the timed-out requests. This worked, but it adds a
> long delay.
>
> I tracked the problem to the commit that attempted to reduce the
> rate at which the network layer delivers TCP socket data_ready
> callbacks. Under most circumstances this change worked as expected,
> but for NFSv3, which has no session or other type of throttling, it
> can overwhelm the receiver on occasion.
>
> I'm sure I could tweak the lowat settings, but the small benefit
> doesn't seem worth the bother. Just revert it.
>
> Fixes: 2b877fc53e97 ("SUNRPC: Reduce thread wake-up rate when receiving large RPC messages")
> Cc: Jakub Kicinski <kuba@kernel.org>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  net/sunrpc/svcsock.c | 12 +-----------
>  1 file changed, 1 insertion(+), 11 deletions(-)
>
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index 95397677673b..cb3bd12f5818 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -1083,9 +1083,6 @@ static void svc_tcp_fragment_received(struct svc_sock *svsk)
>  	/* If we have more data, signal svc_xprt_enqueue() to try again */
>  	svsk->sk_tcplen = 0;
>  	svsk->sk_marker = xdr_zero;
> -
> -	smp_wmb();
> -	tcp_set_rcvlowat(svsk->sk_sk, 1);
>  }
>  
>  /**
> @@ -1175,17 +1172,10 @@ static int svc_tcp_recvfrom(struct svc_rqst *rqstp)
>  		goto err_delete;
>  	if (len == want)
>  		svc_tcp_fragment_received(svsk);
> -	else {
> -		/* Avoid more ->sk_data_ready() calls until the rest
> -		 * of the message has arrived. This reduces service
> -		 * thread wake-ups on large incoming messages. */
> -		tcp_set_rcvlowat(svsk->sk_sk,
> -				 svc_sock_reclen(svsk) - svsk->sk_tcplen);

Could we instead clamp this to the window size so that we at least only
do a wakeup for each window-size chunk?

> -
> +	else
>  		trace_svcsock_tcp_recv_short(&svsk->sk_xprt,
>  					     svc_sock_reclen(svsk),
>  					     svsk->sk_tcplen - sizeof(rpc_fraghdr));
> -	}
>  	goto err_noclose;
>  error:
>  	if (len != -EAGAIN)

Reverting for now is fine though.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
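A minimal sketch of the clamping Jeff is asking about, assuming the current receive window can be read from tcp_sk(sk)->rcv_wnd; the field choice and the helper name are illustrative, not something the thread specifies:

```c
#include <net/tcp.h>

/* Hypothetical sketch only (not from the thread): cap the requested
 * rcvlowat at the current receive window so that a server thread is
 * woken at least once per window-sized chunk of a large record.
 * tcp_sk(sk)->rcv_wnd stands in for "the window size" here; the
 * posted series instead reverts the lowat tuning entirely.
 */
static void svc_tcp_set_rcvlowat_clamped(struct sock *sk, u32 remaining)
{
	u32 lowat = min_t(u32, remaining, tcp_sk(sk)->rcv_wnd);

	tcp_set_rcvlowat(sk, lowat ?: 1);
}
```

As Neil points out in the follow-up, tcp_set_rcvlowat() already caps its argument based on the configured receive-queue space, so an explicit clamp like this may be redundant.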
On Sat, 04 Jan 2025, Jeff Layton wrote:
> On Thu, 2025-01-02 at 20:00 -0500, cel@kernel.org wrote:
> > From: Chuck Lever <chuck.lever@oracle.com>
> >
> > I noticed that a handful of NFSv3 fstests were taking an
> > unexpectedly long time to run. Troubleshooting showed that the
> > server's TCP window closed and never re-opened, which caused the
> > client to trigger an RPC retransmit timeout after 180 seconds.
> >
> > The client's recovery action was to establish a fresh connection
> > and retransmit the timed-out requests. This worked, but it adds a
> > long delay.
> >
> > I tracked the problem to the commit that attempted to reduce the
> > rate at which the network layer delivers TCP socket data_ready
> > callbacks. Under most circumstances this change worked as expected,
> > but for NFSv3, which has no session or other type of throttling, it
> > can overwhelm the receiver on occasion.
> >
> > I'm sure I could tweak the lowat settings, but the small benefit
> > doesn't seem worth the bother. Just revert it.
> >
> > Fixes: 2b877fc53e97 ("SUNRPC: Reduce thread wake-up rate when receiving large RPC messages")
> > Cc: Jakub Kicinski <kuba@kernel.org>
> > Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> > ---
> >  net/sunrpc/svcsock.c | 12 +-----------
> >  1 file changed, 1 insertion(+), 11 deletions(-)
> >
> > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> > index 95397677673b..cb3bd12f5818 100644
> > --- a/net/sunrpc/svcsock.c
> > +++ b/net/sunrpc/svcsock.c
> > @@ -1083,9 +1083,6 @@ static void svc_tcp_fragment_received(struct svc_sock *svsk)
> >  	/* If we have more data, signal svc_xprt_enqueue() to try again */
> >  	svsk->sk_tcplen = 0;
> >  	svsk->sk_marker = xdr_zero;
> > -
> > -	smp_wmb();
> > -	tcp_set_rcvlowat(svsk->sk_sk, 1);
> >  }
> >  
> >  /**
> > @@ -1175,17 +1172,10 @@ static int svc_tcp_recvfrom(struct svc_rqst *rqstp)
> >  		goto err_delete;
> >  	if (len == want)
> >  		svc_tcp_fragment_received(svsk);
> > -	else {
> > -		/* Avoid more ->sk_data_ready() calls until the rest
> > -		 * of the message has arrived. This reduces service
> > -		 * thread wake-ups on large incoming messages. */
> > -		tcp_set_rcvlowat(svsk->sk_sk,
> > -				 svc_sock_reclen(svsk) - svsk->sk_tcplen);
>
> Could we instead clamp this to the window size so that we at least only
> do a wakeup for each window-size chunk?

tcp_set_rcvlowat() already imposes an upper limit, based on the
configured queue space, so I don't think we need to clamp.

I note that the calculation is wrong. sk_tcplen includes the size of
the 4-byte length header, but svc_sock_reclen() does not. In other
places where these two numbers are compared, sizeof(rpc_fraghdr) is
subtracted from sk_tcplen. So there is a corner case where the size
passed to tcp_set_rcvlowat() is negative. The two numbers in the
subtraction are u32 and the function expects "int". I didn't dig enough
to understand how a negative value would be handled.

> > -
> > +	else
> >  		trace_svcsock_tcp_recv_short(&svsk->sk_xprt,
> >  					     svc_sock_reclen(svsk),
> >  					     svsk->sk_tcplen - sizeof(rpc_fraghdr));
> > -	}
> >  	goto err_noclose;
> >  error:
> >  	if (len != -EAGAIN)
>
> Reverting for now is fine though.

Agreed.

I would expect modern network cards to deliver fairly large packets to
the application already. Still, it would be nice to understand what is
really happening. Setting rcvlowat shouldn't be able to cause the
window size to go to zero.

NeilBrown

>
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
>
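To make the calculation issue concrete, here is a hedged sketch of what a corrected lowat computation might look like; the helper name is hypothetical, and the merged series simply reverts the logic rather than fixing it this way:

```c
#include <net/tcp.h>
#include <linux/sunrpc/svcsock.h>

/* Hypothetical correction, illustrating the observation above:
 * sk_tcplen counts the 4-byte record marker while svc_sock_reclen()
 * does not, so the marker is subtracted before working out how many
 * payload bytes are still outstanding. Guarding the subtraction keeps
 * a wrapped u32 (which tcp_set_rcvlowat() would see as a negative int)
 * from being passed down.
 */
static void svc_tcp_set_rcvlowat_remaining(struct svc_sock *svsk)
{
	u32 received = svsk->sk_tcplen - sizeof(rpc_fraghdr);
	u32 reclen = svc_sock_reclen(svsk);

	tcp_set_rcvlowat(svsk->sk_sk,
			 reclen > received ? reclen - received : 1);
}
```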
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 95397677673b..cb3bd12f5818 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -1083,9 +1083,6 @@ static void svc_tcp_fragment_received(struct svc_sock *svsk)
 	/* If we have more data, signal svc_xprt_enqueue() to try again */
 	svsk->sk_tcplen = 0;
 	svsk->sk_marker = xdr_zero;
-
-	smp_wmb();
-	tcp_set_rcvlowat(svsk->sk_sk, 1);
 }
 
 /**
@@ -1175,17 +1172,10 @@ static int svc_tcp_recvfrom(struct svc_rqst *rqstp)
 		goto err_delete;
 	if (len == want)
 		svc_tcp_fragment_received(svsk);
-	else {
-		/* Avoid more ->sk_data_ready() calls until the rest
-		 * of the message has arrived. This reduces service
-		 * thread wake-ups on large incoming messages. */
-		tcp_set_rcvlowat(svsk->sk_sk,
-				 svc_sock_reclen(svsk) - svsk->sk_tcplen);
-
+	else
 		trace_svcsock_tcp_recv_short(&svsk->sk_xprt,
					     svc_sock_reclen(svsk),
					     svsk->sk_tcplen - sizeof(rpc_fraghdr));
-	}
 	goto err_noclose;
 error:
 	if (len != -EAGAIN)