
[v1,1/2] Revert "SUNRPC: Reduce thread wake-up rate when receiving large RPC messages"

Message ID 20250103010002.619062-1-cel@kernel.org (mailing list archive)
State Under Review

Commit Message

Chuck Lever Jan. 3, 2025, 1 a.m. UTC
From: Chuck Lever <chuck.lever@oracle.com>

I noticed that a handful of NFSv3 fstests were taking an
unexpectedly long time to run. Troubleshooting showed that the
server's TCP window closed and never re-opened, which caused the
client to trigger an RPC retransmit timeout after 180 seconds.

The client's recovery action was to establish a fresh connection
and retransmit the timed-out requests. This worked, but it adds a
long delay.

I tracked the problem to the commit that attempted to reduce the
rate at which the network layer delivers TCP socket data_ready
callbacks. Under most circumstances this change worked as expected,
but for NFSv3, which has no session or other type of throttling, it
can overwhelm the receiver on occasion.

I'm sure I could tweak the lowat settings, but the small benefit
doesn't seem worth the bother. Just revert it.

Fixes: 2b877fc53e97 ("SUNRPC: Reduce thread wake-up rate when receiving large RPC messages")
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/svcsock.c | 12 +-----------
 1 file changed, 1 insertion(+), 11 deletions(-)

Comments

Jeff Layton Jan. 3, 2025, 2:11 p.m. UTC | #1
On Thu, 2025-01-02 at 20:00 -0500, cel@kernel.org wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> I noticed that a handful of NFSv3 fstests were taking an
> unexpectedly long time to run. Troubleshooting showed that the
> server's TCP window closed and never re-opened, which caused the
> client to trigger an RPC retransmit timeout after 180 seconds.
> 
> The client's recovery action was to establish a fresh connection
> and retransmit the timed-out requests. This worked, but it adds a
> long delay.
> 
> I tracked the problem to the commit that attempted to reduce the
> rate at which the network layer delivers TCP socket data_ready
> callbacks. Under most circumstances this change worked as expected,
> but for NFSv3, which has no session or other type of throttling, it
> can overwhelm the receiver on occasion.
> 
> I'm sure I could tweak the lowat settings, but the small benefit
> doesn't seem worth the bother. Just revert it.
> 
> Fixes: 2b877fc53e97 ("SUNRPC: Reduce thread wake-up rate when receiving large RPC messages")
> Cc: Jakub Kicinski <kuba@kernel.org>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  net/sunrpc/svcsock.c | 12 +-----------
>  1 file changed, 1 insertion(+), 11 deletions(-)
> 
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index 95397677673b..cb3bd12f5818 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -1083,9 +1083,6 @@ static void svc_tcp_fragment_received(struct svc_sock *svsk)
>  	/* If we have more data, signal svc_xprt_enqueue() to try again */
>  	svsk->sk_tcplen = 0;
>  	svsk->sk_marker = xdr_zero;
> -
> -	smp_wmb();
> -	tcp_set_rcvlowat(svsk->sk_sk, 1);
>  }
>  
>  /**
> @@ -1175,17 +1172,10 @@ static int svc_tcp_recvfrom(struct svc_rqst *rqstp)
>  		goto err_delete;
>  	if (len == want)
>  		svc_tcp_fragment_received(svsk);
> -	else {
> -		/* Avoid more ->sk_data_ready() calls until the rest
> -		 * of the message has arrived. This reduces service
> -		 * thread wake-ups on large incoming messages. */
> -		tcp_set_rcvlowat(svsk->sk_sk,
> -				 svc_sock_reclen(svsk) - svsk->sk_tcplen);

Could we instead clamp this to the window size, so that we at least do
one wakeup per window-size chunk?
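
Something like this, perhaps (untested sketch, mirroring the original
calculation; it assumes the current window is readable via
tcp_sk()->rcv_wnd):

	/* Wake the service thread at least once per receive window
	 * instead of waiting for the entire remaining message. */
	u32 needed = svc_sock_reclen(svsk) - svsk->sk_tcplen;

	tcp_set_rcvlowat(svsk->sk_sk,
			 min_t(u32, needed, tcp_sk(svsk->sk_sk)->rcv_wnd));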

> -
> +	else
>  		trace_svcsock_tcp_recv_short(&svsk->sk_xprt,
>  				svc_sock_reclen(svsk),
>  				svsk->sk_tcplen - sizeof(rpc_fraghdr));
> -	}
>  	goto err_noclose;
>  error:
>  	if (len != -EAGAIN)

Reverting for now is fine though.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
NeilBrown Jan. 4, 2025, 12:44 a.m. UTC | #2
On Sat, 04 Jan 2025, Jeff Layton wrote:
> On Thu, 2025-01-02 at 20:00 -0500, cel@kernel.org wrote:
> > From: Chuck Lever <chuck.lever@oracle.com>
> > 
> > I noticed that a handful of NFSv3 fstests were taking an
> > unexpectedly long time to run. Troubleshooting showed that the
> > server's TCP window closed and never re-opened, which caused the
> > client to trigger an RPC retransmit timeout after 180 seconds.
> > 
> > The client's recovery action was to establish a fresh connection
> > and retransmit the timed-out requests. This worked, but it adds a
> > long delay.
> > 
> > I tracked the problem to the commit that attempted to reduce the
> > rate at which the network layer delivers TCP socket data_ready
> > callbacks. Under most circumstances this change worked as expected,
> > but for NFSv3, which has no session or other type of throttling, it
> > can overwhelm the receiver on occasion.
> > 
> > I'm sure I could tweak the lowat settings, but the small benefit
> > doesn't seem worth the bother. Just revert it.
> > 
> > Fixes: 2b877fc53e97 ("SUNRPC: Reduce thread wake-up rate when receiving large RPC messages")
> > Cc: Jakub Kicinski <kuba@kernel.org>
> > Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> > ---
> >  net/sunrpc/svcsock.c | 12 +-----------
> >  1 file changed, 1 insertion(+), 11 deletions(-)
> > 
> > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> > index 95397677673b..cb3bd12f5818 100644
> > --- a/net/sunrpc/svcsock.c
> > +++ b/net/sunrpc/svcsock.c
> > @@ -1083,9 +1083,6 @@ static void svc_tcp_fragment_received(struct svc_sock *svsk)
> >  	/* If we have more data, signal svc_xprt_enqueue() to try again */
> >  	svsk->sk_tcplen = 0;
> >  	svsk->sk_marker = xdr_zero;
> > -
> > -	smp_wmb();
> > -	tcp_set_rcvlowat(svsk->sk_sk, 1);
> >  }
> >  
> >  /**
> > @@ -1175,17 +1172,10 @@ static int svc_tcp_recvfrom(struct svc_rqst *rqstp)
> >  		goto err_delete;
> >  	if (len == want)
> >  		svc_tcp_fragment_received(svsk);
> > -	else {
> > -		/* Avoid more ->sk_data_ready() calls until the rest
> > -		 * of the message has arrived. This reduces service
> > -		 * thread wake-ups on large incoming messages. */
> > -		tcp_set_rcvlowat(svsk->sk_sk,
> > -				 svc_sock_reclen(svsk) - svsk->sk_tcplen);
> 
> Could we instead clamp this to the window size, so that we at least do
> one wakeup per window-size chunk?

tcp_set_rcvlowat() already imposes an upper limit based on the
configured queue space.  So I don't think we need to clamp.
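
For reference, the clamp looks roughly like this (paraphrased from
net/ipv4/tcp.c in recent kernels, trimmed; check your tree before
relying on the details):

	int tcp_set_rcvlowat(struct sock *sk, int val)
	{
		int cap;

		if (sk->sk_userlocks & SOCK_RCVBUF_LOCK)
			cap = sk->sk_rcvbuf >> 1;
		else
			cap = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2]) >> 1;
		val = min(val, cap);
		WRITE_ONCE(sk->sk_rcvlowat, val ? : 1);
		...
	}

Note that min(val, cap) does nothing useful for a negative val, and a
negative val is nonzero, so it would apparently be stored in
sk_rcvlowat as-is.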

I note that the calculation is wrong. sk_tcplen includes the size of
the 4-byte length header, but svc_sock_reclen() does not.  In other
places where these two numbers are compared, sizeof(rpc_fraghdr) is
subtracted from sk_tcplen.
So there is a corner case where the size passed to tcp_set_rcvlowat()
is negative.  The two numbers in the subtraction are u32 and the
function expects an "int".
I didn't dig enough to understand how a negative value would be handled.
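
To make the corner case concrete, here is a standalone sketch of the
arithmetic (values invented for illustration):

	#include <stdio.h>
	#include <stdint.h>

	int main(void)
	{
		uint32_t reclen = 100;     /* fragment payload, header excluded */
		uint32_t sk_tcplen = 102;  /* 4-byte fraghdr + 98 payload bytes */

		/* The reverted code passed this u32 difference ... */
		uint32_t diff = reclen - sk_tcplen;  /* wraps to 4294967294 */

		/* ... to an int parameter, where (on the usual two's-
		 * complement targets) it becomes a small negative again. */
		int lowat = diff;

		printf("as u32: %u, as int: %d\n", diff, lowat);
		return 0;
	}

This prints "as u32: 4294967294, as int: -2"; the actual number of
bytes still to arrive here would be reclen + 4 - sk_tcplen = 2.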

> 
> > -
> > +	else
> >  		trace_svcsock_tcp_recv_short(&svsk->sk_xprt,
> >  				svc_sock_reclen(svsk),
> >  				svsk->sk_tcplen - sizeof(rpc_fraghdr));
> > -	}
> >  	goto err_noclose;
> >  error:
> >  	if (len != -EAGAIN)
> 
> Reverting for now is fine though.

Agreed.  I would expect modern network cards to deliver fairly large
packets to the application already.  Still, it would be nice to
understand what is really happening.  Setting rcvlowat shouldn't be able
to cause the window size to go to zero.

NeilBrown


> 
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
>

Patch

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 95397677673b..cb3bd12f5818 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -1083,9 +1083,6 @@ static void svc_tcp_fragment_received(struct svc_sock *svsk)
 	/* If we have more data, signal svc_xprt_enqueue() to try again */
 	svsk->sk_tcplen = 0;
 	svsk->sk_marker = xdr_zero;
-
-	smp_wmb();
-	tcp_set_rcvlowat(svsk->sk_sk, 1);
 }
 
 /**
@@ -1175,17 +1172,10 @@ static int svc_tcp_recvfrom(struct svc_rqst *rqstp)
 		goto err_delete;
 	if (len == want)
 		svc_tcp_fragment_received(svsk);
-	else {
-		/* Avoid more ->sk_data_ready() calls until the rest
-		 * of the message has arrived. This reduces service
-		 * thread wake-ups on large incoming messages. */
-		tcp_set_rcvlowat(svsk->sk_sk,
-				 svc_sock_reclen(svsk) - svsk->sk_tcplen);
-
+	else
 		trace_svcsock_tcp_recv_short(&svsk->sk_xprt,
 				svc_sock_reclen(svsk),
 				svsk->sk_tcplen - sizeof(rpc_fraghdr));
-	}
 	goto err_noclose;
 error:
 	if (len != -EAGAIN)