
[v2] SUNRPC: Fix backchannel reply, again

Message ID 20240619135107.176384-2-cel@kernel.org
State New
Series [v2] SUNRPC: Fix backchannel reply, again

Commit Message

Chuck Lever June 19, 2024, 1:51 p.m. UTC
From: Chuck Lever <chuck.lever@oracle.com>

I still see "RPC: Could not send backchannel reply error: -110"
quite often, along with slow-running tests. Debugging shows that the
backchannel is still stumbling when it has to queue a callback reply
on a busy transport.

Note that every one of these timeouts causes a connection loss by
virtue of the xprt_conditional_disconnect() call in that arm of
call_cb_transmit_status().

I found that setting to_maxval is necessary to get the RPC timeout
logic to behave whenever to_exponential is not set.

Fixes: 57331a59ac0d ("NFSv4.1: Use the nfs_client's rpc timeouts for backchannel")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/svc.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Benjamin Coddington June 20, 2024, 11:41 a.m. UTC | #1
On 19 Jun 2024, at 9:51, cel@kernel.org wrote:

> From: Chuck Lever <chuck.lever@oracle.com>
>
> I still see "RPC: Could not send backchannel reply error: -110"
> quite often, along with slow-running tests. Debugging shows that the
> backchannel is still stumbling when it has to queue a callback reply
> on a busy transport.
>
> Note that every one of these timeouts causes a connection loss by
> virtue of the xprt_conditional_disconnect() call in that arm of
> call_cb_transmit_status().
>
> I found that setting to_maxval is necessary to get the RPC timeout
> logic to behave whenever to_exponential is not set.
>
> Fixes: 57331a59ac0d ("NFSv4.1: Use the nfs_client's rpc timeouts for backchannel")
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

That makes sense - I guess we were getting some random stack value in there?

Reviewed-by: Benjamin Coddington <bcodding@redhat.com>

Ben

> ---
>  net/sunrpc/svc.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
> index 965a27806bfd..e03f14024e47 100644
> --- a/net/sunrpc/svc.c
> +++ b/net/sunrpc/svc.c
> @@ -1588,9 +1588,11 @@ void svc_process(struct svc_rqst *rqstp)
>   */
>  void svc_process_bc(struct rpc_rqst *req, struct svc_rqst *rqstp)
>  {
> +	struct rpc_timeout timeout = {
> +		.to_increment		= 0,
> +	};
>  	struct rpc_task *task;
>  	int proc_error;
> -	struct rpc_timeout timeout;
>
>  	/* Build the svc_rqst used by the common processing routine */
>  	rqstp->rq_xid = req->rq_xid;
> @@ -1643,6 +1645,7 @@ void svc_process_bc(struct rpc_rqst *req, struct svc_rqst *rqstp)
>  		timeout.to_initval = req->rq_xprt->timeout->to_initval;
>  		timeout.to_retries = req->rq_xprt->timeout->to_retries;
>  	}
> +	timeout.to_maxval = timeout.to_initval;
>  	memcpy(&req->rq_snd_buf, &rqstp->rq_res, sizeof(req->rq_snd_buf));
>  	task = rpc_run_bc_task(req, &timeout);
>
> -- 
> 2.45.1
Chuck Lever June 20, 2024, 2:11 p.m. UTC | #2
On Thu, Jun 20, 2024 at 07:41:21AM -0400, Benjamin Coddington wrote:
> On 19 Jun 2024, at 9:51, cel@kernel.org wrote:
> 
> > From: Chuck Lever <chuck.lever@oracle.com>
> >
> > I still see "RPC: Could not send backchannel reply error: -110"
> > quite often, along with slow-running tests. Debugging shows that the
> > backchannel is still stumbling when it has to queue a callback reply
> > on a busy transport.
> >
> > Note that every one of these timeouts causes a connection loss by
> > virtue of the xprt_conditional_disconnect() call in that arm of
> > call_cb_transmit_status().
> >
> > I found that setting to_maxval is necessary to get the RPC timeout
> > logic to behave whenever to_exponential is not set.
> >
> > Fixes: 57331a59ac0d ("NFSv4.1: Use the nfs_client's rpc timeouts for backchannel")
> > Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> 
> That makes sense - I guess we were getting some random stack value in there?

Hi Ben-

On my systems it was always zero (which is why v1 of this patch did
not clear the other fields in @timeout before using it).

A zero to_maxval value results in the same timeout-on-sleep behavior
as you saw before 57331a59ac0d was applied.

A random non-zero value will behave correctly as long as the transport
is making forward progress, so we never noticed a problem.


> Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
> 
> Ben
> 
> > ---
> >  net/sunrpc/svc.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
> > index 965a27806bfd..e03f14024e47 100644
> > --- a/net/sunrpc/svc.c
> > +++ b/net/sunrpc/svc.c
> > @@ -1588,9 +1588,11 @@ void svc_process(struct svc_rqst *rqstp)
> >   */
> >  void svc_process_bc(struct rpc_rqst *req, struct svc_rqst *rqstp)
> >  {
> > +	struct rpc_timeout timeout = {
> > +		.to_increment		= 0,
> > +	};
> >  	struct rpc_task *task;
> >  	int proc_error;
> > -	struct rpc_timeout timeout;
> >
> >  	/* Build the svc_rqst used by the common processing routine */
> >  	rqstp->rq_xid = req->rq_xid;
> > @@ -1643,6 +1645,7 @@ void svc_process_bc(struct rpc_rqst *req, struct svc_rqst *rqstp)
> >  		timeout.to_initval = req->rq_xprt->timeout->to_initval;
> >  		timeout.to_retries = req->rq_xprt->timeout->to_retries;
> >  	}
> > +	timeout.to_maxval = timeout.to_initval;
> >  	memcpy(&req->rq_snd_buf, &rqstp->rq_res, sizeof(req->rq_snd_buf));
> >  	task = rpc_run_bc_task(req, &timeout);
> >
> > -- 
> > 2.45.1
>

Patch

diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 965a27806bfd..e03f14024e47 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -1588,9 +1588,11 @@ void svc_process(struct svc_rqst *rqstp)
  */
 void svc_process_bc(struct rpc_rqst *req, struct svc_rqst *rqstp)
 {
+	struct rpc_timeout timeout = {
+		.to_increment		= 0,
+	};
 	struct rpc_task *task;
 	int proc_error;
-	struct rpc_timeout timeout;
 
 	/* Build the svc_rqst used by the common processing routine */
 	rqstp->rq_xid = req->rq_xid;
@@ -1643,6 +1645,7 @@ void svc_process_bc(struct rpc_rqst *req, struct svc_rqst *rqstp)
 		timeout.to_initval = req->rq_xprt->timeout->to_initval;
 		timeout.to_retries = req->rq_xprt->timeout->to_retries;
 	}
+	timeout.to_maxval = timeout.to_initval;
 	memcpy(&req->rq_snd_buf, &rqstp->rq_res, sizeof(req->rq_snd_buf));
 	task = rpc_run_bc_task(req, &timeout);