diff mbox series

svcrdma: disable timeouts on rdma backchannel

Message ID 20210222233619.21568-1-timo@rothenpieler.org (mailing list archive)
State New
Headers show
Series svcrdma: disable timeouts on rdma backchannel | expand

Commit Message

Timo Rothenpieler Feb. 22, 2021, 11:36 p.m. UTC
This brings it in line with the regular tcp backchannel, which also has
all those timeouts disabled.

Prevents the backchannel from timing out, getting some async operations
like server side copying getting stuck indefinitely on the client side.

Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
---
Did the same testing with this applied than before, and could not
observe it getting stuck, same as with the previous patch, which I
removed before testing this one.

This obviously still does not fix the issue of it being seemingly unable
to reestablish the disconnected backchannel.
An event that disconnects the backchannel but leaves the main connection
intact seems a pretty rare occurance though, outside of this issue.

 net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Comments

Chuck Lever III Feb. 24, 2021, 2:18 p.m. UTC | #1
> On Feb 22, 2021, at 6:36 PM, Timo Rothenpieler <timo@rothenpieler.org> wrote:
> 
> This brings it in line with the regular tcp backchannel, which also has
> all those timeouts disabled.
> 
> Prevents the backchannel from timing out, getting some async operations
> like server side copying getting stuck indefinitely on the client side.
> 
> Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>

Thanks for your patch! I've included it in the for-rc branch at

git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git


> ---
> Did the same testing with this applied than before, and could not
> observe it getting stuck, same as with the previous patch, which I
> removed before testing this one.
> 
> This obviously still does not fix the issue of it being seemingly unable
> to reestablish the disconnected backchannel.
> An event that disconnects the backchannel but leaves the main connection
> intact seems a pretty rare occurance though, outside of this issue.
> 
> net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
> index 63f8be974df2..8186ab6f99f1 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
> @@ -252,9 +252,9 @@ xprt_setup_rdma_bc(struct xprt_create *args)
> 	xprt->timeout = &xprt_rdma_bc_timeout;
> 	xprt_set_bound(xprt);
> 	xprt_set_connected(xprt);
> -	xprt->bind_timeout = RPCRDMA_BIND_TO;
> -	xprt->reestablish_timeout = RPCRDMA_INIT_REEST_TO;
> -	xprt->idle_timeout = RPCRDMA_IDLE_DISC_TO;
> +	xprt->bind_timeout = 0;
> +	xprt->reestablish_timeout = 0;
> +	xprt->idle_timeout = 0;
> 
> 	xprt->prot = XPRT_TRANSPORT_BC_RDMA;
> 	xprt->ops = &xprt_rdma_bc_procs;
> -- 
> 2.25.1
> 

--
Chuck Lever
bfields@fieldses.org Feb. 24, 2021, 8:02 p.m. UTC | #2
On Wed, Feb 24, 2021 at 02:18:18PM +0000, Chuck Lever wrote:
> 
> 
> > On Feb 22, 2021, at 6:36 PM, Timo Rothenpieler <timo@rothenpieler.org> wrote:
> > 
> > This brings it in line with the regular tcp backchannel, which also has
> > all those timeouts disabled.
> > 
> > Prevents the backchannel from timing out, getting some async operations
> > like server side copying getting stuck indefinitely on the client side.
> > 
> > Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
> 
> Thanks for your patch! I've included it in the for-rc branch at
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git

So, I'm sure this patch makes sense.

But I'm also curious why it's not recovering.

What I think should happen:

	- clp->cl_cb_state should be set to NFSD4_CB_DOWN.
	- This should cause the next SEQUENCE reply to have
	  SEQ4_STATUS_CB_PATH_DOWN set.
	- That should poke the client to recover.  (Maybe by sending a
	  BIND_CONN_TO_SESSION call?)

I'd be curious whether any of that's actually happening.

--b.

> 
> 
> > ---
> > Did the same testing with this applied than before, and could not
> > observe it getting stuck, same as with the previous patch, which I
> > removed before testing this one.
> > 
> > This obviously still does not fix the issue of it being seemingly unable
> > to reestablish the disconnected backchannel.
> > An event that disconnects the backchannel but leaves the main connection
> > intact seems a pretty rare occurance though, outside of this issue.
> > 
> > net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 6 +++---
> > 1 file changed, 3 insertions(+), 3 deletions(-)
> > 
> > diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
> > index 63f8be974df2..8186ab6f99f1 100644
> > --- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
> > +++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
> > @@ -252,9 +252,9 @@ xprt_setup_rdma_bc(struct xprt_create *args)
> > 	xprt->timeout = &xprt_rdma_bc_timeout;
> > 	xprt_set_bound(xprt);
> > 	xprt_set_connected(xprt);
> > -	xprt->bind_timeout = RPCRDMA_BIND_TO;
> > -	xprt->reestablish_timeout = RPCRDMA_INIT_REEST_TO;
> > -	xprt->idle_timeout = RPCRDMA_IDLE_DISC_TO;
> > +	xprt->bind_timeout = 0;
> > +	xprt->reestablish_timeout = 0;
> > +	xprt->idle_timeout = 0;
> > 
> > 	xprt->prot = XPRT_TRANSPORT_BC_RDMA;
> > 	xprt->ops = &xprt_rdma_bc_procs;
> > -- 
> > 2.25.1
> > 
> 
> --
> Chuck Lever
> 
>
Chuck Lever III Feb. 24, 2021, 8:03 p.m. UTC | #3
> On Feb 24, 2021, at 3:02 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> 
> On Wed, Feb 24, 2021 at 02:18:18PM +0000, Chuck Lever wrote:
>> 
>> 
>>> On Feb 22, 2021, at 6:36 PM, Timo Rothenpieler <timo@rothenpieler.org> wrote:
>>> 
>>> This brings it in line with the regular tcp backchannel, which also has
>>> all those timeouts disabled.
>>> 
>>> Prevents the backchannel from timing out, getting some async operations
>>> like server side copying getting stuck indefinitely on the client side.
>>> 
>>> Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
>> 
>> Thanks for your patch! I've included it in the for-rc branch at
>> 
>> git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git
> 
> So, I'm sure this patch makes sense.
> 
> But I'm also curious why it's not recovering.

Agreed. This patch is not a substitute for proper callback channel recovery.


> What I think should happen:
> 
> 	- clp->cl_cb_state should be set to NFSD4_CB_DOWN.

I think it's set to FAULT.


> 	- This should cause the next SEQUENCE reply to have
> 	  SEQ4_STATUS_CB_PATH_DOWN set.
> 	- That should poke the client to recover.  (Maybe by sending a
> 	  BIND_CONN_TO_SESSION call?)
> 
> I'd be curious whether any of that's actually happening.
> 
> --b.
> 
>> 
>> 
>>> ---
>>> Did the same testing with this applied than before, and could not
>>> observe it getting stuck, same as with the previous patch, which I
>>> removed before testing this one.
>>> 
>>> This obviously still does not fix the issue of it being seemingly unable
>>> to reestablish the disconnected backchannel.
>>> An event that disconnects the backchannel but leaves the main connection
>>> intact seems a pretty rare occurance though, outside of this issue.
>>> 
>>> net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 6 +++---
>>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>> 
>>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
>>> index 63f8be974df2..8186ab6f99f1 100644
>>> --- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
>>> +++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
>>> @@ -252,9 +252,9 @@ xprt_setup_rdma_bc(struct xprt_create *args)
>>> 	xprt->timeout = &xprt_rdma_bc_timeout;
>>> 	xprt_set_bound(xprt);
>>> 	xprt_set_connected(xprt);
>>> -	xprt->bind_timeout = RPCRDMA_BIND_TO;
>>> -	xprt->reestablish_timeout = RPCRDMA_INIT_REEST_TO;
>>> -	xprt->idle_timeout = RPCRDMA_IDLE_DISC_TO;
>>> +	xprt->bind_timeout = 0;
>>> +	xprt->reestablish_timeout = 0;
>>> +	xprt->idle_timeout = 0;
>>> 
>>> 	xprt->prot = XPRT_TRANSPORT_BC_RDMA;
>>> 	xprt->ops = &xprt_rdma_bc_procs;
>>> -- 
>>> 2.25.1
>>> 
>> 
>> --
>> Chuck Lever

--
Chuck Lever
bfields@fieldses.org Feb. 24, 2021, 8:08 p.m. UTC | #4
On Wed, Feb 24, 2021 at 08:03:54PM +0000, Chuck Lever wrote:
> 
> 
> > On Feb 24, 2021, at 3:02 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> > 
> > On Wed, Feb 24, 2021 at 02:18:18PM +0000, Chuck Lever wrote:
> >> 
> >> 
> >>> On Feb 22, 2021, at 6:36 PM, Timo Rothenpieler <timo@rothenpieler.org> wrote:
> >>> 
> >>> This brings it in line with the regular tcp backchannel, which also has
> >>> all those timeouts disabled.
> >>> 
> >>> Prevents the backchannel from timing out, getting some async operations
> >>> like server side copying getting stuck indefinitely on the client side.
> >>> 
> >>> Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
> >> 
> >> Thanks for your patch! I've included it in the for-rc branch at
> >> 
> >> git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git
> > 
> > So, I'm sure this patch makes sense.
> > 
> > But I'm also curious why it's not recovering.
> 
> Agreed. This patch is not a substitute for proper callback channel recovery.
> 
> 
> > What I think should happen:
> > 
> > 	- clp->cl_cb_state should be set to NFSD4_CB_DOWN.
> 
> I think it's set to FAULT.

OK.  The result should be similar in that case, but SEQUENCE gets the
SEQ4_STATUS_BACKCHANNEL_FAULT flag set instead.

--b.

> 
> 
> > 	- This should cause the next SEQUENCE reply to have
> > 	  SEQ4_STATUS_CB_PATH_DOWN set.
> > 	- That should poke the client to recover.  (Maybe by sending a
> > 	  BIND_CONN_TO_SESSION call?)
> > 
> > I'd be curious whether any of that's actually happening.
> > 
> > --b.
> > 
> >> 
> >> 
> >>> ---
> >>> Did the same testing with this applied than before, and could not
> >>> observe it getting stuck, same as with the previous patch, which I
> >>> removed before testing this one.
> >>> 
> >>> This obviously still does not fix the issue of it being seemingly unable
> >>> to reestablish the disconnected backchannel.
> >>> An event that disconnects the backchannel but leaves the main connection
> >>> intact seems a pretty rare occurance though, outside of this issue.
> >>> 
> >>> net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 6 +++---
> >>> 1 file changed, 3 insertions(+), 3 deletions(-)
> >>> 
> >>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
> >>> index 63f8be974df2..8186ab6f99f1 100644
> >>> --- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
> >>> +++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
> >>> @@ -252,9 +252,9 @@ xprt_setup_rdma_bc(struct xprt_create *args)
> >>> 	xprt->timeout = &xprt_rdma_bc_timeout;
> >>> 	xprt_set_bound(xprt);
> >>> 	xprt_set_connected(xprt);
> >>> -	xprt->bind_timeout = RPCRDMA_BIND_TO;
> >>> -	xprt->reestablish_timeout = RPCRDMA_INIT_REEST_TO;
> >>> -	xprt->idle_timeout = RPCRDMA_IDLE_DISC_TO;
> >>> +	xprt->bind_timeout = 0;
> >>> +	xprt->reestablish_timeout = 0;
> >>> +	xprt->idle_timeout = 0;
> >>> 
> >>> 	xprt->prot = XPRT_TRANSPORT_BC_RDMA;
> >>> 	xprt->ops = &xprt_rdma_bc_procs;
> >>> -- 
> >>> 2.25.1
> >>> 
> >> 
> >> --
> >> Chuck Lever
> 
> --
> Chuck Lever
> 
>
diff mbox series

Patch

diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
index 63f8be974df2..8186ab6f99f1 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
@@ -252,9 +252,9 @@  xprt_setup_rdma_bc(struct xprt_create *args)
 	xprt->timeout = &xprt_rdma_bc_timeout;
 	xprt_set_bound(xprt);
 	xprt_set_connected(xprt);
-	xprt->bind_timeout = RPCRDMA_BIND_TO;
-	xprt->reestablish_timeout = RPCRDMA_INIT_REEST_TO;
-	xprt->idle_timeout = RPCRDMA_IDLE_DISC_TO;
+	xprt->bind_timeout = 0;
+	xprt->reestablish_timeout = 0;
+	xprt->idle_timeout = 0;
 
 	xprt->prot = XPRT_TRANSPORT_BC_RDMA;
 	xprt->ops = &xprt_rdma_bc_procs;