Message ID | 20210222233619.21568-1-timo@rothenpieler.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | svcrdma: disable timeouts on rdma backchannel | expand |
> On Feb 22, 2021, at 6:36 PM, Timo Rothenpieler <timo@rothenpieler.org> wrote: > > This brings it in line with the regular tcp backchannel, which also has > all those timeouts disabled. > > Prevents the backchannel from timing out, getting some async operations > like server side copying getting stuck indefinitely on the client side. > > Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org> Thanks for your patch! I've included it in the for-rc branch at git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git > --- > Did the same testing with this applied than before, and could not > observe it getting stuck, same as with the previous patch, which I > removed before testing this one. > > This obviously still does not fix the issue of it being seemingly unable > to reestablish the disconnected backchannel. > An event that disconnects the backchannel but leaves the main connection > intact seems a pretty rare occurance though, outside of this issue. > > net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c > index 63f8be974df2..8186ab6f99f1 100644 > --- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c > +++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c > @@ -252,9 +252,9 @@ xprt_setup_rdma_bc(struct xprt_create *args) > xprt->timeout = &xprt_rdma_bc_timeout; > xprt_set_bound(xprt); > xprt_set_connected(xprt); > - xprt->bind_timeout = RPCRDMA_BIND_TO; > - xprt->reestablish_timeout = RPCRDMA_INIT_REEST_TO; > - xprt->idle_timeout = RPCRDMA_IDLE_DISC_TO; > + xprt->bind_timeout = 0; > + xprt->reestablish_timeout = 0; > + xprt->idle_timeout = 0; > > xprt->prot = XPRT_TRANSPORT_BC_RDMA; > xprt->ops = &xprt_rdma_bc_procs; > -- > 2.25.1 > -- Chuck Lever
On Wed, Feb 24, 2021 at 02:18:18PM +0000, Chuck Lever wrote: > > > > On Feb 22, 2021, at 6:36 PM, Timo Rothenpieler <timo@rothenpieler.org> wrote: > > > > This brings it in line with the regular tcp backchannel, which also has > > all those timeouts disabled. > > > > Prevents the backchannel from timing out, getting some async operations > > like server side copying getting stuck indefinitely on the client side. > > > > Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org> > > Thanks for your patch! I've included it in the for-rc branch at > > git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git So, I'm sure this patch makes sense. But I'm also curious why it's not recovering. What I think should happen: - clp->cl_cb_state should be set to NFSD4_CB_DOWN. - This should cause the next SEQUENCE reply to have SEQ4_STATUS_CB_PATH_DOWN set. - That should poke the client to recover. (Maybe by sending a BIND_CONN_TO_SESSION call?) I'd be curious whether any of that's actually happening. --b. > > > > --- > > Did the same testing with this applied than before, and could not > > observe it getting stuck, same as with the previous patch, which I > > removed before testing this one. > > > > This obviously still does not fix the issue of it being seemingly unable > > to reestablish the disconnected backchannel. > > An event that disconnects the backchannel but leaves the main connection > > intact seems a pretty rare occurance though, outside of this issue. > > > > net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 6 +++--- > > 1 file changed, 3 insertions(+), 3 deletions(-) > > > > diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c > > index 63f8be974df2..8186ab6f99f1 100644 > > --- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c > > +++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c > > @@ -252,9 +252,9 @@ xprt_setup_rdma_bc(struct xprt_create *args) > > xprt->timeout = &xprt_rdma_bc_timeout; > > xprt_set_bound(xprt); > > xprt_set_connected(xprt); > > - xprt->bind_timeout = RPCRDMA_BIND_TO; > > - xprt->reestablish_timeout = RPCRDMA_INIT_REEST_TO; > > - xprt->idle_timeout = RPCRDMA_IDLE_DISC_TO; > > + xprt->bind_timeout = 0; > > + xprt->reestablish_timeout = 0; > > + xprt->idle_timeout = 0; > > > > xprt->prot = XPRT_TRANSPORT_BC_RDMA; > > xprt->ops = &xprt_rdma_bc_procs; > > -- > > 2.25.1 > > > > -- > Chuck Lever > >
> On Feb 24, 2021, at 3:02 PM, J. Bruce Fields <bfields@fieldses.org> wrote: > > On Wed, Feb 24, 2021 at 02:18:18PM +0000, Chuck Lever wrote: >> >> >>> On Feb 22, 2021, at 6:36 PM, Timo Rothenpieler <timo@rothenpieler.org> wrote: >>> >>> This brings it in line with the regular tcp backchannel, which also has >>> all those timeouts disabled. >>> >>> Prevents the backchannel from timing out, getting some async operations >>> like server side copying getting stuck indefinitely on the client side. >>> >>> Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org> >> >> Thanks for your patch! I've included it in the for-rc branch at >> >> git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git > > So, I'm sure this patch makes sense. > > But I'm also curious why it's not recovering. Agreed. This patch is not a substitute for proper callback channel recovery. > What I think should happen: > > - clp->cl_cb_state should be set to NFSD4_CB_DOWN. I think it's set to FAULT. > - This should cause the next SEQUENCE reply to have > SEQ4_STATUS_CB_PATH_DOWN set. > - That should poke the client to recover. (Maybe by sending a > BIND_CONN_TO_SESSION call?) > > I'd be curious whether any of that's actually happening. > > --b. > >> >> >>> --- >>> Did the same testing with this applied than before, and could not >>> observe it getting stuck, same as with the previous patch, which I >>> removed before testing this one. >>> >>> This obviously still does not fix the issue of it being seemingly unable >>> to reestablish the disconnected backchannel. >>> An event that disconnects the backchannel but leaves the main connection >>> intact seems a pretty rare occurance though, outside of this issue. >>> >>> net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 6 +++--- >>> 1 file changed, 3 insertions(+), 3 deletions(-) >>> >>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c >>> index 63f8be974df2..8186ab6f99f1 100644 >>> --- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c >>> +++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c >>> @@ -252,9 +252,9 @@ xprt_setup_rdma_bc(struct xprt_create *args) >>> xprt->timeout = &xprt_rdma_bc_timeout; >>> xprt_set_bound(xprt); >>> xprt_set_connected(xprt); >>> - xprt->bind_timeout = RPCRDMA_BIND_TO; >>> - xprt->reestablish_timeout = RPCRDMA_INIT_REEST_TO; >>> - xprt->idle_timeout = RPCRDMA_IDLE_DISC_TO; >>> + xprt->bind_timeout = 0; >>> + xprt->reestablish_timeout = 0; >>> + xprt->idle_timeout = 0; >>> >>> xprt->prot = XPRT_TRANSPORT_BC_RDMA; >>> xprt->ops = &xprt_rdma_bc_procs; >>> -- >>> 2.25.1 >>> >> >> -- >> Chuck Lever -- Chuck Lever
On Wed, Feb 24, 2021 at 08:03:54PM +0000, Chuck Lever wrote: > > > > On Feb 24, 2021, at 3:02 PM, J. Bruce Fields <bfields@fieldses.org> wrote: > > > > On Wed, Feb 24, 2021 at 02:18:18PM +0000, Chuck Lever wrote: > >> > >> > >>> On Feb 22, 2021, at 6:36 PM, Timo Rothenpieler <timo@rothenpieler.org> wrote: > >>> > >>> This brings it in line with the regular tcp backchannel, which also has > >>> all those timeouts disabled. > >>> > >>> Prevents the backchannel from timing out, getting some async operations > >>> like server side copying getting stuck indefinitely on the client side. > >>> > >>> Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org> > >> > >> Thanks for your patch! I've included it in the for-rc branch at > >> > >> git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git > > > > So, I'm sure this patch makes sense. > > > > But I'm also curious why it's not recovering. > > Agreed. This patch is not a substitute for proper callback channel recovery. > > > > What I think should happen: > > > > - clp->cl_cb_state should be set to NFSD4_CB_DOWN. > > I think it's set to FAULT. OK. The result should be similar in that case, but SEQUENCE gets the SEQ4_STATUS_BACKCHANNEL_FAULT flag set instead. --b. > > > > - This should cause the next SEQUENCE reply to have > > SEQ4_STATUS_CB_PATH_DOWN set. > > - That should poke the client to recover. (Maybe by sending a > > BIND_CONN_TO_SESSION call?) > > > > I'd be curious whether any of that's actually happening. > > > > --b. > > > >> > >> > >>> --- > >>> Did the same testing with this applied than before, and could not > >>> observe it getting stuck, same as with the previous patch, which I > >>> removed before testing this one. > >>> > >>> This obviously still does not fix the issue of it being seemingly unable > >>> to reestablish the disconnected backchannel. > >>> An event that disconnects the backchannel but leaves the main connection > >>> intact seems a pretty rare occurance though, outside of this issue. > >>> > >>> net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 6 +++--- > >>> 1 file changed, 3 insertions(+), 3 deletions(-) > >>> > >>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c > >>> index 63f8be974df2..8186ab6f99f1 100644 > >>> --- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c > >>> +++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c > >>> @@ -252,9 +252,9 @@ xprt_setup_rdma_bc(struct xprt_create *args) > >>> xprt->timeout = &xprt_rdma_bc_timeout; > >>> xprt_set_bound(xprt); > >>> xprt_set_connected(xprt); > >>> - xprt->bind_timeout = RPCRDMA_BIND_TO; > >>> - xprt->reestablish_timeout = RPCRDMA_INIT_REEST_TO; > >>> - xprt->idle_timeout = RPCRDMA_IDLE_DISC_TO; > >>> + xprt->bind_timeout = 0; > >>> + xprt->reestablish_timeout = 0; > >>> + xprt->idle_timeout = 0; > >>> > >>> xprt->prot = XPRT_TRANSPORT_BC_RDMA; > >>> xprt->ops = &xprt_rdma_bc_procs; > >>> -- > >>> 2.25.1 > >>> > >> > >> -- > >> Chuck Lever > > -- > Chuck Lever > >
diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c index 63f8be974df2..8186ab6f99f1 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c +++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c @@ -252,9 +252,9 @@ xprt_setup_rdma_bc(struct xprt_create *args) xprt->timeout = &xprt_rdma_bc_timeout; xprt_set_bound(xprt); xprt_set_connected(xprt); - xprt->bind_timeout = RPCRDMA_BIND_TO; - xprt->reestablish_timeout = RPCRDMA_INIT_REEST_TO; - xprt->idle_timeout = RPCRDMA_IDLE_DISC_TO; + xprt->bind_timeout = 0; + xprt->reestablish_timeout = 0; + xprt->idle_timeout = 0; xprt->prot = XPRT_TRANSPORT_BC_RDMA; xprt->ops = &xprt_rdma_bc_procs;
This brings it in line with the regular tcp backchannel, which also has all those timeouts disabled. Prevents the backchannel from timing out, getting some async operations like server side copying getting stuck indefinitely on the client side. Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org> --- Did the same testing with this applied than before, and could not observe it getting stuck, same as with the previous patch, which I removed before testing this one. This obviously still does not fix the issue of it being seemingly unable to reestablish the disconnected backchannel. An event that disconnects the backchannel but leaves the main connection intact seems a pretty rare occurance though, outside of this issue. net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)