Message ID | 20200221214906.2072.32572.stgit@manet.1015granger.net (mailing list archive) |
---|---|
Series | NFS/RDMA client side connection overhaul |
On 2/21/2020 2:00 PM, Chuck Lever wrote:
> Howdy.
>
> I've had reports (and personal experience) where the Linux NFS/RDMA
> client waits for a very long time after a disruption of the network
> or NFS server.
>
> There is a disconnect time wait in the Connection Manager which
> blocks the RPC/RDMA transport from tearing down a connection for a
> few minutes when the remote cannot respond to DREQ messages.

This seems really unfortunate. Why such a long wait in the RDMA layer?
I can see a backoff, to prevent connection attempt flooding, but a
constant "few minute" pause is a very blunt instrument.

> An RPC/RDMA transport has only one slot for connection state, so the
> transport is prevented from establishing a fresh connection until
> the time wait completes.
>
> This patch series refactors the connection end point data structures
> to enable one active and multiple zombie connections. Now, while a
> defunct connection is waiting to die, it is separated from the
> transport, clearing the way for the immediate creation of a new
> connection. Clean-up of the old connection's data structures and
> resources then completes in the background.

This is a good idea in any case. It separates the layers, and leads
to better connection establishment throughput.

Does the RPCRDMA layer ensure it backs off, if connection retries
fail? Or are you depending on the NFS upper layer for this.

Tom.

> Well, that's the idea, anyway. Review and comments welcome. Hoping
> this can be merged in v5.7.
>
> ---
>
> Chuck Lever (11):
>       xprtrdma: Invoke rpcrdma_ep_create() in the connect worker
>       xprtrdma: Refactor frwr_init_mr()
>       xprtrdma: Clean up the post_send path
>       xprtrdma: Refactor rpcrdma_ep_connect() and rpcrdma_ep_disconnect()
>       xprtrdma: Allocate Protection Domain in rpcrdma_ep_create()
>       xprtrdma: Invoke rpcrdma_ia_open in the connect worker
>       xprtrdma: Remove rpcrdma_ia::ri_flags
>       xprtrdma: Disconnect on flushed completion
>       xprtrdma: Merge struct rpcrdma_ia into struct rpcrdma_ep
>       xprtrdma: Extract sockaddr from struct rdma_cm_id
>       xprtrdma: kmalloc rpcrdma_ep separate from rpcrdma_xprt
>
>
>  include/trace/events/rpcrdma.h    |   97 ++---
>  net/sunrpc/xprtrdma/backchannel.c |    8
>  net/sunrpc/xprtrdma/frwr_ops.c    |  152 ++++----
>  net/sunrpc/xprtrdma/rpc_rdma.c    |   32 +-
>  net/sunrpc/xprtrdma/transport.c   |   72 +---
>  net/sunrpc/xprtrdma/verbs.c       |  681 ++++++++++++++-------------------------
>  net/sunrpc/xprtrdma/xprt_rdma.h   |   89 ++---
>  7 files changed, 445 insertions(+), 686 deletions(-)
>
> --
> Chuck Lever
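The "one active and multiple zombie connections" idea from the cover letter can be modeled in a few lines of userspace C. This is only a sketch under stated assumptions: `struct ep`, `struct xprt`, `xprt_connect()`, `xprt_disconnect()` and `ep_timewait_done()` are hypothetical names invented for illustration, not the kernel's `struct rpcrdma_ep` or any function in the series. It shows only the ownership hand-off: a defunct endpoint is detached from the transport so that a new one can be created at once, and the old one is freed later.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of a connection end point (not struct rpcrdma_ep). */
struct ep {
    unsigned int id;
};

/* Hypothetical model of a transport with a single "active" slot. */
struct xprt {
    struct ep *active;      /* the one live connection, or NULL */
    unsigned int zombies;   /* detached end points still waiting to die */
    unsigned int next_id;
};

/* Create a fresh end point; only legal when the active slot is empty. */
static int xprt_connect(struct xprt *x)
{
    struct ep *ep;

    assert(x->active == NULL);
    ep = calloc(1, sizeof(*ep));
    if (!ep)
        return -1;
    ep->id = x->next_id++;
    x->active = ep;
    return 0;
}

/*
 * Detach the current end point. The caller now owns it and frees it
 * via ep_timewait_done() once the (simulated) time wait has finished.
 */
static struct ep *xprt_disconnect(struct xprt *x)
{
    struct ep *old = x->active;

    if (old) {
        x->active = NULL;
        x->zombies++;
    }
    return old;
}

/* Background clean-up of a detached end point. */
static void ep_timewait_done(struct xprt *x, struct ep *old)
{
    assert(x->zombies > 0);
    x->zombies--;
    free(old);
}
```

In this model a caller can invoke `xprt_disconnect()` and then `xprt_connect()` back to back, without waiting for `ep_timewait_done()` on the old end point, which is the behaviour the series is aiming for.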
> On Mar 1, 2020, at 1:09 PM, Tom Talpey <tom@talpey.com> wrote:
>
> On 2/21/2020 2:00 PM, Chuck Lever wrote:
>> Howdy.
>> I've had reports (and personal experience) where the Linux NFS/RDMA
>> client waits for a very long time after a disruption of the network
>> or NFS server.
>> There is a disconnect time wait in the Connection Manager which
>> blocks the RPC/RDMA transport from tearing down a connection for a
>> few minutes when the remote cannot respond to DREQ messages.
>
> This seems really unfortunate. Why such a long wait in the RDMA layer?
> I can see a backoff, to prevent connection attempt flooding, but a
> constant "few minute" pause is a very blunt instrument.

The last clause here is the operative conundrum: "when the remote
cannot respond". That should be pretty rare, but it's frequent
enough to be bothersome in some environments.

As to why the time wait is so long, I don't know the answer to that.

>> An RPC/RDMA transport has only one slot for connection state, so the
>> transport is prevented from establishing a fresh connection until
>> the time wait completes.
>> This patch series refactors the connection end point data structures
>> to enable one active and multiple zombie connections. Now, while a
>> defunct connection is waiting to die, it is separated from the
>> transport, clearing the way for the immediate creation of a new
>> connection. Clean-up of the old connection's data structures and
>> resources then completes in the background.
>
> This is a good idea in any case. It separates the layers, and leads
> to better connection establishment throughput.
>
> Does the RPCRDMA layer ensure it backs off, if connection retries
> fail? Or are you depending on the NFS upper layer for this.

There is a complicated back-off scheme that is modeled on the TCP
connection back-off logic.

> Tom.
>
>> Well, that's the idea, anyway. Review and comments welcome. Hoping
>> this can be merged in v5.7.
--
Chuck Lever
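The reply above says only that the back-off is "modeled on the TCP connection back-off logic" and gives no details. As a rough illustration of that general family of schemes, here is a minimal capped exponential back-off in C. Everything in it is an assumption made for the example: the function name `next_reconnect_delay()` and both bounds are hypothetical and are not taken from the thread or from the patches.

```c
#include <assert.h>

/* Illustrative bounds in seconds; assumptions, not values from the thread. */
#define REEST_INIT_SECS 5u
#define REEST_MAX_SECS  30u

/*
 * Return the delay to use before the next reconnect attempt.
 * Pass 0 for the first retry after a working connection; pass the
 * previously returned value after each further failure.
 */
static unsigned int next_reconnect_delay(unsigned int cur)
{
    assert(cur <= REEST_MAX_SECS);

    if (cur == 0)
        return REEST_INIT_SECS;
    if (cur >= REEST_MAX_SECS / 2)
        return REEST_MAX_SECS;      /* clamp instead of overshooting */
    return cur * 2;                 /* double after every failure */
}
```

A caller would reset its stored delay to 0 after a successful connect, so that the next disruption starts again from the short initial delay.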
Hi Anna, I don't recall receiving any comments that require modifying
this series. Do you want me to resend it for the next merge window?

> On Feb 21, 2020, at 5:00 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
>
> Howdy.
>
> I've had reports (and personal experience) where the Linux NFS/RDMA
> client waits for a very long time after a disruption of the network
> or NFS server.
>
> There is a disconnect time wait in the Connection Manager which
> blocks the RPC/RDMA transport from tearing down a connection for a
> few minutes when the remote cannot respond to DREQ messages.
>
> An RPC/RDMA transport has only one slot for connection state, so the
> transport is prevented from establishing a fresh connection until
> the time wait completes.
>
> This patch series refactors the connection end point data structures
> to enable one active and multiple zombie connections. Now, while a
> defunct connection is waiting to die, it is separated from the
> transport, clearing the way for the immediate creation of a new
> connection. Clean-up of the old connection's data structures and
> resources then completes in the background.
>
> Well, that's the idea, anyway. Review and comments welcome. Hoping
> this can be merged in v5.7.
Hi Chuck,

On Wed, 2020-03-11 at 11:27 -0400, Chuck Lever wrote:
> Hi Anna, I don't recall receiving any comments that require modifying
> this series. Do you want me to resend it for the next merge window?

If there haven't been any changes, then I'll just use the version
you've already posted. No need to resend.

Thanks for checking!
Anna

> > On Feb 21, 2020, at 5:00 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
> >
> > Howdy.
> >
> > I've had reports (and personal experience) where the Linux NFS/RDMA
> > client waits for a very long time after a disruption of the network
> > or NFS server.
> >
> > There is a disconnect time wait in the Connection Manager which
> > blocks the RPC/RDMA transport from tearing down a connection for a
> > few minutes when the remote cannot respond to DREQ messages.
> >
> > An RPC/RDMA transport has only one slot for connection state, so the
> > transport is prevented from establishing a fresh connection until
> > the time wait completes.
> >
> > This patch series refactors the connection end point data structures
> > to enable one active and multiple zombie connections. Now, while a
> > defunct connection is waiting to die, it is separated from the
> > transport, clearing the way for the immediate creation of a new
> > connection. Clean-up of the old connection's data structures and
> > resources then completes in the background.
> >
> > Well, that's the idea, anyway. Review and comments welcome. Hoping
> > this can be merged in v5.7.