Message ID | 20200620171805.1748399-1-dan@kernelim.com (mailing list archive) |
---|---|
State | Not Applicable |
Series | xprtrdma: Wake up re_connect_wait on disconnect |
Hi Dan-

> On Jun 20, 2020, at 1:18 PM, Dan Aloni <dan@kernelim.com> wrote:
>
> Given that rpcrdma_xprt_connect() happens from workqueue context, in cases where
> connections don't succeed, something needs to wake it up. In my case, this has
> been observed when the CM callback received `RDMA_CM_EVENT_REJECTED`, and
> `rpcrdma_xprt_connect()` slept forever.

Interesting. My development and testing generates plenty of REJECTED connection
requests, but I never saw this particular failure mode.

> This continues the fix in commit 58bd6656f808 ('xprtrdma: Restore wake-up-all to
> rpcrdma_cm_event_handler()').

The patch looks sensible. I'll pull it into my test harness.

> Signed-off-by: Dan Aloni <dan@kernelim.com>
> CC: Chuck Lever <chuck.lever@oracle.com>
> ---
>
> Notes:
>     Hi Chuck,
>
>     Maybe I missed something, as it is not clear to me how otherwise (without this
>     patch), re_connect_wait can be woken up in this situation. Please explain?
>
>  net/sunrpc/xprtrdma/verbs.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index 2ae348377806..8bd76a47a91f 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -289,6 +289,7 @@ rpcrdma_cm_event_handler(struct rdma_cm_id *id, struct rdma_cm_event *event)
>  		ep->re_connect_status = -ECONNABORTED;
>  disconnected:
>  		xprt_force_disconnect(xprt);
> +		wake_up_all(&ep->re_connect_wait);
>  		return rpcrdma_ep_destroy(ep);
>  	default:
>  		break;
> --
> 2.25.4
>

--
Chuck Lever
Hi Dan-

> On Jun 20, 2020, at 2:46 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
>
> Hi Dan-
>
>> On Jun 20, 2020, at 1:18 PM, Dan Aloni <dan@kernelim.com> wrote:
>>
>> Given that rpcrdma_xprt_connect() happens from workqueue context, in cases where
>> connections don't succeed, something needs to wake it up. In my case, this has
>> been observed when the CM callback received `RDMA_CM_EVENT_REJECTED`, and
>> `rpcrdma_xprt_connect()` slept forever.
>
> Interesting. My development and testing generates plenty of REJECTED connection
> requests, but I never saw this particular failure mode.

Correction: My testing _used_ _to_ generate REJECTED events regularly. It does
not seem to any more, even after client crashes. So that explains why I haven't
seen this before.

I haven't reproduced the problem here, but the fix still looks proper to me, and
doesn't appear to introduce any regressions.

I do have some issues with your proposed patch, though.

The first paragraph of the patch description is incorrect.
RDMA_CM_EVENT_DISCONNECTED can occur only once a connection has been
established. That guarantees there are no waiters on re_connect_wait in that
case. It's connect errors that need to wake up the connect worker.

>> This continues the fix in commit 58bd6656f808 ('xprtrdma: Restore wake-up-all to
>> rpcrdma_cm_event_handler()').

IMO this paragraph needs to be replaced by:

Fixes: e28ce90083f0 ("xprtrdma: kmalloc rpcrdma_ep separate from rpcrdma_xprt")

>> Signed-off-by: Dan Aloni <dan@kernelim.com>
>> CC: Chuck Lever <chuck.lever@oracle.com>
>> ---
>>
>> Notes:
>>     Hi Chuck,
>>
>>     Maybe I missed something, as it is not clear to me how otherwise (without this
>>     patch), re_connect_wait can be woken up in this situation. Please explain?
>>
>>  net/sunrpc/xprtrdma/verbs.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
>> index 2ae348377806..8bd76a47a91f 100644
>> --- a/net/sunrpc/xprtrdma/verbs.c
>> +++ b/net/sunrpc/xprtrdma/verbs.c
>> @@ -289,6 +289,7 @@ rpcrdma_cm_event_handler(struct rdma_cm_id *id, struct rdma_cm_event *event)
>>  		ep->re_connect_status = -ECONNABORTED;
>>  disconnected:
>>  		xprt_force_disconnect(xprt);
>> +		wake_up_all(&ep->re_connect_wait);
>>  		return rpcrdma_ep_destroy(ep);
>>  	default:
>>  		break;

This hunk does not apply on top of fixes I've already sent to Anna for 5.8-rc1.

So, if you don't object, I'll adjust your patch (this hunk and the description)
before sending it along to Anna.

--
Chuck Lever
On Sun, Jun 21, 2020 at 10:49:53AM -0400, Chuck Lever wrote:
> >> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> >> index 2ae348377806..8bd76a47a91f 100644
> >> --- a/net/sunrpc/xprtrdma/verbs.c
> >> +++ b/net/sunrpc/xprtrdma/verbs.c
> >> @@ -289,6 +289,7 @@ rpcrdma_cm_event_handler(struct rdma_cm_id *id, struct rdma_cm_event *event)
> >>  		ep->re_connect_status = -ECONNABORTED;
> >>  disconnected:
> >>  		xprt_force_disconnect(xprt);
> >> +		wake_up_all(&ep->re_connect_wait);
> >>  		return rpcrdma_ep_destroy(ep);
> >>  	default:
> >>  		break;
>
> This hunk does not apply on top of fixes I've already sent to Anna for 5.8-rc1.
>
> So, if you don't object, I'll adjust your patch (this hunk and the description)
> before sending it along to Anna.

Sure, go ahead. Thanks for working on this!
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 2ae348377806..8bd76a47a91f 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -289,6 +289,7 @@ rpcrdma_cm_event_handler(struct rdma_cm_id *id, struct rdma_cm_event *event)
 		ep->re_connect_status = -ECONNABORTED;
 disconnected:
 		xprt_force_disconnect(xprt);
+		wake_up_all(&ep->re_connect_wait);
 		return rpcrdma_ep_destroy(ep);
 	default:
 		break;
Given that rpcrdma_xprt_connect() happens from workqueue context, in cases where
connections don't succeed, something needs to wake it up. In my case, this has
been observed when the CM callback received `RDMA_CM_EVENT_REJECTED`, and
`rpcrdma_xprt_connect()` slept forever.

This continues the fix in commit 58bd6656f808 ('xprtrdma: Restore wake-up-all to
rpcrdma_cm_event_handler()').

Signed-off-by: Dan Aloni <dan@kernelim.com>
CC: Chuck Lever <chuck.lever@oracle.com>
---

Notes:
    Hi Chuck,

    Maybe I missed something, as it is not clear to me how otherwise (without this
    patch), re_connect_wait can be woken up in this situation. Please explain?

 net/sunrpc/xprtrdma/verbs.c | 1 +
 1 file changed, 1 insertion(+)
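
For readers following the thread, the failure mode discussed here (a connect worker asleep on re_connect_wait while the event handler records an error without waking it) can be illustrated with a toy, single-threaded model in userspace C. This is only a sketch under stated assumptions: every `toy_*` name is hypothetical, the "wait queue" is a plain array rather than the kernel's wait queue API, and the use of -ECONNREFUSED as the recorded status is chosen for illustration, not taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these types or functions exist in the kernel. */
enum toy_task_state { TOY_RUNNABLE, TOY_SLEEPING };

struct toy_task {
	enum toy_task_state state;
};

#define TOY_MAX_WAITERS 4

struct toy_ep {
	int connect_status;                        /* 0 while the connect is pending */
	struct toy_task *waiters[TOY_MAX_WAITERS]; /* stands in for re_connect_wait */
	size_t nr_waiters;
};

/* Connect worker: sleep unless the connect attempt has already finished. */
void toy_wait_for_connect(struct toy_ep *ep, struct toy_task *task)
{
	if (ep->connect_status != 0)
		return;
	if (ep->nr_waiters < TOY_MAX_WAITERS) {
		task->state = TOY_SLEEPING;
		ep->waiters[ep->nr_waiters++] = task;
	}
}

/* Make every queued waiter runnable again and empty the queue. */
void toy_wake_up_all(struct toy_ep *ep)
{
	for (size_t i = 0; i < ep->nr_waiters; i++)
		ep->waiters[i]->state = TOY_RUNNABLE;
	ep->nr_waiters = 0;
}

/*
 * Event handler for a rejected connect. The status value is an assumption
 * made for this sketch. Without the wake-up, the new status is recorded
 * but nobody tells the sleeping worker to look at it.
 */
void toy_handle_rejected(struct toy_ep *ep, bool with_fix)
{
	ep->connect_status = -ECONNREFUSED;
	if (with_fix)
		toy_wake_up_all(ep);
}

/* Returns true if the worker is still asleep after the reject event. */
bool toy_worker_stuck(bool with_fix)
{
	struct toy_ep ep = { 0 };
	struct toy_task worker = { TOY_RUNNABLE };

	toy_wait_for_connect(&ep, &worker);  /* worker goes to sleep first */
	toy_handle_rejected(&ep, with_fix);  /* then the reject arrives */
	return worker.state == TOY_SLEEPING;
}
```

In this model, `toy_worker_stuck(false)` leaves the worker asleep even though the status has changed, while `toy_worker_stuck(true)` makes it runnable again, which mirrors the role of the added wake_up_all() call in the hunk. The model ignores real concurrency and endpoint teardown, so it says nothing about ordering against rpcrdma_ep_destroy().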