diff mbox series

xprtrdma: Ensure connect worker is awoken after connect error

Message ID 20200621145934.4069.31886.stgit@manet.1015granger.net (mailing list archive)
State New, archived
Headers show
Series xprtrdma: Ensure connect worker is awoken after connect error | expand

Commit Message

Chuck Lever June 21, 2020, 2:59 p.m. UTC
From: Dan Aloni <dan@kernelim.com>

The connect worker sleeps waiting for either successful connection
establishment or an error. Commit e28ce90083f0 ("xprtrdma: kmalloc
rpcrdma_ep separate from rpcrdma_xprt") mistakenly removed the
wake-up in cases when connection establishment fails.

Fixes: e28ce90083f0 ("xprtrdma: kmalloc rpcrdma_ep separate from rpcrdma_xprt")
Signed-off-by: Dan Aloni <dan@kernelim.com>
[ cel: rebased on recent fixes to 5.8-rc; patch description rewritten ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/xprtrdma/verbs.c |    1 +
 1 file changed, 1 insertion(+)

Comments

Chuck Lever June 25, 2020, 7:19 p.m. UTC | #1
Anna, please drop this one. It appears to trigger a particularly nasty
use-after-free. I'll follow up with a more complete fix soon.

(Yes, a wake-up on connect errors is indeed necessary... but the connect
worker needs to be re-organized to deal properly with it).


> On Jun 21, 2020, at 10:59 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
> 
> From: Dan Aloni <dan@kernelim.com>
> 
> The connect worker sleeps waiting for either successful connection
> establishment or an error. Commit e28ce90083f0 ("xprtrdma: kmalloc
> rpcrdma_ep separate from rpcrdma_xprt") mistakenly removed the
> wake-up in cases when connection establishment fails.
> 
> Fixes: e28ce90083f0 ("xprtrdma: kmalloc rpcrdma_ep separate from rpcrdma_xprt")
> Signed-off-by: Dan Aloni <dan@kernelim.com>
> [ cel: rebased on recent fixes to 5.8-rc; patch description rewritten ]
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> net/sunrpc/xprtrdma/verbs.c |    1 +
> 1 file changed, 1 insertion(+)
> 
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index 2198c8ec8dff..54c6809bf06e 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -296,6 +296,7 @@ rpcrdma_cm_event_handler(struct rdma_cm_id *id, struct rdma_cm_event *event)
> 		ep->re_connect_status = -ECONNABORTED;
> disconnected:
> 		rpcrdma_force_disconnect(ep);
> +		wake_up_all(&ep->re_connect_wait);
> 		return rpcrdma_ep_put(ep);
> 	default:
> 		break;
> 

--
Chuck Lever
Dan Aloni June 26, 2020, 7:17 a.m. UTC | #2
On Thu, Jun 25, 2020 at 03:19:39PM -0400, Chuck Lever wrote:
> Anna, please drop this one. It appears to trigger a particularly nasty
> use-after-free. I'll follow up with a more complete fix soon.
> 
> (Yes, a wake-up on connect errors is indeed necessary... but the connect
> worker needs to be re-organized to deal properly with it).

After sending that patch I also noticed more issues with the management
of the EP context. The decref inside the CM handler caused freeing of
the EP while connect path still held a reference to it. A KASAN-enabled
kernel revealed this easily. I've sent just now a more comprehensive
patch to deal with this.
diff mbox series

Patch

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 2198c8ec8dff..54c6809bf06e 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -296,6 +296,7 @@  rpcrdma_cm_event_handler(struct rdma_cm_id *id, struct rdma_cm_event *event)
 		ep->re_connect_status = -ECONNABORTED;
 disconnected:
 		rpcrdma_force_disconnect(ep);
+		wake_up_all(&ep->re_connect_wait);
 		return rpcrdma_ep_put(ep);
 	default:
 		break;