Message ID | 1442613607.11370.18.camel@primarydata.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Sat, Sep 19, 2015 at 8:08 AM, Jeff Layton <jlayton@poochiereds.net> wrote: > On Fri, 18 Sep 2015 18:00:07 -0400 > Trond Myklebust <trond.myklebust@primarydata.com> wrote: > >> 8<------------------------------------------------------------------- >> From 3e1c9d8092e2fa4509d84a00fcf21e7e0c581fe2 Mon Sep 17 00:00:00 2001 >> From: Trond Myklebust <trond.myklebust@primarydata.com> >> Date: Fri, 18 Sep 2015 15:53:24 -0400 >> Subject: [PATCH] SUNRPC: Lock the transport layer on shutdown >> >> Avoid all races with the connect/disconnect handlers by taking the >> transport lock. >> >> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> >> --- >> net/sunrpc/xprt.c | 6 ++++++ >> 1 file changed, 6 insertions(+) >> >> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c >> index ab5dd621ae0c..2e98f4a243e5 100644 >> --- a/net/sunrpc/xprt.c >> +++ b/net/sunrpc/xprt.c >> @@ -614,6 +614,7 @@ static void xprt_autoclose(struct work_struct *work) >> clear_bit(XPRT_CLOSE_WAIT, &xprt->state); >> xprt->ops->close(xprt); >> xprt_release_write(xprt, NULL); >> + wake_up_bit(&xprt->state, XPRT_LOCKED); >> } >> >> /** >> @@ -723,6 +724,7 @@ void xprt_unlock_connect(struct rpc_xprt *xprt, void *cookie) >> xprt->ops->release_xprt(xprt, NULL); >> out: >> spin_unlock_bh(&xprt->transport_lock); >> + wake_up_bit(&xprt->state, XPRT_LOCKED); >> } >> >> /** >> @@ -1394,6 +1396,10 @@ out: >> static void xprt_destroy(struct rpc_xprt *xprt) >> { >> dprintk("RPC: destroying transport %p\n", xprt); >> + >> + /* Exclude transport connect/disconnect handlers */ >> + wait_on_bit_lock(&xprt->state, XPRT_LOCKED, TASK_UNINTERRUPTIBLE); >> + >> del_timer_sync(&xprt->timer); >> >> rpc_xprt_debugfs_unregister(xprt); > > > Yeah, that does seem cleaner and better follows the xs_close locking > rules. Are those wake_up_bit calls sufficient though? Would it be > better to just add it to xprt_clear_locked? In principle we should only > sleep on this bit if there are no other users of the xprt, but I'm > having a hard time following the code to ensure that that's the case. If we're in xprt_destroy, then the refcount on the xprt has to be zero. There should be no struct rpc_clnt referencing it, and hence also no rpc_tasks to set/clear the lock state. > Also, it seems like part of the problem is that we're ending up with > these workqueue jobs being requeued after we've cancelled them. Perhaps > we ought to have some way to ensure that once the xprt is on its way to > destruction, the workqueue jobs cannot be requeued? The workqueue jobs are required to own the lock before being enqueued, since they need exclusive access to the socket until completed. Since xprt_destroy now owns the lock, that means the socket callbacks etc can no longer enqueue any new jobs. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 18/09/15 23:00, Trond Myklebust wrote: > On Fri, 2015-09-18 at 12:51 -0400, Trond Myklebust wrote: >> On Fri, 2015-09-18 at 12:19 +0100, Suzuki K. Poulose wrote: >>> On 16/09/15 12:17, Jeff Layton wrote: >>>> On Wed, 16 Sep 2015 10:35:49 +0100 >>>> "Suzuki K. Poulose" <suzuki.poulose@arm.com> wrote: >>>> >>>>> From: "Suzuki K. Poulose" <suzuki.poulose@arm.com> >>>>> >>> >>> ... >>> >>>>> + write_unlock_bh(&sk->sk_callback_lock); >>>>> + return; >>>>> + } >>>>> + sock = transport->sock; >>>>> + >>>>> transport->inet = NULL; >>>>> transport->sock = NULL; >>>>> >>>>> @@ -833,6 +838,10 @@ static void xs_reset_transport(struct >>>>> sock_xprt *transport) >>>>> xs_restore_old_callbacks(transport, sk); >>>>> xprt_clear_connected(xprt); >>>>> write_unlock_bh(&sk->sk_callback_lock); >>>>> + ... >> >> So how about the following patch? It should apply cleanly on top of >> the >> first one (which is still needed, btw). > > Having thought some more about this, I think the safest thing in order > to avoid races is simply to have the shutdown set XPRT_LOCKED. That way > we can keep the current desirable behaviour of closing the socket > automatically any time the server initiates a close, while still > preventing it during shutdown. > > 8<------------------------------------------------------------------- > From 3e1c9d8092e2fa4509d84a00fcf21e7e0c581fe2 Mon Sep 17 00:00:00 2001 > From: Trond Myklebust <trond.myklebust@primarydata.com> > Date: Fri, 18 Sep 2015 15:53:24 -0400 > Subject: [PATCH] SUNRPC: Lock the transport layer on shutdown > > Avoid all races with the connect/disconnect handlers by taking the > transport lock. > > Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> > --- > net/sunrpc/xprt.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c > index ab5dd621ae0c..2e98f4a243e5 100644 > --- a/net/sunrpc/xprt.c > +++ b/net/sunrpc/xprt.c > @@ -614,6 +614,7 @@ static void xprt_autoclose(struct work_struct *work) > clear_bit(XPRT_CLOSE_WAIT, &xprt->state); > xprt->ops->close(xprt); > xprt_release_write(xprt, NULL); > + wake_up_bit(&xprt->state, XPRT_LOCKED); > } > > /** > @@ -723,6 +724,7 @@ void xprt_unlock_connect(struct rpc_xprt *xprt, void *cookie) > xprt->ops->release_xprt(xprt, NULL); > out: > spin_unlock_bh(&xprt->transport_lock); > + wake_up_bit(&xprt->state, XPRT_LOCKED); > } > > /** > @@ -1394,6 +1396,10 @@ out: > static void xprt_destroy(struct rpc_xprt *xprt) > { > dprintk("RPC: destroying transport %p\n", xprt); > + > + /* Exclude transport connect/disconnect handlers */ > + wait_on_bit_lock(&xprt->state, XPRT_LOCKED, TASK_UNINTERRUPTIBLE); > + > del_timer_sync(&xprt->timer); > > rpc_xprt_debugfs_unregister(xprt); > That works for me, please feel free to add: Reported-by: Suzuki K. Poulose <suzuki.poulose@arm.com> Tested-by: Suzuki K. Poulose <suzuki.poulose@arm.com> Thanks Suzuki -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index ab5dd621ae0c..2e98f4a243e5 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -614,6 +614,7 @@ static void xprt_autoclose(struct work_struct *work) clear_bit(XPRT_CLOSE_WAIT, &xprt->state); xprt->ops->close(xprt); xprt_release_write(xprt, NULL); + wake_up_bit(&xprt->state, XPRT_LOCKED); } /** @@ -723,6 +724,7 @@ void xprt_unlock_connect(struct rpc_xprt *xprt, void *cookie) xprt->ops->release_xprt(xprt, NULL); out: spin_unlock_bh(&xprt->transport_lock); + wake_up_bit(&xprt->state, XPRT_LOCKED); } /** @@ -1394,6 +1396,10 @@ out: static void xprt_destroy(struct rpc_xprt *xprt) { dprintk("RPC: destroying transport %p\n", xprt); + + /* Exclude transport connect/disconnect handlers */ + wait_on_bit_lock(&xprt->state, XPRT_LOCKED, TASK_UNINTERRUPTIBLE); + del_timer_sync(&xprt->timer); rpc_xprt_debugfs_unregister(xprt);