[v2] nfsd: Don't release the callback slot unless it was actually held
diff mbox series

Message ID 20190405155437.5545-1-trond.myklebust@hammerspace.com
State New
Headers show
Series
  • [v2] nfsd: Don't release the callback slot unless it was actually held
Related show

Commit Message

Trond Myklebust April 5, 2019, 3:54 p.m. UTC
If there are multiple callbacks queued, waiting for the callback
slot when the callback gets shut down, then they all currently
end up acting as if they hold the slot, and call
nfsd4_cb_sequence_done() resulting in interesting side-effects.

In addition, the 'retry_nowait' path in nfsd4_cb_sequence_done()
causes a loop back to nfsd4_cb_prepare() without first freeing the
slot, which causes a deadlock when nfsd41_cb_get_slot() gets called
a second time.

This patch therefore adds a boolean to track whether or not the
callback did pick up the slot, so that it can do the right thing
in these 2 cases.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
v2: try to restart the callback if we hit nfsd4_cb_sequence_done()
    without a slot.

 fs/nfsd/nfs4callback.c | 8 +++++++-
 fs/nfsd/state.h        | 1 +
 2 files changed, 8 insertions(+), 1 deletion(-)

Comments

Trond Myklebust April 8, 2019, 3:20 p.m. UTC | #1
On Fri, 2019-04-05 at 08:54 -0700, Trond Myklebust wrote:
> If there are multiple callbacks queued, waiting for the callback
> slot when the callback gets shut down, then they all currently
> end up acting as if they hold the slot, and call
> nfsd4_cb_sequence_done() resulting in interesting side-effects.
> 
> In addition, the 'retry_nowait' path in nfsd4_cb_sequence_done()
> causes a loop back to nfsd4_cb_prepare() without first freeing the
> slot, which causes a deadlock when nfsd41_cb_get_slot() gets called
> a second time.
> 
> This patch therefore adds a boolean to track whether or not the
> callback did pick up the slot, so that it can do the right thing
> in these 2 cases.
> 
> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> ---
> v2: try to restart the callback if we hit nfsd4_cb_sequence_done()
>     without a slot.
> 
> 

Hi Bruce,

Should this patch perhaps be considered for stable? The callback slot
leak is permanent (or at least for the lifetime of the nfs4_client).

Cheers
  Trond
J. Bruce Fields April 8, 2019, 4:41 p.m. UTC | #2
On Mon, Apr 08, 2019 at 03:20:48PM +0000, Trond Myklebust wrote:
> On Fri, 2019-04-05 at 08:54 -0700, Trond Myklebust wrote:
> > If there are multiple callbacks queued, waiting for the callback
> > slot when the callback gets shut down, then they all currently
> > end up acting as if they hold the slot, and call
> > nfsd4_cb_sequence_done() resulting in interesting side-effects.
> > 
> > In addition, the 'retry_nowait' path in nfsd4_cb_sequence_done()
> > causes a loop back to nfsd4_cb_prepare() without first freeing the
> > slot, which causes a deadlock when nfsd41_cb_get_slot() gets called
> > a second time.
> > 
> > This patch therefore adds a boolean to track whether or not the
> > callback did pick up the slot, so that it can do the right thing
> > in these 2 cases.
> > 
> > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> > ---
> > v2: try to restart the callback if we hit nfsd4_cb_sequence_done()
> >     without a slot.
> > 
> > 
> 
> Hi Bruce,
> 
> Should this patch perhaps be considered for stable? The callback slot
> leak is permanent (or at least for the lifetime of the nfs4_client).

Makes sense to me; I'll queue it up for 5.1 and stable.

What were the original symptoms that prompted this?

--b.
Trond Myklebust April 8, 2019, 6:07 p.m. UTC | #3
On Mon, 2019-04-08 at 12:41 -0400, J. Bruce Fields wrote:
> On Mon, Apr 08, 2019 at 03:20:48PM +0000, Trond Myklebust wrote:
> > On Fri, 2019-04-05 at 08:54 -0700, Trond Myklebust wrote:
> > > If there are multiple callbacks queued, waiting for the callback
> > > slot when the callback gets shut down, then they all currently
> > > end up acting as if they hold the slot, and call
> > > nfsd4_cb_sequence_done() resulting in interesting side-effects.
> > > 
> > > In addition, the 'retry_nowait' path in nfsd4_cb_sequence_done()
> > > causes a loop back to nfsd4_cb_prepare() without first freeing
> > > the
> > > slot, which causes a deadlock when nfsd41_cb_get_slot() gets
> > > called
> > > a second time.
> > > 
> > > This patch therefore adds a boolean to track whether or not the
> > > callback did pick up the slot, so that it can do the right thing
> > > in these 2 cases.
> > > 
> > > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > ---
> > > v2: try to restart the callback if we hit
> > > nfsd4_cb_sequence_done()
> > >     without a slot.
> > > 
> > > 
> > 
> > Hi Bruce,
> > 
> > Should this patch perhaps be considered for stable? The callback
> > slot
> > leak is permanent (or at least for the lifetime of the
> > nfs4_client).
> 
> Makes sense to me; I'll queue it up for 5.1 and stable.
> 
> What were the original symptoms that prompted this?
> 

We were using the NFSv4.2 client for doing CLONE against a knfsd
server, and eventually ended up with a bunch of CB_RECALL requests that
were seen hanging forever on the server.

Patch
diff mbox series

diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index d219159b98af..7caa3801ce72 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -1010,8 +1010,9 @@  static void nfsd4_cb_prepare(struct rpc_task *task, void *calldata)
 	cb->cb_seq_status = 1;
 	cb->cb_status = 0;
 	if (minorversion) {
-		if (!nfsd41_cb_get_slot(clp, task))
+		if (!cb->cb_holds_slot && !nfsd41_cb_get_slot(clp, task))
 			return;
+		cb->cb_holds_slot = true;
 	}
 	rpc_call_start(task);
 }
@@ -1038,6 +1039,9 @@  static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
 		return true;
 	}
 
+	if (!cb->cb_holds_slot)
+		goto need_restart;
+
 	switch (cb->cb_seq_status) {
 	case 0:
 		/*
@@ -1076,6 +1080,7 @@  static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
 			cb->cb_seq_status);
 	}
 
+	cb->cb_holds_slot = false;
 	clear_bit(0, &clp->cl_cb_slot_busy);
 	rpc_wake_up_next(&clp->cl_cb_waitq);
 	dprintk("%s: freed slot, new seqid=%d\n", __func__,
@@ -1283,6 +1288,7 @@  void nfsd4_init_cb(struct nfsd4_callback *cb, struct nfs4_client *clp,
 	cb->cb_seq_status = 1;
 	cb->cb_status = 0;
 	cb->cb_need_restart = false;
+	cb->cb_holds_slot = false;
 }
 
 void nfsd4_run_cb(struct nfsd4_callback *cb)
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 396c76755b03..9d6cb246c6c5 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -70,6 +70,7 @@  struct nfsd4_callback {
 	int cb_seq_status;
 	int cb_status;
 	bool cb_need_restart;
+	bool cb_holds_slot;
 };
 
 struct nfsd4_callback_ops {