[v5,00/25] Fix delegation behaviour when server revokes some state

Message ID 1474140727.7526.1.camel@primarydata.com
State New

Commit Message

Trond Myklebust Sept. 17, 2016, 7:32 p.m. UTC
On Sat, 2016-09-17 at 15:16 -0400, Oleg Drokin wrote:
> On Sep 17, 2016, at 2:18 PM, Trond Myklebust wrote:
> > > On Sep 17, 2016, at 14:04, Oleg Drokin <green@linuxhacker.ru> wrote:
> > > 
> > > On Sep 17, 2016, at 1:13 AM, Trond Myklebust wrote:
> > > > According to RFC5661, if any of the SEQUENCE status bits
> > > > SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED,
> > > > SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED,
> > > > SEQ4_STATUS_ADMIN_STATE_REVOKED,
> > > > or SEQ4_STATUS_RECALLABLE_STATE_REVOKED are set, then we need to use
> > > > TEST_STATEID to figure out which stateids have been revoked, so we
> > > > can acknowledge the loss of state using FREE_STATEID.
> > > > 
> > > > While we already do this for open and lock state, we have not been
> > > > doing so for all the delegations.
> > > > 
> > > > v2: nfs_v4_2_minor_ops needs to set .test_and_free_expired too
> > > > v3: Now with added lock revoke fixes and close/delegreturn/locku fixes
> > > > v4: Close a bunch of corner cases
> > > > v5: Report revoked delegations as invalid in nfs_have_delegation()
> > > >     Fix an infinite loop in nfs_reap_expired_delegations.
> > > >     Fixes for other looping behaviour
> > > 
> > > This time around the loop seems to be more tight,
> > > in userspace process:
> > > 
> > > [ 9197.256571] --> nfs41_call_sync_prepare data->seq_server ffff8800a73ce000
> > > [ 9197.256572] --> nfs41_setup_sequence
> > > [ 9197.256573] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=31
> > > [ 9197.256574] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
> > > [ 9197.256574] <-- nfs41_setup_sequence slotid=0 seqid=14013800
> > > [ 9197.256582] encode_sequence: sessionid=1474126170:1:2:0 seqid=14013800 slotid=0 max_slotid=0 cache_this=1
> > > [ 9197.256755] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=31
> > > [ 9197.256756] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
> > > [ 9197.256757] nfs4_free_slot: slotid 1 highest_used_slotid 0
> > > [ 9197.256758] nfs41_sequence_process: Error 0 free the slot
> > > [ 9197.256760] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
> > > [ 9197.256779] --> nfs_put_client({2})
> > 
> > What operation is the userspace process hanging on? Do you have a
> > stack trace for it?
> 
> seems to be open_create->truncate->setattr coming from:
> cp /bin/sleep /mnt/nfs2/racer/12
> 
> (gdb) bt
> #0  nfs41_setup_sequence (session=0xffff88005a853800, args=0xffff8800a7253b80, 
>     res=0xffff8800a7253b48, task=0xffff8800b0eb0f00)
>     at /home/green/bk/linux-test/fs/nfs/nfs4proc.c:876
> #1  0xffffffff813a751c in nfs41_call_sync_prepare (task=<optimized out>, 
>     calldata=0xffff8800a7253b80)
>     at /home/green/bk/linux-test/fs/nfs/nfs4proc.c:966
> #2  0xffffffff8185c639 in rpc_prepare_task (task=<optimized out>)
>     at /home/green/bk/linux-test/net/sunrpc/sched.c:683
> #3  0xffffffff8185f12b in __rpc_execute (task=0xffff88005a853800)
>     at /home/green/bk/linux-test/net/sunrpc/sched.c:775
> #4  0xffffffff818617b4 in rpc_execute (task=0xffff88005a853800)
>     at /home/green/bk/linux-test/net/sunrpc/sched.c:843
> #5  0xffffffff818539b9 in rpc_run_task (task_setup_data=0xffff8800a7253a50)
>     at /home/green/bk/linux-test/net/sunrpc/clnt.c:1052
> #6  0xffffffff813a75e3 in nfs4_call_sync_sequence (clnt=<optimized out>, 
>     server=<optimized out>, msg=<optimized out>, args=<optimized out>, 
>     res=<optimized out>) at /home/green/bk/linux-test/fs/nfs/nfs4proc.c:1051
> #7  0xffffffff813b4645 in nfs4_call_sync (cache_reply=<optimized out>, 
>     res=<optimized out>, args=<optimized out>, msg=<optimized out>, 
>     server=<optimized out>, clnt=<optimized out>)
>     at /home/green/bk/linux-test/fs/nfs/nfs4proc.c:1069
> #8  _nfs4_do_setattr (state=<optimized out>, cred=<optimized out>, 
>     res=<optimized out>, arg=<optimized out>, inode=<optimized out>)
>     at /home/green/bk/linux-test/fs/nfs/nfs4proc.c:2916
> #9  nfs4_do_setattr (inode=0xffff880079b152a8, cred=<optimized out>, 
>     fattr=<optimized out>, sattr=<optimized out>, state=0xffff880060588e00, 
>     ilabel=<optimized out>, olabel=0x0 <irq_stack_union>)
>     at /home/green/bk/linux-test/fs/nfs/nfs4proc.c:2955
> #10 0xffffffff813b4a16 in nfs4_proc_setattr (dentry=<optimized out>, 
>     fattr=0xffff8800a7253b80, sattr=0xffff8800a7253b48)
>     at /home/green/bk/linux-test/fs/nfs/nfs4proc.c:3684
> #11 0xffffffff8138f1cb in nfs_setattr (dentry=0xffff8800740c1000, 

Cool! Does the following help?

8<------------------------------------------------------------
From 98ddf32a99cfe00e9ae108044e2be67522987511 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust@primarydata.com>
Date: Sat, 17 Sep 2016 15:27:10 -0400
Subject: [PATCH] NFS: Don't assume a stateid represents a delegation in
 nfs4_do_handle_exception

If the stateid being passed to the error handler is not a delegation
stateid, we want to mark the locks/open_state it does represent for
recovery.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

---
 fs/nfs/nfs4proc.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

-- 
2.7.4

Patch

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 7cecb1d7a217..acc572c51735 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -397,6 +397,10 @@  static int nfs4_do_handle_exception(struct nfs_server *server,
 	exception->delay = 0;
 	exception->recovering = 0;
 	exception->retry = 0;
+
+	if (stateid == NULL && state != NULL)
+		stateid = &state->stateid;
+
 	switch(errorcode) {
 		case 0:
 			return 0;
@@ -405,7 +409,7 @@  static int nfs4_do_handle_exception(struct nfs_server *server,
 		case -NFS4ERR_EXPIRED:
 		case -NFS4ERR_BAD_STATEID:
 			if (inode != NULL && stateid != NULL) {
-				nfs_inode_find_delegation_state_and_recover(inode,
+				nfs_inode_find_state_and_recover(inode,
 						stateid);
 				goto wait_on_recovery;
 			}
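
The stateid selection that the patch adds can be modelled outside the kernel. The following is only a minimal userspace sketch, not the kernel code: every `sketch_*` type and function is a hypothetical stand-in invented for illustration, and the recovery step is reduced to a return value. It shows the fallback to the open state's own stateid when the caller supplied none, and the condition under which the NFS4ERR_EXPIRED / NFS4ERR_BAD_STATEID branch would then go on to state recovery.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's stateid and open-state types. */
struct sketch_stateid {
	unsigned int seqid;
};

struct sketch_state {
	struct sketch_stateid stateid;
};

/* Outcome of the modelled error branch (illustrative only). */
enum sketch_action {
	SKETCH_NO_RECOVERY,	/* branch condition not met; other handling not modelled */
	SKETCH_RECOVER_STATE	/* look up whatever the stateid represents and recover it */
};

/*
 * Mirror of the added hunk: if the caller passed no explicit stateid but
 * an open state is available, fall back to that state's stateid.
 */
static const struct sketch_stateid *
sketch_pick_stateid(const struct sketch_stateid *stateid,
		    const struct sketch_state *state)
{
	if (stateid == NULL && state != NULL)
		stateid = &state->stateid;
	return stateid;
}

/*
 * Model of the expired/bad-stateid case: recovery is attempted only when
 * both an inode and a (possibly fallback) stateid are known.
 */
static enum sketch_action
sketch_handle_bad_stateid(int have_inode,
			  const struct sketch_stateid *stateid,
			  const struct sketch_state *state)
{
	stateid = sketch_pick_stateid(stateid, state);
	if (have_inode && stateid != NULL)
		return SKETCH_RECOVER_STATE;
	return SKETCH_NO_RECOVERY;
}
```

In this model, a call with no explicit stateid but a valid open state (as in the setattr path from the backtrace, assuming that path passes only the state) now reaches the recovery branch instead of skipping it.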