[v5,00/25] Fix delegation behaviour when server revokes some state

Message ID 1474140727.7526.1.camel@primarydata.com (mailing list archive)
State New, archived
Commit Message

Trond Myklebust Sept. 17, 2016, 7:32 p.m. UTC
On Sat, 2016-09-17 at 15:16 -0400, Oleg Drokin wrote:
> On Sep 17, 2016, at 2:18 PM, Trond Myklebust wrote:
> 
> > 
> > 
> > > 
> > > On Sep 17, 2016, at 14:04, Oleg Drokin <green@linuxhacker.ru>
> > > wrote:
> > > 
> > > 
> > > On Sep 17, 2016, at 1:13 AM, Trond Myklebust wrote:
> > > 
> > > > 
> > > > According to RFC5661, if any of the SEQUENCE status bits
> > > > SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED,
> > > > SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED,
> > > > SEQ4_STATUS_ADMIN_STATE_REVOKED,
> > > > or SEQ4_STATUS_RECALLABLE_STATE_REVOKED are set, then we need to use
> > > > TEST_STATEID to figure out which stateids have been revoked, so we
> > > > can acknowledge the loss of state using FREE_STATEID.
> > > > 
> > > > While we already do this for open and lock state, we have not been
> > > > doing so for all the delegations.
> > > > 
> > > > v2: nfs_v4_2_minor_ops needs to set .test_and_free_expired too
> > > > v3: Now with added lock revoke fixes and close/delegreturn/locku fixes
> > > > v4: Close a bunch of corner cases
> > > > v5: Report revoked delegations as invalid in nfs_have_delegation()
> > > >  Fix an infinite loop in nfs_reap_expired_delegations.
> > > >  Fixes for other looping behaviour
> > > 
> > > This time around the loop seems to be tighter, and it is
> > > in a userspace process:
> > > 
> > > [ 9197.256571] --> nfs41_call_sync_prepare data->seq_server ffff8800a73ce000
> > > [ 9197.256572] --> nfs41_setup_sequence
> > > [ 9197.256573] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=31
> > > [ 9197.256574] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
> > > [ 9197.256574] <-- nfs41_setup_sequence slotid=0 seqid=14013800
> > > [ 9197.256582] encode_sequence: sessionid=1474126170:1:2:0 seqid=14013800 slotid=0 max_slotid=0 cache_this=1
> > > [ 9197.256755] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=31
> > > [ 9197.256756] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
> > > [ 9197.256757] nfs4_free_slot: slotid 1 highest_used_slotid 0
> > > [ 9197.256758] nfs41_sequence_process: Error 0 free the slot
> > > [ 9197.256760] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
> > > [ 9197.256779] --> nfs_put_client({2})
> > 
> > What operation is the userspace process hanging on? Do you have a
> > stack trace for it?
> 
> Seems to be open_create->truncate->setattr, coming from:
> cp /bin/sleep /mnt/nfs2/racer/12
> 
> (gdb) bt
> #0  nfs41_setup_sequence (session=0xffff88005a853800, args=0xffff8800a7253b80,
>     res=0xffff8800a7253b48, task=0xffff8800b0eb0f00)
>     at /home/green/bk/linux-test/fs/nfs/nfs4proc.c:876
> #1  0xffffffff813a751c in nfs41_call_sync_prepare (task=<optimized out>,
>     calldata=0xffff8800a7253b80)
>     at /home/green/bk/linux-test/fs/nfs/nfs4proc.c:966
> #2  0xffffffff8185c639 in rpc_prepare_task (task=<optimized out>)
>     at /home/green/bk/linux-test/net/sunrpc/sched.c:683
> #3  0xffffffff8185f12b in __rpc_execute (task=0xffff88005a853800)
>     at /home/green/bk/linux-test/net/sunrpc/sched.c:775
> #4  0xffffffff818617b4 in rpc_execute (task=0xffff88005a853800)
>     at /home/green/bk/linux-test/net/sunrpc/sched.c:843
> #5  0xffffffff818539b9 in rpc_run_task (task_setup_data=0xffff8800a7253a50)
>     at /home/green/bk/linux-test/net/sunrpc/clnt.c:1052
> #6  0xffffffff813a75e3 in nfs4_call_sync_sequence (clnt=<optimized out>,
>     server=<optimized out>, msg=<optimized out>, args=<optimized out>,
>     res=<optimized out>) at /home/green/bk/linux-test/fs/nfs/nfs4proc.c:1051
> #7  0xffffffff813b4645 in nfs4_call_sync (cache_reply=<optimized out>,
>     res=<optimized out>, args=<optimized out>, msg=<optimized out>,
>     server=<optimized out>, clnt=<optimized out>)
>     at /home/green/bk/linux-test/fs/nfs/nfs4proc.c:1069
> #8  _nfs4_do_setattr (state=<optimized out>, cred=<optimized out>,
>     res=<optimized out>, arg=<optimized out>, inode=<optimized out>)
>     at /home/green/bk/linux-test/fs/nfs/nfs4proc.c:2916
> #9  nfs4_do_setattr (inode=0xffff880079b152a8, cred=<optimized out>,
>     fattr=<optimized out>, sattr=<optimized out>, state=0xffff880060588e00,
>     ilabel=<optimized out>, olabel=0x0 <irq_stack_union>)
>     at /home/green/bk/linux-test/fs/nfs/nfs4proc.c:2955
> #10 0xffffffff813b4a16 in nfs4_proc_setattr (dentry=<optimized out>,
>     fattr=0xffff8800a7253b80, sattr=0xffff8800a7253b48)
>     at /home/green/bk/linux-test/fs/nfs/nfs4proc.c:3684
> #11 0xffffffff8138f1cb in nfs_setattr (dentry=0xffff8800740c1000, 


Cool! Does the following help?

8<------------------------------------------------------------
From 98ddf32a99cfe00e9ae108044e2be67522987511 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust@primarydata.com>
Date: Sat, 17 Sep 2016 15:27:10 -0400
Subject: [PATCH] NFS: Don't assume a stateid represents a delegation in
 nfs4_do_handle_exception

If the stateid being passed to the error handler is not a delegation
stateid, we want to mark the locks/open_state it does represent for
recovery.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/nfs4proc.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

-- 
2.7.4
Patch

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 7cecb1d7a217..acc572c51735 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -397,6 +397,10 @@  static int nfs4_do_handle_exception(struct nfs_server *server,
 	exception->delay = 0;
 	exception->recovering = 0;
 	exception->retry = 0;
+
+	if (stateid == NULL && state != NULL)
+		stateid = &state->stateid;
+
 	switch(errorcode) {
 		case 0:
 			return 0;
@@ -405,7 +409,7 @@  static int nfs4_do_handle_exception(struct nfs_server *server,
 		case -NFS4ERR_EXPIRED:
 		case -NFS4ERR_BAD_STATEID:
 			if (inode != NULL && stateid != NULL) {
-				nfs_inode_find_delegation_state_and_recover(inode,
+				nfs_inode_find_state_and_recover(inode,
 						stateid);
 				goto wait_on_recovery;
 			}