diff mbox

NFS: Don't disconnect open-owner on NFS4ERR_BAD_SEQID

Message ID 8737k2354z.fsf@notabene.neil.brown.name (mailing list archive)
State New, archived
Headers show

Commit Message

NeilBrown Oct. 11, 2016, 10:14 p.m. UTC
On Sat, Oct 08 2016, Trond Myklebust wrote:

>> +	set_bit(NFS_OWNER_STALE, &sp->so_flags);
>> +	/* Delegation recall might insist on using this open_owner
>> +	 * so reset it to force a new 'confirm' stage to be initiated.
>> +	 */
>> +	sp->so_seqid.create_time = ktime_get();
>
> Hmm…. If you’re going to do this, then why mark the state_owner as stale at all? Why not just reset sp->so_seqid.flags and call nfs4_clear_open_state() to let the state recovery try to do what it can.

Good point ... I was too focused on how I got there and what I had
tested...

I cannot test this properly as I don't have a server that returned
BAD_SEQID, and I doubt I can convince the customer to test some new as
they have something that works, but I think this is correct.

Thanks,
NeilBrown

From: NeilBrown <neilb@suse.com>
Date: Fri, 3 Jun 2016 14:58:02 +1000
Subject: [PATCH] NFS: Don't disconnect open-owner on NFS4ERR_BAD_SEQID

When an NFS4ERR_BAD_SEQID is received the open-owner is removed from
the ->state_owners rbtree so that it will no longer be used.

If any stateids attached to this open-owner are still in use, and if a
request using one gets an NFS4ERR_BAD_STATEID reply, this can for bad.

The state is marked as needing recovery and the nfs4_state_manager()
is scheduled to clean up.  nfs4_state_manager() finds states to be
recovered by walking the state_owners rbtree.  As the open-owner is
not in the rbtree, the bad state is not found so nfs4_state_manager()
completes having done nothing.  The request is then retried, with a
predicatable result (indefinite retries).

If the stateid is for a delegation, this open_owner will be used
to open files when the delegation is returned.  For that to work,
a new open-owner needs to be presented to the server.

This patch changes NFS4ERR_BAD_SEQID handling to leave the open-owner
in the rbtree but updates the 'create_time' so it looks like a new
open-owner.  With this the indefinite retries no longer happen.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/nfs/nfs4state.c | 17 ++++-------------
 1 file changed, 4 insertions(+), 13 deletions(-)
diff mbox

Patch

diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 74cc32490c7a..c4d721ed493d 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -496,19 +496,10 @@  nfs4_alloc_state_owner(struct nfs_server *server,
 static void
 nfs4_drop_state_owner(struct nfs4_state_owner *sp)
 {
-	struct rb_node *rb_node = &sp->so_server_node;
-
-	if (!RB_EMPTY_NODE(rb_node)) {
-		struct nfs_server *server = sp->so_server;
-		struct nfs_client *clp = server->nfs_client;
-
-		spin_lock(&clp->cl_lock);
-		if (!RB_EMPTY_NODE(rb_node)) {
-			rb_erase(rb_node, &server->state_owners);
-			RB_CLEAR_NODE(rb_node);
-		}
-		spin_unlock(&clp->cl_lock);
-	}
+	/* Delegation recall might insist on using this open_owner
+	 * so reset it to force a new 'confirm' stage to be initiated.
+	 */
+	sp->so_seqid.create_time = ktime_get();
 }
 
 static void nfs4_free_state_owner(struct nfs4_state_owner *sp)