diff mbox

[v3,015/114] nfsd: Allow struct nfsd4_compound_state to cache the nfs4_client

Message ID 20140703183115.397af08e@tlielax.poochiereds.net (mailing list archive)
State New, archived
Headers show

Commit Message

Jeff Layton July 3, 2014, 10:31 p.m. UTC
On Thu, 3 Jul 2014 17:35:26 -0400
"J. Bruce Fields" <bfields@fieldses.org> wrote:

> On Thu, Jul 03, 2014 at 04:32:59PM -0400, J. Bruce Fields wrote:
> > On Mon, Jun 30, 2014 at 11:48:44AM -0400, Jeff Layton wrote:
> > > We want to use the nfsd4_compound_state to cache the nfs4_client in
> > > order to optimise away extra lookups of the clid.
> > > 
> > > In the v4.0 case, we use this to ensure that we only have to look up the
> > > client at most once per compound for each call into lookup_clientid. For
> > > v4.1+ we set the pointer in the cstate during SEQUENCE processing so we
> > > should never need to do a search for it.
> > 
> > The connectathon locking test is failing for me in the nfsv4/krb5i case
> > as of this commit.
> > 
> > Which makes no sense to me whatsoever, so it's entirely possible this is
> > some unrelated problem on my side.  I'll let you know when I've figured
> > out anything more.
> 
> It's intermittent.
> 
> I've reproduced it on the previous commit so I know at least that this
> one isn't at fault.
> 
> I doubt it's really dependent on krb5i, at most that's probably just
> making it more likely to reproduce.
> 
> --b.

Bruce,

Does this patch help? I suspect this is where the bug crept in, but I'm
unclear on why it would be intermittent...

FWIW, this all gets cleaned up in a later patch that changes how the
refcounting on lock and openowners works.

Comments

Jeff Layton July 4, 2014, 12:14 p.m. UTC | #1
On Thu, 3 Jul 2014 18:31:15 -0400
Jeff Layton <jeff.layton@primarydata.com> wrote:

> On Thu, 3 Jul 2014 17:35:26 -0400
> "J. Bruce Fields" <bfields@fieldses.org> wrote:
> 
> > On Thu, Jul 03, 2014 at 04:32:59PM -0400, J. Bruce Fields wrote:
> > > On Mon, Jun 30, 2014 at 11:48:44AM -0400, Jeff Layton wrote:
> > > > We want to use the nfsd4_compound_state to cache the nfs4_client in
> > > > order to optimise away extra lookups of the clid.
> > > > 
> > > > In the v4.0 case, we use this to ensure that we only have to look up the
> > > > client at most once per compound for each call into lookup_clientid. For
> > > > v4.1+ we set the pointer in the cstate during SEQUENCE processing so we
> > > > should never need to do a search for it.
> > > 
> > > The connectathon locking test is failing for me in the nfsv4/krb5i case
> > > as of this commit.
> > > 
> > > Which makes no sense to me whatsoever, so it's entirely possible this is
> > > some unrelated problem on my side.  I'll let you know when I've figured
> > > out anything more.
> > 
> > It's intermittent.
> > 
> > I've reproduced it on the previous commit so I know at least that this
> > one isn't at fault.
> > 
> > I doubt it's really dependent on krb5i, at most that's probably just
> > making it more likely to reproduce.
> > 
> > --b.
> 
> Bruce,
> 
> Does this patch help? I suspect this is where the bug crept in, but I'm
> unclear on why it would be intermittent...
> 
> FWIW, this all gets cleaned up in a later patch that changes how the
> refcounting on lock and openowners works.
> 

I was finally able to reproduce this after a running the cthon lock
tests in a loop, with krb5i. With the patch that I sent earlier, I was
able to run 100 iterations of it without a failure, so I think that was
the bug.

Cheers!
J. Bruce Fields July 7, 2014, 6:23 p.m. UTC | #2
On Fri, Jul 04, 2014 at 08:14:49AM -0400, Jeff Layton wrote:
> On Thu, 3 Jul 2014 18:31:15 -0400
> Jeff Layton <jeff.layton@primarydata.com> wrote:
> 
> > On Thu, 3 Jul 2014 17:35:26 -0400
> > "J. Bruce Fields" <bfields@fieldses.org> wrote:
> > 
> > > On Thu, Jul 03, 2014 at 04:32:59PM -0400, J. Bruce Fields wrote:
> > > > On Mon, Jun 30, 2014 at 11:48:44AM -0400, Jeff Layton wrote:
> > > > > We want to use the nfsd4_compound_state to cache the nfs4_client in
> > > > > order to optimise away extra lookups of the clid.
> > > > > 
> > > > > In the v4.0 case, we use this to ensure that we only have to look up the
> > > > > client at most once per compound for each call into lookup_clientid. For
> > > > > v4.1+ we set the pointer in the cstate during SEQUENCE processing so we
> > > > > should never need to do a search for it.
> > > > 
> > > > The connectathon locking test is failing for me in the nfsv4/krb5i case
> > > > as of this commit.
> > > > 
> > > > Which makes no sense to me whatsoever, so it's entirely possible this is
> > > > some unrelated problem on my side.  I'll let you know when I've figured
> > > > out anything more.
> > > 
> > > It's intermittent.
> > > 
> > > I've reproduced it on the previous commit so I know at least that this
> > > one isn't at fault.
> > > 
> > > I doubt it's really dependent on krb5i, at most that's probably just
> > > making it more likely to reproduce.
> > > 
> > > --b.
> > 
> > Bruce,
> > 
> > Does this patch help? I suspect this is where the bug crept in, but I'm
> > unclear on why it would be intermittent...
> > 
> > FWIW, this all gets cleaned up in a later patch that changes how the
> > refcounting on lock and openowners works.
> > 
> 
> I was finally able to reproduce this after a running the cthon lock
> tests in a loop, with krb5i. With the patch that I sent earlier, I was
> able to run 100 iterations of it without a failure, so I think that was
> the bug.

Thanks!  That seems to be holding up for me too.

I'll continue slowly applying your patches to for-3.17.

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

From 84884320b983eaeb68c652de8fe4b48aad4052c3 Mon Sep 17 00:00:00 2001
From: Jeff Layton <jlayton@primarydata.com>
Date: Thu, 3 Jul 2014 18:20:31 -0400
Subject: [PATCH] nfsd: fix lock stateid cleanup on error in nfsd4_lock

commit 43a1b041e3b changed nfsd4_lock to call release_lockowner_if_empty
if the the lock stateid was new and the function was going to return an
error.

This is wrong. The lockowner will never be empty in that situation since
it still has the new lock stateid attached to it. Change it to call
release_lock_stateid instead, which will destroy the lock stateid and
then release the lockowner if it's empty at that point.

Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Reported-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
---
 fs/nfsd/nfs4state.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index a22c73f14a17..b2ea0a06fbf2 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -4600,7 +4600,7 @@  nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	}
 out:
 	if (status && new_state)
-		release_lockowner_if_empty(lock_sop);
+		release_lock_stateid(lock_stp);
 	nfsd4_bump_seqid(cstate, status);
 	if (!cstate->replay_owner)
 		nfs4_unlock_state();
-- 
1.9.3