diff mbox

[v2,06/10] nfsd: Protect the nfs4_file delegation fields using the fi_lock

Message ID 1405521125-2303-7-git-send-email-jlayton@primarydata.com (mailing list archive)
State New, archived
Headers show

Commit Message

Jeff Layton July 16, 2014, 2:32 p.m. UTC
Move more of the delegation fields to be protected by the fi_lock. It's
more granular than the state_lock and in later patches we'll want to
be able to rely on it in addition to the state_lock.

Also, the current code in nfs4_setlease calls vfs_setlease and uses the
client_mutex to ensure that it doesn't disappear before we can hash the
delegation. With the client_mutex gone, we'll have a potential race
condition.

It's possible that the delegation could be recalled after we acquire the
lease but before we ever get around to hashing it. If that happens, then
we'd have a nfs4_file that *thinks* it has a delegation, when it
actually has none.

Attempt to acquire a delegation. If that succeeds, take the state_lock
and recheck to make sure the lease is still there. If it is, then take
the fi_lock and set up the rest of the delegation fields. This prevents
the race because the delegation break workqueue job must also take the
state_lock.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
---
 fs/nfsd/nfs4state.c | 51 ++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 38 insertions(+), 13 deletions(-)

Comments

Christoph Hellwig July 16, 2014, 6:09 p.m. UTC | #1
>  	dp->dl_stid.sc_type = NFS4_CLOSED_DELEG_STID;
> +	spin_lock(&fp->fi_lock);
>  	list_del_init(&dp->dl_perclnt);
>  	list_del_init(&dp->dl_recall_lru);
> -	spin_lock(&fp->fi_lock);
>  	list_del_init(&dp->dl_perfile);
>  	spin_unlock(&fp->fi_lock);
> -	spin_unlock(&state_lock);
>  	if (fp) {
>  		nfs4_put_deleg_lease(fp);
> -		put_nfs4_file(fp);
>  		dp->dl_file = NULL;
>  	}
> +	spin_unlock(&state_lock);
> +	put_nfs4_file(fp);
>  }

Unless I've missed something put_nfs4_file does not handle a NULL
pointer argument in the current tree, so this needs a NULL check
for fp.

> +	spin_lock(&state_lock);
> +	/* Did the lease get broken before we took the lock? */
> +	status = -EAGAIN;
> +	if (!file_has_lease(fl->fl_file))
> +		goto out_unlock;

So if the delegation we just tried to add got broken, but someone else
managed to add one this would also return true.  But could that happen
either in real life or in theory?  Shouldn't we instead have an
atomic flag on the delegation that it got broken which we could check
here?

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Layton July 16, 2014, 7:04 p.m. UTC | #2
On Wed, 16 Jul 2014 11:09:43 -0700
Christoph Hellwig <hch@infradead.org> wrote:

> >  	dp->dl_stid.sc_type = NFS4_CLOSED_DELEG_STID;
> > +	spin_lock(&fp->fi_lock);
> >  	list_del_init(&dp->dl_perclnt);
> >  	list_del_init(&dp->dl_recall_lru);
> > -	spin_lock(&fp->fi_lock);
> >  	list_del_init(&dp->dl_perfile);
> >  	spin_unlock(&fp->fi_lock);
> > -	spin_unlock(&state_lock);
> >  	if (fp) {
> >  		nfs4_put_deleg_lease(fp);
> > -		put_nfs4_file(fp);
> >  		dp->dl_file = NULL;
> >  	}
> > +	spin_unlock(&state_lock);
> > +	put_nfs4_file(fp);
> >  }
> 
> Unless I've missed something put_nfs4_file does not handle a NULL
> pointer argument in the current tree, so this needs a NULL check
> for fp.
> 

Well spotted, I'll fix that...

> > +	spin_lock(&state_lock);
> > +	/* Did the lease get broken before we took the lock? */
> > +	status = -EAGAIN;
> > +	if (!file_has_lease(fl->fl_file))
> > +		goto out_unlock;
> 
> So if the delegation we just tried to add got broken, but someone else
> managed to add one this would also return true.  But could that happen
> either in real life or in theory?  Shouldn't we instead have an
> atomic flag on the delegation that it got broken which we could check
> here?
> 

Hmm...maybe. It's a very unlikely race though, and I think the same
sort of race exists today. In fact, it's worse today since we don't do
any checking of the validity of the lease after acquiring it now.

The flag sounds like a good idea, but the code is structured completely
wrong for it currently. The delegation is only hashed after we get a
lease, so the lease break wouldn't find anything to set a flag on.

Quite frankly, I _really_ do not want to have to rework the locking in
the delegation code yet again. I think this scheme is an improvement
over what we have now, even if it's not 100% perfect.

Once we get the scalability set done, I'd like to go back and overhaul
the delegation code. There are a lot of ugly warts here, but fixing
them is really a separate project in its own right.
Christoph Hellwig July 17, 2014, 2:55 p.m. UTC | #3
On Wed, Jul 16, 2014 at 03:04:01PM -0400, Jeff Layton wrote:
> Hmm...maybe. It's a very unlikely race though, and I think the same
> sort of race exists today. In fact, it's worse today since we don't do
> any checking of the validity of the lease after acquiring it now.
> 
> The flag sounds like a good idea, but the code is structured completely
> wrong for it currently. The delegation is only hashed after we get a
> lease, so the lease break wouldn't find anything to set a flag on.
> 
> Quite frankly, I _really_ do not want to have to rework the locking in
> the delegation code yet again. I think this scheme is an improvement
> over what we have now, even if it's not 100% perfect.
> 
> Once we get the scalability set done, I'd like to go back and overhaul
> the delegation code. There are a lot of ugly warts here, but fixing
> them is really a separate project in its own right.

But in the old code we had the client lock over all of this, right?

Anyway, if Bruce and you are fine with this I'm not going to block it,
although it seems a little incomplete to me.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Layton July 17, 2014, 3:31 p.m. UTC | #4
On Thu, 17 Jul 2014 07:55:34 -0700
Christoph Hellwig <hch@infradead.org> wrote:

> On Wed, Jul 16, 2014 at 03:04:01PM -0400, Jeff Layton wrote:
> > Hmm...maybe. It's a very unlikely race though, and I think the same
> > sort of race exists today. In fact, it's worse today since we don't do
> > any checking of the validity of the lease after acquiring it now.
> > 
> > The flag sounds like a good idea, but the code is structured completely
> > wrong for it currently. The delegation is only hashed after we get a
> > lease, so the lease break wouldn't find anything to set a flag on.
> > 
> > Quite frankly, I _really_ do not want to have to rework the locking in
> > the delegation code yet again. I think this scheme is an improvement
> > over what we have now, even if it's not 100% perfect.
> > 
> > Once we get the scalability set done, I'd like to go back and overhaul
> > the delegation code. There are a lot of ugly warts here, but fixing
> > them is really a separate project in its own right.
> 
> But in the old code we had the client lock over all of this, right?
> 

Yes, but it's still possible for the lease to get broken after setting
it but before the delegation is hashed in the existing code. You're
correct though that another nfsd wouldn't be able to set a lease on the
same file though.

One possibility to ensure that can't happen is to check whether
fp->fi_had_conflict became true after vfs_setlease returns. That would
also obviate the need for file_has_lease. I'm still trying to figure
out whether there are potential races there with that approach though.

> Anyway, if Bruce and you are fine with this I'm not going to block it,
> although it seems a little incomplete to me.

Thanks, and I agree. I think we're going to need to revisit this area,
but I suspect we need to do some surgery to the generic lease handling
code too.

For one thing, it's a little disturbing to me that vfs_setlease returns
a pointer to the file_lock that's sitting on the i_flock list. I'm not
sure that it's a problem today, but it seems like a caller could end up
trying to use that pointer after time_out_leases has been called, or
after the lease has been otherwise deleted.
diff mbox

Patch

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index fd4deb049ddf..91a33f8896dd 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -624,6 +624,8 @@  nfs4_put_delegation(struct nfs4_delegation *dp)
 
 static void nfs4_put_deleg_lease(struct nfs4_file *fp)
 {
+	lockdep_assert_held(&state_lock);
+
 	if (!fp->fi_lease)
 		return;
 	if (atomic_dec_and_test(&fp->fi_delegees)) {
@@ -643,11 +645,10 @@  static void
 hash_delegation_locked(struct nfs4_delegation *dp, struct nfs4_file *fp)
 {
 	lockdep_assert_held(&state_lock);
+	lockdep_assert_held(&fp->fi_lock);
 
 	dp->dl_stid.sc_type = NFS4_DELEG_STID;
-	spin_lock(&fp->fi_lock);
 	list_add(&dp->dl_perfile, &fp->fi_delegations);
-	spin_unlock(&fp->fi_lock);
 	list_add(&dp->dl_perclnt, &dp->dl_stid.sc_client->cl_delegations);
 }
 
@@ -659,17 +660,17 @@  unhash_delegation(struct nfs4_delegation *dp)
 
 	spin_lock(&state_lock);
 	dp->dl_stid.sc_type = NFS4_CLOSED_DELEG_STID;
+	spin_lock(&fp->fi_lock);
 	list_del_init(&dp->dl_perclnt);
 	list_del_init(&dp->dl_recall_lru);
-	spin_lock(&fp->fi_lock);
 	list_del_init(&dp->dl_perfile);
 	spin_unlock(&fp->fi_lock);
-	spin_unlock(&state_lock);
 	if (fp) {
 		nfs4_put_deleg_lease(fp);
-		put_nfs4_file(fp);
 		dp->dl_file = NULL;
 	}
+	spin_unlock(&state_lock);
+	put_nfs4_file(fp);
 }
 
 static void destroy_revoked_delegation(struct nfs4_delegation *dp)
@@ -3493,7 +3494,7 @@  static int nfs4_setlease(struct nfs4_delegation *dp)
 {
 	struct nfs4_file *fp = dp->dl_file;
 	struct file_lock *fl;
-	int status;
+	int status = 0;
 
 	fl = nfs4_alloc_init_lease(fp, NFS4_OPEN_DELEGATE_READ);
 	if (!fl)
@@ -3501,15 +3502,31 @@  static int nfs4_setlease(struct nfs4_delegation *dp)
 	fl->fl_file = find_readable_file(fp);
 	status = vfs_setlease(fl->fl_file, fl->fl_type, &fl);
 	if (status)
-		goto out_free;
+		goto out_fput;
+	spin_lock(&state_lock);
+	/* Did the lease get broken before we took the lock? */
+	status = -EAGAIN;
+	if (!file_has_lease(fl->fl_file))
+		goto out_unlock;
+	spin_lock(&fp->fi_lock);
+	/* Race breaker */
+	if (fp->fi_lease) {
+		status = 0;
+		atomic_inc(&fp->fi_delegees);
+		hash_delegation_locked(dp, fp);
+		spin_unlock(&fp->fi_lock);
+		goto out_unlock;
+	}
 	fp->fi_lease = fl;
 	fp->fi_deleg_file = fl->fl_file;
 	atomic_set(&fp->fi_delegees, 1);
-	spin_lock(&state_lock);
 	hash_delegation_locked(dp, fp);
+	spin_unlock(&fp->fi_lock);
 	spin_unlock(&state_lock);
 	return 0;
-out_free:
+out_unlock:
+	spin_unlock(&state_lock);
+out_fput:
 	if (fl->fl_file)
 		fput(fl->fl_file);
 	locks_free_lock(fl);
@@ -3518,19 +3535,27 @@  out_free:
 
 static int nfs4_set_delegation(struct nfs4_delegation *dp, struct nfs4_file *fp)
 {
+	int status = 0;
+
 	if (fp->fi_had_conflict)
 		return -EAGAIN;
 	get_nfs4_file(fp);
+	spin_lock(&state_lock);
+	spin_lock(&fp->fi_lock);
 	dp->dl_file = fp;
-	if (!fp->fi_lease)
+	if (!fp->fi_lease) {
+		spin_unlock(&fp->fi_lock);
+		spin_unlock(&state_lock);
 		return nfs4_setlease(dp);
-	spin_lock(&state_lock);
+	}
 	atomic_inc(&fp->fi_delegees);
 	if (fp->fi_had_conflict) {
-		spin_unlock(&state_lock);
-		return -EAGAIN;
+		status = -EAGAIN;
+		goto out_unlock;
 	}
 	hash_delegation_locked(dp, fp);
+out_unlock:
+	spin_unlock(&fp->fi_lock);
 	spin_unlock(&state_lock);
 	return 0;
 }