nfsd: fix RELEASE_LOCKOWNER

Message ID 170589589641.23031.16356786177193106749@noble.neil.brown.name (mailing list archive)
State New
Series nfsd: fix RELEASE_LOCKOWNER

Commit Message

NeilBrown Jan. 22, 2024, 3:58 a.m. UTC
The test on so_count in nfsd4_release_lockowner() is nonsense and
harmful.  Revert to using check_for_locks(), changing that to not sleep.

First: harmful.
As is documented in the kdoc comment for nfsd4_release_lockowner(), the
test on so_count can transiently return a false positive resulting in a
return of NFS4ERR_LOCKS_HELD when in fact no locks are held.  This is
clearly a protocol violation and with the Linux NFS client it can cause
incorrect behaviour.

If NFS4_RELEASE_LOCKOWNER is sent while some other thread is still
processing a LOCK request which failed because, at the time that request
was received, the given owner held a conflicting lock, then the nfsd
thread processing that LOCK request can hold a reference (conflock) to
the lock owner that causes nfsd4_release_lockowner() to return an
incorrect error.

The Linux NFS client ignores that NFS4ERR_LOCKS_HELD error because it
never sends NFS4_RELEASE_LOCKOWNER without first releasing any locks, so
it knows that the error is impossible.  It assumes the lock owner was in
fact released so it feels free to use the same lock owner identifier in
some later locking request.

When it does reuse a lock owner identifier for which a previous RELEASE
failed, it will naturally use a lock_seqid of zero.  However the server,
which didn't release the lock owner, will expect a larger lock_seqid and
so will respond with NFS4ERR_BAD_SEQID.
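
Roughly, the failing exchange looks like this (the owner name is
illustrative, and the LOCK framing assumes an NFSv4.0 client creating a
new lock owner with open_to_lock_owner):

	Client					Server
	RELEASE_LOCKOWNER(owner X)	--->
					<---	NFS4ERR_LOCKS_HELD (false positive)
	(client assumes owner X is gone and feels free to reuse it)
	LOCK(open_to_lock_owner,
	     owner X, lock_seqid = 0)	--->
					<---	NFS4ERR_BAD_SEQID (server still has
						owner X and expects its next seqid)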

So clearly it is harmful to allow a false positive, which testing
so_count allows.

The test is nonsense because ... well... it doesn't mean anything.

so_count is the sum of three different counts.
1/ the set of states listed on so_stateids
2/ the set of active vfs locks owned by any of those states
3/ various transient counts such as for conflicting locks.

When it is tested against '2' it is clear that one of these is the
transient reference obtained by find_lockowner_str_locked().  It is not
clear what the other one is expected to be.

In practice, the count is often 2 because there is precisely one state
on so_stateids.  If there were more, this would fail.

In my testing I see two circumstances in which RELEASE_LOCKOWNER is called.
In one case, CLOSE is called before RELEASE_LOCKOWNER.  That results in
all the lock states being removed, and so the lockowner being discarded
(it is removed when there are no more references which usually happens
when the lock state is discarded).  When nfsd4_release_lockowner() finds
that the lock owner doesn't exist, it returns success.

The other case shows an so_count of '2' and precisely one state listed
in so_stateid.  It appears that the Linux client uses a separate lock
owner for each file resulting in one lock state per lock owner, so this
test on '2' is safe.  For another client it might not be safe.
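
As a purely illustrative sketch (not code from this patch, and not how
any known client behaves): if a client attached two lock states to one
lock owner, the existing test would misfire even with no locks held:

	/* hypothetical: lock owner with two lock states on so_stateids */
	lo = find_lockowner_str_locked(clp, &rlockowner->rl_owner);
	/* so_count == 2 (one per lock state) + 1 (this lookup) == 3 */
	if (atomic_read(&lo->lo_owner.so_count) != 2)
		return nfserr_locks_held;	/* false positive */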

So this patch changes check_for_locks() to use the (newish)
find_any_file_locked() so that it doesn't take a reference on the
nfs4_file and so never calls nfsd_file_put(), and so never sleeps.  With
this change it is safe to restore the use of check_for_locks() rather
than testing so_count against the mysterious '2'.

Fixes: ce3c4ad7f4ce ("NFSD: Fix possible sleep during nfsd4_release_lockowner()")
Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfsd/nfs4state.c | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

Comments

Jeffrey Layton Jan. 22, 2024, 12:53 p.m. UTC | #1
On Mon, 2024-01-22 at 14:58 +1100, NeilBrown wrote:
> The test on so_count in nfsd4_release_lockowner() is nonsense and
> harmful.  Revert to using check_for_locks(), changing that to not sleep.
> 
> First: harmful.
> As is documented in the kdoc comment for nfsd4_release_lockowner(), the
> test on so_count can transiently return a false positive resulting in a
> return of NFS4ERR_LOCKS_HELD when in fact no locks are held.  This is
> clearly a protocol violation and with the Linux NFS client it can cause
> incorrect behaviour.
> 
> If NFS4_RELEASE_LOCKOWNER is sent while some other thread is still
> processing a LOCK request which failed because, at the time that request
> was received, the given owner held a conflicting lock, then the nfsd
> thread processing that LOCK request can hold a reference (conflock) to
> the lock owner that causes nfsd4_release_lockowner() to return an
> incorrect error.
> 
> The Linux NFS client ignores that NFS4ERR_LOCKS_HELD error because it
> never sends NFS4_RELEASE_LOCKOWNER without first releasing any locks, so
> it knows that the error is impossible.  It assumes the lock owner was in
> fact released so it feels free to use the same lock owner identifier in
> some later locking request.
> 
> When it does reuse a lock owner identifier for which a previous RELEASE
> failed, it will naturally use a lock_seqid of zero.  However the server,
> which didn't release the lock owner, will expect a larger lock_seqid and
> so will respond with NFS4ERR_BAD_SEQID.
> 
> So clearly it is harmful to allow a false positive, which testing
> so_count allows.
> 
> The test is nonsense because ... well... it doesn't mean anything.
> 
> so_count is the sum of three different counts.
> 1/ the set of states listed on so_stateids
> 2/ the set of active vfs locks owned by any of those states
> 3/ various transient counts such as for conflicting locks.
> 
> When it is tested against '2' it is clear that one of these is the
> transient reference obtained by find_lockowner_str_locked().  It is not
> clear what the other one is expected to be.
> 
> In practice, the count is often 2 because there is precisely one state
> on so_stateids.  If there were more, this would fail.
> 
> It my testing I see two circumstances when RELEASE_LOCKOWNER is called.
> In one case, CLOSE is called before RELEASE_LOCKOWNER.  That results in
> all the lock states being removed, and so the lockowner being discarded
> (it is removed when there are no more references which usually happens
> when the lock state is discarded).  When nfsd4_release_lockowner() finds
> that the lock owner doesn't exist, it returns success.
> 
> The other case shows an so_count of '2' and precisely one state listed
> in so_stateid.  It appears that the Linux client uses a separate lock
> owner for each file resulting in one lock state per lock owner, so this
> test on '2' is safe.  For another client it might not be safe.
> 
> So this patch changes check_for_locks() to use the (newish)
> find_any_file_locked() so that it doesn't take a reference on the
> nfs4_file and so never calls nfsd_file_put(), and so never sleeps.  With
> this check is it safe to restore the use of check_for_locks() rather
> than testing so_count against the mysterious '2'.
> 
> Fixes: ce3c4ad7f4ce ("NFSD: Fix possible sleep during nfsd4_release_lockowner()")
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>  fs/nfsd/nfs4state.c | 26 +++++++++++++++-----------
>  1 file changed, 15 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 2fa54cfd4882..6dc6340e2852 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -7911,14 +7911,16 @@ check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner)
>  {
>  	struct file_lock *fl;
>  	int status = false;
> -	struct nfsd_file *nf = find_any_file(fp);
> +	struct nfsd_file *nf;
>  	struct inode *inode;
>  	struct file_lock_context *flctx;
>  
> +	spin_lock(&fp->fi_lock);
> +	nf = find_any_file_locked(fp);
>  	if (!nf) {
>  		/* Any valid lock stateid should have some sort of access */
>  		WARN_ON_ONCE(1);
> -		return status;
> +		goto out;
>  	}
>  
>  	inode = file_inode(nf->nf_file);
> @@ -7934,7 +7936,8 @@ check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner)
>  		}
>  		spin_unlock(&flctx->flc_lock);
>  	}
> -	nfsd_file_put(nf);
> +out:
> +	spin_unlock(&fp->fi_lock);
>  	return status;
>  }
>  

^^^
Nice cleanup here! Not having to take a reference in this path is great.

> @@ -7944,10 +7947,8 @@ check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner)
>   * @cstate: NFSv4 COMPOUND state
>   * @u: RELEASE_LOCKOWNER arguments
>   *
> - * The lockowner's so_count is bumped when a lock record is added
> - * or when copying a conflicting lock. The latter case is brief,
> - * but can lead to fleeting false positives when looking for
> - * locks-in-use.
> + * Check if theree are any locks still held and if not - free the lockowner
> + * and any lock state that is owned.
>   *
>   * Return values:
>   *   %nfs_ok: lockowner released or not found
> @@ -7983,10 +7984,13 @@ nfsd4_release_lockowner(struct svc_rqst *rqstp,
>  		spin_unlock(&clp->cl_lock);
>  		return nfs_ok;
>  	}
> -	if (atomic_read(&lo->lo_owner.so_count) != 2) {
> -		spin_unlock(&clp->cl_lock);
> -		nfs4_put_stateowner(&lo->lo_owner);
> -		return nfserr_locks_held;
> +
> +	list_for_each_entry(stp, &lo->lo_owner.so_stateids, st_perstateowner) {
> +		if (check_for_locks(stp->st_stid.sc_file, lo)) {
> +			spin_unlock(&clp->cl_lock);
> +			nfs4_put_stateowner(&lo->lo_owner);
> +			return nfserr_locks_held;
> +		}
>  	}
>  	unhash_lockowner_locked(lo);
>  	while (!list_empty(&lo->lo_owner.so_stateids)) {


Anytime I see codepaths checking reference counts for specific values,
that's always a red flag to me, and I've hated this particular so_count
check since we added it several years ago.

This is a much better solution.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Chuck Lever Jan. 22, 2024, 2:27 p.m. UTC | #2
On Mon, Jan 22, 2024 at 02:58:16PM +1100, NeilBrown wrote:
> 
> The test on so_count in nfsd4_release_lockowner() is nonsense and
> harmful.  Revert to using check_for_locks(), changing that to not sleep.
> 
> First: harmful.
> As is documented in the kdoc comment for nfsd4_release_lockowner(), the
> test on so_count can transiently return a false positive resulting in a
> return of NFS4ERR_LOCKS_HELD when in fact no locks are held.  This is
> clearly a protocol violation and with the Linux NFS client it can cause
> incorrect behaviour.
> 
> If NFS4_RELEASE_LOCKOWNER is sent while some other thread is still
> processing a LOCK request which failed because, at the time that request
> was received, the given owner held a conflicting lock, then the nfsd
> thread processing that LOCK request can hold a reference (conflock) to
> the lock owner that causes nfsd4_release_lockowner() to return an
> incorrect error.
> 
> The Linux NFS client ignores that NFS4ERR_LOCKS_HELD error because it
> never sends NFS4_RELEASE_LOCKOWNER without first releasing any locks, so
> it knows that the error is impossible.  It assumes the lock owner was in
> fact released so it feels free to use the same lock owner identifier in
> some later locking request.
> 
> When it does reuse a lock owner identifier for which a previous RELEASE
> failed, it will naturally use a lock_seqid of zero.  However the server,
> which didn't release the lock owner, will expect a larger lock_seqid and
> so will respond with NFS4ERR_BAD_SEQID.
> 
> So clearly it is harmful to allow a false positive, which testing
> so_count allows.
> 
> The test is nonsense because ... well... it doesn't mean anything.
> 
> so_count is the sum of three different counts.
> 1/ the set of states listed on so_stateids
> 2/ the set of active vfs locks owned by any of those states
> 3/ various transient counts such as for conflicting locks.
> 
> When it is tested against '2' it is clear that one of these is the
> transient reference obtained by find_lockowner_str_locked().  It is not
> clear what the other one is expected to be.
> 
> In practice, the count is often 2 because there is precisely one state
> on so_stateids.  If there were more, this would fail.
> 
> It my testing I see two circumstances when RELEASE_LOCKOWNER is called.
> In one case, CLOSE is called before RELEASE_LOCKOWNER.  That results in
> all the lock states being removed, and so the lockowner being discarded
> (it is removed when there are no more references which usually happens
> when the lock state is discarded).  When nfsd4_release_lockowner() finds
> that the lock owner doesn't exist, it returns success.
> 
> The other case shows an so_count of '2' and precisely one state listed
> in so_stateid.  It appears that the Linux client uses a separate lock
> owner for each file resulting in one lock state per lock owner, so this
> test on '2' is safe.  For another client it might not be safe.
> 
> So this patch changes check_for_locks() to use the (newish)
> find_any_file_locked() so that it doesn't take a reference on the
> nfs4_file and so never calls nfsd_file_put(), and so never sleeps.

More to the point, find_any_file_locked() was added by commit
e0aa651068bf ("nfsd: don't call nfsd_file_put from client states
seqfile display"), which was merged several months /after/ commit
ce3c4ad7f4ce ("NFSD: Fix possible sleep during
nfsd4_release_lockowner()").

Not having to deal with nfsd_file_put() in check_for_locks is a Good
Thing.

Am I correct in observing that the new check_for_locks() is the only
place where flc_lock and fi_lock are held concurrently?


> With
> this check is it safe to restore the use of check_for_locks() rather
> than testing so_count against the mysterious '2'.
> 
> Fixes: ce3c4ad7f4ce ("NFSD: Fix possible sleep during nfsd4_release_lockowner()")
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>  fs/nfsd/nfs4state.c | 26 +++++++++++++++-----------
>  1 file changed, 15 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 2fa54cfd4882..6dc6340e2852 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -7911,14 +7911,16 @@ check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner)
>  {
>  	struct file_lock *fl;
>  	int status = false;
> -	struct nfsd_file *nf = find_any_file(fp);
> +	struct nfsd_file *nf;
>  	struct inode *inode;
>  	struct file_lock_context *flctx;
>  
> +	spin_lock(&fp->fi_lock);
> +	nf = find_any_file_locked(fp);
>  	if (!nf) {
>  		/* Any valid lock stateid should have some sort of access */
>  		WARN_ON_ONCE(1);
> -		return status;
> +		goto out;
>  	}
>  
>  	inode = file_inode(nf->nf_file);
> @@ -7934,7 +7936,8 @@ check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner)
>  		}
>  		spin_unlock(&flctx->flc_lock);
>  	}
> -	nfsd_file_put(nf);
> +out:
> +	spin_unlock(&fp->fi_lock);
>  	return status;
>  }
>  
> @@ -7944,10 +7947,8 @@ check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner)
>   * @cstate: NFSv4 COMPOUND state
>   * @u: RELEASE_LOCKOWNER arguments
>   *
> - * The lockowner's so_count is bumped when a lock record is added
> - * or when copying a conflicting lock. The latter case is brief,
> - * but can lead to fleeting false positives when looking for
> - * locks-in-use.
> + * Check if theree are any locks still held and if not - free the lockowner
> + * and any lock state that is owned.
>   *
>   * Return values:
>   *   %nfs_ok: lockowner released or not found
> @@ -7983,10 +7984,13 @@ nfsd4_release_lockowner(struct svc_rqst *rqstp,
>  		spin_unlock(&clp->cl_lock);
>  		return nfs_ok;
>  	}
> -	if (atomic_read(&lo->lo_owner.so_count) != 2) {
> -		spin_unlock(&clp->cl_lock);
> -		nfs4_put_stateowner(&lo->lo_owner);
> -		return nfserr_locks_held;
> +
> +	list_for_each_entry(stp, &lo->lo_owner.so_stateids, st_perstateowner) {
> +		if (check_for_locks(stp->st_stid.sc_file, lo)) {
> +			spin_unlock(&clp->cl_lock);
> +			nfs4_put_stateowner(&lo->lo_owner);
> +			return nfserr_locks_held;
> +		}
>  	}
>  	unhash_lockowner_locked(lo);
>  	while (!list_empty(&lo->lo_owner.so_stateids)) {
> -- 
> 2.43.0
>
NeilBrown Jan. 22, 2024, 9:57 p.m. UTC | #3
On Tue, 23 Jan 2024, Chuck Lever wrote:
> On Mon, Jan 22, 2024 at 02:58:16PM +1100, NeilBrown wrote:
> > 
> > The test on so_count in nfsd4_release_lockowner() is nonsense and
> > harmful.  Revert to using check_for_locks(), changing that to not sleep.
> > 
> > First: harmful.
> > As is documented in the kdoc comment for nfsd4_release_lockowner(), the
> > test on so_count can transiently return a false positive resulting in a
> > return of NFS4ERR_LOCKS_HELD when in fact no locks are held.  This is
> > clearly a protocol violation and with the Linux NFS client it can cause
> > incorrect behaviour.
> > 
> > If NFS4_RELEASE_LOCKOWNER is sent while some other thread is still
> > processing a LOCK request which failed because, at the time that request
> > was received, the given owner held a conflicting lock, then the nfsd
> > thread processing that LOCK request can hold a reference (conflock) to
> > the lock owner that causes nfsd4_release_lockowner() to return an
> > incorrect error.
> > 
> > The Linux NFS client ignores that NFS4ERR_LOCKS_HELD error because it
> > never sends NFS4_RELEASE_LOCKOWNER without first releasing any locks, so
> > it knows that the error is impossible.  It assumes the lock owner was in
> > fact released so it feels free to use the same lock owner identifier in
> > some later locking request.
> > 
> > When it does reuse a lock owner identifier for which a previous RELEASE
> > failed, it will naturally use a lock_seqid of zero.  However the server,
> > which didn't release the lock owner, will expect a larger lock_seqid and
> > so will respond with NFS4ERR_BAD_SEQID.
> > 
> > So clearly it is harmful to allow a false positive, which testing
> > so_count allows.
> > 
> > The test is nonsense because ... well... it doesn't mean anything.
> > 
> > so_count is the sum of three different counts.
> > 1/ the set of states listed on so_stateids
> > 2/ the set of active vfs locks owned by any of those states
> > 3/ various transient counts such as for conflicting locks.
> > 
> > When it is tested against '2' it is clear that one of these is the
> > transient reference obtained by find_lockowner_str_locked().  It is not
> > clear what the other one is expected to be.
> > 
> > In practice, the count is often 2 because there is precisely one state
> > on so_stateids.  If there were more, this would fail.
> > 
> > It my testing I see two circumstances when RELEASE_LOCKOWNER is called.
> > In one case, CLOSE is called before RELEASE_LOCKOWNER.  That results in
> > all the lock states being removed, and so the lockowner being discarded
> > (it is removed when there are no more references which usually happens
> > when the lock state is discarded).  When nfsd4_release_lockowner() finds
> > that the lock owner doesn't exist, it returns success.
> > 
> > The other case shows an so_count of '2' and precisely one state listed
> > in so_stateid.  It appears that the Linux client uses a separate lock
> > owner for each file resulting in one lock state per lock owner, so this
> > test on '2' is safe.  For another client it might not be safe.
> > 
> > So this patch changes check_for_locks() to use the (newish)
> > find_any_file_locked() so that it doesn't take a reference on the
> > nfs4_file and so never calls nfsd_file_put(), and so never sleeps.
> 
> More to the point, find_any_file_locked() was added by commit
> e0aa651068bf ("nfsd: don't call nfsd_file_put from client states
> seqfile display"), which was merged several months /after/ commit
> ce3c4ad7f4ce ("NFSD: Fix possible sleep during
> nfsd4_release_lockowner()").

Yes.  To flesh out the history:
nfsd_file_put() was added in v5.4.  In earlier kernels check_for_locks()
would never sleep.  However the problem patch was backported to 4.9, 4.14,
and 4.19 and should be reverted.

find_any_file_locked() was added in v6.2 so when this patch is
backported to 5.4, 5.10, 5.15, 5.17 - 6.1 it needs to include
find_any_file_locked().

The patch should apply unchanged to stable kernels 6.2 and later.
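
For reference, the helper those older backports need to carry is small.
From memory it is roughly the below - but please take the authoritative
version from commit e0aa651068bf rather than from this sketch:

	static struct nfsd_file *find_any_file_locked(struct nfs4_file *f)
	{
		lockdep_assert_held(&f->fi_lock);

		if (f->fi_fds[O_RDWR])
			return f->fi_fds[O_RDWR];
		if (f->fi_fds[O_WRONLY])
			return f->fi_fds[O_WRONLY];
		if (f->fi_fds[O_RDONLY])
			return f->fi_fds[O_RDONLY];
		return NULL;
	}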


> 
> Not having to deal with nfsd_file_put() in check_for_locks is a Good
> Thing.

:-)
Makes me wonder if there is anywhere else where we don't want
nfsd_file_put() ... but I cannot find any obvious candidates for change.

> 
> Am I correct in observing that the new check_for_locks() is the only
> place where flc_lock and fi_lock are held concurrently?

That is what I see - yes.
fi_lock is taken inside cl_lock elsewhere, and we preserve the ordering
in this patch.
I cannot see that any nfsd locks are taken when flc_lock is held, so it
is safe to take it while fi_lock and cl_lock are held.
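
Spelled out, the nesting this patch relies on is (my summary, not a
lockdep-verified claim):

	spin_lock(&clp->cl_lock);		/* nfsd4_release_lockowner() */
	  spin_lock(&fp->fi_lock);		/* check_for_locks() */
	    spin_lock(&flctx->flc_lock);	/* innermost */
	    ...					/* no nfsd lock is taken here */
	    spin_unlock(&flctx->flc_lock);
	  spin_unlock(&fp->fi_lock);
	spin_unlock(&clp->cl_lock);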

Thanks,
NeilBrown
Chuck Lever Jan. 22, 2024, 11:14 p.m. UTC | #4
> On Jan 22, 2024, at 4:57 PM, NeilBrown <neilb@suse.de> wrote:
> 
> On Tue, 23 Jan 2024, Chuck Lever wrote:
>> On Mon, Jan 22, 2024 at 02:58:16PM +1100, NeilBrown wrote:
>>> 
>>> The test on so_count in nfsd4_release_lockowner() is nonsense and
>>> harmful.  Revert to using check_for_locks(), changing that to not sleep.
>>> 
>>> First: harmful.
>>> As is documented in the kdoc comment for nfsd4_release_lockowner(), the
>>> test on so_count can transiently return a false positive resulting in a
>>> return of NFS4ERR_LOCKS_HELD when in fact no locks are held.  This is
>>> clearly a protocol violation and with the Linux NFS client it can cause
>>> incorrect behaviour.
>>> 
>>> If NFS4_RELEASE_LOCKOWNER is sent while some other thread is still
>>> processing a LOCK request which failed because, at the time that request
>>> was received, the given owner held a conflicting lock, then the nfsd
>>> thread processing that LOCK request can hold a reference (conflock) to
>>> the lock owner that causes nfsd4_release_lockowner() to return an
>>> incorrect error.
>>> 
>>> The Linux NFS client ignores that NFS4ERR_LOCKS_HELD error because it
>>> never sends NFS4_RELEASE_LOCKOWNER without first releasing any locks, so
>>> it knows that the error is impossible.  It assumes the lock owner was in
>>> fact released so it feels free to use the same lock owner identifier in
>>> some later locking request.
>>> 
>>> When it does reuse a lock owner identifier for which a previous RELEASE
>>> failed, it will naturally use a lock_seqid of zero.  However the server,
>>> which didn't release the lock owner, will expect a larger lock_seqid and
>>> so will respond with NFS4ERR_BAD_SEQID.
>>> 
>>> So clearly it is harmful to allow a false positive, which testing
>>> so_count allows.
>>> 
>>> The test is nonsense because ... well... it doesn't mean anything.
>>> 
>>> so_count is the sum of three different counts.
>>> 1/ the set of states listed on so_stateids
>>> 2/ the set of active vfs locks owned by any of those states
>>> 3/ various transient counts such as for conflicting locks.
>>> 
>>> When it is tested against '2' it is clear that one of these is the
>>> transient reference obtained by find_lockowner_str_locked().  It is not
>>> clear what the other one is expected to be.
>>> 
>>> In practice, the count is often 2 because there is precisely one state
>>> on so_stateids.  If there were more, this would fail.
>>> 
>>> It my testing I see two circumstances when RELEASE_LOCKOWNER is called.
>>> In one case, CLOSE is called before RELEASE_LOCKOWNER.  That results in
>>> all the lock states being removed, and so the lockowner being discarded
>>> (it is removed when there are no more references which usually happens
>>> when the lock state is discarded).  When nfsd4_release_lockowner() finds
>>> that the lock owner doesn't exist, it returns success.
>>> 
>>> The other case shows an so_count of '2' and precisely one state listed
>>> in so_stateid.  It appears that the Linux client uses a separate lock
>>> owner for each file resulting in one lock state per lock owner, so this
>>> test on '2' is safe.  For another client it might not be safe.
>>> 
>>> So this patch changes check_for_locks() to use the (newish)
>>> find_any_file_locked() so that it doesn't take a reference on the
>>> nfs4_file and so never calls nfsd_file_put(), and so never sleeps.
>> 
>> More to the point, find_any_file_locked() was added by commit
>> e0aa651068bf ("nfsd: don't call nfsd_file_put from client states
>> seqfile display"), which was merged several months /after/ commit
>> ce3c4ad7f4ce ("NFSD: Fix possible sleep during
>> nfsd4_release_lockowner()").
> 
> Yes.  To flesh out the history:
> nfsd_file_put() was added in v5.4.  In earlier kernels check_for_locks()
> would never sleep.  However the problem patch was backported 4.9, 4.14,
> and 4.19 and should be reverted.

I don't see "NFSD: Fix possible sleep during nfsd4_release_lockowner()"
in any of those kernels. All but 4.19 are now EOL.


> find_any_file_locked() was added in v6.2 so when this patch is
> backported to 5.4, 5.10, 5.15, 5.17 - 6.1 it needs to include
> find_and_file_locked()

I think I'd rather leave those unperturbed until someone hits a real
problem. Unless you have a distribution kernel that needs to see
this fix in one of the LTS kernels? The supported stable/LTS kernels
are 5.4, 5.10, 5.15, and 6.1.


> The patch should apply unchanged to stable kernels 6.2 and later.

I can add a Cc: stable #v6.2+


--
Chuck Lever
Chuck Lever Jan. 22, 2024, 11:18 p.m. UTC | #5
> On Jan 22, 2024, at 6:14 PM, Chuck Lever III <chuck.lever@oracle.com> wrote:
> 
> 
> 
>> On Jan 22, 2024, at 4:57 PM, NeilBrown <neilb@suse.de> wrote:
>> 
>> On Tue, 23 Jan 2024, Chuck Lever wrote:
>>> On Mon, Jan 22, 2024 at 02:58:16PM +1100, NeilBrown wrote:
>>>> 
>>>> The test on so_count in nfsd4_release_lockowner() is nonsense and
>>>> harmful.  Revert to using check_for_locks(), changing that to not sleep.
>>>> 
>>>> First: harmful.
>>>> As is documented in the kdoc comment for nfsd4_release_lockowner(), the
>>>> test on so_count can transiently return a false positive resulting in a
>>>> return of NFS4ERR_LOCKS_HELD when in fact no locks are held.  This is
>>>> clearly a protocol violation and with the Linux NFS client it can cause
>>>> incorrect behaviour.
>>>> 
>>>> If NFS4_RELEASE_LOCKOWNER is sent while some other thread is still
>>>> processing a LOCK request which failed because, at the time that request
>>>> was received, the given owner held a conflicting lock, then the nfsd
>>>> thread processing that LOCK request can hold a reference (conflock) to
>>>> the lock owner that causes nfsd4_release_lockowner() to return an
>>>> incorrect error.
>>>> 
>>>> The Linux NFS client ignores that NFS4ERR_LOCKS_HELD error because it
>>>> never sends NFS4_RELEASE_LOCKOWNER without first releasing any locks, so
>>>> it knows that the error is impossible.  It assumes the lock owner was in
>>>> fact released so it feels free to use the same lock owner identifier in
>>>> some later locking request.
>>>> 
>>>> When it does reuse a lock owner identifier for which a previous RELEASE
>>>> failed, it will naturally use a lock_seqid of zero.  However the server,
>>>> which didn't release the lock owner, will expect a larger lock_seqid and
>>>> so will respond with NFS4ERR_BAD_SEQID.
>>>> 
>>>> So clearly it is harmful to allow a false positive, which testing
>>>> so_count allows.
>>>> 
>>>> The test is nonsense because ... well... it doesn't mean anything.
>>>> 
>>>> so_count is the sum of three different counts.
>>>> 1/ the set of states listed on so_stateids
>>>> 2/ the set of active vfs locks owned by any of those states
>>>> 3/ various transient counts such as for conflicting locks.
>>>> 
>>>> When it is tested against '2' it is clear that one of these is the
>>>> transient reference obtained by find_lockowner_str_locked().  It is not
>>>> clear what the other one is expected to be.
>>>> 
>>>> In practice, the count is often 2 because there is precisely one state
>>>> on so_stateids.  If there were more, this would fail.
>>>> 
>>>> It my testing I see two circumstances when RELEASE_LOCKOWNER is called.
>>>> In one case, CLOSE is called before RELEASE_LOCKOWNER.  That results in
>>>> all the lock states being removed, and so the lockowner being discarded
>>>> (it is removed when there are no more references which usually happens
>>>> when the lock state is discarded).  When nfsd4_release_lockowner() finds
>>>> that the lock owner doesn't exist, it returns success.
>>>> 
>>>> The other case shows an so_count of '2' and precisely one state listed
>>>> in so_stateid.  It appears that the Linux client uses a separate lock
>>>> owner for each file resulting in one lock state per lock owner, so this
>>>> test on '2' is safe.  For another client it might not be safe.
>>>> 
>>>> So this patch changes check_for_locks() to use the (newish)
>>>> find_any_file_locked() so that it doesn't take a reference on the
>>>> nfs4_file and so never calls nfsd_file_put(), and so never sleeps.
>>> 
>>> More to the point, find_any_file_locked() was added by commit
>>> e0aa651068bf ("nfsd: don't call nfsd_file_put from client states
>>> seqfile display"), which was merged several months /after/ commit
>>> ce3c4ad7f4ce ("NFSD: Fix possible sleep during
>>> nfsd4_release_lockowner()").
>> 
>> Yes.  To flesh out the history:
>> nfsd_file_put() was added in v5.4.  In earlier kernels check_for_locks()
>> would never sleep.  However the problem patch was backported 4.9, 4.14,
>> and 4.19 and should be reverted.
> 
> I don't see "NFSD: Fix possible sleep during nfsd4_release_lockowner()"
> in any of those kernels. All but 4.19 are now EOL.

OK, I see it now. I'll ask stable to remove it from v4.19.y.


>> find_any_file_locked() was added in v6.2 so when this patch is
>> backported to 5.4, 5.10, 5.15, 5.17 - 6.1 it needs to include
>> find_and_file_locked()
> 
> I think I'd rather leave those unperturbed until someone hits a real
> problem. Unless you have a distribution kernel that needs to see
> this fix in one of the LTS kernels? The supported stable/LTS kernels
> are 5.4, 5.10, 5.15, and 6.1.
> 
> 
>> The patch should apply unchanged to stable kernels 6.2 and later.
> 
> I can add a Cc: stable #v6.2+
> 
> 
> --
> Chuck Lever


--
Chuck Lever
NeilBrown Jan. 22, 2024, 11:31 p.m. UTC | #6
On Tue, 23 Jan 2024, Chuck Lever III wrote:
> 
> 
> > On Jan 22, 2024, at 4:57 PM, NeilBrown <neilb@suse.de> wrote:
> > 
> > On Tue, 23 Jan 2024, Chuck Lever wrote:
> >> On Mon, Jan 22, 2024 at 02:58:16PM +1100, NeilBrown wrote:
> >>> 
> >>> The test on so_count in nfsd4_release_lockowner() is nonsense and
> >>> harmful.  Revert to using check_for_locks(), changing that to not sleep.
> >>> 
> >>> First: harmful.
> >>> As is documented in the kdoc comment for nfsd4_release_lockowner(), the
> >>> test on so_count can transiently return a false positive resulting in a
> >>> return of NFS4ERR_LOCKS_HELD when in fact no locks are held.  This is
> >>> clearly a protocol violation and with the Linux NFS client it can cause
> >>> incorrect behaviour.
> >>> 
> >>> If NFS4_RELEASE_LOCKOWNER is sent while some other thread is still
> >>> processing a LOCK request which failed because, at the time that request
> >>> was received, the given owner held a conflicting lock, then the nfsd
> >>> thread processing that LOCK request can hold a reference (conflock) to
> >>> the lock owner that causes nfsd4_release_lockowner() to return an
> >>> incorrect error.
> >>> 
> >>> The Linux NFS client ignores that NFS4ERR_LOCKS_HELD error because it
> >>> never sends NFS4_RELEASE_LOCKOWNER without first releasing any locks, so
> >>> it knows that the error is impossible.  It assumes the lock owner was in
> >>> fact released so it feels free to use the same lock owner identifier in
> >>> some later locking request.
> >>> 
> >>> When it does reuse a lock owner identifier for which a previous RELEASE
> >>> failed, it will naturally use a lock_seqid of zero.  However the server,
> >>> which didn't release the lock owner, will expect a larger lock_seqid and
> >>> so will respond with NFS4ERR_BAD_SEQID.
> >>> 
> >>> So clearly it is harmful to allow a false positive, which testing
> >>> so_count allows.
> >>> 
> >>> The test is nonsense because ... well... it doesn't mean anything.
> >>> 
> >>> so_count is the sum of three different counts.
> >>> 1/ the set of states listed on so_stateids
> >>> 2/ the set of active vfs locks owned by any of those states
> >>> 3/ various transient counts such as for conflicting locks.
> >>> 
> >>> When it is tested against '2' it is clear that one of these is the
> >>> transient reference obtained by find_lockowner_str_locked().  It is not
> >>> clear what the other one is expected to be.
> >>> 
> >>> In practice, the count is often 2 because there is precisely one state
> >>> on so_stateids.  If there were more, this would fail.
> >>> 
> >>> It my testing I see two circumstances when RELEASE_LOCKOWNER is called.
> >>> In one case, CLOSE is called before RELEASE_LOCKOWNER.  That results in
> >>> all the lock states being removed, and so the lockowner being discarded
> >>> (it is removed when there are no more references which usually happens
> >>> when the lock state is discarded).  When nfsd4_release_lockowner() finds
> >>> that the lock owner doesn't exist, it returns success.
> >>> 
> >>> The other case shows an so_count of '2' and precisely one state listed
> >>> in so_stateid.  It appears that the Linux client uses a separate lock
> >>> owner for each file resulting in one lock state per lock owner, so this
> >>> test on '2' is safe.  For another client it might not be safe.
> >>> 
> >>> So this patch changes check_for_locks() to use the (newish)
> >>> find_any_file_locked() so that it doesn't take a reference on the
> >>> nfs4_file and so never calls nfsd_file_put(), and so never sleeps.
> >> 
> >> More to the point, find_any_file_locked() was added by commit
> >> e0aa651068bf ("nfsd: don't call nfsd_file_put from client states
> >> seqfile display"), which was merged several months /after/ commit
> >> ce3c4ad7f4ce ("NFSD: Fix possible sleep during
> >> nfsd4_release_lockowner()").
> > 
> > Yes.  To flesh out the history:
> > nfsd_file_put() was added in v5.4.  In earlier kernels check_for_locks()
> > would never sleep.  However the problem patch was backported 4.9, 4.14,
> > and 4.19 and should be reverted.
> 
> I don't see "NFSD: Fix possible sleep during nfsd4_release_lockowner()"
> in any of those kernels. All but 4.19 are now EOL.

I hadn't checked which were EOL.  Thanks for finding the 4.19 patch and
requesting a revert.

> 
> 
> > find_any_file_locked() was added in v6.2 so when this patch is
> > backported to 5.4, 5.10, 5.15, 5.17 - 6.1 it needs to include
> > find_and_file_locked()
> 
> I think I'd rather leave those unperturbed until someone hits a real
> problem. Unless you have a distribution kernel that needs to see
> this fix in one of the LTS kernels? The supported stable/LTS kernels
> are 5.4, 5.10, 5.15, and 6.1.

Why not fix the bug?  It's a real bug that a real customer really hit.
I've fixed it in all SLE kernels - we don't depend on LTS though we do
make use of the stable trees in various ways.

But it's your call.

Thanks,
NeilBrown


> 
> 
> > The patch should apply unchanged to stable kernels 6.2 and later.
> 
> I can add a Cc: stable #v6.2+
> 
> 
> --
> Chuck Lever
> 
> 
>
Chuck Lever Jan. 22, 2024, 11:44 p.m. UTC | #7
> On Jan 22, 2024, at 6:31 PM, NeilBrown <neilb@suse.de> wrote:
> 
> On Tue, 23 Jan 2024, Chuck Lever III wrote:
>> 
>> 
>>> On Jan 22, 2024, at 4:57 PM, NeilBrown <neilb@suse.de> wrote:
>>> 
>>> On Tue, 23 Jan 2024, Chuck Lever wrote:
>>>> On Mon, Jan 22, 2024 at 02:58:16PM +1100, NeilBrown wrote:
>>>>> 
>>>>> The test on so_count in nfsd4_release_lockowner() is nonsense and
>>>>> harmful.  Revert to using check_for_locks(), changing that to not sleep.
>>>>> 
>>>>> First: harmful.
>>>>> As is documented in the kdoc comment for nfsd4_release_lockowner(), the
>>>>> test on so_count can transiently return a false positive resulting in a
>>>>> return of NFS4ERR_LOCKS_HELD when in fact no locks are held.  This is
>>>>> clearly a protocol violation and with the Linux NFS client it can cause
>>>>> incorrect behaviour.
>>>>> 
>>>>> If NFS4_RELEASE_LOCKOWNER is sent while some other thread is still
>>>>> processing a LOCK request which failed because, at the time that request
>>>>> was received, the given owner held a conflicting lock, then the nfsd
>>>>> thread processing that LOCK request can hold a reference (conflock) to
>>>>> the lock owner that causes nfsd4_release_lockowner() to return an
>>>>> incorrect error.
>>>>> 
>>>>> The Linux NFS client ignores that NFS4ERR_LOCKS_HELD error because it
>>>>> never sends NFS4_RELEASE_LOCKOWNER without first releasing any locks, so
>>>>> it knows that the error is impossible.  It assumes the lock owner was in
>>>>> fact released so it feels free to use the same lock owner identifier in
>>>>> some later locking request.
>>>>> 
>>>>> When it does reuse a lock owner identifier for which a previous RELEASE
>>>>> failed, it will naturally use a lock_seqid of zero.  However the server,
>>>>> which didn't release the lock owner, will expect a larger lock_seqid and
>>>>> so will respond with NFS4ERR_BAD_SEQID.
>>>>> 
>>>>> So clearly it is harmful to allow a false positive, which testing
>>>>> so_count allows.
>>>>> 
>>>>> The test is nonsense because ... well... it doesn't mean anything.
>>>>> 
>>>>> so_count is the sum of three different counts.
>>>>> 1/ the set of states listed on so_stateids
>>>>> 2/ the set of active vfs locks owned by any of those states
>>>>> 3/ various transient counts such as for conflicting locks.
>>>>> 
>>>>> When it is tested against '2' it is clear that one of these is the
>>>>> transient reference obtained by find_lockowner_str_locked().  It is not
>>>>> clear what the other one is expected to be.
>>>>> 
>>>>> In practice, the count is often 2 because there is precisely one state
>>>>> on so_stateids.  If there were more, this would fail.
>>>>> 
>>>>> It my testing I see two circumstances when RELEASE_LOCKOWNER is called.
>>>>> In one case, CLOSE is called before RELEASE_LOCKOWNER.  That results in
>>>>> all the lock states being removed, and so the lockowner being discarded
>>>>> (it is removed when there are no more references which usually happens
>>>>> when the lock state is discarded).  When nfsd4_release_lockowner() finds
>>>>> that the lock owner doesn't exist, it returns success.
>>>>> 
>>>>> The other case shows an so_count of '2' and precisely one state listed
>>>>> in so_stateid.  It appears that the Linux client uses a separate lock
>>>>> owner for each file resulting in one lock state per lock owner, so this
>>>>> test on '2' is safe.  For another client it might not be safe.
>>>>> 
>>>>> So this patch changes check_for_locks() to use the (newish)
>>>>> find_any_file_locked() so that it doesn't take a reference on the
>>>>> nfs4_file and so never calls nfsd_file_put(), and so never sleeps.
>>>> 
>>>> More to the point, find_any_file_locked() was added by commit
>>>> e0aa651068bf ("nfsd: don't call nfsd_file_put from client states
>>>> seqfile display"), which was merged several months /after/ commit
>>>> ce3c4ad7f4ce ("NFSD: Fix possible sleep during
>>>> nfsd4_release_lockowner()").
>>> 
>>> Yes.  To flesh out the history:
>>> nfsd_file_put() was added in v5.4.  In earlier kernels check_for_locks()
>>> would never sleep.  However the problem patch was backported 4.9, 4.14,
>>> and 4.19 and should be reverted.
>> 
>> I don't see "NFSD: Fix possible sleep during nfsd4_release_lockowner()"
>> in any of those kernels. All but 4.19 are now EOL.
> 
> I hadn't checked which were EOL.  Thanks for finding the 4.19 patch and
> requesting a revert.
> 
>> 
>> 
>>> find_any_file_locked() was added in v6.2 so when this patch is
>>> backported to 5.4, 5.10, 5.15, 5.17 - 6.1 it needs to include
>>> find_and_file_locked()
>> 
>> I think I'd rather leave those unperturbed until someone hits a real
>> problem. Unless you have a distribution kernel that needs to see
>> this fix in one of the LTS kernels? The supported stable/LTS kernels
>> are 5.4, 5.10, 5.15, and 6.1.
> 
> Why not fix the bug?  It's a real bug that a real customer really hit.

That's what I'm asking. Was there a real customer issue? Because
ce3c4ad7f4ce was the result of a might_sleep splat, not an issue
in the field.

Since this hit a real user, is there a BugLink: or Closes: tag
I can add?


> I've fixed it in all SLE kernels - we don't depend on LTS though we do
> make use of the stable trees in various ways.
> 
> But it's your call.

Well, it's not my call whether it's backported. Anyone can ask.
It /is/ my call if I do the work.

Mostly the issue is having the time to manage 5-6 stable
kernels as well as upstream development. I'd like to ensure
that any patches for which we manually request a backport have
been applied and tested by someone with a real NFS rig, and
there is a real problem getting addressed.

It's getting better with kdevops... I can cherry pick any
recent kernel and run it through a number of tests without
needing to hand-hold. Haven't tried that with the older stable
kernels, though; those require older compilers and what-not.


--
Chuck Lever
NeilBrown Jan. 23, 2024, 12:18 a.m. UTC | #8
On Tue, 23 Jan 2024, Chuck Lever III wrote:
> 
> 
> > On Jan 22, 2024, at 6:31 PM, NeilBrown <neilb@suse.de> wrote:
> > 
> > On Tue, 23 Jan 2024, Chuck Lever III wrote:
> >> 
> >> 
> >>> On Jan 22, 2024, at 4:57 PM, NeilBrown <neilb@suse.de> wrote:
> >>> 
> >>> On Tue, 23 Jan 2024, Chuck Lever wrote:
> >>>> On Mon, Jan 22, 2024 at 02:58:16PM +1100, NeilBrown wrote:
> >>>>> 
> >>>>> The test on so_count in nfsd4_release_lockowner() is nonsense and
> >>>>> harmful.  Revert to using check_for_locks(), changing that to not sleep.
> >>>>> 
> >>>>> First: harmful.
> >>>>> As is documented in the kdoc comment for nfsd4_release_lockowner(), the
> >>>>> test on so_count can transiently return a false positive resulting in a
> >>>>> return of NFS4ERR_LOCKS_HELD when in fact no locks are held.  This is
> >>>>> clearly a protocol violation and with the Linux NFS client it can cause
> >>>>> incorrect behaviour.
> >>>>> 
> >>>>> If NFS4_RELEASE_LOCKOWNER is sent while some other thread is still
> >>>>> processing a LOCK request which failed because, at the time that request
> >>>>> was received, the given owner held a conflicting lock, then the nfsd
> >>>>> thread processing that LOCK request can hold a reference (conflock) to
> >>>>> the lock owner that causes nfsd4_release_lockowner() to return an
> >>>>> incorrect error.
> >>>>> 
> >>>>> The Linux NFS client ignores that NFS4ERR_LOCKS_HELD error because it
> >>>>> never sends NFS4_RELEASE_LOCKOWNER without first releasing any locks, so
> >>>>> it knows that the error is impossible.  It assumes the lock owner was in
> >>>>> fact released so it feels free to use the same lock owner identifier in
> >>>>> some later locking request.
> >>>>> 
> >>>>> When it does reuse a lock owner identifier for which a previous RELEASE
> >>>>> failed, it will naturally use a lock_seqid of zero.  However the server,
> >>>>> which didn't release the lock owner, will expect a larger lock_seqid and
> >>>>> so will respond with NFS4ERR_BAD_SEQID.
> >>>>> 
> >>>>> So clearly it is harmful to allow a false positive, which testing
> >>>>> so_count allows.
> >>>>> 
> >>>>> The test is nonsense because ... well... it doesn't mean anything.
> >>>>> 
> >>>>> so_count is the sum of three different counts.
> >>>>> 1/ the set of states listed on so_stateids
> >>>>> 2/ the set of active vfs locks owned by any of those states
> >>>>> 3/ various transient counts such as for conflicting locks.
> >>>>> 
> >>>>> When it is tested against '2' it is clear that one of these is the
> >>>>> transient reference obtained by find_lockowner_str_locked().  It is not
> >>>>> clear what the other one is expected to be.
> >>>>> 
> >>>>> In practice, the count is often 2 because there is precisely one state
> >>>>> on so_stateids.  If there were more, this would fail.
> >>>>> 
> >>>>> It my testing I see two circumstances when RELEASE_LOCKOWNER is called.
> >>>>> In one case, CLOSE is called before RELEASE_LOCKOWNER.  That results in
> >>>>> all the lock states being removed, and so the lockowner being discarded
> >>>>> (it is removed when there are no more references which usually happens
> >>>>> when the lock state is discarded).  When nfsd4_release_lockowner() finds
> >>>>> that the lock owner doesn't exist, it returns success.
> >>>>> 
> >>>>> The other case shows an so_count of '2' and precisely one state listed
> >>>>> in so_stateid.  It appears that the Linux client uses a separate lock
> >>>>> owner for each file resulting in one lock state per lock owner, so this
> >>>>> test on '2' is safe.  For another client it might not be safe.
> >>>>> 
> >>>>> So this patch changes check_for_locks() to use the (newish)
> >>>>> find_any_file_locked() so that it doesn't take a reference on the
> >>>>> nfs4_file and so never calls nfsd_file_put(), and so never sleeps.
> >>>> 
> >>>> More to the point, find_any_file_locked() was added by commit
> >>>> e0aa651068bf ("nfsd: don't call nfsd_file_put from client states
> >>>> seqfile display"), which was merged several months /after/ commit
> >>>> ce3c4ad7f4ce ("NFSD: Fix possible sleep during
> >>>> nfsd4_release_lockowner()").
> >>> 
> >>> Yes.  To flesh out the history:
> >>> nfsd_file_put() was added in v5.4.  In earlier kernels check_for_locks()
> >>> would never sleep.  However the problem patch was backported 4.9, 4.14,
> >>> and 4.19 and should be reverted.
> >> 
> >> I don't see "NFSD: Fix possible sleep during nfsd4_release_lockowner()"
> >> in any of those kernels. All but 4.19 are now EOL.
> > 
> > I hadn't checked which were EOL.  Thanks for finding the 4.19 patch and
> > requesting a revert.
> > 
> >> 
> >> 
> >>> find_any_file_locked() was added in v6.2 so when this patch is
> >>> backported to 5.4, 5.10, 5.15, 5.17 - 6.1 it needs to include
> >>> find_and_file_locked()
> >> 
> >> I think I'd rather leave those unperturbed until someone hits a real
> >> problem. Unless you have a distribution kernel that needs to see
> >> this fix in one of the LTS kernels? The supported stable/LTS kernels
> >> are 5.4, 5.10, 5.15, and 6.1.
> > 
> > Why not fix the bug?  It's a real bug that a real customer really hit.
> 
> That's what I'm asking. Was there a real customer issue? Because
> ce3c4ad7f4ce was the result of a might_sleep splat, not an issue
> in the field.
> 
> Since this hit a real user, is there a BugLink: or Closes: tag
> I can add?

Unfortunately not.  Our bugs for paying customers are private - so they
can feel comfortable providing all the details we need.
The link is https://bugzilla.suse.com/show_bug.cgi?id=1218968
that won't help anyone outside SUSE.

But definitely a real bug.

> 
> 
> > I've fixed it in all SLE kernels - we don't depend on LTS though we do
> > make use of the stable trees in various ways.
> > 
> > But it's your call.
> 
> Well, it's not my call whether it's backported. Anyone can ask.
> It /is/ my call if I do the work.

That's fair.  I'm happy to create the manual backport patch.  I'm not
going to test those kernels.  I generally assume that someone is testing
stable kernels - so obvious problems will be found by someone else, and
non-obvious problems I wouldn't find even if I did some testing :-(

Thanks,
NeilBrown


> 
> Mostly the issue is having the time to manage 5-6 stable
> kernels as well as upstream development. I'd like to ensure
> that any patches for which we manually request a backport have
> been applied and tested by someone with a real NFS rig, and
> there is a real problem getting addressed.
> 
> It's getting better with kdevops... I can cherry pick any
> recent kernel and run it through a number of tests without
> needing to hand-hold. Haven't tried that with the older stable
> kernels, though; those require older compilers and what-not.
> 
> 
> --
> Chuck Lever
> 
> 
>
Chuck Lever Jan. 23, 2024, 12:52 a.m. UTC | #9
> On Jan 22, 2024, at 7:18 PM, NeilBrown <neilb@suse.de> wrote:
> 
> On Tue, 23 Jan 2024, Chuck Lever III wrote:
>> 
>> 
>>> On Jan 22, 2024, at 6:31 PM, NeilBrown <neilb@suse.de> wrote:
>>> 
>>> On Tue, 23 Jan 2024, Chuck Lever III wrote:
>>>> 
>>>> 
>>>>> On Jan 22, 2024, at 4:57 PM, NeilBrown <neilb@suse.de> wrote:
>>>>> 
>>>>> On Tue, 23 Jan 2024, Chuck Lever wrote:
>>>>>> On Mon, Jan 22, 2024 at 02:58:16PM +1100, NeilBrown wrote:
>>>>>>> 
>>>>>>> The test on so_count in nfsd4_release_lockowner() is nonsense and
>>>>>>> harmful.  Revert to using check_for_locks(), changing that to not sleep.
>>>>>>> 
>>>>>>> First: harmful.
>>>>>>> As is documented in the kdoc comment for nfsd4_release_lockowner(), the
>>>>>>> test on so_count can transiently return a false positive resulting in a
>>>>>>> return of NFS4ERR_LOCKS_HELD when in fact no locks are held.  This is
>>>>>>> clearly a protocol violation and with the Linux NFS client it can cause
>>>>>>> incorrect behaviour.
>>>>>>> 
>>>>>>> If NFS4_RELEASE_LOCKOWNER is sent while some other thread is still
>>>>>>> processing a LOCK request which failed because, at the time that request
>>>>>>> was received, the given owner held a conflicting lock, then the nfsd
>>>>>>> thread processing that LOCK request can hold a reference (conflock) to
>>>>>>> the lock owner that causes nfsd4_release_lockowner() to return an
>>>>>>> incorrect error.
>>>>>>> 
>>>>>>> The Linux NFS client ignores that NFS4ERR_LOCKS_HELD error because it
>>>>>>> never sends NFS4_RELEASE_LOCKOWNER without first releasing any locks, so
>>>>>>> it knows that the error is impossible.  It assumes the lock owner was in
>>>>>>> fact released so it feels free to use the same lock owner identifier in
>>>>>>> some later locking request.
>>>>>>> 
>>>>>>> When it does reuse a lock owner identifier for which a previous RELEASE
>>>>>>> failed, it will naturally use a lock_seqid of zero.  However the server,
>>>>>>> which didn't release the lock owner, will expect a larger lock_seqid and
>>>>>>> so will respond with NFS4ERR_BAD_SEQID.
>>>>>>> 
>>>>>>> So clearly it is harmful to allow a false positive, which testing
>>>>>>> so_count allows.
>>>>>>> 
>>>>>>> The test is nonsense because ... well... it doesn't mean anything.
>>>>>>> 
>>>>>>> so_count is the sum of three different counts.
>>>>>>> 1/ the set of states listed on so_stateids
>>>>>>> 2/ the set of active vfs locks owned by any of those states
>>>>>>> 3/ various transient counts such as for conflicting locks.
>>>>>>> 
>>>>>>> When it is tested against '2' it is clear that one of these is the
>>>>>>> transient reference obtained by find_lockowner_str_locked().  It is not
>>>>>>> clear what the other one is expected to be.
>>>>>>> 
>>>>>>> In practice, the count is often 2 because there is precisely one state
>>>>>>> on so_stateids.  If there were more, this would fail.
>>>>>>> 
>>>>>>> It my testing I see two circumstances when RELEASE_LOCKOWNER is called.
>>>>>>> In one case, CLOSE is called before RELEASE_LOCKOWNER.  That results in
>>>>>>> all the lock states being removed, and so the lockowner being discarded
>>>>>>> (it is removed when there are no more references which usually happens
>>>>>>> when the lock state is discarded).  When nfsd4_release_lockowner() finds
>>>>>>> that the lock owner doesn't exist, it returns success.
>>>>>>> 
>>>>>>> The other case shows an so_count of '2' and precisely one state listed
>>>>>>> in so_stateid.  It appears that the Linux client uses a separate lock
>>>>>>> owner for each file resulting in one lock state per lock owner, so this
>>>>>>> test on '2' is safe.  For another client it might not be safe.
>>>>>>> 
>>>>>>> So this patch changes check_for_locks() to use the (newish)
>>>>>>> find_any_file_locked() so that it doesn't take a reference on the
>>>>>>> nfs4_file and so never calls nfsd_file_put(), and so never sleeps.
>>>>>> 
>>>>>> More to the point, find_any_file_locked() was added by commit
>>>>>> e0aa651068bf ("nfsd: don't call nfsd_file_put from client states
>>>>>> seqfile display"), which was merged several months /after/ commit
>>>>>> ce3c4ad7f4ce ("NFSD: Fix possible sleep during
>>>>>> nfsd4_release_lockowner()").
>>>>> 
>>>>> Yes.  To flesh out the history:
>>>>> nfsd_file_put() was added in v5.4.  In earlier kernels check_for_locks()
>>>>> would never sleep.  However the problem patch was backported 4.9, 4.14,
>>>>> and 4.19 and should be reverted.
>>>> 
>>>> I don't see "NFSD: Fix possible sleep during nfsd4_release_lockowner()"
>>>> in any of those kernels. All but 4.19 are now EOL.
>>> 
>>> I hadn't checked which were EOL.  Thanks for finding the 4.19 patch and
>>> requesting a revert.
>>> 
>>>> 
>>>> 
>>>>> find_any_file_locked() was added in v6.2 so when this patch is
>>>>> backported to 5.4, 5.10, 5.15, 5.17 - 6.1 it needs to include
>>>>> find_and_file_locked()
>>>> 
>>>> I think I'd rather leave those unperturbed until someone hits a real
>>>> problem. Unless you have a distribution kernel that needs to see
>>>> this fix in one of the LTS kernels? The supported stable/LTS kernels
>>>> are 5.4, 5.10, 5.15, and 6.1.
>>> 
>>> Why not fix the bug?  It's a real bug that a real customer really hit.
>> 
>> That's what I'm asking. Was there a real customer issue? Because
>> ce3c4ad7f4ce was the result of a might_sleep splat, not an issue
>> in the field.
>> 
>> Since this hit a real user, is there a BugLink: or Closes: tag
>> I can add?
> 
> Unfortunately not. Our bugs for paying customers are private - so they
> can feel comfortable providing all the details we need.
> The link is https://bugzilla.suse.com/show_bug.cgi?id=1218968
> that won't help anyone outside SUSE.

Well, that link does show that there's a real bug report behind
the fix. That part wasn't clear from your lengthy patch
description, which I otherwise appreciated.


> But definitely a real bug.

Understood. That makes it a priority to backport.


>>> I've fixed it in all SLE kernels - we don't depend on LTS though we do
>>> make use of the stable trees in various ways.
>>> 
>>> But it's your call.
>> 
>> Well, it's not my call whether it's backported. Anyone can ask.
>> It /is/ my call if I do the work.
> 
> That's fair.  I'm happy to create the manual backport patch.  I'm not
> going to test those kernels.

Testing is the resource-bound part, fwiw.


> I generally assume that someone is testing
> stable kernels - so obvious problems will be found by someone else, and
> non-obvious problems I wouldn't find even if I did some testing :-(

Stable kernels don't get any specific testing of subsystem code,
unless we do it ourselves. I will try to get to it soon.


> Thanks,
> NeilBrown
> 
> 
>> 
>> Mostly the issue is having the time to manage 5-6 stable
>> kernels as well as upstream development. I'd like to ensure
>> that any patches for which we manually request a backport have
>> been applied and tested by someone with a real NFS rig, and
>> there is a real problem getting addressed.
>> 
>> It's getting better with kdevops... I can cherry pick any
>> recent kernel and run it through a number of tests without
>> needing to hand-hold. Haven't tried that with the older stable
>> kernels, though; those require older compilers and what-not.
>> 
>> 
>> --
>> Chuck Lever


--
Chuck Lever

Patch

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 2fa54cfd4882..6dc6340e2852 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -7911,14 +7911,16 @@  check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner)
 {
 	struct file_lock *fl;
 	int status = false;
-	struct nfsd_file *nf = find_any_file(fp);
+	struct nfsd_file *nf;
 	struct inode *inode;
 	struct file_lock_context *flctx;
 
+	spin_lock(&fp->fi_lock);
+	nf = find_any_file_locked(fp);
 	if (!nf) {
 		/* Any valid lock stateid should have some sort of access */
 		WARN_ON_ONCE(1);
-		return status;
+		goto out;
 	}
 
 	inode = file_inode(nf->nf_file);
@@ -7934,7 +7936,8 @@  check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner)
 		}
 		spin_unlock(&flctx->flc_lock);
 	}
-	nfsd_file_put(nf);
+out:
+	spin_unlock(&fp->fi_lock);
 	return status;
 }
 
@@ -7944,10 +7947,8 @@  check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner)
  * @cstate: NFSv4 COMPOUND state
  * @u: RELEASE_LOCKOWNER arguments
  *
- * The lockowner's so_count is bumped when a lock record is added
- * or when copying a conflicting lock. The latter case is brief,
- * but can lead to fleeting false positives when looking for
- * locks-in-use.
+ * Check if there are any locks still held and if not - free the lockowner
+ * and any lock state that is owned.
  *
  * Return values:
  *   %nfs_ok: lockowner released or not found
@@ -7983,10 +7984,13 @@  nfsd4_release_lockowner(struct svc_rqst *rqstp,
 		spin_unlock(&clp->cl_lock);
 		return nfs_ok;
 	}
-	if (atomic_read(&lo->lo_owner.so_count) != 2) {
-		spin_unlock(&clp->cl_lock);
-		nfs4_put_stateowner(&lo->lo_owner);
-		return nfserr_locks_held;
+
+	list_for_each_entry(stp, &lo->lo_owner.so_stateids, st_perstateowner) {
+		if (check_for_locks(stp->st_stid.sc_file, lo)) {
+			spin_unlock(&clp->cl_lock);
+			nfs4_put_stateowner(&lo->lo_owner);
+			return nfserr_locks_held;
+		}
 	}
 	unhash_lockowner_locked(lo);
 	while (!list_empty(&lo->lo_owner.so_stateids)) {