diff mbox series

nfsd: Fix race to FREE_STATEID and cl_revoked

Message ID c0fe2b35900938943048340ef70fc12282fe1af8.1691160604.git.bcodding@redhat.com (mailing list archive)
State New, archived
Headers show
Series nfsd: Fix race to FREE_STATEID and cl_revoked | expand

Commit Message

Benjamin Coddington Aug. 4, 2023, 2:52 p.m. UTC
We have some reports of linux NFS clients that cannot satisfy a linux knfsd
server that always sets SEQ4_STATUS_RECALLABLE_STATE_REVOKED even though
those clients repeatedly walk all their known state using TEST_STATEID and
receive NFS4_OK for all.

Its possible for revoke_delegation() to set NFS4_REVOKED_DELEG_STID, then
nfsd4_free_stateid() finds the delegation and returns NFS4_OK to
FREE_STATEID.  Afterward, revoke_delegation() moves the same delegation to
cl_revoked.  This would produce the observed client/server effect.

Fix this by ensuring that the setting of sc_type to NFS4_REVOKED_DELEG_STID
and move to cl_revoked happens within the same cl_lock.  This will allow
nfsd4_free_stateid() to properly remove the delegation from cl_revoked.

Link: https://bugzilla.redhat.com/show_bug.cgi?id=2217103
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2176575
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
---
 fs/nfsd/nfs4state.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Chuck Lever Aug. 4, 2023, 3:02 p.m. UTC | #1
On Fri, Aug 04, 2023 at 10:52:20AM -0400, Benjamin Coddington wrote:
> We have some reports of linux NFS clients that cannot satisfy a linux knfsd
> server that always sets SEQ4_STATUS_RECALLABLE_STATE_REVOKED even though
> those clients repeatedly walk all their known state using TEST_STATEID and
> receive NFS4_OK for all.
> 
> Its possible for revoke_delegation() to set NFS4_REVOKED_DELEG_STID, then
> nfsd4_free_stateid() finds the delegation and returns NFS4_OK to
> FREE_STATEID.  Afterward, revoke_delegation() moves the same delegation to
> cl_revoked.  This would produce the observed client/server effect.
> 
> Fix this by ensuring that the setting of sc_type to NFS4_REVOKED_DELEG_STID
> and move to cl_revoked happens within the same cl_lock.  This will allow
> nfsd4_free_stateid() to properly remove the delegation from cl_revoked.
> 
> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2217103
> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2176575
> Signed-off-by: Benjamin Coddington <bcodding@redhat.com>

Hi Ben, does this fix deserve:

Cc: stable@vger.kernel.org # v4.17+

??

> ---
>  fs/nfsd/nfs4state.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 3aefbad4cc09..daf305daa751 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -1354,9 +1354,9 @@ static void revoke_delegation(struct nfs4_delegation *dp)
>  	trace_nfsd_stid_revoke(&dp->dl_stid);
>  
>  	if (clp->cl_minorversion) {
> +		spin_lock(&clp->cl_lock);
>  		dp->dl_stid.sc_type = NFS4_REVOKED_DELEG_STID;
>  		refcount_inc(&dp->dl_stid.sc_count);
> -		spin_lock(&clp->cl_lock);
>  		list_add(&dp->dl_recall_lru, &clp->cl_revoked);
>  		spin_unlock(&clp->cl_lock);
>  	}
> -- 
> 2.40.1
>
Benjamin Coddington Aug. 4, 2023, 3:16 p.m. UTC | #2
On 4 Aug 2023, at 11:02, Chuck Lever wrote:

> On Fri, Aug 04, 2023 at 10:52:20AM -0400, Benjamin Coddington wrote:
>> We have some reports of linux NFS clients that cannot satisfy a linux knfsd
>> server that always sets SEQ4_STATUS_RECALLABLE_STATE_REVOKED even though
>> those clients repeatedly walk all their known state using TEST_STATEID and
>> receive NFS4_OK for all.
>>
>> Its possible for revoke_delegation() to set NFS4_REVOKED_DELEG_STID, then
>> nfsd4_free_stateid() finds the delegation and returns NFS4_OK to
>> FREE_STATEID.  Afterward, revoke_delegation() moves the same delegation to
>> cl_revoked.  This would produce the observed client/server effect.
>>
>> Fix this by ensuring that the setting of sc_type to NFS4_REVOKED_DELEG_STID
>> and move to cl_revoked happens within the same cl_lock.  This will allow
>> nfsd4_free_stateid() to properly remove the delegation from cl_revoked.
>>
>> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2217103
>> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2176575
>> Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
>
> Hi Ben, does this fix deserve:
>
> Cc: stable@vger.kernel.org # v4.17+

Yes, that's probably appropriate.

Ben
Jeffrey Layton Aug. 4, 2023, 3:33 p.m. UTC | #3
On Fri, 2023-08-04 at 11:02 -0400, Chuck Lever wrote:
> On Fri, Aug 04, 2023 at 10:52:20AM -0400, Benjamin Coddington wrote:
> > We have some reports of linux NFS clients that cannot satisfy a linux knfsd
> > server that always sets SEQ4_STATUS_RECALLABLE_STATE_REVOKED even though
> > those clients repeatedly walk all their known state using TEST_STATEID and
> > receive NFS4_OK for all.
> > 
> > Its possible for revoke_delegation() to set NFS4_REVOKED_DELEG_STID, then
> > nfsd4_free_stateid() finds the delegation and returns NFS4_OK to
> > FREE_STATEID.  Afterward, revoke_delegation() moves the same delegation to
> > cl_revoked.  This would produce the observed client/server effect.
> > 
> > Fix this by ensuring that the setting of sc_type to NFS4_REVOKED_DELEG_STID
> > and move to cl_revoked happens within the same cl_lock.  This will allow
> > nfsd4_free_stateid() to properly remove the delegation from cl_revoked.
> > 
> > Link: https://bugzilla.redhat.com/show_bug.cgi?id=2217103
> > Link: https://bugzilla.redhat.com/show_bug.cgi?id=2176575
> > Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
> 
> Hi Ben, does this fix deserve:
> 
> Cc: stable@vger.kernel.org # v4.17+
> 
> ??
> 

What's special about v4.17? Is there a patch that broke it that went
into that release?

> > ---
> >  fs/nfsd/nfs4state.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > index 3aefbad4cc09..daf305daa751 100644
> > --- a/fs/nfsd/nfs4state.c
> > +++ b/fs/nfsd/nfs4state.c
> > @@ -1354,9 +1354,9 @@ static void revoke_delegation(struct nfs4_delegation *dp)
> >  	trace_nfsd_stid_revoke(&dp->dl_stid);
> >  
> >  	if (clp->cl_minorversion) {
> > +		spin_lock(&clp->cl_lock);
> >  		dp->dl_stid.sc_type = NFS4_REVOKED_DELEG_STID;
> >  		refcount_inc(&dp->dl_stid.sc_count);
> > -		spin_lock(&clp->cl_lock);
> >  		list_add(&dp->dl_recall_lru, &clp->cl_revoked);
> >  		spin_unlock(&clp->cl_lock);
> >  	}
> > -- 
> > 2.40.1
> > 
> 

The fix looks correct though. You can also add:

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Benjamin Coddington Aug. 4, 2023, 3:35 p.m. UTC | #4
On 4 Aug 2023, at 11:33, Jeff Layton wrote:

> On Fri, 2023-08-04 at 11:02 -0400, Chuck Lever wrote:
>> On Fri, Aug 04, 2023 at 10:52:20AM -0400, Benjamin Coddington wrote:
>>> We have some reports of linux NFS clients that cannot satisfy a linux knfsd
>>> server that always sets SEQ4_STATUS_RECALLABLE_STATE_REVOKED even though
>>> those clients repeatedly walk all their known state using TEST_STATEID and
>>> receive NFS4_OK for all.
>>>
>>> Its possible for revoke_delegation() to set NFS4_REVOKED_DELEG_STID, then
>>> nfsd4_free_stateid() finds the delegation and returns NFS4_OK to
>>> FREE_STATEID.  Afterward, revoke_delegation() moves the same delegation to
>>> cl_revoked.  This would produce the observed client/server effect.
>>>
>>> Fix this by ensuring that the setting of sc_type to NFS4_REVOKED_DELEG_STID
>>> and move to cl_revoked happens within the same cl_lock.  This will allow
>>> nfsd4_free_stateid() to properly remove the delegation from cl_revoked.
>>>
>>> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2217103
>>> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2176575
>>> Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
>>
>> Hi Ben, does this fix deserve:
>>
>> Cc: stable@vger.kernel.org # v4.17+
>>
>> ??
>>
>
> What's special about v4.17? Is there a patch that broke it that went
> into that release?

Before 0af6e690f0d4e the patch won't apply.

Ben
Chuck Lever Aug. 4, 2023, 3:41 p.m. UTC | #5
On Fri, Aug 04, 2023 at 10:52:20AM -0400, Benjamin Coddington wrote:
> We have some reports of linux NFS clients that cannot satisfy a linux knfsd
> server that always sets SEQ4_STATUS_RECALLABLE_STATE_REVOKED even though
> those clients repeatedly walk all their known state using TEST_STATEID and
> receive NFS4_OK for all.
> 
> Its possible for revoke_delegation() to set NFS4_REVOKED_DELEG_STID, then
> nfsd4_free_stateid() finds the delegation and returns NFS4_OK to
> FREE_STATEID.  Afterward, revoke_delegation() moves the same delegation to
> cl_revoked.  This would produce the observed client/server effect.
> 
> Fix this by ensuring that the setting of sc_type to NFS4_REVOKED_DELEG_STID
> and move to cl_revoked happens within the same cl_lock.  This will allow
> nfsd4_free_stateid() to properly remove the delegation from cl_revoked.
> 
> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2217103
> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2176575
> Signed-off-by: Benjamin Coddington <bcodding@redhat.com>

Applied to nfsd-fixes (for v6.5-rc).


> ---
>  fs/nfsd/nfs4state.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 3aefbad4cc09..daf305daa751 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -1354,9 +1354,9 @@ static void revoke_delegation(struct nfs4_delegation *dp)
>  	trace_nfsd_stid_revoke(&dp->dl_stid);
>  
>  	if (clp->cl_minorversion) {
> +		spin_lock(&clp->cl_lock);
>  		dp->dl_stid.sc_type = NFS4_REVOKED_DELEG_STID;
>  		refcount_inc(&dp->dl_stid.sc_count);
> -		spin_lock(&clp->cl_lock);
>  		list_add(&dp->dl_recall_lru, &clp->cl_revoked);
>  		spin_unlock(&clp->cl_lock);
>  	}
> -- 
> 2.40.1
>
diff mbox series

Patch

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 3aefbad4cc09..daf305daa751 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1354,9 +1354,9 @@  static void revoke_delegation(struct nfs4_delegation *dp)
 	trace_nfsd_stid_revoke(&dp->dl_stid);
 
 	if (clp->cl_minorversion) {
+		spin_lock(&clp->cl_lock);
 		dp->dl_stid.sc_type = NFS4_REVOKED_DELEG_STID;
 		refcount_inc(&dp->dl_stid.sc_count);
-		spin_lock(&clp->cl_lock);
 		list_add(&dp->dl_recall_lru, &clp->cl_revoked);
 		spin_unlock(&clp->cl_lock);
 	}