
[v2,3/4] NFSv4: clp->cl_cons_state < 0 signifies an invalid nfs_client

Message ID: 56bc4d7e614a6d9d0aa520c71bd0ffb102e3ef08.1742919341.git.trond.myklebust@hammerspace.com
State: New
Series: Ensure that ENETUNREACH terminates state recovery

Commit Message

Trond Myklebust March 25, 2025, 4:17 p.m. UTC
From: Trond Myklebust <trond.myklebust@hammerspace.com>

If someone calls nfs_mark_client_ready(clp, status) with a negative
value for status, then that should signal that the nfs_client is no
longer valid.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/nfs4state.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
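
For context, nfs_mark_client_ready() in fs/nfs/client.c records the passed-in
status in clp->cl_cons_state and wakes any waiters, so a negative value
persists as a marker that client initialisation failed. Roughly (paraphrased
from memory, identifiers approximate; not a verbatim copy of the tree):

	void nfs_mark_client_ready(struct nfs_client *clp, int state)
	{
		smp_wmb();			/* order earlier init writes before the state change */
		clp->cl_cons_state = state;	/* state < 0 marks the client invalid */
		wake_up_all(&nfs_client_active_wq);
	}

Code that later tests clp->cl_cons_state < 0, as the hunks below do, can then
decline to schedule new state-management work for a dead client.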

Comments

Jeff Layton March 25, 2025, 5:59 p.m. UTC | #1
On Tue, 2025-03-25 at 12:17 -0400, trondmy@kernel.org wrote:
> From: Trond Myklebust <trond.myklebust@hammerspace.com>
> 
> If someone calls nfs_mark_client_ready(clp, status) with a negative
> value for status, then that should signal that the nfs_client is no
> longer valid.
> 
> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> ---
>  fs/nfs/nfs4state.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> index 542cdf71229f..738eb2789266 100644
> --- a/fs/nfs/nfs4state.c
> +++ b/fs/nfs/nfs4state.c
> @@ -1198,7 +1198,7 @@ void nfs4_schedule_state_manager(struct nfs_client *clp)
>  	struct rpc_clnt *clnt = clp->cl_rpcclient;
>  	bool swapon = false;
>  
> -	if (clnt->cl_shutdown)
> +	if (clnt->cl_shutdown || clp->cl_cons_state < 0)

Would it be simpler to just set cl_shutdown when this occurs instead of
having to check cl_cons_state as well?

>  		return;
>  
>  	set_bit(NFS4CLNT_RUN_MANAGER, &clp->cl_state);
> @@ -1403,7 +1403,7 @@ int nfs4_schedule_stateid_recovery(const struct nfs_server *server, struct nfs4_
>  	dprintk("%s: scheduling stateid recovery for server %s\n", __func__,
>  			clp->cl_hostname);
>  	nfs4_schedule_state_manager(clp);
> -	return 0;
> +	return clp->cl_cons_state < 0 ? clp->cl_cons_state : 0;
>  }
>  EXPORT_SYMBOL_GPL(nfs4_schedule_stateid_recovery);
>
Trond Myklebust March 25, 2025, 6:48 p.m. UTC | #2
On Tue, 2025-03-25 at 13:59 -0400, Jeff Layton wrote:
> On Tue, 2025-03-25 at 12:17 -0400, trondmy@kernel.org wrote:
> > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > 
> > If someone calls nfs_mark_client_ready(clp, status) with a negative
> > value for status, then that should signal that the nfs_client is no
> > longer valid.
> > 
> > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> > ---
> >  fs/nfs/nfs4state.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> > index 542cdf71229f..738eb2789266 100644
> > --- a/fs/nfs/nfs4state.c
> > +++ b/fs/nfs/nfs4state.c
> > @@ -1198,7 +1198,7 @@ void nfs4_schedule_state_manager(struct
> > nfs_client *clp)
> >  	struct rpc_clnt *clnt = clp->cl_rpcclient;
> >  	bool swapon = false;
> >  
> > -	if (clnt->cl_shutdown)
> > +	if (clnt->cl_shutdown || clp->cl_cons_state < 0)
> 
> Would it be simpler to just set cl_shutdown when this occurs instead
> of
> having to check cl_cons_state as well?

Do we need the check for clnt->cl_shutdown at all here? I'd expect any
caller of this function to already hold a reference to the client,
which means that the RPC client should still be up.

I'm a little suspicious of the check in nfs41_sequence_call_done() too.

> 
> >  		return;
> >  
> >  	set_bit(NFS4CLNT_RUN_MANAGER, &clp->cl_state);
> > @@ -1403,7 +1403,7 @@ int nfs4_schedule_stateid_recovery(const
> > struct nfs_server *server, struct nfs4_
> >  	dprintk("%s: scheduling stateid recovery for server %s\n",
> > __func__,
> >  			clp->cl_hostname);
> >  	nfs4_schedule_state_manager(clp);
> > -	return 0;
> > +	return clp->cl_cons_state < 0 ? clp->cl_cons_state : 0;
> >  }
> >  EXPORT_SYMBOL_GPL(nfs4_schedule_stateid_recovery);
> >  
>
Jeff Layton March 25, 2025, 7:44 p.m. UTC | #3
On Tue, 2025-03-25 at 18:48 +0000, Trond Myklebust wrote:
> On Tue, 2025-03-25 at 13:59 -0400, Jeff Layton wrote:
> > On Tue, 2025-03-25 at 12:17 -0400, trondmy@kernel.org wrote:
> > > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > 
> > > If someone calls nfs_mark_client_ready(clp, status) with a negative
> > > value for status, then that should signal that the nfs_client is no
> > > longer valid.
> > > 
> > > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > ---
> > >  fs/nfs/nfs4state.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> > > index 542cdf71229f..738eb2789266 100644
> > > --- a/fs/nfs/nfs4state.c
> > > +++ b/fs/nfs/nfs4state.c
> > > @@ -1198,7 +1198,7 @@ void nfs4_schedule_state_manager(struct
> > > nfs_client *clp)
> > >  	struct rpc_clnt *clnt = clp->cl_rpcclient;
> > >  	bool swapon = false;
> > >  
> > > -	if (clnt->cl_shutdown)
> > > +	if (clnt->cl_shutdown || clp->cl_cons_state < 0)
> > 
> > Would it be simpler to just set cl_shutdown when this occurs instead
> > of
> > having to check cl_cons_state as well?
> 
> Do we need the check for clnt->cl_shutdown at all here? I'd expect any
> caller of this function to already hold a reference to the client,
> which means that the RPC client should still be up.

Not necessarily? Just because you hold a reference to the rpc_clnt
doesn't mean that it's still up, AFAIU.

For instance, if you end up using the "shutdown" file in sysfs, any RPC
still in flight will hold a reference to the client. Writing to
"shutdown" will set cl_shutdown to 1 and then cancel all the RPCs, but
there is at least a window of time where we have an elevated refcount
but the client is no longer valid.
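
For reference, the handler in question is shutdown_store() in fs/nfs/sysfs.c.
From memory, and leaving out the parsing and error handling, it does
approximately:

	/* per-nfs_server sysfs "shutdown" file */
	server->flags |= NFS_MOUNT_SHUTDOWN;
	server->client->cl_shutdown = 1;
	server->nfs_client->cl_rpcclient->cl_shutdown = 1;
	/* then cancels in-flight tasks with -EIO via rpc_cancel_tasks() */

so a task still in flight holds an elevated refcount on an rpc_clnt that has
already been flagged dead.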


> 
> I'm a little suspicious of the check in nfs41_sequence_call_done() too.
> 

Me too. I think this is probably an indicator that we need to carefully
audit how cl_shutdown is used and clarify what it means. Luckily there
are only a handful of places that reference it:

The call_start check is fine I think, though maybe
we should add cl_shutdown checks in later states? The other places that
check it come from this commit:

    6ad477a69ad8 NFSv4: Clean up some shutdown loops

Should we convert both of those checks to look at clp->cl_cons_state
instead?
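
(Concretely, the conversion I have in mind would be something like,
hypothetically:

	-	if (clnt->cl_shutdown)
	+	if (clp->cl_cons_state < 0)

i.e. key those loops off the nfs_client's validity rather than the rpc_clnt's
shutdown flag.)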

> > 
> > >  		return;
> > >  
> > >  	set_bit(NFS4CLNT_RUN_MANAGER, &clp->cl_state);
> > > @@ -1403,7 +1403,7 @@ int nfs4_schedule_stateid_recovery(const
> > > struct nfs_server *server, struct nfs4_
> > >  	dprintk("%s: scheduling stateid recovery for server %s\n",
> > > __func__,
> > >  			clp->cl_hostname);
> > >  	nfs4_schedule_state_manager(clp);
> > > -	return 0;
> > > +	return clp->cl_cons_state < 0 ? clp->cl_cons_state : 0;
> > >  }
> > >  EXPORT_SYMBOL_GPL(nfs4_schedule_stateid_recovery);
> > >  
> > 
> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
> 
>
Trond Myklebust March 25, 2025, 8:30 p.m. UTC | #4
On Tue, 2025-03-25 at 15:44 -0400, Jeff Layton wrote:
> On Tue, 2025-03-25 at 18:48 +0000, Trond Myklebust wrote:
> > On Tue, 2025-03-25 at 13:59 -0400, Jeff Layton wrote:
> > > On Tue, 2025-03-25 at 12:17 -0400, trondmy@kernel.org wrote:
> > > > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > > 
> > > > If someone calls nfs_mark_client_ready(clp, status) with a
> > > > negative
> > > > value for status, then that should signal that the nfs_client
> > > > is no
> > > > longer valid.
> > > > 
> > > > Signed-off-by: Trond Myklebust
> > > > <trond.myklebust@hammerspace.com>
> > > > ---
> > > >  fs/nfs/nfs4state.c | 4 ++--
> > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> > > > index 542cdf71229f..738eb2789266 100644
> > > > --- a/fs/nfs/nfs4state.c
> > > > +++ b/fs/nfs/nfs4state.c
> > > > @@ -1198,7 +1198,7 @@ void nfs4_schedule_state_manager(struct
> > > > nfs_client *clp)
> > > >  	struct rpc_clnt *clnt = clp->cl_rpcclient;
> > > >  	bool swapon = false;
> > > >  
> > > > -	if (clnt->cl_shutdown)
> > > > +	if (clnt->cl_shutdown || clp->cl_cons_state < 0)
> > > 
> > > Would it be simpler to just set cl_shutdown when this occurs
> > > instead
> > > of
> > > having to check cl_cons_state as well?
> > 
> > Do we need the check for clnt->cl_shutdown at all here? I'd expect
> > any
> > caller of this function to already hold a reference to the client,
> > which means that the RPC client should still be up.
> 
> Not necessarily? Just because you hold a reference to the rpc_clnt
> doesn't mean that it's still up, AFAIU.
> 
> For instance, if you end up using the "shutdown" file in sysfs, any
> RPC
> still in flight will hold a reference to the client. Writing to
> "shutdown" will set cl_shutdown to 1 and then cancel all the RPCs,
> but
> there is at least a window of time where we have an elevated refcount
> but the client is no longer valid.

The shutdown of the nfs_client RPC client happens in nfs_free_client().

Oh wait... Crap... Why is a per-nfs_server function like
shutdown_store() reaching into the nfs_client? That's borked and needs
to be fixed.
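
(Presumably the fix is along these lines, sketched hypothetically: drop the
line where the per-server handler flags the shared transport, i.e. something
like

	-	server->nfs_client->cl_rpcclient->cl_shutdown = 1;

so that shutting down one nfs_server no longer poisons the nfs_client that
other mounts of the same server share.)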

> 
> 
> > 
> > I'm a little suspicious of the check in nfs41_sequence_call_done()
> > too.
> > 
> 
> Me too. I think this is probably an indicator that we need to
> carefully
> audit how cl_shutdown is used and clarify what it means. Luckily
> there
> are only a handful of places that reference it:
> 
> The call_start check is fine I think, though
> maybe
> we should add cl_shutdown checks in later states? The other places
> that
> check it come from this commit:
> 
>     6ad477a69ad8 NFSv4: Clean up some shutdown loops
> 
> Should we convert both of those checks to look at clp->cl_cons_state
> instead?

Yes.

> 
> > > 
> > > >  		return;
> > > >  
> > > >  	set_bit(NFS4CLNT_RUN_MANAGER, &clp->cl_state);
> > > > @@ -1403,7 +1403,7 @@ int nfs4_schedule_stateid_recovery(const
> > > > struct nfs_server *server, struct nfs4_
> > > >  	dprintk("%s: scheduling stateid recovery for server
> > > > %s\n",
> > > > __func__,
> > > >  			clp->cl_hostname);
> > > >  	nfs4_schedule_state_manager(clp);
> > > > -	return 0;
> > > > +	return clp->cl_cons_state < 0 ? clp->cl_cons_state :
> > > > 0;
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(nfs4_schedule_stateid_recovery);
> > > >  
> > > 
> > 
> > -- 
> > Trond Myklebust
> > Linux NFS client maintainer, Hammerspace
> > trond.myklebust@hammerspace.com
> > 
> > 
>

Patch

diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 542cdf71229f..738eb2789266 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1198,7 +1198,7 @@ void nfs4_schedule_state_manager(struct nfs_client *clp)
 	struct rpc_clnt *clnt = clp->cl_rpcclient;
 	bool swapon = false;
 
-	if (clnt->cl_shutdown)
+	if (clnt->cl_shutdown || clp->cl_cons_state < 0)
 		return;
 
 	set_bit(NFS4CLNT_RUN_MANAGER, &clp->cl_state);
@@ -1403,7 +1403,7 @@ int nfs4_schedule_stateid_recovery(const struct nfs_server *server, struct nfs4_
 	dprintk("%s: scheduling stateid recovery for server %s\n", __func__,
 			clp->cl_hostname);
 	nfs4_schedule_state_manager(clp);
-	return 0;
+	return clp->cl_cons_state < 0 ? clp->cl_cons_state : 0;
 }
 EXPORT_SYMBOL_GPL(nfs4_schedule_stateid_recovery);
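
With the second hunk, callers can detect a dead client at scheduling time
instead of waiting on recovery that will never run. A hypothetical caller-side
pattern (names illustrative, not taken from the patch):

	ret = nfs4_schedule_stateid_recovery(server, state);
	if (ret < 0)
		return ret;	/* nfs_client marked invalid; propagate the error */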