Message ID | 56bc4d7e614a6d9d0aa520c71bd0ffb102e3ef08.1742919341.git.trond.myklebust@hammerspace.com (mailing list archive) |
---|---|
State | New |
Series | Ensure that ENETUNREACH terminates state recovery |
On Tue, 2025-03-25 at 12:17 -0400, trondmy@kernel.org wrote:
> From: Trond Myklebust <trond.myklebust@hammerspace.com>
>
> If someone calls nfs_mark_client_ready(clp, status) with a negative
> value for status, then that should signal that the nfs_client is no
> longer valid.
>
> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> ---
>  fs/nfs/nfs4state.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> index 542cdf71229f..738eb2789266 100644
> --- a/fs/nfs/nfs4state.c
> +++ b/fs/nfs/nfs4state.c
> @@ -1198,7 +1198,7 @@ void nfs4_schedule_state_manager(struct nfs_client *clp)
>  	struct rpc_clnt *clnt = clp->cl_rpcclient;
>  	bool swapon = false;
>
> -	if (clnt->cl_shutdown)
> +	if (clnt->cl_shutdown || clp->cl_cons_state < 0)

Would it be simpler to just set cl_shutdown when this occurs instead of
having to check cl_cons_state as well?

>  		return;
>
>  	set_bit(NFS4CLNT_RUN_MANAGER, &clp->cl_state);
> @@ -1403,7 +1403,7 @@ int nfs4_schedule_stateid_recovery(const struct nfs_server *server, struct nfs4_
>  	dprintk("%s: scheduling stateid recovery for server %s\n", __func__,
>  			clp->cl_hostname);
>  	nfs4_schedule_state_manager(clp);
> -	return 0;
> +	return clp->cl_cons_state < 0 ? clp->cl_cons_state : 0;
>  }
>  EXPORT_SYMBOL_GPL(nfs4_schedule_stateid_recovery);
>
On Tue, 2025-03-25 at 13:59 -0400, Jeff Layton wrote:
> On Tue, 2025-03-25 at 12:17 -0400, trondmy@kernel.org wrote:
> > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> >
> > If someone calls nfs_mark_client_ready(clp, status) with a negative
> > value for status, then that should signal that the nfs_client is no
> > longer valid.
> >
> > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> > ---
> >  fs/nfs/nfs4state.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> > index 542cdf71229f..738eb2789266 100644
> > --- a/fs/nfs/nfs4state.c
> > +++ b/fs/nfs/nfs4state.c
> > @@ -1198,7 +1198,7 @@ void nfs4_schedule_state_manager(struct nfs_client *clp)
> >  	struct rpc_clnt *clnt = clp->cl_rpcclient;
> >  	bool swapon = false;
> >
> > -	if (clnt->cl_shutdown)
> > +	if (clnt->cl_shutdown || clp->cl_cons_state < 0)
>
> Would it be simpler to just set cl_shutdown when this occurs instead of
> having to check cl_cons_state as well?

Do we need the check for clnt->cl_shutdown at all here? I'd expect any
caller of this function to already hold a reference to the client,
which means that the RPC client should still be up.

I'm a little suspicious of the check in nfs41_sequence_call_done() too.

>
> >  		return;
> >
> >  	set_bit(NFS4CLNT_RUN_MANAGER, &clp->cl_state);
> > @@ -1403,7 +1403,7 @@ int nfs4_schedule_stateid_recovery(const struct nfs_server *server, struct nfs4_
> >  	dprintk("%s: scheduling stateid recovery for server %s\n", __func__,
> >  			clp->cl_hostname);
> >  	nfs4_schedule_state_manager(clp);
> > -	return 0;
> > +	return clp->cl_cons_state < 0 ? clp->cl_cons_state : 0;
> >  }
> >  EXPORT_SYMBOL_GPL(nfs4_schedule_stateid_recovery);
> >
>
On Tue, 2025-03-25 at 18:48 +0000, Trond Myklebust wrote:
> On Tue, 2025-03-25 at 13:59 -0400, Jeff Layton wrote:
> > On Tue, 2025-03-25 at 12:17 -0400, trondmy@kernel.org wrote:
> > > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > >
> > > If someone calls nfs_mark_client_ready(clp, status) with a negative
> > > value for status, then that should signal that the nfs_client is no
> > > longer valid.
> > >
> > > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > ---
> > >  fs/nfs/nfs4state.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> > > index 542cdf71229f..738eb2789266 100644
> > > --- a/fs/nfs/nfs4state.c
> > > +++ b/fs/nfs/nfs4state.c
> > > @@ -1198,7 +1198,7 @@ void nfs4_schedule_state_manager(struct nfs_client *clp)
> > >  	struct rpc_clnt *clnt = clp->cl_rpcclient;
> > >  	bool swapon = false;
> > >
> > > -	if (clnt->cl_shutdown)
> > > +	if (clnt->cl_shutdown || clp->cl_cons_state < 0)
> >
> > Would it be simpler to just set cl_shutdown when this occurs instead of
> > having to check cl_cons_state as well?
>
> Do we need the check for clnt->cl_shutdown at all here? I'd expect any
> caller of this function to already hold a reference to the client,
> which means that the RPC client should still be up.

Not necessarily? Just because you hold a reference to the rpc_clnt
doesn't mean that it's still up, AFAIU.

For instance, if you end up using the "shutdown" file in sysfs, any RPC
still in flight will hold a reference to the client. Writing to
"shutdown" will set cl_shutdown to 1 and then cancel all the RPCs, but
there is at least a window of time where we have an elevated refcount
but the client is no longer valid.

>
> I'm a little suspicious of the check in nfs41_sequence_call_done() too.
>

Me too. I think this is probably an indicator that we need to carefully
audit how cl_shutdown is used and clarify what it means. Luckily there
are only a handful of places that reference it:

The call_start check is fine I think, though maybe we should add
cl_shutdown checks in later states? The other places that check it come
from this commit:

    6ad477a69ad8 NFSv4: Clean up some shutdown loops

Should we convert both of those checks to look at clp->cl_cons_state
instead?

> >
> > >  		return;
> > >
> > >  	set_bit(NFS4CLNT_RUN_MANAGER, &clp->cl_state);
> > > @@ -1403,7 +1403,7 @@ int nfs4_schedule_stateid_recovery(const struct nfs_server *server, struct nfs4_
> > >  	dprintk("%s: scheduling stateid recovery for server %s\n", __func__,
> > >  			clp->cl_hostname);
> > >  	nfs4_schedule_state_manager(clp);
> > > -	return 0;
> > > +	return clp->cl_cons_state < 0 ? clp->cl_cons_state : 0;
> > >  }
> > >  EXPORT_SYMBOL_GPL(nfs4_schedule_stateid_recovery);
> > >
> >
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
>
>
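The refcount window described in the message above can be illustrated with a small single-threaded userspace model. This is only a sketch: `struct sketch_rpc_clnt` and every `sketch_*` function are hypothetical names invented for illustration, not the kernel's `struct rpc_clnt` or any SUNRPC API, and real cancellation is asynchronous rather than a single function call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in; NOT the kernel's struct rpc_clnt. */
struct sketch_rpc_clnt {
	int refcount;      /* references held, e.g. by in-flight RPC tasks */
	bool cl_shutdown;  /* set when "shutdown" is written in sysfs */
};

/* An in-flight RPC holds a reference for its lifetime. */
void sketch_task_start(struct sketch_rpc_clnt *clnt)
{
	assert(clnt != NULL);
	clnt->refcount++;
}

/* The reference is dropped only once the task has finished or been cancelled. */
void sketch_task_done(struct sketch_rpc_clnt *clnt)
{
	assert(clnt != NULL && clnt->refcount > 0);
	clnt->refcount--;
}

/* Models the sysfs write: the flag flips first, cancellation completes later. */
void sketch_shutdown(struct sketch_rpc_clnt *clnt)
{
	assert(clnt != NULL);
	clnt->cl_shutdown = true;
}

/* True while the client is still referenced but no longer usable. */
bool sketch_in_shutdown_window(const struct sketch_rpc_clnt *clnt)
{
	return clnt->refcount > 0 && clnt->cl_shutdown;
}
```

In this model, calling `sketch_task_start()` followed by `sketch_shutdown()` leaves `sketch_in_shutdown_window()` true until `sketch_task_done()` runs, which is why holding a reference alone does not show that the client is still up.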
On Tue, 2025-03-25 at 15:44 -0400, Jeff Layton wrote:
> On Tue, 2025-03-25 at 18:48 +0000, Trond Myklebust wrote:
> > On Tue, 2025-03-25 at 13:59 -0400, Jeff Layton wrote:
> > > On Tue, 2025-03-25 at 12:17 -0400, trondmy@kernel.org wrote:
> > > > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > >
> > > > If someone calls nfs_mark_client_ready(clp, status) with a negative
> > > > value for status, then that should signal that the nfs_client is no
> > > > longer valid.
> > > >
> > > > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > > ---
> > > >  fs/nfs/nfs4state.c | 4 ++--
> > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> > > > index 542cdf71229f..738eb2789266 100644
> > > > --- a/fs/nfs/nfs4state.c
> > > > +++ b/fs/nfs/nfs4state.c
> > > > @@ -1198,7 +1198,7 @@ void nfs4_schedule_state_manager(struct nfs_client *clp)
> > > >  	struct rpc_clnt *clnt = clp->cl_rpcclient;
> > > >  	bool swapon = false;
> > > >
> > > > -	if (clnt->cl_shutdown)
> > > > +	if (clnt->cl_shutdown || clp->cl_cons_state < 0)
> > >
> > > Would it be simpler to just set cl_shutdown when this occurs instead of
> > > having to check cl_cons_state as well?
> >
> > Do we need the check for clnt->cl_shutdown at all here? I'd expect any
> > caller of this function to already hold a reference to the client,
> > which means that the RPC client should still be up.
>
> Not necessarily? Just because you hold a reference to the rpc_clnt
> doesn't mean that it's still up, AFAIU.
>
> For instance, if you end up using the "shutdown" file in sysfs, any RPC
> still in flight will hold a reference to the client. Writing to
> "shutdown" will set cl_shutdown to 1 and then cancel all the RPCs, but
> there is at least a window of time where we have an elevated refcount
> but the client is no longer valid.

The shutdown of the nfs_client RPC client happens in nfs_free_client().

Oh wait... Crap... Why is a per-nfs_server function like
shutdown_store() reaching into the nfs_client? That's borked and needs
to be fixed.

>
> > >
> > > I'm a little suspicious of the check in nfs41_sequence_call_done() too.
> > >
>
> Me too. I think this is probably an indicator that we need to carefully
> audit how cl_shutdown is used and clarify what it means. Luckily there
> are only a handful of places that reference it:
>
> The call_start check is fine I think, though maybe we should add
> cl_shutdown checks in later states? The other places that check it come
> from this commit:
>
>     6ad477a69ad8 NFSv4: Clean up some shutdown loops
>
> Should we convert both of those checks to look at clp->cl_cons_state
> instead?

Yes.

> > >
> > > >  		return;
> > > >
> > > >  	set_bit(NFS4CLNT_RUN_MANAGER, &clp->cl_state);
> > > > @@ -1403,7 +1403,7 @@ int nfs4_schedule_stateid_recovery(const struct nfs_server *server, struct nfs4_
> > > >  	dprintk("%s: scheduling stateid recovery for server %s\n", __func__,
> > > >  			clp->cl_hostname);
> > > >  	nfs4_schedule_state_manager(clp);
> > > > -	return 0;
> > > > +	return clp->cl_cons_state < 0 ? clp->cl_cons_state : 0;
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(nfs4_schedule_stateid_recovery);
> > > >
> > >
> >
> > --
> > Trond Myklebust
> > Linux NFS client maintainer, Hammerspace
> > trond.myklebust@hammerspace.com
> >
> >
>
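The layering concern raised above (a per-nfs_server handler touching state owned by the nfs_client) can be sketched in userspace as follows. Everything here is an assumption made for illustration: the `demo_*` types and functions are hypothetical, the model assumes that one nfs_client may be shared by several nfs_server instances and that the per-server handler marks the shared client's RPC client as shut down, and `demo_shutdown_store_local()` is just one conceivable alternative, not the fix adopted in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; field names echo the thread, layouts do not. */
struct demo_rpc_clnt {
	bool cl_shutdown;
};

struct demo_nfs_client {
	struct demo_rpc_clnt cl_rpcclient;   /* shared by every mount using this client */
};

struct demo_nfs_server {
	struct demo_rpc_clnt client;         /* per-mount RPC client */
	struct demo_nfs_client *nfs_client;  /* possibly shared with other mounts */
};

/* ASSUMED behaviour under discussion: a per-server shutdown that also
 * marks the shared nfs_client's RPC client as shut down. */
void demo_shutdown_store_shared(struct demo_nfs_server *server)
{
	assert(server != NULL && server->nfs_client != NULL);
	server->client.cl_shutdown = true;
	server->nfs_client->cl_rpcclient.cl_shutdown = true;
}

/* Hypothetical alternative: confine the shutdown to per-server state. */
void demo_shutdown_store_local(struct demo_nfs_server *server)
{
	assert(server != NULL);
	server->client.cl_shutdown = true;
}

/* A mount is usable only if both its own and the shared RPC client are up. */
bool demo_server_usable(const struct demo_nfs_server *server)
{
	return !server->client.cl_shutdown &&
	       !server->nfs_client->cl_rpcclient.cl_shutdown;
}
```

With two `demo_nfs_server` instances pointing at the same `demo_nfs_client`, the local variant leaves the second mount usable, whereas the shared variant disables both.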
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 542cdf71229f..738eb2789266 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1198,7 +1198,7 @@ void nfs4_schedule_state_manager(struct nfs_client *clp)
 	struct rpc_clnt *clnt = clp->cl_rpcclient;
 	bool swapon = false;
 
-	if (clnt->cl_shutdown)
+	if (clnt->cl_shutdown || clp->cl_cons_state < 0)
 		return;
 
 	set_bit(NFS4CLNT_RUN_MANAGER, &clp->cl_state);
@@ -1403,7 +1403,7 @@ int nfs4_schedule_stateid_recovery(const struct nfs_server *server, struct nfs4_
 	dprintk("%s: scheduling stateid recovery for server %s\n", __func__,
 			clp->cl_hostname);
 	nfs4_schedule_state_manager(clp);
-	return 0;
+	return clp->cl_cons_state < 0 ? clp->cl_cons_state : 0;
 }
 EXPORT_SYMBOL_GPL(nfs4_schedule_stateid_recovery);
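
The effect of the two hunks can be tried out in a minimal userspace model. This is a sketch under stated assumptions, not kernel code: the `model_*` structs and functions are hypothetical stand-ins, `run_manager` replaces the `NFS4CLNT_RUN_MANAGER` bit, and only the lines visible in the diff are modelled (anything else the real functions do is omitted).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; NOT the kernel's struct definitions. */
struct model_rpc_clnt {
	bool cl_shutdown;      /* set once the RPC client is shut down */
};

struct model_nfs_client {
	struct model_rpc_clnt *cl_rpcclient;
	int cl_cons_state;     /* negative errno once the client is invalid */
	bool run_manager;      /* stands in for the NFS4CLNT_RUN_MANAGER bit */
};

/* Mirrors the patched early return: do nothing for a dead client. */
void model_schedule_state_manager(struct model_nfs_client *clp)
{
	assert(clp != NULL && clp->cl_rpcclient != NULL);
	if (clp->cl_rpcclient->cl_shutdown || clp->cl_cons_state < 0)
		return;
	clp->run_manager = true;
}

/* Mirrors the patched return value: propagate a negative cl_cons_state. */
int model_schedule_stateid_recovery(struct model_nfs_client *clp)
{
	model_schedule_state_manager(clp);
	return clp->cl_cons_state < 0 ? clp->cl_cons_state : 0;
}
```

In this model, a client whose `cl_cons_state` is `-ENETUNREACH` never has `run_manager` set, and the caller receives `-ENETUNREACH` instead of 0, so it can stop retrying rather than wait for a recovery that will not run.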