Message ID | b1989a8316da4dcaaaaedad3b9d234d212be1083.1742919341.git.trond.myklebust@hammerspace.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Ensure that ENETUNREACH terminates state recovery | expand |
On Tue, 2025-03-25 at 12:17 -0400, trondmy@kernel.org wrote: > From: Trond Myklebust <trond.myklebust@hammerspace.com> > > If a containerised process is killed and causes an ENETUNREACH or > ENETDOWN error to be propagated to the state manager, then mark the > nfs_client as being dead so that we don't loop in functions that are > expecting recovery to succeed. > > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> > --- > fs/nfs/nfs4state.c | 10 +++++++++- > 1 file changed, 9 insertions(+), 1 deletion(-) > > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c > index 738eb2789266..14ba3f96e6fc 100644 > --- a/fs/nfs/nfs4state.c > +++ b/fs/nfs/nfs4state.c > @@ -2739,7 +2739,15 @@ static void nfs4_state_manager(struct nfs_client *clp) > pr_warn_ratelimited("NFS: state manager%s%s failed on NFSv4 server %s" > " with error %d\n", section_sep, section, > clp->cl_hostname, -status); > - ssleep(1); > + switch (status) { > + case -ENETDOWN: > + case -ENETUNREACH: > + nfs_mark_client_ready(clp, -EIO); > + break; Should this be conditional on clnt->cl_netunreach_fatal being true? > + default: > + ssleep(1); > + break; > + } > out_drain: > memalloc_nofs_restore(memflags); > nfs4_end_drain_session(clp);
On Tue, 2025-03-25 at 14:04 -0400, Jeff Layton wrote: > On Tue, 2025-03-25 at 12:17 -0400, trondmy@kernel.org wrote: > > From: Trond Myklebust <trond.myklebust@hammerspace.com> > > > > If a containerised process is killed and causes an ENETUNREACH or > > ENETDOWN error to be propagated to the state manager, then mark the > > nfs_client as being dead so that we don't loop in functions that > > are > > expecting recovery to succeed. > > > > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> > > --- > > fs/nfs/nfs4state.c | 10 +++++++++- > > 1 file changed, 9 insertions(+), 1 deletion(-) > > > > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c > > index 738eb2789266..14ba3f96e6fc 100644 > > --- a/fs/nfs/nfs4state.c > > +++ b/fs/nfs/nfs4state.c > > @@ -2739,7 +2739,15 @@ static void nfs4_state_manager(struct > > nfs_client *clp) > > pr_warn_ratelimited("NFS: state manager%s%s failed on > > NFSv4 server %s" > > " with error %d\n", section_sep, section, > > clp->cl_hostname, -status); > > - ssleep(1); > > + switch (status) { > > + case -ENETDOWN: > > + case -ENETUNREACH: > > + nfs_mark_client_ready(clp, -EIO); > > + break; > > Should this be conditional on clnt->cl_netunreach_fatal being true? I should hope not. We shouldn't ever be seeing these errors here if it is false. > > > + default: > > + ssleep(1); > > + break; > > + } > > out_drain: > > memalloc_nofs_restore(memflags); > > nfs4_end_drain_session(clp); >
On Tue, 2025-03-25 at 18:50 +0000, Trond Myklebust wrote: > On Tue, 2025-03-25 at 14:04 -0400, Jeff Layton wrote: > > On Tue, 2025-03-25 at 12:17 -0400, trondmy@kernel.org wrote: > > > From: Trond Myklebust <trond.myklebust@hammerspace.com> > > > > > > If a containerised process is killed and causes an ENETUNREACH or > > > ENETDOWN error to be propagated to the state manager, then mark the > > > nfs_client as being dead so that we don't loop in functions that > > > are > > > expecting recovery to succeed. > > > > > > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> > > > --- > > > fs/nfs/nfs4state.c | 10 +++++++++- > > > 1 file changed, 9 insertions(+), 1 deletion(-) > > > > > > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c > > > index 738eb2789266..14ba3f96e6fc 100644 > > > --- a/fs/nfs/nfs4state.c > > > +++ b/fs/nfs/nfs4state.c > > > @@ -2739,7 +2739,15 @@ static void nfs4_state_manager(struct > > > nfs_client *clp) > > > pr_warn_ratelimited("NFS: state manager%s%s failed on > > > NFSv4 server %s" > > > " with error %d\n", section_sep, section, > > > clp->cl_hostname, -status); > > > - ssleep(1); > > > + switch (status) { > > > + case -ENETDOWN: > > > + case -ENETUNREACH: > > > + nfs_mark_client_ready(clp, -EIO); > > > + break; > > > > Should this be conditional on clnt->cl_netunreach_fatal being true? > > I should hope not. We shouldn't ever be seeing these errors here if it > is false. > Oh right, the only way that these errors bubble up from sunrpc layer is if this option is enabled. You can add: Reviewed-by: Jeff Layton <jlayton@kernel.org>
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c index 738eb2789266..14ba3f96e6fc 100644 --- a/fs/nfs/nfs4state.c +++ b/fs/nfs/nfs4state.c @@ -2739,7 +2739,15 @@ static void nfs4_state_manager(struct nfs_client *clp) pr_warn_ratelimited("NFS: state manager%s%s failed on NFSv4 server %s" " with error %d\n", section_sep, section, clp->cl_hostname, -status); - ssleep(1); + switch (status) { + case -ENETDOWN: + case -ENETUNREACH: + nfs_mark_client_ready(clp, -EIO); + break; + default: + ssleep(1); + break; + } out_drain: memalloc_nofs_restore(memflags); nfs4_end_drain_session(clp);