diff mbox series

[v2,4/4] NFSv4: Treat ENETUNREACH errors as fatal for state recovery

Message ID b1989a8316da4dcaaaaedad3b9d234d212be1083.1742919341.git.trond.myklebust@hammerspace.com (mailing list archive)
State New
Headers show
Series Ensure that ENETUNREACH terminates state recovery | expand

Commit Message

Trond Myklebust March 25, 2025, 4:17 p.m. UTC
From: Trond Myklebust <trond.myklebust@hammerspace.com>

If a containerised process is killed and causes an ENETUNREACH or
ENETDOWN error to be propagated to the state manager, then mark the
nfs_client as being dead so that we don't loop in functions that are
expecting recovery to succeed.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/nfs4state.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

Comments

Jeff Layton March 25, 2025, 6:04 p.m. UTC | #1
On Tue, 2025-03-25 at 12:17 -0400, trondmy@kernel.org wrote:
> From: Trond Myklebust <trond.myklebust@hammerspace.com>
> 
> If a containerised process is killed and causes an ENETUNREACH or
> ENETDOWN error to be propagated to the state manager, then mark the
> nfs_client as being dead so that we don't loop in functions that are
> expecting recovery to succeed.
> 
> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> ---
>  fs/nfs/nfs4state.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> index 738eb2789266..14ba3f96e6fc 100644
> --- a/fs/nfs/nfs4state.c
> +++ b/fs/nfs/nfs4state.c
> @@ -2739,7 +2739,15 @@ static void nfs4_state_manager(struct nfs_client *clp)
>  	pr_warn_ratelimited("NFS: state manager%s%s failed on NFSv4 server %s"
>  			" with error %d\n", section_sep, section,
>  			clp->cl_hostname, -status);
> -	ssleep(1);
> +	switch (status) {
> +	case -ENETDOWN:
> +	case -ENETUNREACH:
> +		nfs_mark_client_ready(clp, -EIO);
> +		break;

Should this be conditional on clnt->cl_netunreach_fatal being true?
 
> +	default:
> +		ssleep(1);
> +		break;
> +	}
>  out_drain:
>  	memalloc_nofs_restore(memflags);
>  	nfs4_end_drain_session(clp);
Trond Myklebust March 25, 2025, 6:50 p.m. UTC | #2
On Tue, 2025-03-25 at 14:04 -0400, Jeff Layton wrote:
> On Tue, 2025-03-25 at 12:17 -0400, trondmy@kernel.org wrote:
> > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > 
> > If a containerised process is killed and causes an ENETUNREACH or
> > ENETDOWN error to be propagated to the state manager, then mark the
> > nfs_client as being dead so that we don't loop in functions that
> > are
> > expecting recovery to succeed.
> > 
> > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> > ---
> >  fs/nfs/nfs4state.c | 10 +++++++++-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> > index 738eb2789266..14ba3f96e6fc 100644
> > --- a/fs/nfs/nfs4state.c
> > +++ b/fs/nfs/nfs4state.c
> > @@ -2739,7 +2739,15 @@ static void nfs4_state_manager(struct
> > nfs_client *clp)
> >  	pr_warn_ratelimited("NFS: state manager%s%s failed on
> > NFSv4 server %s"
> >  			" with error %d\n", section_sep, section,
> >  			clp->cl_hostname, -status);
> > -	ssleep(1);
> > +	switch (status) {
> > +	case -ENETDOWN:
> > +	case -ENETUNREACH:
> > +		nfs_mark_client_ready(clp, -EIO);
> > +		break;
> 
> Should this be conditional on clnt->cl_netunreach_fatal being true?

I should hope not. We shouldn't ever be seeing these errors here if it
is false.

>  
> > +	default:
> > +		ssleep(1);
> > +		break;
> > +	}
> >  out_drain:
> >  	memalloc_nofs_restore(memflags);
> >  	nfs4_end_drain_session(clp);
>
Jeff Layton March 25, 2025, 7:26 p.m. UTC | #3
On Tue, 2025-03-25 at 18:50 +0000, Trond Myklebust wrote:
> On Tue, 2025-03-25 at 14:04 -0400, Jeff Layton wrote:
> > On Tue, 2025-03-25 at 12:17 -0400, trondmy@kernel.org wrote:
> > > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > 
> > > If a containerised process is killed and causes an ENETUNREACH or
> > > ENETDOWN error to be propagated to the state manager, then mark the
> > > nfs_client as being dead so that we don't loop in functions that
> > > are
> > > expecting recovery to succeed.
> > > 
> > > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > ---
> > >  fs/nfs/nfs4state.c | 10 +++++++++-
> > >  1 file changed, 9 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> > > index 738eb2789266..14ba3f96e6fc 100644
> > > --- a/fs/nfs/nfs4state.c
> > > +++ b/fs/nfs/nfs4state.c
> > > @@ -2739,7 +2739,15 @@ static void nfs4_state_manager(struct
> > > nfs_client *clp)
> > >  	pr_warn_ratelimited("NFS: state manager%s%s failed on
> > > NFSv4 server %s"
> > >  			" with error %d\n", section_sep, section,
> > >  			clp->cl_hostname, -status);
> > > -	ssleep(1);
> > > +	switch (status) {
> > > +	case -ENETDOWN:
> > > +	case -ENETUNREACH:
> > > +		nfs_mark_client_ready(clp, -EIO);
> > > +		break;
> > 
> > Should this be conditional on clnt->cl_netunreach_fatal being true?
> 
> I should hope not. We shouldn't ever be seeing these errors here if it
> is false.
> 

Oh right, the only way that these errors bubble up from sunrpc layer is
if this option is enabled. You can add:

Reviewed-by: Jeff Layton <jlayton@kernel.org>
diff mbox series

Patch

diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 738eb2789266..14ba3f96e6fc 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -2739,7 +2739,15 @@  static void nfs4_state_manager(struct nfs_client *clp)
 	pr_warn_ratelimited("NFS: state manager%s%s failed on NFSv4 server %s"
 			" with error %d\n", section_sep, section,
 			clp->cl_hostname, -status);
-	ssleep(1);
+	switch (status) {
+	case -ENETDOWN:
+	case -ENETUNREACH:
+		nfs_mark_client_ready(clp, -EIO);
+		break;
+	default:
+		ssleep(1);
+		break;
+	}
 out_drain:
 	memalloc_nofs_restore(memflags);
 	nfs4_end_drain_session(clp);