diff mbox

[3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*

Message ID 1375726682.7337.29.camel@leira.trondhjem.org (mailing list archive)
State New, archived
Headers show

Commit Message

Trond Myklebust Aug. 5, 2013, 6:18 p.m. UTC
On Mon, 2013-08-05 at 13:37 -0400, Jeff Layton wrote:
> On Mon, 5 Aug 2013 16:15:01 +0000

> "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:

> 

> > From 3c50ba80105464a28d456d9a1e0f1d81d4af92a8 Mon Sep 17 00:00:00 2001

> > From: Trond Myklebust <Trond.Myklebust@netapp.com>

> > Date: Mon, 5 Aug 2013 12:06:12 -0400

> > Subject: [PATCH] LOCKD: Don't call utsname()->nodename from

> >  nlmclnt_setlockargs

> > MIME-Version: 1.0

> > Content-Type: text/plain; charset=UTF-8

> > Content-Transfer-Encoding: 8bit

> > 

> > Firstly, nlmclnt_setlockargs can be called from a reclaimer thread, in

> > which case we're in entirely the wrong namespace.

> > Secondly, commit 8aac62706adaaf0fab02c4327761561c8bda9448 (move

> > exit_task_namespaces() outside of exit_notify()) now means that

> > exit_task_work() is called after exit_task_namespaces(), which

> > triggers an Oops when we're freeing up the locks.

> > 

> > Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

> > Cc: Toralf Förster <toralf.foerster@gmx.de>

> > Cc: Oleg Nesterov <oleg@redhat.com>

> > Cc: Nix <nix@esperi.org.uk>

> > Cc: Jeff Layton <jlayton@redhat.com>

> > ---

> >  fs/lockd/clntproc.c | 5 +++--

> >  1 file changed, 3 insertions(+), 2 deletions(-)

> > 

> > diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c

> > index 9760ecb..acd3947 100644

> > --- a/fs/lockd/clntproc.c

> > +++ b/fs/lockd/clntproc.c

> > @@ -125,14 +125,15 @@ static void nlmclnt_setlockargs(struct nlm_rqst *req, struct file_lock *fl)

> >  {

> >  	struct nlm_args	*argp = &req->a_args;

> >  	struct nlm_lock	*lock = &argp->lock;

> > +	char *nodename = req->a_host->h_rpcclnt->cl_nodename;

> >  

> >  	nlmclnt_next_cookie(&argp->cookie);

> >  	memcpy(&lock->fh, NFS_FH(file_inode(fl->fl_file)), sizeof(struct nfs_fh));

> > -	lock->caller  = utsname()->nodename;

> > +	lock->caller  = nodename;

> >  	lock->oh.data = req->a_owner;

> >  	lock->oh.len  = snprintf(req->a_owner, sizeof(req->a_owner), "%u@%s",

> >  				(unsigned int)fl->fl_u.nfs_fl.owner->pid,

> > -				utsname()->nodename);

> > +				nodename);

> >  	lock->svid = fl->fl_u.nfs_fl.owner->pid;

> >  	lock->fl.fl_start = fl->fl_start;

> >  	lock->fl.fl_end = fl->fl_end;

> 

> Looks good to me...

> 

> Reviewed-by: Jeff Layton <jlayton@redhat.com>

> 

> Trond, any thoughts on the other oops that Nix posted? The issue there

> seems to be that we're trying to do the pathwalk to the rpcbind unix

> socket from exit_task_work(), but that's happening after we've already

> called exit_fs().

> 

> The trivial answer seems to be to simply call exit_task_work() before

> exit_fs() there, but it seems like we ought to be doing the upcall to

> rpcbind in a mount namespace from which we know we can reach the

> socket...


Isn't it enough to just do the same thing as we did for gss proxy? i.e.
set the RPC_CLNT_CREATE_NO_IDLE_TIMEOUT flag.

See attachment.
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

Comments

Jeff Layton Aug. 5, 2013, 6:33 p.m. UTC | #1
On Mon, 5 Aug 2013 18:18:03 +0000
"Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:

> On Mon, 2013-08-05 at 13:37 -0400, Jeff Layton wrote:
> > On Mon, 5 Aug 2013 16:15:01 +0000
> > "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:
> > 
> > > From 3c50ba80105464a28d456d9a1e0f1d81d4af92a8 Mon Sep 17 00:00:00 2001
> > > From: Trond Myklebust <Trond.Myklebust@netapp.com>
> > > Date: Mon, 5 Aug 2013 12:06:12 -0400
> > > Subject: [PATCH] LOCKD: Don't call utsname()->nodename from
> > >  nlmclnt_setlockargs
> > > MIME-Version: 1.0
> > > Content-Type: text/plain; charset=UTF-8
> > > Content-Transfer-Encoding: 8bit
> > > 
> > > Firstly, nlmclnt_setlockargs can be called from a reclaimer thread, in
> > > which case we're in entirely the wrong namespace.
> > > Secondly, commit 8aac62706adaaf0fab02c4327761561c8bda9448 (move
> > > exit_task_namespaces() outside of exit_notify()) now means that
> > > exit_task_work() is called after exit_task_namespaces(), which
> > > triggers an Oops when we're freeing up the locks.
> > > 
> > > Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> > > Cc: Toralf Förster <toralf.foerster@gmx.de>
> > > Cc: Oleg Nesterov <oleg@redhat.com>
> > > Cc: Nix <nix@esperi.org.uk>
> > > Cc: Jeff Layton <jlayton@redhat.com>
> > > ---
> > >  fs/lockd/clntproc.c | 5 +++--
> > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
> > > index 9760ecb..acd3947 100644
> > > --- a/fs/lockd/clntproc.c
> > > +++ b/fs/lockd/clntproc.c
> > > @@ -125,14 +125,15 @@ static void nlmclnt_setlockargs(struct nlm_rqst *req, struct file_lock *fl)
> > >  {
> > >  	struct nlm_args	*argp = &req->a_args;
> > >  	struct nlm_lock	*lock = &argp->lock;
> > > +	char *nodename = req->a_host->h_rpcclnt->cl_nodename;
> > >  
> > >  	nlmclnt_next_cookie(&argp->cookie);
> > >  	memcpy(&lock->fh, NFS_FH(file_inode(fl->fl_file)), sizeof(struct nfs_fh));
> > > -	lock->caller  = utsname()->nodename;
> > > +	lock->caller  = nodename;
> > >  	lock->oh.data = req->a_owner;
> > >  	lock->oh.len  = snprintf(req->a_owner, sizeof(req->a_owner), "%u@%s",
> > >  				(unsigned int)fl->fl_u.nfs_fl.owner->pid,
> > > -				utsname()->nodename);
> > > +				nodename);
> > >  	lock->svid = fl->fl_u.nfs_fl.owner->pid;
> > >  	lock->fl.fl_start = fl->fl_start;
> > >  	lock->fl.fl_end = fl->fl_end;
> > 
> > Looks good to me...
> > 
> > Reviewed-by: Jeff Layton <jlayton@redhat.com>
> > 
> > Trond, any thoughts on the other oops that Nix posted? The issue there
> > seems to be that we're trying to do the pathwalk to the rpcbind unix
> > socket from exit_task_work(), but that's happening after we've already
> > called exit_fs().
> > 
> > The trivial answer seems to be to simply call exit_task_work() before
> > exit_fs() there, but it seems like we ought to be doing the upcall to
> > rpcbind in a mount namespace from which we know we can reach the
> > socket...
> 
> Isn't it enough to just do the same thing as we did for gss proxy? i.e.
> set the RPC_CLNT_CREATE_NO_IDLE_TIMEOUT flag.
> 
> See attachment.

Yeah, that looks like a reasonable thing to do...

OTOH, Is there any other way for a unix socket to end up disconnected
other than if we were to close it? Maybe if rpcbind stopped, the socket
unlinked and recreated and then started again?

If so then you still could potentially end up in this situation even if
you didn't autoclose it.
diff mbox

Patch

From ab56d77893815b1b9f0aaa7a89cee7c832a31cff Mon Sep 17 00:00:00 2001
From: Trond Myklebust <Trond.Myklebust@netapp.com>
Date: Mon, 5 Aug 2013 14:10:43 -0400
Subject: [PATCH] SUNRPC: Don't auto-disconnect from the local rpcbind socket

There is no need for the kernel to time out the AF_LOCAL connection to
the rpcbind socket, and doing so is problematic because when it is
time to reconnect, our process may no longer be using the same mount
namespace.

Reported-by: Nix <nix@esperi.org.uk>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Jeff Layton <jlayton@redhat.com>
Cc: stable@vger.kernel.org # 3.9.x
---
 net/sunrpc/rpcb_clnt.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c
index 3df764d..4b00555 100644
--- a/net/sunrpc/rpcb_clnt.c
+++ b/net/sunrpc/rpcb_clnt.c
@@ -238,6 +238,15 @@  static int rpcb_create_local_unix(struct net *net)
 		.program	= &rpcb_program,
 		.version	= RPCBVERS_2,
 		.authflavor	= RPC_AUTH_NULL,
+		/*
+		 * We turn off the idle timeout to prevent the kernel
+		 * from automatically disconnecting the socket.
+		 * Otherwise, we'd have to cache the mount namespace
+		 * of the caller and somehow pass that to the socket
+		 * reconnect code.
+		 */
+		.flags		= RPC_CLNT_CREATE_NOPING |
+				  RPC_CLNT_CREATE_NO_IDLE_TIMEOUT,
 	};
 	struct rpc_clnt *clnt, *clnt4;
 	int result = 0;
-- 
1.8.3.1