Message ID: 87fu9ph2g7.fsf@notabene.neil.brown.name (mailing list archive)
State: New, archived
On Wed, 2017-11-08 at 14:30 +1100, NeilBrown wrote:
> What do people think of the following as an approach
> to Joshua's need?
>
> It isn't complete by itself: it needs a couple of changes to
> nfs-utils so that it doesn't stat the mountpoint on remount,
> and it might need another kernel change so that the "mount" system
> call performs the same sort of careful lookup for remount as the umount
> system call does, but those are relatively small details.

Yeah, that'd be good.

> This is the patch that you will either love or hate.
>
> With this patch, Joshua (or any other sysadmin) could:
>
>     mount -o remount,retrans=0,timeo=1 /path
>
> and then new requests on any mountpoint from that server will time out
> quickly.
> Then
>
>     umount -f /path
>     umount -f /path
>
> should kill off any existing requests that use the old timeout.  (I just
> tested and I did need this twice to kill off an "ls -l".
> The first was an 'open' system call, the next a 'newfstat'.  I wonder
> why the getattr for fstat didn't use the new timeout...)
>
> Thoughts?
> NeilBrown
>
> diff --git a/fs/nfs/super.c b/fs/nfs/super.c
> index c9d24bae3025..ced12fcec349 100644
> --- a/fs/nfs/super.c
> +++ b/fs/nfs/super.c
> @@ -2210,27 +2210,39 @@ static int nfs_validate_text_mount_data(void *options,
>  	~(NFS_MOUNT_UNSHARED | NFS_MOUNT_NORESVPORT))
>  
>  static int
> -nfs_compare_remount_data(struct nfs_server *nfss,
> -			 struct nfs_parsed_mount_data *data)
> +nfs_compare_and_set_remount_data(struct nfs_server *nfss,
> +				 struct nfs_parsed_mount_data *data)
>  {
>  	if ((data->flags ^ nfss->flags) & NFS_REMOUNT_CMP_FLAGMASK ||
>  	    data->rsize != nfss->rsize ||
>  	    data->wsize != nfss->wsize ||
>  	    data->version != nfss->nfs_client->rpc_ops->version ||
>  	    data->minorversion != nfss->nfs_client->cl_minorversion ||
> -	    data->retrans != nfss->client->cl_timeout->to_retries ||
>  	    !nfs_auth_info_match(&data->auth_info, nfss->client->cl_auth->au_flavor) ||
>  	    data->acregmin != nfss->acregmin / HZ ||
>  	    data->acregmax != nfss->acregmax / HZ ||
>  	    data->acdirmin != nfss->acdirmin / HZ ||
>  	    data->acdirmax != nfss->acdirmax / HZ ||
> -	    data->timeo != (10U * nfss->client->cl_timeout->to_initval / HZ) ||
>  	    data->nfs_server.port != nfss->port ||
>  	    data->nfs_server.addrlen != nfss->nfs_client->cl_addrlen ||
>  	    !rpc_cmp_addr((struct sockaddr *)&data->nfs_server.address,
>  			  (struct sockaddr *)&nfss->nfs_client->cl_addr))
>  		return -EINVAL;
>  
> +	if (data->retrans != nfss->client->cl_timeout->to_retries ||
> +	    data->timeo != (10U * nfss->client->cl_timeout->to_initval / HZ)) {
> +		/* Note that this will affect all mounts from the same server,
> +		 * that use the same protocol.  The timeouts are always forced
> +		 * to be the same.
> +		 */
> +		struct rpc_clnt *cl = nfss->client;
> +		if (cl->cl_timeout != &cl->cl_timeout_default)
> +			memcpy(&cl->cl_timeout_default, cl->cl_timeout,
> +			       sizeof(struct rpc_timeout));
> +		cl->cl_timeout_default.to_retries = data->retrans;
> +		cl->cl_timeout_default.to_initval = data->timeo * HZ / 10U;
> +	}
> +

No objection to allowing more mount options to be changed on remount; we
really ought to allow a lot more options to be changed on remount if we
can reasonably pull it off.

>  	return 0;
>  }
>  
> @@ -2244,7 +2256,8 @@ nfs_remount(struct super_block *sb, int *flags, char *raw_data)
>  	struct nfs4_mount_data *options4 = (struct nfs4_mount_data *)raw_data;
>  	u32 nfsvers = nfss->nfs_client->rpc_ops->version;
>  
> -	sync_filesystem(sb);
> +	if (sb->s_readonly_remount)
> +		sync_filesystem(sb);
>  
>  	/*
>  	 * Userspace mount programs that send binary options generally send
> @@ -2295,7 +2308,7 @@ nfs_remount(struct super_block *sb, int *flags, char *raw_data)
>  		*flags |= MS_SYNCHRONOUS;
>  
>  	/* compare new mount options with old ones */
> -	error = nfs_compare_remount_data(nfss, data);
> +	error = nfs_compare_and_set_remount_data(nfss, data);
>  out:
>  	kfree(data);
>  	return error;

Looks like a reasonable approach overall to preventing new RPCs from
being dispatched once the "force" umount runs.

I do wonder if this ought to be more automatic when you specify -f on
the umount.  Having to manually do a remount first doesn't seem very
admin-friendly.
On Wed, Nov 08, 2017 at 07:08:25AM -0500, Jeff Layton wrote:
> On Wed, 2017-11-08 at 14:30 +1100, NeilBrown wrote:
> > What do people think of the following as an approach
> > to Joshua's need?
> >
> > It isn't complete by itself: it needs a couple of changes to
> > nfs-utils so that it doesn't stat the mountpoint on remount,
> > and it might need another kernel change so that the "mount" system
> > call performs the same sort of careful lookup for remount as the umount
> > system call does, but those are relatively small details.
>
> Yeah, that'd be good.
>
> > This is the patch that you will either love or hate.
> >
> > With this patch, Joshua (or any other sysadmin) could:
> >
> >     mount -o remount,retrans=0,timeo=1 /path
> >
> > and then new requests on any mountpoint from that server will timeout
> > quickly.
> > Then
> >     umount -f /path
> >     umount -f /path
...
> Looks like a reasonable approach overall to preventing new RPCs from
> being dispatched once the "force" umount runs.

I've lost track of the discussion--after this patch, how close are we to
a guaranteed force unmount?  I assume there are still a few obstacles.

> I do wonder if this ought to be more automatic when you specify -f on
> the umount.  Having to manually do a remount first doesn't seem very
> admin-friendly.

It's an odd interface.  Maybe we could wrap it in something more
intuitive.

I'd be nervous about making "umount -f" do it.  I think administrators
could be unpleasantly surprised in some cases if an "umount -f" affects
other mounts of the same server.

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Nov 08 2017, J. Bruce Fields wrote:

> On Wed, Nov 08, 2017 at 07:08:25AM -0500, Jeff Layton wrote:
>> On Wed, 2017-11-08 at 14:30 +1100, NeilBrown wrote:
>> > What do people think of the following as an approach
>> > to Joshua's need?
>> >
>> > It isn't complete by itself: it needs a couple of changes to
>> > nfs-utils so that it doesn't stat the mountpoint on remount,
>> > and it might need another kernel change so that the "mount" system
>> > call performs the same sort of careful lookup for remount as the umount
>> > system call does, but those are relatively small details.
>>
>> Yeah, that'd be good.
>>
>> > This is the patch that you will either love or hate.
>> >
>> > With this patch, Joshua (or any other sysadmin) could:
>> >
>> >     mount -o remount,retrans=0,timeo=1 /path
>> >
>> > and then new requests on any mountpoint from that server will timeout
>> > quickly.
>> > Then
>> >     umount -f /path
>> >     umount -f /path
> ...
>> Looks like a reasonable approach overall to preventing new RPCs from
>> being dispatched once the "force" umount runs.
>
> I've lost track of the discussion--after this patch, how close are we to
> a guaranteed force unmount?  I assume there are still a few obstacles.

This isn't really about forced unmount.
The way forward to forced unmount is:
 - make all waits on NFS be TASK_KILLABLE
 - figure out what happens to dirty data when all processes have been
   killed.

This is about allowing processes to be told that the filesystem is dead
so they can respond (without blocking indefinitely) without necessarily
being killed.
With a local filesystem you can (in some cases) kill the underlying
device and all processes will start getting EIO.  This is providing
similar functionality for NFS.

>> I do wonder if this ought to be more automatic when you specify -f on
>> the umount.  Having to manually do a remount first doesn't seem very
>> admin-friendly.
>
> It's an odd interface.  Maybe we could wrap it in something more
> intuitive.
>
> I'd be nervous about making "umount -f" do it.  I think administrators
> could be unpleasantly surprised in some cases if an "umount -f" affects
> other mounts of the same server.

I was all set to tell you that it already does, but then tested and
found it doesn't and ....

struct nfs_server (which sb->s_fs_info points to) contains

	struct nfs_client *	nfs_client;	/* shared client and NFS4 state */

which is shared between different mounts from the same server, and

	struct rpc_clnt *	client;		/* RPC client handle */

which isn't shared.
struct nfs_client contains

	struct rpc_clnt *	cl_rpcclient;

which server->client is cloned from.

The timeouts that apply to a mount are the ones from server->client, and
so apply only to that mount (I thought they were shared, but that is a
thought from years ago, and maybe it was wrong at the time).
umount_begin aborts all rpcs associated with server->client.

So the 'remount,retrans=0,timeo=1' that I propose would only affect the
one superblock (all bind-mounts of course, including sharecache mounts).

The comment in my code was wrong.

Thanks,
NeilBrown
On Thu, 2017-11-09 at 09:34 +1100, NeilBrown wrote:
> On Wed, Nov 08 2017, J. Bruce Fields wrote:
> > On Wed, Nov 08, 2017 at 07:08:25AM -0500, Jeff Layton wrote:
> > > On Wed, 2017-11-08 at 14:30 +1100, NeilBrown wrote:
> > > > What do people think of the following as an approach to Joshua's
> > > > need?
> > > >
> > > > It isn't complete by itself: it needs a couple of changes to
> > > > nfs-utils so that it doesn't stat the mountpoint on remount, and
> > > > it might need another kernel change so that the "mount" system
> > > > call performs the same sort of careful lookup for remount as the
> > > > umount system call does, but those are relatively small details.
> > >
> > > Yeah, that'd be good.
> > >
> > > > This is the patch that you will either love or hate.
> > > >
> > > > With this patch, Joshua (or any other sysadmin) could:
> > > >
> > > >     mount -o remount,retrans=0,timeo=1 /path
> > > >
> > > > and then new requests on any mountpoint from that server will
> > > > timeout quickly.
> > > > Then
> > > >     umount -f /path
> > > >     umount -f /path
> >
> > ...
> > > Looks like a reasonable approach overall to preventing new RPCs
> > > from being dispatched once the "force" umount runs.
> >
> > I've lost track of the discussion--after this patch, how close are
> > we to a guaranteed force unmount?  I assume there are still a few
> > obstacles.
>
> This isn't really about forced unmount.
> The way forward to forced unmount is:
>  - make all waits on NFS be TASK_KILLABLE
>  - figure out what happens to dirty data when all processes have been
>    killed.
>
> This is about allowing processes to be told that the filesystem is
> dead so they can respond (without blocking indefinitely) without
> necessarily being killed.
> With a local filesystem you can (in some cases) kill the underlying
> device and all processes will start getting EIO.  This is providing
> similar functionality for NFS.
>
> > > I do wonder if this ought to be more automatic when you specify
> > > -f on the umount.  Having to manually do a remount first doesn't
> > > seem very admin-friendly.
> >
> > It's an odd interface.  Maybe we could wrap it in something more
> > intuitive.
> >
> > I'd be nervous about making "umount -f" do it.  I think
> > administrators could be unpleasantly surprised in some cases if an
> > "umount -f" affects other mounts of the same server.
>
> I was all set to tell you that it already does, but then tested and
> found it doesn't and ....
>
> struct nfs_server (which sb->s_fs_info points to) contains
>
> 	struct nfs_client *	nfs_client;	/* shared client and NFS4 state */
>
> which is shared between different mounts from the same server, and
>
> 	struct rpc_clnt *	client;		/* RPC client handle */
>
> which isn't shared.
> struct nfs_client contains
>
> 	struct rpc_clnt *	cl_rpcclient;
>
> which server->client is cloned from.
>
> The timeouts that apply to a mount are the ones from server->client,
> and so apply only to that mount (I thought they were shared, but that
> is a thought from years ago, and maybe it was wrong at the time).
> umount_begin aborts all rpcs associated with server->client.
>
> So the 'remount,retrans=0,timeo=1' that I propose would only affect
> the one superblock (all bind-mounts of course, including sharecache
> mounts).
>
> The comment in my code was wrong.

I by far prefer an operation that changes the superblock state to be
done using 'mount -o remount'.  The idea of a 'umount -f' that makes the
superblock irreversibly change state is clearly not desirable in an
environment where the same superblock could be bind mounted and/or
mounted in multiple private namespaces.

IOW: 'remount,retrans=0,timeo=1' would be slightly preferable to hacking
up "umount -f" because it is reversible.
That said, there is no reason why we couldn't implement a single mount
option that implements something akin to this "umount -f" behaviour (and
that can be reversed by resetting it through a second 'remount').  It
seems to me that the requested behaviour is already pretty close to what
we already implement with the RPC_TASK_SOFTCONN option.

--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com
On Wed, 2017-11-08 at 23:52 +0000, Trond Myklebust wrote:
> On Thu, 2017-11-09 at 09:34 +1100, NeilBrown wrote:
> > On Wed, Nov 08 2017, J. Bruce Fields wrote:
> > > On Wed, Nov 08, 2017 at 07:08:25AM -0500, Jeff Layton wrote:
> > > > On Wed, 2017-11-08 at 14:30 +1100, NeilBrown wrote:
> > > > > What do people think of the following as an approach to
> > > > > Joshua's need?
> > > > >
> > > > > It isn't complete by itself: it needs a couple of changes to
> > > > > nfs-utils so that it doesn't stat the mountpoint on remount,
> > > > > and it might need another kernel change so that the "mount"
> > > > > system call performs the same sort of careful lookup for
> > > > > remount as the umount system call does, but those are
> > > > > relatively small details.
> > > >
> > > > Yeah, that'd be good.
> > > >
> > > > > This is the patch that you will either love or hate.
> > > > >
> > > > > With this patch, Joshua (or any other sysadmin) could:
> > > > >
> > > > >     mount -o remount,retrans=0,timeo=1 /path
> > > > >
> > > > > and then new requests on any mountpoint from that server will
> > > > > timeout quickly.
> > > > > Then
> > > > >     umount -f /path
> > > > >     umount -f /path
> > >
> > > ...
> > > > Looks like a reasonable approach overall to preventing new RPCs
> > > > from being dispatched once the "force" umount runs.
> > >
> > > I've lost track of the discussion--after this patch, how close
> > > are we to a guaranteed force unmount?  I assume there are still a
> > > few obstacles.
> >
> > This isn't really about forced unmount.
> > The way forward to forced unmount is:
> >  - make all waits on NFS be TASK_KILLABLE
> >  - figure out what happens to dirty data when all processes have
> >    been killed.
> >
> > This is about allowing processes to be told that the filesystem is
> > dead so they can respond (without blocking indefinitely) without
> > necessarily being killed.
> > With a local filesystem you can (in some cases) kill the underlying
> > device and all processes will start getting EIO.  This is providing
> > similar functionality for NFS.
> >
> > > > I do wonder if this ought to be more automatic when you specify
> > > > -f on the umount.  Having to manually do a remount first doesn't
> > > > seem very admin-friendly.
> > >
> > > It's an odd interface.  Maybe we could wrap it in something more
> > > intuitive.
> > >
> > > I'd be nervous about making "umount -f" do it.  I think
> > > administrators could be unpleasantly surprised in some cases if
> > > an "umount -f" affects other mounts of the same server.
> >
> > I was all set to tell you that it already does, but then tested and
> > found it doesn't and ....

I tried mounting two different remote paths from an NFS4 server, and
when I did 'remount,retrans=0', it changed the parameter for both of
them, meaning they are indeed sharing the struct nfs_server (which based
on my reading of the code is what I would have expected).  What
procedure did you use to test this?

> >
> > struct nfs_server (which sb->s_fs_info points to) contains
> >
> > 	struct nfs_client *	nfs_client;	/* shared client and NFS4 state */
> >
> > which is shared between different mounts from the same server, and
> >
> > 	struct rpc_clnt *	client;		/* RPC client handle */
> >
> > which isn't shared.
> > struct nfs_client contains
> >
> > 	struct rpc_clnt *	cl_rpcclient;
> >
> > which server->client is cloned from.
> >
> > The timeouts that apply to a mount are the ones from server->client,
> > and so apply only to that mount (I thought they were shared, but
> > that is a thought from years ago, and maybe it was wrong at the
> > time).
> > umount_begin aborts all rpcs associated with server->client.
> >
> > So the 'remount,retrans=0,timeo=1' that I propose would only affect
> > the one superblock (all bind-mounts of course, including sharecache
> > mounts).
> >
> > The comment in my code was wrong.
>
> I by far prefer an operation that changes the superblock state to be
> done using 'mount -o remount'.  The idea of a 'umount -f' that makes
> the superblock irreversibly change state is clearly not desirable in
> an environment where the same superblock could be bind mounted and/or
> mounted in multiple private namespaces.
>
> IOW: 'remount,retrans=0,timeo=1' would be slightly preferable to
> hacking up "umount -f" because it is reversible.
>
> That said, there is no reason why we couldn't implement a single mount
> option that implements something akin to this "umount -f" behaviour
> (and that can be reversed by resetting it through a second 'remount').
> It seems to me that the requested behaviour is already pretty close to
> what we already implement with the RPC_TASK_SOFTCONN option.

I *think* my patch series pretty much does just that (not via
RPC_TASK_SOFTCONN).  I might have broken the thread by posting with the
wrong parent... anyway you can find them here:
http://www.spinics.net/lists/linux-nfs/msg66348.html

I added support for toggling the forcekill flag via remount, which I
think might be close to what you are looking for, e.g. if you mount
without the forcekill option, umount -f does what it always has.  You
can then "-o remount,forcekill" and umount -f will cause all RPC tasks
to fail with -EIO.

The number of patches has grown, so I currently have this on a GitHub
branch (including Neil's patch for testing):
https://github.com/JPEWdev/linux/tree/sunrpc-kill, but I can send them
to the list if that would be helpful.

Using a changeable mount option is fine, but it does mean that the
option again applies to all mounts from the same server (at least as far
as I can understand: I could be wrong and Neil could be right - see
above).  Choosing the behavior at mount time (and not allowing it to
change after) has the advantage that you can choose which mounts it
applies to and which it doesn't.
Although, I suppose if you really wanted that behavior, you should be
mounting with "nosharecache" anyway?

> --
> Trond Myklebust
> Linux NFS client maintainer, PrimaryData
> trond.myklebust@primarydata.com

Thanks,
Joshua Watt
On Thu, Nov 09 2017, Joshua Watt wrote:

> On Wed, 2017-11-08 at 23:52 +0000, Trond Myklebust wrote:
>> On Thu, 2017-11-09 at 09:34 +1100, NeilBrown wrote:
>> > On Wed, Nov 08 2017, J. Bruce Fields wrote:
>> >
>> > > I'd be nervous about making "umount -f" do it.  I think
>> > > administrators could be unpleasantly surprised in some cases if
>> > > an "umount -f" affects other mounts of the same server.
>> >
>> > I was all set to tell you that it already does, but then tested and
>> > found it doesn't and ....
>
> I tried mounting two different remote paths from an NFS4 server, and
> when I did 'remount,retrans=0', it changed the parameter for both of
> them, meaning they are indeed sharing the struct nfs_server (which
> based on my reading of the code is what I would have expected).  What
> procedure did you use to test this?

I was using nosharecache, because I was thinking "of course it will
affect all mounts that use sharecache, that is really the same as a
bind-mount".  But I should have been explicit.

With nosharecache, "umount -f" or remount only affects the one mount.
With sharecache (the default) or bind mounts, "umount -f" and remount
affect the underlying superblock, which might be mounted at multiple
places.

NeilBrown
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index c9d24bae3025..ced12fcec349 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -2210,27 +2210,39 @@ static int nfs_validate_text_mount_data(void *options,
 	~(NFS_MOUNT_UNSHARED | NFS_MOUNT_NORESVPORT))
 
 static int
-nfs_compare_remount_data(struct nfs_server *nfss,
-			 struct nfs_parsed_mount_data *data)
+nfs_compare_and_set_remount_data(struct nfs_server *nfss,
+				 struct nfs_parsed_mount_data *data)
 {
 	if ((data->flags ^ nfss->flags) & NFS_REMOUNT_CMP_FLAGMASK ||
 	    data->rsize != nfss->rsize ||
 	    data->wsize != nfss->wsize ||
 	    data->version != nfss->nfs_client->rpc_ops->version ||
 	    data->minorversion != nfss->nfs_client->cl_minorversion ||
-	    data->retrans != nfss->client->cl_timeout->to_retries ||
 	    !nfs_auth_info_match(&data->auth_info, nfss->client->cl_auth->au_flavor) ||
 	    data->acregmin != nfss->acregmin / HZ ||
 	    data->acregmax != nfss->acregmax / HZ ||
 	    data->acdirmin != nfss->acdirmin / HZ ||
 	    data->acdirmax != nfss->acdirmax / HZ ||
-	    data->timeo != (10U * nfss->client->cl_timeout->to_initval / HZ) ||
 	    data->nfs_server.port != nfss->port ||
 	    data->nfs_server.addrlen != nfss->nfs_client->cl_addrlen ||
 	    !rpc_cmp_addr((struct sockaddr *)&data->nfs_server.address,
 			  (struct sockaddr *)&nfss->nfs_client->cl_addr))
 		return -EINVAL;
 
+	if (data->retrans != nfss->client->cl_timeout->to_retries ||
+	    data->timeo != (10U * nfss->client->cl_timeout->to_initval / HZ)) {
+		/* Note that this will affect all mounts from the same server,
+		 * that use the same protocol.  The timeouts are always forced
+		 * to be the same.
+		 */
+		struct rpc_clnt *cl = nfss->client;
+		if (cl->cl_timeout != &cl->cl_timeout_default)
+			memcpy(&cl->cl_timeout_default, cl->cl_timeout,
+			       sizeof(struct rpc_timeout));
+		cl->cl_timeout_default.to_retries = data->retrans;
+		cl->cl_timeout_default.to_initval = data->timeo * HZ / 10U;
+	}
+
 	return 0;
 }
 
@@ -2244,7 +2256,8 @@ nfs_remount(struct super_block *sb, int *flags, char *raw_data)
 	struct nfs4_mount_data *options4 = (struct nfs4_mount_data *)raw_data;
 	u32 nfsvers = nfss->nfs_client->rpc_ops->version;
 
-	sync_filesystem(sb);
+	if (sb->s_readonly_remount)
+		sync_filesystem(sb);
 
 	/*
 	 * Userspace mount programs that send binary options generally send
 	 * them populated with default values. We have no way to know which
@@ -2295,7 +2308,7 @@ nfs_remount(struct super_block *sb, int *flags, char *raw_data)
 		*flags |= MS_SYNCHRONOUS;
 
 	/* compare new mount options with old ones */
-	error = nfs_compare_remount_data(nfss, data);
+	error = nfs_compare_and_set_remount_data(nfss, data);
 out:
 	kfree(data);
 	return error;