Message ID: 87fu9ph2g7.fsf@notabene.neil.brown.name (mailing list archive)
State: New, archived
On Wed, 2017-11-08 at 14:30 +1100, NeilBrown wrote:
> What do people think of the following as an approach
> to Joshua's need?
>
> It isn't complete by itself: it needs a couple of changes to
> nfs-utils so that it doesn't stat the mountpoint on remount,
> and it might need another kernel change so that the "mount" system
> call performs the same sort of careful lookup for remount as the umount
> system call does, but those are relatively small details.

Yeah, that'd be good.

> This is the patch that you will either love or hate.
>
> With this patch, Joshua (or any other sysadmin) could:
>
>     mount -o remount,retrans=0,timeo=1 /path
>
> and then new requests on any mountpoint from that server will time out
> quickly.
> Then
>
>     umount -f /path
>     umount -f /path
>
> should kill off any existing requests that use the old timeout.  (I just
> tested and I did need this twice to kill off an "ls -l".
> The first was an 'open' system call, the next a 'newfstat'.  I wonder
> why the getattr for fstat didn't use the new timeout...)
>
> Thoughts?
> NeilBrown
>
> diff --git a/fs/nfs/super.c b/fs/nfs/super.c
> index c9d24bae3025..ced12fcec349 100644
> --- a/fs/nfs/super.c
> +++ b/fs/nfs/super.c
> @@ -2210,27 +2210,39 @@ static int nfs_validate_text_mount_data(void *options,
>  	~(NFS_MOUNT_UNSHARED | NFS_MOUNT_NORESVPORT))
>  
>  static int
> -nfs_compare_remount_data(struct nfs_server *nfss,
> -			 struct nfs_parsed_mount_data *data)
> +nfs_compare_and_set_remount_data(struct nfs_server *nfss,
> +				 struct nfs_parsed_mount_data *data)
>  {
>  	if ((data->flags ^ nfss->flags) & NFS_REMOUNT_CMP_FLAGMASK ||
>  	    data->rsize != nfss->rsize ||
>  	    data->wsize != nfss->wsize ||
>  	    data->version != nfss->nfs_client->rpc_ops->version ||
>  	    data->minorversion != nfss->nfs_client->cl_minorversion ||
> -	    data->retrans != nfss->client->cl_timeout->to_retries ||
>  	    !nfs_auth_info_match(&data->auth_info, nfss->client->cl_auth->au_flavor) ||
>  	    data->acregmin != nfss->acregmin / HZ ||
>  	    data->acregmax != nfss->acregmax / HZ ||
>  	    data->acdirmin != nfss->acdirmin / HZ ||
>  	    data->acdirmax != nfss->acdirmax / HZ ||
> -	    data->timeo != (10U * nfss->client->cl_timeout->to_initval / HZ) ||
>  	    data->nfs_server.port != nfss->port ||
>  	    data->nfs_server.addrlen != nfss->nfs_client->cl_addrlen ||
>  	    !rpc_cmp_addr((struct sockaddr *)&data->nfs_server.address,
>  			  (struct sockaddr *)&nfss->nfs_client->cl_addr))
>  		return -EINVAL;
>  
> +	if (data->retrans != nfss->client->cl_timeout->to_retries ||
> +	    data->timeo != (10U * nfss->client->cl_timeout->to_initval / HZ)) {
> +		/* Note that this will affect all mounts from the same server,
> +		 * that use the same protocol.  The timeouts are always forced
> +		 * to be the same.
> +		 */
> +		struct rpc_clnt *cl = nfss->client;
> +		if (cl->cl_timeout != &cl->cl_timeout_default)
> +			memcpy(&cl->cl_timeout_default, cl->cl_timeout,
> +			       sizeof(struct rpc_timeout));
> +		cl->cl_timeout_default.to_retries = data->retrans;
> +		cl->cl_timeout_default.to_initval = data->timeo * HZ / 10U;
> +	}
> +

No objection to allowing more mount options to be changed on remount; we
really ought to allow a lot more options to be changed on remount if we
can reasonably pull it off.

>  	return 0;
>  }
>  
> @@ -2244,7 +2256,8 @@ nfs_remount(struct super_block *sb, int *flags, char *raw_data)
>  	struct nfs4_mount_data *options4 = (struct nfs4_mount_data *)raw_data;
>  	u32 nfsvers = nfss->nfs_client->rpc_ops->version;
>  
> -	sync_filesystem(sb);
> +	if (sb->s_readonly_remount)
> +		sync_filesystem(sb);
>  
>  	/*
>  	 * Userspace mount programs that send binary options generally send
> @@ -2295,7 +2308,7 @@ nfs_remount(struct super_block *sb, int *flags, char *raw_data)
>  		*flags |= MS_SYNCHRONOUS;
>  
>  	/* compare new mount options with old ones */
> -	error = nfs_compare_remount_data(nfss, data);
> +	error = nfs_compare_and_set_remount_data(nfss, data);
>  out:
>  	kfree(data);
>  	return error;

Looks like a reasonable approach overall to preventing new RPCs from
being dispatched once the "force" umount runs.

I do wonder if this ought to be more automatic when you specify -f on
the umount.  Having to manually do a remount first doesn't seem very
admin-friendly.
On Wed, Nov 08, 2017 at 07:08:25AM -0500, Jeff Layton wrote:
> On Wed, 2017-11-08 at 14:30 +1100, NeilBrown wrote:
> > What do people think of the following as an approach
> > to Joshua's need?
> >
> > It isn't complete by itself: it needs a couple of changes to
> > nfs-utils so that it doesn't stat the mountpoint on remount,
> > and it might need another kernel change so that the "mount" system
> > call performs the same sort of careful lookup for remount as the umount
> > system call does, but those are relatively small details.
>
> Yeah, that'd be good.
>
> > This is the patch that you will either love or hate.
> >
> > With this patch, Joshua (or any other sysadmin) could:
> >
> >     mount -o remount,retrans=0,timeo=1 /path
> >
> > and then new requests on any mountpoint from that server will timeout
> > quickly.
> > Then
> >     umount -f /path
> >     umount -f /path
...
> Looks like a reasonable approach overall to preventing new RPCs from
> being dispatched once the "force" umount runs.

I've lost track of the discussion--after this patch, how close are we to
a guaranteed force unmount?  I assume there are still a few obstacles.

> I do wonder if this ought to be more automatic when you specify -f on
> the umount.  Having to manually do a remount first doesn't seem very
> admin-friendly.

It's an odd interface.  Maybe we could wrap it in something more
intuitive.

I'd be nervous about making "umount -f" do it.  I think administrators
could be unpleasantly surprised in some cases if an "umount -f" affects
other mounts of the same server.

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Nov 08 2017, J. Bruce Fields wrote:

> On Wed, Nov 08, 2017 at 07:08:25AM -0500, Jeff Layton wrote:
>> On Wed, 2017-11-08 at 14:30 +1100, NeilBrown wrote:
>> > What do people think of the following as an approach
>> > to Joshua's need?
>> >
>> > It isn't complete by itself: it needs a couple of changes to
>> > nfs-utils so that it doesn't stat the mountpoint on remount,
>> > and it might need another kernel change so that the "mount" system
>> > call performs the same sort of careful lookup for remount as the umount
>> > system call does, but those are relatively small details.
>>
>> Yeah, that'd be good.
>>
>> > This is the patch that you will either love or hate.
>> >
>> > With this patch, Joshua (or any other sysadmin) could:
>> >
>> >     mount -o remount,retrans=0,timeo=1 /path
>> >
>> > and then new requests on any mountpoint from that server will timeout
>> > quickly.
>> > Then
>> >     umount -f /path
>> >     umount -f /path
> ...
>> Looks like a reasonable approach overall to preventing new RPCs from
>> being dispatched once the "force" umount runs.
>
> I've lost track of the discussion--after this patch, how close are we to
> a guaranteed force unmount?  I assume there are still a few obstacles.

This isn't really about forced unmount.
The way forward to forced unmount is:
 - make all waits on NFS be TASK_KILLABLE
 - figure out what happens to dirty data when all processes have been
   killed.

This is about allowing processes to be told that the filesystem is dead
so they can respond (without blocking indefinitely) without necessarily
being killed.
With a local filesystem you can (in some cases) kill the underlying
device and all processes will start getting EIO.  This is providing
similar functionality for NFS.

>> I do wonder if this ought to be more automatic when you specify -f on
>> the umount.  Having to manually do a remount first doesn't seem very
>> admin-friendly.
>
> It's an odd interface.  Maybe we could wrap it in something more
> intuitive.
>
> I'd be nervous about making "umount -f" do it.  I think administrators
> could be unpleasantly surprised in some cases if an "umount -f" affects
> other mounts of the same server.

I was all set to tell you that it already does, but then tested and
found it doesn't and ....

struct nfs_server (which sb->s_fs_info points to) contains

	struct nfs_client *	nfs_client;	/* shared client and NFS4 state */

which is shared between different mounts from the same server, and

	struct rpc_clnt *	client;		/* RPC client handle */

which isn't shared.
struct nfs_client contains

	struct rpc_clnt *	cl_rpcclient;

which server->client is cloned from.

The timeouts that apply to a mount are the ones from server->client, and
so apply only to that mount (I thought they were shared, but that is a
thought from years ago, and maybe it was wrong at the time).
umount_begin aborts all rpcs associated with server->client.

So the 'remount,retrans=0,timeo=1' that I propose would only affect the
one superblock (all bind-mounts of course, including sharecache mounts).

The comment in my code was wrong.

Thanks,
NeilBrown
On Thu, 2017-11-09 at 09:34 +1100, NeilBrown wrote:
> On Wed, Nov 08 2017, J. Bruce Fields wrote:
> > On Wed, Nov 08, 2017 at 07:08:25AM -0500, Jeff Layton wrote:
> > > On Wed, 2017-11-08 at 14:30 +1100, NeilBrown wrote:
> > > > What do people think of the following as an approach to Joshua's
> > > > need?
> > > >
> > > > It isn't complete by itself: it needs a couple of changes to
> > > > nfs-utils so that it doesn't stat the mountpoint on remount, and
> > > > it might need another kernel change so that the "mount" system
> > > > call performs the same sort of careful lookup for remount as the
> > > > umount system call does, but those are relatively small details.
> > >
> > > Yeah, that'd be good.
> > >
> > > > This is the patch that you will either love or hate.
> > > >
> > > > With this patch, Joshua (or any other sysadmin) could:
> > > >
> > > >     mount -o remount,retrans=0,timeo=1 /path
> > > >
> > > > and then new requests on any mountpoint from that server will
> > > > timeout quickly.
> > > > Then
> > > >     umount -f /path
> > > >     umount -f /path
> >
> > ...
> > > Looks like a reasonable approach overall to preventing new RPCs
> > > from being dispatched once the "force" umount runs.
> >
> > I've lost track of the discussion--after this patch, how close are
> > we to a guaranteed force unmount?  I assume there are still a few
> > obstacles.
>
> This isn't really about forced unmount.
> The way forward to forced unmount is:
>  - make all waits on NFS be TASK_KILLABLE
>  - figure out what happens to dirty data when all processes have been
>    killed.
>
> This is about allowing processes to be told that the filesystem is
> dead so they can respond (without blocking indefinitely) without
> necessarily being killed.
> With a local filesystem you can (in some cases) kill the underlying
> device and all processes will start getting EIO.  This is providing
> similar functionality for NFS.
>
> > > I do wonder if this ought to be more automatic when you specify
> > > -f on the umount.  Having to manually do a remount first doesn't
> > > seem very admin-friendly.
> >
> > It's an odd interface.  Maybe we could wrap it in something more
> > intuitive.
> >
> > I'd be nervous about making "umount -f" do it.  I think
> > administrators could be unpleasantly surprised in some cases if an
> > "umount -f" affects other mounts of the same server.
>
> I was all set to tell you that it already does, but then tested and
> found it doesn't and ....
>
> struct nfs_server (which sb->s_fs_info points to) contains
>
> 	struct nfs_client *	nfs_client;	/* shared client and NFS4 state */
>
> which is shared between different mounts from the same server, and
>
> 	struct rpc_clnt *	client;		/* RPC client handle */
>
> which isn't shared.
> struct nfs_client contains
>
> 	struct rpc_clnt *	cl_rpcclient;
>
> which server->client is cloned from.
>
> The timeouts that apply to a mount are the ones from server->client,
> and so apply only to that mount (I thought they were shared, but that
> is a thought from years ago, and maybe it was wrong at the time).
> umount_begin aborts all rpcs associated with server->client.
>
> So the 'remount,retrans=0,timeo=1' that I propose would only affect
> the one superblock (all bind-mounts of course, including sharecache
> mounts).
>
> The comment in my code was wrong.

I by far prefer an operation that changes the superblock state to be
done using 'mount -o remount'.  The idea of a 'umount -f' that makes the
superblock irreversibly change state is clearly not desirable in an
environment where the same superblock could be bind mounted and/or
mounted in multiple private namespaces.

IOW: 'remount,retrans=0,timeo=1' would be slightly preferable to hacking
up "umount -f" because it is reversible.
That said, there is no reason why we couldn't implement a single mount
option that implements something akin to this "umount -f" behaviour (and
that can be reversed by resetting it through a second 'remount').  It
seems to me that the requested behaviour is already pretty close to what
we already implement with the RPC_TASK_SOFTCONN option.

--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com
On Wed, 2017-11-08 at 23:52 +0000, Trond Myklebust wrote:
> On Thu, 2017-11-09 at 09:34 +1100, NeilBrown wrote:
> > On Wed, Nov 08 2017, J. Bruce Fields wrote:
> > > On Wed, Nov 08, 2017 at 07:08:25AM -0500, Jeff Layton wrote:
> > > > On Wed, 2017-11-08 at 14:30 +1100, NeilBrown wrote:
> > > > > What do people think of the following as an approach to
> > > > > Joshua's need?
> > > > >
> > > > > It isn't complete by itself: it needs a couple of changes to
> > > > > nfs-utils so that it doesn't stat the mountpoint on remount,
> > > > > and it might need another kernel change so that the "mount"
> > > > > system call performs the same sort of careful lookup for
> > > > > remount as the umount system call does, but those are
> > > > > relatively small details.
> > > >
> > > > Yeah, that'd be good.
> > > >
> > > > > This is the patch that you will either love or hate.
> > > > >
> > > > > With this patch, Joshua (or any other sysadmin) could:
> > > > >
> > > > >     mount -o remount,retrans=0,timeo=1 /path
> > > > >
> > > > > and then new requests on any mountpoint from that server will
> > > > > timeout quickly.
> > > > > Then
> > > > >     umount -f /path
> > > > >     umount -f /path
> > >
> > > ...
> > > > Looks like a reasonable approach overall to preventing new RPCs
> > > > from being dispatched once the "force" umount runs.
> > >
> > > I've lost track of the discussion--after this patch, how close
> > > are we to a guaranteed force unmount?  I assume there are still a
> > > few obstacles.
> >
> > This isn't really about forced unmount.
> > The way forward to forced unmount is:
> >  - make all waits on NFS be TASK_KILLABLE
> >  - figure out what happens to dirty data when all processes have
> >    been killed.
> >
> > This is about allowing processes to be told that the filesystem is
> > dead so they can respond (without blocking indefinitely) without
> > necessarily being killed.
> > With a local filesystem you can (in some cases) kill the underlying
> > device and all processes will start getting EIO.  This is providing
> > similar functionality for NFS.
> >
> > > > I do wonder if this ought to be more automatic when you specify
> > > > -f on the umount.  Having to manually do a remount first doesn't
> > > > seem very admin-friendly.
> > >
> > > It's an odd interface.  Maybe we could wrap it in something more
> > > intuitive.
> > >
> > > I'd be nervous about making "umount -f" do it.  I think
> > > administrators could be unpleasantly surprised in some cases if
> > > an "umount -f" affects other mounts of the same server.
> >
> > I was all set to tell you that it already does, but then tested and
> > found it doesn't and ....

I tried mounting two different remote paths from an NFS4 server, and
when I did 'remount,retrans=0', it changed the parameter for both of
them, meaning they are indeed sharing the struct nfs_server (which based
on my reading of the code is what I would have expected).  What
procedure did you use to test this?

> >
> > struct nfs_server (which sb->s_fs_info points to) contains
> >
> > 	struct nfs_client *	nfs_client;	/* shared client and NFS4 state */
> >
> > which is shared between different mounts from the same server, and
> >
> > 	struct rpc_clnt *	client;		/* RPC client handle */
> >
> > which isn't shared.
> > struct nfs_client contains
> >
> > 	struct rpc_clnt *	cl_rpcclient;
> >
> > which server->client is cloned from.
> >
> > The timeouts that apply to a mount are the ones from server->client,
> > and so apply only to that mount (I thought they were shared, but
> > that is a thought from years ago, and maybe it was wrong at the
> > time).
> > umount_begin aborts all rpcs associated with server->client.
> >
> > So the 'remount,retrans=0,timeo=1' that I propose would only affect
> > the one superblock (all bind-mounts of course, including sharecache
> > mounts).
> >
> > The comment in my code was wrong.
>
> I by far prefer an operation that changes the superblock state to be
> done using 'mount -o remount'.  The idea of a 'umount -f' that makes
> the superblock irreversibly change state is clearly not desirable in
> an environment where the same superblock could be bind mounted and/or
> mounted in multiple private namespaces.
>
> IOW: 'remount,retrans=0,timeo=1' would be slightly preferable to
> hacking up "umount -f" because it is reversible.
>
> That said, there is no reason why we couldn't implement a single mount
> option that implements something akin to this "umount -f" behaviour
> (and that can be reversed by resetting it through a second 'remount').
> It seems to me that the requested behaviour is already pretty close to
> what we already implement with the RPC_TASK_SOFTCONN option.

I *think* my patch series pretty much does just that (not via
RPC_TASK_SOFTCONN).  I might have broken the thread by posting with the
wrong parent... anyway you can find them here:
http://www.spinics.net/lists/linux-nfs/msg66348.html

I added support for toggling the forcekill flag via remount, which I
think might be close to what you are looking for, e.g. if you mount
without the forcekill option, umount -f does what it always has.  You
can then "-o remount,forcekill" and umount -f will cause all RPC tasks
to fail with -EIO.

The number of patches has grown, so I currently have this on a GitHub
branch (including Neil's patch for testing):
https://github.com/JPEWdev/linux/tree/sunrpc-kill, but I can send them
to the list if that would be helpful.

Using a changeable mount option is fine, but it does mean that the
option again applies to all mounts from the same server (at least as far
as I can understand: I could be wrong and Neil could be right - see
above).  Choosing the behavior at mount time (and not allowing it to
change after) has the advantage that you can choose which mounts it
applies to and which it doesn't.
Although, I suppose if you really wanted that behavior, you should be
mounting with "nosharecache" anyway?

> --
> Trond Myklebust
> Linux NFS client maintainer, PrimaryData
> trond.myklebust@primarydata.com

Thanks,
Joshua Watt
On Thu, Nov 09 2017, Joshua Watt wrote:

> On Wed, 2017-11-08 at 23:52 +0000, Trond Myklebust wrote:
>> On Thu, 2017-11-09 at 09:34 +1100, NeilBrown wrote:
>> > On Wed, Nov 08 2017, J. Bruce Fields wrote:
>> >
>> > > I'd be nervous about making "umount -f" do it.  I think
>> > > administrators could be unpleasantly surprised in some cases if
>> > > an "umount -f" affects other mounts of the same server.
>> >
>> > I was all set to tell you that it already does, but then tested and
>> > found it doesn't and ....
>
> I tried mounting two different remote paths from an NFS4 server, and
> when I did 'remount,retrans=0', it changed the parameter for both of
> them, meaning they are indeed sharing the struct nfs_server (which
> based on my reading of the code is what I would have expected).  What
> procedure did you use to test this?

I was using nosharecache, because I was thinking "of course it will
affect all mounts that use sharecache, that is really the same as a
bind-mount".  But I should have been explicit.

With nosharecache, "umount -f" or remount only affects the one mount.
With sharecache (the default) or bind mounts, "umount -f" and remount
affect the underlying superblock, which might be mounted at multiple
places.

NeilBrown
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index c9d24bae3025..ced12fcec349 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -2210,27 +2210,39 @@ static int nfs_validate_text_mount_data(void *options,
 	~(NFS_MOUNT_UNSHARED | NFS_MOUNT_NORESVPORT))
 
 static int
-nfs_compare_remount_data(struct nfs_server *nfss,
-			 struct nfs_parsed_mount_data *data)
+nfs_compare_and_set_remount_data(struct nfs_server *nfss,
+				 struct nfs_parsed_mount_data *data)
 {
 	if ((data->flags ^ nfss->flags) & NFS_REMOUNT_CMP_FLAGMASK ||
 	    data->rsize != nfss->rsize ||
 	    data->wsize != nfss->wsize ||
 	    data->version != nfss->nfs_client->rpc_ops->version ||
 	    data->minorversion != nfss->nfs_client->cl_minorversion ||
-	    data->retrans != nfss->client->cl_timeout->to_retries ||
 	    !nfs_auth_info_match(&data->auth_info, nfss->client->cl_auth->au_flavor) ||
 	    data->acregmin != nfss->acregmin / HZ ||
 	    data->acregmax != nfss->acregmax / HZ ||
 	    data->acdirmin != nfss->acdirmin / HZ ||
 	    data->acdirmax != nfss->acdirmax / HZ ||
-	    data->timeo != (10U * nfss->client->cl_timeout->to_initval / HZ) ||
 	    data->nfs_server.port != nfss->port ||
 	    data->nfs_server.addrlen != nfss->nfs_client->cl_addrlen ||
 	    !rpc_cmp_addr((struct sockaddr *)&data->nfs_server.address,
 			  (struct sockaddr *)&nfss->nfs_client->cl_addr))
 		return -EINVAL;
 
+	if (data->retrans != nfss->client->cl_timeout->to_retries ||
+	    data->timeo != (10U * nfss->client->cl_timeout->to_initval / HZ)) {
+		/* Note that this will affect all mounts from the same server,
+		 * that use the same protocol.  The timeouts are always forced
+		 * to be the same.
+		 */
+		struct rpc_clnt *cl = nfss->client;
+		if (cl->cl_timeout != &cl->cl_timeout_default)
+			memcpy(&cl->cl_timeout_default, cl->cl_timeout,
+			       sizeof(struct rpc_timeout));
+		cl->cl_timeout_default.to_retries = data->retrans;
+		cl->cl_timeout_default.to_initval = data->timeo * HZ / 10U;
+	}
+
 	return 0;
 }
 
@@ -2244,7 +2256,8 @@ nfs_remount(struct super_block *sb, int *flags, char *raw_data)
 	struct nfs4_mount_data *options4 = (struct nfs4_mount_data *)raw_data;
 	u32 nfsvers = nfss->nfs_client->rpc_ops->version;
 
-	sync_filesystem(sb);
+	if (sb->s_readonly_remount)
+		sync_filesystem(sb);
 
 	/*
 	 * Userspace mount programs that send binary options generally send
 	 * them populated with default values. We have no way to know which
@@ -2295,7 +2308,7 @@ nfs_remount(struct super_block *sb, int *flags, char *raw_data)
 		*flags |= MS_SYNCHRONOUS;
 
 	/* compare new mount options with old ones */
-	error = nfs_compare_remount_data(nfss, data);
+	error = nfs_compare_and_set_remount_data(nfss, data);
 out:
 	kfree(data);
 	return error;