
[RFC-PATCH] nfsd: provide a procfs entry to release stateids of a particular local filesystem

Message ID 1567518908-1720-1-git-send-email-alex@zadara.com (mailing list archive)
State New, archived

Commit Message

Alex Lyakas Sept. 3, 2019, 1:55 p.m. UTC
This patch addresses the following issue:
- Create two local file systems FS1 and FS2 on the server machine S.
- Export both FS1 and FS2 through nfsd to the same nfs client, running on client machine C.
- On C, mount both exported file systems and start writing files to both of them.
- After a few minutes, on the server machine S, un-export FS1 only.
- Do not unmount FS1 on the client machine C prior to un-exporting.
- Also, FS2 remains exported to C.
- Now we want to unmount FS1 on the server machine S, but we fail because nfsd still holds open files on FS1.

Debugging this issue showed the following root cause: there is an nfs4_client entry for the client C.
This entry has two nfs4_openowners, for FS1 and FS2, even though FS1 was un-exported.
Looking at the stateids of both openowners, we see that they are of kind NFS4_OPEN_STID,
and that each stateid holds a nfs4_file. The reason we cannot unmount FS1 is that we still
have an openowner for FS1, holding open stateids, which in turn hold open files on FS1.

The laundromat doesn't help here, because it can only decide, per nfs4_client, whether the whole client should be purged.
But in this case, since FS2 is still exported to C, there is no reason to purge the nfs4_client.

This situation remains until we un-export FS2 as well.
Then the whole nfs4_client is purged, and all the files get closed, and we can unmount both FS1 and FS2.

This patch allows user-space to tell nfsd to release stateids of a particular local filesystem.
After that, it is possible to unmount the local filesystem.

This patch is based on kernel 4.14.99, which we currently use.
That's why it is marked as RFC.

Signed-off-by: Alex Lyakas <alex@zadara.com>
---
 fs/nfsd/nfs4state.c | 107 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 fs/nfsd/nfsctl.c    |  46 ++++++++++++++++++++++
 fs/nfsd/state.h     |   2 +
 3 files changed, 154 insertions(+), 1 deletion(-)

Comments

J. Bruce Fields Sept. 6, 2019, 4:12 p.m. UTC | #1
On Tue, Sep 03, 2019 at 04:55:08PM +0300, Alex Lyakas wrote:
> This patch addresses the following issue:

Thanks for the patch and the good explanation!

I'd rather we just call nfs4_release_stateids() from write_unlock_fs().

That modifies the behavior of the existing unlock_filesystem interface,
so, yes, adding a new file would be the more conservative approach.

But really I think that the only use of unlock_filesystem is to allow
unmounting, and the fact that it only handles NLM locks is just a bug
that we should fix.

You'll want to cover delegations as well.  And probably pNFS layouts.
It'd be OK to do that incrementally in followup patches.

Style nits:

I assume all the print statements are just for temporary debugging.

Try to keep lines to 80 characters.  I'd break the inner loop of
nfs4_release_stateids() into a separate nfs4_release_client_stateids(sb,
clp).

--b.

> - Create two local file systems FS1 and FS2 on the server machine S.
> - Export both FS1 and FS2 through nfsd to the same nfs client, running on client machine C.
> - On C, mount both exported file systems and start writing files to both of them.
> - After few minutes, on server machine S, un-export FS1 only.
> - Do not unmount FS1 on the client machine C prior to un-exporting.
> - Also, FS2 remains exported to C.
> - Now we want to unmount FS1 on the server machine S, but we fail, because there are still open files on FS1 held by nfsd.
> 
> Debugging this issue showed the following root cause: there is a nfs4_client entry for the client C.
> This entry has two nfs4_openowners, for FS1 and FS2, although FS1 was un-exported.
> Looking at the stateids of both openowners, we see that they have stateids of kind NFS4_OPEN_STID,
> and each stateid is holding a nfs4_file. The reason we cannot unmount FS1, is because we still have
> an openowner for FS1, holding open-stateids, which hold open files on FS1.
> 
> The laundromat doesn't help in this case, because it can only decide per-nfs4_client that it should be purged.
> But in this case, since FS2 is still exported to C, there is no reason to purge the nfs4_client.
> 
> This situation remains until we un-export FS2 as well.
> Then the whole nfs4_client is purged, and all the files get closed, and we can unmount both FS1 and FS2.
> 
> This patch allows user-space to tell nfsd to release stateids of a particular local filesystem.
> After that, it is possible to unmount the local filesystem.
> 
> This patch is based on kernel 4.14.99, which we currently use.
> That's why marking it as RFC.
> 
> Signed-off-by: Alex Lyakas <alex@zadara.com>
> ---
>  fs/nfsd/nfs4state.c | 107 +++++++++++++++++++++++++++++++++++++++++++++++++++-
>  fs/nfsd/nfsctl.c    |  46 ++++++++++++++++++++++
>  fs/nfsd/state.h     |   2 +
>  3 files changed, 154 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 3cf0b2e..4081753 100755
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -6481,13 +6481,13 @@ struct nfs4_client_reclaim *
>  	return nfs_ok;
>  }
>  
> -#ifdef CONFIG_NFSD_FAULT_INJECTION
>  static inline void
>  put_client(struct nfs4_client *clp)
>  {
>  	atomic_dec(&clp->cl_refcount);
>  }
>  
> +#ifdef CONFIG_NFSD_FAULT_INJECTION
>  static struct nfs4_client *
>  nfsd_find_client(struct sockaddr_storage *addr, size_t addr_size)
>  {
> @@ -6811,6 +6811,7 @@ static u64 nfsd_foreach_client_lock(struct nfs4_client *clp, u64 max,
>  
>  	return count;
>  }
> +#endif /* CONFIG_NFSD_FAULT_INJECTION */
>  
>  static void
>  nfsd_reap_openowners(struct list_head *reaplist)
> @@ -6826,6 +6827,7 @@ static u64 nfsd_foreach_client_lock(struct nfs4_client *clp, u64 max,
>  	}
>  }
>  
> +#ifdef CONFIG_NFSD_FAULT_INJECTION
>  u64
>  nfsd_inject_forget_client_openowners(struct sockaddr_storage *addr,
>  				     size_t addr_size)
> @@ -7072,6 +7074,109 @@ static u64 nfsd_find_all_delegations(struct nfs4_client *clp, u64 max,
>  #endif /* CONFIG_NFSD_FAULT_INJECTION */
>  
>  /*
> + * Attempts to release the stateids that have open files on the specified superblock.
> + */
> +void
> +nfs4_release_stateids(struct super_block * sb)
> +{
> +	struct nfsd_net *nn = net_generic(current->nsproxy->net_ns, nfsd_net_id);
> +	struct nfs4_client *clp = NULL;
> +	struct nfs4_openowner *oo = NULL, *oo_next = NULL;
> +	LIST_HEAD(openowner_reaplist);
> +	unsigned int n_openowners = 0;
> +
> +	if (!nfsd_netns_ready(nn))
> +		return;
> +
> +	pr_info("=== Release stateids for sb=%p ===\n", sb);
> +
> +	spin_lock(&nn->client_lock);
> +	list_for_each_entry(clp, &nn->client_lru, cl_lru) {
> +		char cl_addr[INET6_ADDRSTRLEN] = {'\0'};
> +
> +		rpc_ntop((struct sockaddr*)&clp->cl_addr, cl_addr, sizeof(cl_addr));
> +		pr_debug("Looking at client=%p/%s cl_clientid=%u:%u refcnt=%d\n",
> +			     clp, cl_addr, clp->cl_clientid.cl_boot, clp->cl_clientid.cl_id,
> +			     atomic_read(&clp->cl_refcount));
> +
> +		spin_lock(&clp->cl_lock);
> +		list_for_each_entry_safe(oo, oo_next, &clp->cl_openowners, oo_perclient) {
> +			struct nfs4_ol_stateid *stp = NULL;
> +			bool found_my_sb = false, found_other_sb = false;
> +			struct super_block *other_sb = NULL;
> +
> +			pr_debug(" Openowner %p %.*s\n", oo, oo->oo_owner.so_owner.len, oo->oo_owner.so_owner.data);
> +			pr_debug(" oo_close_lru=%s oo_last_closed_stid=%p refcnt=%d so_is_open_owner=%u\n",
> +				     list_empty(&oo->oo_close_lru) ? "N" : "Y", oo->oo_last_closed_stid,
> +				     atomic_read(&oo->oo_owner.so_count), oo->oo_owner.so_is_open_owner);
> +
> +			list_for_each_entry(stp, &oo->oo_owner.so_stateids, st_perstateowner) {
> +				struct nfs4_file *fp = NULL;
> +				struct file *filp = NULL;
> +				struct super_block *f_sb = NULL;
> +				if (stp->st_stid.sc_file == NULL)
> +					continue;
> +
> +				fp = stp->st_stid.sc_file;
> +				filp = find_any_file(fp);
> +				if (filp != NULL)
> +					f_sb = file_inode(filp)->i_sb;
> +				pr_debug("   filp=%p sb=%p my_sb=%p\n", filp, f_sb, sb);
> +				if (f_sb == sb) {
> +					found_my_sb = true;
> +				} else {
> +					found_other_sb = true;
> +					other_sb = f_sb;
> +				}
> +				if (filp != NULL)
> +					fput(filp);
> +			}
> +
> +			/* openowner does not have files from needed fs, skip it */
> +			if (!found_my_sb)
> +				continue;
> +
> +			/*
> +			 * we do not expect the same openowner to have open files from more than one fs.
> +			 * but if it happens, we cannot release this openowner.
> +			 */
> +			if (found_other_sb) {
> +				pr_warn(" client=%p/%s openowner %p %.*s has files from sb=%p but also from sb=%p, skipping it!\n",
> +					    clp, cl_addr, oo, oo->oo_owner.so_owner.len, oo->oo_owner.so_owner.data, sb, other_sb);
> +				continue;
> +			}
> +
> +			/*
> +			 * Each OPEN stateid holds a refcnt on the openowner (and LOCK stateid holds a refcnt on the lockowner).
> +			 * This refcnt is dropped when nfs4_free_ol_stateid is called, which calls nfs4_put_stateowner.
> +			 * The last refcnt drop, unhashes and frees the openowner.
> +			 * As a result, after we free the last stateid, the openowner will also be freed.
> +			 * But we still need the openowner to be around, because we need to call release_last_closed_stateid(),
> +			 * which is what release_openowner() does (we are doing equivalent of that).
> +			 * So we need to grab an extra refcnt for the openowner here.
> +			 */
> +			nfs4_get_stateowner(&oo->oo_owner);
> +
> +			/* see: nfsd_collect_client_openowners(), nfsd_foreach_client_openowner() */
> +			unhash_openowner_locked(oo);
> +			/*
> +			 * By incrementing cl_refcount under "nn->client_lock" we, hopefully, protect that client from being killed via mark_client_expired_locked().
> +			 * We increment cl_refcount once per each openowner.
> +			 */
> +			atomic_inc(&clp->cl_refcount);
> +			list_add(&oo->oo_perclient, &openowner_reaplist);
> +			++n_openowners;
> +		}
> +		spin_unlock(&clp->cl_lock);
> +	}
> +	spin_unlock(&nn->client_lock);
> +
> +	pr_info("Collected %u openowners for removal (sb=%p)\n", n_openowners, sb);
> +
> +	nfsd_reap_openowners(&openowner_reaplist);
> +}
> +
> +/*
>   * Since the lifetime of a delegation isn't limited to that of an open, a
>   * client may quite reasonably hang on to a delegation as long as it has
>   * the inode cached.  This becomes an obvious problem the first time a
> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> index 4824363..8b38186 100755
> --- a/fs/nfsd/nfsctl.c
> +++ b/fs/nfsd/nfsctl.c
> @@ -37,6 +37,7 @@ enum {
>  	NFSD_Fh,
>  	NFSD_FO_UnlockIP,
>  	NFSD_FO_UnlockFS,
> +	NFSD_FO_ReleaseStateIds,
>  	NFSD_Threads,
>  	NFSD_Pool_Threads,
>  	NFSD_Pool_Stats,
> @@ -64,6 +65,7 @@ enum {
>  static ssize_t write_filehandle(struct file *file, char *buf, size_t size);
>  static ssize_t write_unlock_ip(struct file *file, char *buf, size_t size);
>  static ssize_t write_unlock_fs(struct file *file, char *buf, size_t size);
> +static ssize_t write_release_stateids(struct file *file, char *buf, size_t size);
>  static ssize_t write_threads(struct file *file, char *buf, size_t size);
>  static ssize_t write_pool_threads(struct file *file, char *buf, size_t size);
>  static ssize_t write_versions(struct file *file, char *buf, size_t size);
> @@ -81,6 +83,7 @@ static ssize_t (*write_op[])(struct file *, char *, size_t) = {
>  	[NFSD_Fh] = write_filehandle,
>  	[NFSD_FO_UnlockIP] = write_unlock_ip,
>  	[NFSD_FO_UnlockFS] = write_unlock_fs,
> +	[NFSD_FO_ReleaseStateIds] = write_release_stateids,
>  	[NFSD_Threads] = write_threads,
>  	[NFSD_Pool_Threads] = write_pool_threads,
>  	[NFSD_Versions] = write_versions,
> @@ -328,6 +331,47 @@ static ssize_t write_unlock_fs(struct file *file, char *buf, size_t size)
>  }
>  
>  /**
> + * write_release_stateids - Release stateids of a local file system
> + *
> + * Experimental.
> + *
> + * Input:
> + *			buf:	'\n'-terminated C string containing the
> + *				absolute pathname of a local file system
> + *			size:	length of C string in @buf
> + * Output:
> + *	On success:	returns zero if all openowners were released
> + *	On error:	return code is negative errno value
> + */
> +static ssize_t write_release_stateids(struct file *file, char *buf, size_t size)
> +{
> +	struct path path;
> +	char *fo_path = NULL;
> +	int error = 0;
> +
> +	/* sanity check */
> +	if (size == 0)
> +		return -EINVAL;
> +
> +	if (buf[size-1] != '\n')
> +		return -EINVAL;
> +
> +	fo_path = buf;
> +	error = qword_get(&buf, fo_path, size);
> +	if (error < 0)
> +		return -EINVAL;
> +
> +	error = kern_path(fo_path, 0, &path);
> +	if (error)
> +		return error;
> +
> +	nfs4_release_stateids(path.dentry->d_sb);
> +
> +	path_put(&path);
> +	return 0;
> +}
> +
> +/**
>   * write_filehandle - Get a variable-length NFS file handle by path
>   *
>   * On input, the buffer contains a '\n'-terminated C string comprised of
> @@ -1167,6 +1211,8 @@ static int nfsd_fill_super(struct super_block * sb, void * data, int silent)
>  					&transaction_ops, S_IWUSR|S_IRUSR},
>  		[NFSD_FO_UnlockFS] = {"unlock_filesystem",
>  					&transaction_ops, S_IWUSR|S_IRUSR},
> +		[NFSD_FO_ReleaseStateIds] = {"release_stateids",
> +					&transaction_ops, S_IWUSR|S_IRUSR},
>  		[NFSD_Fh] = {"filehandle", &transaction_ops, S_IWUSR|S_IRUSR},
>  		[NFSD_Threads] = {"threads", &transaction_ops, S_IWUSR|S_IRUSR},
>  		[NFSD_Pool_Threads] = {"pool_threads", &transaction_ops, S_IWUSR|S_IRUSR},
> diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
> index 86aa92d..acee094 100644
> --- a/fs/nfsd/state.h
> +++ b/fs/nfsd/state.h
> @@ -632,6 +632,8 @@ extern struct nfs4_client_reclaim *nfs4_client_to_reclaim(const char *name,
>  							struct nfsd_net *nn);
>  extern bool nfs4_has_reclaimed_state(const char *name, struct nfsd_net *nn);
>  
> +extern void nfs4_release_stateids(struct super_block *sb);
> +
>  struct nfs4_file *find_file(struct knfsd_fh *fh);
>  void put_nfs4_file(struct nfs4_file *fi);
>  static inline void get_nfs4_file(struct nfs4_file *fi)
> -- 
> 1.9.1
Alex Lyakas Sept. 10, 2019, 7 p.m. UTC | #2
Hi Bruce,

Thanks for reviewing the patch.

I addressed your comments, and ran the patch through checkpatch.pl.
Patch v2 is on its way.

On Fri, Sep 6, 2019 at 7:12 PM J. Bruce Fields <bfields@fieldses.org> wrote:
>
> On Tue, Sep 03, 2019 at 04:55:08PM +0300, Alex Lyakas wrote:
> > This patch addresses the following issue:
>
> Thanks for the patch and the good explanation!
>
> I'd rather we just call nfs4_release_stateids() from write_unlock_fs().
>
> That modifies the behavior of the existing unlock_filesystem interface,
> so, yes, adding a new file would be the more conservative approach.
>
> But really I think that the only use of unlock_filesystem is to allow
> unmounting, and the fact that it only handles NLM locks is just a bug
> that we should fix.
>
> You'll want to cover delegations as well.  And probably pNFS layouts.
> It'd be OK to do that incrementally in followup patches.
Unfortunately, I don't have much understanding of what these are, and
how to cover them.
>
> Style nits:
>
> I assume all the print statements are just for temporary debugging.
I left only two prints with pr_info, which give some indication that
the operation has started and whether it did anything useful. The rest
of the prints use pr_debug.
>
> Try to keep lines to 80 characters.  I'd break the inner loop of
> nfs4_release_stateids() into a separate nfs4_release_client_stateids(sb,
> clp).
>
> --b.
>
> > [original patch quoted in full — snipped; see #1 above]
J. Bruce Fields Sept. 10, 2019, 8:25 p.m. UTC | #3
On Tue, Sep 10, 2019 at 10:00:24PM +0300, Alex Lyakas wrote:
> I addressed your comments, and ran the patch through checkpatch.pl.
> Patch v2 is on its way.

Thanks for the revision!  I need to spend the next week or so catching
up on some other review and then I'll get back to this.

For now:

> On Fri, Sep 6, 2019 at 7:12 PM J. Bruce Fields <bfields@fieldses.org> wrote:
> > You'll want to cover delegations as well.  And probably pNFS layouts.
> > It'd be OK to do that incrementally in followup patches.
> Unfortunately, I don't have much understanding of what these are, and
> how to cover them)

Delegations give the client the right to cache files across opens.
I'm a little surprised your patches are working for you without handling
delegations.  There may be something about your environment that's
preventing delegations from being given out.  In the NFSv4.0 case they
require the server to make a TCP connection back to the client, which is
easily blocked by firewalls or NAT.  It might be worth testing with v4.1 or
4.2.

Anyway, so we probably also want to walk the client's dl_perclnt list
and look for matching files.

--b.
John Gallagher Sept. 16, 2019, 10:28 p.m. UTC | #4
On Tue, Sep 3, 2019 at 6:57 AM Alex Lyakas <alex@zadara.com> wrote:
> This patch allows user-space to tell nfsd to release stateids of a particular local filesystem.
> After that, it is possible to unmount the local filesystem.

We recently ran into this exact same issue. A solution along these
lines would be very useful to us as well. I am curious, though, is it
feasible to release all state related to a filesystem immediately when
it is unexported? It seems like that would be ideal from the
perspective of the administrator of the server, but perhaps there are
technical reasons why that isn't easy or even possible.

-John
Chuck Lever Sept. 17, 2019, 2:16 p.m. UTC | #5
> On Sep 16, 2019, at 6:28 PM, John Gallagher <john.gallagher@delphix.com> wrote:
> 
> On Tue, Sep 3, 2019 at 6:57 AM Alex Lyakas <alex@zadara.com> wrote:
>> This patch allows user-space to tell nfsd to release stateids of a particular local filesystem.
>> After that, it is possible to unmount the local filesystem.
> 
> We recently ran into this exact same issue. A solution along these
> lines would be very useful to us as well. I am curious, though, is it
> feasible to release all state related to a filesystem immediately when
> it is unexported? It seems like that would be ideal from the
> perspective of the administrator of the server, but perhaps there are
> technical reasons why that isn't easy or even possible.

My two cents: I was surprised by Bruce's claim that simply unexporting a
local filesystem does not reliably render it unmountable. IMO that behavior
would also be surprising (and possibly inconvenient) to server administrators,
especially in this era of elastic and automated system configuration.


--
Chuck Lever
Alex Lyakas Sept. 22, 2019, 6:52 a.m. UTC | #6
Hi Bruce,

I do see in the code that a delegation stateid also holds an open file on 
the file system. In my experiments, however, the nfs4_client::cl_delegations 
list was always empty. I put an extra print to print a warning if it's not, 
but did not hit this.

Thanks,
Alex.



-----Original Message----- 
From: J. Bruce Fields
Sent: Tuesday, September 10, 2019 11:25 PM
To: Alex Lyakas
Cc: linux-nfs@vger.kernel.org ; Shyam Kaushik
Subject: Re: [RFC-PATCH] nfsd: provide a procfs entry to release stateids of 
a particular local filesystem

On Tue, Sep 10, 2019 at 10:00:24PM +0300, Alex Lyakas wrote:
> I addressed your comments, and ran the patch through checkpatch.pl.
> Patch v2 is on its way.

Thanks for the revision!  I need to spend the next week or so catching
up on some other review and then I'll get back to this.

For now:

> On Fri, Sep 6, 2019 at 7:12 PM J. Bruce Fields <bfields@fieldses.org> 
> wrote:
> > You'll want to cover delegations as well.  And probably pNFS layouts.
> > It'd be OK to do that incrementally in followup patches.
> Unfortunately, I don't have much understanding of what these are, and
> how to cover them)

Delegations give the client the right to cache files across opens.
I'm a little surprised your patches are working for you without handling
delegations.  There may be something about your environment that's
preventing delegations from being given out.  In the NFSv4.0 case they
require the server to make a TCP connection back to the client, which is
easily blocked by firewalls or NAT.  It might be worth testing with v4.1 or
4.2.

Anyway, so we probably also want to walk the client's dl_perclnt list
and look for matching files.

--b.
J. Bruce Fields Sept. 23, 2019, 4:25 p.m. UTC | #7
On Sun, Sep 22, 2019 at 09:52:36AM +0300, Alex Lyakas wrote:
> I do see in the code that a delegation stateid also holds an open
> file on the file system. In my experiments, however, the
> nfs4_client::cl_delegations list was always empty. I put an extra
> print to print a warning if it's not, but did not hit this.

Do you know what version of NFS the clients are using?  (4.0, 4.1, 4.2?)

--b.

Alex Lyakas Sept. 24, 2019, 2:35 p.m. UTC | #8
Hi Bruce,

All client mount points look like:
10.2.7.22:/export/s1 /mnt/s1 nfs4 rw,noatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.2.7.102,local_lock=none,addr=10.2.7.22 0 0

So I believe these are all 4.0. The client and the server are in the same 
subnet, without any NAT or firewalls.

Thanks,
Alex.


Alex Lyakas Dec. 5, 2019, 11:47 a.m. UTC | #9
Hi Bruce,

Have you had a chance to review the V2 of the patch?

Thanks,
Alex.


J. Bruce Fields Dec. 5, 2019, 3:03 p.m. UTC | #10
On Thu, Dec 05, 2019 at 01:47:09PM +0200, Alex Lyakas wrote:
> Hi Bruce,
> 
> Have you had a chance to review the V2 of the patch?

I'll take a quick look.

--b.

diff mbox series

Patch

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 3cf0b2e..4081753 100755
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -6481,13 +6481,13 @@  struct nfs4_client_reclaim *
 	return nfs_ok;
 }
 
-#ifdef CONFIG_NFSD_FAULT_INJECTION
 static inline void
 put_client(struct nfs4_client *clp)
 {
 	atomic_dec(&clp->cl_refcount);
 }
 
+#ifdef CONFIG_NFSD_FAULT_INJECTION
 static struct nfs4_client *
 nfsd_find_client(struct sockaddr_storage *addr, size_t addr_size)
 {
@@ -6811,6 +6811,7 @@  static u64 nfsd_foreach_client_lock(struct nfs4_client *clp, u64 max,
 
 	return count;
 }
+#endif /* CONFIG_NFSD_FAULT_INJECTION */
 
 static void
 nfsd_reap_openowners(struct list_head *reaplist)
@@ -6826,6 +6827,7 @@  static u64 nfsd_foreach_client_lock(struct nfs4_client *clp, u64 max,
 	}
 }
 
+#ifdef CONFIG_NFSD_FAULT_INJECTION
 u64
 nfsd_inject_forget_client_openowners(struct sockaddr_storage *addr,
 				     size_t addr_size)
@@ -7072,6 +7074,109 @@  static u64 nfsd_find_all_delegations(struct nfs4_client *clp, u64 max,
 #endif /* CONFIG_NFSD_FAULT_INJECTION */
 
 /*
+ * Attempts to release the stateids that have open files on the specified superblock.
+ */
+void
+nfs4_release_stateids(struct super_block * sb)
+{
+	struct nfsd_net *nn = net_generic(current->nsproxy->net_ns, nfsd_net_id);
+	struct nfs4_client *clp = NULL;
+	struct nfs4_openowner *oo = NULL, *oo_next = NULL;
+	LIST_HEAD(openowner_reaplist);
+	unsigned int n_openowners = 0;
+
+	if (!nfsd_netns_ready(nn))
+		return;
+
+	pr_info("=== Release stateids for sb=%p ===\n", sb);
+
+	spin_lock(&nn->client_lock);
+	list_for_each_entry(clp, &nn->client_lru, cl_lru) {
+		char cl_addr[INET6_ADDRSTRLEN] = {'\0'};
+
+		rpc_ntop((struct sockaddr*)&clp->cl_addr, cl_addr, sizeof(cl_addr));
+		pr_debug("Looking at client=%p/%s cl_clientid=%u:%u refcnt=%d\n",
+			     clp, cl_addr, clp->cl_clientid.cl_boot, clp->cl_clientid.cl_id,
+			     atomic_read(&clp->cl_refcount));
+
+		spin_lock(&clp->cl_lock);
+		list_for_each_entry_safe(oo, oo_next, &clp->cl_openowners, oo_perclient) {
+			struct nfs4_ol_stateid *stp = NULL;
+			bool found_my_sb = false, found_other_sb = false;
+			struct super_block *other_sb = NULL;
+
+			pr_debug(" Openowner %p %.*s\n", oo, oo->oo_owner.so_owner.len, oo->oo_owner.so_owner.data);
+			pr_debug(" oo_close_lru=%s oo_last_closed_stid=%p refcnt=%d so_is_open_owner=%u\n",
+				     list_empty(&oo->oo_close_lru) ? "N" : "Y", oo->oo_last_closed_stid,
+				     atomic_read(&oo->oo_owner.so_count), oo->oo_owner.so_is_open_owner);
+
+			list_for_each_entry(stp, &oo->oo_owner.so_stateids, st_perstateowner) {
+				struct nfs4_file *fp = NULL;
+				struct file *filp = NULL;
+				struct super_block *f_sb = NULL;
+				if (stp->st_stid.sc_file == NULL)
+					continue;
+
+				fp = stp->st_stid.sc_file;
+				filp = find_any_file(fp);
+				if (filp != NULL)
+					f_sb = file_inode(filp)->i_sb;
+				pr_debug("   filp=%p sb=%p my_sb=%p\n", filp, f_sb, sb);
+				if (f_sb == sb) {
+					found_my_sb = true;
+				} else {
+					found_other_sb = true;
+					other_sb = f_sb;
+				}
+				if (filp != NULL)
+					fput(filp);
+			}
+
+			/* openowner has no files from the target fs, skip it */
+			if (!found_my_sb)
+				continue;
+
+			/*
+			 * We do not expect the same openowner to have open files from more than one fs,
+			 * but if it happens, we cannot release this openowner.
+			 */
+			if (found_other_sb) {
+				pr_warn(" client=%p/%s openowner %p %.*s has files from sb=%p but also from sb=%p, skipping it!\n",
+					    clp, cl_addr, oo, oo->oo_owner.so_owner.len, oo->oo_owner.so_owner.data, sb, other_sb);
+				continue;
+			}
+
+			/*
+			 * Each OPEN stateid holds a refcnt on the openowner (and LOCK stateid holds a refcnt on the lockowner).
+			 * This refcnt is dropped when nfs4_free_ol_stateid is called, which calls nfs4_put_stateowner.
+			 * The last refcnt drop, unhashes and frees the openowner.
+			 * As a result, after we free the last stateid, the openowner will also be freed.
+			 * But we still need the openowner to be around, because we need to call release_last_closed_stateid(),
+			 * which is what release_openowner() does (we are doing the equivalent of that).
+			 * So we need to grab an extra refcnt for the openowner here.
+			 */
+			nfs4_get_stateowner(&oo->oo_owner);
+
+			/* see: nfsd_collect_client_openowners(), nfsd_foreach_client_openowner() */
+			unhash_openowner_locked(oo);
+			/*
+			 * By incrementing cl_refcount under nn->client_lock we, hopefully, protect the client from being killed via mark_client_expired_locked().
+			 * We increment cl_refcount once per each openowner.
+			 */
+			atomic_inc(&clp->cl_refcount);
+			list_add(&oo->oo_perclient, &openowner_reaplist);
+			++n_openowners;
+		}
+		spin_unlock(&clp->cl_lock);
+	}
+	spin_unlock(&nn->client_lock);
+
+	pr_info("Collected %u openowners for removal (sb=%p)\n", n_openowners, sb);
+
+	nfsd_reap_openowners(&openowner_reaplist);
+}
+
+/*
  * Since the lifetime of a delegation isn't limited to that of an open, a
  * client may quite reasonably hang on to a delegation as long as it has
  * the inode cached.  This becomes an obvious problem the first time a
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 4824363..8b38186 100755
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -37,6 +37,7 @@  enum {
 	NFSD_Fh,
 	NFSD_FO_UnlockIP,
 	NFSD_FO_UnlockFS,
+	NFSD_FO_ReleaseStateIds,
 	NFSD_Threads,
 	NFSD_Pool_Threads,
 	NFSD_Pool_Stats,
@@ -64,6 +65,7 @@  enum {
 static ssize_t write_filehandle(struct file *file, char *buf, size_t size);
 static ssize_t write_unlock_ip(struct file *file, char *buf, size_t size);
 static ssize_t write_unlock_fs(struct file *file, char *buf, size_t size);
+static ssize_t write_release_stateids(struct file *file, char *buf, size_t size);
 static ssize_t write_threads(struct file *file, char *buf, size_t size);
 static ssize_t write_pool_threads(struct file *file, char *buf, size_t size);
 static ssize_t write_versions(struct file *file, char *buf, size_t size);
@@ -81,6 +83,7 @@  static ssize_t (*write_op[])(struct file *, char *, size_t) = {
 	[NFSD_Fh] = write_filehandle,
 	[NFSD_FO_UnlockIP] = write_unlock_ip,
 	[NFSD_FO_UnlockFS] = write_unlock_fs,
+	[NFSD_FO_ReleaseStateIds] = write_release_stateids,
 	[NFSD_Threads] = write_threads,
 	[NFSD_Pool_Threads] = write_pool_threads,
 	[NFSD_Versions] = write_versions,
@@ -328,6 +331,47 @@  static ssize_t write_unlock_fs(struct file *file, char *buf, size_t size)
 }
 
 /**
+ * write_release_stateids - Release stateids of a local file system
+ *
+ * Experimental.
+ *
+ * Input:
+ *			buf:	'\n'-terminated C string containing the
+ *				absolute pathname of a local file system
+ *			size:	length of C string in @buf
+ * Output:
+ *	On success:	returns zero if all openowners were released
+ *	On error:	return code is negative errno value
+ */
+static ssize_t write_release_stateids(struct file *file, char *buf, size_t size)
+{
+	struct path path;
+	char *fo_path = NULL;
+	int error = 0;
+
+	/* sanity check */
+	if (size == 0)
+		return -EINVAL;
+
+	if (buf[size-1] != '\n')
+		return -EINVAL;
+
+	fo_path = buf;
+	error = qword_get(&buf, fo_path, size);
+	if (error < 0)
+		return -EINVAL;
+
+	error = kern_path(fo_path, 0, &path);
+	if (error)
+		return error;
+
+	nfs4_release_stateids(path.dentry->d_sb);
+
+	path_put(&path);
+	return 0;
+}
+
+/**
  * write_filehandle - Get a variable-length NFS file handle by path
  *
  * On input, the buffer contains a '\n'-terminated C string comprised of
@@ -1167,6 +1211,8 @@  static int nfsd_fill_super(struct super_block * sb, void * data, int silent)
 					&transaction_ops, S_IWUSR|S_IRUSR},
 		[NFSD_FO_UnlockFS] = {"unlock_filesystem",
 					&transaction_ops, S_IWUSR|S_IRUSR},
+		[NFSD_FO_ReleaseStateIds] = {"release_stateids",
+					&transaction_ops, S_IWUSR|S_IRUSR},
 		[NFSD_Fh] = {"filehandle", &transaction_ops, S_IWUSR|S_IRUSR},
 		[NFSD_Threads] = {"threads", &transaction_ops, S_IWUSR|S_IRUSR},
 		[NFSD_Pool_Threads] = {"pool_threads", &transaction_ops, S_IWUSR|S_IRUSR},
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 86aa92d..acee094 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -632,6 +632,8 @@  extern struct nfs4_client_reclaim *nfs4_client_to_reclaim(const char *name,
 							struct nfsd_net *nn);
 extern bool nfs4_has_reclaimed_state(const char *name, struct nfsd_net *nn);
 
+extern void nfs4_release_stateids(struct super_block *sb);
+
 struct nfs4_file *find_file(struct knfsd_fh *fh);
 void put_nfs4_file(struct nfs4_file *fi);
 static inline void get_nfs4_file(struct nfs4_file *fi)