NFS: fix umount of pnfs filesystems

Message ID	1306891804-8070-1-git-send-email-dros@netapp.com (mailing list archive)
State	New, archived
Headers	show Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by demeter2.kernel.org (8.14.4/8.14.3) with ESMTP id p511USJp026912 for <patchwork-linux-nfs@patchwork.kernel.org>; Wed, 1 Jun 2011 01:30:28 GMT From: Weston Andros Adamson <dros@netapp.com> To: trond@netapp.com Cc: linux-nfs@vger.kernel.org, Weston Andros Adamson <dros@netapp.com> Subject: [PATCH] NFS: fix umount of pnfs filesystems Date: Tue, 31 May 2011 21:30:04 -0400 Message-Id: <1306891804-8070-1-git-send-email-dros@netapp.com> Sender: linux-nfs-owner@vger.kernel.org Precedence: bulk

Message ID

1306891804-8070-1-git-send-email-dros@netapp.com (mailing list archive)

State

New, archived

Headers

From: Weston Andros Adamson <dros@netapp.com>
To: trond@netapp.com
Cc: linux-nfs@vger.kernel.org, Weston Andros Adamson <dros@netapp.com>
Subject: [PATCH] NFS: fix umount of pnfs filesystems
Date: Tue, 31 May 2011 21:30:04 -0400
Message-Id: <1306891804-8070-1-git-send-email-dros@netapp.com>
Sender: linux-nfs-owner@vger.kernel.org
Precedence: bulk

Commit Message

Weston Andros Adamson June 1, 2011, 1:30 a.m. UTC

Unmounting a pnfs filesystem hangs using filelayout and possibly others.
This fixes the use of the rcu protected node by making use of a new 'tmpnode'
for the temporary purge list. Also, the spinlock shouldn't be held when calling
synchronize_rcu().

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
---
Fix proposed by Trond

 fs/nfs/pnfs.h     |    1 +
 fs/nfs/pnfs_dev.c |   11 +++++++----
 2 files changed, 8 insertions(+), 4 deletions(-)

Comments

Boaz Harrosh June 1, 2011, 6:38 a.m. UTC | #1

On 06/01/2011 04:30 AM, Weston Andros Adamson wrote:
> Unmounting a pnfs filesystem hangs using filelayout and possibly others.
> This fixes the use of the rcu protected node by making use of a new 'tmpnode'
> for the temporary purge list. Also, the spinlock shouldn't be held when calling
> synchronize_rcu().
> 

I like the new code, but I have two questions:

* Why didn't I see this hang? (Maybe because I run uni-processor)
* How do you have this problem. Usually with the regular usage of device-cache
  all deviceids get released when layouts go away. So by the time client_purge comes
  in there are no more devices in the cache. At objects-ld I take an extra ref at
  first add_dev which is then get released in client_purge?

Thanks
Boaz

> Signed-off-by: Weston Andros Adamson <dros@netapp.com>
> ---
> Fix proposed by Trond
> 
>  fs/nfs/pnfs.h     |    1 +
>  fs/nfs/pnfs_dev.c |   11 +++++++----
>  2 files changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
> index 48d0a8e..96bf4e6 100644
> --- a/fs/nfs/pnfs.h
> +++ b/fs/nfs/pnfs.h
> @@ -186,6 +186,7 @@ int pnfs_ld_read_done(struct nfs_read_data *);
>  /* pnfs_dev.c */
>  struct nfs4_deviceid_node {
>  	struct hlist_node		node;
> +	struct hlist_node		tmpnode;
>  	const struct pnfs_layoutdriver_type *ld;
>  	const struct nfs_client		*nfs_client;
>  	struct nfs4_deviceid		deviceid;
> diff --git a/fs/nfs/pnfs_dev.c b/fs/nfs/pnfs_dev.c
> index c65e133..81e642f 100644
> --- a/fs/nfs/pnfs_dev.c
> +++ b/fs/nfs/pnfs_dev.c
> @@ -174,6 +174,7 @@ nfs4_init_deviceid_node(struct nfs4_deviceid_node *d,
>  			const struct nfs4_deviceid *id)
>  {
>  	INIT_HLIST_NODE(&d->node);
> +	INIT_HLIST_NODE(&d->tmpnode);
>  	d->ld = ld;
>  	d->nfs_client = nfs_client;
>  	d->deviceid = *id;
> @@ -241,21 +242,25 @@ _deviceid_purge_client(const struct nfs_client *clp, long hash)
>  	struct hlist_node *n, *next;
>  	HLIST_HEAD(tmp);
>  
> +	spin_lock(&nfs4_deviceid_lock);
>  	rcu_read_lock();
>  	hlist_for_each_entry_rcu(d, n, &nfs4_deviceid_cache[hash], node)
>  		if (d->nfs_client == clp && atomic_read(&d->ref)) {
>  			hlist_del_init_rcu(&d->node);
> -			hlist_add_head(&d->node, &tmp);
> +			hlist_add_head(&d->tmpnode, &tmp);
>  		}
>  	rcu_read_unlock();
> +	spin_unlock(&nfs4_deviceid_lock);
>  
>  	if (hlist_empty(&tmp))
>  		return;
>  
>  	synchronize_rcu();
> -	hlist_for_each_entry_safe(d, n, next, &tmp, node)
> +	while (!hlist_empty(&tmp)) {
>  		if (atomic_dec_and_test(&d->ref))
>  			d->ld->free_deviceid_node(d);
> +		hlist_del_init(&d->tmpnode);
> +	}
>  }
>  
>  void
> @@ -263,8 +268,6 @@ nfs4_deviceid_purge_client(const struct nfs_client *clp)
>  {
>  	long h;
>  
> -	spin_lock(&nfs4_deviceid_lock);
>  	for (h = 0; h < NFS4_DEVICE_ID_HASH_SIZE; h++)
>  		_deviceid_purge_client(clp, h);
> -	spin_unlock(&nfs4_deviceid_lock);
>  }

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Weston Andros Adamson June 1, 2011, 2:39 p.m. UTC | #2

On Jun 1, 2011, at 2:38 AM, Boaz Harrosh wrote:

> On 06/01/2011 04:30 AM, Weston Andros Adamson wrote:
>> Unmounting a pnfs filesystem hangs using filelayout and possibly others.
>> This fixes the use of the rcu protected node by making use of a new 'tmpnode'
>> for the temporary purge list. Also, the spinlock shouldn't be held when calling
>> synchronize_rcu().
>> 
> 
> I like the new code, but I have two questions:
> 
> * Why didn't I see this hang? (Maybe because I run uni-processor)
> * How do you have this problem. Usually with the regular usage of device-cache
>  all deviceids get released when layouts go away. So by the time client_purge comes
>  in there are no more devices in the cache. At objects-ld I take an extra ref at
>  first add_dev which is then get released in client_purge?
> 
> Thanks
> Boaz

Yeah, I saw the hangs on a vmware guest with two processors.

I believe it looked like this (I can revert and test again if you want the full traces):

1) Umount process calls client_purge, calls synchronize_rcu while holding lock
2) another process notices that it should call client_purge, spins trying acquire lock

With the file layout, there are definitely devices in the cache when umount is called.
Note, I could only reproduce this by: mounting remote fs, write one byte to a file on remote fs, umount.
If I get rid of the "write one byte" step, there is no hang.

-dros--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 48d0a8e..96bf4e6 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -186,6 +186,7 @@  int pnfs_ld_read_done(struct nfs_read_data *);
 /* pnfs_dev.c */
 struct nfs4_deviceid_node {
 	struct hlist_node		node;
+	struct hlist_node		tmpnode;
 	const struct pnfs_layoutdriver_type *ld;
 	const struct nfs_client		*nfs_client;
 	struct nfs4_deviceid		deviceid;
diff --git a/fs/nfs/pnfs_dev.c b/fs/nfs/pnfs_dev.c
index c65e133..81e642f 100644
--- a/fs/nfs/pnfs_dev.c
+++ b/fs/nfs/pnfs_dev.c
@@ -174,6 +174,7 @@  nfs4_init_deviceid_node(struct nfs4_deviceid_node *d,
 			const struct nfs4_deviceid *id)
 {
 	INIT_HLIST_NODE(&d->node);
+	INIT_HLIST_NODE(&d->tmpnode);
 	d->ld = ld;
 	d->nfs_client = nfs_client;
 	d->deviceid = *id;
@@ -241,21 +242,25 @@  _deviceid_purge_client(const struct nfs_client *clp, long hash)
 	struct hlist_node *n, *next;
 	HLIST_HEAD(tmp);
 
+	spin_lock(&nfs4_deviceid_lock);
 	rcu_read_lock();
 	hlist_for_each_entry_rcu(d, n, &nfs4_deviceid_cache[hash], node)
 		if (d->nfs_client == clp && atomic_read(&d->ref)) {
 			hlist_del_init_rcu(&d->node);
-			hlist_add_head(&d->node, &tmp);
+			hlist_add_head(&d->tmpnode, &tmp);
 		}
 	rcu_read_unlock();
+	spin_unlock(&nfs4_deviceid_lock);
 
 	if (hlist_empty(&tmp))
 		return;
 
 	synchronize_rcu();
-	hlist_for_each_entry_safe(d, n, next, &tmp, node)
+	while (!hlist_empty(&tmp)) {
 		if (atomic_dec_and_test(&d->ref))
 			d->ld->free_deviceid_node(d);
+		hlist_del_init(&d->tmpnode);
+	}
 }
 
 void
@@ -263,8 +268,6 @@  nfs4_deviceid_purge_client(const struct nfs_client *clp)
 {
 	long h;
 
-	spin_lock(&nfs4_deviceid_lock);
 	for (h = 0; h < NFS4_DEVICE_ID_HASH_SIZE; h++)
 		_deviceid_purge_client(clp, h);
-	spin_unlock(&nfs4_deviceid_lock);
 }

NFS: fix umount of pnfs filesystems

Commit Message

Comments

Patch