diff mbox

nfs: fix inifinite loop at nfs4_layoutcommit_release

Message ID F19688880B763E40B28B2B462677FBF805C3299C9C@MX09A.corp.emc.com (mailing list archive)
State New, archived
Headers show

Commit Message

tao.peng@emc.com Sept. 8, 2011, 10 a.m. UTC
Hi, Gusev,

> -----Original Message-----
> From: Vitaliy Gusev [mailto:gusev.vitaliy@nexenta.com]
> Sent: Wednesday, September 07, 2011 6:14 AM
> To: Trond Myklebust
> Cc: Vitaliy Gusev; Peng, Tao; linux-nfs@vger.kernel.org
> Subject: Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release
> 
> >> @@ -1376,7 +1376,8 @@ static void pnfs_list_write_lseg(struct inode *inode,
> struct list_head *listp)
> >>
> >>   	list_for_each_entry(lseg,&NFS_I(inode)->layout->plh_segs, pls_list) {
> >>   		if (lseg->pls_range.iomode == IOMODE_RW&&
> >> -		    test_bit(NFS_LSEG_LAYOUTCOMMIT,&lseg->pls_flags))
> >> +		    test_bit(NFS_LSEG_LAYOUTCOMMIT,&lseg->pls_flags)&&
> >> +		    list_empty(&lseg->pls_lc_list))
> >>   			list_add(&lseg->pls_lc_list, listp);
> >>   	}
> >>   }
> >
> > If the lseg is already part of one layoutcommit, but we're sending a
> > second one for the same range (presumably because we wrote more data in
> > the same region), then the above causes the lseg to be excluded.
> 
> 
> Yes, lseg is excluded, This patch does exactly only exclusion of lseg.
> lseg is used here only to get/put reference on this lseg, so skipping is
> correct.
> 
> 
> However, checking on list_empty can occur (on other CPU) in the middle:
> 
> 	list_del_init(&lseg->pls_lc_list);
> Here >>>>>>
> 	if (test_and_clear_bit(NFS_LSEG_LAYOUTCOMMIT,
> 			       &lseg->pls_flags))
> 		put_lseg(lseg);
> 
> 
> So list_del_init must be executed under the same lock as
> pnfs_list_write_lseg, i.e. inode->i_lock.
Yes, you are right. How about following patch?

From 14c6da67565fb31c2d2775ccefd93251f348910d Mon Sep 17 00:00:00 2001
From: Peng Tao <bergwolf@gmail.com>
Date: Thu, 8 Sep 2011 00:57:02 -0400
Subject: [PATCH] nfsv4: fix race in layoutcommit lseg list create/free

Since there can be more than one layoutcommit proc happen the same time,
lseg list create/free should be protected. Otherwise lseg list
may get corrupted.

Reported-by: Vitaliy Gusev <gusev.vitaliy@nexenta.com>
Signed-off-by: Peng Tao <peng_tao@emc.com>
---
 fs/nfs/nfs4proc.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

Comments

Vitaliy Gusev Sept. 8, 2011, 1:02 p.m. UTC | #1
On 09/08/2011 02:00 PM, tao.peng@emc.com wrote:
> Hi, Gusev,

Hello Tao!

> Yes, you are right. How about following patch?
>
>  From 14c6da67565fb31c2d2775ccefd93251f348910d Mon Sep 17 00:00:00 2001
> From: Peng Tao<bergwolf@gmail.com>
> Date: Thu, 8 Sep 2011 00:57:02 -0400
> Subject: [PATCH] nfsv4: fix race in layoutcommit lseg list create/free
>
> Since there can be more than one layoutcommit proc happen the same time,
> lseg list create/free should be protected. Otherwise lseg list
> may get corrupted.
>
> Reported-by: Vitaliy Gusev<gusev.vitaliy@nexenta.com>
> Signed-off-by: Peng Tao<peng_tao@emc.com>
> ---
>   fs/nfs/nfs4proc.c |    2 ++
>   1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 8c77039..da7c20c 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -5964,6 +5964,7 @@ static void nfs4_layoutcommit_release(void *calldata)
>   	struct pnfs_layout_segment *lseg, *tmp;
>
>   	pnfs_cleanup_layoutcommit(data);
> +	spin_lock(&data->args.inode->i_lock);

I think lock over list_del_init(&lseg->pls_lc_list) is enough, because

1. here lseg is deleted from unique (placed in stack) data and there is 
no any race during deletion.


2. Only one thing must be protected - list_empty status at 
pnfs_list_write_lseg (after my patch).

    The problem is that list_del_init is placed before 
test_and_clear_bit and spinlock can be some kind of barrier for 
protection against adding lseg to new data->lseg_list at
pnfs_list_write_lseg.

    Do reordering list_del_init with test_and_clear_bit is not good, 
because lseg can be invalid after put_lseg.


>   	/* Matched by references in pnfs_set_layoutcommit */
>   	list_for_each_entry_safe(lseg, tmp,&data->lseg_list, pls_lc_list) {
>   		list_del_init(&lseg->pls_lc_list);
> @@ -5971,6 +5972,7 @@ static void nfs4_layoutcommit_release(void *calldata)
>   				&lseg->pls_flags))
>   			put_lseg(lseg);
>   	}
> +	spin_unlock(&data->args.inode->i_lock);
>   	put_rpccred(data->cred);
>   	kfree(data);
>   }

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peng Tao Sept. 8, 2011, 3:09 p.m. UTC | #2
Hi, Gusev,
On Thu, Sep 8, 2011 at 9:02 PM, Vitaliy Gusev <gusev.vitaliy@nexenta.com> wrote:
> On 09/08/2011 02:00 PM, tao.peng@emc.com wrote:
>>
>> Hi, Gusev,
>
> Hello Tao!
>
>> Yes, you are right. How about following patch?
>>
>>  From 14c6da67565fb31c2d2775ccefd93251f348910d Mon Sep 17 00:00:00 2001
>> From: Peng Tao<bergwolf@gmail.com>
>> Date: Thu, 8 Sep 2011 00:57:02 -0400
>> Subject: [PATCH] nfsv4: fix race in layoutcommit lseg list create/free
>>
>> Since there can be more than one layoutcommit proc happen the same time,
>> lseg list create/free should be protected. Otherwise lseg list
>> may get corrupted.
>>
>> Reported-by: Vitaliy Gusev<gusev.vitaliy@nexenta.com>
>> Signed-off-by: Peng Tao<peng_tao@emc.com>
>> ---
>>  fs/nfs/nfs4proc.c |    2 ++
>>  1 files changed, 2 insertions(+), 0 deletions(-)
>>
>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>> index 8c77039..da7c20c 100644
>> --- a/fs/nfs/nfs4proc.c
>> +++ b/fs/nfs/nfs4proc.c
>> @@ -5964,6 +5964,7 @@ static void nfs4_layoutcommit_release(void
>> *calldata)
>>        struct pnfs_layout_segment *lseg, *tmp;
>>
>>        pnfs_cleanup_layoutcommit(data);
>> +       spin_lock(&data->args.inode->i_lock);
>
> I think lock over list_del_init(&lseg->pls_lc_list) is enough, because
I put the spinlock outside of the loop because the critical section is
short enough and it should be faster than grabbing/dropping the inode
lock for every entry in the list, agree?

Thanks,
Tao

>
> 1. here lseg is deleted from unique (placed in stack) data and there is no
> any race during deletion.
>
>
> 2. Only one thing must be protected - list_empty status at
> pnfs_list_write_lseg (after my patch).
>
>   The problem is that list_del_init is placed before test_and_clear_bit and
> spinlock can be some kind of barrier for protection against adding lseg to
> new data->lseg_list at
> pnfs_list_write_lseg.
>
>   Do reordering list_del_init with test_and_clear_bit is not good, because
> lseg can be invalid after put_lseg.
>
>
>>        /* Matched by references in pnfs_set_layoutcommit */
>>        list_for_each_entry_safe(lseg, tmp,&data->lseg_list, pls_lc_list) {
>>                list_del_init(&lseg->pls_lc_list);
>> @@ -5971,6 +5972,7 @@ static void nfs4_layoutcommit_release(void
>> *calldata)
>>                                &lseg->pls_flags))
>>                        put_lseg(lseg);
>>        }
>> +       spin_unlock(&data->args.inode->i_lock);
>>        put_rpccred(data->cred);
>>        kfree(data);
>>  }
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
diff mbox

Patch

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 8c77039..da7c20c 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -5964,6 +5964,7 @@  static void nfs4_layoutcommit_release(void *calldata)
 	struct pnfs_layout_segment *lseg, *tmp;
 
 	pnfs_cleanup_layoutcommit(data);
+	spin_lock(&data->args.inode->i_lock);
 	/* Matched by references in pnfs_set_layoutcommit */
 	list_for_each_entry_safe(lseg, tmp, &data->lseg_list, pls_lc_list) {
 		list_del_init(&lseg->pls_lc_list);
@@ -5971,6 +5972,7 @@  static void nfs4_layoutcommit_release(void *calldata)
 				       &lseg->pls_flags))
 			put_lseg(lseg);
 	}
+	spin_unlock(&data->args.inode->i_lock);
 	put_rpccred(data->cred);
 	kfree(data);
 }