[V2,1/1] NFSv4.1: remove pnfs_layout_hdr from pnfs_destroy_all_layouts tmp_list

Message ID 1305091198-27378-1-git-send-email-andros@netapp.com (mailing list archive)
State New, archived

Commit Message

Andy Adamson May 11, 2011, 5:19 a.m. UTC
From: Andy Adamson <andros@netapp.com>

Prevents an infinite loop as list was never emptied.

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/pnfs.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)
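
The failure mode named in the commit message can be sketched in userspace. This is only an illustration under assumptions: the list helpers are simplified stand-ins for the kernel's <linux/list.h>, and struct layout_hdr, destroy_layout(), drain() and demo_drain() are hypothetical names, not the functions in fs/nfs/pnfs.c. The loop takes the first entry on every pass, so unless something unlinks that entry the list never becomes empty. The max_iters cap exists only so that the sketch terminates.

```c
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	init_list_head(e);
}

#define list_first_entry(head, type, member) \
	((type *)((char *)(head)->next - offsetof(type, member)))

/* Hypothetical stand-in for struct pnfs_layout_hdr. */
struct layout_hdr {
	unsigned long ino;
	struct list_head layouts;
};

/* Hypothetical per-layout teardown that does NOT unlink the entry. */
static void destroy_layout(struct layout_hdr *lo) { (void)lo; }

/*
 * Drain tmp_list. With unlink_first == 0 the first entry stays on the
 * list, so only the max_iters cap (added for this sketch) stops the
 * loop. Returns the number of passes made.
 */
static int drain(struct list_head *tmp_list, int unlink_first, int max_iters)
{
	int iters = 0;

	while (!list_empty(tmp_list) && iters < max_iters) {
		struct layout_hdr *lo =
			list_first_entry(tmp_list, struct layout_hdr, layouts);

		if (unlink_first)
			list_del_init(&lo->layouts);
		destroy_layout(lo);
		iters++;
	}
	return iters;
}

/* Build a two-entry list and drain it; returns the pass count. */
int demo_drain(int unlink_first)
{
	struct list_head tmp_list;
	struct layout_hdr a = { .ino = 9 }, b = { .ino = 10 };

	init_list_head(&tmp_list);
	list_add_tail(&a.layouts, &tmp_list);
	list_add_tail(&b.layouts, &tmp_list);
	return drain(&tmp_list, unlink_first, 1000);
}
```

With unlinking, demo_drain(1) finishes after one pass per entry; without it, demo_drain(0) runs until the artificial cap is reached.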

Comments

Vitaliy Gusev May 14, 2011, 10:50 p.m. UTC | #1
On 01/-10/-28163 10:59 PM, Andy Adamson wrote:
> From: Andy Adamson<andros@netapp.com>
>
> Prevents an infinite loop as list was never emptied.
>
> Signed-off-by: Andy Adamson<andros@netapp.com>
> +++ b/fs/nfs/pnfs.c
> @@ -383,6 +383,7 @@ pnfs_destroy_all_layouts(struct nfs_client *clp)
>   				plh_layouts);
>   		dprintk("%s freeing layout for inode %lu\n", __func__,
>   			lo->plh_inode->i_ino);
> +		list_del_init(&lo->plh_layouts);
>   		pnfs_destroy_layout(NFS_I(lo->plh_inode));

Shouldn't pnfs_destroy_layout() do it?

See:

   pnfs_destroy_layout(struct nfs_inode *nfsi)
   {
   	struct pnfs_layout_hdr *lo;
   	LIST_HEAD(tmp_list);

   	spin_lock(&nfsi->vfs_inode.i_lock);
   	lo = nfsi->layout;
    ^^^^^^^^^^^^^^^^^^^^
Here is our "lo".


   	if (lo) {
   		lo->plh_block_lgets++; /* permanently block new LAYOUTGETs */
   		mark_matching_lsegs_invalid(lo, &tmp_list, IOMODE_ANY);
   	}
   	spin_unlock(&nfsi->vfs_inode.i_lock);
   	pnfs_free_lseg_list(&tmp_list);
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
It does cleanup and deletes lo from the list.


I think the real problem lies deeper. I looked at the debug messages:

    [  701.210784] put_lseg: lseg ffff88002cc36108 ref 5721 valid 1
    [  701.463495] pnfs_destroy_all_layouts freeing layout for inode 9
    [  701.465382] mark_matching_lsegs_invalid:Begin lo ffff880031b35ed8
    [  701.467172] mark_matching_lsegs_invalid: freeing lseg ffff88002cc36108 iomode 2 offset 0 length 18446744073709551615

    [  701.470401] mark_lseg_invalid: lseg ffff88002cc36108 ref 5720
    [  701.472071] mark_matching_lsegs_invalid:Return 1
    [  701.473623] pnfs_destroy_all_layouts freeing layout for inode 9
    [  701.475302] mark_matching_lsegs_invalid:Begin lo ffff880031b35ed8
    [  701.476981] mark_matching_lsegs_invalid: freeing lseg ffff88002cc36108 iomode 2 offset 0 length 18446744073709551615
    [  701.480549] mark_matching_lsegs_invalid:Return 1

    [  701.482136] pnfs_destroy_all_layouts freeing layout for inode 9
    [  701.483802] mark_matching_lsegs_invalid:Begin lo ffff880031b35ed8
    [  701.485461] mark_matching_lsegs_invalid: freeing lseg ffff88002cc36108 iomode 2 offset 0 length 18446744073709551615
    [  701.488598] mark_matching_lsegs_invalid:Return 1   ...


"Return 1" shows that mark_lseg_invalid() did nothing on the later calls, because the first call had already marked the segment as invalid.

Also note that the segment's reference count is still 5720 after the last put. I suspect that a leaked reference count is the real cause of the infinite loop.
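
The reasoning above can be modelled with a small sketch. It is a simplified model, not the kernel code: struct seg, put_seg(), mark_seg_invalid() and demo_leak() are hypothetical names, and the only behaviour assumed is what the log suggests, namely that the first invalidation clears the valid flag and drops one reference, and that a segment is freed only when its count reaches zero.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a layout segment. */
struct seg {
	bool valid;     /* models the "valid" flag printed by put_lseg */
	int refcount;   /* models the "ref" value in the log */
	bool freed;     /* set once the last reference is dropped */
};

/* Drop one reference; the segment is freed only when the count hits zero. */
static bool put_seg(struct seg *s)
{
	if (--s->refcount == 0) {
		s->freed = true;
		return true;
	}
	return false;
}

/*
 * Invalidate a segment: only the first call clears the flag and drops
 * the reference held on behalf of the list. Returns 1 if the segment is
 * still alive afterwards (it could not be freed), 0 if it was freed.
 */
static int mark_seg_invalid(struct seg *s)
{
	if (s->valid) {
		s->valid = false;
		put_seg(s);
	}
	return s->freed ? 0 : 1;
}

/*
 * Invalidate the same segment `passes` times. Returns the remaining
 * refcount if the segment survived, or -1 if it was freed.
 */
int demo_leak(int initial_refs, int passes)
{
	struct seg s = { .valid = true, .refcount = initial_refs, .freed = false };
	int remaining = 0;

	for (int i = 0; i < passes; i++)
		remaining = mark_seg_invalid(&s);
	return remaining ? s.refcount : -1;
}
```

Starting from 5721 references, every pass after the first leaves the count at 5720 and reports one surviving segment, which matches the repeated "Return 1" lines; starting from a single reference, the first pass frees the segment.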


---
Thanks,
Vitaliy Gusev
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andy Adamson May 16, 2011, 5:44 p.m. UTC | #2
On May 14, 2011, at 6:50 PM, Vitaliy Gusev wrote:

> On 01/-10/-28163 10:59 PM, Andy Adamson wrote:
>> From: Andy Adamson<andros@netapp.com>
>> 
>> Prevents an infinite loop as list was never emptied.
>> 
>> Signed-off-by: Andy Adamson<andros@netapp.com>
>> +++ b/fs/nfs/pnfs.c
>> @@ -383,6 +383,7 @@ pnfs_destroy_all_layouts(struct nfs_client *clp)
>>  				plh_layouts);
>>  		dprintk("%s freeing layout for inode %lu\n", __func__,
>>  			lo->plh_inode->i_ino);
>> +		list_del_init(&lo->plh_layouts);
>>  		pnfs_destroy_layout(NFS_I(lo->plh_inode));
> 
> Shouldn't pnfs_destroy_layout() do it ?

pnfs_destroy_layout can't do it. The list is local to pnfs_destroy_all_layouts.  It's confusing because both pnfs_destroy_layout and pnfs_destroy_all_layouts have a local tmp_list used for different purposes.
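
A minimal userspace sketch of the distinction drawn here, under stated assumptions: the list helpers are simplified stand-ins for <linux/list.h>, and struct layout_hdr, struct layout_seg, destroy_segments() and demo_two_lists() are hypothetical names whose fields only echo the kernel's. One temporary list collects layout headers through one link member, while a second, unrelated temporary list collects segments through another. Whether segment teardown also unlinks the header, as argued later in the thread, is not modelled.

```c
/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	init_list_head(e);
}

/* Hypothetical stand-ins for the layout header and layout segment. */
struct layout_hdr {
	struct list_head layouts;  /* links the header into a list of headers */
	struct list_head segs;     /* head of this header's segment list */
};

struct layout_seg {
	struct list_head list;     /* links the segment into a list of segments */
};

/* Move every segment of `lo` onto a local list, then empty that list. */
static void destroy_segments(struct layout_hdr *lo)
{
	struct list_head seg_tmp;

	init_list_head(&seg_tmp);
	while (!list_empty(&lo->segs)) {
		struct list_head *n = lo->segs.next;

		list_del_init(n);
		list_add_tail(n, &seg_tmp);
	}
	while (!list_empty(&seg_tmp))
		list_del_init(seg_tmp.next);
}

/* Returns 1 if the header is still on the caller's list afterwards. */
int demo_two_lists(void)
{
	struct list_head hdr_tmp;
	struct layout_hdr lo;
	struct layout_seg s1, s2;

	init_list_head(&hdr_tmp);
	init_list_head(&lo.segs);
	list_add_tail(&lo.layouts, &hdr_tmp);
	list_add_tail(&s1.list, &lo.segs);
	list_add_tail(&s2.list, &lo.segs);

	destroy_segments(&lo);

	/* Segments are gone, but nothing touched the header's own link. */
	return list_empty(&lo.segs) && hdr_tmp.next == &lo.layouts;
}
```

In this model, emptying the segment list leaves the header on the caller's list, so the caller has to unlink it explicitly.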

-->Andy


> 
> Really see:
> 
>  pnfs_destroy_layout(struct nfs_inode *nfsi)
>  {
>  	struct pnfs_layout_hdr *lo;
>  	LIST_HEAD(tmp_list);
> 
>  	spin_lock(&nfsi->vfs_inode.i_lock);
>  	lo = nfsi->layout;
>   ^^^^^^^^^^^^^^^^^^^^
> Here is our "lo".
> 
> 
>  	if (lo) {
>  		lo->plh_block_lgets++; /* permanently block new LAYOUTGETs */
>  		mark_matching_lsegs_invalid(lo, &tmp_list, IOMODE_ANY);
>  	}
>  	spin_unlock(&nfsi->vfs_inode.i_lock);
>  	pnfs_free_lseg_list(&tmp_list);
>  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> It does cleanup and deletes lo from the list.

> 
> 
> I think the real problem lies deeper. I looked at the debug messages:
> 
>   [  701.210784] put_lseg: lseg ffff88002cc36108 ref 5721 valid 1
>   [  701.463495] pnfs_destroy_all_layouts freeing layout for inode 9
>   [  701.465382] mark_matching_lsegs_invalid:Begin lo ffff880031b35ed8
>   [  701.467172] mark_matching_lsegs_invalid: freeing lseg ffff88002cc36108 iomode 2 offset 0 length 18446744073709551615
> 
> 
>   [  701.470401] mark_lseg_invalid: lseg ffff88002cc36108 ref 5720
>   [  701.472071] mark_matching_lsegs_invalid:Return 1
>   [  701.473623] pnfs_destroy_all_layouts freeing layout for inode 9
>   [  701.475302] mark_matching_lsegs_invalid:Begin lo ffff880031b35ed8
>   [  701.476981] mark_matching_lsegs_invalid: freeing lseg ffff88002cc36108 iomode 2 offset 0 length 18446744073709551615
>   [  701.480549] mark_matching_lsegs_invalid:Return 1
> 
> 
>   [  701.482136] pnfs_destroy_all_layouts freeing layout for inode 9
>   [  701.483802] mark_matching_lsegs_invalid:Begin lo ffff880031b35ed8
>   [  701.485461] mark_matching_lsegs_invalid: freeing lseg ffff88002cc36108 iomode 2 offset 0 length 18446744073709551615
>   [  701.488598] mark_matching_lsegs_invalid:Return 1   ...
> 
> 
> "Return 1" shows that mark_lseg_invalid() did nothing on the later calls, because the first call had already marked the segment as invalid.
> 
> Also note that the segment's reference count is still 5720 after the last put. I suspect that a leaked reference count is the real cause of the infinite loop.
> 
> 
> ---
> Thanks,
> Vitaliy Gusev

Vitaliy Gusev May 16, 2011, 7:39 p.m. UTC | #3
On 05/16/2011 09:44 PM, Andy Adamson wrote:
>
> On May 14, 2011, at 6:50 PM, Vitaliy Gusev wrote:
>
>> On 01/-10/-28163 10:59 PM, Andy Adamson wrote:
>>> From: Andy Adamson<andros@netapp.com>
>>>
>>> Prevents an infinite loop as list was never emptied.
>>>
>>> Signed-off-by: Andy Adamson<andros@netapp.com>
>>> +++ b/fs/nfs/pnfs.c
>>> @@ -383,6 +383,7 @@ pnfs_destroy_all_layouts(struct nfs_client *clp)
>>>   				plh_layouts);
>>>   		dprintk("%s freeing layout for inode %lu\n", __func__,
>>>   			lo->plh_inode->i_ino);
>>> +		list_del_init(&lo->plh_layouts);
>>>   		pnfs_destroy_layout(NFS_I(lo->plh_inode));
>>
>> Shouldn't pnfs_destroy_layout() do it ?
>
> pnfs_destroy_layout can't do it. The list is local to pnfs_destroy_all_layouts.
> It's confusing because both pnfs_destroy_layout and pnfs_destroy_all_layouts have
> a local tmp_list used for different purposes.

Yes, the purposes are different, but "lo" is the same, and the list_del_init() in pnfs_free_lseg_list() operates on the same "lo" that came from pnfs_destroy_all_layouts.


  pnfs_destroy_layout(struct nfs_inode *nfsi):

       lo = nfsi->layout;

    >>> after that mark_matching_lsegs_invalid adds "lo" to tmplist_v2

	
  pnfs_free_lseg_list(tmplist_v2):

	lo = list_first_entry(free_me, struct pnfs_layout_segment,
			      pls_list)->pls_layout;

   ^^^^^^^^^^^^^^^^^^^^^^^^^^^
   Here is "lo" from pnfs_destroy_all_layouts.


	if (test_bit(NFS_LAYOUT_DESTROYED, &lo->plh_flags)) {
		struct nfs_client *clp;

		clp = NFS_SERVER(lo->plh_inode)->nfs_client;
		spin_lock(&clp->cl_lock);
		list_del_init(&lo->plh_layouts);
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   Here "lo", which was already removed in pnfs_destroy_all_layouts, is removed a second time.

So if everything works as intended, the entry is removed from the list twice.

I suppose that either the current scheme has to be changed, or list_del_init() should be removed from pnfs_destroy_all_layouts and mark_matching_lsegs_invalid and the reference counting fixed instead.
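
For reference, a userspace sketch of what a second list_del_init() does to the list structure itself. The helpers are simplified stand-ins for <linux/list.h> and demo_double_del() is a hypothetical name. After the first call the node points to itself, so a second call only rewrites the node's own pointers. The sketch shows structural safety only; it makes no claim about locking or about which function should own the removal.

```c
/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink the entry and leave it self-linked, as list_del_init() does. */
static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	init_list_head(e);
}

/* Returns 1 if removing the same node twice leaves the list intact. */
int demo_double_del(void)
{
	struct list_head head, a, b;

	init_list_head(&head);
	list_add_tail(&a, &head);
	list_add_tail(&b, &head);

	list_del_init(&a);
	list_del_init(&a);  /* a is self-linked here, so only a is touched */

	return head.next == &b && head.prev == &b &&
	       b.next == &head && b.prev == &head &&
	       list_empty(&a);
}
```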


---
Gusev Vitaliy
Andy Adamson May 17, 2011, 3:29 p.m. UTC | #4
On May 16, 2011, at 3:39 PM, Vitaliy Gusev wrote:

> On 05/16/2011 09:44 PM, Andy Adamson wrote:
>> 
>> On May 14, 2011, at 6:50 PM, Vitaliy Gusev wrote:
>> 
>>> On 01/-10/-28163 10:59 PM, Andy Adamson wrote:
>>>> From: Andy Adamson<andros@netapp.com>
>>>> 
>>>> Prevents an infinite loop as list was never emptied.
>>>> 
>>>> Signed-off-by: Andy Adamson<andros@netapp.com>
>>>> +++ b/fs/nfs/pnfs.c
>>>> @@ -383,6 +383,7 @@ pnfs_destroy_all_layouts(struct nfs_client *clp)
>>>>  				plh_layouts);
>>>>  		dprintk("%s freeing layout for inode %lu\n", __func__,
>>>>  			lo->plh_inode->i_ino);
>>>> +		list_del_init(&lo->plh_layouts);
>>>>  		pnfs_destroy_layout(NFS_I(lo->plh_inode));
>>> 
>>> Shouldn't pnfs_destroy_layout() do it ?
>> 
>> pnfs_destroy_layout can't do it. The list is local to pnfs_destroy_all_layouts.
>> It's confusing because both pnfs_destroy_layout and pnfs_destroy_all_layouts have
>> a local tmp_list used for different purposes.
> 
> Yes purposes are different but "lo" is the same and list_del_init() in pnfs_free_lseg_list() see the same "lo" from pnfs_destroy_all_layouts.
> 
> 
> pnfs_destroy_layout(struct nfs_inode *nfsi):
> 
>      lo = nfsi->layout;
> 
>   >>> after that mark_matching_lsegs_invalid adds "lo" to tmplist_v2
> 
> 	
> pnfs_free_lseg_list(tmplist_v2):
> 
> 	lo = list_first_entry(free_me, struct pnfs_layout_segment,
> 			      pls_list)->pls_layout;
> 
>  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
>  Here is "lo" from pnfs_destroy_all_layouts.
> 
> 
> 	if (test_bit(NFS_LAYOUT_DESTROYED, &lo->plh_flags)) {
> 		struct nfs_client *clp;
> 
> 		clp = NFS_SERVER(lo->plh_inode)->nfs_client;
> 		spin_lock(&clp->cl_lock);
> 		list_del_init(&lo->plh_layouts);
>      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>  Here "lo", which was already removed in pnfs_destroy_all_layouts, is removed a second time.
> 
> So if everything works as intended, the entry is removed from the list twice.
> 
> I suppose that either the current scheme has to be changed, or list_del_init() should be removed from pnfs_destroy_all_layouts and mark_matching_lsegs_invalid and the reference counting fixed instead.

There is a lot more to do to recover from an MDS lease expiration, which is the only situation in which pnfs_destroy_all_layouts is called. Our current code does not go far enough. We need to stop using any layouts and deviceIDs (and data servers) obtained under the old clientid.

Thanks for your review. I'll post a solution and look forward to your comments.

-->Andy

> 
> 
> ---
> Gusev Vitaliy

Patch

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index ff681ab..65455f5 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -383,6 +383,7 @@  pnfs_destroy_all_layouts(struct nfs_client *clp)
 				plh_layouts);
 		dprintk("%s freeing layout for inode %lu\n", __func__,
 			lo->plh_inode->i_ino);
+		list_del_init(&lo->plh_layouts);
 		pnfs_destroy_layout(NFS_I(lo->plh_inode));
 	}
 }