NFS41: Drop lseg ref before fallthru to MDS

Message ID	1309743002-1658-1-git-send-email-bergwolf@gmail.com (mailing list archive)
State	New, archived
Headers	show Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by demeter2.kernel.org (8.14.4/8.14.4) with ESMTP id p67EEHMa022324 for <patchwork-linux-nfs@patchwork.kernel.org>; Thu, 7 Jul 2011 14:14:17 GMT From: Peng Tao <bergwolf@gmail.com> To: Trond.Myklebust@netapp.com Cc: linux-nfs@vger.kernel.org, Peng Tao <peng_tao@emc.com> Subject: [PATCH] NFS41: Drop lseg ref before fallthru to MDS Date: Sun, 3 Jul 2011 18:30:02 -0700 Message-Id: <1309743002-1658-1-git-send-email-bergwolf@gmail.com> Sender: linux-nfs-owner@vger.kernel.org Precedence: bulk

Peng Tao July 4, 2011, 1:30 a.m. UTC

There is no need to keep lseg reference when read/write through MDS.
This fixes a null pointer crash at nfs_post_op_update_inode_force_wcc
because nfs4_proc_write_setup will unset wdata->res.fattr if wdata->lseg
is not NULL.

Signed-off-by: Peng Tao <peng_tao@emc.com>
---
 fs/nfs/pnfs.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

Benny Halevy July 9, 2011, 2:10 p.m. UTC | #1

On 2011-07-04 04:30, Peng Tao wrote:
> There is no need to keep lseg reference when read/write through MDS.
> This fixes a null pointer crash at nfs_post_op_update_inode_force_wcc
> because nfs4_proc_write_setup will unset wdata->res.fattr if wdata->lseg
> is not NULL.
> 
> Signed-off-by: Peng Tao <peng_tao@emc.com>

Looks good to me.

Benny

> ---
>  fs/nfs/pnfs.c |    6 ++++++
>  1 files changed, 6 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index 30a0394..55fdf02 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -1193,6 +1193,9 @@ pnfs_ld_write_done(struct nfs_write_data *data)
>  
>  	dprintk("%s: pnfs_error=%d, retry via MDS\n", __func__,
>  		data->pnfs_error);
> +
> +	put_lseg(data->lseg);
> +	data->lseg = NULL;
>  	status = nfs_initiate_write(data, NFS_CLIENT(data->inode),
>  				    data->mds_ops, NFS_FILE_SYNC);
>  	return status ? : -EAGAIN;
> @@ -1240,6 +1243,9 @@ pnfs_ld_read_done(struct nfs_read_data *data)
>  
>  	dprintk("%s: pnfs_error=%d, retry via MDS\n", __func__,
>  		data->pnfs_error);
> +
> +	put_lseg(data->lseg);
> +	data->lseg = NULL;
>  	status = nfs_initiate_read(data, NFS_CLIENT(data->inode),
>  				   data->mds_ops);
>  	return status ? : -EAGAIN;
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

tao.peng@emc.com July 20, 2011, 5:52 a.m. UTC | #2

Hi, Trond,

Any comments on this patch? I still get kernel crash when pnfs write is attempted but fails and calls pnfs_ld_write_done(). It seems object layout uses the same code path as well. But I don't find the patch in either your tree or Benny's tree. Are there any concerns?

Thanks,
Tao

> -----Original Message-----
> From: Benny Halevy [mailto:bhalevy@tonian.com]
> Sent: Saturday, July 09, 2011 10:10 PM
> To: Peng Tao
> Cc: Trond.Myklebust@netapp.com; linux-nfs@vger.kernel.org; Peng, Tao
> Subject: Re: [PATCH] NFS41: Drop lseg ref before fallthru to MDS
> 
> On 2011-07-04 04:30, Peng Tao wrote:
> > There is no need to keep lseg reference when read/write through MDS.
> > This fixes a null pointer crash at nfs_post_op_update_inode_force_wcc
> > because nfs4_proc_write_setup will unset wdata->res.fattr if wdata->lseg
> > is not NULL.
> >
> > Signed-off-by: Peng Tao <peng_tao@emc.com>
> 
> Looks good to me.
> 
> Benny
> 
> > ---
> >  fs/nfs/pnfs.c |    6 ++++++
> >  1 files changed, 6 insertions(+), 0 deletions(-)
> >
> > diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> > index 30a0394..55fdf02 100644
> > --- a/fs/nfs/pnfs.c
> > +++ b/fs/nfs/pnfs.c
> > @@ -1193,6 +1193,9 @@ pnfs_ld_write_done(struct nfs_write_data *data)
> >
> >  	dprintk("%s: pnfs_error=%d, retry via MDS\n", __func__,
> >  		data->pnfs_error);
> > +
> > +	put_lseg(data->lseg);
> > +	data->lseg = NULL;
> >  	status = nfs_initiate_write(data, NFS_CLIENT(data->inode),
> >  				    data->mds_ops, NFS_FILE_SYNC);
> >  	return status ? : -EAGAIN;
> > @@ -1240,6 +1243,9 @@ pnfs_ld_read_done(struct nfs_read_data *data)
> >
> >  	dprintk("%s: pnfs_error=%d, retry via MDS\n", __func__,
> >  		data->pnfs_error);
> > +
> > +	put_lseg(data->lseg);
> > +	data->lseg = NULL;
> >  	status = nfs_initiate_read(data, NFS_CLIENT(data->inode),
> >  				   data->mds_ops);
> >  	return status ? : -EAGAIN;

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Trond Myklebust July 25, 2011, 7:13 p.m. UTC | #3

On Wed, 2011-07-20 at 01:52 -0400, tao.peng@emc.com wrote: 
> Hi, Trond,
> 
> Any comments on this patch? I still get kernel crash when pnfs write is attempted but fails and calls pnfs_ld_write_done(). It seems object layout uses the same code path as well. But I don't find the patch in either your tree or Benny's tree. Are there any concerns?
> 
> Thanks,
> Tao

The whole pnfs_ld_write_done thing is bogus and needs to be replaced
with something sane. It is trying to initiate a WRITE RPC call with the
wrong block size, and is calling the MDS rpc_call_done() and
rpc_release() with an uninitialised rpc task pointer.

Ditto for pnfs_ld_read_done.

Cheers
  Trond

> > -----Original Message-----
> > From: Benny Halevy [mailto:bhalevy@tonian.com]
> > Sent: Saturday, July 09, 2011 10:10 PM
> > To: Peng Tao
> > Cc: Trond.Myklebust@netapp.com; linux-nfs@vger.kernel.org; Peng, Tao
> > Subject: Re: [PATCH] NFS41: Drop lseg ref before fallthru to MDS
> > 
> > On 2011-07-04 04:30, Peng Tao wrote:
> > > There is no need to keep lseg reference when read/write through MDS.
> > > This fixes a null pointer crash at nfs_post_op_update_inode_force_wcc
> > > because nfs4_proc_write_setup will unset wdata->res.fattr if wdata->lseg
> > > is not NULL.
> > >
> > > Signed-off-by: Peng Tao <peng_tao@emc.com>
> > 
> > Looks good to me.
> > 
> > Benny
> > 
> > > ---
> > >  fs/nfs/pnfs.c |    6 ++++++
> > >  1 files changed, 6 insertions(+), 0 deletions(-)
> > >
> > > diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> > > index 30a0394..55fdf02 100644
> > > --- a/fs/nfs/pnfs.c
> > > +++ b/fs/nfs/pnfs.c
> > > @@ -1193,6 +1193,9 @@ pnfs_ld_write_done(struct nfs_write_data *data)
> > >
> > >  	dprintk("%s: pnfs_error=%d, retry via MDS\n", __func__,
> > >  		data->pnfs_error);
> > > +
> > > +	put_lseg(data->lseg);
> > > +	data->lseg = NULL;
> > >  	status = nfs_initiate_write(data, NFS_CLIENT(data->inode),
> > >  				    data->mds_ops, NFS_FILE_SYNC);
> > >  	return status ? : -EAGAIN;
> > > @@ -1240,6 +1243,9 @@ pnfs_ld_read_done(struct nfs_read_data *data)
> > >
> > >  	dprintk("%s: pnfs_error=%d, retry via MDS\n", __func__,
> > >  		data->pnfs_error);
> > > +
> > > +	put_lseg(data->lseg);
> > > +	data->lseg = NULL;
> > >  	status = nfs_initiate_read(data, NFS_CLIENT(data->inode),
> > >  				   data->mds_ops);
> > >  	return status ? : -EAGAIN;
>

Jim Rees July 25, 2011, 9:35 p.m. UTC | #4

Trond Myklebust wrote:

  On Wed, 2011-07-20 at 01:52 -0400, tao.peng@emc.com wrote: 
  > Hi, Trond,
  > 
  > Any comments on this patch? I still get kernel crash when pnfs write is attempted but fails and calls pnfs_ld_write_done(). It seems object layout uses the same code path as well. But I don't find the patch in either your tree or Benny's tree. Are there any concerns?
  > 
  > Thanks,
  > Tao

  The whole pnfs_ld_write_done thing is bogus and needs to be replaced
  with something sane. It is trying to initiate a WRITE RPC call with the
  wrong block size, and is calling the MDS rpc_call_done() and
  rpc_release() with an uninitialised rpc task pointer.

  Ditto for pnfs_ld_read_done.

Benny / Boaz, have you run into this problem with object layout?
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Peng Tao July 26, 2011, 3:37 p.m. UTC | #5

Hi, Trond,

On Tue, Jul 26, 2011 at 3:13 AM, Trond Myklebust
<Trond.Myklebust@netapp.com> wrote:
> On Wed, 2011-07-20 at 01:52 -0400, tao.peng@emc.com wrote:
>> Hi, Trond,
>>
>> Any comments on this patch? I still get kernel crash when pnfs write is attempted but fails and calls pnfs_ld_write_done(). It seems object layout uses the same code path as well. But I don't find the patch in either your tree or Benny's tree. Are there any concerns?
>>
>> Thanks,
>> Tao
>
> The whole pnfs_ld_write_done thing is bogus and needs to be replaced
> with something sane. It is trying to initiate a WRITE RPC call with the
> wrong block size, and is calling the MDS rpc_call_done() and
> rpc_release() with an uninitialised rpc task pointer.
>
> Ditto for pnfs_ld_read_done.
Thanks for your explanation. Is there any plan on how to fix
pnfs_ld_read/write_done? Basically, we would need an interface that
can redirect the IO to MDS if pnfs_error is set or do all necessary
cleanup work to end read/write if pnfs_error is 0. IMHO, the
recoalesce logic need to access nfs_pageio_descriptor but we do not
have that information at pnfs_ld_read/write_done.

Best,
Tao

>
> Cheers
>  Trond
>
>> > -----Original Message-----
>> > From: Benny Halevy [mailto:bhalevy@tonian.com]
>> > Sent: Saturday, July 09, 2011 10:10 PM
>> > To: Peng Tao
>> > Cc: Trond.Myklebust@netapp.com; linux-nfs@vger.kernel.org; Peng, Tao
>> > Subject: Re: [PATCH] NFS41: Drop lseg ref before fallthru to MDS
>> >
>> > On 2011-07-04 04:30, Peng Tao wrote:
>> > > There is no need to keep lseg reference when read/write through MDS.
>> > > This fixes a null pointer crash at nfs_post_op_update_inode_force_wcc
>> > > because nfs4_proc_write_setup will unset wdata->res.fattr if wdata->lseg
>> > > is not NULL.
>> > >
>> > > Signed-off-by: Peng Tao <peng_tao@emc.com>
>> >
>> > Looks good to me.
>> >
>> > Benny
>> >
>> > > ---
>> > >  fs/nfs/pnfs.c |    6 ++++++
>> > >  1 files changed, 6 insertions(+), 0 deletions(-)
>> > >
>> > > diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
>> > > index 30a0394..55fdf02 100644
>> > > --- a/fs/nfs/pnfs.c
>> > > +++ b/fs/nfs/pnfs.c
>> > > @@ -1193,6 +1193,9 @@ pnfs_ld_write_done(struct nfs_write_data *data)
>> > >
>> > >   dprintk("%s: pnfs_error=%d, retry via MDS\n", __func__,
>> > >           data->pnfs_error);
>> > > +
>> > > + put_lseg(data->lseg);
>> > > + data->lseg = NULL;
>> > >   status = nfs_initiate_write(data, NFS_CLIENT(data->inode),
>> > >                               data->mds_ops, NFS_FILE_SYNC);
>> > >   return status ? : -EAGAIN;
>> > > @@ -1240,6 +1243,9 @@ pnfs_ld_read_done(struct nfs_read_data *data)
>> > >
>> > >   dprintk("%s: pnfs_error=%d, retry via MDS\n", __func__,
>> > >           data->pnfs_error);
>> > > +
>> > > + put_lseg(data->lseg);
>> > > + data->lseg = NULL;
>> > >   status = nfs_initiate_read(data, NFS_CLIENT(data->inode),
>> > >                              data->mds_ops);
>> > >   return status ? : -EAGAIN;
>>
>
> --
> Trond Myklebust
> Linux NFS client maintainer
>
> NetApp
> Trond.Myklebust@netapp.com
> www.netapp.com
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Jim Rees July 26, 2011, 4:08 p.m. UTC | #6

Myklebust, Trond wrote:

  > Thanks for your explanation. Is there any plan on how to fix
  > pnfs_ld_read/write_done? Basically, we would need an interface that
  > can redirect the IO to MDS if pnfs_error is set or do all necessary
  > cleanup work to end read/write if pnfs_error is 0. IMHO, the
  > recoalesce logic need to access nfs_pageio_descriptor but we do not
  > have that information at pnfs_ld_read/write_done.

  As far as I can see, the right thing to do is to mark the layout as
  invalid and then redirty the page. It should be easy to have fsync()
  re-send the pages in this case. These should be extremely rare events,
  since we expect to catch most of the pNFS failures when we do the actual
  LAYOUTGET in the ->pg_init().

  My main worry is for aio/dio where there is no good mechanism for
  retrying. I'm still working on that...

What do you suggest we do for the current set of patches that add block
layout to pnfs?
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Trond Myklebust July 26, 2011, 4:14 p.m. UTC | #7

> -----Original Message-----
> From: Jim Rees [mailto:rees@umich.edu]
> Sent: Tuesday, July 26, 2011 12:08 PM
> To: Myklebust, Trond
> Cc: Peng Tao; tao.peng@emc.com; linux-nfs@vger.kernel.org;
> bhalevy@tonian.com
> Subject: Re: [PATCH] NFS41: Drop lseg ref before fallthru to MDS
> 
> Myklebust, Trond wrote:
> 
>   > Thanks for your explanation. Is there any plan on how to fix
>   > pnfs_ld_read/write_done? Basically, we would need an interface
that
>   > can redirect the IO to MDS if pnfs_error is set or do all
necessary
>   > cleanup work to end read/write if pnfs_error is 0. IMHO, the
>   > recoalesce logic need to access nfs_pageio_descriptor but we do
not
>   > have that information at pnfs_ld_read/write_done.
> 
>   As far as I can see, the right thing to do is to mark the layout as
>   invalid and then redirty the page. It should be easy to have fsync()
>   re-send the pages in this case. These should be extremely rare
> events,
>   since we expect to catch most of the pNFS failures when we do the
> actual
>   LAYOUTGET in the ->pg_init().
> 
>   My main worry is for aio/dio where there is no good mechanism for
>   retrying. I'm still working on that...
> 
> What do you suggest we do for the current set of patches that add
block
> layout to pnfs?

If you are calling pnfs_ld_read/write_done, then don't change anything:
it is easier to fix this in one spot rather than several.
However someone needs to start working on fixing the code in
pnfs_ld_read/write_done to do the right thing. If nobody else has the
cycles, then I can do that but I'd prefer to have someone who can easily
test the resulting code do it.

Cheers
  Trond
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Benny Halevy July 26, 2011, 4:34 p.m. UTC | #8

On 2011-07-25 15:13, Trond Myklebust wrote:
> On Wed, 2011-07-20 at 01:52 -0400, tao.peng@emc.com wrote: 
>> Hi, Trond,
>>
>> Any comments on this patch? I still get kernel crash when pnfs write is attempted but fails and calls pnfs_ld_write_done(). It seems object layout uses the same code path as well. But I don't find the patch in either your tree or Benny's tree. Are there any concerns?
>>
>> Thanks,
>> Tao
> 
> The whole pnfs_ld_write_done thing is bogus and needs to be replaced
> with something sane. It is trying to initiate a WRITE RPC call with the
> wrong block size,

I was under the impression that your re-coalesce work will take
care of that. Is there anything else that needs to be done?

> and is calling the MDS rpc_call_done() and
> rpc_release() with an uninitialised rpc task pointer.

So on this path there is indeed no active rpc task so we're using the
task structure in the struct nfs_write_data.  I agree that having
a helper function at the rpc layer to initialize it to a meaningful
value indicating there is no active rpc task would be a useful thing.

But the fix Peng sent is for the fallback path where we initiate
I/O to the MDS and we do build a rpc task properly.  On this path
lseg indeed needs to be put and set to NULL.

Benny

> 
> Ditto for pnfs_ld_read_done.
> 
> Cheers
>   Trond
> 
>>> -----Original Message-----
>>> From: Benny Halevy [mailto:bhalevy@tonian.com]
>>> Sent: Saturday, July 09, 2011 10:10 PM
>>> To: Peng Tao
>>> Cc: Trond.Myklebust@netapp.com; linux-nfs@vger.kernel.org; Peng, Tao
>>> Subject: Re: [PATCH] NFS41: Drop lseg ref before fallthru to MDS
>>>
>>> On 2011-07-04 04:30, Peng Tao wrote:
>>>> There is no need to keep lseg reference when read/write through MDS.
>>>> This fixes a null pointer crash at nfs_post_op_update_inode_force_wcc
>>>> because nfs4_proc_write_setup will unset wdata->res.fattr if wdata->lseg
>>>> is not NULL.
>>>>
>>>> Signed-off-by: Peng Tao <peng_tao@emc.com>
>>>
>>> Looks good to me.
>>>
>>> Benny
>>>
>>>> ---
>>>>  fs/nfs/pnfs.c |    6 ++++++
>>>>  1 files changed, 6 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
>>>> index 30a0394..55fdf02 100644
>>>> --- a/fs/nfs/pnfs.c
>>>> +++ b/fs/nfs/pnfs.c
>>>> @@ -1193,6 +1193,9 @@ pnfs_ld_write_done(struct nfs_write_data *data)
>>>>
>>>>  	dprintk("%s: pnfs_error=%d, retry via MDS\n", __func__,
>>>>  		data->pnfs_error);
>>>> +
>>>> +	put_lseg(data->lseg);
>>>> +	data->lseg = NULL;
>>>>  	status = nfs_initiate_write(data, NFS_CLIENT(data->inode),
>>>>  				    data->mds_ops, NFS_FILE_SYNC);
>>>>  	return status ? : -EAGAIN;
>>>> @@ -1240,6 +1243,9 @@ pnfs_ld_read_done(struct nfs_read_data *data)
>>>>
>>>>  	dprintk("%s: pnfs_error=%d, retry via MDS\n", __func__,
>>>>  		data->pnfs_error);
>>>> +
>>>> +	put_lseg(data->lseg);
>>>> +	data->lseg = NULL;
>>>>  	status = nfs_initiate_read(data, NFS_CLIENT(data->inode),
>>>>  				   data->mds_ops);
>>>>  	return status ? : -EAGAIN;
>>
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Benny Halevy July 26, 2011, 4:37 p.m. UTC | #9

On 2011-07-26 12:14, Myklebust, Trond wrote:
>> -----Original Message-----
>> From: Jim Rees [mailto:rees@umich.edu]
>> Sent: Tuesday, July 26, 2011 12:08 PM
>> To: Myklebust, Trond
>> Cc: Peng Tao; tao.peng@emc.com; linux-nfs@vger.kernel.org;
>> bhalevy@tonian.com
>> Subject: Re: [PATCH] NFS41: Drop lseg ref before fallthru to MDS
>>
>> Myklebust, Trond wrote:
>>
>>   > Thanks for your explanation. Is there any plan on how to fix
>>   > pnfs_ld_read/write_done? Basically, we would need an interface
> that
>>   > can redirect the IO to MDS if pnfs_error is set or do all
> necessary
>>   > cleanup work to end read/write if pnfs_error is 0. IMHO, the
>>   > recoalesce logic need to access nfs_pageio_descriptor but we do
> not
>>   > have that information at pnfs_ld_read/write_done.
>>
>>   As far as I can see, the right thing to do is to mark the layout as
>>   invalid and then redirty the page. It should be easy to have fsync()
>>   re-send the pages in this case. These should be extremely rare
>> events,
>>   since we expect to catch most of the pNFS failures when we do the
>> actual
>>   LAYOUTGET in the ->pg_init().
>>
>>   My main worry is for aio/dio where there is no good mechanism for
>>   retrying. I'm still working on that...
>>
>> What do you suggest we do for the current set of patches that add
> block
>> layout to pnfs?
> 
> If you are calling pnfs_ld_read/write_done, then don't change anything:
> it is easier to fix this in one spot rather than several.

Even if we plan on fixing this for the next merge window I think
there's value with the current fix even if it's going to be replaced
with a better fix along the road.

> However someone needs to start working on fixing the code in
> pnfs_ld_read/write_done to do the right thing. If nobody else has the
> cycles, then I can do that but I'd prefer to have someone who can easily
> test the resulting code do it.

I don't have cycles to code this either but I'll be happy to help
with looking at the design and reviewing the implementation.

Benny

> 
> Cheers
>   Trond
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Peng Tao July 26, 2011, 5:32 p.m. UTC | #10

On Tue, Jul 26, 2011 at 11:50 PM, Myklebust, Trond
<Trond.Myklebust@netapp.com> wrote:
>> -----Original Message-----
>> From: Peng Tao [mailto:bergwolf@gmail.com]
>> Sent: Tuesday, July 26, 2011 11:37 AM
>> To: Myklebust, Trond
>> Cc: tao.peng@emc.com; linux-nfs@vger.kernel.org; bhalevy@tonian.com
>> Subject: Re: [PATCH] NFS41: Drop lseg ref before fallthru to MDS
>>
>> Hi, Trond,
>>
>> On Tue, Jul 26, 2011 at 3:13 AM, Trond Myklebust
>> <Trond.Myklebust@netapp.com> wrote:
>> > On Wed, 2011-07-20 at 01:52 -0400, tao.peng@emc.com wrote:
>> >> Hi, Trond,
>> >>
>> >> Any comments on this patch? I still get kernel crash when pnfs write
>> is attempted but fails and calls pnfs_ld_write_done(). It seems object
>> layout uses the same code path as well. But I don't find the patch in
>> either your tree or Benny's tree. Are there any concerns?
>> >>
>> >> Thanks,
>> >> Tao
>> >
>> > The whole pnfs_ld_write_done thing is bogus and needs to be replaced
>> > with something sane. It is trying to initiate a WRITE RPC call with
>> the
>> > wrong block size, and is calling the MDS rpc_call_done() and
>> > rpc_release() with an uninitialised rpc task pointer.
>> >
>> > Ditto for pnfs_ld_read_done.
>> Thanks for your explanation. Is there any plan on how to fix
>> pnfs_ld_read/write_done? Basically, we would need an interface that
>> can redirect the IO to MDS if pnfs_error is set or do all necessary
>> cleanup work to end read/write if pnfs_error is 0. IMHO, the
>> recoalesce logic need to access nfs_pageio_descriptor but we do not
>> have that information at pnfs_ld_read/write_done.
>
> As far as I can see, the right thing to do is to mark the layout as invalid and then redirty the page. It should be easy to have fsync() re-send the pages in this case. These should be extremely rare events, since we expect to catch most of the pNFS failures when we do the actual LAYOUTGET in the ->pg_init().
Agreed. This should be easier than re-coalescing and sending to MDS at
read/write_done.

>
> My main worry is for aio/dio where there is no good mechanism for retrying. I'm still working on that...
For dio, we may have to send the failed pages to MDS instead of
relying on next fsync() to retry.


Thanks,
Tao

>
> Cheers
>  Trond
>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Peng Tao July 26, 2011, 5:57 p.m. UTC | #11

On Wed, Jul 27, 2011 at 1:37 AM, Myklebust, Trond
<Trond.Myklebust@netapp.com> wrote:
>> -----Original Message-----
>> From: Peng Tao [mailto:bergwolf@gmail.com]
>> Sent: Tuesday, July 26, 2011 1:33 PM
>> To: Myklebust, Trond
>> Cc: tao.peng@emc.com; linux-nfs@vger.kernel.org; bhalevy@tonian.com
>> Subject: Re: [PATCH] NFS41: Drop lseg ref before fallthru to MDS
>>
>> On Tue, Jul 26, 2011 at 11:50 PM, Myklebust, Trond
>> <Trond.Myklebust@netapp.com> wrote:
>> >> -----Original Message-----
>> >> From: Peng Tao [mailto:bergwolf@gmail.com]
>> >> Sent: Tuesday, July 26, 2011 11:37 AM
>> >> To: Myklebust, Trond
>> >> Cc: tao.peng@emc.com; linux-nfs@vger.kernel.org; bhalevy@tonian.com
>> >> Subject: Re: [PATCH] NFS41: Drop lseg ref before fallthru to MDS
>> >>
>> >> Hi, Trond,
>> >>
>> >> On Tue, Jul 26, 2011 at 3:13 AM, Trond Myklebust
>> >> <Trond.Myklebust@netapp.com> wrote:
>> >> > On Wed, 2011-07-20 at 01:52 -0400, tao.peng@emc.com wrote:
>> >> >> Hi, Trond,
>> >> >>
>> >> >> Any comments on this patch? I still get kernel crash when pnfs
>> write
>> >> is attempted but fails and calls pnfs_ld_write_done(). It seems
>> object
>> >> layout uses the same code path as well. But I don't find the patch
>> in
>> >> either your tree or Benny's tree. Are there any concerns?
>> >> >>
>> >> >> Thanks,
>> >> >> Tao
>> >> >
>> >> > The whole pnfs_ld_write_done thing is bogus and needs to be
>> replaced
>> >> > with something sane. It is trying to initiate a WRITE RPC call
>> with
>> >> the
>> >> > wrong block size, and is calling the MDS rpc_call_done() and
>> >> > rpc_release() with an uninitialised rpc task pointer.
>> >> >
>> >> > Ditto for pnfs_ld_read_done.
>> >> Thanks for your explanation. Is there any plan on how to fix
>> >> pnfs_ld_read/write_done? Basically, we would need an interface that
>> >> can redirect the IO to MDS if pnfs_error is set or do all necessary
>> >> cleanup work to end read/write if pnfs_error is 0. IMHO, the
>> >> recoalesce logic need to access nfs_pageio_descriptor but we do not
>> >> have that information at pnfs_ld_read/write_done.
>> >
>> > As far as I can see, the right thing to do is to mark the layout as
>> invalid and then redirty the page. It should be easy to have fsync()
>> re-send the pages in this case. These should be extremely rare events,
>> since we expect to catch most of the pNFS failures when we do the
>> actual LAYOUTGET in the ->pg_init().
>> Agreed. This should be easier than re-coalescing and sending to MDS at
>> read/write_done.
>>
>> >
>> > My main worry is for aio/dio where there is no good mechanism for
>> retrying. I'm still working on that...
>> For dio, we may have to send the failed pages to MDS instead of
>> relying on next fsync() to retry.
>
> The problem isn't what to do, it is more one of _who_ does it. The rpciod/nfsiod queues aren't the ideal place to set up a resend since it involves allocating memory.
How about having a pnfs private workqueue to take care of the resend?
There are some other places default workqueue is used in io path in
both block and object layout code. It can be problematic if the
default workqueue is blocked.  e.g. if someone on the default
workqueue allocates memory and reclaim code comes into pnfs path.
Using default workqueue here can cause application hang forever. If we
have a private workqueue, these problems can be solve IMO.

Best,
Tao


>
> Cheers
>  Trond
>

NFS41: Drop lseg ref before fallthru to MDS

Commit Message

Comments

Patch