diff mbox

pnfsblock: init pg_bsize properly

Message ID 1313197450-4595-1-git-send-email-bergwolf@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Peng Tao Aug. 13, 2011, 1:04 a.m. UTC
pg_bsize is server->wsize/rsize by default. We would want to use the lseg length.

Signed-off-by: Peng Tao <peng_tao@emc.com>
---
 fs/nfs/blocklayout/blocklayout.c |   20 ++++++++++++++++++--
 1 files changed, 18 insertions(+), 2 deletions(-)

Comments

Boaz Harrosh Aug. 16, 2011, 9:05 p.m. UTC | #1
On 08/12/2011 06:04 PM, Peng Tao wrote:
> pg_bsize is server->wsize/rsize by default. We would want to use the lseg length.
> 

Hi

What is the problem you are trying to solve with this patch?

From what I understand the only place that actually cares about
pg_bsize is nfs_generic_pg_test() which is only used in MDS
read/write. In the pNFS RW, the LD and pnfs has it's own .pg_test()
check that should not concern with pg_bsize (Unless for pnfs-files
which does). So the idea is that pg_bsize is the maximum set by
MDS server in regard to IO through MDS. And it should not be changed
by client.

If it is not what you see then we should fix it. But should never
override MDS wsize/rsize.

> Signed-off-by: Peng Tao <peng_tao@emc.com>
> ---
>  fs/nfs/blocklayout/blocklayout.c |   20 ++++++++++++++++++--
>  1 files changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
> index 36648e1..9143e61 100644
> --- a/fs/nfs/blocklayout/blocklayout.c
> +++ b/fs/nfs/blocklayout/blocklayout.c
> @@ -919,14 +919,30 @@ bl_clear_layoutdriver(struct nfs_server *server)
>  	return 0;
>  }
>  
> +static void bl_pg_init_read(struct nfs_pageio_descriptor *pgio,
> +			    struct nfs_page *req)
> +{
> +	pnfs_generic_pg_init_read(pgio, req);
> +	if (pgio->pg_lseg)
> +		pgio->pg_bsize = pgio->pg_lseg->pls_range.length;
> +}
> +
> +static void bl_pg_init_write(struct nfs_pageio_descriptor *pgio,
> +			     struct nfs_page *req)
> +{
> +	pnfs_generic_pg_init_write(pgio, req);
> +	if (pgio->pg_lseg)
> +		pgio->pg_bsize = pgio->pg_lseg->pls_range.length;
> +}
> +
>  static const struct nfs_pageio_ops bl_pg_read_ops = {
> -	.pg_init = pnfs_generic_pg_init_read,
> +	.pg_init = bl_pg_init_read,
>  	.pg_test = pnfs_generic_pg_test,

I see here that you do not override .pg_test. This is your problem
look at objio_osd::objio_pg_test() it checks for similar boundaries
at the objects side. This is where you need to do these checks
for blocks as well.

>  	.pg_doio = pnfs_generic_pg_readpages,
>  };
>  
>  static const struct nfs_pageio_ops bl_pg_write_ops = {
> -	.pg_init = pnfs_generic_pg_init_write,
> +	.pg_init = bl_pg_init_write,
>  	.pg_test = pnfs_generic_pg_test,

Same here

>  	.pg_doio = pnfs_generic_pg_writepages,
>  };

Thanks
Boaz
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Benny Halevy Aug. 17, 2011, 7:15 a.m. UTC | #2
On 2011-08-17 00:05, Boaz Harrosh wrote:
> On 08/12/2011 06:04 PM, Peng Tao wrote:
>> pg_bsize is server->wsize/rsize by default. We would want to use the lseg length.
>>
> 
> Hi
> 
> What is the problem you are trying to solve with this patch?
> 
> From what I understand the only place that actually cares about
> pg_bsize is nfs_generic_pg_test() which is only used in MDS
> read/write. In the pNFS RW, the LD and pnfs has it's own .pg_test()
> check that should not concern with pg_bsize (Unless for pnfs-files
> which does). So the idea is that pg_bsize is the maximum set by
> MDS server in regard to IO through MDS. And it should not be changed
> by client.
> 
> If it is not what you see then we should fix it. But should never
> override MDS wsize/rsize.

I second that.

Benny

> 
>> Signed-off-by: Peng Tao <peng_tao@emc.com>
>> ---
>>  fs/nfs/blocklayout/blocklayout.c |   20 ++++++++++++++++++--
>>  1 files changed, 18 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
>> index 36648e1..9143e61 100644
>> --- a/fs/nfs/blocklayout/blocklayout.c
>> +++ b/fs/nfs/blocklayout/blocklayout.c
>> @@ -919,14 +919,30 @@ bl_clear_layoutdriver(struct nfs_server *server)
>>  	return 0;
>>  }
>>  
>> +static void bl_pg_init_read(struct nfs_pageio_descriptor *pgio,
>> +			    struct nfs_page *req)
>> +{
>> +	pnfs_generic_pg_init_read(pgio, req);
>> +	if (pgio->pg_lseg)
>> +		pgio->pg_bsize = pgio->pg_lseg->pls_range.length;
>> +}
>> +
>> +static void bl_pg_init_write(struct nfs_pageio_descriptor *pgio,
>> +			     struct nfs_page *req)
>> +{
>> +	pnfs_generic_pg_init_write(pgio, req);
>> +	if (pgio->pg_lseg)
>> +		pgio->pg_bsize = pgio->pg_lseg->pls_range.length;
>> +}
>> +
>>  static const struct nfs_pageio_ops bl_pg_read_ops = {
>> -	.pg_init = pnfs_generic_pg_init_read,
>> +	.pg_init = bl_pg_init_read,
>>  	.pg_test = pnfs_generic_pg_test,
> 
> I see here that you do not override .pg_test. This is your problem
> look at objio_osd::objio_pg_test() it checks for similar boundaries
> at the objects side. This is where you need to do these checks
> for blocks as well.
> 
>>  	.pg_doio = pnfs_generic_pg_readpages,
>>  };
>>  
>>  static const struct nfs_pageio_ops bl_pg_write_ops = {
>> -	.pg_init = pnfs_generic_pg_init_write,
>> +	.pg_init = bl_pg_init_write,
>>  	.pg_test = pnfs_generic_pg_test,
> 
> Same here
> 
>>  	.pg_doio = pnfs_generic_pg_writepages,
>>  };
> 
> Thanks
> Boaz
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peng Tao Aug. 17, 2011, 9:35 a.m. UTC | #3
Hi, Benny and Boaz,

On Wed, Aug 17, 2011 at 3:15 PM, Benny Halevy <bhalevy@tonian.com> wrote:
>
> On 2011-08-17 00:05, Boaz Harrosh wrote:
>> On 08/12/2011 06:04 PM, Peng Tao wrote:
>>> pg_bsize is server->wsize/rsize by default. We would want to use the lseg length.
>>>
>>
>> Hi
>>
>> What is the problem you are trying to solve with this patch?
>>
>> From what I understand the only place that actually cares about
>> pg_bsize is nfs_generic_pg_test() which is only used in MDS
>> read/write. In the pNFS RW, the LD and pnfs has it's own .pg_test()
>> check that should not concern with pg_bsize (Unless for pnfs-files
>> which does). So the idea is that pg_bsize is the maximum set by
>> MDS server in regard to IO through MDS. And it should not be changed
>> by client.
>>
>> If it is not what you see then we should fix it. But should never
>> override MDS wsize/rsize.
In pnfs_do_multiple_reads/pnfs_do_multiple_writes, data->mds_ops will
be set as desc->pg_rpc_callops, which is determined in
nfs_generic_flush/nfs_generic_pagein according to desc->pg_bsize. For
blocklayout, we wouldn't want to set data->mds_ops to
partial_read/write ops, so I write the patch to use lseg length as
pg_bsize.

LD can override pg_bsize in pg_init because
nfs_pageio_reset_read_mds/nfs_pageio_reset_write_mds will reset it to
server rsize/wsize if pnfs is not tried.

Sorry that I didn't explain it clearly in the commit log...


>
> I second that.
>
> Benny
>
>>
>>> Signed-off-by: Peng Tao <peng_tao@emc.com>
>>> ---
>>>  fs/nfs/blocklayout/blocklayout.c |   20 ++++++++++++++++++--
>>>  1 files changed, 18 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
>>> index 36648e1..9143e61 100644
>>> --- a/fs/nfs/blocklayout/blocklayout.c
>>> +++ b/fs/nfs/blocklayout/blocklayout.c
>>> @@ -919,14 +919,30 @@ bl_clear_layoutdriver(struct nfs_server *server)
>>>      return 0;
>>>  }
>>>
>>> +static void bl_pg_init_read(struct nfs_pageio_descriptor *pgio,
>>> +                        struct nfs_page *req)
>>> +{
>>> +    pnfs_generic_pg_init_read(pgio, req);
>>> +    if (pgio->pg_lseg)
>>> +            pgio->pg_bsize = pgio->pg_lseg->pls_range.length;
>>> +}
>>> +
>>> +static void bl_pg_init_write(struct nfs_pageio_descriptor *pgio,
>>> +                         struct nfs_page *req)
>>> +{
>>> +    pnfs_generic_pg_init_write(pgio, req);
>>> +    if (pgio->pg_lseg)
>>> +            pgio->pg_bsize = pgio->pg_lseg->pls_range.length;
>>> +}
>>> +
>>>  static const struct nfs_pageio_ops bl_pg_read_ops = {
>>> -    .pg_init = pnfs_generic_pg_init_read,
>>> +    .pg_init = bl_pg_init_read,
>>>      .pg_test = pnfs_generic_pg_test,
>>
>> I see here that you do not override .pg_test. This is your problem
>> look at objio_osd::objio_pg_test() it checks for similar boundaries
>> at the objects side. This is where you need to do these checks
>> for blocks as well.
For blocklayout, we don't need to force each IO under a certain size.
Currently (w/ and w/o this patch) the lseg coverage is the only
constraint for pagelist length. So pnfs_generic_pg_test is enough for
blocklayout.

Thanks,
Tao

>>
>>>      .pg_doio = pnfs_generic_pg_readpages,
>>>  };
>>>
>>>  static const struct nfs_pageio_ops bl_pg_write_ops = {
>>> -    .pg_init = pnfs_generic_pg_init_write,
>>> +    .pg_init = bl_pg_init_write,
>>>      .pg_test = pnfs_generic_pg_test,
>>
>> Same here
>>
>>>      .pg_doio = pnfs_generic_pg_writepages,
>>>  };
>>
>> Thanks
>> Boaz
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Benny Halevy Aug. 17, 2011, 4:27 p.m. UTC | #4
On 2011-08-17 12:35, Peng Tao wrote:
> Hi, Benny and Boaz,
> 
> On Wed, Aug 17, 2011 at 3:15 PM, Benny Halevy <bhalevy@tonian.com> wrote:
>>
>> On 2011-08-17 00:05, Boaz Harrosh wrote:
>>> On 08/12/2011 06:04 PM, Peng Tao wrote:
>>>> pg_bsize is server->wsize/rsize by default. We would want to use the lseg length.
>>>>
>>>
>>> Hi
>>>
>>> What is the problem you are trying to solve with this patch?
>>>
>>> From what I understand the only place that actually cares about
>>> pg_bsize is nfs_generic_pg_test() which is only used in MDS
>>> read/write. In the pNFS RW, the LD and pnfs has it's own .pg_test()
>>> check that should not concern with pg_bsize (Unless for pnfs-files
>>> which does). So the idea is that pg_bsize is the maximum set by
>>> MDS server in regard to IO through MDS. And it should not be changed
>>> by client.
>>>
>>> If it is not what you see then we should fix it. But should never
>>> override MDS wsize/rsize.
> In pnfs_do_multiple_reads/pnfs_do_multiple_writes, data->mds_ops will
> be set as desc->pg_rpc_callops, which is determined in
> nfs_generic_flush/nfs_generic_pagein according to desc->pg_bsize. For
> blocklayout, we wouldn't want to set data->mds_ops to
> partial_read/write ops, so I write the patch to use lseg length as
> pg_bsize.
> 
> LD can override pg_bsize in pg_init because
> nfs_pageio_reset_read_mds/nfs_pageio_reset_write_mds will reset it to
> server rsize/wsize if pnfs is not tried.
> 
> Sorry that I didn't explain it clearly in the commit log...
> 
> 

To reflect that maybe we should also rename pg_bsize to pg_iosize.

Benny

>>
>> I second that.
>>
>> Benny
>>
>>>
>>>> Signed-off-by: Peng Tao <peng_tao@emc.com>
>>>> ---
>>>>  fs/nfs/blocklayout/blocklayout.c |   20 ++++++++++++++++++--
>>>>  1 files changed, 18 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
>>>> index 36648e1..9143e61 100644
>>>> --- a/fs/nfs/blocklayout/blocklayout.c
>>>> +++ b/fs/nfs/blocklayout/blocklayout.c
>>>> @@ -919,14 +919,30 @@ bl_clear_layoutdriver(struct nfs_server *server)
>>>>      return 0;
>>>>  }
>>>>
>>>> +static void bl_pg_init_read(struct nfs_pageio_descriptor *pgio,
>>>> +                        struct nfs_page *req)
>>>> +{
>>>> +    pnfs_generic_pg_init_read(pgio, req);
>>>> +    if (pgio->pg_lseg)
>>>> +            pgio->pg_bsize = pgio->pg_lseg->pls_range.length;
>>>> +}
>>>> +
>>>> +static void bl_pg_init_write(struct nfs_pageio_descriptor *pgio,
>>>> +                         struct nfs_page *req)
>>>> +{
>>>> +    pnfs_generic_pg_init_write(pgio, req);
>>>> +    if (pgio->pg_lseg)
>>>> +            pgio->pg_bsize = pgio->pg_lseg->pls_range.length;
>>>> +}
>>>> +
>>>>  static const struct nfs_pageio_ops bl_pg_read_ops = {
>>>> -    .pg_init = pnfs_generic_pg_init_read,
>>>> +    .pg_init = bl_pg_init_read,
>>>>      .pg_test = pnfs_generic_pg_test,
>>>
>>> I see here that you do not override .pg_test. This is your problem
>>> look at objio_osd::objio_pg_test() it checks for similar boundaries
>>> at the objects side. This is where you need to do these checks
>>> for blocks as well.
> For blocklayout, we don't need to force each IO under a certain size.
> Currently (w/ and w/o this patch) the lseg coverage is the only
> constraint for pagelist length. So pnfs_generic_pg_test is enough for
> blocklayout.
> 
> Thanks,
> Tao
> 
>>>
>>>>      .pg_doio = pnfs_generic_pg_readpages,
>>>>  };
>>>>
>>>>  static const struct nfs_pageio_ops bl_pg_write_ops = {
>>>> -    .pg_init = pnfs_generic_pg_init_write,
>>>> +    .pg_init = bl_pg_init_write,
>>>>      .pg_test = pnfs_generic_pg_test,
>>>
>>> Same here
>>>
>>>>      .pg_doio = pnfs_generic_pg_writepages,
>>>>  };
>>>
>>> Thanks
>>> Boaz
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peng Tao Aug. 18, 2011, 2:34 p.m. UTC | #5
On Thu, Aug 18, 2011 at 12:27 AM, Benny Halevy <bhalevy@tonian.com> wrote:
> On 2011-08-17 12:35, Peng Tao wrote:
>> Hi, Benny and Boaz,
>>
>> On Wed, Aug 17, 2011 at 3:15 PM, Benny Halevy <bhalevy@tonian.com> wrote:
>>>
>>> On 2011-08-17 00:05, Boaz Harrosh wrote:
>>>> On 08/12/2011 06:04 PM, Peng Tao wrote:
>>>>> pg_bsize is server->wsize/rsize by default. We would want to use the lseg length.
>>>>>
>>>>
>>>> Hi
>>>>
>>>> What is the problem you are trying to solve with this patch?
>>>>
>>>> From what I understand the only place that actually cares about
>>>> pg_bsize is nfs_generic_pg_test() which is only used in MDS
>>>> read/write. In the pNFS RW, the LD and pnfs has it's own .pg_test()
>>>> check that should not concern with pg_bsize (Unless for pnfs-files
>>>> which does). So the idea is that pg_bsize is the maximum set by
>>>> MDS server in regard to IO through MDS. And it should not be changed
>>>> by client.
>>>>
>>>> If it is not what you see then we should fix it. But should never
>>>> override MDS wsize/rsize.
>> In pnfs_do_multiple_reads/pnfs_do_multiple_writes, data->mds_ops will
>> be set as desc->pg_rpc_callops, which is determined in
>> nfs_generic_flush/nfs_generic_pagein according to desc->pg_bsize. For
>> blocklayout, we wouldn't want to set data->mds_ops to
>> partial_read/write ops, so I write the patch to use lseg length as
>> pg_bsize.
>>
>> LD can override pg_bsize in pg_init because
>> nfs_pageio_reset_read_mds/nfs_pageio_reset_write_mds will reset it to
>> server rsize/wsize if pnfs is not tried.
>>
>> Sorry that I didn't explain it clearly in the commit log...
>>
>>
>
> To reflect that maybe we should also rename pg_bsize to pg_iosize.
For pnfs, in fact we are not using pg_bsize as the iosize limit. It's
just that if pg_bsize is smaller than PAGE_CACHE_SIZE, partial
read/write ops will be used. I'm afraid that if we rename pg_bsize to
pg_iosize, people would really think it is the limit for read/write
iosize, which it really isn't. :)

Thanks,
Tao
>
> Benny
>
>>>
>>> I second that.
>>>
>>> Benny
>>>
>>>>
>>>>> Signed-off-by: Peng Tao <peng_tao@emc.com>
>>>>> ---
>>>>>  fs/nfs/blocklayout/blocklayout.c |   20 ++++++++++++++++++--
>>>>>  1 files changed, 18 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
>>>>> index 36648e1..9143e61 100644
>>>>> --- a/fs/nfs/blocklayout/blocklayout.c
>>>>> +++ b/fs/nfs/blocklayout/blocklayout.c
>>>>> @@ -919,14 +919,30 @@ bl_clear_layoutdriver(struct nfs_server *server)
>>>>>      return 0;
>>>>>  }
>>>>>
>>>>> +static void bl_pg_init_read(struct nfs_pageio_descriptor *pgio,
>>>>> +                        struct nfs_page *req)
>>>>> +{
>>>>> +    pnfs_generic_pg_init_read(pgio, req);
>>>>> +    if (pgio->pg_lseg)
>>>>> +            pgio->pg_bsize = pgio->pg_lseg->pls_range.length;
>>>>> +}
>>>>> +
>>>>> +static void bl_pg_init_write(struct nfs_pageio_descriptor *pgio,
>>>>> +                         struct nfs_page *req)
>>>>> +{
>>>>> +    pnfs_generic_pg_init_write(pgio, req);
>>>>> +    if (pgio->pg_lseg)
>>>>> +            pgio->pg_bsize = pgio->pg_lseg->pls_range.length;
>>>>> +}
>>>>> +
>>>>>  static const struct nfs_pageio_ops bl_pg_read_ops = {
>>>>> -    .pg_init = pnfs_generic_pg_init_read,
>>>>> +    .pg_init = bl_pg_init_read,
>>>>>      .pg_test = pnfs_generic_pg_test,
>>>>
>>>> I see here that you do not override .pg_test. This is your problem
>>>> look at objio_osd::objio_pg_test() it checks for similar boundaries
>>>> at the objects side. This is where you need to do these checks
>>>> for blocks as well.
>> For blocklayout, we don't need to force each IO under a certain size.
>> Currently (w/ and w/o this patch) the lseg coverage is the only
>> constraint for pagelist length. So pnfs_generic_pg_test is enough for
>> blocklayout.
>>
>> Thanks,
>> Tao
>>
>>>>
>>>>>      .pg_doio = pnfs_generic_pg_readpages,
>>>>>  };
>>>>>
>>>>>  static const struct nfs_pageio_ops bl_pg_write_ops = {
>>>>> -    .pg_init = pnfs_generic_pg_init_write,
>>>>> +    .pg_init = bl_pg_init_write,
>>>>>      .pg_test = pnfs_generic_pg_test,
>>>>
>>>> Same here
>>>>
>>>>>      .pg_doio = pnfs_generic_pg_writepages,
>>>>>  };
>>>>
>>>> Thanks
>>>> Boaz
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
Boaz Harrosh Aug. 22, 2011, 11:52 p.m. UTC | #6
On 08/17/2011 02:35 AM, Peng Tao wrote:
> Hi, Benny and Boaz,
> 
<snip>

> In pnfs_do_multiple_reads/pnfs_do_multiple_writes, data->mds_ops will
> be set as desc->pg_rpc_callops, which is determined in
> nfs_generic_flush/nfs_generic_pagein according to desc->pg_bsize. For
> blocklayout, we wouldn't want to set data->mds_ops to
> partial_read/write ops, so I write the patch to use lseg length as
> pg_bsize.
> 

Do you mean in the case where MDS sets (pg_bsize < PAGE_SIZE) ?

Right, that is a problem. (Theoretically, because the pNFSD-Linux server
does not do that. Do you have a Server that does?)

> LD can override pg_bsize in pg_init because
> nfs_pageio_reset_read_mds/nfs_pageio_reset_write_mds will reset it to
> server rsize/wsize if pnfs is not tried.
> 

So if it is the "pg_bsize < PAGE_SIZE" but pNFS-IO case then I don't
like your patch, at all. We should fix the generic code to behave
properly, and not let LDs hack their way out. (For example what about
objects and files LDs)

There is a few ways you can fix the generic code. One is override the
desc->pg_rpc_callops for the pNFS case to always be the same one. Or
override the test for (pg_bsize < PAGE_SIZE) in the pNFS case if we have
a lseg. Or some other clean way.

But please don't fix it like that, inside each LD driver.

[ Trond Fred
  One thing I do not understand about the files-layout operations. You
  have explained in the passed that r/wsize sent from the MDS is also the
  same one for each DS. So if we take an example of rsize beeing 2MB
  and there is a stripping of 2 DS for that layout.(Say strip_unit==rsize)
  Then we need to read 1/2 of that page from one DS and the 2/2 half from the
  second. Will current partial_read/write work if going through files-LD?
]

Thanks
Boaz
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peng Tao Aug. 23, 2011, 3:01 p.m. UTC | #7
Hi, Trond and Boaz,

On Tue, Aug 23, 2011 at 8:00 AM, Myklebust, Trond
<Trond.Myklebust@netapp.com> wrote:
>> -----Original Message-----
>> From: Boaz Harrosh [mailto:bharrosh@panasas.com]
>> Sent: Monday, August 22, 2011 7:52 PM
>> To: Peng Tao
>> Cc: Benny Halevy; linux-nfs@vger.kernel.org; Peng Tao; Myklebust,
>> Trond; Isaman, Fred
>> Subject: Re: [PATCH] pnfsblock: init pg_bsize properly
>>
>> On 08/17/2011 02:35 AM, Peng Tao wrote:
>> > Hi, Benny and Boaz,
>> >
>> <snip>
>>
>> > In pnfs_do_multiple_reads/pnfs_do_multiple_writes, data->mds_ops will
>> > be set as desc->pg_rpc_callops, which is determined in
>> > nfs_generic_flush/nfs_generic_pagein according to desc->pg_bsize. For
>> > blocklayout, we wouldn't want to set data->mds_ops to
>> > partial_read/write ops, so I write the patch to use lseg length as
>> > pg_bsize.
>> >
>>
>> Do you mean in the case where MDS sets (pg_bsize < PAGE_SIZE) ?
>>
>> Right, that is a problem. (Theoretically, because the pNFSD-Linux
>> server
>> does not do that. Do you have a Server that does?)
No, I don't have a server does that. But it is a server config option
and we can't force users not to change it. So better fix it at client
side.

>>
>> > LD can override pg_bsize in pg_init because
>> > nfs_pageio_reset_read_mds/nfs_pageio_reset_write_mds will reset it to
>> > server rsize/wsize if pnfs is not tried.
>> >
>>
>> So if it is the "pg_bsize < PAGE_SIZE" but pNFS-IO case then I don't
>> like your patch, at all. We should fix the generic code to behave
>> properly, and not let LDs hack their way out. (For example what about
>> objects and files LDs)
>>
>> There is a few ways you can fix the generic code. One is override the
>> desc->pg_rpc_callops for the pNFS case to always be the same one. Or
>> override the test for (pg_bsize < PAGE_SIZE) in the pNFS case if we
>> have
>> a lseg. Or some other clean way.
I was under the impression that for object and file layouts, partial
read/write rpc ops are still needed for DS IO when DS r/wsize is
smaller than PAGE_SIZE...

>>
>> But please don't fix it like that, inside each LD driver.
>>
>> [ Trond Fred
>>   One thing I do not understand about the files-layout operations. You
>>   have explained in the passed that r/wsize sent from the MDS is also
>> the
>>   same one for each DS. So if we take an example of rsize beeing 2MB
>>   and there is a stripping of 2 DS for that layout.(Say
>> strip_unit==rsize)
>>   Then we need to read 1/2 of that page from one DS and the 2/2 half
>> from the
>>   second. Will current partial_read/write work if going through files-
>> LD?
>> ]
>
> No. The stripe size may be smaller than the r/wsize, in which case we're in the same boat as the blocks and objects.
So this is a generic issue. For file and object layout, do you need to
use partial read/write rpc ops in any case? For block layout, we would
like to never use it in LD. But I'm not sure about file and object
case. Could you confirm?

Thanks,
Tao
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Boaz Harrosh Aug. 23, 2011, 9:19 p.m. UTC | #8
On 08/23/2011 08:01 AM, Peng Tao wrote:
> So this is a generic issue. For file and object layout, do you need to
> use partial read/write rpc ops in any case? For block layout, we would
> like to never use it in LD. But I'm not sure about file and object
> case. Could you confirm?
> 

For objects it is like blocks. NEVER (ever) use partial rpc ops. r/wsize
are not relevant for obj-LD IO.

(As I understand from Trond also with files the LD should inspect the
 situation and have the final disposition. But someone will need to write
 some files-LD code for that, Fred, Andy?)

> Thanks,
> Tao

Thanks
Boaz
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jim Rees Aug. 25, 2011, 8:15 p.m. UTC | #9
Boaz Harrosh wrote:

  On 08/23/2011 08:01 AM, Peng Tao wrote:
  > So this is a generic issue. For file and object layout, do you need to
  > use partial read/write rpc ops in any case? For block layout, we would
  > like to never use it in LD. But I'm not sure about file and object
  > case. Could you confirm?
  > 
  
  For objects it is like blocks. NEVER (ever) use partial rpc ops. r/wsize
  are not relevant for obj-LD IO.
  
  (As I understand from Trond also with files the LD should inspect the
   situation and have the final disposition. But someone will need to write
   some files-LD code for that, Fred, Andy?)
  
  > Thanks,
  > Tao

We discussed this on the call today.  Boaz is going to write a brief
description of how to fix this in the generic layer, then I'm going to
implement it.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
index 36648e1..9143e61 100644
--- a/fs/nfs/blocklayout/blocklayout.c
+++ b/fs/nfs/blocklayout/blocklayout.c
@@ -919,14 +919,30 @@  bl_clear_layoutdriver(struct nfs_server *server)
 	return 0;
 }
 
+static void bl_pg_init_read(struct nfs_pageio_descriptor *pgio,
+			    struct nfs_page *req)
+{
+	pnfs_generic_pg_init_read(pgio, req);
+	if (pgio->pg_lseg)
+		pgio->pg_bsize = pgio->pg_lseg->pls_range.length;
+}
+
+static void bl_pg_init_write(struct nfs_pageio_descriptor *pgio,
+			     struct nfs_page *req)
+{
+	pnfs_generic_pg_init_write(pgio, req);
+	if (pgio->pg_lseg)
+		pgio->pg_bsize = pgio->pg_lseg->pls_range.length;
+}
+
 static const struct nfs_pageio_ops bl_pg_read_ops = {
-	.pg_init = pnfs_generic_pg_init_read,
+	.pg_init = bl_pg_init_read,
 	.pg_test = pnfs_generic_pg_test,
 	.pg_doio = pnfs_generic_pg_readpages,
 };
 
 static const struct nfs_pageio_ops bl_pg_write_ops = {
-	.pg_init = pnfs_generic_pg_init_write,
+	.pg_init = bl_pg_init_write,
 	.pg_test = pnfs_generic_pg_test,
 	.pg_doio = pnfs_generic_pg_writepages,
 };