diff mbox

[PATCHSET,v6,0/26] pnfs for 2.6.40

Message ID 4DDBCBBA.5040700@panasas.com (mailing list archive)
State New, archived
Headers show

Commit Message

Benny Halevy May 24, 2011, 3:16 p.m. UTC
On 2011-05-23 21:50, Boaz Harrosh wrote:
> On 05/23/2011 07:33 PM, Benny Halevy wrote:
> Benny Hi
> 
> I have a problem that the default wsize is very small 64K and
> I get small IOs. I found that the governing member right now
> is NFS_SERVER()->wsize
> 
> I did the below hack on My current code, but you took that away
> from me.
> 
> diff --git a/fs/nfs/objlayout/objlayout.c b/fs/nfs/objlayout/objlayout.c
> index ec40408..f7b09e1 100644
> --- a/fs/nfs/objlayout/objlayout.c
> +++ b/fs/nfs/objlayout/objlayout.c
> +	server->wsize = ((PAGE_SIZE - sizeof(struct bio)) / sizeof(struct bio_vec))
> +			* PAGE_SIZE * 2;
>  
> -	dprintk("%s: Return data=%p\n", __func__, data);
> +	dprintk("%s: Return data=%p wsize=0x%x\n", __func__, data, server->wsize);
>  	return 0;
>  }
> 
> What do you want that we do to replace this. The default 64K is to small.
> I don't mind that for pnfs it will be ~0 and the pg_test() will test
> for maxc_size as well. But then we'll also need the current size or the
> start_index
> 
> Boaz

How about this approach?

git diff --stat -p -M
 fs/nfs/pagelist.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

 		if (!nfs_can_coalesce_requests(prev, req, desc))
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Benny Halevy May 24, 2011, 3:49 p.m. UTC | #1
On 2011-05-24 18:34, Myklebust, Trond wrote:
> On Tue, 2011-05-24 at 18:16 +0300, Benny Halevy wrote:
>> On 2011-05-23 21:50, Boaz Harrosh wrote:
>> > On 05/23/2011 07:33 PM, Benny Halevy wrote:
>> > Benny Hi
>> >
>> > I have a problem that the default wsize is very small 64K and
>> > I get small IOs. I found that the governing member right now
>> > is NFS_SERVER()->wsize
>> >
>> > I did the below hack on My current code, but you took that away
>> > from me.
>> >
>> > diff --git a/fs/nfs/objlayout/objlayout.c b/fs/nfs/objlayout/objlayout.c
>> > index ec40408..f7b09e1 100644
>> > --- a/fs/nfs/objlayout/objlayout.c
>> > +++ b/fs/nfs/objlayout/objlayout.c
>> > +   server->wsize = ((PAGE_SIZE - sizeof(struct bio)) /
> sizeof(struct bio_vec))
>> > +                   * PAGE_SIZE * 2;
>> > 
>> > -   dprintk("%s: Return data=%p\n", __func__, data);
>> > +   dprintk("%s: Return data=%p wsize=0x%x\n", __func__, data,
> server->wsize);
>> >     return 0;
>> >  }
>> >
>> > What do you want that we do to replace this. The default 64K is to
> small.
>> > I don't mind that for pnfs it will be ~0 and the pg_test() will test
>> > for maxc_size as well. But then we'll also need the current size or the
>> > start_index
>> >
>> > Boaz
>>
>> How about this approach?
>>
>> git diff --stat -p -M
>>  fs/nfs/pagelist.c |    2 +-
>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
>> index c80add6..3f5508b 100644
>> --- a/fs/nfs/pagelist.c
>> +++ b/fs/nfs/pagelist.c
>> @@ -293,7 +293,7 @@ static int nfs_pageio_do_add_request(struct
>> nfs_pageio_descriptor *desc,
>>               if (desc->pg_bsize < PAGE_SIZE)
>>                       return 0;
>>               newlen += desc->pg_count;
>> -             if (newlen > desc->pg_bsize)
>> +             if (newlen > desc->pg_bsize && !desc->pg_test)
>>                       return 0;
>>               prev = nfs_list_entry(desc->pg_list.prev);
>>               if (!nfs_can_coalesce_requests(prev, req, desc))
> 
> Alternatively, clean the above up by putting the newlen > desc->pg_bsize
> test into a default nfs_generic_test_coalesce() and require ordinary NFS
> reads and writes to set that as their desc->pg_test().

Good idea!

I'll send a RFC patch including the generic pnfs pg_test for
the layout drivers.

Fred - I hope you haven't started working on pg_test, have you?
Please let me know.

Benny

> 
> Cheers
>   Trond
> --
> Trond Myklebust
> Linux NFS client maintainer
> 
> NetApp
> Trond.Myklebust@netapp.com
> www.netapp.com
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Boaz Harrosh May 24, 2011, 3:56 p.m. UTC | #2
On 05/24/2011 06:34 PM, Myklebust, Trond wrote:
> On Tue, 2011-05-24 at 18:16 +0300, Benny Halevy wrote:
>> On 2011-05-23 21:50, Boaz Harrosh wrote:
>> > On 05/23/2011 07:33 PM, Benny Halevy wrote:
>> > Benny Hi
>> >
>> > I have a problem that the default wsize is very small 64K and
>> > I get small IOs. I found that the governing member right now
>> > is NFS_SERVER()->wsize
>> >
>> > I did the below hack on My current code, but you took that away
>> > from me.
>> >
>> > diff --git a/fs/nfs/objlayout/objlayout.c b/fs/nfs/objlayout/objlayout.c
>> > index ec40408..f7b09e1 100644
>> > --- a/fs/nfs/objlayout/objlayout.c
>> > +++ b/fs/nfs/objlayout/objlayout.c
>> > +   server->wsize = ((PAGE_SIZE - sizeof(struct bio)) / sizeof(struct bio_vec))
>> > +                   * PAGE_SIZE * 2;
>> > 
>> > -   dprintk("%s: Return data=%p\n", __func__, data);
>> > +   dprintk("%s: Return data=%p wsize=0x%x\n", __func__, data, server->wsize);
>> >     return 0;
>> >  }
>> >
>> > What do you want that we do to replace this. The default 64K is to small.
>> > I don't mind that for pnfs it will be ~0 and the pg_test() will test
>> > for maxc_size as well. But then we'll also need the current size or the
>> > start_index
>> >
>> > Boaz
>>
>> How about this approach?
>>
>> git diff --stat -p -M
>>  fs/nfs/pagelist.c |    2 +-
>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
>> index c80add6..3f5508b 100644
>> --- a/fs/nfs/pagelist.c
>> +++ b/fs/nfs/pagelist.c
>> @@ -293,7 +293,7 @@ static int nfs_pageio_do_add_request(struct
>> nfs_pageio_descriptor *desc,
>>               if (desc->pg_bsize < PAGE_SIZE)
>>                       return 0;
>>               newlen += desc->pg_count;
>> -             if (newlen > desc->pg_bsize)
>> +             if (newlen > desc->pg_bsize && !desc->pg_test)
>>                       return 0;
>>               prev = nfs_list_entry(desc->pg_list.prev);
>>               if (!nfs_can_coalesce_requests(prev, req, desc))
> 
> Alternatively, clean the above up by putting the newlen > desc->pg_bsize
> test into a default nfs_generic_test_coalesce() and require ordinary NFS
> reads and writes to set that as their desc->pg_test().
> 
> Cheers
>   Trond

This all approach sounds very good to me. But please I have some questions?

In the nfs_pageio_descriptor passed to pg_test() together with the two pages:
Which member means the current byte_size (or page_count?) and what is the
meaning of some of these fields

struct nfs_pageio_descriptor {
        ....
	unsigned long		pg_bytes_written;
		Is this for result after read/write?

	size_t			pg_count;
		Is this the number of pages added up to now?
		Do we also have the start of the first page?

	size_t			pg_bsize;
		So I understand this is the max allowed pages. Does
		that mean also the allocated size or Just the negotiated
		size with the server? (Really bad name if you ask me)

	unsigned int		pg_base;
		Is that the index of the first page? That cannot be, the page->index
		needs to be 64bit. So what is this then?
	
	...
};

Thanks
Boaz
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Trond Myklebust May 24, 2011, 4:21 p.m. UTC | #3
On Tue, 2011-05-24 at 18:56 +0300, Boaz Harrosh wrote: 
> In the nfs_pageio_descriptor passed to pg_test() together with the two pages:
> Which member means the current byte_size (or page_count?) and what is the
> meaning of some of these fields
> 
> struct nfs_pageio_descriptor {
>         ....
> 	unsigned long		pg_bytes_written;
> 		Is this for result after read/write?

This is the total number of bytes we successfully called
nfs_pageio_doio() for. In other words, it should represent the total
number of bytes we put on the wire.

> 	size_t			pg_count;
> 		Is this the number of pages added up to now?
> 		Do we also have the start of the first page?

This is the number of bytes we have successfully coalesced into the
current i/o.

> 	size_t			pg_bsize;
> 		So I understand this is the max allowed pages. Does
> 		that mean also the allocated size or Just the negotiated
> 		size with the server? (Really bad name if you ask me)

It means the 'block size'. In ordinary NFS parlance that will be the
'rsize' or the 'wsize'.

> 	unsigned int		pg_base;
> 		Is that the index of the first page? That cannot be, the page->index
> 		needs to be 64bit. So what is this then?

It is used when dealing with I/O requests that are not page aligned.

If you consider the pages that we are to write out in the current I/O as
a single buffer, then the pg_base is the offset of the first byte to
write out/read in.
Boaz Harrosh May 24, 2011, 4:58 p.m. UTC | #4
On 05/24/2011 07:21 PM, Trond Myklebust wrote:
> On Tue, 2011-05-24 at 18:56 +0300, Boaz Harrosh wrote: 
>> In the nfs_pageio_descriptor passed to pg_test() together with the two pages:
>> Which member means the current byte_size (or page_count?) and what is the
>> meaning of some of these fields
>>
>> struct nfs_pageio_descriptor {
>>         ....
>> 	unsigned long		pg_bytes_written;
>> 		Is this for result after read/write?
> 
> This is the total number of bytes we successfully called
> nfs_pageio_doio() for. In other words, it should represent the total
> number of bytes we put on the wire.
> 
>> 	size_t			pg_count;
>> 		Is this the number of pages added up to now?
>> 		Do we also have the start of the first page?
> 
> This is the number of bytes we have successfully coalesced into the
> current i/o.
> 
>> 	size_t			pg_bsize;
>> 		So I understand this is the max allowed pages. Does
>> 		that mean also the allocated size or Just the negotiated
>> 		size with the server? (Really bad name if you ask me)
> 
> It means the 'block size'. In ordinary NFS parlance that will be the
> 'rsize' or the 'wsize'.
> 
>> 	unsigned int		pg_base;
>> 		Is that the index of the first page? That cannot be, the page->index
>> 		needs to be 64bit. So what is this then?
> 
> It is used when dealing with I/O requests that are not page aligned.
> 
> If you consider the pages that we are to write out in the current I/O as
> a single buffer, then the pg_base is the offset of the first byte to
> write out/read in.
> 

Thanks Trond, I think I understand and can write the proper implementation
for objects->pg_test now. One more question. I need the file offset
of the beginning of the write would that then be:

int objlayout_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev,
		      struct nfs_page *req)
}
	struct nfs_page *first_pg = list_entry(pgio->pg_list.next, struct nfs_page, wb_list);

	u64 io_offset = (first_pg->wb_index << PAGE_SHIFT) + first_pg->wb_offset;
	...
}

Thanks
Boaz
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Trond Myklebust May 24, 2011, 5:05 p.m. UTC | #5
On Tue, 2011-05-24 at 19:58 +0300, Boaz Harrosh wrote: 
> Thanks Trond, I think I understand and can write the proper implementation
> for objects->pg_test now. One more question. I need the file offset
> of the beginning of the write would that then be:
> 
> int objlayout_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev,
> 		      struct nfs_page *req)
> }
> 	struct nfs_page *first_pg = list_entry(pgio->pg_list.next, struct nfs_page, wb_list);
> 
> 	u64 io_offset = (first_pg->wb_index << PAGE_SHIFT) + first_pg->wb_offset;
> 	...
> }

Yes. That looks correct.
Fred Isaman May 24, 2011, 5:07 p.m. UTC | #6
On Tue, May 24, 2011 at 11:49 AM, Benny Halevy <bhalevy@panasas.com> wrote:

> I'll send a RFC patch including the generic pnfs pg_test for
> the layout drivers.
>
> Fred - I hope you haven't started working on pg_test, have you?
> Please let me know.
>
> Benny
>
>

It is all yours.

Fred
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Boaz Harrosh May 24, 2011, 5:07 p.m. UTC | #7
On 05/24/2011 08:05 PM, Trond Myklebust wrote:
> On Tue, 2011-05-24 at 19:58 +0300, Boaz Harrosh wrote: 
>> Thanks Trond, I think I understand and can write the proper implementation
>> for objects->pg_test now. One more question. I need the file offset
>> of the beginning of the write would that then be:
>>
>> int objlayout_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev,
>> 		      struct nfs_page *req)
>> }
>> 	struct nfs_page *first_pg = list_entry(pgio->pg_list.next, struct nfs_page, wb_list);
>>
>> 	u64 io_offset = (first_pg->wb_index << PAGE_SHIFT) + first_pg->wb_offset;
>> 	...
>> }
> 
> Yes. That looks correct.
> 

Grate the new skim with pg_test looks perfect. We have all we need and it will be
really nice with the raid5/6 code. I think I will use a similar system in exofs
now.

Thanks
Boaz
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index c80add6..3f5508b 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -293,7 +293,7 @@  static int nfs_pageio_do_add_request(struct
nfs_pageio_descriptor *desc,
 		if (desc->pg_bsize < PAGE_SIZE)
 			return 0;
 		newlen += desc->pg_count;
-		if (newlen > desc->pg_bsize)
+		if (newlen > desc->pg_bsize && !desc->pg_test)
 			return 0;
 		prev = nfs_list_entry(desc->pg_list.prev);