Message ID | 20200404162826.181808-1-hubcap@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Series | orangefs: complete Christoph's "remember count" reversion. |
On Sat, Apr 04, 2020 at 12:28:26PM -0400, hubcap@kernel.org wrote:
> As an aside, the page cache has been a blessing and a curse
> for us. Since we started using it, small IO has improved
> incredibly, but our max speed hits a plateau before it otherwise
> would have. I think because of all the page size copies we have
> to do to fill our 4 meg native buffers. I try to read about all
> the new work going into the page cache in lwn, and make some
> sense of the new code :-). One thing I remember is when
> Christoph Lameter said "the page cache does not scale", but
> I know the new work is focused on that. If anyone has any
> thoughts about how we could make improvements on filling our
> native buffers from the page cache (larger page sizes?),
> feel free to offer any help...

Umm, 4MB native buffers are ... really big ;-)  I wasn't planning on
going past PMD_SIZE (ie 2MB on x86) for the readahead large pages,
but if a filesystem wants that, then I should change that plan.

What I was planning for, but don't quite have an implementation nailed
down for yet, is allowing filesystems to grow the readahead beyond that
wanted by the generic code. Filesystems which implement compression
frequently want blocks in the 256kB size range. It seems like OrangeFS
would fit with that scheme, as long as I don't put a limit on what the
filesystem asks for.

So yes, I think within the next year, you should be able to tell the
page cache to allocate 4MB pages. You will still need a fallback path
for when memory is too fragmented to allocate new pages, but if you're
using 4MB pages, then hopefully we'll be able to reclaim a clean 4MB
page from elsewhere in the page cache and supply you with a new one.
Matthew> So yes, I think within the next year, you should be
Matthew> able to tell the page cache to allocate 4MB pages.

I can't find the ascii thumbs up emoji :-) ...

-Mike
diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
index 961c0fd8675a..fb0884626d18 100644
--- a/fs/orangefs/inode.c
+++ b/fs/orangefs/inode.c
@@ -259,46 +259,19 @@ static int orangefs_readpage(struct file *file, struct page *page)
 	pgoff_t index; /* which page */
 	struct page *next_page;
 	char *kaddr;
-	struct orangefs_read_options *ro = file->private_data;
 	loff_t read_size;
-	loff_t roundedup;
 	int buffer_index = -1; /* orangefs shared memory slot */
 	int slot_index; /* index into slot */
 	int remaining;
 
 	/*
-	 * If they set some miniscule size for "count" in read(2)
-	 * (for example) then let's try to read a page, or the whole file
-	 * if it is smaller than a page. Once "count" goes over a page
-	 * then lets round up to the highest page size multiple that is
-	 * less than or equal to "count" and do that much orangefs IO and
-	 * try to fill as many pages as we can from it.
-	 *
-	 * "count" should be represented in ro->blksiz.
-	 *
-	 * inode->i_size = file size.
+	 * Get up to this many bytes from Orangefs at a time and try
+	 * to fill them into the page cache at once.
+	 * Tests with dd made this seem like a reasonable static
+	 * number, if there was interest perhaps this number could
+	 * be made setable through sysfs...
 	 */
-	if (ro) {
-		if (ro->blksiz < PAGE_SIZE) {
-			if (inode->i_size < PAGE_SIZE)
-				read_size = inode->i_size;
-			else
-				read_size = PAGE_SIZE;
-		} else {
-			roundedup = ((PAGE_SIZE - 1) & ro->blksiz) ?
-				((ro->blksiz + PAGE_SIZE) & ~(PAGE_SIZE -1)) :
-				ro->blksiz;
-			if (roundedup > inode->i_size)
-				read_size = inode->i_size;
-			else
-				read_size = roundedup;
-
-		}
-	} else {
-		read_size = PAGE_SIZE;
-	}
-	if (!read_size)
-		read_size = PAGE_SIZE;
+	read_size = 524288;
 
 	if (PageDirty(page))
 		orangefs_launder_page(page);