diff mbox

[PATCHv6,08/37] filemap: handle huge pages in do_generic_file_read()

Message ID 20170126115819.58875-9-kirill.shutemov@linux.intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

kirill.shutemov@linux.intel.com Jan. 26, 2017, 11:57 a.m. UTC
Most of work happans on head page. Only when we need to do copy data to
userspace we find relevant subpage.

We are still limited by PAGE_SIZE per iteration. Lifting this limitation
would require some more work.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/filemap.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Matthew Wilcox Feb. 9, 2017, 9:55 p.m. UTC | #1
On Thu, Jan 26, 2017 at 02:57:50PM +0300, Kirill A. Shutemov wrote:
> +++ b/mm/filemap.c
> @@ -1886,6 +1886,7 @@ static ssize_t do_generic_file_read(struct file *filp, loff_t *ppos,
>  			if (unlikely(page == NULL))
>  				goto no_cached_page;
>  		}
> +		page = compound_head(page);

We got this page from find_get_page(), which gets it from
pagecache_get_page(), which gets it from find_get_entry() ... which
(unless I'm lost in your patch series) returns the head page.  So this
line is redundant, right?

But then down in filemap_fault, we have:

        VM_BUG_ON_PAGE(page->index != offset, page);

... again, maybe I'm lost somewhere in your patch series, but I don't see
anywhere you remove that line (or modify it).  So are you not testing
with VM debugging enabled, or are you not doing a test which includes
mapping a file with huge pages, reading from it (to get the page in cache),
then faulting on an address that is not in the first 4kB of that 2MB?
Kirill A . Shutemov Feb. 13, 2017, 3:33 p.m. UTC | #2
On Thu, Feb 09, 2017 at 01:55:05PM -0800, Matthew Wilcox wrote:
> On Thu, Jan 26, 2017 at 02:57:50PM +0300, Kirill A. Shutemov wrote:
> > +++ b/mm/filemap.c
> > @@ -1886,6 +1886,7 @@ static ssize_t do_generic_file_read(struct file *filp, loff_t *ppos,
> >  			if (unlikely(page == NULL))
> >  				goto no_cached_page;
> >  		}
> > +		page = compound_head(page);
> 
> We got this page from find_get_page(), which gets it from
> pagecache_get_page(), which gets it from find_get_entry() ... which
> (unless I'm lost in your patch series) returns the head page.  So this
> line is redundant, right?

No. pagecache_get_page() returns subpage. See description of the first
patch.

> But then down in filemap_fault, we have:
> 
>         VM_BUG_ON_PAGE(page->index != offset, page);
> 
> ... again, maybe I'm lost somewhere in your patch series, but I don't see
> anywhere you remove that line (or modify it).

This should be fine as find_get_page() returns subpage.

> So are you not testing
> with VM debugging enabled, or are you not doing a test which includes
> mapping a file with huge pages, reading from it (to get the page in cache),
> then faulting on an address that is not in the first 4kB of that 2MB?
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
Matthew Wilcox Feb. 13, 2017, 4:01 p.m. UTC | #3
On Mon, Feb 13, 2017 at 06:33:42PM +0300, Kirill A. Shutemov wrote:
> No. pagecache_get_page() returns subpage. See description of the first
> patch.

Your description says:

> We also change interface for page-cache lookup function:
> 
>   - functions that lookup for pages[1] would return subpages of THP
>     relevant for requested indexes;
> 
>   - functions that lookup for entries[2] would return one entry per-THP
>     and index will point to index of head page (basically, round down to
>     HPAGE_PMD_NR);
> 
> This would provide balanced exposure of multi-order entires to the rest
> of the kernel.
> 
> [1] find_get_pages(), pagecache_get_page(), pagevec_lookup(), etc.
> [2] find_get_entry(), find_get_entries(), pagevec_lookup_entries(), etc.

I'm saying:

> > We got this page from find_get_page(), which gets it from
> > pagecache_get_page(), which gets it from find_get_entry() ... which
> > (unless I'm lost in your patch series) returns the head page.

Am I guilty of debugging documentation rather than code?
Matthew Wilcox Feb. 13, 2017, 4:09 p.m. UTC | #4
On Mon, Feb 13, 2017 at 08:01:17AM -0800, Matthew Wilcox wrote:
> On Mon, Feb 13, 2017 at 06:33:42PM +0300, Kirill A. Shutemov wrote:
> > No. pagecache_get_page() returns subpage. See description of the first
> > patch.

Oh, I re-read patch 1 and it made sense now.  I missed the bit where
pagecache_get_page() called page_subpage().
Matthew Wilcox Feb. 13, 2017, 4:28 p.m. UTC | #5
On Thu, Jan 26, 2017 at 02:57:50PM +0300, Kirill A. Shutemov wrote:
> Most of work happans on head page. Only when we need to do copy data to
> userspace we find relevant subpage.
> 
> We are still limited by PAGE_SIZE per iteration. Lifting this limitation
> would require some more work.

Now that I debugged that bit of my brain, here's a more sensible suggestion.

> @@ -1886,6 +1886,7 @@ static ssize_t do_generic_file_read(struct file *filp, loff_t *ppos,
>  			if (unlikely(page == NULL))
>  				goto no_cached_page;
>  		}
> +		page = compound_head(page);
>  		if (PageReadahead(page)) {
>  			page_cache_async_readahead(mapping,
>  					ra, filp, page,

We're going backwards and forwards a lot between subpages and page heads.
I'd like to see us do this:

static inline struct page *pagecache_get_page(struct address_space *mapping,
			pgoff_t offset, int fgp_flags, gfp_t cache_gfp_mask)
{
	struct page *page = pagecache_get_head(mapping, offset, fgp_flags,
								cache_gfp_mask);
	return page ? find_subpage(page, offset) : NULL;
}

static inline struct page *find_get_head(struct address_space *mapping,
					pgoff_t offset)
{
	return pagecache_get_head(mapping, offset, 0, 0);
}

and then we can switch do_generic_file_read() to call find_get_head(),
eliminating the conversion back and forth between subpages and head pages.
diff mbox

Patch

diff --git a/mm/filemap.c b/mm/filemap.c
index 301327685a71..6cba69176ea9 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1886,6 +1886,7 @@  static ssize_t do_generic_file_read(struct file *filp, loff_t *ppos,
 			if (unlikely(page == NULL))
 				goto no_cached_page;
 		}
+		page = compound_head(page);
 		if (PageReadahead(page)) {
 			page_cache_async_readahead(mapping,
 					ra, filp, page,
@@ -1967,7 +1968,8 @@  static ssize_t do_generic_file_read(struct file *filp, loff_t *ppos,
 		 * now we can copy it to user space...
 		 */
 
-		ret = copy_page_to_iter(page, offset, nr, iter);
+		ret = copy_page_to_iter(page + index - page->index, offset,
+				nr, iter);
 		offset += ret;
 		index += offset >> PAGE_SHIFT;
 		offset &= ~PAGE_MASK;
@@ -2385,6 +2387,7 @@  int filemap_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	 * because there really aren't any performance issues here
 	 * and we need to check for errors.
 	 */
+	page = compound_head(page);
 	ClearPageError(page);
 	error = mapping->a_ops->readpage(file, page);
 	if (!error) {