[RFC,00/15] Large pages in the page-cache

Message ID: 20190925005214.27240-1-willy@infradead.org

Message

Matthew Wilcox Sept. 25, 2019, 12:51 a.m. UTC
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Here's what I'm currently playing with.  I'm having trouble _testing_
it, but since akpm's patches were just merged into Linus' tree, I
thought this would be a good point to send out my current work tree.
Thanks to kbuild bot for finding a bunch of build problems ;-)

Matthew Wilcox (Oracle) (12):
  mm: Use vm_fault error code directly
  fs: Introduce i_blocks_per_page
  mm: Add file_offset_of_ helpers
  iomap: Support large pages
  xfs: Support large pages
  xfs: Pass a page to xfs_finish_page_writeback
  mm: Make prep_transhuge_page tail-callable
  mm: Add __page_cache_alloc_order
  mm: Allow large pages to be added to the page cache
  mm: Allow find_get_page to be used for large pages
  mm: Remove hpage_nr_pages
  xfs: Use filemap_huge_fault

William Kucharski (3):
  mm: Support removing arbitrary sized pages from mapping
  mm: Add a huge page fault handler for files
  mm: Align THP mappings for non-DAX

 drivers/net/ethernet/ibm/ibmveth.c |   2 -
 drivers/nvdimm/btt.c               |   4 +-
 drivers/nvdimm/pmem.c              |   3 +-
 fs/iomap/buffered-io.c             | 121 +++++++----
 fs/jfs/jfs_metapage.c              |   2 +-
 fs/xfs/xfs_aops.c                  |  37 ++--
 fs/xfs/xfs_file.c                  |   5 +-
 include/linux/huge_mm.h            |  15 +-
 include/linux/iomap.h              |   2 +-
 include/linux/mm.h                 |  12 ++
 include/linux/mm_inline.h          |   6 +-
 include/linux/pagemap.h            |  73 ++++++-
 mm/filemap.c                       | 311 ++++++++++++++++++++++++++---
 mm/gup.c                           |   2 +-
 mm/huge_memory.c                   |  11 +-
 mm/internal.h                      |   4 +-
 mm/memcontrol.c                    |  14 +-
 mm/memory_hotplug.c                |   4 +-
 mm/mempolicy.c                     |   2 +-
 mm/migrate.c                       |  19 +-
 mm/mlock.c                         |   9 +-
 mm/page_io.c                       |   4 +-
 mm/page_vma_mapped.c               |   6 +-
 mm/rmap.c                          |   8 +-
 mm/swap.c                          |   4 +-
 mm/swap_state.c                    |   4 +-
 mm/swapfile.c                      |   2 +-
 mm/vmscan.c                        |   9 +-
 28 files changed, 519 insertions(+), 176 deletions(-)

Comments

Hillf Danton Oct. 2, 2019, 1:07 p.m. UTC | #1
On Tue, 24 Sep 2019 17:52:02 -0700 From: Matthew Wilcox (Oracle)
> +/**
> + * file_offset_of_page - File offset of this page.
> + * @page: Page cache page.
> + *
> + * Context: Any context.
> + * Return: The offset of the first byte of this page.
>   */
> -static inline loff_t page_offset(struct page *page)
> +static inline loff_t file_offset_of_page(struct page *page)
>  {
>  	return ((loff_t)page->index) << PAGE_SHIFT;
>  }
>  
>  static inline loff_t page_file_offset(struct page *page)
>  {
>  	return ((loff_t)page_index(page)) << PAGE_SHIFT;

Would you like to specify the need to build a moon on the moon,
with another name though?

> -- 
> 2.23.0
Hillf Danton Oct. 2, 2019, 1:32 p.m. UTC | #2
On Tue, 24 Sep 2019 17:52:02 -0700 From: Matthew Wilcox (Oracle)
> 
> @@ -1415,6 +1415,8 @@ static inline void clear_page_pfmemalloc(struct page *page)
>  extern void pagefault_out_of_memory(void);
>  
>  #define offset_in_page(p)	((unsigned long)(p) & ~PAGE_MASK)

With the above define, the page_offset function is not named as badly
as 03/15 claims.

> +#define offset_in_this_page(page, p)	\
> +	((unsigned long)(p) & (page_size(page) - 1))

What if Ted posts an RFC with offset_in_that_page defined next week?
Hillf Danton Oct. 3, 2019, 4:08 a.m. UTC | #3
On Tue, 24 Sep 2019 17:52:02 -0700 From: Matthew Wilcox (Oracle)
> 
> The only part of the bvec we were accessing was the bv_page, so just
> pass that instead of the whole bvec.

This changes the ABI without any win.
Changes like this are not needed.
Hillf Danton Oct. 3, 2019, 5:08 a.m. UTC | #4
On Tue, 24 Sep 2019 17:52:02 -0700 From: Matthew Wilcox (Oracle)
> 
> @@ -354,7 +354,7 @@ vma_address(struct page *page, struct vm_area_struct *vma)
>  	unsigned long start, end;
> 
>  	start = __vma_address(page, vma);
> -	end = start + PAGE_SIZE * (hpage_nr_pages(page) - 1);
> +	end = start + page_size(page) - 1;
> 
> @@ -57,7 +57,7 @@ static inline bool pfn_in_hpage(struct page *hpage, unsigned long pfn)
>  	unsigned long hpage_pfn = page_to_pfn(hpage);
> 
>  	/* THP can be referenced by any subpage */
> -	return pfn >= hpage_pfn && pfn - hpage_pfn < hpage_nr_pages(hpage);
> +	return (pfn - hpage_pfn) < compound_nr(hpage);
>  }
> 
> @@ -264,7 +264,7 @@ int page_mapped_in_vma(struct page *page, struct vm_area_struct *vma)
>  	unsigned long start, end;
> 
>  	start = __vma_address(page, vma);
> -	end = start + PAGE_SIZE * (hpage_nr_pages(page) - 1);
> +	end = start + page_size(page) - 1;
> 
>  	if (unlikely(end < vma->vm_start || start >= vma->vm_end))

Be certain that nothing is added other than mechanical replacings in
the above hunks.
Matthew Wilcox Oct. 4, 2019, 7:33 p.m. UTC | #5
Your mail program is still broken.  This shows up as a reply to the 0/15
email instead of as a reply to the 3/15 email.

On Wed, Oct 02, 2019 at 09:07:53PM +0800, Hillf Danton wrote:
> On Tue, 24 Sep 2019 17:52:02 -0700 From: Matthew Wilcox (Oracle)
> > +/**
> > + * file_offset_of_page - File offset of this page.
> > + * @page: Page cache page.
> > + *
> > + * Context: Any context.
> > + * Return: The offset of the first byte of this page.
> >   */
> > -static inline loff_t page_offset(struct page *page)
> > +static inline loff_t file_offset_of_page(struct page *page)
> >  {
> >  	return ((loff_t)page->index) << PAGE_SHIFT;
> >  }
> >  
> >  static inline loff_t page_file_offset(struct page *page)
> >  {
> >  	return ((loff_t)page_index(page)) << PAGE_SHIFT;
> 
> Would you like to specify the need to build a moon on the moon,
> with another name though?

I have no idea what you mean.  Is this an idiom in your native language,
perhaps?
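
To make the rename concrete, here is a minimal sketch of how the two helpers quoted above compose with the offset_in_this_page() macro from the same series. The wrapper below is illustrative only, not code from the patches, and it assumes p is a kernel address inside the page's mapping, which is aligned to page_size() for compound pages:

static inline loff_t file_offset_of_byte(struct page *page, void *p)
{
	/* file offset of the first byte of this (possibly compound) page */
	loff_t base = file_offset_of_page(page);

	/* plus the byte offset of p within the whole compound page */
	return base + offset_in_this_page(page, p);
}
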
Matthew Wilcox Oct. 4, 2019, 7:34 p.m. UTC | #6
On Wed, Oct 02, 2019 at 09:32:11PM +0800, Hillf Danton wrote:
> 
> On Tue, 24 Sep 2019 17:52:02 -0700 From: Matthew Wilcox (Oracle)
> > 
> > @@ -1415,6 +1415,8 @@ static inline void clear_page_pfmemalloc(struct page *page)
> >  extern void pagefault_out_of_memory(void);
> >  
> >  #define offset_in_page(p)	((unsigned long)(p) & ~PAGE_MASK)
> 
> With the above define, the page_offset function is not named as badly
> as 03/15 claims.

Just because there exists a function that does the job, does not mean that
the other function is correctly named.

> > +#define offset_in_this_page(page, p)	\
> > +	((unsigned long)(p) & (page_size(page) - 1))
> 
> What if Ted posts an RFC with offset_in_that_page defined next week?

Are you trying to be funny?
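
To spell out the difference the new macro captures, a small sketch (not from the series), assuming 4 KiB base pages and an order-2 compound page whose kernel mapping starts at a 16 KiB-aligned address base:

/*
 * offset_in_page(p)            masks with PAGE_SIZE - 1 (one base page)
 * offset_in_this_page(page, p) masks with page_size(page) - 1
 *
 * For p = base + 0x1008 inside the 16 KiB compound page:
 *	offset_in_page(p)            == 0x008
 *	offset_in_this_page(page, p) == 0x1008
 *
 * For an order-0 page the two macros are identical, since
 * page_size(page) is then PAGE_SIZE.
 */
static inline bool offsets_match(struct page *page, void *p)
{
	/* given the alignment assumption, true iff p is in the first subpage */
	return offset_in_page(p) == offset_in_this_page(page, p);
}
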
Matthew Wilcox Oct. 4, 2019, 7:35 p.m. UTC | #7
On Thu, Oct 03, 2019 at 12:08:46PM +0800, Hillf Danton wrote:
> 
> On Tue, 24 Sep 2019 17:52:02 -0700 From: Matthew Wilcox (Oracle)
> > 
> > The only part of the bvec we were accessing was the bv_page, so just
> > pass that instead of the whole bvec.
> 
> This changes the ABI without any win.
> Changes like this are not needed.

ABI?  This is a static function.  The original recommendation to do this
came from Christoph, who I would trust over you as a referee of what
changes to make to XFS.
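
For context, the pattern under discussion looks roughly like the following. The helpers below are illustrative stand-ins with made-up bodies, not the actual xfs_finish_page_writeback() from patch 06/15; the point is that both are static, so nothing outside the translation unit changes:

#include <linux/bio.h>
#include <linux/page-flags.h>
#include <linux/pagemap.h>

/* Before: the helper takes a whole bvec but only ever uses bv_page. */
static void finish_page_writeback_old(struct bio_vec *bvec, int error)
{
	struct page *page = bvec->bv_page;

	if (error)
		SetPageError(page);
	end_page_writeback(page);
}

/* After: callers pass the page directly; no exported interface changes. */
static void finish_page_writeback_new(struct page *page, int error)
{
	if (error)
		SetPageError(page);
	end_page_writeback(page);
}
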
Matthew Wilcox Oct. 4, 2019, 7:36 p.m. UTC | #8
On Thu, Oct 03, 2019 at 01:08:59PM +0800, Hillf Danton wrote:
> 
> On Tue, 24 Sep 2019 17:52:02 -0700 From: Matthew Wilcox (Oracle)
> > 
> > @@ -354,7 +354,7 @@ vma_address(struct page *page, struct vm_area_struct *vma)
> >  	unsigned long start, end;
> > 
> >  	start = __vma_address(page, vma);
> > -	end = start + PAGE_SIZE * (hpage_nr_pages(page) - 1);
> > +	end = start + page_size(page) - 1;
> > 
> > @@ -57,7 +57,7 @@ static inline bool pfn_in_hpage(struct page *hpage, unsigned long pfn)
> >  	unsigned long hpage_pfn = page_to_pfn(hpage);
> > 
> >  	/* THP can be referenced by any subpage */
> > -	return pfn >= hpage_pfn && pfn - hpage_pfn < hpage_nr_pages(hpage);
> > +	return (pfn - hpage_pfn) < compound_nr(hpage);
> >  }
> > 
> > @@ -264,7 +264,7 @@ int page_mapped_in_vma(struct page *page, struct vm_area_struct *vma)
> >  	unsigned long start, end;
> > 
> >  	start = __vma_address(page, vma);
> > -	end = start + PAGE_SIZE * (hpage_nr_pages(page) - 1);
> > +	end = start + page_size(page) - 1;
> > 
> >  	if (unlikely(end < vma->vm_start || start >= vma->vm_end))
> 
> Be certain that nothing is added other than mechanical replacings in
> the above hunks.

Are you saying I've made a mistake?  If so, please be clear.
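
For readers comparing those hunks, a side-by-side restatement with illustrative function names (not from the series); it assumes page_size(page) == PAGE_SIZE * compound_nr(page), which holds for compound pages, and that hpage_nr_pages() and compound_nr() agree for the pages in question:

static unsigned long vma_range_end_old(unsigned long start, struct page *page)
{
	/* first byte of the last base page covered by the mapping */
	return start + PAGE_SIZE * (hpage_nr_pages(page) - 1);
}

static unsigned long vma_range_end_new(unsigned long start, struct page *page)
{
	/* last byte of the page */
	return start + page_size(page) - 1;
}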