[00/75] MM folio patches for 5.18

Message ID 20220204195852.1751729-1-willy@infradead.org

Message

Matthew Wilcox Feb. 4, 2022, 7:57 p.m. UTC
Whole series available through git, and shortly in linux-next:
https://git.infradead.org/users/willy/pagecache.git/shortlog/refs/heads/for-next
or git://git.infradead.org/users/willy/pagecache.git for-next

The first few patches should look familiar to most; these are converting
the GUP code to folios (and a few other things).  Most are well-reviewed,
but I did have to make significant changes to a few patches to accommodate
John's recent bugfix, so I dropped the R-b from them.

After the GUP changes, I started working on vmscan, trying to convert
all of shrink_page_list() to use a folio.  The pages it works on are
folios by definition: they're chained through ->lru, and since ->lru
occupies the same bytes of memory as ->compound_head, they can't be
tail pages.  This is a ridiculously large function, and I'm only part
of the way through it.  I have, however, finished converting rmap_walk()
and friends to take a folio instead of a page.
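
To make that ->lru/->compound_head overlap concrete, the lru_to_folio()
helper added later in this series boils down to roughly the following
(a sketch from memory, not the literal patch):

	/*
	 * Tail pages store a pointer to their head page in
	 * ->compound_head with bit 0 set.  That word shares storage
	 * with ->lru, so a page threaded onto an LRU list can never be
	 * a tail page, and a list entry can be converted to a folio
	 * without any checking.
	 */
	static inline struct folio *lru_to_folio(struct list_head *head)
	{
		return list_entry(head->prev, struct folio, lru);
	}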

Midway through, there's a short detour to fix up page_vma_mapped_walk to
work on an explicit PFN range instead of a page.  I had been intending to
convert that to use a folio, but with page_mapped_in_vma() really just
wanting to know about one page (even if it's a head page) and Muchun
wanting to walk pageless memory, making all the users use PFNs just
seemed like the right thing to do.
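
For a flavour of what that looks like on the caller side, here's a rough
sketch (field and helper names are as I remember them from the series;
treat it as illustrative rather than the final interface):

	struct page_vma_mapped_walk pvmw = {
		.pfn		= folio_pfn(folio),
		.nr_pages	= folio_nr_pages(folio),
		.pgoff		= folio_pgoff(folio),
		.vma		= vma,
		.address	= address,
	};

	while (page_vma_mapped_walk(&pvmw)) {
		/* pvmw.pte (or pvmw.pmd for a PMD-mapped range) points
		 * at a mapping of one of the PFNs in the range */
	}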

The last 9 patches actually start adding large folios to the page cache.
This is where I expect the most trouble, but they've been stable in my
testing for a while.

Matthew Wilcox (Oracle) (74):
  mm/gup: Increment the page refcount before the pincount
  mm/gup: Remove for_each_compound_range()
  mm/gup: Remove for_each_compound_head()
  mm/gup: Change the calling convention for compound_range_next()
  mm/gup: Optimise compound_range_next()
  mm/gup: Change the calling convention for compound_next()
  mm/gup: Fix some contiguous memmap assumptions
  mm/gup: Remove an assumption of a contiguous memmap
  mm/gup: Handle page split race more efficiently
  mm/gup: Remove hpage_pincount_add()
  mm/gup: Remove hpage_pincount_sub()
  mm: Make compound_pincount always available
  mm: Add folio_pincount_ptr()
  mm: Turn page_maybe_dma_pinned() into folio_maybe_dma_pinned()
  mm/gup: Add try_get_folio() and try_grab_folio()
  mm/gup: Convert try_grab_page() to use a folio
  mm: Remove page_cache_add_speculative() and
    page_cache_get_speculative()
  mm/gup: Add gup_put_folio()
  mm/hugetlb: Use try_grab_folio() instead of try_grab_compound_head()
  mm/gup: Convert gup_pte_range() to use a folio
  mm/gup: Convert gup_hugepte() to use a folio
  mm/gup: Convert gup_huge_pmd() to use a folio
  mm/gup: Convert gup_huge_pud() to use a folio
  mm/gup: Convert gup_huge_pgd() to use a folio
  mm/gup: Turn compound_next() into gup_folio_next()
  mm/gup: Turn compound_range_next() into gup_folio_range_next()
  mm: Turn isolate_lru_page() into folio_isolate_lru()
  mm/gup: Convert check_and_migrate_movable_pages() to use a folio
  mm/workingset: Convert workingset_eviction() to take a folio
  mm/memcg: Convert mem_cgroup_swapout() to take a folio
  mm: Add lru_to_folio()
  mm: Turn putback_lru_page() into folio_putback_lru()
  mm/vmscan: Convert __remove_mapping() to take a folio
  mm/vmscan: Turn page_check_dirty_writeback() into
    folio_check_dirty_writeback()
  mm: Turn head_compound_mapcount() into folio_entire_mapcount()
  mm: Add folio_mapcount()
  mm: Add split_folio_to_list()
  mm: Add folio_is_zone_device() and folio_is_device_private()
  mm: Add folio_pgoff()
  mm: Add pvmw_set_page() and pvmw_set_folio()
  hexagon: Add pmd_pfn()
  mm: Convert page_vma_mapped_walk to work on PFNs
  mm/page_idle: Convert page_idle_clear_pte_refs() to use a folio
  mm/rmap: Use a folio in page_mkclean_one()
  mm/rmap: Turn page_referenced() into folio_referenced()
  mm/mlock: Turn clear_page_mlock() into folio_end_mlock()
  mm/mlock: Turn mlock_vma_page() into mlock_vma_folio()
  mm/rmap: Turn page_mlock() into folio_mlock()
  mm/mlock: Turn munlock_vma_page() into munlock_vma_folio()
  mm/huge_memory: Convert __split_huge_pmd() to take a folio
  mm/rmap: Convert try_to_unmap() to take a folio
  mm/rmap: Convert try_to_migrate() to folios
  mm/rmap: Convert make_device_exclusive_range() to use folios
  mm/migrate: Convert remove_migration_ptes() to folios
  mm/damon: Convert damon_pa_mkold() to use a folio
  mm/damon: Convert damon_pa_young() to use a folio
  mm/rmap: Turn page_lock_anon_vma_read() into
    folio_lock_anon_vma_read()
  mm: Turn page_anon_vma() into folio_anon_vma()
  mm/rmap: Convert rmap_walk() to take a folio
  mm/rmap: Constify the rmap_walk_control argument
  mm/vmscan: Free non-shmem folios without splitting them
  mm/vmscan: Optimise shrink_page_list for non-PMD-sized folios
  mm/vmscan: Account large folios correctly
  mm/vmscan: Turn page_check_references() into folio_check_references()
  mm/vmscan: Convert pageout() to take a folio
  mm: Turn can_split_huge_page() into can_split_folio()
  mm/filemap: Allow large folios to be added to the page cache
  mm: Fix READ_ONLY_THP warning
  mm: Make large folios depend on THP
  mm: Support arbitrary THP sizes
  mm/readahead: Add large folio readahead
  mm/readahead: Switch to page_cache_ra_order
  mm/filemap: Support VM_HUGEPAGE for file mappings
  selftests/vm/transhuge-stress: Support file-backed PMD folios

William Kucharski (1):
  mm/readahead: Align file mappings for non-DAX

 Documentation/core-api/pin_user_pages.rst     |  18 +-
 arch/hexagon/include/asm/pgtable.h            |   3 +-
 arch/powerpc/include/asm/mmu_context.h        |   1 -
 include/linux/huge_mm.h                       |  59 +--
 include/linux/hugetlb.h                       |   5 +
 include/linux/ksm.h                           |   6 +-
 include/linux/mm.h                            | 145 +++---
 include/linux/mm_types.h                      |   7 +-
 include/linux/pagemap.h                       |  32 +-
 include/linux/rmap.h                          |  50 ++-
 include/linux/swap.h                          |   6 +-
 include/trace/events/vmscan.h                 |  10 +-
 kernel/events/uprobes.c                       |   2 +-
 mm/damon/paddr.c                              |  52 ++-
 mm/debug.c                                    |  18 +-
 mm/filemap.c                                  |  59 ++-
 mm/folio-compat.c                             |  34 ++
 mm/gup.c                                      | 383 +++++++---------
 mm/huge_memory.c                              | 127 +++---
 mm/hugetlb.c                                  |   7 +-
 mm/internal.h                                 |  52 ++-
 mm/ksm.c                                      |  17 +-
 mm/memcontrol.c                               |  22 +-
 mm/memory-failure.c                           |  10 +-
 mm/memory_hotplug.c                           |  13 +-
 mm/migrate.c                                  |  90 ++--
 mm/mlock.c                                    | 136 +++---
 mm/page_alloc.c                               |   3 +-
 mm/page_idle.c                                |  26 +-
 mm/page_vma_mapped.c                          |  58 ++-
 mm/readahead.c                                | 108 ++++-
 mm/rmap.c                                     | 416 +++++++++---------
 mm/util.c                                     |  36 +-
 mm/vmscan.c                                   | 280 ++++++------
 mm/workingset.c                               |  25 +-
 tools/testing/selftests/vm/transhuge-stress.c |  35 +-
 36 files changed, 1270 insertions(+), 1081 deletions(-)

Comments

John Hubbard Feb. 13, 2022, 10:31 p.m. UTC | #1
On 2/4/22 11:57, Matthew Wilcox (Oracle) wrote:
> Whole series available through git, and shortly in linux-next:
> https://git.infradead.org/users/willy/pagecache.git/shortlog/refs/heads/for-next
> or git://git.infradead.org/users/willy/pagecache.git for-next

Hi Matthew,

I'm having trouble finding this series in linux-next, or in mmotm either.
Has the plan changed, or maybe I'm just Doing It Wrong? :)

Background as to why (you can skip this part unless you're wondering):

Locally, I've based a small but critical patch on top of this series. It
introduces a new routine:

     void pin_user_page(struct page *page);

...which is a prerequisite for converting Direct IO over to use
FOLL_PIN.

For that, I am on the fence about whether to request putting the first
part of my conversion patchset into 5.18, or 5.19. Ideally, I'd like to
keep it based on your series, because otherwise there are a couple of
warts in pin_user_page() that have to be fixed up later. But on the
other hand, it would be nice to get the prerequisites in place, because
many filesystems need small changes.

Here are the diffs for "mm/gup: introduce pin_user_page()", for reference:

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 73b7e4bd250b..c2bb8099a56b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1963,6 +1963,7 @@ long get_user_pages(unsigned long start, unsigned long nr_pages,
  long pin_user_pages(unsigned long start, unsigned long nr_pages,
  		    unsigned int gup_flags, struct page **pages,
  		    struct vm_area_struct **vmas);
+void pin_user_page(struct page *page);
  long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
  		    struct page **pages, unsigned int gup_flags);
  long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
diff --git a/mm/gup.c b/mm/gup.c
index 7150ea002002..7d57c3452192 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -3014,6 +3014,40 @@ long pin_user_pages(unsigned long start, unsigned long nr_pages,
  }
  EXPORT_SYMBOL(pin_user_pages);

+/**
+ * pin_user_page() - apply a FOLL_PIN reference to a page
+ *
+ * @page: the page to be pinned.
+ *
+ * Similar to get_user_pages(), in that the page's refcount is elevated using
+ * FOLL_PIN rules.
+ *
+ * IMPORTANT: That means that the caller must release the page via
+ * unpin_user_page().
+ *
+ */
+void pin_user_page(struct page *page)
+{
+	struct folio *folio = page_folio(page);
+
+	WARN_ON_ONCE(folio_ref_count(folio) <= 0);
+
+	/*
+	 * Similar to try_grab_page(): be sure to *also*
+	 * increment the normal page refcount field at least once,
+	 * so that the page really is pinned.
+	 */
+	if (folio_test_large(folio)) {
+		folio_ref_add(folio, 1);
+		atomic_add(1, folio_pincount_ptr(folio));
+	} else {
+		folio_ref_add(folio, GUP_PIN_COUNTING_BIAS);
+	}
+
+	node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, 1);
+}
+EXPORT_SYMBOL(pin_user_page);
+
  /*
   * pin_user_pages_unlocked() is the FOLL_PIN variant of
   * get_user_pages_unlocked(). Behavior is the same, except that this one sets

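In case it helps review, here's a hypothetical caller, just to show the
intended pairing with unpin_user_page()/unpin_user_pages(); the helper
below is made up purely for illustration and isn't part of the patch:

	/*
	 * Illustrative only: apply a FOLL_PIN reference to each page in
	 * an array that was obtained by some means other than
	 * pin_user_pages(), so the whole array can later be released
	 * with unpin_user_pages(pages, nr).
	 */
	static void example_pin_pages(struct page **pages, unsigned int nr)
	{
		unsigned int i;

		for (i = 0; i < nr; i++)
			pin_user_page(pages[i]);
	}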

thanks,
Matthew Wilcox Feb. 14, 2022, 4:33 a.m. UTC | #2
On Fri, Feb 04, 2022 at 07:57:37PM +0000, Matthew Wilcox (Oracle) wrote:
> Whole series available through git, and shortly in linux-next:
> https://git.infradead.org/users/willy/pagecache.git/shortlog/refs/heads/for-next
> or git://git.infradead.org/users/willy/pagecache.git for-next

I've just pushed out a new version to infradead.  I'll probably forget
a few things, but major differences:

 - Incorporate various fixes from others including:
   - Implement pmd_pfn() for various arches from Mike Rapoport
   - lru_to_folio() now a function from Christoph Hellwig
   - Change an unpin_user_page() call to gup_put_folio() from Mark Hemment
 - Use DEFINE_PAGE_VMA_WALK() and DEFINE_FOLIO_VMA_WALK() instead of the
   pvmw_set_page()/folio() calls that were in this patch set.
 - A new set of ten patches around invalidate_inode_page().  I'll send
   them out as a fresh patchset tomorrow.
 - Add various Reviewed-by trailers.
 - Updated to -rc4.