[v2,00/40] mm/rmap: interface overhaul

Message ID 20231220224504.646757-1-david@redhat.com (mailing list archive)

David Hildenbrand Dec. 20, 2023, 10:44 p.m. UTC
This series overhauls the rmap interface, to get rid of the "bool compound"
/ RMAP_COMPOUND parameter with the goal of making the interface less error
prone, more future proof, and more natural to extend to "batching". Also,
this converts the interface to always consume folio+subpage, which speeds
up operations on large folios.

Further, this series adds PTE-batching variants for 4 rmap functions,
whereby only folio_add_anon_rmap_ptes() is used for batching in this series
when PTE-remapping a PMD-mapped THP. folio_remove_rmap_ptes(),
folio_try_dup_anon_rmap_ptes() and folio_dup_file_rmap_ptes() will soon
come in handy[1,2].
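
To make the level-based interface and the batching idea concrete, here is a
small userspace model. It is only a sketch and not code from this series:
struct model_folio, the model_* functions, MODEL_PMD_NR and the folio_updates
counter are all made-up names, and the counter merely stands in for per-folio
work that a batched call could do once instead of once per page. The model
assumes 4 KiB base pages and a 2 MiB PMD (512 pages).

```c
#include <assert.h>
#include <string.h>

/*
 * Userspace model only: the names and fields below are illustrative
 * assumptions, not the kernel's struct folio or its rmap implementation.
 */
#define MODEL_PMD_NR 512	/* assumed: 4 KiB base pages, 2 MiB PMD */

enum model_rmap_level {
	MODEL_RMAP_LEVEL_PTE,
	MODEL_RMAP_LEVEL_PMD,
};

struct model_folio {
	int nr_pages;				/* pages in this folio */
	int entire_mapcount;			/* PMD (whole-folio) mappings */
	int page_mapcount[MODEL_PMD_NR];	/* per-page PTE mappings */
	int folio_updates;			/* hypothetical per-folio passes */
};

static inline void model_folio_init(struct model_folio *folio, int nr_pages)
{
	assert(nr_pages > 0 && nr_pages <= MODEL_PMD_NR);
	memset(folio, 0, sizeof(*folio));
	folio->nr_pages = nr_pages;
}

/* Common helper: the level states which kind of entry maps the range. */
static inline void model_add_rmap(struct model_folio *folio, int first,
				  int nr_pages, enum model_rmap_level level)
{
	switch (level) {
	case MODEL_RMAP_LEVEL_PTE:
		/* Sanity checks: the range must lie within the folio. */
		assert(first >= 0 && nr_pages > 0);
		assert(first + nr_pages <= folio->nr_pages);
		for (int i = 0; i < nr_pages; i++)
			folio->page_mapcount[first + i]++;
		break;
	case MODEL_RMAP_LEVEL_PMD:
		/* A PMD mapping always covers an entire PMD-sized folio. */
		assert(first == 0 && nr_pages == MODEL_PMD_NR);
		assert(folio->nr_pages == MODEL_PMD_NR);
		folio->entire_mapcount++;
		break;
	}
	folio->folio_updates++;	/* once per call, not once per page */
}

/* Thin wrappers: the function name carries the mapping granularity. */
static inline void model_add_rmap_pte(struct model_folio *folio, int page)
{
	model_add_rmap(folio, page, 1, MODEL_RMAP_LEVEL_PTE);
}

static inline void model_add_rmap_ptes(struct model_folio *folio, int first,
				       int nr_pages)
{
	model_add_rmap(folio, first, nr_pages, MODEL_RMAP_LEVEL_PTE);
}

static inline void model_add_rmap_pmd(struct model_folio *folio)
{
	model_add_rmap(folio, 0, MODEL_PMD_NR, MODEL_RMAP_LEVEL_PMD);
}
```

In this model, one model_add_rmap_ptes() call over all 512 pages leaves the
same per-page counts as 512 model_add_rmap_pte() calls, but runs the per-folio
step once. How much the real functions save per call is not modeled here.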

This series performs a lot of folio conversion along the way. Most of the
added LOC in the diff are only due to documentation.

As we're moving to a pte/pmd interface where we clearly express the
mapping granularity we are dealing with, we first get the remainder of
hugetlb out of the way, as it is special and expected to remain special: it
treats everything as a "single logical PTE" and currently only allows
entire mappings.

Even if we'd ever support partial mappings, I strongly assume the interface
and implementation will still differ heavily: hopefully we can avoid working
on subpages/subpage mapcounts completely and only add a "count" parameter
for them to enable batching.

New (extended) hugetlb interface that operates on entire folio:
 * hugetlb_add_new_anon_rmap() -> Already existed
 * hugetlb_add_anon_rmap() -> Already existed
 * hugetlb_try_dup_anon_rmap()
 * hugetlb_try_share_anon_rmap()
 * hugetlb_add_file_rmap()
 * hugetlb_remove_rmap()

New "ordinary" interface for small folios / THP:
 * folio_add_new_anon_rmap() -> Already existed
 * folio_add_anon_rmap_[pte|ptes|pmd]()
 * folio_try_dup_anon_rmap_[pte|ptes|pmd]()
 * folio_try_share_anon_rmap_[pte|pmd]()
 * folio_add_file_rmap_[pte|ptes|pmd]()
 * folio_dup_file_rmap_[pte|ptes|pmd]()
 * folio_remove_rmap_[pte|ptes|pmd]()

folio_add_new_anon_rmap() will always map at the largest granularity
possible (currently, a single PMD to cover a PMD-sized THP). Could be
extended if ever required.

In the future, we might want "_pud" variants and eventually "_pmds"
variants for batching.

I ran some simple microbenchmarks on an Intel(R) Xeon(R) Silver 4210R:
measuring munmap(), fork(), cow, MADV_DONTNEED on each PTE ... and PTE
remapping PMD-mapped THPs on 1 GiB of memory.
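
For readers who want to try something similar, a munmap() timing loop could
look roughly like the sketch below. This is not the benchmark used for the
numbers in this cover letter: time_munmap() is a made-up helper, it only
covers the munmap() case, and it makes no attempt to control whether the
memory is backed by THPs.

```c
#define _GNU_SOURCE	/* MAP_ANONYMOUS, clock_gettime() */
#include <stddef.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: map `size` bytes of private anonymous memory,
 * write to every page so it is actually populated, then return the
 * seconds spent in munmap(). Returns a negative value on error.
 */
static inline double time_munmap(size_t size)
{
	struct timespec start, end;
	long page_size = sysconf(_SC_PAGESIZE);
	volatile char *mem;
	void *area;

	if (page_size <= 0 || size == 0)
		return -1.0;

	area = mmap(NULL, size, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (area == MAP_FAILED)
		return -1.0;

	/* Touch each page; volatile keeps the stores from being dropped. */
	mem = area;
	for (size_t off = 0; off < size; off += (size_t)page_size)
		mem[off] = 1;

	if (clock_gettime(CLOCK_MONOTONIC, &start)) {
		munmap(area, size);
		return -1.0;
	}
	if (munmap(area, size))
		return -1.0;
	if (clock_gettime(CLOCK_MONOTONIC, &end))
		return -1.0;

	return (double)(end.tv_sec - start.tv_sec) +
	       (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling time_munmap((size_t)1 << 30) would cover 1 GiB as in the setup above.
A single run is noisy, so any real comparison would need many repetitions on
an otherwise idle machine.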

For small folios, there is barely a change (< 1% improvement for me).

For PTE-mapped THP:
* PTE-remapping a PMD-mapped THP is more than 10% faster.
* fork() is more than 4% faster.
* MADV_DONTNEED is 2% faster.
* COW when writing only a single byte on a COW-shared PTE is 1% faster.
* munmap() barely changes (< 1%).

[1] https://lkml.kernel.org/r/20230810103332.3062143-1-ryan.roberts@arm.com
[2] https://lkml.kernel.org/r/20231204105440.61448-1-ryan.roberts@arm.com

---

If we pull this into mm-unstable in 2023, I'll have my notebook ready to
debug next to the Christmas tree. ;)

Based on current mm/mm-unstable. Compile-tested with/without THP on x86-64
and with defconfig on a bunch more. Tested on x86-64.

v1 -> v2:
* Rebased on top of mm-unstable (minor conflicts)
* Move some sanity checks from #6 into #2 -> #5 and leave the remainder in
  #6
* Call it "rmap_level" instead of "rmap_mode".
* Consistently use "int" instead of "unsigned int" in rmap code
* Drop some stale comments
* Minor comment/description fixups + additions
* Spotted one last comment leftover, addressed in the (new) last patch
* Added RBs

RFC -> v1:
* Rebased on top of mm-unstable (containing mTHP)
* Use switch()-case and __always_inline for helper functions
* Fixed some (intermittent) compile issues and some smaller stuff
* folio_try_dup_anon_rmap_[pte|ptes|pmd]() rewrite
* Pass nr_pages consistently as "int"
* Simplify sanity checks
* Added RBs

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yin Fengwei <fengwei.yin@intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Peter Xu <peterx@redhat.com>

David Hildenbrand (40):
  mm/rmap: rename hugepage_add* to hugetlb_add*
  mm/rmap: introduce and use hugetlb_remove_rmap()
  mm/rmap: introduce and use hugetlb_add_file_rmap()
  mm/rmap: introduce and use hugetlb_try_dup_anon_rmap()
  mm/rmap: introduce and use hugetlb_try_share_anon_rmap()
  mm/rmap: add hugetlb sanity checks for anon rmap handling
  mm/rmap: convert folio_add_file_rmap_range() into
    folio_add_file_rmap_[pte|ptes|pmd]()
  mm/memory: page_add_file_rmap() -> folio_add_file_rmap_[pte|pmd]()
  mm/huge_memory: page_add_file_rmap() -> folio_add_file_rmap_pmd()
  mm/migrate: page_add_file_rmap() -> folio_add_file_rmap_pte()
  mm/userfaultfd: page_add_file_rmap() -> folio_add_file_rmap_pte()
  mm/rmap: remove page_add_file_rmap()
  mm/rmap: factor out adding folio mappings into __folio_add_rmap()
  mm/rmap: introduce folio_add_anon_rmap_[pte|ptes|pmd]()
  mm/huge_memory: batch rmap operations in __split_huge_pmd_locked()
  mm/huge_memory: page_add_anon_rmap() -> folio_add_anon_rmap_pmd()
  mm/migrate: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
  mm/ksm: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
  mm/swapfile: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
  mm/memory: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
  mm/rmap: remove page_add_anon_rmap()
  mm/rmap: remove RMAP_COMPOUND
  mm/rmap: introduce folio_remove_rmap_[pte|ptes|pmd]()
  kernel/events/uprobes: page_remove_rmap() -> folio_remove_rmap_pte()
  mm/huge_memory: page_remove_rmap() -> folio_remove_rmap_pmd()
  mm/khugepaged: page_remove_rmap() -> folio_remove_rmap_pte()
  mm/ksm: page_remove_rmap() -> folio_remove_rmap_pte()
  mm/memory: page_remove_rmap() -> folio_remove_rmap_pte()
  mm/migrate_device: page_remove_rmap() -> folio_remove_rmap_pte()
  mm/rmap: page_remove_rmap() -> folio_remove_rmap_pte()
  Documentation: stop referring to page_remove_rmap()
  mm/rmap: remove page_remove_rmap()
  mm/rmap: convert page_dup_file_rmap() to
    folio_dup_file_rmap_[pte|ptes|pmd]()
  mm/rmap: introduce folio_try_dup_anon_rmap_[pte|ptes|pmd]()
  mm/huge_memory: page_try_dup_anon_rmap() ->
    folio_try_dup_anon_rmap_pmd()
  mm/memory: page_try_dup_anon_rmap() -> folio_try_dup_anon_rmap_pte()
  mm/rmap: remove page_try_dup_anon_rmap()
  mm: convert page_try_share_anon_rmap() to
    folio_try_share_anon_rmap_[pte|pmd]()
  mm/rmap: rename COMPOUND_MAPPED to ENTIRELY_MAPPED
  mm: remove one last reference to page_add_*_rmap()

 Documentation/mm/transhuge.rst       |   4 +-
 Documentation/mm/unevictable-lru.rst |   4 +-
 include/linux/mm.h                   |   6 +-
 include/linux/rmap.h                 | 397 +++++++++++++++++++-----
 kernel/events/uprobes.c              |   2 +-
 mm/filemap.c                         |  10 +-
 mm/gup.c                             |   2 +-
 mm/huge_memory.c                     |  85 +++---
 mm/hugetlb.c                         |  21 +-
 mm/internal.h                        |  14 +-
 mm/khugepaged.c                      |  17 +-
 mm/ksm.c                             |  15 +-
 mm/memory-failure.c                  |   4 +-
 mm/memory.c                          |  60 ++--
 mm/migrate.c                         |  12 +-
 mm/migrate_device.c                  |  41 +--
 mm/mmu_gather.c                      |   2 +-
 mm/rmap.c                            | 433 ++++++++++++++++-----------
 mm/swapfile.c                        |   2 +-
 mm/userfaultfd.c                     |   2 +-
 20 files changed, 739 insertions(+), 394 deletions(-)


base-commit: 2072407a394d0b3a3056f78a5630903da9471db0