[v3,0/4] mm: introduce THP deferred setting

Message ID 20250414222456.43212-1-npache@redhat.com (mailing list archive)

Nico Pache April 14, 2025, 10:24 p.m. UTC
This series is a follow-up to [1], which adds mTHP support to khugepaged.
mTHP khugepaged support is a "loose" dependency for the sysfs/sysctl
configs to make sense. Without it, the global="defer" and mTHP="inherit"
case is "undefined" behavior.

We've seen cases where customers switching from RHEL7 to RHEL8 see a
significant increase in the memory footprint for the same workloads.

Through our investigations we found that a large contributing factor to
the increase in RSS was an increase in THP usage.

For workloads like MySQL, or when using allocators like jemalloc, it is
often recommended to set /sys/kernel/mm/transparent_hugepage/enabled to
never. This is in part due to performance degradation and increased
memory waste.
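
For reference, the sysfs file lists every available mode and puts the
active one in brackets (for example "always madvise [never]"). A minimal
Python sketch of reading that format follows. The function name is
hypothetical, and the position of "defer" in the list on a patched
kernel is an assumption, not something taken from the patches.

```python
import re


def active_thp_mode(sysfs_text):
    """Return the bracketed (active) mode from a THP 'enabled' string."""
    match = re.search(r"\[(\w+)\]", sysfs_text)
    if match is None:
        raise ValueError("no active mode found in %r" % sysfs_text)
    return match.group(1)


# Format exposed by current kernels.
print(active_thp_mode("always madvise [never]"))
# Assumed format on a kernel with this series applied (ordering is a guess).
print(active_thp_mode("always [defer] madvise never"))
```

On a live system the input string would come from reading
/sys/kernel/mm/transparent_hugepage/enabled.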

This series introduces enabled=defer. This setting acts as a middle
ground between always and madvise. If the mapping is MADV_HUGEPAGE, the
page fault handler will act normally, making a hugepage if possible. If
the mapping is not MADV_HUGEPAGE, then the page fault handler will
default to the base page size allocation. The caveat is that khugepaged
can still operate on pages that are not MADV_HUGEPAGE.
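
The behavior described above can be summarized as a small decision
table. The sketch below is only a Python model of this cover letter's
wording, not the kernel implementation: the names are hypothetical, and
it deliberately ignores MADV_NOHUGEPAGE, per-process overrides, and the
per-size mTHP controls.

```python
from enum import Enum


class ThpMode(Enum):
    ALWAYS = "always"
    DEFER = "defer"
    MADVISE = "madvise"
    NEVER = "never"


def fault_allocates_thp(mode, madv_hugepage):
    """True if the page fault handler may try a hugepage for the mapping."""
    if mode is ThpMode.ALWAYS:
        return True
    if mode in (ThpMode.DEFER, ThpMode.MADVISE):
        # defer behaves like madvise at fault time.
        return madv_hugepage
    return False


def khugepaged_may_collapse(mode, madv_hugepage):
    """True if khugepaged may later collapse the mapping into a hugepage."""
    if mode in (ThpMode.ALWAYS, ThpMode.DEFER):
        # defer behaves like always for khugepaged.
        return True
    if mode is ThpMode.MADVISE:
        return madv_hugepage
    return False


for mode in ThpMode:
    for hinted in (False, True):
        print(mode.value, hinted,
              fault_allocates_thp(mode, hinted),
              khugepaged_may_collapse(mode, hinted))
```

The only row where the two columns differ is defer without
MADV_HUGEPAGE: base pages at fault time, hugepages possible later.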

This allows for three things. One, applications specifically designed to
use hugepages will get them. Two, applications that don't use
hugepages can still benefit from them without aggressively inserting
THPs at every possible chance. This curbs the memory waste and defers
the use of hugepages to khugepaged, which can then scan the memory
for regions eligible for collapse. Lastly, there is the added benefit for
those who want THPs but experience higher-latency PFs: you can now get
base page performance at the PF handler and hugepage performance for
those mappings after they collapse.
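
As an illustration of the first case, an application opts in by calling
madvise() with MADV_HUGEPAGE on its mapping. A minimal sketch using
Python's mmap module follows. The helper name and the 4 MiB size are
arbitrary choices for the example, and the hint may be unavailable or
rejected on kernels built without THP, which the helper reports instead
of failing.

```python
import mmap


def map_with_hugepage_hint(length):
    """Map anonymous private memory and request THP for it.

    Returns (mapping, hinted). hinted is False when MADV_HUGEPAGE is not
    available on this platform or the kernel rejects the hint.
    """
    region = mmap.mmap(-1, length,
                       flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    hinted = False
    if hasattr(mmap, "MADV_HUGEPAGE"):
        try:
            region.madvise(mmap.MADV_HUGEPAGE)
            hinted = True
        except OSError:
            # For example EINVAL when THP is not configured.
            pass
    return region, hinted


region, hinted = map_with_hugepage_hint(4 * 1024 * 1024)
print("MADV_HUGEPAGE hint applied:", hinted)
region.close()
```

Under enabled=defer, a mapping hinted this way would be eligible for
hugepages at fault time, while unhinted mappings wait for khugepaged.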

Admins may want to lower max_ptes_none; otherwise, khugepaged may
aggressively collapse single allocations into hugepages.
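
To make the effect of max_ptes_none concrete, here is a rough Python
model of the PMD-sized collapse check. It assumes 4 KiB base pages and
2 MiB hugepages (512 PTEs per PMD) and the usual default of 511 for
max_ptes_none. The function name is hypothetical, and the real scan also
applies other checks (such as max_ptes_swap and max_ptes_shared) that
are left out here.

```python
HPAGE_PMD_NR = 512  # assumption: 4 KiB base pages, 2 MiB PMD-sized THP


def collapse_allowed(present_ptes, max_ptes_none):
    """True if a PMD range with this many populated PTEs passes the check."""
    none_ptes = HPAGE_PMD_NR - present_ptes
    return none_ptes <= max_ptes_none


# With the default of 511, one touched base page is enough to collapse.
print(collapse_allowed(1, 511))
# With max_ptes_none=64, at least 448 of the 512 PTEs must be populated.
print(collapse_allowed(1, 64))
print(collapse_allowed(448, 64))
```

The tunable lives at
/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none.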

TESTING:
- Built for x86_64, aarch64, ppc64le, and s390x
- selftests mm
- In [1] I provided a script [2] that has multiple access patterns
- lots of general use. These changes have been running in my VM for some time
- redis testing. This test was my original case for the defer mode. What I was
   able to prove was that THP=always leads to increased max_latency cases, which
   is why it is recommended to disable THPs for redis servers. However, with
   'defer' we don't have the max_latency spikes and can still get the system to
   utilize THPs. I further tested this with the mTHP defer setting and found that
   redis (and probably other jemalloc users) can utilize THPs via defer (+mTHP
   defer) without a large latency penalty and with some potential gains.
   I uploaded some mmtest results here [3] which compare:
       stock+thp=never
       stock+(m)thp=always
       khugepaged-mthp + defer (max_ptes_none=64)

  The results show that (m)THPs can cause some throughput regression in some
  cases, but also have gains in other cases. The mTHP+defer results have more
  gains and fewer losses than the (m)THP=always case.

V3 Changes:
- moved some Documentation to the other series and merged the remaining
   Documentation updates into one

V2 Changes:
- base changes on mTHP khugepaged support
- Fix selftests parsing issue
- add mTHP defer option
- add mTHP defer Documentation

[1] - https://lore.kernel.org/lkml/20250414220557.35388-1-npache@redhat.com/
[2] - https://gitlab.com/npache/khugepaged_mthp_test
[3] - https://people.redhat.com/npache/mthp_khugepaged_defer/testoutput2/output.html

Nico Pache (4):
  mm: defer THP insertion to khugepaged
  mm: document (m)THP defer usage
  khugepaged: add defer option to mTHP options
  selftests: mm: add defer to thp setting parser

 Documentation/admin-guide/mm/transhuge.rst | 31 +++++++---
 include/linux/huge_mm.h                    | 18 +++++-
 mm/huge_memory.c                           | 69 +++++++++++++++++++---
 mm/khugepaged.c                            | 10 ++--
 tools/testing/selftests/mm/thp_settings.c  |  1 +
 tools/testing/selftests/mm/thp_settings.h  |  1 +
 6 files changed, 107 insertions(+), 23 deletions(-)

Comments

Andrew Morton April 15, 2025, 12:37 a.m. UTC | #1
This will need updating to latest kernels, please.

btw, hpage_collapse_scan_pmd() has a trace_mm_khugepaged_scan_pmd() in
it, which seems wrong...

Andrew Morton April 15, 2025, 12:50 a.m. UTC | #2
On Mon, 14 Apr 2025 17:37:35 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:

> This will need updating to latest kernels, please.

There I go, applying patchsets in reverse time order again.

"[PATCH v3 07/12] khugepaged: add mTHP support" has one reject against
current mainline, and two against current mm.git.