[RFC,v2,0/2] Support large folios for tmpfs

Message ID: cover.1727338549.git.baolin.wang@linux.alibaba.com

Message

Baolin Wang Sept. 26, 2024, 8:27 a.m. UTC
Hi,

This RFC patch series attempts to support large folios for tmpfs. The first
patch is based on Daniel's previous patches in [1], mainly using the length
in the write and fallocate paths as a highest-order hint for large folio
allocation (a rough sketch of this length-to-order idea follows the list
below). The second patch adds mTHP filter control for tmpfs when mTHP is
configured, for the following reasons:

1. Maintain backward compatibility for the control interface. Tmpfs already
has a global 'huge=' mount option and the '/sys/kernel/mm/transparent_hugepage/shmem_enabled'
interface to control large order allocations. mTHP extends this capability to a
per-size basis while remaining compatible with the existing interfaces.

2. For large order allocation on writable mmap() faults in tmpfs, we need
something like the mTHP interfaces to control the allowable large orders, and
to keep the interfaces consistent with shmem.

3. Ryan pointed out that large order allocations based on write length could
lead to memory fragmentation issues. Quoting Ryan's comment [2]:
"And it's possible (likely even, in my opinion) that allocating lots of different
folio sizes will exacerbate memory fragmentation, leading to more order-0
fallbacks, which would hurt the overall system performance in the long run, vs
restricting to a couple of folio sizes."

4. Some hardware has size preferences. For example, the ARM64 architecture
can use its cont-pte feature with 64K folios to reduce TLB pressure and
improve performance. Using mTHP makes it easier to leverage such hardware
advantages.
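
(Not part of the series; illustration only.) Here is a minimal sketch of the
length-to-order idea mentioned above, assuming a write that starts at page
index 'index' and ends at byte offset 'write_end'. The helper name and body
are placeholders based on this cover letter, not the actual patch:

	static unsigned int len_to_order_hint(pgoff_t index, loff_t write_end)
	{
		/* Length of the write in pages, rounding write_end up. */
		unsigned long len = DIV_ROUND_UP(write_end, PAGE_SIZE) - index;
		unsigned int order;

		if (len <= 1)
			return 0;

		/* Highest order that still fits entirely inside the write. */
		order = ilog2(len);
		return min_t(unsigned int, order, MAX_PAGECACHE_ORDER);
	}

The series then narrows this hint further via the mTHP filter described in
the reasons above.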

Any comments and suggestions are appreciated. Thanks.

[1] https://lore.kernel.org/all/20240515055719.32577-1-da.gomez@samsung.com/
[2] https://lore.kernel.org/all/e83e1687-3e3c-40d0-bf0e-225871647092@arm.com/

Changes from RFC v1:
 - Drop patch 1.
 - Use 'write_end' to calculate the length in shmem_allowable_huge_orders().
 - Update shmem_mapping_size_order() per Daniel.
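
(Illustration only.) Since the changelog mentions shmem_allowable_huge_orders(),
here is a hedged sketch of the filtering step it performs conceptually:
intersect the candidate orders implied by the write length with the orders
enabled through the per-size mTHP sysfs knobs (e.g.
'/sys/kernel/mm/transparent_hugepage/hugepages-64kB/shmem_enabled'). The
function below is a placeholder, not the series' actual code:

	/* Bit N set in either mask means "order N". */
	static unsigned long filter_orders_by_mthp(unsigned int len_order,
						   unsigned long mthp_enabled_mask)
	{
		/* All orders from 0 up to the length-derived hint. */
		unsigned long candidates = BIT(len_order + 1) - 1;

		/* Keep only the per-size mTHP-enabled orders. */
		return candidates & mthp_enabled_mask;
	}

The highest set bit in the result would be the order tried first, falling
back to lower enabled orders (and ultimately order-0) if allocation fails.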

Baolin Wang (1):
  mm: shmem: use mTHP interface to control huge orders for tmpfs

Daniel Gomez (1):
  mm: shmem: add large folio support to the write and fallocate paths

 mm/memory.c |  4 ++--
 mm/shmem.c  | 66 +++++++++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 58 insertions(+), 12 deletions(-)

Comments

Matthew Wilcox Sept. 26, 2024, 12:20 p.m. UTC | #1
On Thu, Sep 26, 2024 at 04:27:25PM +0800, Baolin Wang wrote:
> This RFC patch series attempts to support large folios for tmpfs. The first
> patch is based on Daniel's previous patches in [1], mainly using the length
> in the write and fallocate paths to get a highest order hint for large
> order allocation. The last patch adds mTHP filter control for tmpfs if mTHP
> is set for the following reasons:
> 
> 1. Maintain backward compatibility for the control interface. Tmpfs already
> has a global 'huge=' mount option and '/sys/kernel/mm/transparent_hugepage/shmem_enabled'
> interface to control large order allocations. mTHP extends this capability to a
> per-size basis while maintaining good interface compatibility.

... it's confusing as hell to anyone who tries to understand it and
you've made it more complicated.  Well done.

> 2. For the large order allocation of writable mmap() faults in tmpfs, we need
> something like the mTHP interfaces to control large orders, as well as ensuring
> consistent interfaces with shmem.

tmpfs and shmem do NOT need to be consistent!  I don't know why anyone
thinks this is a goal.  tmpfs should be consistent with OTHER FILE
SYSTEMS.  shmem should do the right thing for the shared anon use case.

> 3. Ryan pointed out that large order allocations based on write length could
> lead to memory fragmentation issue. Just quoting Ryan's comment [2]:
> "And it's possible (likely even, in my opinion) that allocating lots of different
> folio sizes will exacerbate memory fragmentation, leading to more order-0
> fallbacks, which would hurt the overall system performance in the long run, vs
> restricting to a couple of folio sizes."

I disagree with this.  It's a buddy allocator; it's resistant to this
kind of fragmentation.