[v5,0/9] Merge arm64/riscv hugetlbfs contpte support

Message ID 20250321130635.227011-1-alexghiti@rivosinc.com (mailing list archive)

Message

Alexandre Ghiti March 21, 2025, 1:06 p.m. UTC
This patchset intends to merge the contiguous ptes hugetlbfs implementation
of arm64 and riscv.

Both arm64 and riscv support the use of contiguous ptes to map pages that
are larger than the default page table size, respectively called contpte
and svnapot.

The riscv implementation differs from the arm64's in that the LSBs of the
pfn of a svnapot pte are used to store the size of the mapping, allowing
for future sizes to be added (for now only 64KB is supported). That's an
issue for the core mm code which expects to find the *real* pfn a pte points
to. Patch 2 fixes that by always returning svnapot ptes with the real pfn
and restoring the size of the mapping when the pte is written to a page table.
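
To illustrate the encoding issue described above, here is a minimal
userspace sketch, not the code from this series. It assumes the Sv39/Sv48
pte layout (PPN in bits 53:10, Svnapot N bit at bit 63) and uses the
GCC/Clang builtin __builtin_ctzll(). All helper names (pte_raw_pfn,
napot_encode, napot_order, napot_real_pfn) are hypothetical and only show
how the size marker in the pfn LSBs hides the real pfn.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed RISC-V pte layout: PPN in bits 53:10, Svnapot N bit is bit 63. */
#define PTE_PFN_SHIFT   10
#define PTE_PFN_MASK    (((UINT64_C(1) << 44) - 1) << PTE_PFN_SHIFT)
#define PTE_NAPOT       (UINT64_C(1) << 63)
#define NAPOT_64K_ORDER 4 /* 16 x 4KB pages = 64KB */

/* Return the pfn field exactly as stored in the pte. */
static inline uint64_t pte_raw_pfn(uint64_t pte)
{
	return (pte & PTE_PFN_MASK) >> PTE_PFN_SHIFT;
}

/* Return a copy of pte with its pfn field replaced. */
static inline uint64_t pte_with_pfn(uint64_t pte, uint64_t pfn)
{
	return (pte & ~PTE_PFN_MASK) | ((pfn << PTE_PFN_SHIFT) & PTE_PFN_MASK);
}

/*
 * Hypothetical helper: build the in-page-table form of a NAPOT pte.
 * The low 'order' pfn bits are replaced by a single marker bit at
 * position order - 1 (0b1000 for 64KB), and the N bit is set.
 */
static inline uint64_t napot_encode(uint64_t pte, unsigned int order)
{
	uint64_t pfn = pte_raw_pfn(pte);

	assert(order >= 1 && order < 44);
	pfn &= ~((UINT64_C(1) << order) - 1);
	pfn |= UINT64_C(1) << (order - 1);
	return pte_with_pfn(pte, pfn) | PTE_NAPOT;
}

/* Hypothetical helper: recover the mapping order from an encoded pte. */
static inline unsigned int napot_order(uint64_t pte)
{
	uint64_t pfn = pte_raw_pfn(pte);

	assert((pte & PTE_NAPOT) && pfn != 0);
	return (unsigned int)__builtin_ctzll(pfn) + 1;
}

/*
 * Hypothetical helper: the pfn core mm expects to see. Only valid on a
 * pte read in its encoded form; the lowest set bit is the size marker.
 */
static inline uint64_t napot_real_pfn(uint64_t pte)
{
	uint64_t pfn = pte_raw_pfn(pte);

	if (!(pte & PTE_NAPOT))
		return pfn;
	return pfn & (pfn - 1);
}
```

Note that napot_order() and napot_real_pfn() only make sense on a pte that
still carries the marker bit, which is why a pte handed back by core mm
(already holding the real pfn) cannot be used to derive the mapping size.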

The following patches are just merges of the 2 different implementations
that currently exist in arm64 and riscv which are very similar. It paves
the way to the reuse of the recent contpte THP work by Ryan [1] to avoid
reimplementing the same in riscv.

This patchset was tested by running the libhugetlbfs testsuite with 64KB
and 2MB pages on both architectures (on a 4KB base page size arm64 kernel).

[1] https://lore.kernel.org/linux-arm-kernel/20240215103205.2607016-1-ryan.roberts@arm.com/

v4: https://lore.kernel.org/linux-riscv/20250127093530.19548-1-alexghiti@rivosinc.com/
v3: https://lore.kernel.org/all/20240802151430.99114-1-alexghiti@rivosinc.com/
v2: https://lore.kernel.org/linux-riscv/20240508113419.18620-1-alexghiti@rivosinc.com/
v1: https://lore.kernel.org/linux-riscv/20240301091455.246686-1-alexghiti@rivosinc.com/

Changes in v5:
  - Fix "int i" unused variable in patch 2 (as reported by PW)
  - Fix !svnapot build
  - Fix arch_make_huge_pte() which returned a real napot pte
  - Make __ptep_get(), ptep_get_and_clear() and __set_ptes() napot aware to
    avoid leaking real napot pfns to core mm
  - Fix arch_contpte_get_num_contig() that used to always try to get the
    mapping size from the ptep, which does not work if the ptep comes from
    the core mm
  - Rebase on top of 6.14-rc7 + fix for
    huge_ptep_get_and_clear()/huge_pte_clear()
    https://lore.kernel.org/linux-riscv/20250317072551.572169-1-alexghiti@rivosinc.com/

Changes in v4:
  - Rebase on top of 6.13

Changes in v3:
  - Split set_ptes and ptep_get into internal and external API (Ryan)
  - Rename ARCH_HAS_CONTPTE to ARCH_WANT_GENERAL_HUGETLB_CONTPTE so that
    we split hugetlb functions from contpte functions (the riscv contpte
    functions to support THP will come in another series) (Ryan)
  - Rebase on top of 6.11-rc1

Changes in v2:
  - Rebase on top of 6.9-rc3

Alexandre Ghiti (9):
  riscv: Safely remove huge_pte_offset() when manipulating NAPOT ptes
  riscv: Restore the pfn in a NAPOT pte when manipulated by core mm code
  mm: Use common huge_ptep_get() function for riscv/arm64
  mm: Use common set_huge_pte_at() function for riscv/arm64
  mm: Use common huge_pte_clear() function for riscv/arm64
  mm: Use common huge_ptep_get_and_clear() function for riscv/arm64
  mm: Use common huge_ptep_set_access_flags() function for riscv/arm64
  mm: Use common huge_ptep_set_wrprotect() function for riscv/arm64
  mm: Use common huge_ptep_clear_flush() function for riscv/arm64

 arch/arm64/Kconfig                  |   1 +
 arch/arm64/include/asm/hugetlb.h    |  22 +--
 arch/arm64/include/asm/pgtable.h    |  68 ++++++-
 arch/arm64/mm/hugetlbpage.c         | 294 +---------------------------
 arch/riscv/Kconfig                  |   1 +
 arch/riscv/include/asm/hugetlb.h    |  36 +---
 arch/riscv/include/asm/pgtable-64.h |  11 ++
 arch/riscv/include/asm/pgtable.h    | 222 ++++++++++++++++++---
 arch/riscv/mm/hugetlbpage.c         | 243 +----------------------
 arch/riscv/mm/pgtable.c             |   6 +-
 include/linux/hugetlb_contpte.h     |  39 ++++
 mm/Kconfig                          |   3 +
 mm/Makefile                         |   1 +
 mm/hugetlb_contpte.c                | 258 ++++++++++++++++++++++++
 14 files changed, 583 insertions(+), 622 deletions(-)
 create mode 100644 include/linux/hugetlb_contpte.h
 create mode 100644 mm/hugetlb_contpte.c

Comments

Christophe Leroy March 21, 2025, 5:24 p.m. UTC | #1
On 21/03/2025 at 14:06, Alexandre Ghiti wrote:
> This patchset intends to merge the contiguous ptes hugetlbfs implementation
> of arm64 and riscv.

Can we also add powerpc to the dance?

powerpc also uses contiguous PTEs, although there is not (yet) a special
name for it:
- b250c8c08c79 powerpc/8xx: Manage 512k huge pages as standard pages
- e47168f3d1b1 powerpc/8xx: Support 16k hugepages with 4k pages

powerpc also uses contiguous PMDs/PUDs for larger hugepages:
- 57fb15c32f4f ("powerpc/64s: use contiguous PMD/PUD instead of HUGEPD")
- 7c44202e3609 ("powerpc/e500: use contiguous PMD instead of hugepd")
- 0549e7666373 ("powerpc/8xx: rework support for 8M pages using 
contiguous PTE entries")

Christophe

Alexandre Ghiti March 25, 2025, 12:36 p.m. UTC | #2
Hi Christophe,

On 21/03/2025 18:24, Christophe Leroy wrote:
>
>
> On 21/03/2025 at 14:06, Alexandre Ghiti wrote:
>> This patchset intends to merge the contiguous ptes hugetlbfs 
>> implementation
>> of arm64 and riscv.
>
> Can we also add powerpc to the dance?
>
> powerpc also uses contiguous PTEs, although there is not (yet) a
> special name for it:
> - b250c8c08c79 powerpc/8xx: Manage 512k huge pages as standard pages
> - e47168f3d1b1 powerpc/8xx: Support 16k hugepages with 4k pages
>
> powerpc also uses contiguous PMDs/PUDs for larger hugepages:
> - 57fb15c32f4f ("powerpc/64s: use contiguous PMD/PUD instead of HUGEPD")
> - 7c44202e3609 ("powerpc/e500: use contiguous PMD instead of hugepd")
> - 0549e7666373 ("powerpc/8xx: rework support for 8M pages using 
> contiguous PTE entries")


So I have been looking at the powerpc hugetlb implementation and I have 
to admit that I'm struggling to find similarities with how arm64 and 
riscv deal with contiguous pte mappings.

I think the 2 main characteristics of contpte (arm64) and svnapot 
(riscv) are the break-before-make requirement and the HW A/D update on 
only a single pte. Those make the handling of hugetlb pages very similar 
between arm64 and riscv.
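
As a rough illustration of those two characteristics, here is a minimal
userspace sketch, not the code from this series. It models a contiguous
range as a plain array of 64-bit ptes, assumes the RISC-V bit positions
for V/A/D (bits 0, 6 and 7), and replaces the TLB flush with a counter.
The names contig_ptep_get, contig_get_and_clear and flush_range_stub are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed RISC-V pte flag positions. */
#define PTE_VALID    (UINT64_C(1) << 0)
#define PTE_ACCESSED (UINT64_C(1) << 6)
#define PTE_DIRTY    (UINT64_C(1) << 7)

/* Stand-in for a real TLB range flush: only counts invocations. */
static unsigned int tlb_flush_count;

static void flush_range_stub(void)
{
	tlb_flush_count++;
}

/*
 * The hardware may set A/D on any single pte of the contiguous range,
 * so the value reported for the whole mapping is the first pte with
 * the A/D bits of every entry ORed in.
 */
static uint64_t contig_ptep_get(const uint64_t *ptep, size_t ncontig)
{
	uint64_t pte;

	assert(ncontig > 0);
	pte = ptep[0];
	for (size_t i = 1; i < ncontig; i++)
		pte |= ptep[i] & (PTE_ACCESSED | PTE_DIRTY);
	return pte;
}

/*
 * "Break" half of break-before-make: gather A/D, invalidate every
 * entry of the range, then flush, before any new value is written.
 */
static uint64_t contig_get_and_clear(uint64_t *ptep, size_t ncontig)
{
	uint64_t old = contig_ptep_get(ptep, ncontig);

	for (size_t i = 0; i < ncontig; i++)
		ptep[i] = 0;
	flush_range_stub();
	return old;
}
```

A caller changing the mapping would then write the new ptes only after
contig_get_and_clear() has returned, carrying over the gathered A/D bits
where needed.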

But I may have missed something: the powerpc hugetlb implementation is
quite "scattered" because of the radix/hash page table and 32/64-bit split.

Thanks,

Alex

