Message ID: 20250321130635.227011-1-alexghiti@rivosinc.com
Series: Merge arm64/riscv hugetlbfs contpte support
On 21/03/2025 14:06, Alexandre Ghiti wrote:
> This patchset intends to merge the contiguous ptes hugetlbfs implementation
> of arm64 and riscv.

Can we also add powerpc to the dance?

powerpc also uses contiguous PTEs, although there is not (yet) a special name for it:
- b250c8c08c79 ("powerpc/8xx: Manage 512k huge pages as standard pages")
- e47168f3d1b1 ("powerpc/8xx: Support 16k hugepages with 4k pages")

powerpc also uses contiguous PMDs/PUDs for larger hugepages:
- 57fb15c32f4f ("powerpc/64s: use contiguous PMD/PUD instead of HUGEPD")
- 7c44202e3609 ("powerpc/e500: use contiguous PMD instead of hugepd")
- 0549e7666373 ("powerpc/8xx: rework support for 8M pages using contiguous PTE entries")

Christophe

>
> Both arm64 and riscv support the use of contiguous ptes to map pages that
> are larger than the default page size, respectively called contpte
> and svnapot.
>
> The riscv implementation differs from the arm64's in that the LSBs of the
> pfn of a svnapot pte are used to store the size of the mapping, allowing
> for future sizes to be added (for now only 64KB is supported). That's an
> issue for the core mm code, which expects to find the *real* pfn a pte points
> to. Patch 1 fixes that by always returning svnapot ptes with the real pfn
> and restoring the size of the mapping when it is written to a page table.
>
> The following patches are just merges of the 2 different, but very similar,
> implementations that currently exist in arm64 and riscv. This paves the way
> for reusing the recent contpte THP work by Ryan [1] rather than
> reimplementing it in riscv.
>
> This patchset was tested by running the libhugetlbfs testsuite with 64KB
> and 2MB pages on both architectures (on a 4KB base page size arm64 kernel).
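The point about the riscv pfn LSBs can be illustrated with a small standalone model. This is a minimal sketch, assuming the Svnapot pte layout in which the PPN field starts at bit 10 and bit 63 is the N bit, and assuming that a mapping of 2^order base pages is encoded by clearing the low `order` bits of the PPN and setting bit `order - 1` (so 64KB with 4KB pages, order 4, stores 0b1000 in the low PPN bits). The helpers `napot_encode()`, `napot_order()` and `napot_real_pfn()` are hypothetical names, not the kernel's API, and `__builtin_ctzll` assumes GCC or Clang.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed Svnapot pte layout (illustrative constants, not kernel headers). */
#define PTE_PFN_SHIFT 10                                    /* PPN starts at bit 10 */
#define PTE_PFN_MASK  (((1ULL << 44) - 1) << PTE_PFN_SHIFT) /* 44-bit PPN field */
#define PTE_NAPOT     (1ULL << 63)                          /* N bit */

/* Hypothetical: build a NAPOT pte for 2^order pages starting at base_pfn.
 * The low 'order' PPN bits are replaced by the size encoding:
 * bit (order - 1) set, all bits below it cleared. */
static uint64_t napot_encode(uint64_t base_pfn, uint64_t prot, unsigned int order)
{
	assert(order >= 1);
	uint64_t size_mask = (1ULL << order) - 1;
	uint64_t ppn = (base_pfn & ~size_mask) | (1ULL << (order - 1));

	return ((ppn << PTE_PFN_SHIFT) & PTE_PFN_MASK) | prot | PTE_NAPOT;
}

/* Hypothetical: recover the order from the lowest set PPN bit. */
static unsigned int napot_order(uint64_t pte)
{
	uint64_t ppn = (pte & PTE_PFN_MASK) >> PTE_PFN_SHIFT;

	assert((pte & PTE_NAPOT) && ppn != 0);
	return (unsigned int)__builtin_ctzll(ppn) + 1;
}

/* Hypothetical: the *real* pfn of the idx-th base page of the mapping,
 * i.e. what core mm expects to read, with the size encoding stripped. */
static uint64_t napot_real_pfn(uint64_t pte, unsigned int idx)
{
	unsigned int order = napot_order(pte);
	uint64_t ppn = (pte & PTE_PFN_MASK) >> PTE_PFN_SHIFT;

	assert(idx < (1ULL << order));
	return (ppn & ~((1ULL << order) - 1)) + idx;
}
```

For example, `napot_encode(0x80010, prot, 4)` stores PPN 0x80018 in the pte, and `napot_real_pfn(pte, 0)` gives back 0x80010. Handing the raw 0x80018 to core mm is exactly the kind of leak the cover letter describes.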
>
> [1] https://lore.kernel.org/linux-arm-kernel/20240215103205.2607016-1-ryan.roberts@arm.com/
>
> v4: https://lore.kernel.org/linux-riscv/20250127093530.19548-1-alexghiti@rivosinc.com/
> v3: https://lore.kernel.org/all/20240802151430.99114-1-alexghiti@rivosinc.com/
> v2: https://lore.kernel.org/linux-riscv/20240508113419.18620-1-alexghiti@rivosinc.com/
> v1: https://lore.kernel.org/linux-riscv/20240301091455.246686-1-alexghiti@rivosinc.com/
>
> Changes in v5:
> - Fix "int i" unused variable in patch 2 (as reported by PW)
> - Fix !svnapot build
> - Fix arch_make_huge_pte() which returned a real napot pte
> - Make __ptep_get(), ptep_get_and_clear() and __set_ptes() napot aware to
>   avoid leaking real napot pfns to core mm
> - Fix arch_contpte_get_num_contig() that used to always try to get the
>   mapping size from the ptep, which does not work if the ptep comes from
>   the core mm
> - Rebase on top of 6.14-rc7 + fix for
>   huge_ptep_get_and_clear()/huge_pte_clear()
>   https://lore.kernel.org/linux-riscv/20250317072551.572169-1-alexghiti@rivosinc.com/
>
> Changes in v4:
> - Rebase on top of 6.13
>
> Changes in v3:
> - Split set_ptes and ptep_get into internal and external API (Ryan)
> - Rename ARCH_HAS_CONTPTE to ARCH_WANT_GENERAL_HUGETLB_CONTPTE so that
>   we split hugetlb functions from contpte functions (actually riscv contpte
>   functions to support THP will come in another series) (Ryan)
> - Rebase on top of 6.11-rc1
>
> Changes in v2:
> - Rebase on top of 6.9-rc3
>
> Alexandre Ghiti (9):
>   riscv: Safely remove huge_pte_offset() when manipulating NAPOT ptes
>   riscv: Restore the pfn in a NAPOT pte when manipulated by core mm code
>   mm: Use common huge_ptep_get() function for riscv/arm64
>   mm: Use common set_huge_pte_at() function for riscv/arm64
>   mm: Use common huge_pte_clear() function for riscv/arm64
>   mm: Use common huge_ptep_get_and_clear() function for riscv/arm64
>   mm: Use common huge_ptep_set_access_flags() function for riscv/arm64
>   mm: Use common huge_ptep_set_wrprotect() function for riscv/arm64
>   mm: Use common huge_ptep_clear_flush() function for riscv/arm64
>
>  arch/arm64/Kconfig                  |   1 +
>  arch/arm64/include/asm/hugetlb.h    |  22 +--
>  arch/arm64/include/asm/pgtable.h    |  68 ++++++-
>  arch/arm64/mm/hugetlbpage.c         | 294 +---------------------------
>  arch/riscv/Kconfig                  |   1 +
>  arch/riscv/include/asm/hugetlb.h    |  36 +---
>  arch/riscv/include/asm/pgtable-64.h |  11 ++
>  arch/riscv/include/asm/pgtable.h    | 222 ++++++++++++++++++---
>  arch/riscv/mm/hugetlbpage.c         | 243 +----------------------
>  arch/riscv/mm/pgtable.c             |   6 +-
>  include/linux/hugetlb_contpte.h     |  39 ++++
>  mm/Kconfig                          |   3 +
>  mm/Makefile                         |   1 +
>  mm/hugetlb_contpte.c                | 258 ++++++++++++++++++++++++
>  14 files changed, 583 insertions(+), 622 deletions(-)
>  create mode 100644 include/linux/hugetlb_contpte.h
>  create mode 100644 mm/hugetlb_contpte.c
>
Hi Christophe,

On 21/03/2025 18:24, Christophe Leroy wrote:
>
> On 21/03/2025 14:06, Alexandre Ghiti wrote:
>> This patchset intends to merge the contiguous ptes hugetlbfs implementation
>> of arm64 and riscv.
>
> Can we also add powerpc to the dance?
>
> powerpc also uses contiguous PTEs, although there is not (yet) a
> special name for it:
> - b250c8c08c79 ("powerpc/8xx: Manage 512k huge pages as standard pages")
> - e47168f3d1b1 ("powerpc/8xx: Support 16k hugepages with 4k pages")
>
> powerpc also uses contiguous PMDs/PUDs for larger hugepages:
> - 57fb15c32f4f ("powerpc/64s: use contiguous PMD/PUD instead of HUGEPD")
> - 7c44202e3609 ("powerpc/e500: use contiguous PMD instead of hugepd")
> - 0549e7666373 ("powerpc/8xx: rework support for 8M pages using
>   contiguous PTE entries")

So I have been looking at the powerpc hugetlb implementation, and I have to
admit that I'm struggling to find similarities with how arm64 and riscv deal
with contiguous pte mappings.

I think the 2 main characteristics of contpte (arm64) and svnapot (riscv) are
the break-before-make requirement and the hardware updating the A/D
(accessed/dirty) bits of only a single pte of the contiguous range. Those make
the handling of hugetlb pages very similar between arm64 and riscv.

But I may have missed something: the powerpc hugetlb implementation is quite
"scattered" because of the radix/hash page tables and the 32/64-bit variants.

Thanks,

Alex
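The two characteristics named above can be modelled in a few lines. This is a simplified, single-threaded sketch, not kernel code. It assumes a pte layout where bit 6 is Accessed and bit 7 is Dirty (as in the riscv pte format), a contiguous range of 16 entries, and a hypothetical `simulated_tlb_flush()` that only counts calls. Reading the range folds in the A/D bits of every entry, because hardware may have set them in only one entry. Changing the range clears all entries and flushes before writing the new values (break-before-make). `contig_get()` and `contig_set()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_A     (1ULL << 6) /* Accessed (riscv pte layout, assumed here) */
#define PTE_D     (1ULL << 7) /* Dirty */
#define NR_CONTIG 16          /* e.g. one 64KB mapping made of 4KB ptes */

static unsigned int tlb_flushes; /* counts simulated TLB flushes */

/* Hypothetical stand-in for a real TLB flush of the range. */
static void simulated_tlb_flush(void)
{
	tlb_flushes++;
}

/* Read the whole range: start from entry 0 and OR in the A/D bits of
 * every entry, since hardware may have updated only one of them. */
static uint64_t contig_get(const uint64_t *ptep)
{
	uint64_t pte = ptep[0];

	for (int i = 1; i < NR_CONTIG; i++)
		pte |= ptep[i] & (PTE_A | PTE_D);
	return pte;
}

/* Break-before-make: clear every entry, flush, then write the new ones.
 * Returns the old value with the gathered A/D bits. */
static uint64_t contig_set(uint64_t *ptep, const uint64_t *new_ptes)
{
	uint64_t old = contig_get(ptep);

	for (int i = 0; i < NR_CONTIG; i++)
		ptep[i] = 0;            /* break: no valid entry remains */
	simulated_tlb_flush();          /* drop any cached old translation */
	for (int i = 0; i < NR_CONTIG; i++)
		ptep[i] = new_ptes[i];  /* make: install the new mapping */
	return old;
}
```

Because this model is single-threaded, it reads the A/D bits before clearing. A real implementation would need an atomic get-and-clear of each entry so that a concurrent hardware A/D update is not lost between the read and the clear.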