Message ID: 20230228213738.272178-1-willy@infradead.org (mailing list archive)
Series: New page table range API
On Tue, Feb 28, 2023 at 09:37:03PM +0000, Matthew Wilcox (Oracle) wrote:
> This patchset changes the API used by the MM to set up page table entries.
> The four APIs are:
>     set_ptes(mm, addr, ptep, pte, nr)
>     update_mmu_cache_range(vma, addr, ptep, nr)
>     flush_dcache_folio(folio)
>     flush_icache_pages(vma, page, nr)
>
> flush_dcache_folio() isn't technically new, but no architecture
> implemented it, so I've done that for you.  The old APIs remain around
> but are mostly implemented by calling the new interfaces.
>
> The new APIs are based around setting up N page table entries at once.
> The N entries belong to the same PMD, the same folio and the same VMA,
> so ptep++ is a legitimate operation, and locking is taken care of for
> you.  Some architectures can do a better job of it than just a loop,
> but I have hesitated to make too deep a change to architectures I don't
> understand well.

The new set_ptes() looks unnecessarily duplicated all over arch/.
What do you say about adding the patch below on top of the series?
Ideally it should be split into per-arch bits, but I can send it
separately as a cleanup on top.

diff --git a/arch/alpha/include/asm/pgtable.h b/arch/alpha/include/asm/pgtable.h
index 1e3354e9731b..65fb9e66675d 100644
--- a/arch/alpha/include/asm/pgtable.h
+++ b/arch/alpha/include/asm/pgtable.h
@@ -37,6 +37,7 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 			pte_val(pte) += 1UL << 32;
 	}
 }
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 /* PMD_SHIFT determines the size of the area a second-level page table can map */
diff --git a/arch/arc/include/asm/pgtable-bits-arcv2.h b/arch/arc/include/asm/pgtable-bits-arcv2.h
index 4a1b2ce204c6..06d8039180c0 100644
--- a/arch/arc/include/asm/pgtable-bits-arcv2.h
+++ b/arch/arc/include/asm/pgtable-bits-arcv2.h
@@ -100,19 +100,6 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 	return __pte((pte_val(pte) & _PAGE_CHG_MASK) | pgprot_val(newprot));
 }
 
-static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pte, unsigned int nr)
-{
-	for (;;) {
-		set_pte(ptep, pte);
-		if (--nr == 0)
-			break;
-		ptep++;
-		pte_val(pte) += PAGE_SIZE;
-	}
-}
-#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
-
 void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
 		pte_t *ptep, unsigned int nr);
 
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index 6525ac82bd50..0d326b201797 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -209,6 +209,7 @@ extern void __sync_icache_dcache(pte_t pteval);
 
 void set_ptes(struct mm_struct *mm, unsigned long addr,
 		pte_t *ptep, pte_t pteval, unsigned int nr);
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 static inline pte_t clear_pte_bit(pte_t pte, pgprot_t prot)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 4d1b79dbff16..a8d6460c5c9f 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -369,6 +369,7 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 		pte_val(pte) += PAGE_SIZE;
 	}
 }
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 /*
diff --git a/arch/csky/include/asm/pgtable.h b/arch/csky/include/asm/pgtable.h
index a30ae048233e..e426f1820deb 100644
--- a/arch/csky/include/asm/pgtable.h
+++ b/arch/csky/include/asm/pgtable.h
@@ -91,20 +91,6 @@ static inline void set_pte(pte_t *p, pte_t pte)
 	smp_mb();
 }
 
-static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pte, unsigned int nr)
-{
-	for (;;) {
-		set_pte(ptep, pte);
-		if (--nr == 0)
-			break;
-		ptep++;
-		pte_val(pte) += PAGE_SIZE;
-	}
-}
-
-#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
-
 static inline pte_t *pmd_page_vaddr(pmd_t pmd)
 {
 	unsigned long ptr;
diff --git a/arch/hexagon/include/asm/pgtable.h b/arch/hexagon/include/asm/pgtable.h
index f58f1d920769..67ab91662e83 100644
--- a/arch/hexagon/include/asm/pgtable.h
+++ b/arch/hexagon/include/asm/pgtable.h
@@ -345,26 +345,6 @@ static inline int pte_exec(pte_t pte)
 #define pte_pfn(pte)  (pte_val(pte) >> PAGE_SHIFT)
 #define set_pmd(pmdptr, pmdval) (*(pmdptr) = (pmdval))
 
-/*
- * set_ptes - update page table and do whatever magic may be
- * necessary to make the underlying hardware/firmware take note.
- *
- * VM may require a virtual instruction to alert the MMU.
- */
-static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pte, unsigned int nr)
-{
-	for (;;) {
-		set_pte(ptep, pte);
-		if (--nr == 0)
-			break;
-		ptep++;
-		pte_val(pte) += PAGE_SIZE;
-	}
-}
-
-#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
-
 static inline unsigned long pmd_page_vaddr(pmd_t pmd)
 {
 	return (unsigned long)__va(pmd_val(pmd) & PAGE_MASK);
diff --git a/arch/ia64/include/asm/pgtable.h b/arch/ia64/include/asm/pgtable.h
index 0c2be4ea664b..65a6e3b30721 100644
--- a/arch/ia64/include/asm/pgtable.h
+++ b/arch/ia64/include/asm/pgtable.h
@@ -303,19 +303,6 @@ static inline void set_pte(pte_t *ptep, pte_t pteval)
 	*ptep = pteval;
 }
 
-static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pte, unsigned int nr)
-{
-	for (;;) {
-		set_pte(ptep, pte);
-		if (--nr == 0)
-			break;
-		ptep++;
-		pte_val(pte) += PAGE_SIZE;
-	}
-}
-#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, add, ptep, pte, 1)
-
 /*
  * Make page protection values cacheable, uncacheable, or write-
 * combining.  Note that "protection" is really a misnomer here as the
diff --git a/arch/loongarch/include/asm/pgtable.h b/arch/loongarch/include/asm/pgtable.h
index 9154d317ffb4..d4b0ca7b4bf7 100644
--- a/arch/loongarch/include/asm/pgtable.h
+++ b/arch/loongarch/include/asm/pgtable.h
@@ -346,6 +346,7 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 	}
 }
 
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
diff --git a/arch/m68k/include/asm/pgtable_mm.h b/arch/m68k/include/asm/pgtable_mm.h
index 400206c17c97..8c2db20abdb6 100644
--- a/arch/m68k/include/asm/pgtable_mm.h
+++ b/arch/m68k/include/asm/pgtable_mm.h
@@ -32,20 +32,6 @@
 		*(pteptr) = (pteval);				\
 	} while(0)
 
-static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pte, unsigned int nr)
-{
-	for (;;) {
-		set_pte(ptep, pte);
-		if (--nr == 0)
-			break;
-		ptep++;
-		pte_val(pte) += PAGE_SIZE;
-	}
-}
-
-#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
-
 /* PMD_SHIFT determines the size of the area a second-level page table can map */
 #if CONFIG_PGTABLE_LEVELS == 3
 #define PMD_SHIFT	18
diff --git a/arch/microblaze/include/asm/pgtable.h b/arch/microblaze/include/asm/pgtable.h
index a01e1369b486..3e7643a986ad 100644
--- a/arch/microblaze/include/asm/pgtable.h
+++ b/arch/microblaze/include/asm/pgtable.h
@@ -335,20 +335,6 @@ static inline void set_pte(pte_t *ptep, pte_t pte)
 	*ptep = pte;
 }
 
-static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pte, unsigned int nr)
-{
-	for (;;) {
-		set_pte(ptep, pte);
-		if (--nr == 0)
-			break;
-		ptep++;
-		pte_val(pte) += 1 << PFN_SHIFT_OFFSET;
-	}
-}
-
-#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
-
 #define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
 static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
 		unsigned long address, pte_t *ptep)
diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h
index 0cf0455e6ae8..18b77567ef72 100644
--- a/arch/mips/include/asm/pgtable.h
+++ b/arch/mips/include/asm/pgtable.h
@@ -108,6 +108,7 @@ do {									\
 
 static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 		pte_t *ptep, pte_t pte, unsigned int nr);
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 #if defined(CONFIG_PHYS_ADDR_T_64BIT) && defined(CONFIG_CPU_MIPS32)
diff --git a/arch/nios2/include/asm/pgtable.h b/arch/nios2/include/asm/pgtable.h
index 8a77821a17a5..2a994b225a41 100644
--- a/arch/nios2/include/asm/pgtable.h
+++ b/arch/nios2/include/asm/pgtable.h
@@ -193,6 +193,7 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 	}
 }
 
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 static inline int pmd_none(pmd_t pmd)
diff --git a/arch/openrisc/include/asm/pgtable.h b/arch/openrisc/include/asm/pgtable.h
index 1a7077150d7b..8f27730a9ab7 100644
--- a/arch/openrisc/include/asm/pgtable.h
+++ b/arch/openrisc/include/asm/pgtable.h
@@ -47,20 +47,6 @@ extern void paging_init(void);
  */
 #define set_pte(pteptr, pteval) ((*(pteptr)) = (pteval))
 
-static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pte, unsigned int nr)
-{
-	for (;;) {
-		set_pte(ptep, pte);
-		if (--nr == 0)
-			break;
-		ptep++;
-		pte_val(pte) += PAGE_SIZE;
-	}
-}
-
-#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
-
 /*
  * (pmds are folded into pgds so this doesn't get actually called,
  * but the define is needed for a generic inline function.)
diff --git a/arch/parisc/include/asm/pgtable.h b/arch/parisc/include/asm/pgtable.h
index 78ee9816f423..cd04e85cb012 100644
--- a/arch/parisc/include/asm/pgtable.h
+++ b/arch/parisc/include/asm/pgtable.h
@@ -73,6 +73,7 @@ extern void __update_cache(pte_t pte);
 		mb();				\
 	} while(0)
 
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 #endif /* !__ASSEMBLY__ */
diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index bf1263ff7e67..f10b6c2f8ade 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -43,6 +43,7 @@ struct mm_struct;
 
 void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
 		pte_t pte, unsigned int nr);
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 #define update_mmu_cache(vma, addr, ptep) \
 	update_mmu_cache_range(vma, addr, ptep, 1);
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 3a3a776fc047..8bc49496f8a6 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -473,6 +473,7 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 		pte_val(pteval) += 1 << _PAGE_PFN_SHIFT;
 	}
 }
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 static inline void pte_clear(struct mm_struct *mm,
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 46bf475116f1..2fc20558af6b 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1346,6 +1346,7 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 	}
 }
 
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 /*
diff --git a/arch/sh/include/asm/pgtable_32.h b/arch/sh/include/asm/pgtable_32.h
index 03ba1834e126..d2f17e944bea 100644
--- a/arch/sh/include/asm/pgtable_32.h
+++ b/arch/sh/include/asm/pgtable_32.h
@@ -319,6 +319,7 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 	}
 }
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 /*
diff --git a/arch/sparc/include/asm/pgtable_32.h b/arch/sparc/include/asm/pgtable_32.h
index 47ae55ea1837..7fbc7772a9b7 100644
--- a/arch/sparc/include/asm/pgtable_32.h
+++ b/arch/sparc/include/asm/pgtable_32.h
@@ -101,20 +101,6 @@ static inline void set_pte(pte_t *ptep, pte_t pteval)
 	srmmu_swap((unsigned long *)ptep, pte_val(pteval));
 }
 
-static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pte, unsigned int nr)
-{
-	for (;;) {
-		set_pte(ptep, pte);
-		if (--nr == 0)
-			break;
-		ptep++;
-		pte_val(pte) += PAGE_SIZE;
-	}
-}
-
-#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
-
 static inline int srmmu_device_memory(unsigned long x)
 {
 	return ((x & 0xF0000000) != 0);
diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index d5c0088e0c6a..fddca662ba1b 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -924,6 +924,7 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 	}
 }
 
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1);
 
 #define pte_clear(mm,addr,ptep)		\
diff --git a/arch/um/include/asm/pgtable.h b/arch/um/include/asm/pgtable.h
index ca78c90ae74f..60d2b20ff218 100644
--- a/arch/um/include/asm/pgtable.h
+++ b/arch/um/include/asm/pgtable.h
@@ -242,20 +242,6 @@ static inline void set_pte(pte_t *pteptr, pte_t pteval)
 	if(pte_present(*pteptr)) *pteptr = pte_mknewprot(*pteptr);
 }
 
-static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pte, unsigned int nr)
-{
-	for (;;) {
-		set_pte(ptep, pte);
-		if (--nr == 0)
-			break;
-		ptep++;
-		pte_val(pte) += PAGE_SIZE;
-	}
-}
-
-#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
-
 #define __HAVE_ARCH_PTE_SAME
 static inline int pte_same(pte_t pte_a, pte_t pte_b)
 {
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index f424371ea143..1e5fd352880d 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1019,22 +1019,6 @@ static inline pud_t native_local_pudp_get_and_clear(pud_t *pudp)
 	return res;
 }
 
-static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pte, unsigned int nr)
-{
-	page_table_check_ptes_set(mm, addr, ptep, pte, nr);
-
-	for (;;) {
-		set_pte(ptep, pte);
-		if (--nr == 0)
-			break;
-		ptep++;
-		pte = __pte(pte_val(pte) + PAGE_SIZE);
-	}
-}
-
-#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
-
 static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
 			      pmd_t *pmdp, pmd_t pmd)
 {
diff --git a/arch/xtensa/include/asm/pgtable.h b/arch/xtensa/include/asm/pgtable.h
index 293101530541..adeee96518b9 100644
--- a/arch/xtensa/include/asm/pgtable.h
+++ b/arch/xtensa/include/asm/pgtable.h
@@ -306,20 +306,6 @@ static inline void set_pte(pte_t *ptep, pte_t pte)
 	update_pte(ptep, pte);
 }
 
-static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pte, unsigned int nr)
-{
-	for (;;) {
-		set_pte(ptep, pte);
-		if (--nr == 0)
-			break;
-		ptep++;
-		pte_val(pte) += PAGE_SIZE;
-	}
-}
-
-#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
-
 static inline void set_pmd(pmd_t *pmdp, pmd_t pmdval)
 {
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index c63cd44777ec..ef204712eda3 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -172,6 +172,24 @@ static inline int pmd_young(pmd_t pmd)
 }
 #endif
 
+#ifndef set_ptes
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+		pte_t *ptep, pte_t pte, unsigned int nr)
+{
+	page_table_check_ptes_set(mm, addr, ptep, pte, nr);
+
+	for (;;) {
+		set_pte(ptep, pte);
+		if (--nr == 0)
+			break;
+		ptep++;
+		pte = __pte(pte_val(pte) + PAGE_SIZE);
+	}
+}
+#define set_ptes set_ptes
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
+#endif
+
 #ifndef __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
 extern int ptep_set_access_flags(struct vm_area_struct *vma,
 				 unsigned long address, pte_t *ptep,
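The patch above relies on a common kernel idiom: an architecture that
provides its own set_ptes() also does "#define set_ptes set_ptes", so the
"#ifndef set_ptes" guard in <linux/pgtable.h> compiles the generic fallback
only where no override exists. A minimal self-contained user-space sketch of
that idiom, with illustrative names only (set_values() standing in for
set_ptes(); this is not kernel code):

/* guard_demo.c - sketch of the "#define set_ptes set_ptes" override idiom */
#include <stdio.h>

/* "arch" header: provide an optimized version and announce it. */
static inline void set_values(int *p, int v, unsigned int nr)
{
	while (nr--)
		*p++ = v;	/* stand-in for an arch-specific batched store */
}
#define set_values set_values

/* "generic" header: fallback, compiled only when no override is announced. */
#ifndef set_values
static inline void set_values(int *p, int v, unsigned int nr)
{
	for (unsigned int i = 0; i < nr; i++)
		p[i] = v;
}
#endif

int main(void)
{
	int a[4];

	set_values(a, 7, 4);	/* resolves to the "arch" version here */
	printf("%d %d %d %d\n", a[0], a[1], a[2], a[3]);
	return 0;
}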
Hi Willy,

On Tue, Feb 28, 2023 at 10:40 PM Matthew Wilcox (Oracle)
<willy@infradead.org> wrote:
> This patchset changes the API used by the MM to set up page table entries.
> The four APIs are:
>     set_ptes(mm, addr, ptep, pte, nr)
>     update_mmu_cache_range(vma, addr, ptep, nr)
>     flush_dcache_folio(folio)
>     flush_icache_pages(vma, page, nr)
>
> flush_dcache_folio() isn't technically new, but no architecture
> implemented it, so I've done that for you.  The old APIs remain around
> but are mostly implemented by calling the new interfaces.
>
> The new APIs are based around setting up N page table entries at once.
> The N entries belong to the same PMD, the same folio and the same VMA,
> so ptep++ is a legitimate operation, and locking is taken care of for
> you.  Some architectures can do a better job of it than just a loop,
> but I have hesitated to make too deep a change to architectures I don't
> understand well.
>
> One thing I have changed in every architecture is that PG_arch_1 is now a
> per-folio bit instead of a per-page bit.  This was something that would
> have to happen eventually, and it makes sense to do it now rather than
> iterate over every page involved in a cache flush and figure out if it
> needs to happen.
>
> The point of all this is better performance, and Fengwei Yin has
> measured improvement on x86.  I suspect you'll see improvement on
> your architecture too.  Try the new will-it-scale test mentioned here:
> https://lore.kernel.org/linux-mm/20230206140639.538867-5-fengwei.yin@intel.com/
> You'll need to run it on an XFS filesystem and have
> CONFIG_TRANSPARENT_HUGEPAGE set.

Thanks for your series!

> For testing, I've only run the code on x86.  If an x86->foo compiler
> exists in Debian, I've built defconfig.  I'm relying on the buildbots
> to tell me what I missed, and people who actually have the hardware to
> tell me if it actually works.

Seems to work fine on ARAnyM and qemu-system-m68k/virt, so
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>

Gr{oetje,eeting}s,

                        Geert
On 28/02/2023 21:37, Matthew Wilcox (Oracle) wrote:
> This patchset changes the API used by the MM to set up page table entries.
> The four APIs are:
>     set_ptes(mm, addr, ptep, pte, nr)
>     update_mmu_cache_range(vma, addr, ptep, nr)
>     flush_dcache_folio(folio)
>     flush_icache_pages(vma, page, nr)
>
> flush_dcache_folio() isn't technically new, but no architecture
> implemented it, so I've done that for you.  The old APIs remain around
> but are mostly implemented by calling the new interfaces.
>
> The new APIs are based around setting up N page table entries at once.
> The N entries belong to the same PMD, the same folio and the same VMA,
> so ptep++ is a legitimate operation, and locking is taken care of for
> you.  Some architectures can do a better job of it than just a loop,
> but I have hesitated to make too deep a change to architectures I don't
> understand well.
>
> One thing I have changed in every architecture is that PG_arch_1 is now a
> per-folio bit instead of a per-page bit.  This was something that would
> have to happen eventually, and it makes sense to do it now rather than
> iterate over every page involved in a cache flush and figure out if it
> needs to happen.
>
> The point of all this is better performance, and Fengwei Yin has
> measured improvement on x86.  I suspect you'll see improvement on
> your architecture too.  Try the new will-it-scale test mentioned here:
> https://lore.kernel.org/linux-mm/20230206140639.538867-5-fengwei.yin@intel.com/
> You'll need to run it on an XFS filesystem and have
> CONFIG_TRANSPARENT_HUGEPAGE set.
>
> For testing, I've only run the code on x86.  If an x86->foo compiler
> exists in Debian, I've built defconfig.  I'm relying on the buildbots
> to tell me what I missed, and people who actually have the hardware to
> tell me if it actually works.
>
> I'd like to get this into the MM tree soon after the current merge window
> closes, so quick feedback would be appreciated.

I've boot-tested the series (with Yin's typo fix for patch 32) on arm64
FVP and Ampere Altra. On the Altra, I also ran page_fault4 from
will-it-scale, and see ~35% improvement from this series. So:

Tested-by: Ryan Roberts <ryan.roberts@arm.com>

Thanks,
Ryan
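For readers new to the interface, a hedged sketch of the caller side follows.
map_folio_range() is a hypothetical helper, not code from the series, but the
set_ptes(), update_mmu_cache_range() and flush_icache_pages() signatures are
the ones given in the cover letter, and all nr entries are assumed to share
one PMD and one VMA, as the API requires:

/* Hypothetical caller batching PTE setup for nr pages of one folio. */
static void map_folio_range(struct vm_area_struct *vma, unsigned long addr,
			    pte_t *ptep, struct folio *folio, unsigned int nr)
{
	pte_t pte = mk_pte(&folio->page, vma->vm_page_prot);

	flush_icache_pages(vma, &folio->page, nr);	/* ranged icache flush */
	set_ptes(vma->vm_mm, addr, ptep, pte, nr);	/* nr entries in one call */
	update_mmu_cache_range(vma, addr, ptep, nr);	/* ranged MMU cache hook */
}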