Message ID | 1583452659-11801-1-git-send-email-anshuman.khandual@arm.com (mailing list archive) |
---|---|
State | New, archived |
Series | [V15] mm/debug: Add tests validating architecture page table helpers |
On Fri, 2020-03-06 at 05:27 +0530, Anshuman Khandual wrote:
> This adds tests which will validate architecture page table helpers and
> other accessors in their compliance with expected generic MM semantics.
> This will help various architectures in validating changes to existing
> page table helpers or addition of new ones.
>
> This test covers basic page table entry transformations including but not
> limited to old, young, dirty, clean, write, write protect etc at various
> level along with populating intermediate entries with next page table page
> and validating them.
>
> Test page table pages are allocated from system memory with required size
> and alignments. The mapped pfns at page table levels are derived from a
> real pfn representing a valid kernel text symbol. This test gets called
> inside kernel_init() right after async_synchronize_full().
>
> This test gets built and run when CONFIG_DEBUG_VM_PGTABLE is selected. Any
> architecture, which is willing to subscribe this test will need to select
> ARCH_HAS_DEBUG_VM_PGTABLE. For now this is limited to arc, arm64, x86, s390
> and ppc32 platforms where the test is known to build and run successfully.
> Going forward, other architectures too can subscribe the test after fixing
> any build or runtime problems with their page table helpers. Meanwhile for
> better platform coverage, the test can also be enabled with CONFIG_EXPERT
> even without ARCH_HAS_DEBUG_VM_PGTABLE.
>
> Folks interested in making sure that a given platform's page table helpers
> conform to expected generic MM semantics should enable the above config
> which will just trigger this test during boot. Any non conformity here will
> be reported as an warning which would need to be fixed. This test will help
> catch any changes to the agreed upon semantics expected from generic MM and
> enable platforms to accommodate it thereafter.

OK, I got this working on the powerpc hash MMU as well, so how about this?
diff --git a/Documentation/features/debug/debug-vm-pgtable/arch-support.txt b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
index 64d0f9b15c49..c527d05c0459 100644
--- a/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
+++ b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
@@ -22,8 +22,7 @@
     |       nios2: | TODO |
     |    openrisc: | TODO |
     |      parisc: | TODO |
-    |  powerpc/32: |  ok  |
-    |  powerpc/64: | TODO |
+    |     powerpc: |  ok  |
     |       riscv: | TODO |
     |        s390: |  ok  |
     |          sh: | TODO |
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 2e7eee523ba1..176930f40e07 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -116,7 +116,7 @@ config PPC
 	#
 	select ARCH_32BIT_OFF_T if PPC32
 	select ARCH_HAS_DEBUG_VIRTUAL
-	select ARCH_HAS_DEBUG_VM_PGTABLE if PPC32
+	select ARCH_HAS_DEBUG_VM_PGTABLE
 	select ARCH_HAS_DEVMEM_IS_ALLOWED
 	select ARCH_HAS_ELF_RANDOMIZE
 	select ARCH_HAS_FORTIFY_SOURCE
diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 96a91bda3a85..98990a515268 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -256,7 +256,8 @@ static void __init pte_clear_tests(struct mm_struct *mm, pte_t *ptep,
 	pte_t pte = READ_ONCE(*ptep);
 
 	pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
-	WRITE_ONCE(*ptep, pte);
+	set_pte_at(mm, vaddr, ptep, pte);
+	barrier();
 	pte_clear(mm, vaddr, ptep);
 	pte = READ_ONCE(*ptep);
 	WARN_ON(!pte_none(pte));
On 03/07/2020 02:14 AM, Qian Cai wrote:
> On Fri, 2020-03-06 at 05:27 +0530, Anshuman Khandual wrote:
>> This adds tests which will validate architecture page table helpers and
>> other accessors in their compliance with expected generic MM semantics.
>> This will help various architectures in validating changes to existing
>> page table helpers or addition of new ones.
>>
>> This test covers basic page table entry transformations including but not
>> limited to old, young, dirty, clean, write, write protect etc at various
>> level along with populating intermediate entries with next page table page
>> and validating them.
>>
>> Test page table pages are allocated from system memory with required size
>> and alignments. The mapped pfns at page table levels are derived from a
>> real pfn representing a valid kernel text symbol. This test gets called
>> inside kernel_init() right after async_synchronize_full().
>>
>> This test gets built and run when CONFIG_DEBUG_VM_PGTABLE is selected. Any
>> architecture, which is willing to subscribe this test will need to select
>> ARCH_HAS_DEBUG_VM_PGTABLE. For now this is limited to arc, arm64, x86, s390
>> and ppc32 platforms where the test is known to build and run successfully.
>> Going forward, other architectures too can subscribe the test after fixing
>> any build or runtime problems with their page table helpers. Meanwhile for
>> better platform coverage, the test can also be enabled with CONFIG_EXPERT
>> even without ARCH_HAS_DEBUG_VM_PGTABLE.
>>
>> Folks interested in making sure that a given platform's page table helpers
>> conform to expected generic MM semantics should enable the above config
>> which will just trigger this test during boot. Any non conformity here will
>> be reported as an warning which would need to be fixed. This test will help
>> catch any changes to the agreed upon semantics expected from generic MM and
>> enable platforms to accommodate it thereafter.
>
> OK, I got this working on the powerpc hash MMU as well, so how about this?
>
> diff --git a/Documentation/features/debug/debug-vm-pgtable/arch-support.txt b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
> index 64d0f9b15c49..c527d05c0459 100644
> --- a/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
> +++ b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
> @@ -22,8 +22,7 @@
>      |       nios2: | TODO |
>      |    openrisc: | TODO |
>      |      parisc: | TODO |
> -    |  powerpc/32: |  ok  |
> -    |  powerpc/64: | TODO |
> +    |     powerpc: |  ok  |
>      |       riscv: | TODO |
>      |        s390: |  ok  |
>      |          sh: | TODO |
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 2e7eee523ba1..176930f40e07 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -116,7 +116,7 @@ config PPC
>  	#
>  	select ARCH_32BIT_OFF_T if PPC32
>  	select ARCH_HAS_DEBUG_VIRTUAL
> -	select ARCH_HAS_DEBUG_VM_PGTABLE if PPC32
> +	select ARCH_HAS_DEBUG_VM_PGTABLE
>  	select ARCH_HAS_DEVMEM_IS_ALLOWED
>  	select ARCH_HAS_ELF_RANDOMIZE
>  	select ARCH_HAS_FORTIFY_SOURCE
> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
> index 96a91bda3a85..98990a515268 100644
> --- a/mm/debug_vm_pgtable.c
> +++ b/mm/debug_vm_pgtable.c
> @@ -256,7 +256,8 @@ static void __init pte_clear_tests(struct mm_struct *mm, pte_t *ptep,
>  	pte_t pte = READ_ONCE(*ptep);
>
>  	pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
> -	WRITE_ONCE(*ptep, pte);
> +	set_pte_at(mm, vaddr, ptep, pte);

Hmm, set_pte_at() function is not preferred here for these tests. The idea
is to avoid or at least minimize TLB/cache flushes triggered from these sort
of 'static' tests. set_pte_at() is platform provided and could/might trigger
these flushes or some other platform specific synchronization stuff. Just
wondering is there specific reason with respect to the soft lock up problem
making it necessary to use set_pte_at() rather than a simple WRITE_ONCE() ?

> +	barrier();
>  	pte_clear(mm, vaddr, ptep);
>  	pte = READ_ONCE(*ptep);
>  	WARN_ON(!pte_none(pte));
>
> On Mar 6, 2020, at 7:03 PM, Anshuman Khandual <Anshuman.Khandual@arm.com> wrote:
>
> Hmm, set_pte_at() function is not preferred here for these tests. The idea
> is to avoid or at least minimize TLB/cache flushes triggered from these sort
> of 'static' tests. set_pte_at() is platform provided and could/might trigger
> these flushes or some other platform specific synchronization stuff. Just

Why is that important for this debugging option?

> wondering is there specific reason with respect to the soft lock up problem
> making it necessary to use set_pte_at() rather than a simple WRITE_ONCE() ?

Looking at the s390 version of set_pte_at(), it has this comment,
vmaddr);

/*
 * Certain architectures need to do special things when PTEs
 * within a page table are directly modified. Thus, the following
 * hook is made available.
 */

I can only guess that powerpc could be the same here.
On 03/07/2020 06:04 AM, Qian Cai wrote:
>
>
>> On Mar 6, 2020, at 7:03 PM, Anshuman Khandual <Anshuman.Khandual@arm.com> wrote:
>>
>> Hmm, set_pte_at() function is not preferred here for these tests. The idea
>> is to avoid or at least minimize TLB/cache flushes triggered from these sort
>> of 'static' tests. set_pte_at() is platform provided and could/might trigger
>> these flushes or some other platform specific synchronization stuff. Just
>
> Why is that important for this debugging option?

The primary reason is to avoid TLB/cache flush instructions on the system
during these tests that only involve transforming different page table
level entries through helpers. Unless really necessary, why should it
emit any TLB/cache flush instructions ?

>
>> wondering is there specific reason with respect to the soft lock up problem
>> making it necessary to use set_pte_at() rather than a simple WRITE_ONCE() ?
>
> Looking at the s390 version of set_pte_at(), it has this comment,
> vmaddr);
>
> /*
>  * Certain architectures need to do special things when PTEs
>  * within a page table are directly modified. Thus, the following
>  * hook is made available.
>  */
>
> I can only guess that powerpc could be the same here.

This comment is present in multiple platforms while defining set_pte_at().
Is not 'barrier()' here alone good enough ? Else what exactly set_pte_at()
does as compared to WRITE_ONCE() that avoids the soft lock up, just trying
to understand.
> On Mar 6, 2020, at 7:56 PM, Anshuman Khandual <anshuman.khandual@arm.com> wrote:
>
>
>
> On 03/07/2020 06:04 AM, Qian Cai wrote:
>>
>>
>>> On Mar 6, 2020, at 7:03 PM, Anshuman Khandual <Anshuman.Khandual@arm.com> wrote:
>>>
>>> Hmm, set_pte_at() function is not preferred here for these tests. The idea
>>> is to avoid or at least minimize TLB/cache flushes triggered from these sort
>>> of 'static' tests. set_pte_at() is platform provided and could/might trigger
>>> these flushes or some other platform specific synchronization stuff. Just
>>
>> Why is that important for this debugging option?
>
> The primary reason is to avoid TLB/cache flush instructions on the system
> during these tests that only involve transforming different page table
> level entries through helpers. Unless really necessary, why should it
> emit any TLB/cache flush instructions ?
>
>>
>>> wondering is there specific reason with respect to the soft lock up problem
>>> making it necessary to use set_pte_at() rather than a simple WRITE_ONCE() ?
>>
>> Looking at the s390 version of set_pte_at(), it has this comment,
>> vmaddr);
>>
>> /*
>>  * Certain architectures need to do special things when PTEs
>>  * within a page table are directly modified. Thus, the following
>>  * hook is made available.
>>  */
>>
>> I can only guess that powerpc could be the same here.
>
> This comment is present in multiple platforms while defining set_pte_at().
> Is not 'barrier()' here alone good enough ? Else what exactly set_pte_at()

No, barrier() is not enough.

> does as compared to WRITE_ONCE() that avoids the soft lock up, just trying
> to understand.

I surely can spend hours figuring out exactly which things in set_pte_at()
are necessary for pte_clear() not to get stuck, then propose a solution and
possibly need to retest on multiple arches. I am not sure that is a good use
of my time just to save a few TLB/cache flushes on a debug kernel.
On 07/03/2020 at 01:56, Anshuman Khandual wrote:
>
>
> On 03/07/2020 06:04 AM, Qian Cai wrote:
>>
>>
>>> On Mar 6, 2020, at 7:03 PM, Anshuman Khandual <Anshuman.Khandual@arm.com> wrote:
>>>
>>> Hmm, set_pte_at() function is not preferred here for these tests. The idea
>>> is to avoid or at least minimize TLB/cache flushes triggered from these sort
>>> of 'static' tests. set_pte_at() is platform provided and could/might trigger
>>> these flushes or some other platform specific synchronization stuff. Just
>>
>> Why is that important for this debugging option?
>
> The primary reason is to avoid TLB/cache flush instructions on the system
> during these tests that only involve transforming different page table
> level entries through helpers. Unless really necessary, why should it
> emit any TLB/cache flush instructions ?

What's the problem with those flushes ?

>
>>
>>> wondering is there specific reason with respect to the soft lock up problem
>>> making it necessary to use set_pte_at() rather than a simple WRITE_ONCE() ?
>>
>> Looking at the s390 version of set_pte_at(), it has this comment,
>> vmaddr);
>>
>> /*
>>  * Certain architectures need to do special things when PTEs
>>  * within a page table are directly modified. Thus, the following
>>  * hook is made available.
>>  */
>>
>> I can only guess that powerpc could be the same here.
>
> This comment is present in multiple platforms while defining set_pte_at().
> Is not 'barrier()' here alone good enough ? Else what exactly set_pte_at()
> does as compared to WRITE_ONCE() that avoids the soft lock up, just trying
> to understand.
>

Argh ! I didn't realise that you were writing directly into the page
tables. When it works, that's only by chance I guess.

To properly set the page table entries, set_pte_at() has to be used:
- On powerpc 8xx, with 16k pages, the page table entry must be copied
  four times. set_pte_at() does it, WRITE_ONCE() doesn't.
- On powerpc book3s/32 (hash MMU), the flag _PAGE_HASHPTE must be
  preserved across writes. set_pte_at() preserves it, WRITE_ONCE() doesn't.

set_pte_at() also does a few other mandatory things, like calling
pte_mkpte().

So, the WRITE_ONCE() must definitely become a set_pte_at().

Christophe
On 03/07/2020 12:35 PM, Christophe Leroy wrote:
>
>
> On 07/03/2020 at 01:56, Anshuman Khandual wrote:
>>
>>
>> On 03/07/2020 06:04 AM, Qian Cai wrote:
>>>
>>>
>>>> On Mar 6, 2020, at 7:03 PM, Anshuman Khandual <Anshuman.Khandual@arm.com> wrote:
>>>>
>>>> Hmm, set_pte_at() function is not preferred here for these tests. The idea
>>>> is to avoid or at least minimize TLB/cache flushes triggered from these sort
>>>> of 'static' tests. set_pte_at() is platform provided and could/might trigger
>>>> these flushes or some other platform specific synchronization stuff. Just
>>>
>>> Why is that important for this debugging option?
>>
>> The primary reason is to avoid TLB/cache flush instructions on the system
>> during these tests that only involve transforming different page table
>> level entries through helpers. Unless really necessary, why should it
>> emit any TLB/cache flush instructions ?
>
> What's the problem with those flushes ?
>
>>
>>>
>>>> wondering is there specific reason with respect to the soft lock up problem
>>>> making it necessary to use set_pte_at() rather than a simple WRITE_ONCE() ?
>>>
>>> Looking at the s390 version of set_pte_at(), it has this comment,
>>> vmaddr);
>>>
>>> /*
>>>  * Certain architectures need to do special things when PTEs
>>>  * within a page table are directly modified. Thus, the following
>>>  * hook is made available.
>>>  */
>>>
>>> I can only guess that powerpc could be the same here.
>>
>> This comment is present in multiple platforms while defining set_pte_at().
>> Is not 'barrier()' here alone good enough ? Else what exactly set_pte_at()
>> does as compared to WRITE_ONCE() that avoids the soft lock up, just trying
>> to understand.
>>
>
>
> Argh ! I didn't realise that you were writing directly into the page
> tables. When it works, that's only by chance I guess.
>
> To properly set the page table entries, set_pte_at() has to be used:
> - On powerpc 8xx, with 16k pages, the page table entry must be copied
>   four times. set_pte_at() does it, WRITE_ONCE() doesn't.
> - On powerpc book3s/32 (hash MMU), the flag _PAGE_HASHPTE must be
>   preserved across writes. set_pte_at() preserves it, WRITE_ONCE() doesn't.
>
> set_pte_at() also does a few other mandatory things, like calling
> pte_mkpte().
>
> So, the WRITE_ONCE() must definitely become a set_pte_at().

Sure, will do. These are part of the clear tests, which populate a given
entry with a non-zero value before clearing it and testing it with
pxx_none(). In that context, WRITE_ONCE() seemed sufficient. But pte_clear()
might be closely tied to a proper page table entry update, and hence a
preceding set_pte_at() will be better.

There are still more WRITE_ONCE() calls for other page table levels in these
clear tests. set_pmd_at() and set_pud_at() are defined only on platforms that
support (and enable) THP and PUD based THP respectively. Hence they cannot be
used for the clear tests, as the remaining helpers pmd_clear(), pud_clear(),
p4d_clear() and pgd_clear() still need to be validated with or without THP
support and enablement. We should just leave all other WRITE_ONCE() instances
unchanged. Please correct me if I am missing something here.

>
> Christophe
>
diff --git a/Documentation/features/debug/debug-vm-pgtable/arch-support.txt b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
new file mode 100644
index 0000000..64d0f9b
--- /dev/null
+++ b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
@@ -0,0 +1,35 @@
+#
+# Feature name:          debug-vm-pgtable
+#         Kconfig:       ARCH_HAS_DEBUG_VM_PGTABLE
+#         description:   arch supports pgtable tests for semantics compliance
+#
+    -----------------------
+    |         arch |status|
+    -----------------------
+    |       alpha: | TODO |
+    |         arc: |  ok  |
+    |         arm: | TODO |
+    |       arm64: |  ok  |
+    |         c6x: | TODO |
+    |        csky: | TODO |
+    |       h8300: | TODO |
+    |     hexagon: | TODO |
+    |        ia64: | TODO |
+    |        m68k: | TODO |
+    |  microblaze: | TODO |
+    |        mips: | TODO |
+    |       nds32: | TODO |
+    |       nios2: | TODO |
+    |    openrisc: | TODO |
+    |      parisc: | TODO |
+    |  powerpc/32: |  ok  |
+    |  powerpc/64: | TODO |
+    |       riscv: | TODO |
+    |        s390: |  ok  |
+    |          sh: | TODO |
+    |       sparc: | TODO |
+    |          um: | TODO |
+    |   unicore32: | TODO |
+    |         x86: |  ok  |
+    |      xtensa: | TODO |
+    -----------------------
diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index ff2a393..3e72e6c 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -6,6 +6,7 @@
 config ARC
 	def_bool y
 	select ARC_TIMERS
+	select ARCH_HAS_DEBUG_VM_PGTABLE
 	select ARCH_HAS_DMA_PREP_COHERENT
 	select ARCH_HAS_PTE_SPECIAL
 	select ARCH_HAS_SETUP_DMA_OPS
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 0b30e88..aaf8ba4 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -11,6 +11,7 @@ config ARM64
 	select ACPI_PPTT if ACPI
 	select ARCH_CLOCKSOURCE_DATA
 	select ARCH_HAS_DEBUG_VIRTUAL
+	select ARCH_HAS_DEBUG_VM_PGTABLE
 	select ARCH_HAS_DEVMEM_IS_ALLOWED
 	select ARCH_HAS_DMA_PREP_COHERENT
 	select ARCH_HAS_ACPI_TABLE_UPGRADE if ACPI
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 497b7d0b..8d5ae14 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -116,6 +116,7 @@ config PPC
 	#
 	select ARCH_32BIT_OFF_T if PPC32
 	select ARCH_HAS_DEBUG_VIRTUAL
+	select ARCH_HAS_DEBUG_VM_PGTABLE if PPC32
 	select ARCH_HAS_DEVMEM_IS_ALLOWED
 	select ARCH_HAS_ELF_RANDOMIZE
 	select ARCH_HAS_FORTIFY_SOURCE
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 8abe775..af284db 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -59,6 +59,7 @@ config KASAN_SHADOW_OFFSET
 config S390
 	def_bool y
 	select ARCH_BINFMT_ELF_STATE
+	select ARCH_HAS_DEBUG_VM_PGTABLE
 	select ARCH_HAS_DEVMEM_IS_ALLOWED
 	select ARCH_HAS_ELF_RANDOMIZE
 	select ARCH_HAS_FORTIFY_SOURCE
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index beea770..df8a19e5 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -61,6 +61,7 @@ config X86
 	select ARCH_CLOCKSOURCE_INIT
 	select ARCH_HAS_ACPI_TABLE_UPGRADE if ACPI
 	select ARCH_HAS_DEBUG_VIRTUAL
+	select ARCH_HAS_DEBUG_VM_PGTABLE if !X86_PAE
 	select ARCH_HAS_DEVMEM_IS_ALLOWED
 	select ARCH_HAS_ELF_RANDOMIZE
 	select ARCH_HAS_FAST_MULTIPLIER
diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index 0b6c4042..fb0e76d 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -53,6 +53,12 @@ static inline void sync_initial_page_table(void) { }
 
 struct mm_struct;
 
+#define mm_p4d_folded mm_p4d_folded
+static inline bool mm_p4d_folded(struct mm_struct *mm)
+{
+	return !pgtable_l5_enabled();
+}
+
 void set_pte_vaddr_p4d(p4d_t *p4d_page, unsigned long vaddr, pte_t new_pte);
 void set_pte_vaddr_pud(pud_t *pud_page, unsigned long vaddr, pte_t new_pte);
 
diff --git a/include/linux/mmdebug.h b/include/linux/mmdebug.h
index 2ad72d2c8..5339aa1 100644
--- a/include/linux/mmdebug.h
+++ b/include/linux/mmdebug.h
@@ -64,4 +64,9 @@ void dump_mm(const struct mm_struct *mm);
 #define VM_BUG_ON_PGFLAGS(cond, page) BUILD_BUG_ON_INVALID(cond)
 #endif
 
+#ifdef CONFIG_DEBUG_VM_PGTABLE
+void debug_vm_pgtable(void);
+#else
+static inline void debug_vm_pgtable(void) { }
+#endif
 #endif
diff --git a/init/main.c b/init/main.c
index ee4947a..19cd790 100644
--- a/init/main.c
+++ b/init/main.c
@@ -94,6 +94,7 @@
 #include <linux/rodata_test.h>
 #include <linux/jump_label.h>
 #include <linux/mem_encrypt.h>
+#include <linux/mmdebug.h>
 
 #include <asm/io.h>
 #include <asm/bugs.h>
@@ -1352,6 +1353,7 @@ static int __ref kernel_init(void *unused)
 	kernel_init_freeable();
 	/* need to finish all async __init code before freeing the memory */
 	async_synchronize_full();
+	debug_vm_pgtable();
 	ftrace_free_init_mem();
 	free_initmem();
 	mark_readonly();
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 69def4a..1b5bd9f 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -653,6 +653,12 @@ config SCHED_STACK_END_CHECK
 	  data corruption or a sporadic crash at a later stage once the region
 	  is examined. The runtime overhead introduced is minimal.
 
+config ARCH_HAS_DEBUG_VM_PGTABLE
+	bool
+	help
+	  An architecture should select this when it can successfully
+	  build and run DEBUG_VM_PGTABLE.
+
 config DEBUG_VM
 	bool "Debug VM"
 	depends on DEBUG_KERNEL
@@ -688,6 +694,26 @@ config DEBUG_VM_PGFLAGS
 
 	  If unsure, say N.
 
+config DEBUG_VM_PGTABLE
+	bool "Debug arch page table for semantics compliance"
+	depends on MMU
+	depends on !IA64 && !ARM
+	depends on ARCH_HAS_DEBUG_VM_PGTABLE || EXPERT
+	default n if !ARCH_HAS_DEBUG_VM_PGTABLE
+	default y if DEBUG_VM
+	help
+	  This option provides a debug method which can be used to test
+	  architecture page table helper functions on various platforms in
+	  verifying if they comply with expected generic MM semantics. This
+	  will help architecture code in making sure that any changes or
+	  new additions of these helpers still conform to expected
+	  semantics of the generic MM. Platforms will have to opt in for
+	  this through ARCH_HAS_DEBUG_VM_PGTABLE. Although it can also be
+	  enabled through EXPERT without requiring code change. This test
+	  is disabled on IA64 and ARM platforms where it fails to build.
+
+	  If unsure, say N.
+
 config ARCH_HAS_DEBUG_VIRTUAL
 	bool
 
diff --git a/mm/Makefile b/mm/Makefile
index 272e660..b0692e6 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -87,6 +87,7 @@ obj-$(CONFIG_HWPOISON_INJECT) += hwpoison-inject.o
 obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o
 obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak-test.o
 obj-$(CONFIG_DEBUG_RODATA_TEST) += rodata_test.o
+obj-$(CONFIG_DEBUG_VM_PGTABLE) += debug_vm_pgtable.o
 obj-$(CONFIG_PAGE_OWNER) += page_owner.o
 obj-$(CONFIG_CLEANCACHE) += cleancache.o
 obj-$(CONFIG_MEMORY_ISOLATION) += page_isolation.o
diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
new file mode 100644
index 0000000..96a91bd
--- /dev/null
+++ b/mm/debug_vm_pgtable.c
@@ -0,0 +1,391 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * This kernel test validates architecture page table helpers and
+ * accessors and helps in verifying their continued compliance with
+ * expected generic MM semantics.
+ *
+ * Copyright (C) 2019 ARM Ltd.
+ *
+ * Author: Anshuman Khandual <anshuman.khandual@arm.com>
+ */
+#define pr_fmt(fmt) "debug_vm_pgtable: %s: " fmt, __func__
+
+#include <linux/gfp.h>
+#include <linux/highmem.h>
+#include <linux/hugetlb.h>
+#include <linux/kernel.h>
+#include <linux/kconfig.h>
+#include <linux/mm.h>
+#include <linux/mman.h>
+#include <linux/mm_types.h>
+#include <linux/module.h>
+#include <linux/pfn_t.h>
+#include <linux/printk.h>
+#include <linux/random.h>
+#include <linux/spinlock.h>
+#include <linux/swap.h>
+#include <linux/swapops.h>
+#include <linux/start_kernel.h>
+#include <linux/sched/mm.h>
+#include <asm/pgalloc.h>
+#include <asm/pgtable.h>
+
+/*
+ * Basic operations
+ *
+ * mkold(entry)			= An old and not a young entry
+ * mkyoung(entry)		= A young and not an old entry
+ * mkdirty(entry)		= A dirty and not a clean entry
+ * mkclean(entry)		= A clean and not a dirty entry
+ * mkwrite(entry)		= A write and not a write protected entry
+ * wrprotect(entry)		= A write protected and not a write entry
+ * pxx_bad(entry)		= A mapped and non-table entry
+ * pxx_same(entry1, entry2)	= Both entries hold the exact same value
+ */
+#define VMFLAGS	(VM_READ|VM_WRITE|VM_EXEC)
+
+/*
+ * On s390 platform, the lower 4 bits are used to identify given page table
+ * entry type. But these bits might affect the ability to clear entries with
+ * pxx_clear() because of how dynamic page table folding works on s390. So
+ * while loading up the entries do not change the lower 4 bits. It does not
+ * have affect any other platform.
+ */
+#define S390_MASK_BITS	4
+#define RANDOM_ORVALUE	GENMASK(BITS_PER_LONG - 1, S390_MASK_BITS)
+#define RANDOM_NZVALUE	GENMASK(7, 0)
+
+static void __init pte_basic_tests(unsigned long pfn, pgprot_t prot)
+{
+	pte_t pte = pfn_pte(pfn, prot);
+
+	WARN_ON(!pte_same(pte, pte));
+	WARN_ON(!pte_young(pte_mkyoung(pte_mkold(pte))));
+	WARN_ON(!pte_dirty(pte_mkdirty(pte_mkclean(pte))));
+	WARN_ON(!pte_write(pte_mkwrite(pte_wrprotect(pte))));
+	WARN_ON(pte_young(pte_mkold(pte_mkyoung(pte))));
+	WARN_ON(pte_dirty(pte_mkclean(pte_mkdirty(pte))));
+	WARN_ON(pte_write(pte_wrprotect(pte_mkwrite(pte))));
+}
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot)
+{
+	pmd_t pmd = pfn_pmd(pfn, prot);
+
+	WARN_ON(!pmd_same(pmd, pmd));
+	WARN_ON(!pmd_young(pmd_mkyoung(pmd_mkold(pmd))));
+	WARN_ON(!pmd_dirty(pmd_mkdirty(pmd_mkclean(pmd))));
+	WARN_ON(!pmd_write(pmd_mkwrite(pmd_wrprotect(pmd))));
+	WARN_ON(pmd_young(pmd_mkold(pmd_mkyoung(pmd))));
+	WARN_ON(pmd_dirty(pmd_mkclean(pmd_mkdirty(pmd))));
+	WARN_ON(pmd_write(pmd_wrprotect(pmd_mkwrite(pmd))));
+	/*
+	 * A huge page does not point to next level page table
+	 * entry. Hence this must qualify as pmd_bad().
+	 */
+	WARN_ON(!pmd_bad(pmd_mkhuge(pmd)));
+}
+
+#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
+static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot)
+{
+	pud_t pud = pfn_pud(pfn, prot);
+
+	WARN_ON(!pud_same(pud, pud));
+	WARN_ON(!pud_young(pud_mkyoung(pud_mkold(pud))));
+	WARN_ON(!pud_write(pud_mkwrite(pud_wrprotect(pud))));
+	WARN_ON(pud_write(pud_wrprotect(pud_mkwrite(pud))));
+	WARN_ON(pud_young(pud_mkold(pud_mkyoung(pud))));
+
+	if (mm_pmd_folded(mm))
+		return;
+
+	/*
+	 * A huge page does not point to next level page table
+	 * entry. Hence this must qualify as pud_bad().
+	 */
+	WARN_ON(!pud_bad(pud_mkhuge(pud)));
+}
+#else
+static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot) { }
+#endif
+#else
+static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot) { }
+static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot) { }
+#endif
+
+static void __init p4d_basic_tests(unsigned long pfn, pgprot_t prot)
+{
+	p4d_t p4d;
+
+	memset(&p4d, RANDOM_NZVALUE, sizeof(p4d_t));
+	WARN_ON(!p4d_same(p4d, p4d));
+}
+
+static void __init pgd_basic_tests(unsigned long pfn, pgprot_t prot)
+{
+	pgd_t pgd;
+
+	memset(&pgd, RANDOM_NZVALUE, sizeof(pgd_t));
+	WARN_ON(!pgd_same(pgd, pgd));
+}
+
+#ifndef __PAGETABLE_PUD_FOLDED
+static void __init pud_clear_tests(struct mm_struct *mm, pud_t *pudp)
+{
+	pud_t pud = READ_ONCE(*pudp);
+
+	if (mm_pmd_folded(mm))
+		return;
+
+	pud = __pud(pud_val(pud) | RANDOM_ORVALUE);
+	WRITE_ONCE(*pudp, pud);
+	pud_clear(pudp);
+	pud = READ_ONCE(*pudp);
+	WARN_ON(!pud_none(pud));
+}
+
+static void __init pud_populate_tests(struct mm_struct *mm, pud_t *pudp,
+				      pmd_t *pmdp)
+{
+	pud_t pud;
+
+	if (mm_pmd_folded(mm))
+		return;
+	/*
+	 * This entry points to next level page table page.
+	 * Hence this must not qualify as pud_bad().
+	 */
+	pmd_clear(pmdp);
+	pud_clear(pudp);
+	pud_populate(mm, pudp, pmdp);
+	pud = READ_ONCE(*pudp);
+	WARN_ON(pud_bad(pud));
+}
+#else
+static void __init pud_clear_tests(struct mm_struct *mm, pud_t *pudp) { }
+static void __init pud_populate_tests(struct mm_struct *mm, pud_t *pudp,
+				      pmd_t *pmdp)
+{
+}
+#endif
+
+#ifndef __PAGETABLE_P4D_FOLDED
+static void __init p4d_clear_tests(struct mm_struct *mm, p4d_t *p4dp)
+{
+	p4d_t p4d = READ_ONCE(*p4dp);
+
+	if (mm_pud_folded(mm))
+		return;
+
+	p4d = __p4d(p4d_val(p4d) | RANDOM_ORVALUE);
+	WRITE_ONCE(*p4dp, p4d);
+	p4d_clear(p4dp);
+	p4d = READ_ONCE(*p4dp);
+	WARN_ON(!p4d_none(p4d));
+}
+
+static void __init p4d_populate_tests(struct mm_struct *mm, p4d_t *p4dp,
+				      pud_t *pudp)
+{
+	p4d_t p4d;
+
+	if (mm_pud_folded(mm))
+		return;
+
+	/*
+	 * This entry points to next level page table page.
+	 * Hence this must not qualify as p4d_bad().
+	 */
+	pud_clear(pudp);
+	p4d_clear(p4dp);
+	p4d_populate(mm, p4dp, pudp);
+	p4d = READ_ONCE(*p4dp);
+	WARN_ON(p4d_bad(p4d));
+}
+
+static void __init pgd_clear_tests(struct mm_struct *mm, pgd_t *pgdp)
+{
+	pgd_t pgd = READ_ONCE(*pgdp);
+
+	if (mm_p4d_folded(mm))
+		return;
+
+	pgd = __pgd(pgd_val(pgd) | RANDOM_ORVALUE);
+	WRITE_ONCE(*pgdp, pgd);
+	pgd_clear(pgdp);
+	pgd = READ_ONCE(*pgdp);
+	WARN_ON(!pgd_none(pgd));
+}
+
+static void __init pgd_populate_tests(struct mm_struct *mm, pgd_t *pgdp,
+				      p4d_t *p4dp)
+{
+	pgd_t pgd;
+
+	if (mm_p4d_folded(mm))
+		return;
+
+	/*
+	 * This entry points to next level page table page.
+	 * Hence this must not qualify as pgd_bad().
+	 */
+	p4d_clear(p4dp);
+	pgd_clear(pgdp);
+	pgd_populate(mm, pgdp, p4dp);
+	pgd = READ_ONCE(*pgdp);
+	WARN_ON(pgd_bad(pgd));
+}
+#else
+static void __init p4d_clear_tests(struct mm_struct *mm, p4d_t *p4dp) { }
+static void __init pgd_clear_tests(struct mm_struct *mm, pgd_t *pgdp) { }
+static void __init p4d_populate_tests(struct mm_struct *mm, p4d_t *p4dp,
+				      pud_t *pudp)
+{
+}
+static void __init pgd_populate_tests(struct mm_struct *mm, pgd_t *pgdp,
+				      p4d_t *p4dp)
+{
+}
+#endif
+
+static void __init pte_clear_tests(struct mm_struct *mm, pte_t *ptep,
+				   unsigned long vaddr)
+{
+	pte_t pte = READ_ONCE(*ptep);
+
+	pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
+	WRITE_ONCE(*ptep, pte);
+	pte_clear(mm, vaddr, ptep);
+	pte = READ_ONCE(*ptep);
+	WARN_ON(!pte_none(pte));
+}
+
+static void __init pmd_clear_tests(struct mm_struct *mm, pmd_t *pmdp)
+{
+	pmd_t pmd = READ_ONCE(*pmdp);
+
+	pmd = __pmd(pmd_val(pmd) | RANDOM_ORVALUE);
+	WRITE_ONCE(*pmdp, pmd);
+	pmd_clear(pmdp);
+	pmd = READ_ONCE(*pmdp);
+	WARN_ON(!pmd_none(pmd));
+}
+
+static void __init pmd_populate_tests(struct mm_struct *mm, pmd_t *pmdp,
+				      pgtable_t pgtable)
+{
+	pmd_t pmd;
+
+	/*
+	 * This entry points to next level page table page.
+	 * Hence this must not qualify as pmd_bad().
+	 */
+	pmd_clear(pmdp);
+	pmd_populate(mm, pmdp, pgtable);
+	pmd = READ_ONCE(*pmdp);
+	WARN_ON(pmd_bad(pmd));
+}
+
+static unsigned long __init get_random_vaddr(void)
+{
+	unsigned long random_vaddr, random_pages, total_user_pages;
+
+	total_user_pages = (TASK_SIZE - FIRST_USER_ADDRESS) / PAGE_SIZE;
+
+	random_pages = get_random_long() % total_user_pages;
+	random_vaddr = FIRST_USER_ADDRESS + random_pages * PAGE_SIZE;
+
+	return random_vaddr;
+}
+
+void __init debug_vm_pgtable(void)
+{
+	struct mm_struct *mm;
+	pgd_t *pgdp;
+	p4d_t *p4dp, *saved_p4dp;
+	pud_t *pudp, *saved_pudp;
+	pmd_t *pmdp, *saved_pmdp, pmd;
+	pte_t *ptep;
+	pgtable_t saved_ptep;
+	pgprot_t prot;
+	phys_addr_t paddr;
+	unsigned long vaddr, pte_aligned, pmd_aligned;
+	unsigned long pud_aligned, p4d_aligned, pgd_aligned;
+	spinlock_t *uninitialized_var(ptl);
+
+	pr_info("Validating architecture page table helpers\n");
+	prot = vm_get_page_prot(VMFLAGS);
+	vaddr = get_random_vaddr();
+	mm = mm_alloc();
+	if (!mm) {
+		pr_err("mm_struct allocation failed\n");
+		return;
+	}
+
+	/*
+	 * PFN for mapping at PTE level is determined from a standard kernel
+	 * text symbol. But pfns for higher page table levels are derived by
+	 * masking lower bits of this real pfn. These derived pfns might not
+	 * exist on the platform but that does not really matter as pfn_pxx()
+	 * helpers will still create appropriate entries for the test. This
+	 * helps avoid large memory block allocations to be used for mapping
+	 * at higher page table levels.
+	 */
+	paddr = __pa_symbol(&start_kernel);
+
+	pte_aligned = (paddr & PAGE_MASK) >> PAGE_SHIFT;
+	pmd_aligned = (paddr & PMD_MASK) >> PAGE_SHIFT;
+	pud_aligned = (paddr & PUD_MASK) >> PAGE_SHIFT;
+	p4d_aligned = (paddr & P4D_MASK) >> PAGE_SHIFT;
+	pgd_aligned = (paddr & PGDIR_MASK) >> PAGE_SHIFT;
+	WARN_ON(!pfn_valid(pte_aligned));
+
+	pgdp = pgd_offset(mm, vaddr);
+	p4dp = p4d_alloc(mm, pgdp, vaddr);
+	pudp = pud_alloc(mm, p4dp, vaddr);
+	pmdp = pmd_alloc(mm, pudp, vaddr);
+	ptep = pte_alloc_map_lock(mm, pmdp, vaddr, &ptl);
+
+	/*
+	 * Save all the page table page addresses as the page table
+	 * entries will be used for testing with random or garbage
+	 * values. These saved addresses will be used for freeing
+	 * page table pages.
+	 */
+	pmd = READ_ONCE(*pmdp);
+	saved_p4dp = p4d_offset(pgdp, 0UL);
+	saved_pudp = pud_offset(p4dp, 0UL);
+	saved_pmdp = pmd_offset(pudp, 0UL);
+	saved_ptep = pmd_pgtable(pmd);
+
+	pte_basic_tests(pte_aligned, prot);
+	pmd_basic_tests(pmd_aligned, prot);
+	pud_basic_tests(pud_aligned, prot);
+	p4d_basic_tests(p4d_aligned, prot);
+	pgd_basic_tests(pgd_aligned, prot);
+
+	pte_clear_tests(mm, ptep, vaddr);
+	pmd_clear_tests(mm, pmdp);
+	pud_clear_tests(mm, pudp);
+	p4d_clear_tests(mm, p4dp);
+	pgd_clear_tests(mm, pgdp);
+
+	pte_unmap_unlock(ptep, ptl);
+
+	pmd_populate_tests(mm, pmdp, saved_ptep);
+	pud_populate_tests(mm, pudp, saved_pmdp);
+	p4d_populate_tests(mm, p4dp, saved_pudp);
+	pgd_populate_tests(mm, pgdp, saved_p4dp);
+
+	p4d_free(mm, saved_p4dp);
+	pud_free(mm, saved_pudp);
+	pmd_free(mm, saved_pmdp);
+	pte_free(mm, saved_ptep);
+
+	mm_dec_nr_puds(mm);
+	mm_dec_nr_pmds(mm);
+	mm_dec_nr_ptes(mm);
+	mmdrop(mm);
+}