diff mbox series

[RFC] mm/pgtable/debug: Add test validating architecture page table helpers

Message ID 1564037723-26676-2-git-send-email-anshuman.khandual@arm.com (mailing list archive)
State New, archived
Headers show
Series [RFC] mm/pgtable/debug: Add test validating architecture page table helpers | expand

Commit Message

Anshuman Khandual July 25, 2019, 6:55 a.m. UTC
This adds a test module which will validate architecture page table helpers
and accessors regarding compliance with generic MM semantics expectations.
This will help various architectures in validating changes to the existing
page table helpers or addition of new ones.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mark Brown <Mark.Brown@arm.com>
Cc: Steven Price <Steven.Price@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Sri Krishna chowdary <schowdary@nvidia.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org

Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 lib/Kconfig.debug       |  14 +++
 lib/Makefile            |   1 +
 lib/test_arch_pgtable.c | 290 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 305 insertions(+)
 create mode 100644 lib/test_arch_pgtable.c

Comments

Matthew Wilcox (Oracle) July 25, 2019, 2:39 p.m. UTC | #1
On Thu, Jul 25, 2019 at 12:25:23PM +0530, Anshuman Khandual wrote:
> This adds a test module which will validate architecture page table helpers
> and accessors regarding compliance with generic MM semantics expectations.
> This will help various architectures in validating changes to the existing
> page table helpers or addition of new ones.

I think this is a really good idea.

>  lib/Kconfig.debug       |  14 +++
>  lib/Makefile            |   1 +
>  lib/test_arch_pgtable.c | 290 ++++++++++++++++++++++++++++++++++++++++++++++++

Is this the right place for it?  I worry that lib/ is going to get overloaded
with test code, and this feels more like mm/ test code.

> +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE
> +static void pmd_basic_tests(void)
> +{
> +	pmd_t pmd;
> +
> +	pmd = mk_pmd(page, prot);

But 'page' isn't necessarily PMD-aligned.  I don't think we can rely on
architectures doing the right thing if asked to make a PMD for a randomly
aligned page.

How about finding the physical address of something like kernel_init(),
and using the corresponding pte/pmd/pud/p4d/pgd that encompasses that
address?  It's also better to pass in the pfn/page rather than using global
variables to communicate to the test functions.

> +	/*
> +	 * A huge page does not point to next level page table
> +	 * entry. Hence this must qualify as pmd_bad().
> +	 */
> +	WARN_ON(!pmd_bad(pmd_mkhuge(pmd)));

I didn't know that rule.  This is helpful because it gives us somewhere
to document all these tricksy little rules.

> +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
> +static void pud_basic_tests(void)

Is this the right ifdef?
Catalin Marinas July 25, 2019, 5:07 p.m. UTC | #2
On Thu, Jul 25, 2019 at 12:25:23PM +0530, Anshuman Khandual wrote:
> +#if !defined(__PAGETABLE_PMD_FOLDED) && !defined(__ARCH_HAS_4LEVEL_HACK)
> +static void pud_clear_tests(void)
> +{
> +	pud_t pud;
> +
> +	pud_clear(&pud);
> +	WARN_ON(!pud_none(pud));
> +}

For the clear tests, I think you should initialise the local variable to
something non-zero rather than rely on whatever was on the stack. In
case it fails, you have a more deterministic behaviour.
Russell King (Oracle) July 25, 2019, 9:38 p.m. UTC | #3
On Thu, Jul 25, 2019 at 07:39:21AM -0700, Matthew Wilcox wrote:
> On Thu, Jul 25, 2019 at 12:25:23PM +0530, Anshuman Khandual wrote:
> > This adds a test module which will validate architecture page table helpers
> > and accessors regarding compliance with generic MM semantics expectations.
> > This will help various architectures in validating changes to the existing
> > page table helpers or addition of new ones.
> 
> I think this is a really good idea.
> 
> >  lib/Kconfig.debug       |  14 +++
> >  lib/Makefile            |   1 +
> >  lib/test_arch_pgtable.c | 290 ++++++++++++++++++++++++++++++++++++++++++++++++
> 
> Is this the right place for it?  I worry that lib/ is going to get overloaded
> with test code, and this feels more like mm/ test code.
> 
> > +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE
> > +static void pmd_basic_tests(void)
> > +{
> > +	pmd_t pmd;
> > +
> > +	pmd = mk_pmd(page, prot);
> 
> But 'page' isn't necessarily PMD-aligned.  I don't think we can rely on
> architectures doing the right thing if asked to make a PMD for a randomly
> aligned page.
> 
> How about finding the physical address of something like kernel_init(),
> and using the corresponding pte/pmd/pud/p4d/pgd that encompasses that
> address?  It's also better to pass in the pfn/page rather than using global
> variables to communicate to the test functions.

There are architectures (32-bit ARM) where the kernel is mapped using
section mappings, and we don't expect the Linux page table walking to
work for section mappings.

> > +	/*
> > +	 * A huge page does not point to next level page table
> > +	 * entry. Hence this must qualify as pmd_bad().
> > +	 */
> > +	WARN_ON(!pmd_bad(pmd_mkhuge(pmd)));
> 
> I didn't know that rule.  This is helpful because it gives us somewhere
> to document all these tricksy little rules.
> 
> > +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
> > +static void pud_basic_tests(void)
> 
> Is this the right ifdef?
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
Matthew Wilcox (Oracle) July 25, 2019, 9:42 p.m. UTC | #4
On Thu, Jul 25, 2019 at 10:38:58PM +0100, Russell King - ARM Linux admin wrote:
> On Thu, Jul 25, 2019 at 07:39:21AM -0700, Matthew Wilcox wrote:
> > But 'page' isn't necessarily PMD-aligned.  I don't think we can rely on
> > architectures doing the right thing if asked to make a PMD for a randomly
> > aligned page.
> > 
> > How about finding the physical address of something like kernel_init(),
> > and using the corresponding pte/pmd/pud/p4d/pgd that encompasses that
> > address?  It's also better to pass in the pfn/page rather than using global
> > variables to communicate to the test functions.
> 
> There are architectures (32-bit ARM) where the kernel is mapped using
> section mappings, and we don't expect the Linux page table walking to
> work for section mappings.

This test doesn't go so far as to insert the PTE/PMD/PUD/... into the
page tables.  It merely needs an appropriately aligned PFN to create a
PTE/PMD/PUD/... from.
Russell King (Oracle) July 25, 2019, 9:54 p.m. UTC | #5
On Thu, Jul 25, 2019 at 12:25:23PM +0530, Anshuman Khandual wrote:
> This adds a test module which will validate architecture page table helpers
> and accessors regarding compliance with generic MM semantics expectations.
> This will help various architectures in validating changes to the existing
> page table helpers or addition of new ones.
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Mark Brown <Mark.Brown@arm.com>
> Cc: Steven Price <Steven.Price@arm.com>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Sri Krishna chowdary <schowdary@nvidia.com>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: x86@kernel.org
> Cc: linux-kernel@vger.kernel.org
> 
> Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
>  lib/Kconfig.debug       |  14 +++
>  lib/Makefile            |   1 +
>  lib/test_arch_pgtable.c | 290 ++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 305 insertions(+)
>  create mode 100644 lib/test_arch_pgtable.c
> 
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 5960e29..a27fe8d 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -1719,6 +1719,20 @@ config TEST_SORT
>  
>  	  If unsure, say N.
>  
> +config TEST_ARCH_PGTABLE
> +	tristate "Test arch page table helpers for semantics compliance"
> +	depends on MMU
> +	depends on DEBUG_KERNEL || m
> +	help
> +	  This options provides a kernel module which can be used to test
> +	  architecture page table helper functions on various platform in
> +	  verifing if they comply with expected generic MM semantics. This
> +	  will help architectures code in making sure that any changes or
> +	  new additions of these helpers will still conform to generic MM
> +	  expeted semantics.
> +
> +	  If unsure, say N.
> +
>  config KPROBES_SANITY_TEST
>  	bool "Kprobes sanity tests"
>  	depends on DEBUG_KERNEL
> diff --git a/lib/Makefile b/lib/Makefile
> index 095601c..0806d61 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -76,6 +76,7 @@ obj-$(CONFIG_TEST_VMALLOC) += test_vmalloc.o
>  obj-$(CONFIG_TEST_OVERFLOW) += test_overflow.o
>  obj-$(CONFIG_TEST_RHASHTABLE) += test_rhashtable.o
>  obj-$(CONFIG_TEST_SORT) += test_sort.o
> +obj-$(CONFIG_TEST_ARCH_PGTABLE) += test_arch_pgtable.o
>  obj-$(CONFIG_TEST_USER_COPY) += test_user_copy.o
>  obj-$(CONFIG_TEST_STATIC_KEYS) += test_static_keys.o
>  obj-$(CONFIG_TEST_STATIC_KEYS) += test_static_key_base.o
> diff --git a/lib/test_arch_pgtable.c b/lib/test_arch_pgtable.c
> new file mode 100644
> index 0000000..1396664
> --- /dev/null
> +++ b/lib/test_arch_pgtable.c
> @@ -0,0 +1,290 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * This kernel module validates architecture page table helpers &
> + * accessors and helps in verifying their continued compliance with
> + * generic MM semantics.
> + *
> + * Copyright (C) 2019 ARM Ltd.
> + *
> + * Author: Anshuman Khandual <anshuman.khandual@arm.com>
> + */
> +#define pr_fmt(fmt) "test_arch_pgtable: %s " fmt, __func__
> +
> +#include <linux/kernel.h>
> +#include <linux/hugetlb.h>
> +#include <linux/mm.h>
> +#include <linux/mman.h>
> +#include <linux/mm_types.h>
> +#include <linux/module.h>
> +#include <linux/printk.h>
> +#include <linux/swap.h>
> +#include <linux/swapops.h>
> +#include <linux/pfn_t.h>
> +#include <linux/gfp.h>
> +#include <asm/pgalloc.h>
> +#include <asm/pgtable.h>
> +
> +/*
> + * Basic operations
> + *
> + * mkold(entry)			= An old and not an young entry
> + * mkyoung(entry)		= An young and not an old entry
> + * mkdirty(entry)		= A dirty and not a clean entry
> + * mkclean(entry)		= A clean and not a dirty entry
> + * mkwrite(entry)		= An write and not an write protected entry
> + * wrprotect(entry)		= An write protected and not an write entry
> + * pxx_bad(entry)		= A mapped and non-table entry
> + * pxx_same(entry1, entry2)	= Both entries hold the exact same value
> + */
> +#define VMA_TEST_FLAGS (VM_READ|VM_WRITE|VM_EXEC)
> +
> +static struct vm_area_struct vma;
> +static struct mm_struct mm;
> +static struct page *page;
> +static pgprot_t prot;
> +static unsigned long pfn, addr;
> +
> +static void pte_basic_tests(void)
> +{
> +	pte_t pte;
> +
> +	pte = mk_pte(page, prot);
> +	WARN_ON(!pte_same(pte, pte));
> +	WARN_ON(!pte_young(pte_mkyoung(pte)));
> +	WARN_ON(!pte_dirty(pte_mkdirty(pte)));
> +	WARN_ON(!pte_write(pte_mkwrite(pte)));
> +	WARN_ON(pte_young(pte_mkold(pte)));
> +	WARN_ON(pte_dirty(pte_mkclean(pte)));
> +	WARN_ON(pte_write(pte_wrprotect(pte)));
> +}
> +
> +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE
> +static void pmd_basic_tests(void)
> +{
> +	pmd_t pmd;
> +
> +	pmd = mk_pmd(page, prot);

mk_pmd() is provided on 32-bit ARM LPAE, which also sets
HAVE_ARCH_TRANSPARENT_HUGEPAGE, so this should be fine.

> +	WARN_ON(!pmd_same(pmd, pmd));
> +	WARN_ON(!pmd_young(pmd_mkyoung(pmd)));
> +	WARN_ON(!pmd_dirty(pmd_mkdirty(pmd)));
> +	WARN_ON(!pmd_write(pmd_mkwrite(pmd)));
> +	WARN_ON(pmd_young(pmd_mkold(pmd)));
> +	WARN_ON(pmd_dirty(pmd_mkclean(pmd)));
> +	WARN_ON(pmd_write(pmd_wrprotect(pmd)));
> +	/*
> +	 * A huge page does not point to next level page table
> +	 * entry. Hence this must qualify as pmd_bad().
> +	 */
> +	WARN_ON(!pmd_bad(pmd_mkhuge(pmd)));
> +}
> +#else
> +static void pmd_basic_tests(void) { }
> +#endif
> +
> +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
> +static void pud_basic_tests(void)
> +{
> +	pud_t pud;
> +
> +	pud = pfn_pud(pfn, prot);
> +	WARN_ON(!pud_same(pud, pud));
> +	WARN_ON(!pud_young(pud_mkyoung(pud)));
> +	WARN_ON(!pud_write(pud_mkwrite(pud)));
> +	WARN_ON(pud_write(pud_wrprotect(pud)));
> +	WARN_ON(pud_young(pud_mkold(pud)));
> +
> +#if !defined(__PAGETABLE_PMD_FOLDED) && !defined(__ARCH_HAS_4LEVEL_HACK)
> +	/*
> +	 * A huge page does not point to next level page table
> +	 * entry. Hence this must qualify as pud_bad().
> +	 */
> +	WARN_ON(!pud_bad(pud_mkhuge(pud)));
> +#endif
> +}
> +#else
> +static void pud_basic_tests(void) { }
> +#endif
> +
> +static void p4d_basic_tests(void)
> +{
> +	pte_t pte;
> +	p4d_t p4d;
> +
> +	pte = mk_pte(page, prot);
> +	p4d = (p4d_t) { (pte_val(pte)) };
> +	WARN_ON(!p4d_same(p4d, p4d));

If the intention is to test p4d_same(), is this really a sufficient test?

> +}
> +
> +static void pgd_basic_tests(void)
> +{
> +	pte_t pte;
> +	pgd_t pgd;
> +
> +	pte = mk_pte(page, prot);
> +	pgd = (pgd_t) { (pte_val(pte)) };
> +	WARN_ON(!pgd_same(pgd, pgd));

If the intention is to test pgd_same(), is this really a sufficient test?

> +}
> +
> +#if !defined(__PAGETABLE_PMD_FOLDED) && !defined(__ARCH_HAS_4LEVEL_HACK)
> +static void pud_clear_tests(void)
> +{
> +	pud_t pud;
> +
> +	pud_clear(&pud);
> +	WARN_ON(!pud_none(pud));
> +}
> +
> +static void pud_populate_tests(void)
> +{
> +	pmd_t pmd;
> +	pud_t pud;
> +
> +	/*
> +	 * This entry points to next level page table page.
> +	 * Hence this must not qualify as pud_bad().
> +	 */
> +	pmd_clear(&pmd);

32-bit ARM sets __PAGETABLE_PMD_FOLDED so this is not a concern.

> +	pud_clear(&pud);
> +	pud_populate(&mm, &pud, &pmd);
> +	WARN_ON(pud_bad(pud));
> +}
> +#else
> +static void pud_clear_tests(void) { }
> +static void pud_populate_tests(void) { }
> +#endif
> +
> +#if !defined(__PAGETABLE_PUD_FOLDED) && !defined(__ARCH_HAS_5LEVEL_HACK)
> +static void p4d_clear_tests(void)
> +{
> +	p4d_t p4d;
> +
> +	p4d_clear(&p4d);
> +	WARN_ON(!p4d_none(p4d));
> +}
> +
> +static void p4d_populate_tests(void)
> +{
> +	pud_t pud;
> +	p4d_t p4d;
> +
> +	/*
> +	 * This entry points to next level page table page.
> +	 * Hence this must not qualify as p4d_bad().
> +	 */
> +	pud_clear(&pud);
> +	p4d_clear(&p4d);
> +	p4d_populate(&mm, &p4d, &pud);
> +	WARN_ON(p4d_bad(p4d));
> +}
> +#else
> +static void p4d_clear_tests(void) { }
> +static void p4d_populate_tests(void) { }
> +#endif
> +
> +#ifndef __PAGETABLE_P4D_FOLDED
> +static void pgd_clear_tests(void)
> +{
> +	pgd_t pgd;
> +
> +	pgd_clear(&pgd);
> +	WARN_ON(!pgd_none(pgd));
> +}
> +
> +static void pgd_populate_tests(void)
> +{
> +	pgd_t p4d;
> +	pgd_t pgd;
> +
> +	/*
> +	 * This entry points to next level page table page.
> +	 * Hence this must not qualify as pgd_bad().
> +	 */
> +	p4d_clear(&p4d);
> +	pgd_clear(&pgd);
> +	pgd_populate(&mm, &pgd, &p4d);
> +	WARN_ON(pgd_bad(pgd));
> +}
> +#else
> +static void pgd_clear_tests(void) { }
> +static void pgd_populate_tests(void) { }
> +#endif
> +
> +static void pxx_clear_tests(void)
> +{
> +	pte_t pte;
> +	pmd_t pmd;
> +
> +	pte_clear(NULL, 0, &pte);
> +	WARN_ON(!pte_none(pte));
> +
> +	pmd_clear(&pmd);

This really isn't going to be happy on 32-bit non-LPAE ARM.  Here, a
PMD is a 32-bit entry which is expected to be _within_ a proper PGD,
where a PGD is 16K in size, consisting of pairs of PMDs.

So, pmd_clear() expects to always be called for an _even_ PMD of the
pair, and will write to the even and following odd PMD.  Hence, the
above will scribble over the stack of this function.

> +	WARN_ON(!pmd_none(pmd));
> +
> +	pud_clear_tests();
> +	p4d_clear_tests();
> +	pgd_clear_tests();
> +}
> +
> +static void pxx_populate_tests(void)
> +{
> +	pmd_t pmd;
> +
> +	/*
> +	 * This entry points to next level page table page.
> +	 * Hence this must not qualify as pmd_bad().
> +	 */
> +	memset(page, 0, sizeof(*page));
> +	pmd_clear(&pmd);

This really isn't going to be happy on 32-bit non-LPAE ARM.  Here, a
PMD is a 32-bit entry which is expected to be _within_ a proper PGD,
where a PGD is 16K in size, consisting of pairs of PMDs.

So, pmd_clear() expects to always be called for an _even_ PMD of the
pair, and will write to the even and following odd PMD.  Hence, the
above will scribble over the stack of this function.

> +	pmd_populate(&mm, &pmd, page);

This too has the same expectations on 32-bit non-LPAE ARM.

> +	WARN_ON(pmd_bad(pmd));
> +
> +	pud_populate_tests();
> +	p4d_populate_tests();
> +	pgd_populate_tests();
> +}
> +
> +static int variables_alloc(void)
> +{
> +	vma_init(&vma, &mm);
> +	prot = vm_get_page_prot(VMA_TEST_FLAGS);
> +	page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> +	if (!page) {
> +		pr_err("Test struct page allocation failed\n");
> +		return 1;
> +	}
> +	pfn = page_to_pfn(page);
> +	addr = 0;
> +	return 0;
> +}
> +
> +static void variables_free(void)
> +{
> +	free_page((unsigned long)page_address(page));
> +}
> +
> +static int __init arch_pgtable_tests_init(void)
> +{
> +	int ret;
> +
> +	ret = variables_alloc();
> +	if (ret) {
> +		pr_err("Test resource initialization failed\n");
> +		return 1;
> +	}
> +
> +	pte_basic_tests();
> +	pmd_basic_tests();
> +	pud_basic_tests();
> +	p4d_basic_tests();
> +	pgd_basic_tests();
> +	pxx_clear_tests();
> +	pxx_populate_tests();
> +	variables_free();
> +	return 0;
> +}
> +
> +static void __exit arch_pgtable_tests_exit(void) { }
> +
> +module_init(arch_pgtable_tests_init);
> +module_exit(arch_pgtable_tests_exit);
> +MODULE_LICENSE("GPL v2");
> -- 
> 2.7.4
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
Russell King (Oracle) July 25, 2019, 9:58 p.m. UTC | #6
On Thu, Jul 25, 2019 at 02:42:22PM -0700, Matthew Wilcox wrote:
> On Thu, Jul 25, 2019 at 10:38:58PM +0100, Russell King - ARM Linux admin wrote:
> > On Thu, Jul 25, 2019 at 07:39:21AM -0700, Matthew Wilcox wrote:
> > > But 'page' isn't necessarily PMD-aligned.  I don't think we can rely on
> > > architectures doing the right thing if asked to make a PMD for a randomly
> > > aligned page.
> > > 
> > > How about finding the physical address of something like kernel_init(),
> > > and using the corresponding pte/pmd/pud/p4d/pgd that encompasses that
> > > address?  It's also better to pass in the pfn/page rather than using global
> > > variables to communicate to the test functions.
> > 
> > There are architectures (32-bit ARM) where the kernel is mapped using
> > section mappings, and we don't expect the Linux page table walking to
> > work for section mappings.
> 
> This test doesn't go so far as to insert the PTE/PMD/PUD/... into the
> page tables.  It merely needs an appropriately aligned PFN to create a
> PTE/PMD/PUD/... from.

Well, in any case,

c085ac68 t kernel_init

so I'm not sure that would be an improvement.
Matthew Wilcox (Oracle) July 25, 2019, 10:56 p.m. UTC | #7
On Thu, Jul 25, 2019 at 10:58:12PM +0100, Russell King - ARM Linux admin wrote:
> On Thu, Jul 25, 2019 at 02:42:22PM -0700, Matthew Wilcox wrote:
> > On Thu, Jul 25, 2019 at 10:38:58PM +0100, Russell King - ARM Linux admin wrote:
> > > On Thu, Jul 25, 2019 at 07:39:21AM -0700, Matthew Wilcox wrote:
> > > > But 'page' isn't necessarily PMD-aligned.  I don't think we can rely on
> > > > architectures doing the right thing if asked to make a PMD for a randomly
> > > > aligned page.
> > > > 
> > > > How about finding the physical address of something like kernel_init(),
> > > > and using the corresponding pte/pmd/pud/p4d/pgd that encompasses that
> > > > address?  It's also better to pass in the pfn/page rather than using global
> > > > variables to communicate to the test functions.
> > > 
> > > There are architectures (32-bit ARM) where the kernel is mapped using
> > > section mappings, and we don't expect the Linux page table walking to
> > > work for section mappings.
> > 
> > This test doesn't go so far as to insert the PTE/PMD/PUD/... into the
> > page tables.  It merely needs an appropriately aligned PFN to create a
> > PTE/PMD/PUD/... from.
> 
> Well, in any case,
> 
> c085ac68 t kernel_init
> 
> so I'm not sure that would be an improvement.

I said "the corresponding pte/pmd/pud/p4d/pgd that encompasses that address"

So for a PTE, you'd use PFN 0xc085a000, for a PMD, you'd use PFN 0xc0000000
and for a PGD, you'd use PFN 0 (assuming 9 bits per level of table).
Anshuman Khandual July 26, 2019, 4:28 a.m. UTC | #8
On 07/25/2019 10:37 PM, Catalin Marinas wrote:
> On Thu, Jul 25, 2019 at 12:25:23PM +0530, Anshuman Khandual wrote:
>> +#if !defined(__PAGETABLE_PMD_FOLDED) && !defined(__ARCH_HAS_4LEVEL_HACK)
>> +static void pud_clear_tests(void)
>> +{
>> +	pud_t pud;
>> +
>> +	pud_clear(&pud);
>> +	WARN_ON(!pud_none(pud));
>> +}
> 
> For the clear tests, I think you should initialise the local variable to
> something non-zero rather than rely on whatever was on the stack. In
> case it fails, you have a more deterministic behaviour.

Sure, it makes sense, will change.
Anshuman Khandual July 26, 2019, 4:47 a.m. UTC | #9
On 07/25/2019 08:09 PM, Matthew Wilcox wrote:
> On Thu, Jul 25, 2019 at 12:25:23PM +0530, Anshuman Khandual wrote:
>> This adds a test module which will validate architecture page table helpers
>> and accessors regarding compliance with generic MM semantics expectations.
>> This will help various architectures in validating changes to the existing
>> page table helpers or addition of new ones.
> 
> I think this is a really good idea.
> 
>>  lib/Kconfig.debug       |  14 +++
>>  lib/Makefile            |   1 +
>>  lib/test_arch_pgtable.c | 290 ++++++++++++++++++++++++++++++++++++++++++++++++
> 
> Is this the right place for it?  I worry that lib/ is going to get overloaded
> with test code, and this feels more like mm/ test code.

Sure this can be moved to mm/ but unlike existing test configs there (following)
lets keep some config description in mm/Kconfig. Will probably rename this as
CONFIG_DEBUG_ARCH_PGTABLE_TEST to match other tests.

CONFIG_DEBUG_KMEMLEAK_TEST
CONFIG_DEBUG_RODATA_TEST
CONFIG_MEMTEST

After moving to mm/ directory I guess it does not need a loadable module option.

> 
>> +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE
>> +static void pmd_basic_tests(void)
>> +{
>> +	pmd_t pmd;
>> +
>> +	pmd = mk_pmd(page, prot);
> 
> But 'page' isn't necessarily PMD-aligned.  I don't think we can rely on
> architectures doing the right thing if asked to make a PMD for a randomly
> aligned page.
> 
> How about finding the physical address of something like kernel_init(),

Physical address corresponding to the symbol in the kernel text segment ?

> and using the corresponding pte/pmd/pud/p4d/pgd that encompasses that

So I guess this will help us use pte/pmd/pud/p4d/pgd entries from a real and
present mapping rather then making them up for test purpose. Although we are
not creating real page tables here just wondering if this could some how
affect these real mapping in anyway from some accessors. The current proposal
stays clear from anything real - allocates, evaluates and releases.

Also table entries at pmd/pud/p4d/pgd cannot be operated with accessors in the
test. THP and PUD THP will operate on leaf entries at pmd or pud levels. We need
them as leaf entries created from allocated (aligned) pfns. While determining
pte/pmd/pud/p4d/pgd for kernel_init() some of them will be table entries.

> address?  It's also better to pass in the pfn/page rather than using global
> variables to communicate to the test functions.

Sure those can be allocated and passed from the main function. Just wanted to
avoid page allocation in each individual tests.

> 
>> +	/*
>> +	 * A huge page does not point to next level page table
>> +	 * entry. Hence this must qualify as pmd_bad().
>> +	 */
>> +	WARN_ON(!pmd_bad(pmd_mkhuge(pmd)));
> 
> I didn't know that rule.  This is helpful because it gives us somewhere
> to document all these tricksy little rules.

That is another objective this test has which will help settle semantics
in a clear and documented manner.

> 
>> +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>> +static void pud_basic_tests(void)
> 
> Is this the right ifdef?

IIUC THP at PUD is where the pud_t entries are directly operated upon and the
corresponding accessors are present only when HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
is enabled. Am I missing something here ?
Anshuman Khandual July 26, 2019, 5:10 a.m. UTC | #10
On 07/26/2019 03:24 AM, Russell King - ARM Linux admin wrote:
> On Thu, Jul 25, 2019 at 12:25:23PM +0530, Anshuman Khandual wrote:
>> This adds a test module which will validate architecture page table helpers
>> and accessors regarding compliance with generic MM semantics expectations.
>> This will help various architectures in validating changes to the existing
>> page table helpers or addition of new ones.
>>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Michal Hocko <mhocko@kernel.org>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Cc: Mark Brown <Mark.Brown@arm.com>
>> Cc: Steven Price <Steven.Price@arm.com>
>> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
>> Cc: Kees Cook <keescook@chromium.org>
>> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
>> Cc: Matthew Wilcox <willy@infradead.org>
>> Cc: Sri Krishna chowdary <schowdary@nvidia.com>
>> Cc: Dave Hansen <dave.hansen@intel.com>
>> Cc: linux-arm-kernel@lists.infradead.org
>> Cc: x86@kernel.org
>> Cc: linux-kernel@vger.kernel.org
>>
>> Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>> ---
>>  lib/Kconfig.debug       |  14 +++
>>  lib/Makefile            |   1 +
>>  lib/test_arch_pgtable.c | 290 ++++++++++++++++++++++++++++++++++++++++++++++++
>>  3 files changed, 305 insertions(+)
>>  create mode 100644 lib/test_arch_pgtable.c
>>
>> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
>> index 5960e29..a27fe8d 100644
>> --- a/lib/Kconfig.debug
>> +++ b/lib/Kconfig.debug
>> @@ -1719,6 +1719,20 @@ config TEST_SORT
>>  
>>  	  If unsure, say N.
>>  
>> +config TEST_ARCH_PGTABLE
>> +	tristate "Test arch page table helpers for semantics compliance"
>> +	depends on MMUpte/pmd/pud/p4d/pgd 
>> +	depends on DEBUG_KERNEL || m
>> +	help
>> +	  This options provides a kernel module which can be used to test
>> +	  architecture page table helper functions on various platform in
>> +	  verifing if they comply with expected generic MM semantics. This
>> +	  will help architectures code in making sure that any changes or
>> +	  new additions of these helpers will still conform to generic MM
>> +	  expeted semantics.
>> +
>> +	  If unsure, say N.
>> +
>>  config KPROBES_SANITY_TEST
>>  	bool "Kprobes sanity tests"
>>  	depends on DEBUG_KERNEL
>> diff --git a/lib/Makefile b/lib/Makefile
>> index 095601c..0806d61 100644
>> --- a/lib/Makefile
>> +++ b/lib/Makefile
>> @@ -76,6 +76,7 @@ obj-$(CONFIG_TEST_VMALLOC) += test_vmalloc.o
>>  obj-$(CONFIG_TEST_OVERFLOW) += test_overflow.o
>>  obj-$(CONFIG_TEST_RHASHTABLE) += test_rhashtable.o
>>  obj-$(CONFIG_TEST_SORT) += test_sort.o
>> +obj-$(CONFIG_TEST_ARCH_PGTABLE) += test_arch_pgtable.o
>>  obj-$(CONFIG_TEST_USER_COPY) += test_user_copy.o
>>  obj-$(CONFIG_TEST_STATIC_KEYS) += test_static_keys.o
>>  obj-$(CONFIG_TEST_STATIC_KEYS) += test_static_key_base.o
>> diff --git a/lib/test_arch_pgtable.c b/lib/test_arch_pgtable.c
>> new file mode 100644
>> index 0000000..1396664
>> --- /dev/null
>> +++ b/lib/test_arch_pgtable.c
>> @@ -0,0 +1,290 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/*
>> + * This kernel module validates architecture page table helpers &
>> + * accessors and helps in verifying their continued compliance with
>> + * generic MM semantics.
>> + *
>> + * Copyright (C) 2019 ARM Ltd.
>> + *
>> + * Author: Anshuman Khandual <anshuman.khandual@arm.com>
>> + */
>> +#define pr_fmt(fmt) "test_arch_pgtable: %s " fmt, __func__
>> +
>> +#include <linux/kernel.h>
>> +#include <linux/hugetlb.h>
>> +#include <linux/mm.h>
>> +#include <linux/mman.h>
>> +#include <linux/mm_types.h>
>> +#include <linux/module.h>
>> +#include <linux/printk.h>
>> +#include <linux/swap.h>
>> +#include <linux/swapops.h>
>> +#include <linux/pfn_t.h>
>> +#include <linux/gfp.h>
>> +#include <asm/pgalloc.h>
>> +#include <asm/pgtable.h>
>> +
>> +/*
>> + * Basic operations
>> + *
>> + * mkold(entry)			= An old and not an young entry
>> + * mkyoung(entry)		= An young and not an old entry
>> + * mkdirty(entry)		= A dirty and not a clean entry
>> + * mkclean(entry)		= A clean and not a dirty entry
>> + * mkwrite(entry)		= An write and not an write protected entry
>> + * wrprotect(entry)		= An write protected and not an write entry
>> + * pxx_bad(entry)		= A mapped and non-table entry
>> + * pxx_same(entry1, entry2)	= Both entries hold the exact same value
>> + */
>> +#define VMA_TEST_FLAGS (VM_READ|VM_WRITE|VM_EXEC)
>> +
>> +static struct vm_area_struct vma;
>> +static struct mm_struct mm;
>> +static struct page *page;
>> +static pgprot_t prot;
>> +static unsigned long pfn, addr;
>> +
>> +static void pte_basic_tests(void)
>> +{
>> +	pte_t pte;
>> +
>> +	pte = mk_pte(page, prot);
>> +	WARN_ON(!pte_same(pte, pte));
>> +	WARN_ON(!pte_young(pte_mkyoung(pte)));
>> +	WARN_ON(!pte_dirty(pte_mkdirty(pte)));
>> +	WARN_ON(!pte_write(pte_mkwrite(pte)));
>> +	WARN_ON(pte_young(pte_mkold(pte)));
>> +	WARN_ON(pte_dirty(pte_mkclean(pte)));
>> +	WARN_ON(pte_write(pte_wrprotect(pte)));
>> +}
>> +
>> +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE
>> +static void pmd_basic_tests(void)
>> +{
>> +	pmd_t pmd;
>> +
>> +	pmd = mk_pmd(page, prot);
> 
> mk_pmd() is provided on 32-bit ARM LPAE, which also sets
> HAVE_ARCH_TRANSPARENT_HUGEPAGE, so this should be fine.

Okay.

> 
>> +	WARN_ON(!pmd_same(pmd, pmd));
>> +	WARN_ON(!pmd_young(pmd_mkyoung(pmd)));
>> +	WARN_ON(!pmd_dirty(pmd_mkdirty(pmd)));
>> +	WARN_ON(!pmd_write(pmd_mkwrite(pmd)));
>> +	WARN_ON(pmd_young(pmd_mkold(pmd)));
>> +	WARN_ON(pmd_dirty(pmd_mkclean(pmd)));
>> +	WARN_ON(pmd_write(pmd_wrprotect(pmd)));
>> +	/*
>> +	 * A huge page does not point to next level page table
>> +	 * entry. Hence this must qualify as pmd_bad().
>> +	 */
>> +	WARN_ON(!pmd_bad(pmd_mkhuge(pmd)));
>> +}
>> +#else
>> +static void pmd_basic_tests(void) { }
>> +#endif
>> +
>> +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>> +static void pud_basic_tests(void)
>> +{
>> +	pud_t pud;
>> +
>> +	pud = pfn_pud(pfn, prot);
>> +	WARN_ON(!pud_same(pud, pud));
>> +	WARN_ON(!pud_young(pud_mkyoung(pud)));
>> +	WARN_ON(!pud_write(pud_mkwrite(pud)));
>> +	WARN_ON(pud_write(pud_wrprotect(pud)));
>> +	WARN_ON(pud_young(pud_mkold(pud)));
>> +
>> +#if !defined(__PAGETABLE_PMD_FOLDED) && !defined(__ARCH_HAS_4LEVEL_HACK)
>> +	/*
>> +	 * A huge page does not point to next level page table
>> +	 * entry. Hence this must qualify as pud_bad().
>> +	 */
>> +	WARN_ON(!pud_bad(pud_mkhuge(pud)));
>> +#endif
>> +}
>> +#else
>> +static void pud_basic_tests(void) { }
>> +#endif
>> +
>> +static void p4d_basic_tests(void)
>> +{
>> +	pte_t pte;
>> +	p4d_t p4d;
>> +
>> +	pte = mk_pte(page, prot);
>> +	p4d = (p4d_t) { (pte_val(pte)) };
>> +	WARN_ON(!p4d_same(p4d, p4d));
> 
> If the intention is to test p4d_same(), is this really a sufficient test?

p4d_same() just tests if two p4d entries have the same value. Hence any non-zero
value in there should be able to achieve that. Besides p4d does not have much
common helpers (as it gets often folded) to operate on an entry in order to create
other real world values. But if you have suggestions to make this better I am happy
to incorporate.

> 
>> +}
>> +
>> +static void pgd_basic_tests(void)
>> +{
>> +	pte_t pte;
>> +	pgd_t pgd;
>> +
>> +	pte = mk_pte(page, prot);
>> +	pgd = (pgd_t) { (pte_val(pte)) };
>> +	WARN_ON(!pgd_same(pgd, pgd));
> 
> If the intention is to test pgd_same(), is this really a sufficient test?

Same as above.

> 
>> +}
>> +
>> +#if !defined(__PAGETABLE_PMD_FOLDED) && !defined(__ARCH_HAS_4LEVEL_HACK)
>> +static void pud_clear_tests(void)
>> +{
>> +	pud_t pud;
>> +
>> +	pud_clear(&pud);
>> +	WARN_ON(!pud_none(pud));
>> +}
>> +
>> +static void pud_populate_tests(void)
>> +{
>> +	pmd_t pmd;
>> +	pud_t pud;
>> +
>> +	/*
>> +	 * This entry points to next level page table page.
>> +	 * Hence this must not qualify as pud_bad().
>> +	 */
>> +	pmd_clear(&pmd);
> 
> 32-bit ARM sets __PAGETABLE_PMD_FOLDED so this is not a concern.

Okay.

> 
>> +	pud_clear(&pud);
>> +	pud_populate(&mm, &pud, &pmd);
>> +	WARN_ON(pud_bad(pud));
>> +}
>> +#else
>> +static void pud_clear_tests(void) { }
>> +static void pud_populate_tests(void) { }
>> +#endif
>> +
>> +#if !defined(__PAGETABLE_PUD_FOLDED) && !defined(__ARCH_HAS_5LEVEL_HACK)
>> +static void p4d_clear_tests(void)
>> +{
>> +	p4d_t p4d;
>> +
>> +	p4d_clear(&p4d);
>> +	WARN_ON(!p4d_none(p4d));
>> +}
>> +
>> +static void p4d_populate_tests(void)
>> +{
>> +	pud_t pud;
>> +	p4d_t p4d;
>> +
>> +	/*
>> +	 * This entry points to next level page table page.
>> +	 * Hence this must not qualify as p4d_bad().
>> +	 */
>> +	pud_clear(&pud);
>> +	p4d_clear(&p4d);
>> +	p4d_populate(&mm, &p4d, &pud);
>> +	WARN_ON(p4d_bad(p4d));
>> +}
>> +#else
>> +static void p4d_clear_tests(void) { }
>> +static void p4d_populate_tests(void) { }
>> +#endif
>> +
>> +#ifndef __PAGETABLE_P4D_FOLDED
>> +static void pgd_clear_tests(void)
>> +{
>> +	pgd_t pgd;
>> +
>> +	pgd_clear(&pgd);
>> +	WARN_ON(!pgd_none(pgd));
>> +}
>> +
>> +static void pgd_populate_tests(void)
>> +{
>> +	pgd_t p4d;
>> +	pgd_t pgd;
>> +
>> +	/*
>> +	 * This entry points to next level page table page.
>> +	 * Hence this must not qualify as pgd_bad().
>> +	 */
>> +	p4d_clear(&p4d);
>> +	pgd_clear(&pgd);
>> +	pgd_populate(&mm, &pgd, &p4d);
>> +	WARN_ON(pgd_bad(pgd));
>> +}
>> +#else
>> +static void pgd_clear_tests(void) { }
>> +static void pgd_populate_tests(void) { }
>> +#endif
>> +
>> +static void pxx_clear_tests(void)
>> +{
>> +	pte_t pte;
>> +	pmd_t pmd;
>> +
>> +	pte_clear(NULL, 0, &pte);
>> +	WARN_ON(!pte_none(pte));
>> +
>> +	pmd_clear(&pmd);
> 
> This really isn't going to be happy on 32-bit non-LPAE ARM.  Here, a
> PMD is a 32-bit entry which is expected to be _within_ a proper PGD,
> where a PGD is 16K in size, consisting of pairs of PMDs.
> 
> So, pmd_clear() expects to always be called for an _even_ PMD of the
> pair, and will write to the even and following odd PMD.  Hence, the
> above will scribble over the stack of this function.

A pmd_clear() clears two consecutive 32 bit pmd_t entries not a single
one. So the stack needs to have two entries for such cases. I could see
only a single definition for pmd_none() on arm, hence pmd_none() should
be called on both the 32 bit entries cleared with pmd_clear() earlier ?

Though this could be accommodate with relevant non-LPAE ARM specific
config option but should we do that ? All the config wrappers in the test
right now are generic MM identifiable and nothing platform specific. The
primary idea is to test platform page table helpers as seen from generic
MM. Any suggestions how to incorporate this while still keeping the test
clear from platform specific details like these ?

> 
>> +	WARN_ON(!pmd_none(pmd));
>> +
>> +	pud_clear_tests();
>> +	p4d_clear_tests();
>> +	pgd_clear_tests();
>> +}
>> +
>> +static void pxx_populate_tests(void)
>> +{
>> +	pmd_t pmd;
>> +
>> +	/*
>> +	 * This entry points to next level page table page.
>> +	 * Hence this must not qualify as pmd_bad().
>> +	 */
>> +	memset(page, 0, sizeof(*page));
>> +	pmd_clear(&pmd);
> 
> This really isn't going to be happy on 32-bit non-LPAE ARM.  Here, a
> PMD is a 32-bit entry which is expected to be _within_ a proper PGD,
> where a PGD is 16K in size, consisting of pairs of PMDs.
> 
> So, pmd_clear() expects to always be called for an _even_ PMD of the
> pair, and will write to the even and following odd PMD.  Hence, the
> above will scribble over the stack of this function.

Same as above.

> 
>> +	pmd_populate(&mm, &pmd, page);
> 
> This too has the same expectations on 32-bit non-LPAE ARM.

Right this loads up both pmdp[0] and pmdp[1]. The issue is equivalent to
the one detailed above.
Matthew Wilcox (Oracle) July 26, 2019, 7:54 p.m. UTC | #11
On Fri, Jul 26, 2019 at 10:17:11AM +0530, Anshuman Khandual wrote:
> > But 'page' isn't necessarily PMD-aligned.  I don't think we can rely on
> > architectures doing the right thing if asked to make a PMD for a randomly
> > aligned page.
> > 
> > How about finding the physical address of something like kernel_init(),
> 
> Physical address corresponding to the symbol in the kernel text segment ?

Yes.  We need the address of something that's definitely memory.
The stack might be in vmalloc space.  We can't allocate memory from the
allocator that's PUD-aligned.  This seems like a reasonable approximation
to something that might work.

> > and using the corresponding pte/pmd/pud/p4d/pgd that encompasses that
> 
> So I guess this will help us use pte/pmd/pud/p4d/pgd entries from a real and
> present mapping rather then making them up for test purpose. Although we are
> not creating real page tables here just wondering if this could some how
> affect these real mapping in anyway from some accessors. The current proposal
> stays clear from anything real - allocates, evaluates and releases.

I think that's a mistake.  As Russell said, the ARM p*d manipulation
functions expect to operate on tables, not on individual entries
constructed on the stack.

So I think the right thing to do here is allocate an mm, then do the
pgd_alloc / p4d_alloc / pud_alloc / pmd_alloc / pte_alloc() steps giving
you real page tables that you can manipulate.

Then destroy them, of course.  And don't access through them.

> >> +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
> >> +static void pud_basic_tests(void)
> > 
> > Is this the right ifdef?
> 
> IIUC THP at PUD is where the pud_t entries are directly operated upon and the
> corresponding accessors are present only when HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
> is enabled. Am I missing something here ?

Maybe I am.  I thought we could end up operating on PUDs for kernel mappings,
even without transparent hugepages turned on.
Anshuman Khandual July 29, 2019, 8:32 a.m. UTC | #12
On 07/27/2019 01:24 AM, Matthew Wilcox wrote:
> On Fri, Jul 26, 2019 at 10:17:11AM +0530, Anshuman Khandual wrote:
>>> But 'page' isn't necessarily PMD-aligned.  I don't think we can rely on
>>> architectures doing the right thing if asked to make a PMD for a randomly
>>> aligned page.
>>>
>>> How about finding the physical address of something like kernel_init(),
>>
>> Physical address corresponding to the symbol in the kernel text segment ?
> 
> Yes.  We need the address of something that's definitely memory.
> The stack might be in vmalloc space.  We can't allocate memory from the
> allocator that's PUD-aligned.  This seems like a reasonable approximation
> to something that might work.

Okay sure. What is about vmalloc space being PUD aligned and how that is
problematic here ? Could you please give some details. Just being curious.

> 
>>> and using the corresponding pte/pmd/pud/p4d/pgd that encompasses that
>>
>> So I guess this will help us use pte/pmd/pud/p4d/pgd entries from a real and
>> present mapping rather then making them up for test purpose. Although we are
>> not creating real page tables here just wondering if this could some how
>> affect these real mapping in anyway from some accessors. The current proposal
>> stays clear from anything real - allocates, evaluates and releases.
> 
> I think that's a mistake.  As Russell said, the ARM p*d manipulation
> functions expect to operate on tables, not on individual entries
> constructed on the stack.

Hmm. I assume that it will take care of dual 32 bit entry updates on arm
platform through various helper functions as Russel had mentioned earlier.
After we create page table with p?d_alloc() functions and pick an entry at
each page table level.

> 
> So I think the right thing to do here is allocate an mm, then do the
> pgd_alloc / p4d_alloc / pud_alloc / pmd_alloc / pte_alloc() steps giving
> you real page tables that you can manipulate.
> 
> Then destroy them, of course.  And don't access through them.

mm_alloc() seems like a comprehensive helper to allocate and initialize a
mm_struct. But could we use mm_init() with 'current' in the driver context or we
need to create a dummy task_struct for this purpose. Some initial tests show that
p?d_alloc() and p?d_free() at each level with a fixed virtual address gives p?d_t
entries required at various page table level to test upon.

> 
>>>> +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>>>> +static void pud_basic_tests(void)
>>>
>>> Is this the right ifdef?
>>
>> IIUC THP at PUD is where the pud_t entries are directly operated upon and the
>> corresponding accessors are present only when HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>> is enabled. Am I missing something here ?
> 
> Maybe I am.  I thought we could end up operating on PUDs for kernel mappings,
> even without transparent hugepages turned on.

In generic MM ? IIUC except ioremap mapping all other PUD handling for kernel virtual
range is platform specific. All the helpers used in the function pud_basic_tests() are
part of THP and used in mm/huge_memory.c
Matthew Wilcox (Oracle) July 30, 2019, 5:03 p.m. UTC | #13
On Mon, Jul 29, 2019 at 02:02:52PM +0530, Anshuman Khandual wrote:
> On 07/27/2019 01:24 AM, Matthew Wilcox wrote:
> > On Fri, Jul 26, 2019 at 10:17:11AM +0530, Anshuman Khandual wrote:
> >>> But 'page' isn't necessarily PMD-aligned.  I don't think we can rely on
> >>> architectures doing the right thing if asked to make a PMD for a randomly
> >>> aligned page.
> >>>
> >>> How about finding the physical address of something like kernel_init(),
> >>
> >> Physical address corresponding to the symbol in the kernel text segment ?
> > 
> > Yes.  We need the address of something that's definitely memory.
> > The stack might be in vmalloc space.  We can't allocate memory from the
> > allocator that's PUD-aligned.  This seems like a reasonable approximation
> > to something that might work.
> 
> Okay sure. What is about vmalloc space being PUD aligned and how that is
> problematic here ? Could you please give some details. Just being curious.

Those were two different sentences.

We can't use the address of something on the stack, because we don't
know whether the stack is in vmalloc space or in the direct map.

We can't use the address of something we've allocated from the page
allocator, because the page allocator can't give us PUD-aligned memory.

> > I think that's a mistake.  As Russell said, the ARM p*d manipulation
> > functions expect to operate on tables, not on individual entries
> > constructed on the stack.
> 
> Hmm. I assume that it will take care of dual 32 bit entry updates on arm
> platform through various helper functions as Russel had mentioned earlier.
> After we create page table with p?d_alloc() functions and pick an entry at
> each page table level.

Right.

> > So I think the right thing to do here is allocate an mm, then do the
> > pgd_alloc / p4d_alloc / pud_alloc / pmd_alloc / pte_alloc() steps giving
> > you real page tables that you can manipulate.
> > 
> > Then destroy them, of course.  And don't access through them.
> 
> mm_alloc() seems like a comprehensive helper to allocate and initialize a
> mm_struct. But could we use mm_init() with 'current' in the driver context or we
> need to create a dummy task_struct for this purpose. Some initial tests show that
> p?d_alloc() and p?d_free() at each level with a fixed virtual address gives p?d_t
> entries required at various page table level to test upon.

I think it's wise to start a new mm.  I'm not sure exactly what calls
to make to get one going.

> >>>> +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
> >>>> +static void pud_basic_tests(void)
> >>>
> >>> Is this the right ifdef?
> >>
> >> IIUC THP at PUD is where the pud_t entries are directly operated upon and the
> >> corresponding accessors are present only when HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
> >> is enabled. Am I missing something here ?
> > 
> > Maybe I am.  I thought we could end up operating on PUDs for kernel mappings,
> > even without transparent hugepages turned on.
> 
> In generic MM ? IIUC except ioremap mapping all other PUD handling for kernel virtual
> range is platform specific. All the helpers used in the function pud_basic_tests() are
> part of THP and used in mm/huge_memory.c

But what about hugetlbfs?  And vmalloc can also use larger pages these days.
I don't think these tests should be conditional on transparent hugepages.
Anshuman Khandual Aug. 5, 2019, 4:35 a.m. UTC | #14
On 07/30/2019 10:33 PM, Matthew Wilcox wrote:
> On Mon, Jul 29, 2019 at 02:02:52PM +0530, Anshuman Khandual wrote:
>> On 07/27/2019 01:24 AM, Matthew Wilcox wrote:
>>> On Fri, Jul 26, 2019 at 10:17:11AM +0530, Anshuman Khandual wrote:
>>>>> But 'page' isn't necessarily PMD-aligned.  I don't think we can rely on
>>>>> architectures doing the right thing if asked to make a PMD for a randomly
>>>>> aligned page.
>>>>>
>>>>> How about finding the physical address of something like kernel_init(),
>>>>
>>>> Physical address corresponding to the symbol in the kernel text segment ?
>>>
>>> Yes.  We need the address of something that's definitely memory.
>>> The stack might be in vmalloc space.  We can't allocate memory from the
>>> allocator that's PUD-aligned.  This seems like a reasonable approximation
>>> to something that might work.
>>
>> Okay sure. What is about vmalloc space being PUD aligned and how that is
>> problematic here ? Could you please give some details. Just being curious.
> 
> Those were two different sentences.
> 
> We can't use the address of something on the stack, because we don't
> know whether the stack is in vmalloc space or in the direct map.

Okay because kernel stack might be on vmalloc() space.

> 
> We can't use the address of something we've allocated from the page
> allocator, because the page allocator can't give us PUD-aligned memory.

Because this test will be executed early during boot, alloc_contig_range()
makes sense for this purpose. Something like alloc_gigantic_page() which other
than getting the order from huge_page_order(h) is sort of a generic allocator.
Shall we make core part of the function a generic allocator for broader usage
in kernel in case the page allocator would not be sufficient like in this case
which requires a PUD size and a PUD aligned memory.

In case PUD aligned memory block cannot be allocated, pud_basic_tests() must
be skipped and a PMD aligned memory block should be used instead as fallback
for other tests.

>
>>> I think that's a mistake.  As Russell said, the ARM p*d manipulation
>>> functions expect to operate on tables, not on individual entries
>>> constructed on the stack.
>>
>> Hmm. I assume that it will take care of dual 32 bit entry updates on arm
>> platform through various helper functions as Russel had mentioned earlier.
>> After we create page table with p?d_alloc() functions and pick an entry at
>> each page table level.
> 
> Right.
> 
>>> So I think the right thing to do here is allocate an mm, then do the
>>> pgd_alloc / p4d_alloc / pud_alloc / pmd_alloc / pte_alloc() steps giving
>>> you real page tables that you can manipulate.
>>>
>>> Then destroy them, of course.  And don't access through them.
>>
>> mm_alloc() seems like a comprehensive helper to allocate and initialize a
>> mm_struct. But could we use mm_init() with 'current' in the driver context or we
>> need to create a dummy task_struct for this purpose. Some initial tests show that
>> p?d_alloc() and p?d_free() at each level with a fixed virtual address gives p?d_t
>> entries required at various page table level to test upon.
> 
> I think it's wise to start a new mm.  I'm not sure exactly what calls
> to make to get one going.> 
>>>>>> +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>>>>>> +static void pud_basic_tests(void)
>>>>>
>>>>> Is this the right ifdef?
>>>>
>>>> IIUC THP at PUD is where the pud_t entries are directly operated upon and the
>>>> corresponding accessors are present only when HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>>>> is enabled. Am I missing something here ?
>>>
>>> Maybe I am.  I thought we could end up operating on PUDs for kernel mappings,
>>> even without transparent hugepages turned on.
>>
>> In generic MM ? IIUC except ioremap mapping all other PUD handling for kernel virtual
>> range is platform specific. All the helpers used in the function pud_basic_tests() are
>> part of THP and used in mm/huge_memory.c
> 
> But what about hugetlbfs?  And vmalloc can also use larger pages these days.
> I don't think these tests should be conditional on transparent hugepages.

The current proposal restricts itself to very basic operations at each page
table level for now. I have subsequent patches which adds various MM feature
related specific helpers with respect to SPECIAL, DEVMAP, HugeTLB entries
etc. We can also explore platform specific helpers for ioremap and vmalloc.
But that is for subsequent patches and scope for current proposal is limited.

THP (or PUD THP) config wrappers are here because these helpers mentioned in
the current proposal are present only when THP (or PUD THP) is enabled but
are absent otherwise. Without these wrappers, we will have build failures.
Hence these wrappers are necessary.
diff mbox series

Patch

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 5960e29..a27fe8d 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1719,6 +1719,20 @@  config TEST_SORT
 
 	  If unsure, say N.
 
+config TEST_ARCH_PGTABLE
+	tristate "Test arch page table helpers for semantics compliance"
+	depends on MMU
+	depends on DEBUG_KERNEL || m
+	help
+	  This options provides a kernel module which can be used to test
+	  architecture page table helper functions on various platform in
+	  verifing if they comply with expected generic MM semantics. This
+	  will help architectures code in making sure that any changes or
+	  new additions of these helpers will still conform to generic MM
+	  expeted semantics.
+
+	  If unsure, say N.
+
 config KPROBES_SANITY_TEST
 	bool "Kprobes sanity tests"
 	depends on DEBUG_KERNEL
diff --git a/lib/Makefile b/lib/Makefile
index 095601c..0806d61 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -76,6 +76,7 @@  obj-$(CONFIG_TEST_VMALLOC) += test_vmalloc.o
 obj-$(CONFIG_TEST_OVERFLOW) += test_overflow.o
 obj-$(CONFIG_TEST_RHASHTABLE) += test_rhashtable.o
 obj-$(CONFIG_TEST_SORT) += test_sort.o
+obj-$(CONFIG_TEST_ARCH_PGTABLE) += test_arch_pgtable.o
 obj-$(CONFIG_TEST_USER_COPY) += test_user_copy.o
 obj-$(CONFIG_TEST_STATIC_KEYS) += test_static_keys.o
 obj-$(CONFIG_TEST_STATIC_KEYS) += test_static_key_base.o
diff --git a/lib/test_arch_pgtable.c b/lib/test_arch_pgtable.c
new file mode 100644
index 0000000..1396664
--- /dev/null
+++ b/lib/test_arch_pgtable.c
@@ -0,0 +1,290 @@ 
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * This kernel module validates architecture page table helpers &
+ * accessors and helps in verifying their continued compliance with
+ * generic MM semantics.
+ *
+ * Copyright (C) 2019 ARM Ltd.
+ *
+ * Author: Anshuman Khandual <anshuman.khandual@arm.com>
+ */
+#define pr_fmt(fmt) "test_arch_pgtable: %s " fmt, __func__
+
+#include <linux/kernel.h>
+#include <linux/hugetlb.h>
+#include <linux/mm.h>
+#include <linux/mman.h>
+#include <linux/mm_types.h>
+#include <linux/module.h>
+#include <linux/printk.h>
+#include <linux/swap.h>
+#include <linux/swapops.h>
+#include <linux/pfn_t.h>
+#include <linux/gfp.h>
+#include <asm/pgalloc.h>
+#include <asm/pgtable.h>
+
+/*
+ * Basic operations
+ *
+ * mkold(entry)			= An old and not an young entry
+ * mkyoung(entry)		= An young and not an old entry
+ * mkdirty(entry)		= A dirty and not a clean entry
+ * mkclean(entry)		= A clean and not a dirty entry
+ * mkwrite(entry)		= An write and not an write protected entry
+ * wrprotect(entry)		= An write protected and not an write entry
+ * pxx_bad(entry)		= A mapped and non-table entry
+ * pxx_same(entry1, entry2)	= Both entries hold the exact same value
+ */
+#define VMA_TEST_FLAGS (VM_READ|VM_WRITE|VM_EXEC)
+
+static struct vm_area_struct vma;
+static struct mm_struct mm;
+static struct page *page;
+static pgprot_t prot;
+static unsigned long pfn, addr;
+
+static void pte_basic_tests(void)
+{
+	pte_t pte;
+
+	pte = mk_pte(page, prot);
+	WARN_ON(!pte_same(pte, pte));
+	WARN_ON(!pte_young(pte_mkyoung(pte)));
+	WARN_ON(!pte_dirty(pte_mkdirty(pte)));
+	WARN_ON(!pte_write(pte_mkwrite(pte)));
+	WARN_ON(pte_young(pte_mkold(pte)));
+	WARN_ON(pte_dirty(pte_mkclean(pte)));
+	WARN_ON(pte_write(pte_wrprotect(pte)));
+}
+
+#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE
+static void pmd_basic_tests(void)
+{
+	pmd_t pmd;
+
+	pmd = mk_pmd(page, prot);
+	WARN_ON(!pmd_same(pmd, pmd));
+	WARN_ON(!pmd_young(pmd_mkyoung(pmd)));
+	WARN_ON(!pmd_dirty(pmd_mkdirty(pmd)));
+	WARN_ON(!pmd_write(pmd_mkwrite(pmd)));
+	WARN_ON(pmd_young(pmd_mkold(pmd)));
+	WARN_ON(pmd_dirty(pmd_mkclean(pmd)));
+	WARN_ON(pmd_write(pmd_wrprotect(pmd)));
+	/*
+	 * A huge page does not point to next level page table
+	 * entry. Hence this must qualify as pmd_bad().
+	 */
+	WARN_ON(!pmd_bad(pmd_mkhuge(pmd)));
+}
+#else
+static void pmd_basic_tests(void) { }
+#endif
+
+#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
+static void pud_basic_tests(void)
+{
+	pud_t pud;
+
+	pud = pfn_pud(pfn, prot);
+	WARN_ON(!pud_same(pud, pud));
+	WARN_ON(!pud_young(pud_mkyoung(pud)));
+	WARN_ON(!pud_write(pud_mkwrite(pud)));
+	WARN_ON(pud_write(pud_wrprotect(pud)));
+	WARN_ON(pud_young(pud_mkold(pud)));
+
+#if !defined(__PAGETABLE_PMD_FOLDED) && !defined(__ARCH_HAS_4LEVEL_HACK)
+	/*
+	 * A huge page does not point to next level page table
+	 * entry. Hence this must qualify as pud_bad().
+	 */
+	WARN_ON(!pud_bad(pud_mkhuge(pud)));
+#endif
+}
+#else
+static void pud_basic_tests(void) { }
+#endif
+
+static void p4d_basic_tests(void)
+{
+	pte_t pte;
+	p4d_t p4d;
+
+	pte = mk_pte(page, prot);
+	p4d = (p4d_t) { (pte_val(pte)) };
+	WARN_ON(!p4d_same(p4d, p4d));
+}
+
+static void pgd_basic_tests(void)
+{
+	pte_t pte;
+	pgd_t pgd;
+
+	pte = mk_pte(page, prot);
+	pgd = (pgd_t) { (pte_val(pte)) };
+	WARN_ON(!pgd_same(pgd, pgd));
+}
+
+#if !defined(__PAGETABLE_PMD_FOLDED) && !defined(__ARCH_HAS_4LEVEL_HACK)
+static void pud_clear_tests(void)
+{
+	pud_t pud;
+
+	pud_clear(&pud);
+	WARN_ON(!pud_none(pud));
+}
+
+static void pud_populate_tests(void)
+{
+	pmd_t pmd;
+	pud_t pud;
+
+	/*
+	 * This entry points to next level page table page.
+	 * Hence this must not qualify as pud_bad().
+	 */
+	pmd_clear(&pmd);
+	pud_clear(&pud);
+	pud_populate(&mm, &pud, &pmd);
+	WARN_ON(pud_bad(pud));
+}
+#else
+static void pud_clear_tests(void) { }
+static void pud_populate_tests(void) { }
+#endif
+
+#if !defined(__PAGETABLE_PUD_FOLDED) && !defined(__ARCH_HAS_5LEVEL_HACK)
+static void p4d_clear_tests(void)
+{
+	p4d_t p4d;
+
+	p4d_clear(&p4d);
+	WARN_ON(!p4d_none(p4d));
+}
+
+static void p4d_populate_tests(void)
+{
+	pud_t pud;
+	p4d_t p4d;
+
+	/*
+	 * This entry points to next level page table page.
+	 * Hence this must not qualify as p4d_bad().
+	 */
+	pud_clear(&pud);
+	p4d_clear(&p4d);
+	p4d_populate(&mm, &p4d, &pud);
+	WARN_ON(p4d_bad(p4d));
+}
+#else
+static void p4d_clear_tests(void) { }
+static void p4d_populate_tests(void) { }
+#endif
+
+#ifndef __PAGETABLE_P4D_FOLDED
+static void pgd_clear_tests(void)
+{
+	pgd_t pgd;
+
+	pgd_clear(&pgd);
+	WARN_ON(!pgd_none(pgd));
+}
+
+static void pgd_populate_tests(void)
+{
+	pgd_t p4d;
+	pgd_t pgd;
+
+	/*
+	 * This entry points to next level page table page.
+	 * Hence this must not qualify as pgd_bad().
+	 */
+	p4d_clear(&p4d);
+	pgd_clear(&pgd);
+	pgd_populate(&mm, &pgd, &p4d);
+	WARN_ON(pgd_bad(pgd));
+}
+#else
+static void pgd_clear_tests(void) { }
+static void pgd_populate_tests(void) { }
+#endif
+
+static void pxx_clear_tests(void)
+{
+	pte_t pte;
+	pmd_t pmd;
+
+	pte_clear(NULL, 0, &pte);
+	WARN_ON(!pte_none(pte));
+
+	pmd_clear(&pmd);
+	WARN_ON(!pmd_none(pmd));
+
+	pud_clear_tests();
+	p4d_clear_tests();
+	pgd_clear_tests();
+}
+
+static void pxx_populate_tests(void)
+{
+	pmd_t pmd;
+
+	/*
+	 * This entry points to next level page table page.
+	 * Hence this must not qualify as pmd_bad().
+	 */
+	memset(page, 0, sizeof(*page));
+	pmd_clear(&pmd);
+	pmd_populate(&mm, &pmd, page);
+	WARN_ON(pmd_bad(pmd));
+
+	pud_populate_tests();
+	p4d_populate_tests();
+	pgd_populate_tests();
+}
+
+static int variables_alloc(void)
+{
+	vma_init(&vma, &mm);
+	prot = vm_get_page_prot(VMA_TEST_FLAGS);
+	page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+	if (!page) {
+		pr_err("Test struct page allocation failed\n");
+		return 1;
+	}
+	pfn = page_to_pfn(page);
+	addr = 0;
+	return 0;
+}
+
+static void variables_free(void)
+{
+	free_page((unsigned long)page_address(page));
+}
+
+static int __init arch_pgtable_tests_init(void)
+{
+	int ret;
+
+	ret = variables_alloc();
+	if (ret) {
+		pr_err("Test resource initialization failed\n");
+		return 1;
+	}
+
+	pte_basic_tests();
+	pmd_basic_tests();
+	pud_basic_tests();
+	p4d_basic_tests();
+	pgd_basic_tests();
+	pxx_clear_tests();
+	pxx_populate_tests();
+	variables_free();
+	return 0;
+}
+
+static void __exit arch_pgtable_tests_exit(void) { }
+
+module_init(arch_pgtable_tests_init);
+module_exit(arch_pgtable_tests_exit);
+MODULE_LICENSE("GPL v2");