mm: Introduce GFP_PGTABLE

Message ID	1547288798-10243-1-git-send-email-anshuman.khandual@arm.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-sh-owner@kernel.org> From: Anshuman Khandual <anshuman.khandual@arm.com> To: linux-mm@kvack.org, akpm@linux-foundation.org, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-arm-kernel@lists.infradead.org, linux-sh@vger.kernel.org, kvmarm@lists.cs.columbia.edu Cc: linux@armlinux.org.uk, catalin.marinas@arm.com, will.deacon@arm.com, mpe@ellerman.id.au, tglx@linutronix.de, mingo@redhat.com, dave.hansen@linux.intel.com, peterz@infradead.org, christoffer.dall@arm.com, marc.zyngier@arm.com, kirill@shutemov.name, rppt@linux.vnet.ibm.com, mhocko@suse.com, ard.biesheuvel@linaro.org, mark.rutland@arm.com, steve.capper@arm.com, james.morse@arm.com, robin.murphy@arm.com, aneesh.kumar@linux.ibm.com, vbabka@suse.cz, shakeelb@google.com, rientjes@google.com Subject: [PATCH] mm: Introduce GFP_PGTABLE Date: Sat, 12 Jan 2019 15:56:38 +0530 Message-Id: <1547288798-10243-1-git-send-email-anshuman.khandual@arm.com> Sender: linux-sh-owner@vger.kernel.org Precedence: bulk
Series	mm: Introduce GFP_PGTABLE \| expand mm: Introduce GFP_PGTABLE

Anshuman Khandual Jan. 12, 2019, 10:26 a.m. UTC

All architectures have been defining their own PGALLOC_GFP as (GFP_KERNEL |
__GFP_ZERO) and using it for allocating page table pages. This causes some
code duplication which can be easily avoided. GFP_KERNEL allocated and
cleared out pages (__GFP_ZERO) are required for page tables on any given
architecture. This creates a new generic GFP flag flag which can be used
for any page table page allocation. Does not cause any functional change.

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 arch/arm/include/asm/pgalloc.h               |  8 +++-----
 arch/arm/mm/mmu.c                            |  2 +-
 arch/arm64/include/asm/pgalloc.h             |  9 ++++-----
 arch/arm64/mm/mmu.c                          |  2 +-
 arch/arm64/mm/pgd.c                          |  4 ++--
 arch/powerpc/include/asm/book3s/64/pgalloc.h |  4 ++--
 arch/powerpc/include/asm/pgalloc.h           |  2 --
 arch/powerpc/mm/pgtable-frag.c               |  4 ++--
 arch/sh/mm/pgtable.c                         |  6 ++----
 arch/unicore32/include/asm/pgalloc.h         |  6 ++----
 arch/x86/kernel/espfix_64.c                  |  6 ++----
 arch/x86/mm/pgtable.c                        | 14 ++++++--------
 include/linux/gfp.h                          |  1 +
 virt/kvm/arm/mmu.c                           |  2 +-
 14 files changed, 29 insertions(+), 41 deletions(-)

Matthew Wilcox Jan. 12, 2019, 12:12 p.m. UTC | #1

On Sat, Jan 12, 2019 at 03:56:38PM +0530, Anshuman Khandual wrote:
> All architectures have been defining their own PGALLOC_GFP as (GFP_KERNEL |
> __GFP_ZERO) and using it for allocating page table pages.

Except that's not true.

> +++ b/arch/x86/mm/pgtable.c
> @@ -13,19 +13,17 @@ phys_addr_t physical_mask __ro_after_init = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
>  EXPORT_SYMBOL(physical_mask);
>  #endif
>  
> -#define PGALLOC_GFP (GFP_KERNEL_ACCOUNT | __GFP_ZERO)
> -
>  #ifdef CONFIG_HIGHPTE

...

>  pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
>  {
> -	return (pte_t *)__get_free_page(PGALLOC_GFP & ~__GFP_ACCOUNT);
> +	return (pte_t *)__get_free_page(GFP_PGTABLE & ~__GFP_ACCOUNT);
>  }

I think x86 was the only odd one out here, but you'll need to try again ...

Anshuman Khandual Jan. 12, 2019, 12:55 p.m. UTC | #2

On 01/12/2019 05:42 PM, Matthew Wilcox wrote:
> On Sat, Jan 12, 2019 at 03:56:38PM +0530, Anshuman Khandual wrote:
>> All architectures have been defining their own PGALLOC_GFP as (GFP_KERNEL |
>> __GFP_ZERO) and using it for allocating page table pages.
> 
> Except that's not true.
> 
>> +++ b/arch/x86/mm/pgtable.c
>> @@ -13,19 +13,17 @@ phys_addr_t physical_mask __ro_after_init = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
>>  EXPORT_SYMBOL(physical_mask);
>>  #endif
>>  
>> -#define PGALLOC_GFP (GFP_KERNEL_ACCOUNT | __GFP_ZERO)
>> -
>>  #ifdef CONFIG_HIGHPTE
> 
> ...
> 
>>  pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
>>  {
>> -	return (pte_t *)__get_free_page(PGALLOC_GFP & ~__GFP_ACCOUNT);
>> +	return (pte_t *)__get_free_page(GFP_PGTABLE & ~__GFP_ACCOUNT);
>>  }
> 
> I think x86 was the only odd one out here, but you'll need to try again ...

IIUC the user page table pages need __GFP_ACCOUNT not the kernel ones. Hence
in the above function it clears out __GFP_ACCOUNT for kernel page table page
allocations but where as by default it has got __GFP_ACCOUNT which would be
used for user page tables. Instead we can make X86 user allocations add
__GFP_ACCOUNT (like other archs) to generic GFP_PGTABLE when ever required.

Christophe Leroy Jan. 12, 2019, 1:49 p.m. UTC | #3

Le 12/01/2019 à 13:12, Matthew Wilcox a écrit :
> On Sat, Jan 12, 2019 at 03:56:38PM +0530, Anshuman Khandual wrote:
>> All architectures have been defining their own PGALLOC_GFP as (GFP_KERNEL |
>> __GFP_ZERO) and using it for allocating page table pages.
> 
> Except that's not true.
> 
>> +++ b/arch/x86/mm/pgtable.c
>> @@ -13,19 +13,17 @@ phys_addr_t physical_mask __ro_after_init = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
>>   EXPORT_SYMBOL(physical_mask);
>>   #endif
>>   
>> -#define PGALLOC_GFP (GFP_KERNEL_ACCOUNT | __GFP_ZERO)
>> -
>>   #ifdef CONFIG_HIGHPTE
> 
> ...
> 
>>   pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
>>   {
>> -	return (pte_t *)__get_free_page(PGALLOC_GFP & ~__GFP_ACCOUNT);
>> +	return (pte_t *)__get_free_page(GFP_PGTABLE & ~__GFP_ACCOUNT);
>>   }

As far as I can see,

#define GFP_KERNEL_ACCOUNT (GFP_KERNEL | __GFP_ACCOUNT)

So what's the difference between:

(GFP_KERNEL_ACCOUNT | __GFP_ZERO) & ~__GFP_ACCOUNT

and

(GFP_KERNEL | __GFP_ZERO) & ~__GFP_ACCOUNT

Christophe

> 
> I think x86 was the only odd one out here, but you'll need to try again ...
>

Matthew Wilcox Jan. 12, 2019, 3:49 p.m. UTC | #4

On Sat, Jan 12, 2019 at 02:49:29PM +0100, Christophe Leroy wrote:
> As far as I can see,
> 
> #define GFP_KERNEL_ACCOUNT (GFP_KERNEL | __GFP_ACCOUNT)
> 
> So what's the difference between:
> 
> (GFP_KERNEL_ACCOUNT | __GFP_ZERO) & ~__GFP_ACCOUNT
> 
> and
> 
> (GFP_KERNEL | __GFP_ZERO) & ~__GFP_ACCOUNT

Nothing.  But there's a huge difference in the other parts of that same
file where GFP_ACCOUNT is _not_ used.

I think this unification is too small to bother with.  Something I've
had on my todo list for some time and have not done anything about
is to actually unify all of the architecture pte/pmd/... allocations.
There are tricks some architectures use that others would benefit from.

Shakeel Butt Jan. 12, 2019, 4:48 p.m. UTC | #5

On Sat, Jan 12, 2019 at 2:27 AM Anshuman Khandual
<anshuman.khandual@arm.com> wrote:
>
> All architectures have been defining their own PGALLOC_GFP as (GFP_KERNEL |
> __GFP_ZERO) and using it for allocating page table pages. This causes some
> code duplication which can be easily avoided. GFP_KERNEL allocated and
> cleared out pages (__GFP_ZERO) are required for page tables on any given
> architecture. This creates a new generic GFP flag flag which can be used
> for any page table page allocation. Does not cause any functional change.
>
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
>  arch/arm/include/asm/pgalloc.h               |  8 +++-----
>  arch/arm/mm/mmu.c                            |  2 +-
>  arch/arm64/include/asm/pgalloc.h             |  9 ++++-----
>  arch/arm64/mm/mmu.c                          |  2 +-
>  arch/arm64/mm/pgd.c                          |  4 ++--
>  arch/powerpc/include/asm/book3s/64/pgalloc.h |  4 ++--
>  arch/powerpc/include/asm/pgalloc.h           |  2 --
>  arch/powerpc/mm/pgtable-frag.c               |  4 ++--
>  arch/sh/mm/pgtable.c                         |  6 ++----
>  arch/unicore32/include/asm/pgalloc.h         |  6 ++----
>  arch/x86/kernel/espfix_64.c                  |  6 ++----
>  arch/x86/mm/pgtable.c                        | 14 ++++++--------
>  include/linux/gfp.h                          |  1 +
>  virt/kvm/arm/mmu.c                           |  2 +-
>  14 files changed, 29 insertions(+), 41 deletions(-)
>
> diff --git a/arch/arm/include/asm/pgalloc.h b/arch/arm/include/asm/pgalloc.h
> index 17ab72f..72be6f5 100644
> --- a/arch/arm/include/asm/pgalloc.h
> +++ b/arch/arm/include/asm/pgalloc.h
> @@ -57,8 +57,6 @@ static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
>  extern pgd_t *pgd_alloc(struct mm_struct *mm);
>  extern void pgd_free(struct mm_struct *mm, pgd_t *pgd);
>
> -#define PGALLOC_GFP    (GFP_KERNEL | __GFP_ZERO)
> -
>  static inline void clean_pte_table(pte_t *pte)
>  {
>         clean_dcache_area(pte + PTE_HWTABLE_PTRS, PTE_HWTABLE_SIZE);
> @@ -85,7 +83,7 @@ pte_alloc_one_kernel(struct mm_struct *mm)
>  {
>         pte_t *pte;
>
> -       pte = (pte_t *)__get_free_page(PGALLOC_GFP);
> +       pte = (pte_t *)__get_free_page(GFP_PGTABLE);
>         if (pte)
>                 clean_pte_table(pte);
>
> @@ -98,9 +96,9 @@ pte_alloc_one(struct mm_struct *mm)
>         struct page *pte;
>
>  #ifdef CONFIG_HIGHPTE
> -       pte = alloc_pages(PGALLOC_GFP | __GFP_HIGHMEM, 0);
> +       pte = alloc_pages(GFP_PGTABLE | __GFP_HIGHMEM, 0);
>  #else
> -       pte = alloc_pages(PGALLOC_GFP, 0);
> +       pte = alloc_pages(GFP_PGTABLE, 0);
>  #endif
>         if (!pte)
>                 return NULL;
> diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
> index f5cc1cc..6d47784 100644
> --- a/arch/arm/mm/mmu.c
> +++ b/arch/arm/mm/mmu.c
> @@ -733,7 +733,7 @@ static void __init *early_alloc(unsigned long sz)
>
>  static void *__init late_alloc(unsigned long sz)
>  {
> -       void *ptr = (void *)__get_free_pages(PGALLOC_GFP, get_order(sz));
> +       void *ptr = (void *)__get_free_pages(GFP_PGTABLE, get_order(sz));
>
>         if (!ptr || !pgtable_page_ctor(virt_to_page(ptr)))
>                 BUG();
> diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
> index 52fa47c..d5c75bf 100644
> --- a/arch/arm64/include/asm/pgalloc.h
> +++ b/arch/arm64/include/asm/pgalloc.h
> @@ -26,14 +26,13 @@
>
>  #define check_pgt_cache()              do { } while (0)
>
> -#define PGALLOC_GFP    (GFP_KERNEL | __GFP_ZERO)
>  #define PGD_SIZE       (PTRS_PER_PGD * sizeof(pgd_t))
>
>  #if CONFIG_PGTABLE_LEVELS > 2
>
>  static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
>  {
> -       return (pmd_t *)__get_free_page(PGALLOC_GFP);
> +       return (pmd_t *)__get_free_page(GFP_PGTABLE);
>  }
>
>  static inline void pmd_free(struct mm_struct *mm, pmd_t *pmdp)
> @@ -62,7 +61,7 @@ static inline void __pud_populate(pud_t *pudp, phys_addr_t pmdp, pudval_t prot)
>
>  static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
>  {
> -       return (pud_t *)__get_free_page(PGALLOC_GFP);
> +       return (pud_t *)__get_free_page(GFP_PGTABLE);
>  }
>
>  static inline void pud_free(struct mm_struct *mm, pud_t *pudp)
> @@ -93,7 +92,7 @@ extern void pgd_free(struct mm_struct *mm, pgd_t *pgdp);
>  static inline pte_t *
>  pte_alloc_one_kernel(struct mm_struct *mm)
>  {
> -       return (pte_t *)__get_free_page(PGALLOC_GFP);
> +       return (pte_t *)__get_free_page(GFP_PGTABLE);
>  }
>
>  static inline pgtable_t
> @@ -101,7 +100,7 @@ pte_alloc_one(struct mm_struct *mm)
>  {
>         struct page *pte;
>
> -       pte = alloc_pages(PGALLOC_GFP, 0);
> +       pte = alloc_pages(GFP_PGTABLE, 0);
>         if (!pte)
>                 return NULL;
>         if (!pgtable_page_ctor(pte)) {
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index b6f5aa5..07b1c0f 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -372,7 +372,7 @@ static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
>
>  static phys_addr_t pgd_pgtable_alloc(void)
>  {
> -       void *ptr = (void *)__get_free_page(PGALLOC_GFP);
> +       void *ptr = (void *)__get_free_page(GFP_PGTABLE);
>         if (!ptr || !pgtable_page_ctor(virt_to_page(ptr)))
>                 BUG();
>
> diff --git a/arch/arm64/mm/pgd.c b/arch/arm64/mm/pgd.c
> index 289f911..5b28e2b 100644
> --- a/arch/arm64/mm/pgd.c
> +++ b/arch/arm64/mm/pgd.c
> @@ -31,9 +31,9 @@ static struct kmem_cache *pgd_cache __ro_after_init;
>  pgd_t *pgd_alloc(struct mm_struct *mm)
>  {
>         if (PGD_SIZE == PAGE_SIZE)
> -               return (pgd_t *)__get_free_page(PGALLOC_GFP);
> +               return (pgd_t *)__get_free_page(GFP_PGTABLE);
>         else
> -               return kmem_cache_alloc(pgd_cache, PGALLOC_GFP);
> +               return kmem_cache_alloc(pgd_cache, GFP_PGTABLE);
>  }
>
>  void pgd_free(struct mm_struct *mm, pgd_t *pgd)
> diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc.h b/arch/powerpc/include/asm/book3s/64/pgalloc.h
> index 9c11732..8a7235e 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgalloc.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgalloc.h
> @@ -52,10 +52,10 @@ void pte_frag_destroy(void *pte_frag);
>  static inline pgd_t *radix__pgd_alloc(struct mm_struct *mm)
>  {
>  #ifdef CONFIG_PPC_64K_PAGES
> -       return (pgd_t *)__get_free_page(pgtable_gfp_flags(mm, PGALLOC_GFP));
> +       return (pgd_t *)__get_free_page(pgtable_gfp_flags(mm, GFP_PGTABLE));
>  #else
>         struct page *page;
> -       page = alloc_pages(pgtable_gfp_flags(mm, PGALLOC_GFP | __GFP_RETRY_MAYFAIL),
> +       page = alloc_pages(pgtable_gfp_flags(mm, GFP_PGTABLE | __GFP_RETRY_MAYFAIL),
>                                 4);
>         if (!page)
>                 return NULL;
> diff --git a/arch/powerpc/include/asm/pgalloc.h b/arch/powerpc/include/asm/pgalloc.h
> index e11f030..3b11e8b 100644
> --- a/arch/powerpc/include/asm/pgalloc.h
> +++ b/arch/powerpc/include/asm/pgalloc.h
> @@ -18,8 +18,6 @@ static inline gfp_t pgtable_gfp_flags(struct mm_struct *mm, gfp_t gfp)
>  }
>  #endif /* MODULE */
>
> -#define PGALLOC_GFP (GFP_KERNEL | __GFP_ZERO)
> -
>  #ifdef CONFIG_PPC_BOOK3S
>  #include <asm/book3s/pgalloc.h>
>  #else
> diff --git a/arch/powerpc/mm/pgtable-frag.c b/arch/powerpc/mm/pgtable-frag.c
> index a7b0521..211aaa7 100644
> --- a/arch/powerpc/mm/pgtable-frag.c
> +++ b/arch/powerpc/mm/pgtable-frag.c
> @@ -58,7 +58,7 @@ static pte_t *__alloc_for_ptecache(struct mm_struct *mm, int kernel)
>         struct page *page;
>
>         if (!kernel) {
> -               page = alloc_page(PGALLOC_GFP | __GFP_ACCOUNT);
> +               page = alloc_page(GFP_PGTABLE | __GFP_ACCOUNT);
>                 if (!page)
>                         return NULL;
>                 if (!pgtable_page_ctor(page)) {
> @@ -66,7 +66,7 @@ static pte_t *__alloc_for_ptecache(struct mm_struct *mm, int kernel)
>                         return NULL;
>                 }
>         } else {
> -               page = alloc_page(PGALLOC_GFP);
> +               page = alloc_page(GFP_PGTABLE);
>                 if (!page)
>                         return NULL;
>         }
> diff --git a/arch/sh/mm/pgtable.c b/arch/sh/mm/pgtable.c
> index 5c8f924..324732dc5 100644
> --- a/arch/sh/mm/pgtable.c
> +++ b/arch/sh/mm/pgtable.c
> @@ -2,8 +2,6 @@
>  #include <linux/mm.h>
>  #include <linux/slab.h>
>
> -#define PGALLOC_GFP GFP_KERNEL | __GFP_ZERO
> -
>  static struct kmem_cache *pgd_cachep;
>  #if PAGETABLE_LEVELS > 2
>  static struct kmem_cache *pmd_cachep;
> @@ -32,7 +30,7 @@ void pgtable_cache_init(void)
>
>  pgd_t *pgd_alloc(struct mm_struct *mm)
>  {
> -       return kmem_cache_alloc(pgd_cachep, PGALLOC_GFP);
> +       return kmem_cache_alloc(pgd_cachep, GFP_PGTABLE);
>  }
>
>  void pgd_free(struct mm_struct *mm, pgd_t *pgd)
> @@ -48,7 +46,7 @@ void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
>
>  pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
>  {
> -       return kmem_cache_alloc(pmd_cachep, PGALLOC_GFP);
> +       return kmem_cache_alloc(pmd_cachep, GFP_PGTABLE);
>  }
>
>  void pmd_free(struct mm_struct *mm, pmd_t *pmd)
> diff --git a/arch/unicore32/include/asm/pgalloc.h b/arch/unicore32/include/asm/pgalloc.h
> index 7cceabe..a3506e5 100644
> --- a/arch/unicore32/include/asm/pgalloc.h
> +++ b/arch/unicore32/include/asm/pgalloc.h
> @@ -28,8 +28,6 @@ extern void free_pgd_slow(struct mm_struct *mm, pgd_t *pgd);
>  #define pgd_alloc(mm)                  get_pgd_slow(mm)
>  #define pgd_free(mm, pgd)              free_pgd_slow(mm, pgd)
>
> -#define PGALLOC_GFP    (GFP_KERNEL | __GFP_ZERO)
> -
>  /*
>   * Allocate one PTE table.
>   */
> @@ -38,7 +36,7 @@ pte_alloc_one_kernel(struct mm_struct *mm)
>  {
>         pte_t *pte;
>
> -       pte = (pte_t *)__get_free_page(PGALLOC_GFP);
> +       pte = (pte_t *)__get_free_page(GFP_PGTABLE);
>         if (pte)
>                 clean_dcache_area(pte, PTRS_PER_PTE * sizeof(pte_t));
>
> @@ -50,7 +48,7 @@ pte_alloc_one(struct mm_struct *mm)
>  {
>         struct page *pte;
>
> -       pte = alloc_pages(PGALLOC_GFP, 0);
> +       pte = alloc_pages(GFP_PGTABLE, 0);
>         if (!pte)
>                 return NULL;
>         if (!PageHighMem(pte)) {
> diff --git a/arch/x86/kernel/espfix_64.c b/arch/x86/kernel/espfix_64.c
> index aebd0d5..dae28cc 100644
> --- a/arch/x86/kernel/espfix_64.c
> +++ b/arch/x86/kernel/espfix_64.c
> @@ -57,8 +57,6 @@
>  # error "Need more virtual address space for the ESPFIX hack"
>  #endif
>
> -#define PGALLOC_GFP (GFP_KERNEL | __GFP_ZERO)
> -
>  /* This contains the *bottom* address of the espfix stack */
>  DEFINE_PER_CPU_READ_MOSTLY(unsigned long, espfix_stack);
>  DEFINE_PER_CPU_READ_MOSTLY(unsigned long, espfix_waddr);
> @@ -172,7 +170,7 @@ void init_espfix_ap(int cpu)
>         pud_p = &espfix_pud_page[pud_index(addr)];
>         pud = *pud_p;
>         if (!pud_present(pud)) {
> -               struct page *page = alloc_pages_node(node, PGALLOC_GFP, 0);
> +               struct page *page = alloc_pages_node(node, GFP_PGTABLE, 0);
>
>                 pmd_p = (pmd_t *)page_address(page);
>                 pud = __pud(__pa(pmd_p) | (PGTABLE_PROT & ptemask));
> @@ -184,7 +182,7 @@ void init_espfix_ap(int cpu)
>         pmd_p = pmd_offset(&pud, addr);
>         pmd = *pmd_p;
>         if (!pmd_present(pmd)) {
> -               struct page *page = alloc_pages_node(node, PGALLOC_GFP, 0);
> +               struct page *page = alloc_pages_node(node, GFP_PGTABLE, 0);
>
>                 pte_p = (pte_t *)page_address(page);
>                 pmd = __pmd(__pa(pte_p) | (PGTABLE_PROT & ptemask));
> diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
> index 7bd0170..d608b03 100644
> --- a/arch/x86/mm/pgtable.c
> +++ b/arch/x86/mm/pgtable.c
> @@ -13,19 +13,17 @@ phys_addr_t physical_mask __ro_after_init = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
>  EXPORT_SYMBOL(physical_mask);
>  #endif
>
> -#define PGALLOC_GFP (GFP_KERNEL_ACCOUNT | __GFP_ZERO)
> -

You have silently dropped __GFP_ACCOUNT from all the allocations in this file.

BTW why other archs not using __GFP_ACCOUNT for the user page tables?

>  #ifdef CONFIG_HIGHPTE
>  #define PGALLOC_USER_GFP __GFP_HIGHMEM
>  #else
>  #define PGALLOC_USER_GFP 0
>  #endif
>
> -gfp_t __userpte_alloc_gfp = PGALLOC_GFP | PGALLOC_USER_GFP;
> +gfp_t __userpte_alloc_gfp = GFP_PGTABLE | PGALLOC_USER_GFP;
>
>  pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
>  {
> -       return (pte_t *)__get_free_page(PGALLOC_GFP & ~__GFP_ACCOUNT);
> +       return (pte_t *)__get_free_page(GFP_PGTABLE & ~__GFP_ACCOUNT);
>  }
>
>  pgtable_t pte_alloc_one(struct mm_struct *mm)
> @@ -235,7 +233,7 @@ static int preallocate_pmds(struct mm_struct *mm, pmd_t *pmds[], int count)
>  {
>         int i;
>         bool failed = false;
> -       gfp_t gfp = PGALLOC_GFP;
> +       gfp_t gfp = GFP_PGTABLE;
>
>         if (mm == &init_mm)
>                 gfp &= ~__GFP_ACCOUNT;
> @@ -401,14 +399,14 @@ static inline pgd_t *_pgd_alloc(void)
>          * We allocate one page for pgd.
>          */
>         if (!SHARED_KERNEL_PMD)
> -               return (pgd_t *)__get_free_pages(PGALLOC_GFP,
> +               return (pgd_t *)__get_free_pages(GFP_PGTABLE,
>                                                  PGD_ALLOCATION_ORDER);
>
>         /*
>          * Now PAE kernel is not running as a Xen domain. We can allocate
>          * a 32-byte slab for pgd to save memory space.
>          */
> -       return kmem_cache_alloc(pgd_cache, PGALLOC_GFP);
> +       return kmem_cache_alloc(pgd_cache, GFP_PGTABLE);
>  }
>
>  static inline void _pgd_free(pgd_t *pgd)
> @@ -422,7 +420,7 @@ static inline void _pgd_free(pgd_t *pgd)
>
>  static inline pgd_t *_pgd_alloc(void)
>  {
> -       return (pgd_t *)__get_free_pages(PGALLOC_GFP, PGD_ALLOCATION_ORDER);
> +       return (pgd_t *)__get_free_pages(GFP_PGTABLE, PGD_ALLOCATION_ORDER);
>  }
>
>  static inline void _pgd_free(pgd_t *pgd)
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 5f5e25f..a8414be 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -300,6 +300,7 @@ struct vm_area_struct;
>  #define GFP_TRANSHUGE_LIGHT    ((GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
>                          __GFP_NOMEMALLOC | __GFP_NOWARN) & ~__GFP_RECLAIM)
>  #define GFP_TRANSHUGE  (GFP_TRANSHUGE_LIGHT | __GFP_DIRECT_RECLAIM)
> +#define GFP_PGTABLE    (GFP_KERNEL | __GFP_ZERO)
>
>  /* Convert GFP flags to their corresponding migrate type */
>  #define GFP_MOVABLE_MASK (__GFP_RECLAIMABLE|__GFP_MOVABLE)
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index fbdf3ac..f60a5b8 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -143,7 +143,7 @@ static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
>         if (cache->nobjs >= min)
>                 return 0;
>         while (cache->nobjs < max) {
> -               page = (void *)__get_free_page(PGALLOC_GFP);
> +               page = (void *)__get_free_page(GFP_PGTABLE);
>                 if (!page)
>                         return -ENOMEM;
>                 cache->objects[cache->nobjs++] = page;
> --
> 2.7.4
>

Shakeel Butt Jan. 12, 2019, 4:50 p.m. UTC | #6

On Sat, Jan 12, 2019 at 7:50 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Sat, Jan 12, 2019 at 02:49:29PM +0100, Christophe Leroy wrote:
> > As far as I can see,
> >
> > #define GFP_KERNEL_ACCOUNT (GFP_KERNEL | __GFP_ACCOUNT)
> >
> > So what's the difference between:
> >
> > (GFP_KERNEL_ACCOUNT | __GFP_ZERO) & ~__GFP_ACCOUNT
> >
> > and
> >
> > (GFP_KERNEL | __GFP_ZERO) & ~__GFP_ACCOUNT
>
> Nothing.  But there's a huge difference in the other parts of that same
> file where GFP_ACCOUNT is _not_ used.
>
> I think this unification is too small to bother with.  Something I've
> had on my todo list for some time and have not done anything about
> is to actually unify all of the architecture pte/pmd/... allocations.
> There are tricks some architectures use that others would benefit from.

Can you explain a bit more on this? If this is too low priority on
your todo list then maybe me or someone else can pick that up.

Shakeel

Michal Hocko Jan. 13, 2019, 5:35 p.m. UTC | #7

On Sat 12-01-19 15:56:38, Anshuman Khandual wrote:
> All architectures have been defining their own PGALLOC_GFP as (GFP_KERNEL |
> __GFP_ZERO) and using it for allocating page table pages. This causes some
> code duplication which can be easily avoided. GFP_KERNEL allocated and
> cleared out pages (__GFP_ZERO) are required for page tables on any given
> architecture. This creates a new generic GFP flag flag which can be used
> for any page table page allocation. Does not cause any functional change.

I agree that some unification is due but GFP_PGTABLE is not something to
expose in generic gfp.h IMHO. It just risks an abuse. I would be looking
at providing asm-generic implementation and reuse it to remove the code
duplication. But I haven't tried that to know that it will work out due
to small/subtle differences between arches.

> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>

Anshuman Khandual Jan. 14, 2019, 4 a.m. UTC | #8

On 01/13/2019 11:05 PM, Michal Hocko wrote:
> On Sat 12-01-19 15:56:38, Anshuman Khandual wrote:
>> All architectures have been defining their own PGALLOC_GFP as (GFP_KERNEL |
>> __GFP_ZERO) and using it for allocating page table pages. This causes some
>> code duplication which can be easily avoided. GFP_KERNEL allocated and
>> cleared out pages (__GFP_ZERO) are required for page tables on any given
>> architecture. This creates a new generic GFP flag flag which can be used
>> for any page table page allocation. Does not cause any functional change.
> 
> I agree that some unification is due but GFP_PGTABLE is not something to
> expose in generic gfp.h IMHO. It just risks an abuse. I would be looking

Why would you think that it risks an abuse ? It does not create new semantics
of allocation in the buddy. Its just uses existing GFP_KERNEL allocation which
is then getting zeroed out. The risks (if any) is exactly same as GFP_KERNEL.

> at providing asm-generic implementation and reuse it to remove the code

Does that mean GFP_PGTABLE can be created but not in gfp.h but in some other
memory related header file ?

> duplication. But I haven't tried that to know that it will work out due
> to small/subtle differences between arches.

IIUC from the allocation perspective GFP_ACCOUNT is the only thing which gets
added with GFP_PGTABLE for user page table for memcg accounting purpose. There
does not seem to be any other differences unless I am missing something.

Anshuman Khandual Jan. 14, 2019, 4:14 a.m. UTC | #9

On 01/12/2019 10:18 PM, Shakeel Butt wrote:
>> --- a/arch/x86/kernel/espfix_64.c
>> +++ b/arch/x86/kernel/espfix_64.c
>> @@ -57,8 +57,6 @@
>>  # error "Need more virtual address space for the ESPFIX hack"
>>  #endif
>>
>> -#define PGALLOC_GFP (GFP_KERNEL | __GFP_ZERO)
>> -
>>  /* This contains the *bottom* address of the espfix stack */
>>  DEFINE_PER_CPU_READ_MOSTLY(unsigned long, espfix_stack);
>>  DEFINE_PER_CPU_READ_MOSTLY(unsigned long, espfix_waddr);
>> @@ -172,7 +170,7 @@ void init_espfix_ap(int cpu)
>>         pud_p = &espfix_pud_page[pud_index(addr)];
>>         pud = *pud_p;
>>         if (!pud_present(pud)) {
>> -               struct page *page = alloc_pages_node(node, PGALLOC_GFP, 0);
>> +               struct page *page = alloc_pages_node(node, GFP_PGTABLE, 0);
>>
>>                 pmd_p = (pmd_t *)page_address(page);
>>                 pud = __pud(__pa(pmd_p) | (PGTABLE_PROT & ptemask));
>> @@ -184,7 +182,7 @@ void init_espfix_ap(int cpu)
>>         pmd_p = pmd_offset(&pud, addr);
>>         pmd = *pmd_p;
>>         if (!pmd_present(pmd)) {
>> -               struct page *page = alloc_pages_node(node, PGALLOC_GFP, 0);
>> +               struct page *page = alloc_pages_node(node, GFP_PGTABLE, 0);
>>
>>                 pte_p = (pte_t *)page_address(page);
>>                 pmd = __pmd(__pa(pte_p) | (PGTABLE_PROT & ptemask));
>> diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
>> index 7bd0170..d608b03 100644
>> --- a/arch/x86/mm/pgtable.c
>> +++ b/arch/x86/mm/pgtable.c
>> @@ -13,19 +13,17 @@ phys_addr_t physical_mask __ro_after_init = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
>>  EXPORT_SYMBOL(physical_mask);
>>  #endif
>>
>> -#define PGALLOC_GFP (GFP_KERNEL_ACCOUNT | __GFP_ZERO)
>> -
> You have silently dropped __GFP_ACCOUNT from all the allocations in this file.

Right, they need to be added back explicitly after GFP_PGTABLE. Matthew had
pointed this earlier. Will fix it next time around.

> 
> BTW why other archs not using __GFP_ACCOUNT for the user page tables?
> 

Some archs do and some dont. User page tables pages should use __GFP_ACCOUNT
for allocation. I am working on fixing it for arm64.

Anshuman Khandual Jan. 14, 2019, 4:28 a.m. UTC | #10

On 01/12/2019 09:19 PM, Matthew Wilcox wrote:
> On Sat, Jan 12, 2019 at 02:49:29PM +0100, Christophe Leroy wrote:
>> As far as I can see,
>>
>> #define GFP_KERNEL_ACCOUNT (GFP_KERNEL | __GFP_ACCOUNT)
>>
>> So what's the difference between:
>>
>> (GFP_KERNEL_ACCOUNT | __GFP_ZERO) & ~__GFP_ACCOUNT
>>
>> and
>>
>> (GFP_KERNEL | __GFP_ZERO) & ~__GFP_ACCOUNT
> 
> Nothing.  But there's a huge difference in the other parts of that same
> file where GFP_ACCOUNT is _not_ used.
> 
> I think this unification is too small to bother with.  Something I've
> had on my todo list for some time and have not done anything about
> is to actually unify all of the architecture pte/pmd/... allocations.
> There are tricks some architectures use that others would benefit from.

Sure. Could you please elaborate on this ?

Invariably all kernel pgtable page allocations should use GFP_PGTABLE and
all user page table allocation should use (GFP_PGTABLE | __GFP_ACCOUNT).
Ideally there should be default generic functions user_pgtable_gfp() or
kernel_pgtable_gfp() returning these values. Overrides can be provided if
an arch still wants some more control.

Michal Hocko Jan. 14, 2019, 7:01 a.m. UTC | #11

On Mon 14-01-19 09:30:55, Anshuman Khandual wrote:
> 
> 
> On 01/13/2019 11:05 PM, Michal Hocko wrote:
> > On Sat 12-01-19 15:56:38, Anshuman Khandual wrote:
> >> All architectures have been defining their own PGALLOC_GFP as (GFP_KERNEL |
> >> __GFP_ZERO) and using it for allocating page table pages. This causes some
> >> code duplication which can be easily avoided. GFP_KERNEL allocated and
> >> cleared out pages (__GFP_ZERO) are required for page tables on any given
> >> architecture. This creates a new generic GFP flag flag which can be used
> >> for any page table page allocation. Does not cause any functional change.
> > 
> > I agree that some unification is due but GFP_PGTABLE is not something to
> > expose in generic gfp.h IMHO. It just risks an abuse. I would be looking
> 
> Why would you think that it risks an abuse ? It does not create new semantics
> of allocation in the buddy. Its just uses existing GFP_KERNEL allocation which
> is then getting zeroed out. The risks (if any) is exactly same as GFP_KERNEL.

Beucase my experience just tells me that people tend to use whatever
they find and name fits what they think they need.

> > at providing asm-generic implementation and reuse it to remove the code
> 
> Does that mean GFP_PGTABLE can be created but not in gfp.h but in some other
> memory related header file ?

I would just keep it close to its users. If that is a single arch
generic place then only better. But I suspect some arches have special
requirements.

> > duplication. But I haven't tried that to know that it will work out due
> > to small/subtle differences between arches.
> 
> IIUC from the allocation perspective GFP_ACCOUNT is the only thing which gets
> added with GFP_PGTABLE for user page table for memcg accounting purpose. There
> does not seem to be any other differences unless I am missing something.

It's been some time since I've checked the last time. Some arches were
using GPF_REPEAT (__GFP_RETRY_MAYFAIL) back then. I have removed most of
those but some were doing a higher order allocations so they probably
have stayed.

Anshuman Khandual Jan. 15, 2019, 2:11 p.m. UTC | #12

On 01/14/2019 12:31 PM, Michal Hocko wrote:
> On Mon 14-01-19 09:30:55, Anshuman Khandual wrote:
>>
>>
>> On 01/13/2019 11:05 PM, Michal Hocko wrote:
>>> On Sat 12-01-19 15:56:38, Anshuman Khandual wrote:
>>>> All architectures have been defining their own PGALLOC_GFP as (GFP_KERNEL |
>>>> __GFP_ZERO) and using it for allocating page table pages. This causes some
>>>> code duplication which can be easily avoided. GFP_KERNEL allocated and
>>>> cleared out pages (__GFP_ZERO) are required for page tables on any given
>>>> architecture. This creates a new generic GFP flag flag which can be used
>>>> for any page table page allocation. Does not cause any functional change.
>>>
>>> I agree that some unification is due but GFP_PGTABLE is not something to
>>> expose in generic gfp.h IMHO. It just risks an abuse. I would be looking
>>
>> Why would you think that it risks an abuse ? It does not create new semantics
>> of allocation in the buddy. Its just uses existing GFP_KERNEL allocation which
>> is then getting zeroed out. The risks (if any) is exactly same as GFP_KERNEL.
> 
> Beucase my experience just tells me that people tend to use whatever
> they find and name fits what they think they need.
> 
>>> at providing asm-generic implementation and reuse it to remove the code
>>
>> Does that mean GFP_PGTABLE can be created but not in gfp.h but in some other
>> memory related header file ?
> 
> I would just keep it close to its users. If that is a single arch
> generic place then only better. But I suspect some arches have special
> requirements.

We can move the definition into include/asm-generic/pgtable.h which can be
used by all archs. If there any special requirements those can be added on
this generic and common minimum allocation flag. The minimum required flag
should not be duplicated every where.

> 
>>> duplication. But I haven't tried that to know that it will work out due
>>> to small/subtle differences between arches.
>>
>> IIUC from the allocation perspective GFP_ACCOUNT is the only thing which gets
>> added with GFP_PGTABLE for user page table for memcg accounting purpose. There
>> does not seem to be any other differences unless I am missing something.
> 
> It's been some time since I've checked the last time. Some arches were
> using GPF_REPEAT (__GFP_RETRY_MAYFAIL) back then. I have removed most of
> those but some were doing a higher order allocations so they probably
> have stayed.

A simple grep shows that still there are some places which use the flag
__GFP_RETRY_MAYFAIL. But that can be added on GFP_PGTABLE for these archs.

arch/nds32/include/asm/pgalloc.h:           (pte_t *) __get_free_page(GFP_KERNEL | __GFP_RETRY_MAYFAIL |
arch/nds32/include/asm/pgalloc.h:       pte = alloc_pages(GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_ZERO, 0);
arch/powerpc/include/asm/book3s/64/pgalloc.h:   page = alloc_pages(pgtable_gfp_flags(mm, PGALLOC_GFP | __GFP_RETRY_MAYFAIL),
arch/powerpc/kvm/book3s_64_mmu_hv.c:            hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_RETRY_MAYFAIL
arch/riscv/include/asm/pgalloc.h:               GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_ZERO);
arch/riscv/include/asm/pgalloc.h:               GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_ZERO);
arch/riscv/include/asm/pgalloc.h:       pte = alloc_page(GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_ZERO);
arch/sparc/kernel/mdesc.c:      base = kmalloc(handle_size + 15, GFP_KERNEL | __GFP_RETRY_MAYFAIL);

mm: Introduce GFP_PGTABLE

Commit Message

Comments

Patch