diff mbox series

[01/13] arm64: mm: Add p?d_large() definitions

Message ID 20190215170235.23360-2-steven.price@arm.com (mailing list archive)
State New, archived
Headers show
Series Convert x86 & arm64 to use generic page walk | expand

Commit Message

Steven Price Feb. 15, 2019, 5:02 p.m. UTC
From: James Morse <james.morse@arm.com>

Exposing the pud/pgd levels of the page tables to walk_page_range() means
we may come across the exotic large mappings that come with large areas
of contiguous memory (such as the kernel's linear map).

Expose p?d_large() from each architecture to detect these large mappings.

arm64 already has these macros defined, but with a different name.
p?d_large() is used by s390, sparc and x86. Only arm/arm64 use p?d_sect().
Add a macro to allow both names.

By not providing a pgd_large(), we get the generic version that always
returns 0.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/include/asm/pgtable.h | 2 ++
 1 file changed, 2 insertions(+)

Comments

Mark Rutland Feb. 18, 2019, 11:16 a.m. UTC | #1
On Fri, Feb 15, 2019 at 05:02:22PM +0000, Steven Price wrote:
> From: James Morse <james.morse@arm.com>
>
> Exposing the pud/pgd levels of the page tables to walk_page_range() means
> we may come across the exotic large mappings that come with large areas
> of contiguous memory (such as the kernel's linear map).
>
> Expose p?d_large() from each architecture to detect these large mappings.
>
> arm64 already has these macros defined, but with a different name.
> p?d_large() is used by s390, sparc and x86. Only arm/arm64 use p?d_sect().
> Add a macro to allow both names.
>
> By not providing a pgd_large(), we get the generic version that always
> returns 0.

This last sentence isn't true until a subsequent patch, so it's probably
worth dropping it to avoid confusion.

Thanks,
Mark.

>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
>  arch/arm64/include/asm/pgtable.h | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index de70c1eabf33..09d308921625 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -428,6 +428,7 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
>  				 PMD_TYPE_TABLE)
>  #define pmd_sect(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
>  				 PMD_TYPE_SECT)
> +#define pmd_large(x)		pmd_sect(x)
>  
>  #if defined(CONFIG_ARM64_64K_PAGES) || CONFIG_PGTABLE_LEVELS < 3
>  #define pud_sect(pud)		(0)
> @@ -435,6 +436,7 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
>  #else
>  #define pud_sect(pud)		((pud_val(pud) & PUD_TYPE_MASK) == \
>  				 PUD_TYPE_SECT)
> +#define pud_large(x)		pud_sect(x)
>  #define pud_table(pud)		((pud_val(pud) & PUD_TYPE_MASK) == \
>  				 PUD_TYPE_TABLE)
>  #endif
> --
> 2.20.1
>
Peter Zijlstra Feb. 18, 2019, 11:29 a.m. UTC | #2
On Fri, Feb 15, 2019 at 05:02:22PM +0000, Steven Price wrote:

> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index de70c1eabf33..09d308921625 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -428,6 +428,7 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
>  				 PMD_TYPE_TABLE)
>  #define pmd_sect(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
>  				 PMD_TYPE_SECT)
> +#define pmd_large(x)		pmd_sect(x)
>  
>  #if defined(CONFIG_ARM64_64K_PAGES) || CONFIG_PGTABLE_LEVELS < 3
>  #define pud_sect(pud)		(0)
> @@ -435,6 +436,7 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
>  #else
>  #define pud_sect(pud)		((pud_val(pud) & PUD_TYPE_MASK) == \
>  				 PUD_TYPE_SECT)
> +#define pud_large(x)		pud_sect(x)
>  #define pud_table(pud)		((pud_val(pud) & PUD_TYPE_MASK) == \
>  				 PUD_TYPE_TABLE)
>  #endif

So on x86 p*d_large() also matches p*d_huge() and THP, but it is not
clear to me that this p*d_sect() thing does so, given your definitions.

See here why I care:

  http://lkml.kernel.org/r/20190201124741.GE31552@hirez.programming.kicks-ass.net
Mark Rutland Feb. 18, 2019, 1:45 p.m. UTC | #3
On Mon, Feb 18, 2019 at 12:29:22PM +0100, Peter Zijlstra wrote:
> On Fri, Feb 15, 2019 at 05:02:22PM +0000, Steven Price wrote:
> 
> > diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> > index de70c1eabf33..09d308921625 100644
> > --- a/arch/arm64/include/asm/pgtable.h
> > +++ b/arch/arm64/include/asm/pgtable.h
> > @@ -428,6 +428,7 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
> >  				 PMD_TYPE_TABLE)
> >  #define pmd_sect(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
> >  				 PMD_TYPE_SECT)
> > +#define pmd_large(x)		pmd_sect(x)
> >  
> >  #if defined(CONFIG_ARM64_64K_PAGES) || CONFIG_PGTABLE_LEVELS < 3
> >  #define pud_sect(pud)		(0)
> > @@ -435,6 +436,7 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
> >  #else
> >  #define pud_sect(pud)		((pud_val(pud) & PUD_TYPE_MASK) == \
> >  				 PUD_TYPE_SECT)
> > +#define pud_large(x)		pud_sect(x)
> >  #define pud_table(pud)		((pud_val(pud) & PUD_TYPE_MASK) == \
> >  				 PUD_TYPE_TABLE)
> >  #endif
> 
> So on x86 p*d_large() also matches p*d_huge() and thp, But it is not
> clear to me this p*d_sect() thing does so, given your definitions.
> 
> See here why I care:
> 
>   http://lkml.kernel.org/r/20190201124741.GE31552@hirez.programming.kicks-ass.net

I believe it does not.

IIUC our p?d_huge() helpers implicitly handle contiguous entries. That's
where you have $N entries in the current level of table that the TLB can
cache together as one.

Our p?d_sect() helpers only match section entries. That's where we map
an entire next-level-table's worth of VA space with a single entry at
the current level.

Thanks,
Mark.
Steven Price Feb. 18, 2019, 2:11 p.m. UTC | #4
On 18/02/2019 11:29, Peter Zijlstra wrote:
> On Fri, Feb 15, 2019 at 05:02:22PM +0000, Steven Price wrote:
> 
>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>> index de70c1eabf33..09d308921625 100644
>> --- a/arch/arm64/include/asm/pgtable.h
>> +++ b/arch/arm64/include/asm/pgtable.h
>> @@ -428,6 +428,7 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
>>  				 PMD_TYPE_TABLE)
>>  #define pmd_sect(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
>>  				 PMD_TYPE_SECT)
>> +#define pmd_large(x)		pmd_sect(x)
>>  
>>  #if defined(CONFIG_ARM64_64K_PAGES) || CONFIG_PGTABLE_LEVELS < 3
>>  #define pud_sect(pud)		(0)
>> @@ -435,6 +436,7 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
>>  #else
>>  #define pud_sect(pud)		((pud_val(pud) & PUD_TYPE_MASK) == \
>>  				 PUD_TYPE_SECT)
>> +#define pud_large(x)		pud_sect(x)
>>  #define pud_table(pud)		((pud_val(pud) & PUD_TYPE_MASK) == \
>>  				 PUD_TYPE_TABLE)
>>  #endif
> 
> So on x86 p*d_large() also matches p*d_huge() and thp, But it is not
> clear to me this p*d_sect() thing does so, given your definitions.
> 
> See here why I care:
> 
>   http://lkml.kernel.org/r/20190201124741.GE31552@hirez.programming.kicks-ass.net
> 

pmd_huge()/pud_huge() unfortunately are currently defined as '0' if
!CONFIG_HUGETLB_PAGE and for this reason I was avoiding using them.
While most code would reasonably not care about huge pages in that build
configuration, the likes of the debugfs page table dump code needs to be
able to recognise them in all build configurations. I believe the
situation is the same on arm64 and x86.

The other quirk is that higher levels are not supported for hugetlb on
some architectures. For example, arm64 provides no definition of
pgd_huge(), so it falls back to the generic implementation, which is
defined as '0'. The only architecture I can see that defines it is
powerpc. Keeping this as '0' ensures that otherwise-dead code elsewhere
is compiled out.

Where p?d_huge is defined by the arch code the intention is that
p?d_large(x)==p?d_huge(x).

Thanks,

Steve
Mark Rutland Feb. 18, 2019, 2:29 p.m. UTC | #5
On Mon, Feb 18, 2019 at 02:11:40PM +0000, Steven Price wrote:
> On 18/02/2019 11:29, Peter Zijlstra wrote:
> > On Fri, Feb 15, 2019 at 05:02:22PM +0000, Steven Price wrote:
> > 
> >> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> >> index de70c1eabf33..09d308921625 100644
> >> --- a/arch/arm64/include/asm/pgtable.h
> >> +++ b/arch/arm64/include/asm/pgtable.h
> >> @@ -428,6 +428,7 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
> >>  				 PMD_TYPE_TABLE)
> >>  #define pmd_sect(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
> >>  				 PMD_TYPE_SECT)
> >> +#define pmd_large(x)		pmd_sect(x)
> >>  
> >>  #if defined(CONFIG_ARM64_64K_PAGES) || CONFIG_PGTABLE_LEVELS < 3
> >>  #define pud_sect(pud)		(0)
> >> @@ -435,6 +436,7 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
> >>  #else
> >>  #define pud_sect(pud)		((pud_val(pud) & PUD_TYPE_MASK) == \
> >>  				 PUD_TYPE_SECT)
> >> +#define pud_large(x)		pud_sect(x)
> >>  #define pud_table(pud)		((pud_val(pud) & PUD_TYPE_MASK) == \
> >>  				 PUD_TYPE_TABLE)
> >>  #endif
> > 
> > So on x86 p*d_large() also matches p*d_huge() and thp, But it is not
> > clear to me this p*d_sect() thing does so, given your definitions.
> > 
> > See here why I care:
> > 
> >   http://lkml.kernel.org/r/20190201124741.GE31552@hirez.programming.kicks-ass.net
> > 
> 
> pmd_huge()/pud_huge() unfortunately are currently defined as '0' if
> !CONFIG_HUGETLB_PAGE and for this reason I was avoiding using them.

I think that Peter means p?d_huge(x) should imply p?d_large(x), e.g.

#define pmd_large(x) \
	(pmd_sect(x) || pmd_huge(x) || pmd_trans_huge(x))

... which should work regardless of CONFIG_HUGETLB_PAGE.

> While most code would reasonably not care about huge pages in that build
> configuration, the likes of the debugfs page table dump code needs to be
> able to recognise them in all build configurations. I believe the
> situation is the same on arm64 and x86.

There's a very important distinction here between:

* section mappings, which are an architectural construct used in
  arm64-specific code (e.g. the kernel's own page tables).

* huge mappings, which are a Linux logical construct for mapping
  userspace memory. These are built using section mappings.

The existing arm64 debugfs pagetable dump code cares about section
mappings specifically in all cases, since it is not used to dump
userspace page tables.

The existing generic code doesn't care about section mappings
specifically, because they are not generic.

Thanks,
Mark.
Peter Zijlstra Feb. 18, 2019, 3:06 p.m. UTC | #6
On Mon, Feb 18, 2019 at 02:29:52PM +0000, Mark Rutland wrote:
> I think that Peter means p?d_huge(x) should imply p?d_large(x), e.g.
> 
> #define pmd_large(x) \
> 	(pmd_sect(x) || pmd_huge(x) || pmd_trans_huge(x))
> 
> ... which should work regardless of CONFIG_HUGETLB_PAGE.

Yep, that.
Steven Price Feb. 18, 2019, 3:30 p.m. UTC | #7
On 18/02/2019 15:06, Peter Zijlstra wrote:
> On Mon, Feb 18, 2019 at 02:29:52PM +0000, Mark Rutland wrote:
>> I think that Peter means p?d_huge(x) should imply p?d_large(x), e.g.
>>
>> #define pmd_large(x) \
>> 	(pmd_sect(x) || pmd_huge(x) || pmd_trans_huge(x))
>>
>> ... which should work regardless of CONFIG_HUGETLB_PAGE.
> 
> Yep, that.

I'm not aware of a situation where pmd_huge(x) is true but pmd_sect(x)
isn't. Equally for pmd_huge(x) and pmd_trans_huge(x).

What am I missing?

Steve
Mark Rutland Feb. 18, 2019, 5:04 p.m. UTC | #8
On Mon, Feb 18, 2019 at 03:30:38PM +0000, Steven Price wrote:
> On 18/02/2019 15:06, Peter Zijlstra wrote:
> > On Mon, Feb 18, 2019 at 02:29:52PM +0000, Mark Rutland wrote:
> >> I think that Peter means p?d_huge(x) should imply p?d_large(x), e.g.
> >>
> >> #define pmd_large(x) \
> >> 	(pmd_sect(x) || pmd_huge(x) || pmd_trans_huge(x))
> >>
> >> ... which should work regardless of CONFIG_HUGETLB_PAGE.
> > 
> > Yep, that.
> 
> I'm not aware of a situation where pmd_huge(x) is true but pmd_sect(x)
> isn't. Equally for pmd_huge(x) and pmd_trans_huge(x).
> 
> What am I missing?

Having dug for a bit, I think you're right in asserting that pmd_sect()
should cover those.

I had worried that wouldn't cater for contiguous pmd entries, but those
have to be contiguous section entries, so they get picked up.

That said, do we have any special handling for contiguous PTEs? We use
those in kernel mappings regardless of hugetlb support, and I didn't
spot a pte_large() helper.

Thanks,
Mark.
Steven Price Feb. 18, 2019, 5:22 p.m. UTC | #9
On 18/02/2019 17:04, Mark Rutland wrote:
> On Mon, Feb 18, 2019 at 03:30:38PM +0000, Steven Price wrote:
>> On 18/02/2019 15:06, Peter Zijlstra wrote:
>>> On Mon, Feb 18, 2019 at 02:29:52PM +0000, Mark Rutland wrote:
>>>> I think that Peter means p?d_huge(x) should imply p?d_large(x), e.g.
>>>>
>>>> #define pmd_large(x) \
>>>> 	(pmd_sect(x) || pmd_huge(x) || pmd_trans_huge(x))
>>>>
>>>> ... which should work regardless of CONFIG_HUGETLB_PAGE.
>>>
>>> Yep, that.
>>
>> I'm not aware of a situation where pmd_huge(x) is true but pmd_sect(x)
>> isn't. Equally for pmd_huge(x) and pmd_trans_huge(x).
>>
>> What am I missing?
> 
> Having dug for a bit, I think you're right in asserting that pmd_sect()
> should cover those.
> 
> I had worried that wouldn't cater for contiguous pmd entries, but those
> have to be contiguous section entries, so they get picked up.
> 
> That said, do we have any special handling for contiguous PTEs? We use
> those in kernel mappings regardless of hugetlb support, and I didn't
> spot a pte_large() helper.

There's no special handling for contiguous PTEs because the page walk
code doesn't care - each PTE is valid individually even if it is part of
a contiguous group. So the walker can descend all levels in this case.
pte_large() if it existed would therefore always return 0.

The pte_entry() callback obviously might go looking for the contiguous
bit so that it can annotate the output correctly but that's different
from a 'large' page. The code in arch/arm64/mm/dump.c simply looks for
the PTE_CONT bit being set to do this annotation.

Steve

Patch

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index de70c1eabf33..09d308921625 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -428,6 +428,7 @@  extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 				 PMD_TYPE_TABLE)
 #define pmd_sect(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
 				 PMD_TYPE_SECT)
+#define pmd_large(x)		pmd_sect(x)
 
 #if defined(CONFIG_ARM64_64K_PAGES) || CONFIG_PGTABLE_LEVELS < 3
 #define pud_sect(pud)		(0)
@@ -435,6 +436,7 @@  extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 #else
 #define pud_sect(pud)		((pud_val(pud) & PUD_TYPE_MASK) == \
 				 PUD_TYPE_SECT)
+#define pud_large(x)		pud_sect(x)
 #define pud_table(pud)		((pud_val(pud) & PUD_TYPE_MASK) == \
 				 PUD_TYPE_TABLE)
 #endif