Message ID | 20190215170235.23360-2-steven.price@arm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Convert x86 & arm64 to use generic page walk | expand |
On Fri, Feb 15, 2019 at 05:02:22PM +0000, Steven Price wrote:
> From: James Morse <james.morse@arm.com>
>
> Exposing the pud/pgd levels of the page tables to walk_page_range() means
> we may come across the exotic large mappings that come with large areas
> of contiguous memory (such as the kernel's linear map).
>
> Expose p?d_large() from each architecture to detect these large mappings.
>
> arm64 already has these macros defined, but with a different name.
> p?d_large() is used by s390, sparc and x86. Only arm/arm64 use p?d_sect().
> Add a macro to allow both names.
>
> By not providing a pgd_large(), we get the generic version that always
> returns 0.

This last sentence isn't true until a subsequent patch, so it's probably
worth dropping it to avoid confusion.

Thanks,
Mark.

> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
>  arch/arm64/include/asm/pgtable.h | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index de70c1eabf33..09d308921625 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -428,6 +428,7 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
>  				 PMD_TYPE_TABLE)
>  #define pmd_sect(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
>  				 PMD_TYPE_SECT)
> +#define pmd_large(x)		pmd_sect(x)
>
>  #if defined(CONFIG_ARM64_64K_PAGES) || CONFIG_PGTABLE_LEVELS < 3
>  #define pud_sect(pud)		(0)
> @@ -435,6 +436,7 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
>  #else
>  #define pud_sect(pud)		((pud_val(pud) & PUD_TYPE_MASK) == \
>  				 PUD_TYPE_SECT)
> +#define pud_large(x)		pud_sect(x)
>  #define pud_table(pud)		((pud_val(pud) & PUD_TYPE_MASK) == \
>  				 PUD_TYPE_TABLE)
>  #endif
> --
> 2.20.1
On Fri, Feb 15, 2019 at 05:02:22PM +0000, Steven Price wrote:
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index de70c1eabf33..09d308921625 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -428,6 +428,7 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
>  				 PMD_TYPE_TABLE)
>  #define pmd_sect(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
>  				 PMD_TYPE_SECT)
> +#define pmd_large(x)		pmd_sect(x)
>
>  #if defined(CONFIG_ARM64_64K_PAGES) || CONFIG_PGTABLE_LEVELS < 3
>  #define pud_sect(pud)		(0)
> @@ -435,6 +436,7 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
>  #else
>  #define pud_sect(pud)		((pud_val(pud) & PUD_TYPE_MASK) == \
>  				 PUD_TYPE_SECT)
> +#define pud_large(x)		pud_sect(x)
>  #define pud_table(pud)		((pud_val(pud) & PUD_TYPE_MASK) == \
>  				 PUD_TYPE_TABLE)
>  #endif

So on x86 p*d_large() also matches p*d_huge() and THP, but it is not
clear to me this p*d_sect() thing does so, given your definitions.

See here why I care:

  http://lkml.kernel.org/r/20190201124741.GE31552@hirez.programming.kicks-ass.net
On Mon, Feb 18, 2019 at 12:29:22PM +0100, Peter Zijlstra wrote:
> On Fri, Feb 15, 2019 at 05:02:22PM +0000, Steven Price wrote:
>
> > diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> > index de70c1eabf33..09d308921625 100644
> > --- a/arch/arm64/include/asm/pgtable.h
> > +++ b/arch/arm64/include/asm/pgtable.h
> > @@ -428,6 +428,7 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
> >  				 PMD_TYPE_TABLE)
> >  #define pmd_sect(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
> >  				 PMD_TYPE_SECT)
> > +#define pmd_large(x)		pmd_sect(x)
> >
> >  #if defined(CONFIG_ARM64_64K_PAGES) || CONFIG_PGTABLE_LEVELS < 3
> >  #define pud_sect(pud)		(0)
> > @@ -435,6 +436,7 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
> >  #else
> >  #define pud_sect(pud)		((pud_val(pud) & PUD_TYPE_MASK) == \
> >  				 PUD_TYPE_SECT)
> > +#define pud_large(x)		pud_sect(x)
> >  #define pud_table(pud)		((pud_val(pud) & PUD_TYPE_MASK) == \
> >  				 PUD_TYPE_TABLE)
> >  #endif
>
> So on x86 p*d_large() also matches p*d_huge() and THP, but it is not
> clear to me this p*d_sect() thing does so, given your definitions.
>
> See here why I care:
>
>   http://lkml.kernel.org/r/20190201124741.GE31552@hirez.programming.kicks-ass.net

I believe it does not.

IIUC our p?d_huge() helpers implicitly handle contiguous entries. That's
where you have $N entries in the current level of table that the TLB can
cache together as one.

Our p?d_sect() helpers only match section entries. That's where we map
an entire next-level-table's worth of VA space with a single entry at
the current level.

Thanks,
Mark.
On 18/02/2019 11:29, Peter Zijlstra wrote:
> On Fri, Feb 15, 2019 at 05:02:22PM +0000, Steven Price wrote:
>
>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>> index de70c1eabf33..09d308921625 100644
>> --- a/arch/arm64/include/asm/pgtable.h
>> +++ b/arch/arm64/include/asm/pgtable.h
>> @@ -428,6 +428,7 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
>>  				 PMD_TYPE_TABLE)
>>  #define pmd_sect(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
>>  				 PMD_TYPE_SECT)
>> +#define pmd_large(x)		pmd_sect(x)
>>
>>  #if defined(CONFIG_ARM64_64K_PAGES) || CONFIG_PGTABLE_LEVELS < 3
>>  #define pud_sect(pud)		(0)
>> @@ -435,6 +436,7 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
>>  #else
>>  #define pud_sect(pud)		((pud_val(pud) & PUD_TYPE_MASK) == \
>>  				 PUD_TYPE_SECT)
>> +#define pud_large(x)		pud_sect(x)
>>  #define pud_table(pud)		((pud_val(pud) & PUD_TYPE_MASK) == \
>>  				 PUD_TYPE_TABLE)
>>  #endif
>
> So on x86 p*d_large() also matches p*d_huge() and THP, but it is not
> clear to me this p*d_sect() thing does so, given your definitions.
>
> See here why I care:
>
>   http://lkml.kernel.org/r/20190201124741.GE31552@hirez.programming.kicks-ass.net

pmd_huge()/pud_huge() unfortunately are currently defined as '0' if
!CONFIG_HUGETLB_PAGE, and for this reason I was avoiding using them.
While most code would reasonably not care about huge pages in that build
configuration, the likes of the debugfs page table dump code needs to be
able to recognise them in all build configurations. I believe the
situation is the same on arm64 and x86.

The other quirk is that higher levels are not supported for HUGETLB on
some arch implementations. For example arm64 provides no definition for
pgd_huge(), so it falls back to the generic implementation defined as '0'.
The only architecture I can see that defines this is powerpc. Keeping
this as '0' ensures that the otherwise dead code in other places is
compiled out.

Where p?d_huge() is defined by the arch code, the intention is that
p?d_large(x) == p?d_huge(x).

Thanks,

Steve
On Mon, Feb 18, 2019 at 02:11:40PM +0000, Steven Price wrote:
> On 18/02/2019 11:29, Peter Zijlstra wrote:
> > On Fri, Feb 15, 2019 at 05:02:22PM +0000, Steven Price wrote:
> >
> >> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> >> index de70c1eabf33..09d308921625 100644
> >> --- a/arch/arm64/include/asm/pgtable.h
> >> +++ b/arch/arm64/include/asm/pgtable.h
> >> @@ -428,6 +428,7 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
> >>  				 PMD_TYPE_TABLE)
> >>  #define pmd_sect(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
> >>  				 PMD_TYPE_SECT)
> >> +#define pmd_large(x)		pmd_sect(x)
> >>
> >>  #if defined(CONFIG_ARM64_64K_PAGES) || CONFIG_PGTABLE_LEVELS < 3
> >>  #define pud_sect(pud)		(0)
> >> @@ -435,6 +436,7 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
> >>  #else
> >>  #define pud_sect(pud)		((pud_val(pud) & PUD_TYPE_MASK) == \
> >>  				 PUD_TYPE_SECT)
> >> +#define pud_large(x)		pud_sect(x)
> >>  #define pud_table(pud)		((pud_val(pud) & PUD_TYPE_MASK) == \
> >>  				 PUD_TYPE_TABLE)
> >>  #endif
> >
> > So on x86 p*d_large() also matches p*d_huge() and THP, but it is not
> > clear to me this p*d_sect() thing does so, given your definitions.
> >
> > See here why I care:
> >
> >   http://lkml.kernel.org/r/20190201124741.GE31552@hirez.programming.kicks-ass.net
>
> pmd_huge()/pud_huge() unfortunately are currently defined as '0' if
> !CONFIG_HUGETLB_PAGE and for this reason I was avoiding using them.

I think that Peter means p?d_huge(x) should imply p?d_large(x), e.g.

  #define pmd_large(x) \
  	(pmd_sect(x) || pmd_huge(x) || pmd_trans_huge(x))

... which should work regardless of CONFIG_HUGETLB_PAGE.

> While most code would reasonably not care about huge pages in that build
> configuration, the likes of the debugfs page table dump code needs to be
> able to recognise them in all build configurations. I believe the
> situation is the same on arm64 and x86.

There's a very important distinction here between:

* section mappings, which are an architectural construct used in
  arm64-specific code (e.g. the kernel's own page tables).

* huge mappings, which are a Linux logical construct for mapping
  userspace memory. These are built using section mappings.

The existing arm64 debugfs pagetable dump code cares about section
mappings specifically in all cases, since it is not used to dump
userspace page tables.

The existing generic code doesn't care about section mappings
specifically, because they are not generic.

Thanks,
Mark.
On Mon, Feb 18, 2019 at 02:29:52PM +0000, Mark Rutland wrote:
> I think that Peter means p?d_huge(x) should imply p?d_large(x), e.g.
>
>   #define pmd_large(x) \
>   	(pmd_sect(x) || pmd_huge(x) || pmd_trans_huge(x))
>
> ... which should work regardless of CONFIG_HUGETLB_PAGE.

Yep, that.
On 18/02/2019 15:06, Peter Zijlstra wrote:
> On Mon, Feb 18, 2019 at 02:29:52PM +0000, Mark Rutland wrote:
>> I think that Peter means p?d_huge(x) should imply p?d_large(x), e.g.
>>
>>   #define pmd_large(x) \
>>   	(pmd_sect(x) || pmd_huge(x) || pmd_trans_huge(x))
>>
>> ... which should work regardless of CONFIG_HUGETLB_PAGE.
>
> Yep, that.

I'm not aware of a situation where pmd_huge(x) is true but pmd_sect(x)
isn't. Equally for pmd_huge(x) and pmd_trans_huge(x).

What am I missing?

Steve
On Mon, Feb 18, 2019 at 03:30:38PM +0000, Steven Price wrote:
> On 18/02/2019 15:06, Peter Zijlstra wrote:
> > On Mon, Feb 18, 2019 at 02:29:52PM +0000, Mark Rutland wrote:
> >> I think that Peter means p?d_huge(x) should imply p?d_large(x), e.g.
> >>
> >>   #define pmd_large(x) \
> >>   	(pmd_sect(x) || pmd_huge(x) || pmd_trans_huge(x))
> >>
> >> ... which should work regardless of CONFIG_HUGETLB_PAGE.
> >
> > Yep, that.
>
> I'm not aware of a situation where pmd_huge(x) is true but pmd_sect(x)
> isn't. Equally for pmd_huge(x) and pmd_trans_huge(x).
>
> What am I missing?

Having dug for a bit, I think you're right in asserting that pmd_sect()
should cover those.

I had worried that wouldn't cater for contiguous pmd entries, but those
have to be contiguous section entries, so they get picked up.

That said, do we have any special handling for contiguous PTEs? We use
those in kernel mappings regardless of hugetlb support, and I didn't
spot a pte_large() helper.

Thanks,
Mark.
On 18/02/2019 17:04, Mark Rutland wrote:
> On Mon, Feb 18, 2019 at 03:30:38PM +0000, Steven Price wrote:
>> On 18/02/2019 15:06, Peter Zijlstra wrote:
>>> On Mon, Feb 18, 2019 at 02:29:52PM +0000, Mark Rutland wrote:
>>>> I think that Peter means p?d_huge(x) should imply p?d_large(x), e.g.
>>>>
>>>>   #define pmd_large(x) \
>>>>   	(pmd_sect(x) || pmd_huge(x) || pmd_trans_huge(x))
>>>>
>>>> ... which should work regardless of CONFIG_HUGETLB_PAGE.
>>>
>>> Yep, that.
>>
>> I'm not aware of a situation where pmd_huge(x) is true but pmd_sect(x)
>> isn't. Equally for pmd_huge(x) and pmd_trans_huge(x).
>>
>> What am I missing?
>
> Having dug for a bit, I think you're right in asserting that pmd_sect()
> should cover those.
>
> I had worried that wouldn't cater for contiguous pmd entries, but those
> have to be contiguous section entries, so they get picked up.
>
> That said, do we have any special handling for contiguous PTEs? We use
> those in kernel mappings regardless of hugetlb support, and I didn't
> spot a pte_large() helper.

There's no special handling for contiguous PTEs because the page walk
code doesn't care - each PTE is valid individually even if it is part of
a contiguous group, so the walker can descend all levels in this case.
pte_large(), if it existed, would therefore always return 0.

The pte_entry() callback obviously might go looking for the contiguous
bit so that it can annotate the output correctly, but that's different
from a 'large' page. The code in arch/arm64/mm/dump.c simply looks for
the PTE_CONT bit being set to do this annotation.

Steve
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index de70c1eabf33..09d308921625 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -428,6 +428,7 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 				 PMD_TYPE_TABLE)
 #define pmd_sect(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
 				 PMD_TYPE_SECT)
+#define pmd_large(x)		pmd_sect(x)
 
 #if defined(CONFIG_ARM64_64K_PAGES) || CONFIG_PGTABLE_LEVELS < 3
 #define pud_sect(pud)		(0)
@@ -435,6 +436,7 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 #else
 #define pud_sect(pud)		((pud_val(pud) & PUD_TYPE_MASK) == \
 				 PUD_TYPE_SECT)
+#define pud_large(x)		pud_sect(x)
 #define pud_table(pud)		((pud_val(pud) & PUD_TYPE_MASK) == \
 				 PUD_TYPE_TABLE)
 #endif