diff mbox series

[RFC,2/2] mm/x86/pat: Do proper PAT bit shift for large mappings

Message ID 20240523223745.395337-3-peterx@redhat.com (mailing list archive)
State New
Headers show
Series mm/x86/pat: Fix two possible issues | expand

Commit Message

Peter Xu May 23, 2024, 10:37 p.m. UTC
For large mappings, the pgtable PAT is set on bit 12 (_PAGE_PAT_LARGE)
rather than bit 9 (_PAGE_PAT), while bit 9 is used as PAE hint.  Do proper
shifting when inject large pfn pgtable mappings to make cache mode alright.

Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: x86@kernel.org
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/huge_memory.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Dave Hansen May 23, 2024, 10:48 p.m. UTC | #1
On 5/23/24 15:37, Peter Xu wrote:
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 317de2afd371..c4a2356b1a54 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1135,7 +1135,7 @@ static void insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr,
>  		goto out_unlock;
>  	}
>  
> -	entry = pmd_mkhuge(pfn_t_pmd(pfn, prot));
> +	entry = pmd_mkhuge(pfn_t_pmd(pfn, pgprot_4k_2_large(prot)));
>  	if (pfn_t_devmap(pfn))
>  		entry = pmd_mkdevmap(entry);
>  	if (write) {

Does this even compile on non-x86 architectures?
Peter Xu May 23, 2024, 11:07 p.m. UTC | #2
On Thu, May 23, 2024 at 03:48:22PM -0700, Dave Hansen wrote:
> On 5/23/24 15:37, Peter Xu wrote:
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 317de2afd371..c4a2356b1a54 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -1135,7 +1135,7 @@ static void insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr,
> >  		goto out_unlock;
> >  	}
> >  
> > -	entry = pmd_mkhuge(pfn_t_pmd(pfn, prot));
> > +	entry = pmd_mkhuge(pfn_t_pmd(pfn, pgprot_4k_2_large(prot)));
> >  	if (pfn_t_devmap(pfn))
> >  		entry = pmd_mkdevmap(entry);
> >  	if (write) {
> 
> Does this even compile on non-x86 architectures?

Probably not..  I think I can define a pgprot_to_large() globally, pointing
that to pgprot_4k_2_large() on x86 and make the fallback to be noop.  And
if there's a new version I'll guarantee to run over my cross compilers.

Any comments on the idea itself?  Do we have a problem, or maybe I
overlooked something?

Thanks,
Peter Xu May 24, 2024, 12:53 a.m. UTC | #3
On Thu, May 23, 2024 at 07:07:06PM -0400, Peter Xu wrote:
> On Thu, May 23, 2024 at 03:48:22PM -0700, Dave Hansen wrote:
> > On 5/23/24 15:37, Peter Xu wrote:
> > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > index 317de2afd371..c4a2356b1a54 100644
> > > --- a/mm/huge_memory.c
> > > +++ b/mm/huge_memory.c
> > > @@ -1135,7 +1135,7 @@ static void insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr,
> > >  		goto out_unlock;
> > >  	}
> > >  
> > > -	entry = pmd_mkhuge(pfn_t_pmd(pfn, prot));
> > > +	entry = pmd_mkhuge(pfn_t_pmd(pfn, pgprot_4k_2_large(prot)));
> > >  	if (pfn_t_devmap(pfn))
> > >  		entry = pmd_mkdevmap(entry);
> > >  	if (write) {
> > 
> > Does this even compile on non-x86 architectures?
> 
> Probably not..  I think I can define a pgprot_to_large() globally, pointing
> that to pgprot_4k_2_large() on x86 and make the fallback to be noop.  And
> if there's a new version I'll guarantee to run over my cross compilers.
> 
> Any comments on the idea itself?  Do we have a problem, or maybe I
> overlooked something?

I also attached one new version of patch 2 that should pass the cross
builds.  Please reviewers feel free to look at this one instead.  From x86
perspective they should be the same thing.

Thanks,

===8<===
From 1cce12c872cb01aaa8686d8f5c7cd6b266ca4e38 Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Thu, 23 May 2024 18:19:35 -0400
Subject: [PATCH rfcv1.1] mm/x86/pat: Do proper PAT bit shift for large mappings

For large mappings, the pgtable PAT is set on bit 12 (_PAGE_PAT_LARGE)
rather than bit 9 (_PAGE_PAT), while bit 9 is used as PAE hint.  Do proper
shifting when inject large pfn pgtable mappings to make cache mode alright.

Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: x86@kernel.org
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 arch/x86/include/asm/pgtable_types.h | 1 +
 include/linux/pgtable.h              | 4 ++++
 mm/huge_memory.c                     | 4 ++--
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index b78644962626..f9edb2bb1512 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -512,6 +512,7 @@ static inline pgprot_t pgprot_large_2_4k(pgprot_t pgprot)
 	return __pgprot(protval_large_2_4k(pgprot_val(pgprot)));
 }
 
+#define  pgprot_to_large(pgprot)  pgprot_4k_2_large(pgprot)
 
 typedef struct page *pgtable_t;
 
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 18019f037bae..54487d2b3e40 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1956,4 +1956,8 @@ pgprot_t vm_get_page_prot(unsigned long vm_flags)			\
 }									\
 EXPORT_SYMBOL(vm_get_page_prot);
 
+#ifndef  pgprot_to_large
+#define  pgprot_to_large(pgprot)  pgprot
+#endif
+
 #endif /* _LINUX_PGTABLE_H */
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 317de2afd371..4c134a60fb64 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1135,7 +1135,7 @@ static void insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr,
 		goto out_unlock;
 	}
 
-	entry = pmd_mkhuge(pfn_t_pmd(pfn, prot));
+	entry = pmd_mkhuge(pfn_t_pmd(pfn, pgprot_to_large(prot)));
 	if (pfn_t_devmap(pfn))
 		entry = pmd_mkdevmap(entry);
 	if (write) {
@@ -1233,7 +1233,7 @@ static void insert_pfn_pud(struct vm_area_struct *vma, unsigned long addr,
 		goto out_unlock;
 	}
 
-	entry = pud_mkhuge(pfn_t_pud(pfn, prot));
+	entry = pud_mkhuge(pfn_t_pud(pfn, pgprot_to_large(prot)));
 	if (pfn_t_devmap(pfn))
 		entry = pud_mkdevmap(entry);
 	if (write) {
Dave Hansen May 24, 2024, 3:30 a.m. UTC | #4
On 5/23/24 16:07, Peter Xu wrote:
> Probably not..  I think I can define a pgprot_to_large() globally, pointing
> that to pgprot_4k_2_large() on x86 and make the fallback to be noop.  And
> if there's a new version I'll guarantee to run over my cross compilers.

I guess that would be functional, but it would be a bit mean to
everybody else.

> Any comments on the idea itself?  Do we have a problem, or maybe I
> overlooked something?

I think it's probably unnecessary to inflict this particular x86-ism on
generic code.  The arch-generic 'prot' should have PAT at its 4k
(_PAGE_BIT_PAT) position and then p*d_mkhuge() can shift it into the
_PAGE_BIT_PAT_LARGE spot.
Peter Xu May 24, 2024, 11:55 p.m. UTC | #5
On Thu, May 23, 2024 at 08:30:19PM -0700, Dave Hansen wrote:
> On 5/23/24 16:07, Peter Xu wrote:
> > Probably not..  I think I can define a pgprot_to_large() globally, pointing
> > that to pgprot_4k_2_large() on x86 and make the fallback to be noop.  And
> > if there's a new version I'll guarantee to run over my cross compilers.
> 
> I guess that would be functional, but it would be a bit mean to
> everybody else.
> 
> > Any comments on the idea itself?  Do we have a problem, or maybe I
> > overlooked something?
> 
> I think it's probably unnecessary to inflict this particular x86-ism on
> generic code.  The arch-generic 'prot' should have PAT at its 4k
> (_PAGE_BIT_PAT) position and then p*d_mkhuge() can shift it into the
> _PAGE_BIT_PAT_LARGE spot.

Right that's another option indeed.

It's just that I found it might in many cases be better when we have the
API separately properly and making the pairs matching each other.

For example, it could be clearer if pxx_mkhuge() does exactly what
pxx_leaf() would check against.

PS: I hoped it's called pxx_huge() already to make the name paired with
each other; afaict we called it pxx_leaf() only because pxx_huge() used to
be "abused" by hugetlbfs before.. now it's gone.

The other thing is we mostly only need these knobs for special maps like
pfnmaps, am I right?  OTOH we use WB for RAMs, and maybe we don't want to
bother any PAT stuff when the kernel is installing a THP anonymous?

IMHO having pgprot_to_large() is fine even if only x86 has it; it's really
like pfn tracking itself which is noop for !x86. but I'll follow your
advise if you still insist; I don't really have a strong opinion.

But if so I'd also like to mention a 3rd option, which is to have
pxx_mkhuge_prot(), fallback to pxx_mkhuge() for !x86.  That'll make
pxx_huge() untainted for x86.  I'm not sure whether that would ease the
same concern, though.

In all cases, thanks for confirming this issue, I appreciate that.  Let me
know if you have any comment on patch 1 too; that one isn't a problem so
far iiuc, but it can be soon.

Thanks,
diff mbox series

Patch

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 317de2afd371..c4a2356b1a54 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1135,7 +1135,7 @@  static void insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr,
 		goto out_unlock;
 	}
 
-	entry = pmd_mkhuge(pfn_t_pmd(pfn, prot));
+	entry = pmd_mkhuge(pfn_t_pmd(pfn, pgprot_4k_2_large(prot)));
 	if (pfn_t_devmap(pfn))
 		entry = pmd_mkdevmap(entry);
 	if (write) {
@@ -1233,7 +1233,7 @@  static void insert_pfn_pud(struct vm_area_struct *vma, unsigned long addr,
 		goto out_unlock;
 	}
 
-	entry = pud_mkhuge(pfn_t_pud(pfn, prot));
+	entry = pud_mkhuge(pfn_t_pud(pfn, pgprot_4k_2_large(prot)));
 	if (pfn_t_devmap(pfn))
 		entry = pud_mkdevmap(entry);
 	if (write) {