
[RFC,1/4] x86/mm/cpa: restore global bit when page is present

Message ID 20220808145649.2261258-2-aaron.lu@intel.com
State New
Series x86/mm/cpa: merge small mappings whenever possible

Commit Message

Aaron Lu Aug. 8, 2022, 2:56 p.m. UTC
For configs that don't have PTI enabled, or CPUs that don't need the
Meltdown mitigation, the current kernel can lose the GLOBAL bit after a
page goes through a cycle of present -> not present -> present.

It happens like this (__vunmap() does this in vm_remove_mappings()):
original page protection: 0x8000000000000163 (NX/G/D/A/RW/P)
set_memory_np(page, 1):   0x8000000000000062 (NX/D/A/RW) loses G and P
set_memory_p(page, 1):    0x8000000000000063 (NX/D/A/RW/P) restores P

In the end, this page's protection no longer has the Global bit set,
which would create a problem for this merge-small-mappings feature.

For this reason, restore the Global bit for systems that do not have
PTI enabled when the page is present.

(pgprot_clear_protnone_bits() deserves a better name if this patch is
acceptable, but first I would like some feedback on whether this is the
right way to solve the problem, so I haven't bothered with the name yet.)

Signed-off-by: Aaron Lu <aaron.lu@intel.com>
---
 arch/x86/mm/pat/set_memory.c | 2 ++
 1 file changed, 2 insertions(+)
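
As a quick illustration of the changelog's example, here is a minimal,
stand-alone sketch of the bit arithmetic (user-space C; the _PAGE_* values
are the standard x86 PTE bits, and clear_np()/set_p() are made-up helpers
that only mimic what the set_memory_np()/set_memory_p() paths do to the
protection value):

#include <stdio.h>
#include <stdint.h>

/* Illustrative subset of the x86 PTE bits */
#define _PAGE_PRESENT  (1ULL << 0)
#define _PAGE_RW       (1ULL << 1)
#define _PAGE_ACCESSED (1ULL << 5)
#define _PAGE_DIRTY    (1ULL << 6)
#define _PAGE_GLOBAL   (1ULL << 8)

/* set_memory_np(): P is cleared, and pgprot_clear_protnone_bits()
 * drops G along with it. */
static uint64_t clear_np(uint64_t prot)
{
	return prot & ~(_PAGE_PRESENT | _PAGE_GLOBAL);
}

/* set_memory_p(): P comes back, but nothing restores G. */
static uint64_t set_p(uint64_t prot)
{
	return prot | _PAGE_PRESENT;
}

int main(void)
{
	uint64_t prot = 0x8000000000000163ULL;	/* NX/G/D/A/RW/P */

	printf("original:            %#llx\n", (unsigned long long)prot);
	prot = clear_np(prot);	/* 0x8000000000000062: G and P gone */
	printf("after set_memory_np: %#llx\n", (unsigned long long)prot);
	prot = set_p(prot);	/* 0x8000000000000063: P back, G still gone */
	printf("after set_memory_p:  %#llx\n", (unsigned long long)prot);
	return 0;
}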

Comments

Hyeonggon Yoo Aug. 11, 2022, 5:21 a.m. UTC | #1
On Mon, Aug 08, 2022 at 10:56:46PM +0800, Aaron Lu wrote:
> For configs that don't have PTI enabled, or CPUs that don't need the
> Meltdown mitigation, the current kernel can lose the GLOBAL bit after a
> page goes through a cycle of present -> not present -> present.
> 
> It happens like this (__vunmap() does this in vm_remove_mappings()):
> original page protection: 0x8000000000000163 (NX/G/D/A/RW/P)
> set_memory_np(page, 1):   0x8000000000000062 (NX/D/A/RW) loses G and P
> set_memory_p(page, 1):    0x8000000000000063 (NX/D/A/RW/P) restores P
> 
> In the end, this page's protection no longer has the Global bit set,
> which would create a problem for this merge-small-mappings feature.
> 
> For this reason, restore the Global bit for systems that do not have
> PTI enabled when the page is present.
> 
> (pgprot_clear_protnone_bits() deserves a better name if this patch is
> acceptable, but first I would like some feedback on whether this is the
> right way to solve the problem, so I haven't bothered with the name yet.)
> 
> Signed-off-by: Aaron Lu <aaron.lu@intel.com>
> ---
>  arch/x86/mm/pat/set_memory.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> index 1abd5438f126..33657a54670a 100644
> --- a/arch/x86/mm/pat/set_memory.c
> +++ b/arch/x86/mm/pat/set_memory.c
> @@ -758,6 +758,8 @@ static pgprot_t pgprot_clear_protnone_bits(pgprot_t prot)
>  	 */
>  	if (!(pgprot_val(prot) & _PAGE_PRESENT))
>  		pgprot_val(prot) &= ~_PAGE_GLOBAL;
> +	else
> +		pgprot_val(prot) |= _PAGE_GLOBAL & __default_kernel_pte_mask;
>  
>  	return prot;
>  }

IIUC, this makes it impossible to set _PAGE_GLOBAL when PTI is on.

Maybe it would be less intrusive to make
set_direct_map_default_noflush() replace the protection bits
with PAGE_KERNEL, as it's only called for the direct map and the
function's purpose is to reset permissions to the default:

diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 1abd5438f126..0dd4433c1382 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -2250,7 +2250,16 @@ int set_direct_map_invalid_noflush(struct page *page)

 int set_direct_map_default_noflush(struct page *page)
 {
-       return __set_pages_p(page, 1);
+       unsigned long tempaddr = (unsigned long) page_address(page);
+       struct cpa_data cpa = {
+                       .vaddr = &tempaddr,
+                       .pgd = NULL,
+                       .numpages = 1,
+                       .mask_set = PAGE_KERNEL,
+                       .mask_clr = __pgprot(~0),
+                       .flags = 0};
+
+       return __change_page_attr_set_clr(&cpa, 0);
 }

set_direct_map_{invalid,default}_noflush() is the exact reason
why the direct map becomes split after vmalloc/vfree with special
permissions.
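
For context, a hedged sketch of the call pattern that ends up in those two
helpers, assuming the usual module/BPF-style allocation of memory with
special permissions (set_vm_flush_reset_perms(), set_memory_ro() and
set_memory_x() are real kernel APIs; the alloc_special() wrapper is
hypothetical):

#include <linux/vmalloc.h>
#include <linux/set_memory.h>
#include <linux/mm.h>

/*
 * Illustrative only: memory allocated this way gets VM_FLUSH_RESET_PERMS,
 * so on vfree() each backing page is run through
 * set_direct_map_invalid_noflush() and then
 * set_direct_map_default_noflush() -- the present -> not present ->
 * present cycle that splits the direct map.
 */
static void *alloc_special(unsigned long size)
{
	void *p = vmalloc(size);

	if (!p)
		return NULL;

	set_vm_flush_reset_perms(p);	/* reset direct map perms on free */
	set_memory_ro((unsigned long)p, size >> PAGE_SHIFT);
	set_memory_x((unsigned long)p, size >> PAGE_SHIFT);
	return p;
}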

> -- 
> 2.37.1
> 
>
Aaron Lu Aug. 11, 2022, 8:16 a.m. UTC | #2
On Thu, 2022-08-11 at 05:21 +0000, Hyeonggon Yoo wrote:
> On Mon, Aug 08, 2022 at 10:56:46PM +0800, Aaron Lu wrote:
> > For configs that don't have PTI enabled, or CPUs that don't need the
> > Meltdown mitigation, the current kernel can lose the GLOBAL bit after a
> > page goes through a cycle of present -> not present -> present.
> > 
> > It happens like this (__vunmap() does this in vm_remove_mappings()):
> > original page protection: 0x8000000000000163 (NX/G/D/A/RW/P)
> > set_memory_np(page, 1):   0x8000000000000062 (NX/D/A/RW) loses G and P
> > set_memory_p(page, 1):    0x8000000000000063 (NX/D/A/RW/P) restores P
> > 
> > In the end, this page's protection no longer has the Global bit set,
> > which would create a problem for this merge-small-mappings feature.
> > 
> > For this reason, restore the Global bit for systems that do not have
> > PTI enabled when the page is present.
> > 
> > (pgprot_clear_protnone_bits() deserves a better name if this patch is
> > acceptable, but first I would like some feedback on whether this is the
> > right way to solve the problem, so I haven't bothered with the name yet.)
> > 
> > Signed-off-by: Aaron Lu <aaron.lu@intel.com>
> > ---
> >  arch/x86/mm/pat/set_memory.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> > index 1abd5438f126..33657a54670a 100644
> > --- a/arch/x86/mm/pat/set_memory.c
> > +++ b/arch/x86/mm/pat/set_memory.c
> > @@ -758,6 +758,8 @@ static pgprot_t pgprot_clear_protnone_bits(pgprot_t prot)
> >  	 */
> >  	if (!(pgprot_val(prot) & _PAGE_PRESENT))
> >  		pgprot_val(prot) &= ~_PAGE_GLOBAL;
> > +	else
> > +		pgprot_val(prot) |= _PAGE_GLOBAL & __default_kernel_pte_mask;
> >  
> >  	return prot;
> >  }
> 
> IIUC, this makes it impossible to set _PAGE_GLOBAL when PTI is on.
> 

Yes. Is this a problem?
I think that is the intended behaviour when PTI is on: not to enable
the Global bit on kernel mappings.

> Maybe it would be less intrusive to make
> set_direct_map_default_noflush() replace the protection bits
> with PAGE_KERNEL, as it's only called for the direct map and the
> function's purpose is to reset permissions to the default:
> 
> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> index 1abd5438f126..0dd4433c1382 100644
> --- a/arch/x86/mm/pat/set_memory.c
> +++ b/arch/x86/mm/pat/set_memory.c
> @@ -2250,7 +2250,16 @@ int set_direct_map_invalid_noflush(struct page *page)
> 
>  int set_direct_map_default_noflush(struct page *page)
>  {
> -       return __set_pages_p(page, 1);
> +       unsigned long tempaddr = (unsigned long) page_address(page);
> +       struct cpa_data cpa = {
> +                       .vaddr = &tempaddr,
> +                       .pgd = NULL,
> +                       .numpages = 1,
> +                       .mask_set = PAGE_KERNEL,
> +                       .mask_clr = __pgprot(~0),
> +                       .flags = 0};
> +
> +       return __change_page_attr_set_clr(&cpa, 0);
>  }

Looks reasonable to me, and it is indeed less intrusive. I'm only
concerned there might be other paths that also go through present ->
not present -> present that this change cannot cover.

> 
> set_direct_map_{invalid,default}_noflush() is the exact reason
> why the direct map becomes split after vmalloc/vfree with special
> permissions.

Yes, I agree: it can lose the G bit after the whole cycle when PTI
is not on. When PTI is on, there is no such problem because the G bit
is not there initially.

Thanks,
Aaron
Hyeonggon Yoo Aug. 11, 2022, 11:30 a.m. UTC | #3
On Thu, Aug 11, 2022 at 08:16:08AM +0000, Lu, Aaron wrote:
> On Thu, 2022-08-11 at 05:21 +0000, Hyeonggon Yoo wrote:
> > On Mon, Aug 08, 2022 at 10:56:46PM +0800, Aaron Lu wrote:
> > > For configs that don't have PTI enabled, or CPUs that don't need the
> > > Meltdown mitigation, the current kernel can lose the GLOBAL bit after a
> > > page goes through a cycle of present -> not present -> present.
> > > 
> > > It happens like this (__vunmap() does this in vm_remove_mappings()):
> > > original page protection: 0x8000000000000163 (NX/G/D/A/RW/P)
> > > set_memory_np(page, 1):   0x8000000000000062 (NX/D/A/RW) loses G and P
> > > set_memory_p(page, 1):    0x8000000000000063 (NX/D/A/RW/P) restores P
> > > 
> > > In the end, this page's protection no longer has the Global bit set,
> > > which would create a problem for this merge-small-mappings feature.
> > > 
> > > For this reason, restore the Global bit for systems that do not have
> > > PTI enabled when the page is present.
> > > 
> > > (pgprot_clear_protnone_bits() deserves a better name if this patch is
> > > acceptable, but first I would like some feedback on whether this is the
> > > right way to solve the problem, so I haven't bothered with the name yet.)
> > > 
> > > Signed-off-by: Aaron Lu <aaron.lu@intel.com>
> > > ---
> > >  arch/x86/mm/pat/set_memory.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> > > index 1abd5438f126..33657a54670a 100644
> > > --- a/arch/x86/mm/pat/set_memory.c
> > > +++ b/arch/x86/mm/pat/set_memory.c
> > > @@ -758,6 +758,8 @@ static pgprot_t pgprot_clear_protnone_bits(pgprot_t prot)
> > >  	 */
> > >  	if (!(pgprot_val(prot) & _PAGE_PRESENT))
> > >  		pgprot_val(prot) &= ~_PAGE_GLOBAL;
> > > +	else
> > > +		pgprot_val(prot) |= _PAGE_GLOBAL & __default_kernel_pte_mask;
> > >  
> > >  	return prot;
> > >  }
> > 
> > IIUC, this makes it impossible to set _PAGE_GLOBAL when PTI is on.
> > 
> 
> Yes. Is this a problem?
> I think that is the intended behaviour when PTI is on: not to enable
> the Global bit on kernel mappings.

Please note that I'm not an expert on PTI.

But AFAIK, with PTI at least the kernel parts mapped into the user
page table are mapped as global when PGE is supported.

I'm not sure that "the Global bit is never used for kernel mappings
when PTI is enabled" is true.

Also, commit d1440b23c922d ("x86/mm: Factor out pageattr _PAGE_GLOBAL
setting"), which introduced pgprot_clear_protnone_bits(), says:
	
	This unconditional setting of _PAGE_GLOBAL is a problem when we have
	PTI and non-PTI and we want some areas to have _PAGE_GLOBAL and some
	not.

	This updated version of the code says:
	1. Clear _PAGE_GLOBAL when !_PAGE_PRESENT
	2. Never set _PAGE_GLOBAL implicitly
	3. Allow _PAGE_GLOBAL to be in cpa.set_mask
	4. Allow _PAGE_GLOBAL to be inherited from previous PTE
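
Those four rules map almost one-to-one onto the CPA mask handling. A
condensed, illustrative restatement (not the actual kernel code;
apply_cpa_masks() is a made-up name):

static pgprot_t apply_cpa_masks(pgprot_t old, pgprot_t set, pgprot_t clr)
{
	pgprot_val(old) &= ~pgprot_val(clr);	/* rule 4: G survives unless explicitly cleared */
	pgprot_val(old) |= pgprot_val(set);	/* rule 3: G may arrive via cpa.mask_set */

	if (!(pgprot_val(old) & _PAGE_PRESENT))
		pgprot_val(old) &= ~_PAGE_GLOBAL;	/* rule 1 */

	/*
	 * rule 2: note the absence of any unconditional "|= _PAGE_GLOBAL",
	 * which is exactly what the RFC patch above adds for the present
	 * case.
	 */
	return old;
}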

> > Maybe it would be less intrusive to make
> > set_direct_map_default_noflush() replace the protection bits
> > with PAGE_KERNEL, as it's only called for the direct map and the
> > function's purpose is to reset permissions to the default:
> > 
> > diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> > index 1abd5438f126..0dd4433c1382 100644
> > --- a/arch/x86/mm/pat/set_memory.c
> > +++ b/arch/x86/mm/pat/set_memory.c
> > @@ -2250,7 +2250,16 @@ int set_direct_map_invalid_noflush(struct page *page)
> > 
> >  int set_direct_map_default_noflush(struct page *page)
> >  {
> > -       return __set_pages_p(page, 1);
> > +       unsigned long tempaddr = (unsigned long) page_address(page);
> > +       struct cpa_data cpa = {
> > +                       .vaddr = &tempaddr,
> > +                       .pgd = NULL,
> > +                       .numpages = 1,
> > +                       .mask_set = PAGE_KERNEL,
> > +                       .mask_clr = __pgprot(~0),

Nah, this sets _PAGE_ENC unconditionally, which should be evaluated.
Maybe a less intrusive way would be:
		       .mask_set = __pgprot(_PAGE_PRESENT |
					   (_PAGE_GLOBAL & __default_kernel_pte_mask)),
                       .mask_clr = __pgprot(0),

> > +                       .flags = 0};
> > +
> > +       return __change_page_attr_set_clr(&cpa, 0);
> >  }
> 
> Looks reasonable to me, and it is indeed less intrusive. I'm only
> concerned there might be other paths that also go through present ->
> not present -> present that this change cannot cover.
>

AFAIK, the only other path going through present -> not present ->
present (using CPA) is when DEBUG_PAGEALLOC is used.

Do we care about direct map fragmentation when using DEBUG_PAGEALLOC?

> > 
> > set_direct_map_{invalid,default}_noflush() is the exact reason
> > why the direct map becomes split after vmalloc/vfree with special
> > permissions.
> 
> Yes, I agree: it can lose the G bit after the whole cycle when PTI
> is not on. When PTI is on, there is no such problem because the G bit
> is not there initially.
> 
> Thanks,
> Aaron
Aaron Lu Aug. 11, 2022, 12:28 p.m. UTC | #4
On 8/11/2022 7:30 PM, Hyeonggon Yoo wrote:
> On Thu, Aug 11, 2022 at 08:16:08AM +0000, Lu, Aaron wrote:
>> On Thu, 2022-08-11 at 05:21 +0000, Hyeonggon Yoo wrote:
>>> On Mon, Aug 08, 2022 at 10:56:46PM +0800, Aaron Lu wrote:
>>>> For configs that don't have PTI enabled, or CPUs that don't need the
>>>> Meltdown mitigation, the current kernel can lose the GLOBAL bit after a
>>>> page goes through a cycle of present -> not present -> present.
>>>>
>>>> It happens like this (__vunmap() does this in vm_remove_mappings()):
>>>> original page protection: 0x8000000000000163 (NX/G/D/A/RW/P)
>>>> set_memory_np(page, 1):   0x8000000000000062 (NX/D/A/RW) loses G and P
>>>> set_memory_p(page, 1):    0x8000000000000063 (NX/D/A/RW/P) restores P
>>>>
>>>> In the end, this page's protection no longer has the Global bit set,
>>>> which would create a problem for this merge-small-mappings feature.
>>>>
>>>> For this reason, restore the Global bit for systems that do not have
>>>> PTI enabled when the page is present.
>>>>
>>>> (pgprot_clear_protnone_bits() deserves a better name if this patch is
>>>> acceptable, but first I would like some feedback on whether this is the
>>>> right way to solve the problem, so I haven't bothered with the name yet.)
>>>>
>>>> Signed-off-by: Aaron Lu <aaron.lu@intel.com>
>>>> ---
>>>>  arch/x86/mm/pat/set_memory.c | 2 ++
>>>>  1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
>>>> index 1abd5438f126..33657a54670a 100644
>>>> --- a/arch/x86/mm/pat/set_memory.c
>>>> +++ b/arch/x86/mm/pat/set_memory.c
>>>> @@ -758,6 +758,8 @@ static pgprot_t pgprot_clear_protnone_bits(pgprot_t prot)
>>>>  	 */
>>>>  	if (!(pgprot_val(prot) & _PAGE_PRESENT))
>>>>  		pgprot_val(prot) &= ~_PAGE_GLOBAL;
>>>> +	else
>>>> +		pgprot_val(prot) |= _PAGE_GLOBAL & __default_kernel_pte_mask;
>>>>  
>>>>  	return prot;
>>>>  }
>>>
>>> IIUC, this makes it impossible to set _PAGE_GLOBAL when PTI is on.
>>>
>>
>> Yes. Is this a problem?
>> I think that is the intended behaviour when PTI is on: not to enable
>> the Global bit on kernel mappings.
> 
> Please note that I'm not an expert on PTI.
> 
> But AFAIK, with PTI at least the kernel parts mapped into the user
> page table are mapped as global when PGE is supported.
> 
> I'm not sure that "the Global bit is never used for kernel mappings
> when PTI is enabled" is true.
>
> Also, commit d1440b23c922d ("x86/mm: Factor out pageattr _PAGE_GLOBAL
> setting"), which introduced pgprot_clear_protnone_bits(), says:
> 	
> 	This unconditional setting of _PAGE_GLOBAL is a problem when we have
> 	PTI and non-PTI and we want some areas to have _PAGE_GLOBAL and some
> 	not.
> 
> 	This updated version of the code says:
> 	1. Clear _PAGE_GLOBAL when !_PAGE_PRESENT
> 	2. Never set _PAGE_GLOBAL implicitly
> 	3. Allow _PAGE_GLOBAL to be in cpa.set_mask
> 	4. Allow _PAGE_GLOBAL to be inherited from previous PTE
>

Thanks for the info, I'll need to take a closer look at PTI.

>>> Maybe it would be less intrusive to make
>>> set_direct_map_default_noflush() replace the protection bits
>>> with PAGE_KERNEL, as it's only called for the direct map and the
>>> function's purpose is to reset permissions to the default:
>>>
>>> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
>>> index 1abd5438f126..0dd4433c1382 100644
>>> --- a/arch/x86/mm/pat/set_memory.c
>>> +++ b/arch/x86/mm/pat/set_memory.c
>>> @@ -2250,7 +2250,16 @@ int set_direct_map_invalid_noflush(struct page *page)
>>>
>>>  int set_direct_map_default_noflush(struct page *page)
>>>  {
>>> -       return __set_pages_p(page, 1);
>>> +       unsigned long tempaddr = (unsigned long) page_address(page);
>>> +       struct cpa_data cpa = {
>>> +                       .vaddr = &tempaddr,
>>> +                       .pgd = NULL,
>>> +                       .numpages = 1,
>>> +                       .mask_set = PAGE_KERNEL,
>>> +                       .mask_clr = __pgprot(~0),
> 
> Nah, this sets _PAGE_ENC unconditionally, which should be evaluated.
> Maybe a less intrusive way would be:
> 		       .mask_set = __pgprot(_PAGE_PRESENT |
> 					   (_PAGE_GLOBAL & __default_kernel_pte_mask)),
>                        .mask_clr = __pgprot(0),
> 
>>> +                       .flags = 0};
>>> +
>>> +       return __change_page_attr_set_clr(&cpa, 0);
>>>  }
>>
>> Looks reasonable to me, and it is indeed less intrusive. I'm only
>> concerned there might be other paths that also go through present ->
>> not present -> present that this change cannot cover.
>>
> 
> AFAIK, the only other path going through present -> not present ->
> present (using CPA) is when DEBUG_PAGEALLOC is used.
> 
> Do we care about direct map fragmentation when using DEBUG_PAGEALLOC?
> 

No, the direct map does not use large page mappings when
DEBUG_PAGEALLOC is enabled.
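
(For reference: that is decided at boot. Paraphrased from
probe_page_size_mask() in arch/x86/mm/init.c; the exact form varies by
kernel version:)

	/*
	 * With DEBUG_PAGEALLOC enabled the identity map is built from 4k
	 * pages only, so there are no large mappings left to fragment.
	 */
	if (boot_cpu_has(X86_FEATURE_PSE) && !debug_pagealloc_enabled())
		page_size_mask |= 1 << PG_LEVEL_2M;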

>>>
>>> set_direct_map_{invalid,default}_noflush() is the exact reason
>>> why the direct map becomes split after vmalloc/vfree with special
>>> permissions.
>>
>> Yes, I agree: it can lose the G bit after the whole cycle when PTI
>> is not on. When PTI is on, there is no such problem because the G bit
>> is not there initially.
>>
>> Thanks,
>> Aaron
>

Patch

diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 1abd5438f126..33657a54670a 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -758,6 +758,8 @@  static pgprot_t pgprot_clear_protnone_bits(pgprot_t prot)
 	 */
 	if (!(pgprot_val(prot) & _PAGE_PRESENT))
 		pgprot_val(prot) &= ~_PAGE_GLOBAL;
+	else
+		pgprot_val(prot) |= _PAGE_GLOBAL & __default_kernel_pte_mask;
 
 	return prot;
 }