diff mbox series

[v3] powerpc/mm: Support execute-only memory on the Radix MMU

Message ID 20220809024433.17644-1-ruscur@russell.cc (mailing list archive)
State Superseded
Headers show
Series [v3] powerpc/mm: Support execute-only memory on the Radix MMU | expand

Commit Message

Russell Currey Aug. 9, 2022, 2:44 a.m. UTC
The Hash MMU already supports XOM (i.e. mmap with PROT_EXEC only)
through the execute-only pkey.  A PROT_EXEC-only mapping will actually
map to RX, and then the pkey will be applied on top of it.

Radix doesn't have pkeys, but it does have execute permissions built-in
to the MMU, so all we have to do to support XOM is expose it.

Signed-off-by: Russell Currey <ruscur@russell.cc>
---
v3: Incorporate Aneesh's suggestions, leave protection_map untouched
Basic test: https://github.com/ruscur/junkcode/blob/main/mmap_test.c

 arch/powerpc/include/asm/book3s/64/pgtable.h |  2 ++
 arch/powerpc/mm/book3s64/pgtable.c           | 11 +++++++++--
 arch/powerpc/mm/fault.c                      |  6 +++++-
 3 files changed, 16 insertions(+), 3 deletions(-)

Comments

Christophe Leroy Aug. 9, 2022, 5:51 a.m. UTC | #1
Le 09/08/2022 à 04:44, Russell Currey a écrit :
> The Hash MMU already supports XOM (i.e. mmap with PROT_EXEC only)
> through the execute-only pkey.  A PROT_EXEC-only mapping will actually
> map to RX, and then the pkey will be applied on top of it.

I don't think XOM is a commonly understood accronym. Maybe the first 
time you use it it'd be better to say something like:

The Hash MMU already supports execute-only memory (XOM)

When you say that Hash MMU supports it through the execute-only pkey, 
does it mean that it is taken into account automatically at mmap time, 
or does the userspace app has to do something special to use the key ? 
If it is the second, it means that depending on whether you are radix or 
not, you must do something different ? Is that expected ?

> 
> Radix doesn't have pkeys, but it does have execute permissions built-in
> to the MMU, so all we have to do to support XOM is expose it.
> 
> Signed-off-by: Russell Currey <ruscur@russell.cc>
> ---
> v3: Incorporate Aneesh's suggestions, leave protection_map untouched
> Basic test: https://github.com/ruscur/junkcode/blob/main/mmap_test.c
> 
>   arch/powerpc/include/asm/book3s/64/pgtable.h |  2 ++
>   arch/powerpc/mm/book3s64/pgtable.c           | 11 +++++++++--
>   arch/powerpc/mm/fault.c                      |  6 +++++-
>   3 files changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index 392ff48f77df..486902aff040 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -151,6 +151,8 @@
>   #define PAGE_COPY_X	__pgprot(_PAGE_BASE | _PAGE_READ | _PAGE_EXEC)
>   #define PAGE_READONLY	__pgprot(_PAGE_BASE | _PAGE_READ)
>   #define PAGE_READONLY_X	__pgprot(_PAGE_BASE | _PAGE_READ | _PAGE_EXEC)
> +/* Radix only, Hash uses PAGE_READONLY_X + execute-only pkey instead */
> +#define PAGE_EXECONLY	__pgprot(_PAGE_BASE | _PAGE_EXEC)
>   
>   /* Permission masks used for kernel mappings */
>   #define PAGE_KERNEL	__pgprot(_PAGE_BASE | _PAGE_KERNEL_RW)
> diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
> index 7b9966402b25..62f63d344596 100644
> --- a/arch/powerpc/mm/book3s64/pgtable.c
> +++ b/arch/powerpc/mm/book3s64/pgtable.c
> @@ -553,8 +553,15 @@ EXPORT_SYMBOL_GPL(memremap_compat_align);
>   
>   pgprot_t vm_get_page_prot(unsigned long vm_flags)
>   {
> -	unsigned long prot = pgprot_val(protection_map[vm_flags &
> -					(VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]);
> +	unsigned long prot;
> +
> +	/* Radix supports execute-only, but protection_map maps X -> RX */
> +	if (radix_enabled() && ((vm_flags & (VM_READ|VM_WRITE|VM_EXEC)) == VM_EXEC)) {

Maybe use VM_ACCESS_FLAGS ?

> +		prot = pgprot_val(PAGE_EXECONLY);
> +	} else {
> +		prot = pgprot_val(protection_map[vm_flags &
> +				  (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]);
> +	}
>   
>   	if (vm_flags & VM_SAO)
>   		prot |= _PAGE_SAO;
> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index 014005428687..59e4cbcf3109 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -270,7 +270,11 @@ static bool access_error(bool is_write, bool is_exec, struct vm_area_struct *vma
>   		return false;
>   	}
>   
> -	if (unlikely(!vma_is_accessible(vma)))
> +	/* On Radix, a read fault could be from PROT_NONE or PROT_EXEC */
> +	if (unlikely(radix_enabled() && !(vma->vm_flags & VM_READ)))
> +		return true;

Why do you need the radix_enabled() here ?
Even if it doesn't fault directly, reading a non readable area is still 
an error and should be handled as such, even on hardware that will not 
generate a fault for it at the first place. So I'd just do:

	if (!(vma->vm_flags & VM_READ)))
		return true;

> +	/* Check for a PROT_NONE fault on other MMUs */
> +	else if (unlikely(!vma_is_accessible(vma)))
>   		return true;
>   	/*
>   	 * We should ideally do the vma pkey access check here. But in the

Don't use an if/else construct, there is no other 'else' in that 
function, or in similar functions like bad_kernel_fault() for instance.

So leave the !vma_is_accessible(vma) untouched and add your check as a 
standalone check before or after it.
Aneesh Kumar K.V Aug. 9, 2022, 6:27 a.m. UTC | #2
On 8/9/22 11:21 AM, Christophe Leroy wrote:
> Le 09/08/2022 à 04:44, Russell Currey a écrit :
>> The Hash MMU already supports XOM (i.e. mmap with PROT_EXEC only)
>> through the execute-only pkey.  A PROT_EXEC-only mapping will actually
>> map to RX, and then the pkey will be applied on top of it.
> 
> I don't think XOM is a commonly understood accronym. Maybe the first 
> time you use it it'd be better to say something like:
> 
> The Hash MMU already supports execute-only memory (XOM)
> 
> When you say that Hash MMU supports it through the execute-only pkey, 
> does it mean that it is taken into account automatically at mmap time, 
> or does the userspace app has to do something special to use the key ? 
> If it is the second, it means that depending on whether you are radix or 
> not, you must do something different ? Is that expected ?
> 
>>
>> Radix doesn't have pkeys, but it does have execute permissions built-in
>> to the MMU, so all we have to do to support XOM is expose it.
>>
>> Signed-off-by: Russell Currey <ruscur@russell.cc>
>> ---
>> v3: Incorporate Aneesh's suggestions, leave protection_map untouched
>> Basic test: https://github.com/ruscur/junkcode/blob/main/mmap_test.c
>>
>>   arch/powerpc/include/asm/book3s/64/pgtable.h |  2 ++
>>   arch/powerpc/mm/book3s64/pgtable.c           | 11 +++++++++--
>>   arch/powerpc/mm/fault.c                      |  6 +++++-
>>   3 files changed, 16 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> index 392ff48f77df..486902aff040 100644
>> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
>> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> @@ -151,6 +151,8 @@
>>   #define PAGE_COPY_X	__pgprot(_PAGE_BASE | _PAGE_READ | _PAGE_EXEC)
>>   #define PAGE_READONLY	__pgprot(_PAGE_BASE | _PAGE_READ)
>>   #define PAGE_READONLY_X	__pgprot(_PAGE_BASE | _PAGE_READ | _PAGE_EXEC)
>> +/* Radix only, Hash uses PAGE_READONLY_X + execute-only pkey instead */
>> +#define PAGE_EXECONLY	__pgprot(_PAGE_BASE | _PAGE_EXEC)
>>   
>>   /* Permission masks used for kernel mappings */
>>   #define PAGE_KERNEL	__pgprot(_PAGE_BASE | _PAGE_KERNEL_RW)
>> diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
>> index 7b9966402b25..62f63d344596 100644
>> --- a/arch/powerpc/mm/book3s64/pgtable.c
>> +++ b/arch/powerpc/mm/book3s64/pgtable.c
>> @@ -553,8 +553,15 @@ EXPORT_SYMBOL_GPL(memremap_compat_align);
>>   
>>   pgprot_t vm_get_page_prot(unsigned long vm_flags)
>>   {
>> -	unsigned long prot = pgprot_val(protection_map[vm_flags &
>> -					(VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]);
>> +	unsigned long prot;
>> +
>> +	/* Radix supports execute-only, but protection_map maps X -> RX */
>> +	if (radix_enabled() && ((vm_flags & (VM_READ|VM_WRITE|VM_EXEC)) == VM_EXEC)) {
> 
> Maybe use VM_ACCESS_FLAGS ?
> 
>> +		prot = pgprot_val(PAGE_EXECONLY);
>> +	} else {
>> +		prot = pgprot_val(protection_map[vm_flags &
>> +				  (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]);
>> +	}
>>   
>>   	if (vm_flags & VM_SAO)
>>   		prot |= _PAGE_SAO;
>> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
>> index 014005428687..59e4cbcf3109 100644
>> --- a/arch/powerpc/mm/fault.c
>> +++ b/arch/powerpc/mm/fault.c
>> @@ -270,7 +270,11 @@ static bool access_error(bool is_write, bool is_exec, struct vm_area_struct *vma
>>   		return false;
>>   	}
>>   
>> -	if (unlikely(!vma_is_accessible(vma)))
>> +	/* On Radix, a read fault could be from PROT_NONE or PROT_EXEC */
>> +	if (unlikely(radix_enabled() && !(vma->vm_flags & VM_READ)))
>> +		return true;
> 
> Why do you need the radix_enabled() here ?
> Even if it doesn't fault directly, reading a non readable area is still 
> an error and should be handled as such, even on hardware that will not 
> generate a fault for it at the first place. So I'd just do:
> 
> 	if (!(vma->vm_flags & VM_READ)))
> 		return true;
> 
>> +	/* Check for a PROT_NONE fault on other MMUs */
>> +	else if (unlikely(!vma_is_accessible(vma)))
>>   		return true;
>>   	/*
>>   	 * We should ideally do the vma pkey access check here. But in the
> 
> Don't use an if/else construct, there is no other 'else' in that 
> function, or in similar functions like bad_kernel_fault() for instance.
> 
> So leave the !vma_is_accessible(vma) untouched and add your check as a 
> standalone check before or after it.


What does vma_is_accessible() check bring if we have the VM_READ check unconditional ?

-aneesh
Russell Currey Aug. 9, 2022, 6:28 a.m. UTC | #3
On Tue, 2022-08-09 at 05:51 +0000, Christophe Leroy wrote:
> Le 09/08/2022 à 04:44, Russell Currey a écrit :
> > The Hash MMU already supports XOM (i.e. mmap with PROT_EXEC only)
> > through the execute-only pkey.  A PROT_EXEC-only mapping will
> > actually
> > map to RX, and then the pkey will be applied on top of it.
> 
> I don't think XOM is a commonly understood accronym. Maybe the first 
> time you use it it'd be better to say something like:
> 
> The Hash MMU already supports execute-only memory (XOM)

Yes, that's better.

> 
> When you say that Hash MMU supports it through the execute-only pkey,
> does it mean that it is taken into account automatically at mmap
> time, 
> or does the userspace app has to do something special to use the key
> ? 
> If it is the second, it means that depending on whether you are radix
> or 
> not, you must do something different ? Is that expected ?

It happens at mmap time, see do_mmap() in mm/mmap.c (and similar for
mprotect).  That calls into execute_only_pkey() which can return
something on x86 & Hash, and if it does that pkey gets used.  The
userspace process doesn't have to do anything, it's transparent.  So
there's no difference in program behaviour switching between Hash/Radix
- at least in the basic cases I've tested.

> 
> > 
> > Radix doesn't have pkeys, but it does have execute permissions
> > built-in
> > to the MMU, so all we have to do to support XOM is expose it.
> > 
> > Signed-off-by: Russell Currey <ruscur@russell.cc>
> > ---
> > v3: Incorporate Aneesh's suggestions, leave protection_map
> > untouched
> > Basic test:
> > https://github.com/ruscur/junkcode/blob/main/mmap_test.c
> > 
> >   arch/powerpc/include/asm/book3s/64/pgtable.h |  2 ++
> >   arch/powerpc/mm/book3s64/pgtable.c           | 11 +++++++++--
> >   arch/powerpc/mm/fault.c                      |  6 +++++-
> >   3 files changed, 16 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h
> > b/arch/powerpc/include/asm/book3s/64/pgtable.h
> > index 392ff48f77df..486902aff040 100644
> > --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> > +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> > @@ -151,6 +151,8 @@
> >   #define PAGE_COPY_X   __pgprot(_PAGE_BASE | _PAGE_READ |
> > _PAGE_EXEC)
> >   #define PAGE_READONLY __pgprot(_PAGE_BASE | _PAGE_READ)
> >   #define PAGE_READONLY_X       __pgprot(_PAGE_BASE | _PAGE_READ |
> > _PAGE_EXEC)
> > +/* Radix only, Hash uses PAGE_READONLY_X + execute-only pkey
> > instead */
> > +#define PAGE_EXECONLY  __pgprot(_PAGE_BASE | _PAGE_EXEC)
> >   
> >   /* Permission masks used for kernel mappings */
> >   #define PAGE_KERNEL   __pgprot(_PAGE_BASE | _PAGE_KERNEL_RW)
> > diff --git a/arch/powerpc/mm/book3s64/pgtable.c
> > b/arch/powerpc/mm/book3s64/pgtable.c
> > index 7b9966402b25..62f63d344596 100644
> > --- a/arch/powerpc/mm/book3s64/pgtable.c
> > +++ b/arch/powerpc/mm/book3s64/pgtable.c
> > @@ -553,8 +553,15 @@ EXPORT_SYMBOL_GPL(memremap_compat_align);
> >   
> >   pgprot_t vm_get_page_prot(unsigned long vm_flags)
> >   {
> > -       unsigned long prot = pgprot_val(protection_map[vm_flags &
> > -
> >                                        (VM_READ|VM_WRITE|VM_EXEC|VM_
> > SHARED)]);
> > +       unsigned long prot;
> > +
> > +       /* Radix supports execute-only, but protection_map maps X -
> > > RX */
> > +       if (radix_enabled() && ((vm_flags &
> > (VM_READ|VM_WRITE|VM_EXEC)) == VM_EXEC)) {
> 
> Maybe use VM_ACCESS_FLAGS ?

I was looking for something like that but only checked powerpc, thanks.

> 
> > +               prot = pgprot_val(PAGE_EXECONLY);
> > +       } else {
> > +               prot = pgprot_val(protection_map[vm_flags &
> > +                                
> > (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]);
> > +       }
> >   
> >         if (vm_flags & VM_SAO)
> >                 prot |= _PAGE_SAO;
> > diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> > index 014005428687..59e4cbcf3109 100644
> > --- a/arch/powerpc/mm/fault.c
> > +++ b/arch/powerpc/mm/fault.c
> > @@ -270,7 +270,11 @@ static bool access_error(bool is_write, bool
> > is_exec, struct vm_area_struct *vma
> >                 return false;
> >         }
> >   
> > -       if (unlikely(!vma_is_accessible(vma)))
> > +       /* On Radix, a read fault could be from PROT_NONE or
> > PROT_EXEC */
> > +       if (unlikely(radix_enabled() && !(vma->vm_flags &
> > VM_READ)))
> > +               return true;
> 
> Why do you need the radix_enabled() here ?
> Even if it doesn't fault directly, reading a non readable area is
> still 
> an error and should be handled as such, even on hardware that will
> not 
> generate a fault for it at the first place. So I'd just do:
> 
>         if (!(vma->vm_flags & VM_READ)))
>                 return true;
> 

I don't think we need it, just concerned I might break something.  I
can do that.

> > +       /* Check for a PROT_NONE fault on other MMUs */
> > +       else if (unlikely(!vma_is_accessible(vma)))
> >                 return true;
> >         /*
> >          * We should ideally do the vma pkey access check here. But
> > in the
> 
> Don't use an if/else construct, there is no other 'else' in that 
> function, or in similar functions like bad_kernel_fault() for
> instance.
> 
> So leave the !vma_is_accessible(vma) untouched and add your check as
> a 
> standalone check before or after it.

I think checking vma_is_accessible(vma) is redundant if we're checking
for a read fault.  It doesn't satisfy the Radix exec-only case because
PROT_EXEC will be set, but as far as I can tell other MMUs don't have a
no-read mode other than PROT_NONE.  Unless I'm missing something,
checking if PROT_READ is there should be enough.
diff mbox series

Patch

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 392ff48f77df..486902aff040 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -151,6 +151,8 @@ 
 #define PAGE_COPY_X	__pgprot(_PAGE_BASE | _PAGE_READ | _PAGE_EXEC)
 #define PAGE_READONLY	__pgprot(_PAGE_BASE | _PAGE_READ)
 #define PAGE_READONLY_X	__pgprot(_PAGE_BASE | _PAGE_READ | _PAGE_EXEC)
+/* Radix only, Hash uses PAGE_READONLY_X + execute-only pkey instead */
+#define PAGE_EXECONLY	__pgprot(_PAGE_BASE | _PAGE_EXEC)
 
 /* Permission masks used for kernel mappings */
 #define PAGE_KERNEL	__pgprot(_PAGE_BASE | _PAGE_KERNEL_RW)
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index 7b9966402b25..62f63d344596 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -553,8 +553,15 @@  EXPORT_SYMBOL_GPL(memremap_compat_align);
 
 pgprot_t vm_get_page_prot(unsigned long vm_flags)
 {
-	unsigned long prot = pgprot_val(protection_map[vm_flags &
-					(VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]);
+	unsigned long prot;
+
+	/* Radix supports execute-only, but protection_map maps X -> RX */
+	if (radix_enabled() && ((vm_flags & (VM_READ|VM_WRITE|VM_EXEC)) == VM_EXEC)) {
+		prot = pgprot_val(PAGE_EXECONLY);
+	} else {
+		prot = pgprot_val(protection_map[vm_flags &
+				  (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]);
+	}
 
 	if (vm_flags & VM_SAO)
 		prot |= _PAGE_SAO;
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 014005428687..59e4cbcf3109 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -270,7 +270,11 @@  static bool access_error(bool is_write, bool is_exec, struct vm_area_struct *vma
 		return false;
 	}
 
-	if (unlikely(!vma_is_accessible(vma)))
+	/* On Radix, a read fault could be from PROT_NONE or PROT_EXEC */
+	if (unlikely(radix_enabled() && !(vma->vm_flags & VM_READ)))
+		return true;
+	/* Check for a PROT_NONE fault on other MMUs */
+	else if (unlikely(!vma_is_accessible(vma)))
 		return true;
 	/*
 	 * We should ideally do the vma pkey access check here. But in the