
[-next] powerpc: add support for syscall stack randomization

Message ID 20220505111932.228814-1-xiujianfeng@huawei.com (mailing list archive)
State Superseded

Commit Message

Xiu Jianfeng May 5, 2022, 11:19 a.m. UTC
Add support for adding a random offset to the stack while handling
syscalls. This patch uses mftb() instead of get_random_int() for better
performance.

Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
---
 arch/powerpc/Kconfig            | 1 +
 arch/powerpc/kernel/interrupt.c | 3 +++
 2 files changed, 4 insertions(+)

Comments

Nicholas Piggin May 10, 2022, 9:23 a.m. UTC | #1
Excerpts from Xiu Jianfeng's message of May 5, 2022 9:19 pm:
> Add support for adding a random offset to the stack while handling
> syscalls. This patch uses mftb() instead of get_random_int() for better
> performance.

Hey, very nice.

> 
> Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
> ---
>  arch/powerpc/Kconfig            | 1 +
>  arch/powerpc/kernel/interrupt.c | 3 +++
>  2 files changed, 4 insertions(+)
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 5fc9153927ac..7e04c9f80cbc 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -192,6 +192,7 @@ config PPC
>  	select HAVE_ARCH_KASAN			if PPC32 && PPC_PAGE_SHIFT <= 14
>  	select HAVE_ARCH_KASAN_VMALLOC		if PPC32 && PPC_PAGE_SHIFT <= 14
>  	select HAVE_ARCH_KFENCE			if PPC_BOOK3S_32 || PPC_8xx || 40x
> +	select HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
>  	select HAVE_ARCH_KGDB
>  	select HAVE_ARCH_MMAP_RND_BITS
>  	select HAVE_ARCH_MMAP_RND_COMPAT_BITS	if COMPAT
> diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
> index 784ea3289c84..459385769721 100644
> --- a/arch/powerpc/kernel/interrupt.c
> +++ b/arch/powerpc/kernel/interrupt.c
> @@ -4,6 +4,7 @@
>  #include <linux/err.h>
>  #include <linux/compat.h>
>  #include <linux/sched/debug.h> /* for show_regs */
> +#include <linux/randomize_kstack.h>
>  
>  #include <asm/kup.h>
>  #include <asm/cputime.h>
> @@ -82,6 +83,7 @@ notrace long system_call_exception(long r3, long r4, long r5,
>  
>  	kuap_lock();
>  
> +	add_random_kstack_offset();
>  	regs->orig_gpr3 = r3;
>  
>  	if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))

This looks like the right place. I wonder why other interrupts don't
get the same treatment. Userspace can induce the kernel to take a 
synchronous interrupt, or wait for async ones. Smaller surface area 
maybe but certain instruction emulation for example could result in
significant logic that depends on user state. Anyway that's for
hardening gurus to ponder.

> @@ -405,6 +407,7 @@ interrupt_exit_user_prepare_main(unsigned long ret, struct pt_regs *regs)
>  
>  	/* Restore user access locks last */
>  	kuap_user_restore(regs);
> +	choose_random_kstack_offset(mftb() & 0xFF);
>  
>  	return ret;
>  }

So this seems to be what x86 and s390 do, but why are we choosing a
new offset for every interrupt when it's only used on a syscall?
I would rather you do what arm64 does and just choose the offset
at the end of system_call_exception.

I wonder why the choose is separated from the add? I guess it's to
avoid a data dependency for stack access on an expensive random
function, so that makes sense (a comment would be nice in the
generic code).

I don't actually know if mftb() is cheaper here than a RNG. It
may not be conditioned all that well either. I would be tempted
to measure. 64-bit *may* be able to use a bit more than 256
bytes of stack too -- we have 16 byte alignment minimum so this
gives only 4 bits of randomness AFAIKS.
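The 4-bit figure above can be checked with a little arithmetic. What follows is a minimal userspace sketch, not kernel code: it assumes the mask has the form 2^n - 1 and that the stack pointer is rounded to a power-of-two alignment after the offset is applied, so the low bits of the offset are lost. The helper names distinct_offsets() and entropy_bits() are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of distinct stack positions reachable with an offset mask of the
 * form 2^n - 1, assuming the stack pointer is rounded to `align` bytes.
 */
static unsigned int distinct_offsets(uint32_t mask, uint32_t align)
{
	/* align must be a non-zero power of two */
	assert(align != 0 && (align & (align - 1)) == 0);

	/* Offset bits below the alignment do not move the stack pointer. */
	return (mask / align) + 1;
}

/* Effective bits of randomness: floor(log2(distinct positions)). */
static unsigned int entropy_bits(uint32_t mask, uint32_t align)
{
	unsigned int n = distinct_offsets(mask, align);
	unsigned int bits = 0;

	while (n > 1) {
		n >>= 1;
		bits++;
	}
	return bits;
}
```

Under these assumptions, entropy_bits(0xFF, 16) gives 4, matching the estimate above, while a 0x3FF mask with the same alignment would give 6.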

Thanks,
Nick
Kees Cook May 10, 2022, 4:19 p.m. UTC | #2
On Tue, May 10, 2022 at 07:23:46PM +1000, Nicholas Piggin wrote:
> Excerpts from Xiu Jianfeng's message of May 5, 2022 9:19 pm:
> > Add support for adding a random offset to the stack while handling
> > syscalls. This patch uses mftb() instead of get_random_int() for better
> > performance.
> 
> Hey, very nice.

Agreed! :)

> > [...]
> > @@ -82,6 +83,7 @@ notrace long system_call_exception(long r3, long r4, long r5,
> >  
> >  	kuap_lock();
> >  
> > +	add_random_kstack_offset();
> >  	regs->orig_gpr3 = r3;
> >  
> >  	if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
> 
> This looks like the right place. I wonder why other interrupts don't
> get the same treatment. Userspace can induce the kernel to take a 
> synchronous interrupt, or wait for async ones. Smaller surface area 
> maybe but certain instruction emulation for example could result in
> significant logic that depends on user state. Anyway that's for
> hardening gurus to ponder.

I welcome it being used for any userspace controllable entry to the
kernel! :)

Also, related, have you validated the result using the LKDTM test?
See tools/testing/selftests/lkdtm/stack-entropy.sh

> 
> > @@ -405,6 +407,7 @@ interrupt_exit_user_prepare_main(unsigned long ret, struct pt_regs *regs)
> >  
> >  	/* Restore user access locks last */
> >  	kuap_user_restore(regs);
> > +	choose_random_kstack_offset(mftb() & 0xFF);
> >  
> >  	return ret;
> >  }
> 
> So this seems to be what x86 and s390 do, but why are we choosing a
> new offset for every interrupt when it's only used on a syscall?
> I would rather you do what arm64 does and just choose the offset
> at the end of system_call_exception.
> 
> I wonder why the choose is separated from the add? I guess it's to
> avoid a data dependency for stack access on an expensive random
> function, so that makes sense (a comment would be nice in the
> generic code).

How does this read? I can send a "real" patch if it looks good:


diff --git a/include/linux/randomize_kstack.h b/include/linux/randomize_kstack.h
index 1468caf001c0..ad3e80275c74 100644
--- a/include/linux/randomize_kstack.h
+++ b/include/linux/randomize_kstack.h
@@ -40,8 +40,11 @@ DECLARE_PER_CPU(u32, kstack_offset);
  */
 #define KSTACK_OFFSET_MAX(x)	((x) & 0x3FF)
 
-/*
- * These macros must be used during syscall entry when interrupts and
+/**
+ * add_random_kstack_offset - Increase stack utilization by previously
+ *			      chosen random offset
+ *
+ * This should be used in the syscall entry path when interrupts and
  * preempt are disabled, and after user registers have been stored to
  * the stack.
  */
@@ -55,6 +58,24 @@ DECLARE_PER_CPU(u32, kstack_offset);
 	}								\
 } while (0)
 
+/**
+ * choose_random_kstack_offset - Choose the random offsset for the next
+ *				 add_random_kstack_offset()
+ *
+ * This should only be used during syscall exit when interrupts and
+ * preempt are disabled, and before user registers have been restored
+ * from the stack. This is done to frustrate attack attempts from
+ * userspace to learn the offset:
+ * - Maximize the timing uncertainty visible from userspace: if the
+ *   the offset is chosen at syscall entry, userspace has much more
+ *   control over the timing between chosen offsets. "How long will we
+ *   be in kernel mode?" tends to be more difficult to know than "how
+ *   long will be be in user mode?"
+ * - Reduce the lifetime of the new offset sitting in memory during
+ *   kernel mode execution. Exposures of "thread-local" (e.g. current,
+ *   percpu, etc) memory contents tends to be easier than arbitrary
+ *   location memory exposures.
+ */
 #define choose_random_kstack_offset(rand) do {				\
 	if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,	\
 				&randomize_kstack_offset)) {		\
Xiu Jianfeng May 11, 2022, 8:34 a.m. UTC | #3
Hi,

On 2022/5/10 17:23, Nicholas Piggin wrote:
> Excerpts from Xiu Jianfeng's message of May 5, 2022 9:19 pm:
>> Add support for adding a random offset to the stack while handling
>> syscalls. This patch uses mftb() instead of get_random_int() for better
>> performance.
> Hey, very nice.
>
>> Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
>> ---
>>   arch/powerpc/Kconfig            | 1 +
>>   arch/powerpc/kernel/interrupt.c | 3 +++
>>   2 files changed, 4 insertions(+)
>>
>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>> index 5fc9153927ac..7e04c9f80cbc 100644
>> --- a/arch/powerpc/Kconfig
>> +++ b/arch/powerpc/Kconfig
>> @@ -192,6 +192,7 @@ config PPC
>>   	select HAVE_ARCH_KASAN			if PPC32 && PPC_PAGE_SHIFT <= 14
>>   	select HAVE_ARCH_KASAN_VMALLOC		if PPC32 && PPC_PAGE_SHIFT <= 14
>>   	select HAVE_ARCH_KFENCE			if PPC_BOOK3S_32 || PPC_8xx || 40x
>> +	select HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
>>   	select HAVE_ARCH_KGDB
>>   	select HAVE_ARCH_MMAP_RND_BITS
>>   	select HAVE_ARCH_MMAP_RND_COMPAT_BITS	if COMPAT
>> diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
>> index 784ea3289c84..459385769721 100644
>> --- a/arch/powerpc/kernel/interrupt.c
>> +++ b/arch/powerpc/kernel/interrupt.c
>> @@ -4,6 +4,7 @@
>>   #include <linux/err.h>
>>   #include <linux/compat.h>
>>   #include <linux/sched/debug.h> /* for show_regs */
>> +#include <linux/randomize_kstack.h>
>>   
>>   #include <asm/kup.h>
>>   #include <asm/cputime.h>
>> @@ -82,6 +83,7 @@ notrace long system_call_exception(long r3, long r4, long r5,
>>   
>>   	kuap_lock();
>>   
>> +	add_random_kstack_offset();
>>   	regs->orig_gpr3 = r3;
>>   
>>   	if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
> This looks like the right place. I wonder why other interrupts don't
> get the same treatment. Userspace can induce the kernel to take a
> synchronous interrupt, or wait for async ones. Smaller surface area
> maybe but certain instruction emulation for example could result in
> significant logic that depends on user state. Anyway that's for
> hardening gurus to ponder.
>
>> @@ -405,6 +407,7 @@ interrupt_exit_user_prepare_main(unsigned long ret, struct pt_regs *regs)
>>   
>>   	/* Restore user access locks last */
>>   	kuap_user_restore(regs);
>> +	choose_random_kstack_offset(mftb() & 0xFF);
>>   
>>   	return ret;
>>   }
> So this seems to be what x86 and s390 do, but why are we choosing a
> new offset for every interrupt when it's only used on a syscall?
> I would rather you do what arm64 does and just choose the offset
> at the end of system_call_exception.
Thanks for your suggestion, will do in v2.
>
> I wonder why the choose is separated from the add? I guess it's to
> avoid a data dependency for stack access on an expensive random
> function, so that makes sense (a comment would be nice in the
> generic code).
>
> I don't actually know if mftb() is cheaper here than a RNG. It
> may not be conditioned all that well either. I would be tempted
#if defined(__powerpc64__) && (defined(CONFIG_PPC_CELL) || defined(CONFIG_E500))
#define mftb()          ({unsigned long rval;                           \
                         asm volatile(                                   \
                                 "90:    mfspr %0, %2;\n"                \
                                 ASM_FTR_IFSET(                          \
                                         "97:    cmpwi %0,0;\n"          \
                                         "       beq- 90b;\n", "", %1)   \
                         : "=r" (rval) \
                         : "i" (CPU_FTR_CELL_TB_BUG), "i" (SPRN_TBRL) : "cr0"); \
                         rval;})
#elif defined(CONFIG_PPC_8xx)
#define mftb()          ({unsigned long rval;   \
                         asm volatile("mftbl %0" : "=r" (rval)); rval;})
#else
#define mftb()          ({unsigned long rval;   \
                         asm volatile("mfspr %0, %1" : \
                                      "=r" (rval) : "i" (SPRN_TBRL)); rval;})
#endif /* !CONFIG_PPC_CELL */

There are three implementations of mftb() in
arch/powerpc/include/asm/vdso/timebase.h.

The last two cases have only one instruction each, which is obviously
cheaper than get_random_int().

Do you mean the first one? It looks cheaper too, or am I missing
something?

> to measure. 64-bit *may* be able to use a bit more than 256
> bytes of stack too -- we have 16 byte alignment minimum so this
> gives only 4 bits of randomness AFAIKS.

KSTACK_OFFSET_MAX limits entropy to 10 bits, and THREAD_SHIFT is 14 for
ppc64 and 13 for ppc32.

So can we just use 0x1FF for both, or 0x1FF for 64-bit and 0xFF for
32-bit? What is your suggestion?
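To put the candidate masks in proportion to the stack sizes above, here is a rough userspace sketch (not kernel code). It assumes the kernel stack is 1 << THREAD_SHIFT bytes and that the worst-case offset equals the mask value; the helper name max_offset_permille() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Worst-case share of the kernel stack consumed by the random offset,
 * in parts per thousand, for a stack of (1 << thread_shift) bytes.
 */
static unsigned int max_offset_permille(uint32_t mask, unsigned int thread_shift)
{
	assert(thread_shift < 32);

	/* 64-bit intermediate so mask * 1000 cannot overflow. */
	return (unsigned int)(((uint64_t)mask * 1000) >> thread_shift);
}
```

With these assumptions, 0x1FF on a 16 KiB stack and 0xFF on an 8 KiB stack both cost about 3% in the worst case, while the full 0x3FF would cost about 6% on 16 KiB and about 12% on 8 KiB.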

thanks.

>
> Thanks,
> Nick
> .
Xiu Jianfeng May 11, 2022, 8:36 a.m. UTC | #4
On 2022/5/11 0:19, Kees Cook wrote:
> On Tue, May 10, 2022 at 07:23:46PM +1000, Nicholas Piggin wrote:
>> Excerpts from Xiu Jianfeng's message of May 5, 2022 9:19 pm:
>>> Add support for adding a random offset to the stack while handling
>>> syscalls. This patch uses mftb() instead of get_random_int() for better
>>> performance.
>> Hey, very nice.
> Agreed! :)
>
>>> [...]
>>> @@ -82,6 +83,7 @@ notrace long system_call_exception(long r3, long r4, long r5,
>>>   
>>>   	kuap_lock();
>>>   
>>> +	add_random_kstack_offset();
>>>   	regs->orig_gpr3 = r3;
>>>   
>>>   	if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
>> This looks like the right place. I wonder why other interrupts don't
>> get the same treatment. Userspace can induce the kernel to take a
>> synchronous interrupt, or wait for async ones. Smaller surface area
>> maybe but certain instruction emulation for example could result in
>> significant logic that depends on user state. Anyway that's for
>> hardening gurus to ponder.
> I welcome it being used for any userspace controllable entry to the
> kernel! :)
>
> Also, related, have you validated the result using the LKDTM test?
> See tools/testing/selftests/lkdtm/stack-entropy.sh

Not yet. I tested it by printing the address of a local variable
directly; I will run the LKDTM test before I send v2.

thanks.

>>> @@ -405,6 +407,7 @@ interrupt_exit_user_prepare_main(unsigned long ret, struct pt_regs *regs)
>>>   
>>>   	/* Restore user access locks last */
>>>   	kuap_user_restore(regs);
>>> +	choose_random_kstack_offset(mftb() & 0xFF);
>>>   
>>>   	return ret;
>>>   }
>> So this seems to be what x86 and s390 do, but why are we choosing a
>> new offset for every interrupt when it's only used on a syscall?
>> I would rather you do what arm64 does and just choose the offset
>> at the end of system_call_exception.
>>
>> I wonder why the choose is separated from the add? I guess it's to
>> avoid a data dependency for stack access on an expensive random
>> function, so that makes sense (a comment would be nice in the
>> generic code).
> How does this read? I can send a "real" patch if it looks good:
>
>
> diff --git a/include/linux/randomize_kstack.h b/include/linux/randomize_kstack.h
> index 1468caf001c0..ad3e80275c74 100644
> --- a/include/linux/randomize_kstack.h
> +++ b/include/linux/randomize_kstack.h
> @@ -40,8 +40,11 @@ DECLARE_PER_CPU(u32, kstack_offset);
>    */
>   #define KSTACK_OFFSET_MAX(x)	((x) & 0x3FF)
>   
> -/*
> - * These macros must be used during syscall entry when interrupts and
> +/**
> + * add_random_kstack_offset - Increase stack utilization by previously
> + *			      chosen random offset
> + *
> + * This should be used in the syscall entry path when interrupts and
>    * preempt are disabled, and after user registers have been stored to
>    * the stack.
>    */
> @@ -55,6 +58,24 @@ DECLARE_PER_CPU(u32, kstack_offset);
>   	}								\
>   } while (0)
>   
> +/**
> + * choose_random_kstack_offset - Choose the random offsset for the next
> + *				 add_random_kstack_offset()
> + *
> + * This should only be used during syscall exit when interrupts and
> + * preempt are disabled, and before user registers have been restored
> + * from the stack. This is done to frustrate attack attempts from
> + * userspace to learn the offset:
> + * - Maximize the timing uncertainty visible from userspace: if the
> + *   the offset is chosen at syscall entry, userspace has much more
> + *   control over the timing between chosen offsets. "How long will we
> + *   be in kernel mode?" tends to be more difficult to know than "how
> + *   long will be be in user mode?"
> + * - Reduce the lifetime of the new offset sitting in memory during
> + *   kernel mode execution. Exposures of "thread-local" (e.g. current,
> + *   percpu, etc) memory contents tends to be easier than arbitrary
> + *   location memory exposures.
> + */
>   #define choose_random_kstack_offset(rand) do {				\
>   	if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,	\
>   				&randomize_kstack_offset)) {		\
>
>
Michael Ellerman May 12, 2022, 1:03 p.m. UTC | #5
Kees Cook <keescook@chromium.org> writes:
> On Tue, May 10, 2022 at 07:23:46PM +1000, Nicholas Piggin wrote:
...
>> 
>> I wonder why the choose is separated from the add? I guess it's to
>> avoid a data dependency for stack access on an expensive random
>> function, so that makes sense (a comment would be nice in the
>> generic code).
>
> How does this read? I can send a "real" patch if it looks good:
>
>
> diff --git a/include/linux/randomize_kstack.h b/include/linux/randomize_kstack.h
> index 1468caf001c0..ad3e80275c74 100644
> --- a/include/linux/randomize_kstack.h
> +++ b/include/linux/randomize_kstack.h
> @@ -40,8 +40,11 @@ DECLARE_PER_CPU(u32, kstack_offset);
>   */
>  #define KSTACK_OFFSET_MAX(x)	((x) & 0x3FF)
>  
> -/*
> - * These macros must be used during syscall entry when interrupts and
> +/**
> + * add_random_kstack_offset - Increase stack utilization by previously
> + *			      chosen random offset
> + *
> + * This should be used in the syscall entry path when interrupts and
 
I would say "called" rather than used, but that's a nit-pick.

>   * preempt are disabled, and after user registers have been stored to
>   * the stack.
>   */
> @@ -55,6 +58,24 @@ DECLARE_PER_CPU(u32, kstack_offset);
>  	}								\
>  } while (0)
>  
> +/**
> + * choose_random_kstack_offset - Choose the random offsset for the next
> + *				 add_random_kstack_offset()

The name "choose" tricked me into thinking the offset is used verbatim.
But it's actually xor'ed into the existing offset.

I was pretty dubious about using mftb (~= rdtsc) based on that, but the
xor makes me less worried.

Obviously you don't want to change the name now, but it would be good if
the doc comment mentioned that the value is combined with the existing
value, not used as-is.
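The combining behaviour described above can be modelled in userspace. This is only an illustrative sketch under the assumption that the value is XORed into the stored state and masked when it is applied; it is not the kernel macro, and the names model_kstack_offset, model_choose() and model_add() are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the KSTACK_OFFSET_MAX mask shown in the quoted diff. */
#define MODEL_OFFSET_MAX(x)	((x) & 0x3FF)

/* Stand-in for the per-CPU kstack_offset variable. */
static uint32_t model_kstack_offset;

/* Syscall exit: fold the new value into the stored state via XOR. */
static void model_choose(uint32_t rand)
{
	model_kstack_offset ^= rand;
}

/* Syscall entry: number of stack bytes the next syscall would skip. */
static uint32_t model_add(void)
{
	uint32_t off = MODEL_OFFSET_MAX(model_kstack_offset);

	assert(off <= 0x3FF);
	return off;
}
```

In this model the offset seen at entry depends on every value passed in so far, not only the most recent one, so an observer who learns a single mftb() reading still needs the earlier state to predict the result.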

> + * This should only be used during syscall exit when interrupts and
> + * preempt are disabled, and before user registers have been restored
> + * from the stack. This is done to frustrate attack attempts from
> + * userspace to learn the offset:
> + * - Maximize the timing uncertainty visible from userspace: if the
> + *   the offset is chosen at syscall entry, userspace has much more

You have a "the the" across the line-break there.

> + *   control over the timing between chosen offsets. "How long will we
> + *   be in kernel mode?" tends to be more difficult to know than "how
> + *   long will be be in user mode?"
> + * - Reduce the lifetime of the new offset sitting in memory during
> + *   kernel mode execution. Exposures of "thread-local" (e.g. current,
> + *   percpu, etc) memory contents tends to be easier than arbitrary
> + *   location memory exposures.
> + */
>  #define choose_random_kstack_offset(rand) do {				\
>  	if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,	\
>  				&randomize_kstack_offset)) {		\
>

cheers
Michael Ellerman May 12, 2022, 1:17 p.m. UTC | #6
xiujianfeng <xiujianfeng@huawei.com> writes:
> On 2022/5/10 17:23, Nicholas Piggin wrote:
>> Excerpts from Xiu Jianfeng's message of May 5, 2022 9:19 pm:
>>> Add support for adding a random offset to the stack while handling
>>> syscalls. This patch uses mftb() instead of get_random_int() for better
>>> performance.
>>
...
>>
>>> @@ -405,6 +407,7 @@ interrupt_exit_user_prepare_main(unsigned long ret, struct pt_regs *regs)
>>>
>>>   	/* Restore user access locks last */
>>>   	kuap_user_restore(regs);
>>> +	choose_random_kstack_offset(mftb() & 0xFF);
>>>
>>>   	return ret;
>>>   }
>> So this seems to be what x86 and s390 do, but why are we choosing a
>> new offset for every interrupt when it's only used on a syscall?
>> I would rather you do what arm64 does and just choose the offset
>> at the end of system_call_exception.
> Thanks for your suggestion, will do in v2.
>>
>> I wonder why the choose is separated from the add? I guess it's to
>> avoid a data dependency for stack access on an expensive random
>> function, so that makes sense (a comment would be nice in the
>> generic code).
>>
>> I don't actually know if mftb() is cheaper here than a RNG. It
>> may not be conditioned all that well either. I would be tempted

> #if defined(__powerpc64__) && (defined(CONFIG_PPC_CELL) || defined(CONFIG_E500))
> #define mftb()          ({unsigned long rval;                           \
>                          asm volatile(                                   \
>                                  "90:    mfspr %0, %2;\n"                \
>                                  ASM_FTR_IFSET(                          \
>                                          "97:    cmpwi %0,0;\n"          \
>                                          "       beq- 90b;\n", "", %1)   \
>                          : "=r" (rval) \
>                          : "i" (CPU_FTR_CELL_TB_BUG), "i" (SPRN_TBRL) : "cr0"); \
>                          rval;})
> #elif defined(CONFIG_PPC_8xx)
> #define mftb()          ({unsigned long rval;   \
>                          asm volatile("mftbl %0" : "=r" (rval)); rval;})
> #else
> #define mftb()          ({unsigned long rval;   \
>                          asm volatile("mfspr %0, %1" : \
>                                       "=r" (rval) : "i" (SPRN_TBRL)); rval;})
> #endif /* !CONFIG_PPC_CELL */
>
> There are three implementations of mftb() in
> arch/powerpc/include/asm/vdso/timebase.h.
>
> The last two cases have only one instruction each, which is obviously
> cheaper than get_random_int().

Just because it's one instruction doesn't mean it's obviously cheaper.
On some CPUs mftb takes 10s of cycles, and can also stall the pipeline.

But looking at get_random_u32() it does look pretty complicated, it
takes a lock and so on. It's also silly to call get_random_u32() for
4-bits of randomness.

My initial impression was that mftb() is too predictable to be useful
against a determined attacker. But looking closer I see that
choose_random_kstack_offset() xor's the value we pass with the existing
value. So that makes me less worried about using mftb().

We could additionally call choose_random_kstack_offset(get_random_int())
less regularly, eg. during context switch. But I guess that's too
infrequent to actually make any difference.

But limiting it to 4-bits of randomness seems insufficient. It seems
like we should allow the full 6 (10) bits, and anyone turning this
option on should probably also consider increasing their stack size.

Also did you check the help text about stack-protector under
HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET?

cheers
Xiu Jianfeng May 16, 2022, 7:29 a.m. UTC | #7
On 2022/5/12 21:17, Michael Ellerman wrote:
> xiujianfeng <xiujianfeng@huawei.com> writes:
>> On 2022/5/10 17:23, Nicholas Piggin wrote:
>>> Excerpts from Xiu Jianfeng's message of May 5, 2022 9:19 pm:
>>>> Add support for adding a random offset to the stack while handling
>>>> syscalls. This patch uses mftb() instead of get_random_int() for better
>>>> performance.
> ...
>>>> @@ -405,6 +407,7 @@ interrupt_exit_user_prepare_main(unsigned long ret, struct pt_regs *regs)
>>>>
>>>>    	/* Restore user access locks last */
>>>>    	kuap_user_restore(regs);
>>>> +	choose_random_kstack_offset(mftb() & 0xFF);
>>>>
>>>>    	return ret;
>>>>    }
>>> So this seems to be what x86 and s390 do, but why are we choosing a
>>> new offset for every interrupt when it's only used on a syscall?
>>> I would rather you do what arm64 does and just choose the offset
>>> at the end of system_call_exception.
>> Thanks for your suggestion, will do in v2.
>>> I wonder why the choose is separated from the add? I guess it's to
>>> avoid a data dependency for stack access on an expensive random
>>> function, so that makes sense (a comment would be nice in the
>>> generic code).
>>>
>>> I don't actually know if mftb() is cheaper here than a RNG. It
>>> may not be conditioned all that well either. I would be tempted
>> #if defined(__powerpc64__) && (defined(CONFIG_PPC_CELL) || defined(CONFIG_E500))
>> #define mftb()          ({unsigned long rval;                           \
>>                           asm volatile(                                   \
>>                                   "90:    mfspr %0, %2;\n"                \
>>                                   ASM_FTR_IFSET(                          \
>>                                           "97:    cmpwi %0,0;\n"          \
>>                                           "       beq- 90b;\n", "", %1)   \
>>                           : "=r" (rval) \
>>                           : "i" (CPU_FTR_CELL_TB_BUG), "i" (SPRN_TBRL) : "cr0"); \
>>                           rval;})
>> #elif defined(CONFIG_PPC_8xx)
>> #define mftb()          ({unsigned long rval;   \
>>                           asm volatile("mftbl %0" : "=r" (rval)); rval;})
>> #else
>> #define mftb()          ({unsigned long rval;   \
>>                           asm volatile("mfspr %0, %1" : \
>>                                        "=r" (rval) : "i" (SPRN_TBRL)); rval;})
>> #endif /* !CONFIG_PPC_CELL */
>>
>> There are three implementations of mftb() in
>> arch/powerpc/include/asm/vdso/timebase.h.
>>
>> The last two cases have only one instruction each, which is obviously
>> cheaper than get_random_int().
> Just because it's one instruction doesn't mean it's obviously cheaper.
> On some CPUs mftb takes 10s of cycles, and can also stall the pipeline.
>
> But looking at get_random_u32() it does look pretty complicated, it
> takes a lock and so on. It's also silly to call get_random_u32() for
> 4-bits of randomness.
>
> My initial impression was that mftb() is too predictable to be useful
> against a determined attacker. But looking closer I see that
> choose_random_kstack_offset() xor's the value we pass with the existing
> value. So that makes me less worried about using mftb().
>
> We could additionally call choose_random_kstack_offset(get_random_int())
> less regularly, eg. during context switch. But I guess that's too
> infrequent to actually make any difference.
>
> But limiting it to 4-bits of randomness seems insufficient. It seems
> like we should allow the full 6 (10) bits, and anyone turning this
> option on should probably also consider increasing their stack size.
>
> Also did you check the help text about stack-protector under
> HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET?

Thanks for the reminder, I will disable stack-protector for interrupt.c
in v2, just as arm64 does.

>
> cheers

Patch

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 5fc9153927ac..7e04c9f80cbc 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -192,6 +192,7 @@  config PPC
 	select HAVE_ARCH_KASAN			if PPC32 && PPC_PAGE_SHIFT <= 14
 	select HAVE_ARCH_KASAN_VMALLOC		if PPC32 && PPC_PAGE_SHIFT <= 14
 	select HAVE_ARCH_KFENCE			if PPC_BOOK3S_32 || PPC_8xx || 40x
+	select HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
 	select HAVE_ARCH_KGDB
 	select HAVE_ARCH_MMAP_RND_BITS
 	select HAVE_ARCH_MMAP_RND_COMPAT_BITS	if COMPAT
diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
index 784ea3289c84..459385769721 100644
--- a/arch/powerpc/kernel/interrupt.c
+++ b/arch/powerpc/kernel/interrupt.c
@@ -4,6 +4,7 @@ 
 #include <linux/err.h>
 #include <linux/compat.h>
 #include <linux/sched/debug.h> /* for show_regs */
+#include <linux/randomize_kstack.h>
 
 #include <asm/kup.h>
 #include <asm/cputime.h>
@@ -82,6 +83,7 @@  notrace long system_call_exception(long r3, long r4, long r5,
 
 	kuap_lock();
 
+	add_random_kstack_offset();
 	regs->orig_gpr3 = r3;
 
 	if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
@@ -405,6 +407,7 @@  interrupt_exit_user_prepare_main(unsigned long ret, struct pt_regs *regs)
 
 	/* Restore user access locks last */
 	kuap_user_restore(regs);
+	choose_random_kstack_offset(mftb() & 0xFF);
 
 	return ret;
 }