[v8,0/6] Optionally randomize kernel stack offset each syscall

Message ID 20210330205750.428816-1-keescook@chromium.org (mailing list archive)

Kees Cook March 30, 2021, 8:57 p.m. UTC
v8:
- switch to __this_cpu_*() (tglx)
- improve commit log details, comments, and masking (ingo, tglx)
v7: https://lore.kernel.org/lkml/20210319212835.3928492-1-keescook@chromium.org/
v6: https://lore.kernel.org/lkml/20210315180229.1224655-1-keescook@chromium.org/
v5: https://lore.kernel.org/lkml/20210309214301.678739-1-keescook@chromium.org/
v4: https://lore.kernel.org/lkml/20200622193146.2985288-1-keescook@chromium.org/
v3: https://lore.kernel.org/lkml/20200406231606.37619-1-keescook@chromium.org/
v2: https://lore.kernel.org/lkml/20200324203231.64324-1-keescook@chromium.org/
rfc: https://lore.kernel.org/kernel-hardening/20190329081358.30497-1-elena.reshetova@intel.com/

Hi,

This is a continuation and refactoring of Elena's earlier effort to add
kernel stack base offset randomization. In the time since the earlier
discussions, two attacks[1][2] were made public that depended on stack
determinism, so we're no longer in the position of "this is a good idea
but we have no examples of attacks". :)

Earlier discussions also devolved into debates on entropy sources, which
is mostly a red herring, given the already low entropy available due
to stack size. Regardless, entropy can be changed/improved separately
from this series as needed.

Earlier discussions also got stuck debating how much syscall overhead
was too much, but this is also a red herring since the feature itself
needs to be selectable at boot with no cost for those that don't want it:
this is solved here with static branches.

So, here is the latest improved version, made as arch-agnostic as
possible, with usage added for x86 and arm64. It also includes some small
static branch clean ups, and addresses some surprise performance issues
due to the stack canary[3].

At the very least, the first two patches can land separately (already
Acked and Reviewed), since they're kind of "separate", but introduce
macros that are used in the core stack changes.

If I can get an Ack from an arm64 maintainer, I think this could all
land via -tip to make merging easiest.

Thanks!

-Kees

[1] https://a13xp0p0v.github.io/2020/02/15/CVE-2019-18683.html
[2] https://repositorio-aberto.up.pt/bitstream/10216/125357/2/374717.pdf
[3] https://lore.kernel.org/lkml/202003281520.A9BFF461@keescook/


Kees Cook (6):
  jump_label: Provide CONFIG-driven build state defaults
  init_on_alloc: Optimize static branches
  stack: Optionally randomize kernel stack offset each syscall
  x86/entry: Enable random_kstack_offset support
  arm64: entry: Enable random_kstack_offset support
  lkdtm: Add REPORT_STACK for checking stack offsets

 .../admin-guide/kernel-parameters.txt         | 11 ++++
 Makefile                                      |  4 ++
 arch/Kconfig                                  | 23 ++++++++
 arch/arm64/Kconfig                            |  1 +
 arch/arm64/kernel/Makefile                    |  5 ++
 arch/arm64/kernel/syscall.c                   | 16 ++++++
 arch/x86/Kconfig                              |  1 +
 arch/x86/entry/common.c                       |  3 +
 arch/x86/include/asm/entry-common.h           | 16 ++++++
 drivers/misc/lkdtm/bugs.c                     | 17 ++++++
 drivers/misc/lkdtm/core.c                     |  1 +
 drivers/misc/lkdtm/lkdtm.h                    |  1 +
 include/linux/jump_label.h                    | 19 +++++++
 include/linux/mm.h                            | 10 ++--
 include/linux/randomize_kstack.h              | 55 +++++++++++++++++++
 init/main.c                                   | 23 ++++++++
 mm/page_alloc.c                               |  4 +-
 mm/slab.h                                     |  6 +-
 18 files changed, 208 insertions(+), 8 deletions(-)
 create mode 100644 include/linux/randomize_kstack.h

Comments

Thomas Gleixner March 31, 2021, 7:53 a.m. UTC | #1
On Tue, Mar 30 2021 at 13:57, Kees Cook wrote:
> +/*
> + * Do not use this anywhere else in the kernel. This is used here because
> + * it provides an arch-agnostic way to grow the stack with correct
> + * alignment. Also, since this use is being explicitly masked to a max of
> + * 10 bits, stack-clash style attacks are unlikely. For more details see
> + * "VLAs" in Documentation/process/deprecated.rst
> + * The asm statement is designed to convince the compiler to keep the
> + * allocation around even after "ptr" goes out of scope.

Nit. That explanation of "ptr" might be better placed right at the
add_random...() macro.

> + */
> +void *__builtin_alloca(size_t size);
> +/*
> + * Use, at most, 10 bits of entropy. We explicitly cap this to keep the
> + * "VLA" from being unbounded (see above). 10 bits leaves enough room for
> + * per-arch offset masks to reduce entropy (by removing higher bits, since
> + * high entropy may overly constrain usable stack space), and for
> + * compiler/arch-specific stack alignment to remove the lower bits.
> + */
> +#define KSTACK_OFFSET_MAX(x)	((x) & 0x3FF)
> +
> +/*
> + * These macros must be used during syscall entry when interrupts and
> + * preempt are disabled, and after user registers have been stored to
> + * the stack.
> + */
> +#define add_random_kstack_offset() do {					\
> +	if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,	\
> +				&randomize_kstack_offset)) {		\
> +		u32 offset = __this_cpu_read(kstack_offset);		\
> +		u8 *ptr = __builtin_alloca(KSTACK_OFFSET_MAX(offset));	\
> +		asm volatile("" : "=m"(*ptr) :: "memory");		\
> +	}								\
> +} while (0)

Other than that.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
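
To make the 10-bit cap concrete, here is a minimal userspace sketch (not
kernel code). KSTACK_OFFSET_MAX() is copied from the patch above; the
helper kstack_alloca_size() is a hypothetical name introduced only for
illustration, standing in for the size that would be handed to
__builtin_alloca().

```c
#include <assert.h>
#include <stdint.h>

/* Same mask as in the patch: keep only the low 10 bits. */
#define KSTACK_OFFSET_MAX(x)	((x) & 0x3FF)

/*
 * Hypothetical helper (not in the patch): models the request size that
 * add_random_kstack_offset() would pass to __builtin_alloca() for a
 * given per-CPU offset value.
 */
static inline uint32_t kstack_alloca_size(uint32_t offset)
{
	uint32_t size = KSTACK_OFFSET_MAX(offset);

	/* 0x3FF is 1023, so the request can never reach 1 KiB. */
	assert(size <= 1023);
	return size;
}
```

Whatever value the per-CPU offset holds, the requested size stays within
0..1023 bytes, which is what keeps the "VLA" bounded.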
Kees Cook March 31, 2021, 9:54 p.m. UTC | #2
On Wed, Mar 31, 2021 at 09:53:26AM +0200, Thomas Gleixner wrote:
> On Tue, Mar 30 2021 at 13:57, Kees Cook wrote:
> > +/*
> > + * Do not use this anywhere else in the kernel. This is used here because
> > + * it provides an arch-agnostic way to grow the stack with correct
> > + * alignment. Also, since this use is being explicitly masked to a max of
> > + * 10 bits, stack-clash style attacks are unlikely. For more details see
> > + * "VLAs" in Documentation/process/deprecated.rst
> > + * The asm statement is designed to convince the compiler to keep the
> > + * allocation around even after "ptr" goes out of scope.
> 
> Nit. That explanation of "ptr" might be better placed right at the
> add_random...() macro.

Ah, yes! Fixed in v9.

> Other than that.
> 
> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

Thank you for the reviews!

Do you want to take this via -tip (and leave off the arm64 patch until
it is acked), or would you rather it go via arm64? (I've sent v9 now...)
Thomas Gleixner March 31, 2021, 10:38 p.m. UTC | #3
On Wed, Mar 31 2021 at 14:54, Kees Cook wrote:
> On Wed, Mar 31, 2021 at 09:53:26AM +0200, Thomas Gleixner wrote:
>> On Tue, Mar 30 2021 at 13:57, Kees Cook wrote:
>> > +/*
>> > + * Do not use this anywhere else in the kernel. This is used here because
>> > + * it provides an arch-agnostic way to grow the stack with correct
>> > + * alignment. Also, since this use is being explicitly masked to a max of
>> > + * 10 bits, stack-clash style attacks are unlikely. For more details see
>> > + * "VLAs" in Documentation/process/deprecated.rst
>> > + * The asm statement is designed to convince the compiler to keep the
>> > + * allocation around even after "ptr" goes out of scope.
>> 
>> Nit. That explanation of "ptr" might be better placed right at the
>> add_random...() macro.
>
> Ah, yes! Fixed in v9.

Hmm, looking at V9 the "ptr" thing got lost ....

> +/*
> + * Do not use this anywhere else in the kernel. This is used here because
> + * it provides an arch-agnostic way to grow the stack with correct
> + * alignment. Also, since this use is being explicitly masked to a max of
> + * 10 bits, stack-clash style attacks are unlikely. For more details see
> + * "VLAs" in Documentation/process/deprecated.rst
> + */
> +void *__builtin_alloca(size_t size);
> +/*
> + * Use, at most, 10 bits of entropy. We explicitly cap this to keep the
> + * "VLA" from being unbounded (see above). 10 bits leaves enough room for
> + * per-arch offset masks to reduce entropy (by removing higher bits, since
> + * high entropy may overly constrain usable stack space), and for
> + * compiler/arch-specific stack alignment to remove the lower bits.
> + */
> +#define KSTACK_OFFSET_MAX(x)	((x) & 0x3FF)
> +
> +/*
> + * These macros must be used during syscall entry when interrupts and
> + * preempt are disabled, and after user registers have been stored to
> + * the stack.
> + */
> +#define add_random_kstack_offset() do {					\

> Do you want to take this via -tip (and leave off the arm64 patch until
> it is acked), or would you rather it go via arm64? (I've sent v9 now...)

Either way is fine.

Thanks,

        tglx
Kees Cook April 1, 2021, 6:31 a.m. UTC | #4
On Thu, Apr 01, 2021 at 12:38:31AM +0200, Thomas Gleixner wrote:
> On Wed, Mar 31 2021 at 14:54, Kees Cook wrote:
> > On Wed, Mar 31, 2021 at 09:53:26AM +0200, Thomas Gleixner wrote:
> >> On Tue, Mar 30 2021 at 13:57, Kees Cook wrote:
> >> > +/*
> >> > + * Do not use this anywhere else in the kernel. This is used here because
> >> > + * it provides an arch-agnostic way to grow the stack with correct
> >> > + * alignment. Also, since this use is being explicitly masked to a max of
> >> > + * 10 bits, stack-clash style attacks are unlikely. For more details see
> >> > + * "VLAs" in Documentation/process/deprecated.rst
> >> > + * The asm statement is designed to convince the compiler to keep the
> >> > + * allocation around even after "ptr" goes out of scope.
> >> 
> >> Nit. That explanation of "ptr" might be better placed right at the
> >> add_random...() macro.
> >
> > Ah, yes! Fixed in v9.
> 
> Hmm, looking at V9 the "ptr" thing got lost ....

I put the comment inline in the macro directly above the asm().

> > Do you want to take this via -tip (and leave off the arm64 patch until
> > it is acked), or would you rather it go via arm64? (I've sent v9 now...)
> 
> Either way is fine.

Since the arm64 folks have been a bit busy, can you just put this in
-tip and leave off the arm64 patch for now?

Thanks!
Will Deacon April 1, 2021, 8:30 a.m. UTC | #5
On Tue, Mar 30, 2021 at 01:57:47PM -0700, Kees Cook wrote:
> diff --git a/include/linux/randomize_kstack.h b/include/linux/randomize_kstack.h
> new file mode 100644
> index 000000000000..351520803006
> --- /dev/null
> +++ b/include/linux/randomize_kstack.h
> @@ -0,0 +1,55 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +#ifndef _LINUX_RANDOMIZE_KSTACK_H
> +#define _LINUX_RANDOMIZE_KSTACK_H
> +
> +#include <linux/kernel.h>
> +#include <linux/jump_label.h>
> +#include <linux/percpu-defs.h>
> +
> +DECLARE_STATIC_KEY_MAYBE(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
> +			 randomize_kstack_offset);
> +DECLARE_PER_CPU(u32, kstack_offset);
> +
> +/*
> + * Do not use this anywhere else in the kernel. This is used here because
> + * it provides an arch-agnostic way to grow the stack with correct
> + * alignment. Also, since this use is being explicitly masked to a max of
> + * 10 bits, stack-clash style attacks are unlikely. For more details see
> + * "VLAs" in Documentation/process/deprecated.rst
> + * The asm statement is designed to convince the compiler to keep the
> + * allocation around even after "ptr" goes out of scope.
> + */
> +void *__builtin_alloca(size_t size);
> +/*
> + * Use, at most, 10 bits of entropy. We explicitly cap this to keep the
> + * "VLA" from being unbounded (see above). 10 bits leaves enough room for
> + * per-arch offset masks to reduce entropy (by removing higher bits, since
> + * high entropy may overly constrain usable stack space), and for
> + * compiler/arch-specific stack alignment to remove the lower bits.
> + */
> +#define KSTACK_OFFSET_MAX(x)	((x) & 0x3FF)
> +
> +/*
> + * These macros must be used during syscall entry when interrupts and
> + * preempt are disabled, and after user registers have been stored to
> + * the stack.
> + */
> +#define add_random_kstack_offset() do {					\
> +	if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,	\
> +				&randomize_kstack_offset)) {		\
> +		u32 offset = __this_cpu_read(kstack_offset);		\
> +		u8 *ptr = __builtin_alloca(KSTACK_OFFSET_MAX(offset));	\
> +		asm volatile("" : "=m"(*ptr) :: "memory");		\

Using the "m" constraint here is dangerous if you don't actually evaluate it
inside the asm. For example, if the compiler decides to generate an
addressing mode relative to the stack but with writeback (autodecrement), then
the stack pointer will be off by 8 bytes. Can you use "o" instead?

Will
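
As a rough userspace illustration of the construct under discussion (a
sketch, assuming GCC or Clang on a target whose stack grows down, such as
x86-64 or arm64), the following uses the "o" constraint suggested above.
probe_frame() and stack_addr_with_offset() are hypothetical names, not
part of the patch, and the offset is passed in as a parameter instead of
being read from a per-CPU variable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same mask as in the patch: at most 10 bits of offset. */
#define KSTACK_OFFSET_MAX(x)	((x) & 0x3FF)

/* Hypothetical probe: reports where a callee's frame ends up. */
static uintptr_t __attribute__((noinline)) probe_frame(void)
{
	return (uintptr_t)__builtin_frame_address(0);
}

/*
 * Hypothetical userspace analogue of add_random_kstack_offset(): grow
 * the stack by a masked amount, then call one level further down.
 */
uintptr_t stack_addr_with_offset(uint32_t offset)
{
	size_t size = KSTACK_OFFSET_MAX(offset);
	uint8_t *ptr;
	uintptr_t addr;

	/* The allocation is bounded by the mask. */
	assert(size <= 0x3FF);
	ptr = __builtin_alloca(size);

	/* Empty asm with an offsettable memory output keeps the allocation. */
	__asm__ volatile("" : "=o"(*ptr) :: "memory");

	addr = probe_frame();
	/* Barrier after the call so it cannot be turned into a tail call. */
	__asm__ volatile("" ::: "memory");
	return addr;
}
```

On such a target, a larger masked offset should place the callee's frame
at a lower address. This only shows the allocation being kept; it does
not by itself exercise the writeback addressing-mode hazard described
above.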
David Laight April 1, 2021, 11:15 a.m. UTC | #6
From: Will Deacon
> Sent: 01 April 2021 09:31
...
> > +/*
> > + * These macros must be used during syscall entry when interrupts and
> > + * preempt are disabled, and after user registers have been stored to
> > + * the stack.
> > + */
> > +#define add_random_kstack_offset() do {					\
> > +	if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,	\
> > +				&randomize_kstack_offset)) {		\
> > +		u32 offset = __this_cpu_read(kstack_offset);		\
> > +		u8 *ptr = __builtin_alloca(KSTACK_OFFSET_MAX(offset));	\
> > +		asm volatile("" : "=m"(*ptr) :: "memory");		\
> 
> Using the "m" constraint here is dangerous if you don't actually evaluate it
> inside the asm. For example, if the compiler decides to generate an
> addressing mode relative to the stack but with writeback (autodecrement), then
> the stack pointer will be off by 8 bytes. Can you use "o" instead?

Is it allowed to use such a mode?
It would have to know that the "m" was substituted exactly once.
I think there are quite a few examples with 'strange' uses of memory
asm arguments.

However, in this case, isn't it enough to ensure the address is 'saved'?
So:
	asm volatile("" : "=r"(ptr) );
should be enough.

	David

Roy Yang April 1, 2021, 7:17 p.m. UTC | #7
Both Android and Chrome OS really want this feature; for Container-Optimized OS, we have customers
interested in the defense too.

Thank you very much.

Change-Id: I1eb1b726007aa8f9c374b934cc1c690fb4924aa3
Al Viro April 1, 2021, 7:48 p.m. UTC | #8
On Thu, Apr 01, 2021 at 12:17:44PM -0700, Roy Yang wrote:
> Both Android and Chrome OS really want this feature; For Container-Optimized OS, we have customers
> interested in the defense too.
> 
> Thank you very much.
> 
> Change-Id: I1eb1b726007aa8f9c374b934cc1c690fb4924aa3

	You forgot to tell what patch you are referring to.  Your
Change-Id (whatever the hell that is) doesn't help at all.  Don't
assume that keys in your internal database make sense for the
rest of the world, especially when they appear to contain a hash
of something...
Theodore Ts'o April 1, 2021, 8:13 p.m. UTC | #9
On Thu, Apr 01, 2021 at 07:48:30PM +0000, Al Viro wrote:
> On Thu, Apr 01, 2021 at 12:17:44PM -0700, Roy Yang wrote:
> > Both Android and Chrome OS really want this feature; For Container-Optimized OS, we have customers
> > interested in the defense too.
> > 
> > Thank you very much.
> > 
> > Change-Id: I1eb1b726007aa8f9c374b934cc1c690fb4924aa3
> 
> 	You forgot to tell what patch you are referring to.  Your
> Change-Id (whatever the hell that is) doesn't help at all.  Don't
> assume that keys in your internal database make sense for the
> rest of the world, especially when they appear to contain a hash
> of something...

The Change-Id fails to have any direct search hits at lore.kernel.org.
However, it does turn up Roy's original patch, and clicking on the
message-Id in the "In-Reply-To" field, it appears Roy was replying to
this message:

https://lore.kernel.org/lkml/20210330205750.428816-1-keescook@chromium.org/

which is the head of this patch series:

Subject: [PATCH v8 0/6] Optionally randomize kernel stack offset each syscall

That being said, it would have been better if the original subject
line had been preserved, and it's yet another example of how the
lore.kernel.org URL is infinitely better than the Change-Id.  :-)

		       		  	      - Ted
Kees Cook April 1, 2021, 9:46 p.m. UTC | #10
On Thu, Apr 01, 2021 at 12:17:44PM -0700, Roy Yang wrote:
> Both Android and Chrome OS really want this feature; For Container-Optimized OS, we have customers
> interested in the defense too.

It's pretty close! There are a couple of recent comments that need to be
addressed, but hopefully it can land if x86 and arm64 maintainers are
happy with v10.

> Change-Id: I1eb1b726007aa8f9c374b934cc1c690fb4924aa3
> -- 
> 2.31.0.208.g409f899ff0-goog

And to let other folks know, I'm guessing this email got sent with git
send-email to try to get a valid In-Reply-To header, but I guess git
trashed the Subject and ran hooks to generate a Change-Id UUID.

I assume it's from following the "Reply instructions" at the bottom of:
https://lore.kernel.org/lkml/20210330205750.428816-1-keescook@chromium.org/
(It seems those need clarification about Subject handling.)
Kees Cook April 1, 2021, 10:42 p.m. UTC | #11
On Thu, Apr 01, 2021 at 11:15:43AM +0000, David Laight wrote:
> From: Will Deacon
> > Sent: 01 April 2021 09:31
> ...
> > > +/*
> > > + * These macros must be used during syscall entry when interrupts and
> > > + * preempt are disabled, and after user registers have been stored to
> > > + * the stack.
> > > + */
> > > +#define add_random_kstack_offset() do {					\
> > > +	if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,	\
> > > +				&randomize_kstack_offset)) {		\
> > > +		u32 offset = __this_cpu_read(kstack_offset);		\
> > > +		u8 *ptr = __builtin_alloca(KSTACK_OFFSET_MAX(offset));	\
> > > +		asm volatile("" : "=m"(*ptr) :: "memory");		\
> > 
> > Using the "m" constraint here is dangerous if you don't actually evaluate it
> > inside the asm. For example, if the compiler decides to generate an
> > addressing mode relative to the stack but with writeback (autodecrement), then
> > the stack pointer will be off by 8 bytes. Can you use "o" instead?

I see other examples of empty asm, but it's true, none are using "=m"
constraints. But, yes, using "o" appears to work happily.

> Is it allowed to use such a mode?
> It would have to know that the "m" was substituted exactly once.
> I think there are quite a few examples with 'strange' uses of memory
> asm arguments.
> 
> However, in this case, isn't it enough to ensure the address is 'saved'?
> So:
> 	asm volatile("" : "=r"(ptr) );
> should be enough.

It isn't, it seems.

Here's a comparison:

https://godbolt.org/z/xYGn9GfGY

So, I'll resend with "o", and with raw_cpu_*().

Thanks!