
fork: make whole stack_canary random

Message ID 1477922641-2221-1-git-send-email-jann@thejh.net (mailing list archive)
State New, archived

Commit Message

Jann Horn Oct. 31, 2016, 2:04 p.m. UTC
On machines with sizeof(unsigned long)==8, this ensures that the more
significant 32 bits of stack_canary are random, too.
stack_canary is defined as unsigned long, all the architectures with stack
protector support already pick the stack_canary of init as a random
unsigned long, and get_random_long() should be as fast as get_random_int(),
so there seems to be no good reason against this.

This should help if someone tries to guess a stack canary with brute force.

(This change has been made in PaX already, with a different RNG.)

Signed-off-by: Jann Horn <jann@thejh.net>
---
 kernel/fork.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Kees Cook Oct. 31, 2016, 4:04 p.m. UTC | #1
On Mon, Oct 31, 2016 at 7:04 AM, Jann Horn <jann@thejh.net> wrote:
> On machines with sizeof(unsigned long)==8, this ensures that the more
> significant 32 bits of stack_canary are random, too.
> stack_canary is defined as unsigned long, all the architectures with stack
> protector support already pick the stack_canary of init as a random
> unsigned long, and get_random_long() should be as fast as get_random_int(),
> so there seems to be no good reason against this.
>
> This should help if someone tries to guess a stack canary with brute force.
>
> (This change has been made in PaX already, with a different RNG.)
>
> Signed-off-by: Jann Horn <jann@thejh.net>

Acked-by: Kees Cook <keescook@chromium.org>

(A separate change might be to make sure that the leading byte is
zeroed. Entropy of the value, I think, is less important than blocking
canary exposures from unbounded str* functions. Brute forcing kernel
stack canaries isn't like brute forcing them in userspace...)

-Kees

> ---
>  kernel/fork.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 623259fc794d..d577e2c5d14f 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -518,7 +518,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
>         set_task_stack_end_magic(tsk);
>
>  #ifdef CONFIG_CC_STACKPROTECTOR
> -       tsk->stack_canary = get_random_int();
> +       tsk->stack_canary = get_random_long();
>  #endif
>
>         /*
> --
> 2.1.4
>
Jann Horn Oct. 31, 2016, 4:29 p.m. UTC | #2
On Mon, Oct 31, 2016 at 09:04:02AM -0700, Kees Cook wrote:
> On Mon, Oct 31, 2016 at 7:04 AM, Jann Horn <jann@thejh.net> wrote:
> > On machines with sizeof(unsigned long)==8, this ensures that the more
> > significant 32 bits of stack_canary are random, too.
> > stack_canary is defined as unsigned long, all the architectures with stack
> > protector support already pick the stack_canary of init as a random
> > unsigned long, and get_random_long() should be as fast as get_random_int(),
> > so there seems to be no good reason against this.
> >
> > This should help if someone tries to guess a stack canary with brute force.
> >
> > (This change has been made in PaX already, with a different RNG.)
> >
> > Signed-off-by: Jann Horn <jann@thejh.net>
> 
> Acked-by: Kees Cook <keescook@chromium.org>
> 
> (A separate change might be to make sure that the leading byte is
> zeroed. Entropy of the value, I think, is less important than blocking
> canary exposures from unbounded str* functions. Brute forcing kernel
> stack canaries isn't like brute forcing them in userspace...)

Yeah, makes sense. Especially on 64bit, 56 bits of entropy ought to be
enough anyway.
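As a rough sketch of how the two things could be combined (a sketch only;
get_random_canary() and CANARY_MASK are made-up names here, not something
this patch introduces), assuming a 64-bit little-endian layout where the
lowest-addressed byte of the canary is its least significant byte:

	/*
	 * Sketch: on 64-bit, force the first byte of the canary in memory
	 * to zero so unbounded str* copies terminate on it; on 32-bit,
	 * keep all 32 random bits.
	 */
	#ifdef CONFIG_64BIT
	# define CANARY_MASK 0xffffffffffffff00UL	/* clear the low byte */
	#else
	# define CANARY_MASK 0xffffffffUL
	#endif

	static inline unsigned long get_random_canary(void)
	{
		return get_random_long() & CANARY_MASK;
	}

dup_task_struct() could then assign tsk->stack_canary = get_random_canary()
instead of calling get_random_long() directly.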
Florian Weimer Oct. 31, 2016, 8:45 p.m. UTC | #3
* Jann Horn:

> On Mon, Oct 31, 2016 at 09:04:02AM -0700, Kees Cook wrote:
>> On Mon, Oct 31, 2016 at 7:04 AM, Jann Horn <jann@thejh.net> wrote:
>> > On machines with sizeof(unsigned long)==8, this ensures that the more
>> > significant 32 bits of stack_canary are random, too.
>> > stack_canary is defined as unsigned long, all the architectures with stack
>> > protector support already pick the stack_canary of init as a random
>> > unsigned long, and get_random_long() should be as fast as get_random_int(),
>> > so there seems to be no good reason against this.
>> >
>> > This should help if someone tries to guess a stack canary with brute force.
>> >
>> > (This change has been made in PaX already, with a different RNG.)
>> >
>> > Signed-off-by: Jann Horn <jann@thejh.net>
>> 
>> Acked-by: Kees Cook <keescook@chromium.org>
>> 
>> (A separate change might be to make sure that the leading byte is
>> zeroed. Entropy of the value, I think, is less important than blocking
>> canary exposures from unbounded str* functions. Brute forcing kernel
>> stack canaries isn't like brute forcing them in userspace...)
>
> Yeah, makes sense. Especially on 64bit, 56 bits of entropy ought to be
> enough anyway.

So you two approve of the way glibc does this currently?  (See the
other thread.)

I was under the impression that the kernel performs far less
null-terminated string processing than the average user space application,
especially on the stack.  (A lot of userspace code assumes large
stacks and puts essentially arbitrarily long strings into VLAs.)
Jann Horn Oct. 31, 2016, 8:55 p.m. UTC | #4
On Mon, Oct 31, 2016 at 09:45:59PM +0100, Florian Weimer wrote:
> * Jann Horn:
> 
> > On Mon, Oct 31, 2016 at 09:04:02AM -0700, Kees Cook wrote:
> >> On Mon, Oct 31, 2016 at 7:04 AM, Jann Horn <jann@thejh.net> wrote:
> >> > On machines with sizeof(unsigned long)==8, this ensures that the more
> >> > significant 32 bits of stack_canary are random, too.
> >> > stack_canary is defined as unsigned long, all the architectures with stack
> >> > protector support already pick the stack_canary of init as a random
> >> > unsigned long, and get_random_long() should be as fast as get_random_int(),
> >> > so there seems to be no good reason against this.
> >> >
> >> > This should help if someone tries to guess a stack canary with brute force.
> >> >
> >> > (This change has been made in PaX already, with a different RNG.)
> >> >
> >> > Signed-off-by: Jann Horn <jann@thejh.net>
> >> 
> >> Acked-by: Kees Cook <keescook@chromium.org>
> >> 
> >> (A separate change might be to make sure that the leading byte is
> >> zeroed. Entropy of the value, I think, is less important than blocking
> >> canary exposures from unbounded str* functions. Brute forcing kernel
> >> stack canaries isn't like brute forcing them in userspace...)
> >
> > Yeah, makes sense. Especially on 64bit, 56 bits of entropy ought to be
> > enough anyway.
> 
> So you two approve of the way glibc does this currently?  (See the
> other thread.)

Well... not really with a 32-bit canary. Zeroing the leading byte leaves
only 24 random bits, and ~2^23 crashes to defeat a mitigation is not that
many, even over the network. With a 64-bit canary, losing the 8 bits would
be no problem at all IMO.

So I guess I should revise what I said: I think the nullbyte thing is
fine for 64-bit canaries, but not for 32-bit ones.

> I was under the impression that the kernel performs far less
> null-terminated string processing than the average user space application,
> especially on the stack.

Yes, that's true - the kernel allocates even small-ish temporary string
buffers with kmalloc() to reduce stack usage.
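For illustration, that pattern looks roughly like this (a made-up function,
not taken from any real kernel call site):

	/* Illustrative only: the temporary string buffer comes from the
	 * heap instead of occupying 256 bytes of kernel stack. */
	static int example_log_event(const char *dev_name, int id)
	{
		char *buf;

		buf = kmalloc(256, GFP_KERNEL);	/* rather than char buf[256]; */
		if (!buf)
			return -ENOMEM;

		snprintf(buf, 256, "%s: event %d", dev_name, id);
		pr_debug("%s\n", buf);

		kfree(buf);
		return 0;
	}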
Daniel Micay Oct. 31, 2016, 8:56 p.m. UTC | #5
On Mon, 2016-10-31 at 21:45 +0100, Florian Weimer wrote:
> * Jann Horn:
> 
> > On Mon, Oct 31, 2016 at 09:04:02AM -0700, Kees Cook wrote:
> > > On Mon, Oct 31, 2016 at 7:04 AM, Jann Horn <jann@thejh.net> wrote:
> > > > On machines with sizeof(unsigned long)==8, this ensures that the
> > > > more
> > > > significant 32 bits of stack_canary are random, too.
> > > > stack_canary is defined as unsigned long, all the architectures
> > > > with stack
> > > > protector support already pick the stack_canary of init as a
> > > > random
> > > > unsigned long, and get_random_long() should be as fast as
> > > > get_random_int(),
> > > > so there seems to be no good reason against this.
> > > > 
> > > > This should help if someone tries to guess a stack canary with
> > > > brute force.
> > > > 
> > > > (This change has been made in PaX already, with a different
> > > > RNG.)
> > > > 
> > > > Signed-off-by: Jann Horn <jann@thejh.net>
> > > 
> > > Acked-by: Kees Cook <keescook@chromium.org>
> > > 
> > > (A separate change might be to make sure that the leading byte is
> > > zeroed. Entropy of the value, I think, is less important than
> > > blocking
> > > canary exposures from unbounded str* functions. Brute forcing
> > > kernel
> > > stack canaries isn't like brute forcing them in userspace...)
> > 
> > Yeah, makes sense. Especially on 64bit, 56 bits of entropy ought to
> > be
> > enough anyway.
> 
> So you two approve of the way glibc does this currently?  (See the
> other thread.)
> 
> I was under the impression that the kernel performs far less
> null-terminated string processing than the average user space application,
> especially on the stack.  (A lot of userspace code assumes large
> stacks and puts essentially arbitrarily long strings into VLAs.)

It makes a lot of sense on x86_64 where it means the canary is still 56
bits. Also, you want -fstack-check for protecting against stack overflows
rather than stack *buffer* overflows. SSP won't really help you in that
regard. Sadly, while -fstack-check now works well in GCC 6 with little
performance cost, it's not really a complete feature (and Clang impls it
as a no-op!).
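To make that distinction concrete, a user-space sketch (both functions are
made up; the first is what -fstack-protector-style canaries catch, the
second is what only stack probing via -fstack-check can catch):

	#include <string.h>

	/* Stack *buffer* overflow: a linear overwrite runs over the canary,
	 * so SSP detects it when the function returns. */
	void linear_overflow(const char *attacker_controlled)
	{
		char buf[64];
		strcpy(buf, attacker_controlled);	/* may clobber canary + return address */
	}

	/* Stack overflow (exhaustion): the frame itself is larger than the
	 * guard region, so the canary is never touched; only probing makes
	 * sure the guard page is hit before memory beyond it is written. */
	void giant_frame(void)
	{
		volatile char scratch[256 * 1024];
		scratch[0] = 1;		/* first store may already be past the guard */
	}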
Daniel Micay Oct. 31, 2016, 9:01 p.m. UTC | #6
> It makes a lot of sense on x86_64 where it means the canary is still
> 56
> bits. Also, you want -fstack-check for protecting against stack
> overflows
> rather than stack *buffer* overflow. SSP won't really help you in that
> regard. Sadly, while -fstack-check now works well in GCC 6 with little
> performance cost, it's not really a complete feature (and Clang impls
> it
> as a no-op!).

Note: talking about userspace after the entropy bit. The kernel doesn't
really -fstack-check, at least in even slightly sane code...
Florian Weimer Oct. 31, 2016, 9:10 p.m. UTC | #7
* Daniel Micay:

>> It makes a lot of sense on x86_64 where it means the canary is
>> still 56 bits. Also, you want -fstack-check for protecting against
>> stack overflows rather than stack *buffer* overflow. SSP won't
>> really help you in that regard. Sadly, while -fstack-check now
>> works well in GCC 6 with little performance cost, it's not really a

I think GCC still does not treat the return address push on
architectures which have such a CALL instruction as an implicit stack
probe.

>> complete feature (and Clang impls it as a no-op!).

How many guard pages at the end of the stack does the kernel
guarantee?  I saw some -fstack-check-generated code which seemed to
jump over a single guard page.

The other thing I've seen which could impact the effectiveness of
-fstack-check: mmap *without* MAP_FIXED and a hint within stack
allocation can create a mapping inside the stack.  That's rather
surprising, and I'm not sure if the net result is that there actually
is a guard page in all cases.
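A minimal user-space sketch of that scenario (illustrative only; whether
the mapping really ends up at the hint depends on the kernel version and on
what is already mapped there):

	#include <stdint.h>
	#include <stdio.h>
	#include <sys/mman.h>

	int main(void)
	{
		char marker;
		/* hint ~1 MiB below the current stack pointer, i.e. inside
		 * the area the main thread stack would otherwise grow into */
		void *hint = (void *)(((uintptr_t)&marker - (1UL << 20)) & ~(uintptr_t)0xfff);
		void *p = mmap(hint, 4096, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		printf("hint=%p got=%p\n", hint, p);
		/* if got == hint, later stack growth runs up against this
		 * mapping instead of faulting on unmapped guard space */
		return 0;
	}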

> Note: talking about userspace after the entropy bit. The kernel doesn't
> really -fstack-check, at least in even slightly sane code...

There used to be lots of discussions about kernel stack sizes ...
Daniel Micay Oct. 31, 2016, 9:21 p.m. UTC | #8
On Mon, 2016-10-31 at 22:10 +0100, Florian Weimer wrote:
> * Daniel Micay:

> > > It makes a lot of sense on x86_64 where it means the canary is
> > > still 56 bits. Also, you want -fstack-check for protecting against
> > > stack overflows rather than stack *buffer* overflow. SSP won't
> > > really help you in that regard. Sadly, while -fstack-check now
> > > works well in GCC 6 with little performance cost, it's not really
> > > a

> I think GCC still does not treat the return address push on
> architectures which have such a CALL instruction as an implicit stack
> probe.

> > > complete feature (and Clang impls it as a no-op!).

> How many guard pages at the end of the stack does the kernel
> guarantee?  I saw some -fstack-check-generated code which seemed to
> jump over a single guard page.
>
> The other thing I've seen which could impact the effectiveness of
> -fstack-check: mmap *without* MAP_FIXED and a hint within stack
> allocation can create a mapping inside the stack.  That's rather
> surprising, and I'm not sure if the net result is that there actually
> is a guard page in all cases.

It's not ideal but userspace can work around it. OpenJDK and ART both do
something like walking to the end of the main thread stack during init
to install their own guard region. ART then uses all but the last guard
page it installed as a red zone to handle out-of-stack (not sure about
OpenJDK).

-fstack-check is supposed to handle a single guard page by default, and
that's all there is for thread stacks by default.

> > Note: talking about userspace after the entropy bit. The kernel
> > doesn't
> > really -fstack-check, at least in even slightly sane code...

> There used to be lots of discussions about kernel stack sizes ...

The kernel should just be banning VLAs, alloca and large stack frames
though, if it's not doing that already. There wasn't even support for
guard pages under kernel stacks until recently outside grsecurity, and
-fstack-check relies on them, so it doesn't seem like a great solution for
the kernel.
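For what it's worth, much of that ban can be enforced at build time; a
user-space flavoured sketch (plain GCC warning options, turned into hard
errors with -Werror):

	/* Compile with:  gcc -Wvla -Wframe-larger-than=1024 -Werror ...
	 * and both functions below are rejected. */

	extern void consume(char *buf);

	void uses_vla(int n)
	{
		char buf[n];		/* flagged by -Wvla */
		consume(buf);
	}

	void huge_frame(void)
	{
		char buf[8192];		/* flagged by -Wframe-larger-than= */
		consume(buf);
	}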
Jann Horn Oct. 31, 2016, 9:22 p.m. UTC | #9
On Mon, Oct 31, 2016 at 10:10:41PM +0100, Florian Weimer wrote:
> * Daniel Micay:
> 
> >> It makes a lot of sense on x86_64 where it means the canary is
> >> still 56 bits. Also, you want -fstack-check for protecting against
> >> stack overflows rather than stack *buffer* overflow. SSP won't
> >> really help you in that regard. Sadly, while -fstack-check now
> >> works well in GCC 6 with little performance cost, it's not really a
> 
> I think GCC still does not treat the return address push on
> architectures which have such a CALL instruction as an implicit stack
> probe.
> 
> >> complete feature (and Clang impls it as a no-op!).
> 
> How many guard pages at the end of the stack does the kernel
> guarantee?  I saw some -fstack-check-generated code which seemed to
> jump over a single guard page.

Until recently: Zero, no guard pages below stacks, stack overflow
goes straight into some other allocation.
Now: One guard page, thanks to a lot of work by Andy Lutomirski.
(I think that change is in the current 4.9-rc3 kernel, but not in
any stable kernel yet.)


> The other thing I've seen which could impact the effectiveness of
> -fstack-check: mmap *without* MAP_FIXED and a hint within stack
> allocation can create a mapping inside the stack.  That's rather
> surprising, and I'm not sure if the net result is that there actually
> is a guard page in all cases.
> 
> > Note: talking about userspace after the entropy bit. The kernel doesn't
> > really -fstack-check, at least in even slightly sane code...
> 
> There used to be lots of discussions about kernel stack sizes ...
Daniel Micay Oct. 31, 2016, 9:26 p.m. UTC | #10
On Mon, 2016-10-31 at 22:22 +0100, Jann Horn wrote:
> On Mon, Oct 31, 2016 at 10:10:41PM +0100, Florian Weimer wrote:
> > * Daniel Micay:
> > 
> > > > It makes a lot of sense on x86_64 where it means the canary is
> > > > still 56 bits. Also, you want -fstack-check for protecting against
> > > > stack overflows rather than stack *buffer* overflow. SSP won't
> > > > really help you in that regard. Sadly, while -fstack-check now
> > > > works well in GCC 6 with little performance cost, it's not
> > > > really a
> > 
> > I think GCC still does not treat the return address push on
> > architectures which have such a CALL instruction as an implicit
> > stack
> > probe.
> > 
> > > > complete feature (and Clang impls it as a no-op!).
> > 
> > How many guard pages at the end of the stack does the kernel
> > guarantee?  I saw some -fstack-check-generated code which seemed to
> > jump over a single guard page.
> 
> Until recently: Zero, no guard pages below stacks, stack overflow
> goes straight into some other allocation.
> Now: One guard page, thanks to a lot of work by Andy Lutomirski.
> (I think that change is in the current 4.9-rc3 kernel, but not in
> any stable kernel yet.)

I think Florian is talking about the main thread stack in userspace.

Thus the next part about how it doesn't actually *guarantee* a guard
page at all right now for that. Userspace has to work around that if
it's worried about it (it can't really happen in practice though, but
it's not great that mmap with a hint + no MAP_FIXED can break it).
Florian Weimer Oct. 31, 2016, 9:26 p.m. UTC | #11
* Jann Horn:

> Until recently: Zero, no guard pages below stacks, stack overflow
> goes straight into some other allocation.
> Now: One guard page, thanks to a lot of work by Andy Lutomirski.
> (I think that change is in the current 4.9-rc3 kernel, but not in
> any stable kernel yet.)

Sorry, I meant for the stack allocation in user space.

(I'm very much a user space person only.)
Florian Weimer Oct. 31, 2016, 9:38 p.m. UTC | #12
* Daniel Micay:

> -fstack-check is supposed to handle a single guard page by default, and
> that's all there is for thread stacks by default.

Okay, then I'll really have to look at the probing offsets again.
It's been on my to-do list since about 2012, and arguably, it *is* a
user-space thing.

And I just realized that we should probably fail at dlopen time if
someone tries to open a DSO which needs an executable stack, rather than
silently changing all thread stacks to executable. *sigh*

>> > Note: talking about userspace after the entropy bit. The kernel
>> > doesn't
>> > really -fstack-check, at least in even slightly sane code...
>> 
>> There used to be lots of discussions about kernel stack sizes ...
>
> It should just be banning VLAs, alloca and large stack frames though, if
> it's not already. There wasn't even support for guard pages with kernel
> stacks until recently outside grsecurity,

Which is not surprising, considering that one prime motivation for
small stacks was to conserve 32-bit address space.  But I'm glad that
there is now a guard page.  Hopefully, it does not affect performance,
and on 64-bit, at least there isn't the address space limit to worry
about.

> and -fstack-check relies on them so it doesn't seem like a great
> solution for the kernel.

-fsplit-stack could enforce stack usage limits even without guard
pages, but of course, there is some run-time overhead, and the limit
has to come from somewhere (typically the TCB).
Daniel Micay Oct. 31, 2016, 10:02 p.m. UTC | #13
On Mon, 2016-10-31 at 22:38 +0100, Florian Weimer wrote:
> * Daniel Micay:
> 
> > -fstack-check is supposed to handle a single guard page by default, and
> > that's all there is for thread stacks by default.
> 
> Okay, then I'll really have to look at the probing offsets again.
> It's been on my to-do list since about 2012, and arguably, it *is* a
> user-space thing.

This is concerning too:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66479

It might be prevented for VLAs by using -fsanitize=vla-bound
-fsanitize-trap=vla-bound, but probably not for alloca (or with the older
-fsanitize-undefined-trap-on-error for GCC, since for some reason it
doesn't seem to have the new way).
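A sketch of what that would and wouldn't cover (user space; the flag
spellings are the ones mentioned above, availability depending on the
toolchain):

	#include <alloca.h>
	#include <string.h>

	/* Built with -fsanitize=vla-bound -fsanitize-trap=vla-bound (or the
	 * older trap-on-error variant), a non-positive bound here traps
	 * instead of silently moving the stack pointer by a bogus amount. */
	void vla_user(int n)		/* n possibly attacker-influenced */
	{
		char buf[n];
		memset(buf, 0, 1);
	}

	/* alloca() gets no such check from the vla-bound sanitizer. */
	void alloca_user(int n)
	{
		char *p = alloca(n);
		memset(p, 0, 1);
	}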

> And I just realized that we should probably fail at dlopen time if
> someone tries to open a DSO which needs an executable stack, rather than
> silently changing all thread stacks to executable. *sigh*
> 
> > > > Note: talking about userspace after the entropy bit. The kernel
> > > > doesn't
> > > > really -fstack-check, at least in even slightly sane code...
> > > 
> > >  
> > > There used to be lots of discussions about kernel stack sizes ...
> > 
> > It should just be banning VLAs, alloca and large stack frames
> > though, if
> > it's not already. There wasn't even support for guard pages with
> > kernel
> > stacks until recently outside grsecurity,
> 
> Which is not surprising, considering that one prime motivation for
> small stacks was to conserve 32-bit address space.  But I'm glad that
> there is now a guard page.  Hopefully, it does not affect performance,
> and on 64-bit, at least there isn't the address space limit to worry
> about.

I think it might actually improve performance overall.

> > and -fstack-check relies on them so it doesn't seem like a great
> > solution for the kernel.
> 
> -fsplit-stack could enforce stack usage limits even without guard
> pages, but of course, there is some run-time overhead, and the limit
> has to come from somewhere (typically the TCB).

Yeah, that's how Rust provided stack safety on LLVM. They ended up
removing it and they only have stack safety on Windows now, since LLVM
doesn't yet provide stack probing outside Windows. So much for being a
memory safe language... it's ultimately LLVM's fault though. There were
actually working patches submitted for this by people working on Rust
but *they never landed* due to endless bikeshedding + scope creep.
Florian Weimer Oct. 31, 2016, 10:11 p.m. UTC | #14
* Daniel Micay:

> On Mon, 2016-10-31 at 22:38 +0100, Florian Weimer wrote:
>> * Daniel Micay:
>> 
>> > -fstack-check is supposed to handle a single guard page by default, and
>> > that's all there is for thread stacks by default.
>> 
>> Okay, then I'll really have to look at the probing offsets again.
>> It's been on my to-do list since about 2012, and arguably, it *is* a
>> user-space thing.
>
> This is concerning too:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66479

Thanks.  This also shows the large stack pointer decrement:

	subq	$4144, %rsp
	orq	$0, (%rsp)

I really don't see how this can be safe with just a single guard page:
the decrement is larger than 4 KiB, so it can step right over the guard
before the single probe at (%rsp) touches memory.

> It might be prevented for VLAs by using -fsanitize=vla-bound
> -fsanitize-trap=vla-bound, but probably not for alloca (or with the older
> -fsanitize-undefined-trap-on-error for GCC, since for some reason it
> doesn't seem to have the new way).

It's certainly reasonable to expect that this was covered by
-fstack-check.

Patch

diff --git a/kernel/fork.c b/kernel/fork.c
index 623259fc794d..d577e2c5d14f 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -518,7 +518,7 @@  static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 	set_task_stack_end_magic(tsk);
 
 #ifdef CONFIG_CC_STACKPROTECTOR
-	tsk->stack_canary = get_random_int();
+	tsk->stack_canary = get_random_long();
 #endif
 
 	/*