Message ID | 1477922641-2221-1-git-send-email-jann@thejh.net (mailing list archive) |
---|---|
State | New, archived |
On Mon, Oct 31, 2016 at 7:04 AM, Jann Horn <jann@thejh.net> wrote:
> On machines with sizeof(unsigned long)==8, this ensures that the more
> significant 32 bits of stack_canary are random, too.
> stack_canary is defined as unsigned long, all the architectures with stack
> protector support already pick the stack_canary of init as a random
> unsigned long, and get_random_long() should be as fast as get_random_int(),
> so there seems to be no good reason against this.
>
> This should help if someone tries to guess a stack canary with brute force.
>
> (This change has been made in PaX already, with a different RNG.)
>
> Signed-off-by: Jann Horn <jann@thejh.net>

Acked-by: Kees Cook <keescook@chromium.org>

(A separate change might be to make sure that the leading byte is
zeroed. Entropy of the value, I think, is less important than blocking
canary exposures from unbounded str* functions. Brute forcing kernel
stack canaries isn't like brute forcing them in userspace...)

-Kees

> ---
>  kernel/fork.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 623259fc794d..d577e2c5d14f 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -518,7 +518,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
>  	set_task_stack_end_magic(tsk);
>
>  #ifdef CONFIG_CC_STACKPROTECTOR
> -	tsk->stack_canary = get_random_int();
> +	tsk->stack_canary = get_random_long();
>  #endif
>
>  	/*
> --
> 2.1.4
>
On Mon, Oct 31, 2016 at 09:04:02AM -0700, Kees Cook wrote:
> On Mon, Oct 31, 2016 at 7:04 AM, Jann Horn <jann@thejh.net> wrote:
> > [...]
>
> Acked-by: Kees Cook <keescook@chromium.org>
>
> (A separate change might be to make sure that the leading byte is
> zeroed. Entropy of the value, I think, is less important than blocking
> canary exposures from unbounded str* functions. Brute forcing kernel
> stack canaries isn't like brute forcing them in userspace...)

Yeah, makes sense. Especially on 64-bit, 56 bits of entropy ought to be
enough anyway.
* Jann Horn:

> On Mon, Oct 31, 2016 at 09:04:02AM -0700, Kees Cook wrote:
>> [...]
>> (A separate change might be to make sure that the leading byte is
>> zeroed. Entropy of the value, I think, is less important than blocking
>> canary exposures from unbounded str* functions. Brute forcing kernel
>> stack canaries isn't like brute forcing them in userspace...)
>
> Yeah, makes sense. Especially on 64-bit, 56 bits of entropy ought to be
> enough anyway.

So you two approve of the way glibc does this currently? (See the
other thread.)

I was under the impression that the kernel performs far less
null-terminated string processing than the average user space
application, especially on the stack. (A lot of userspace code assumes
large stacks and puts essentially arbitrarily long strings into VLAs.)
On Mon, Oct 31, 2016 at 09:45:59PM +0100, Florian Weimer wrote:
> * Jann Horn:
> > [...]
> > Yeah, makes sense. Especially on 64-bit, 56 bits of entropy ought to
> > be enough anyway.
>
> So you two approve of the way glibc does this currently? (See the
> other thread.)

Well... not really with a 32-bit canary. 2^23 crashes to defeat a
mitigation is not so much, even over the network. With a 64-bit canary,
losing the 8 bits would be no problem at all IMO.

So I guess I should revise what I said: I think the nullbyte thing is
fine for 64-bit canaries, but not for 32-bit ones.

> I was under the impression that the kernel performs far less
> null-terminated string processing than the average user space
> application, especially on the stack.

Yes, that's true - the kernel allocates even small-ish temporary string
buffers with kmalloc() to reduce stack usage.
On Mon, 2016-10-31 at 21:45 +0100, Florian Weimer wrote:
> * Jann Horn:
> > [...]
> > Yeah, makes sense. Especially on 64-bit, 56 bits of entropy ought
> > to be enough anyway.
>
> So you two approve of the way glibc does this currently? (See the
> other thread.)
>
> I was under the impression that the kernel performs far less
> null-terminated string processing than the average user space
> application, especially on the stack. (A lot of userspace code
> assumes large stacks and puts essentially arbitrarily long strings
> into VLAs.)

It makes a lot of sense on x86_64 where it means the canary is still 56
bits. Also, you want -fstack-check for protecting against stack
overflows rather than stack *buffer* overflows. SSP won't really help
you in that regard.

Sadly, while -fstack-check now works well in GCC 6 with little
performance cost, it's not really a complete feature (and Clang
implements it as a no-op!).
> It makes a lot of sense on x86_64 where it means the canary is still
> 56 bits. Also, you want -fstack-check for protecting against stack
> overflows rather than stack *buffer* overflows. SSP won't really help
> you in that regard. Sadly, while -fstack-check now works well in GCC
> 6 with little performance cost, it's not really a complete feature
> (and Clang implements it as a no-op!).

Note: talking about userspace after the entropy bit. The kernel doesn't
really need -fstack-check, at least in even slightly sane code...
* Daniel Micay:

>> It makes a lot of sense on x86_64 where it means the canary is
>> still 56 bits. Also, you want -fstack-check for protecting against
>> stack overflows rather than stack *buffer* overflows. SSP won't
>> really help you in that regard. Sadly, while -fstack-check now
>> works well in GCC 6 with little performance cost, it's not really a

I think GCC still does not treat the return address push on
architectures which have such a CALL instruction as an implicit stack
probe.

>> complete feature (and Clang implements it as a no-op!).

How many guard pages at the end of the stack does the kernel
guarantee? I saw some -fstack-check-generated code which seemed to
jump over a single guard page.

The other thing I've seen which could impact the effectiveness of
-fstack-check: mmap *without* MAP_FIXED and a hint within the stack
allocation can create a mapping inside the stack. That's rather
surprising, and I'm not sure if the net result is that there actually
is a guard page in all cases.

> Note: talking about userspace after the entropy bit. The kernel
> doesn't really need -fstack-check, at least in even slightly sane
> code...

There used to be lots of discussions about kernel stack sizes ...
On Mon, 2016-10-31 at 22:10 +0100, Florian Weimer wrote:
> * Daniel Micay:
> > [...]
>
> I think GCC still does not treat the return address push on
> architectures which have such a CALL instruction as an implicit stack
> probe.
>
> How many guard pages at the end of the stack does the kernel
> guarantee? I saw some -fstack-check-generated code which seemed to
> jump over a single guard page.
>
> The other thing I've seen which could impact the effectiveness of
> -fstack-check: mmap *without* MAP_FIXED and a hint within the stack
> allocation can create a mapping inside the stack. That's rather
> surprising, and I'm not sure if the net result is that there actually
> is a guard page in all cases.

It's not ideal, but userspace can work around it. OpenJDK and ART both
do something like walking to the end of the main thread stack during
init to install their own guard region. ART then uses all but the last
guard page it installed as a red zone to handle out-of-stack (not sure
about OpenJDK). -fstack-check is supposed to handle a single guard by
default, and that's all there is for thread stacks by default.

> > Note: talking about userspace after the entropy bit. The kernel
> > doesn't really need -fstack-check, at least in even slightly sane
> > code...
>
> There used to be lots of discussions about kernel stack sizes ...

It should just be banning VLAs, alloca and large stack frames, though,
if it's not already. There wasn't even support for guard pages with
kernel stacks until recently outside grsecurity, and -fstack-check
relies on them, so it doesn't seem like a great solution for the
kernel.
On Mon, Oct 31, 2016 at 10:10:41PM +0100, Florian Weimer wrote:
> * Daniel Micay:
> > [...]
>
> How many guard pages at the end of the stack does the kernel
> guarantee? I saw some -fstack-check-generated code which seemed to
> jump over a single guard page.

Until recently: Zero, no guard pages below stacks, stack overflow
goes straight into some other allocation.
Now: One guard page, thanks to a lot of work by Andy Lutomirski.
(I think that change is in the current 4.9-rc3 kernel, but not in
any stable kernel yet.)

> The other thing I've seen which could impact the effectiveness of
> -fstack-check: mmap *without* MAP_FIXED and a hint within the stack
> allocation can create a mapping inside the stack. That's rather
> surprising, and I'm not sure if the net result is that there actually
> is a guard page in all cases.
>
> > Note: talking about userspace after the entropy bit. The kernel
> > doesn't really need -fstack-check, at least in even slightly sane
> > code...
>
> There used to be lots of discussions about kernel stack sizes ...
On Mon, 2016-10-31 at 22:22 +0100, Jann Horn wrote:
> On Mon, Oct 31, 2016 at 10:10:41PM +0100, Florian Weimer wrote:
> > [...]
> > How many guard pages at the end of the stack does the kernel
> > guarantee? I saw some -fstack-check-generated code which seemed to
> > jump over a single guard page.
>
> Until recently: Zero, no guard pages below stacks, stack overflow
> goes straight into some other allocation.
> Now: One guard page, thanks to a lot of work by Andy Lutomirski.
> (I think that change is in the current 4.9-rc3 kernel, but not in
> any stable kernel yet.)

I think Florian is talking about the main thread stack in userspace.
Thus the next part about how it doesn't actually *guarantee* a guard
page at all right now for that. Userspace has to work around that if
it's worried about it (it can't really happen in practice, though, but
it's not great that mmap with a hint + no MAP_FIXED can break it).
* Jann Horn:

> Until recently: Zero, no guard pages below stacks, stack overflow
> goes straight into some other allocation.
> Now: One guard page, thanks to a lot of work by Andy Lutomirski.
> (I think that change is in the current 4.9-rc3 kernel, but not in
> any stable kernel yet.)

Sorry, I meant for the stack allocation in user space. (I'm very much
a user space person only.)
* Daniel Micay:

> -fstack-check is supposed to handle a single guard by default, and
> that's all there is for thread stacks by default.

Okay, then I'll really have to look at the probing offsets again.
It's been on my to-do list since about 2012, and arguably, it *is* a
user-space thing.

And I just realized that we should probably fail at dlopen time if
someone tries to open a DSO which needs an executable stack, rather
than silently changing all thread stacks to executable. *sigh*

>> > Note: talking about userspace after the entropy bit. The kernel
>> > doesn't really need -fstack-check, at least in even slightly sane
>> > code...
>>
>> There used to be lots of discussions about kernel stack sizes ...
>
> It should just be banning VLAs, alloca and large stack frames,
> though, if it's not already. There wasn't even support for guard
> pages with kernel stacks until recently outside grsecurity,

Which is not surprising, considering that one prime motivation for
small stacks was to conserve 32-bit address space. But I'm glad that
there is now a guard page. Hopefully, it does not affect performance,
and on 64-bit, at least there isn't the address space limit to worry
about.

> and -fstack-check relies on them, so it doesn't seem like a great
> solution for the kernel.

-fsplit-stack could enforce stack usage limits even without guard
pages, but of course, there is some run-time overhead, and the limit
has to come from somewhere (typically the TCB).
On Mon, 2016-10-31 at 22:38 +0100, Florian Weimer wrote:
> * Daniel Micay:
>
> > -fstack-check is supposed to handle a single guard by default, and
> > that's all there is for thread stacks by default.
>
> Okay, then I'll really have to look at the probing offsets again.
> It's been on my to-do list since about 2012, and arguably, it *is* a
> user-space thing.

This is concerning too:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66479

It might be prevented for VLAs by using -fsanitize=vla-bound
-fsanitize-trap=vla-bound, but probably not alloca (or the older
-fsanitize-undefined-trap-on-error for GCC, since for some reason it
doesn't seem to have the new way).

> And I just realized that we should probably fail at dlopen time if
> someone tries to open a DSO which needs an executable stack, rather
> than silently changing all thread stacks to executable. *sigh*
>
> > [...]
> >
> > It should just be banning VLAs, alloca and large stack frames,
> > though, if it's not already. There wasn't even support for guard
> > pages with kernel stacks until recently outside grsecurity,
>
> Which is not surprising, considering that one prime motivation for
> small stacks was to conserve 32-bit address space. But I'm glad that
> there is now a guard page. Hopefully, it does not affect performance,
> and on 64-bit, at least there isn't the address space limit to worry
> about.

I think it might actually improve performance overall.

> > and -fstack-check relies on them, so it doesn't seem like a great
> > solution for the kernel.
>
> -fsplit-stack could enforce stack usage limits even without guard
> pages, but of course, there is some run-time overhead, and the limit
> has to come from somewhere (typically the TCB).

Yeah, that's how Rust provided stack safety on LLVM. They ended up
removing it, and they only have stack safety on Windows now, since
LLVM doesn't yet provide stack probing outside Windows. So much for
being a memory safe language... it's ultimately LLVM's fault, though.
There were actually working patches submitted for this by people
working on Rust, but *it never landed* due to endless bikeshedding +
scope creep.
* Daniel Micay:

> On Mon, 2016-10-31 at 22:38 +0100, Florian Weimer wrote:
>> * Daniel Micay:
>>
>> > -fstack-check is supposed to handle a single guard by default, and
>> > that's all there is for thread stacks by default.
>>
>> Okay, then I'll really have to look at the probing offsets again.
>> It's been on my to-do list since about 2012, and arguably, it *is* a
>> user-space thing.
>
> This is concerning too:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66479

Thanks. This also shows the large stack pointer decrement:

	subq	$4144, %rsp
	orq	$0, (%rsp)

I really don't see how this can be safe with just a single guard page.

> It might be prevented for VLAs by using -fsanitize=vla-bound
> -fsanitize-trap=vla-bound, but probably not alloca (or the older
> -fsanitize-undefined-trap-on-error for GCC, since for some reason it
> doesn't seem to have the new way).

It's certainly reasonable to expect that this was covered by
-fstack-check.
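To illustrate why a single large decrement like the `subq $4144, %rsp`
above can step over a one-page guard, here is a hedged sketch of the
per-page touch pattern that probing is meant to guarantee. A C-level
sketch cannot control the stack pointer itself, so this only shows the
access pattern; `probe_pages` and the 4 KiB `PAGE_SIZE` are assumptions
for illustration, not compiler output:

```c
#include <stddef.h>
#include <string.h>

/* Assumed page size for illustration; x86 uses 4 KiB pages. */
#define PAGE_SIZE 4096

/* Touch at least one byte in every page of a large frame, so the
 * guard page is hit before an access can land beyond it.  Returns
 * the number of per-page probes performed. */
static size_t probe_pages(volatile char *buf, size_t len)
{
    size_t probes = 0;
    for (size_t off = 0; off < len; off += PAGE_SIZE) {
        buf[off] = 0;
        probes++;
    }
    buf[len - 1] = 0;  /* cover a partial trailing page */
    return probes;
}

void use_big_frame(void)
{
    /* A frame three pages deep: one large stack pointer decrement
     * followed by a single distant access could skip a one-page
     * guard entirely; probing every page cannot. */
    char big[3 * PAGE_SIZE];
    probe_pages(big, sizeof(big));
    memset(big, 'A', sizeof(big));
}
```

The 4144-byte decrement in the quoted asm is larger than one 4096-byte
page, which is exactly the case where a single `orq $0, (%rsp)` probe
after the decrement may already be past the guard.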
diff --git a/kernel/fork.c b/kernel/fork.c
index 623259fc794d..d577e2c5d14f 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -518,7 +518,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 	set_task_stack_end_magic(tsk);
 
 #ifdef CONFIG_CC_STACKPROTECTOR
-	tsk->stack_canary = get_random_int();
+	tsk->stack_canary = get_random_long();
 #endif
 
 	/*
On machines with sizeof(unsigned long)==8, this ensures that the more
significant 32 bits of stack_canary are random, too. stack_canary is
defined as unsigned long, all the architectures with stack protector
support already pick the stack_canary of init as a random unsigned
long, and get_random_long() should be as fast as get_random_int(), so
there seems to be no good reason against this.

This should help if someone tries to guess a stack canary with brute
force.

(This change has been made in PaX already, with a different RNG.)

Signed-off-by: Jann Horn <jann@thejh.net>
---
 kernel/fork.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)