Message ID: 20180117055015.GA15256@beast (mailing list archive)
State: New, archived
On Tue 16-01-18 21:50:15, Kees Cook wrote:
> One of the classes of kernel stack content leaks is exposing the contents
> of prior heap or stack contents when a new process stack is allocated.
> Normally, those stacks are not zeroed, and the old contents remain in
> place. With some types of stack content exposure flaws, those contents
> can leak to userspace. Kernels built with CONFIG_CLEAR_STACK_FORK will
> no longer be vulnerable to this, as the stack will be wiped each time
> a stack is assigned to a new process. There's not a meaningful change
> in runtime performance; it almost looks like it provides a benefit.

Have you tried something as simple as /bin/true in a loop? kbuild will
certainly amortize the few cycles spent on the clearing, and I would expect
most reasonable applications would as well. But it would be better to know
the worst-case scenario, IMHO.

> Performing back-to-back kernel builds before:
> Run times: 157.86 157.09 158.90 160.94 160.80
> Mean: 159.12
> Std Dev: 1.54
>
> With CONFIG_CLEAR_STACK_FORK=y:
> Run times: 159.31 157.34 156.71 158.15 160.81
> Mean: 158.46
> Std Dev: 1.46
>
> Signed-off-by: Kees Cook <keescook@chromium.org>

The change seems reasonable to me, although it would be better to expand on
the types of attacks this prevents, ideally with some examples: how many
attacks of this kind have we seen in the past, and how often do they appear?
That might help people decide whether it deserves a few cycles on each fork.
Also, the config option sounds rather limiting. Consider distros: should
they enable it just to be on the safe side? This is a generic concern with
other hardening options, though.
On 01/17/2018 01:17 AM, Michal Hocko wrote:
> On Tue 16-01-18 21:50:15, Kees Cook wrote:
>> One of the classes of kernel stack content leaks is exposing the contents
>> of prior heap or stack contents when a new process stack is allocated.
>> Normally, those stacks are not zeroed, and the old contents remain in
>> place. With some types of stack content exposure flaws, those contents
>> can leak to userspace. Kernels built with CONFIG_CLEAR_STACK_FORK will
>> no longer be vulnerable to this, as the stack will be wiped each time
>> a stack is assigned to a new process. There's not a meaningful change
>> in runtime performance; it almost looks like it provides a benefit.
>
> Have you tried something as simple as /bin/true in a loop? kbuild will
> certainly amortize the few cycles spent on the clearing, and I would
> expect most reasonable applications would as well. But it would be
> better to know the worst-case scenario, IMHO.

I tried /bin/true in a loop in my QEMU setup and didn't see a difference
there.

>> Performing back-to-back kernel builds before:
>> Run times: 157.86 157.09 158.90 160.94 160.80
>> Mean: 159.12
>> Std Dev: 1.54
>>
>> With CONFIG_CLEAR_STACK_FORK=y:
>> Run times: 159.31 157.34 156.71 158.15 160.81
>> Mean: 158.46
>> Std Dev: 1.46
>>
>> Signed-off-by: Kees Cook <keescook@chromium.org>
>
> The change seems reasonable to me, although it would be better to expand
> on the types of attacks this prevents, ideally with some examples: how
> many attacks of this kind have we seen in the past, and how often do
> they appear? That might help people decide whether it deserves a few
> cycles on each fork. Also, the config option sounds rather limiting.
> Consider distros: should they enable it just to be on the safe side?
> This is a generic concern with other hardening options, though.

Agreed this could use a few more words, but it looks good to me overall.

Thanks,
Laura
On Tue, 16 Jan 2018, Kees Cook wrote:

> diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
> index 34f053a150a9..091f53fe31cc 100644
> --- a/include/linux/thread_info.h
> +++ b/include/linux/thread_info.h
> @@ -43,7 +43,9 @@ enum {
>  #define THREAD_ALIGN	THREAD_SIZE
>  #endif
>
> -#if IS_ENABLED(CONFIG_DEBUG_STACK_USAGE) || IS_ENABLED(CONFIG_DEBUG_KMEMLEAK)
> +#if IS_ENABLED(CONFIG_DEBUG_STACK_USAGE) || \
> +    IS_ENABLED(CONFIG_DEBUG_KMEMLEAK) || \
> +    IS_ENABLED(CONFIG_CLEAR_STACK_FORK)
>  # define THREADINFO_GFP		(GFP_KERNEL_ACCOUNT | __GFP_ZERO)
>  #else
>  # define THREADINFO_GFP		(GFP_KERNEL_ACCOUNT)
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 2295fc69717f..215b1ce2b2cd 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -215,7 +215,7 @@ static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node)
>  		if (!s)
>  			continue;
>
> -#ifdef CONFIG_DEBUG_KMEMLEAK
> +#if IS_ENABLED(CONFIG_DEBUG_KMEMLEAK) || IS_ENABLED(CONFIG_CLEAR_STACK_FORK)
>  		/* Clear stale pointers from reused stack. */
>  		memset(s->addr, 0, THREAD_SIZE);
>  #endif

Is there any good reason not to do it symmetrically for non-vmapped stacks
as well (by passing __GFP_ZERO to alloc_pages_node())?

Thanks,
On Fri, 26 Jan 2018, Jiri Kosina wrote:

> > diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
> > index 34f053a150a9..091f53fe31cc 100644
> > --- a/include/linux/thread_info.h
> > +++ b/include/linux/thread_info.h
> > @@ -43,7 +43,9 @@ enum {
> >  #define THREAD_ALIGN	THREAD_SIZE
> >  #endif
> >
> > -#if IS_ENABLED(CONFIG_DEBUG_STACK_USAGE) || IS_ENABLED(CONFIG_DEBUG_KMEMLEAK)
> > +#if IS_ENABLED(CONFIG_DEBUG_STACK_USAGE) || \
> > +    IS_ENABLED(CONFIG_DEBUG_KMEMLEAK) || \
> > +    IS_ENABLED(CONFIG_CLEAR_STACK_FORK)
> >  # define THREADINFO_GFP		(GFP_KERNEL_ACCOUNT | __GFP_ZERO)
> >  #else
> >  # define THREADINFO_GFP		(GFP_KERNEL_ACCOUNT)
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index 2295fc69717f..215b1ce2b2cd 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -215,7 +215,7 @@ static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node)
> >  		if (!s)
> >  			continue;
> >
> > -#ifdef CONFIG_DEBUG_KMEMLEAK
> > +#if IS_ENABLED(CONFIG_DEBUG_KMEMLEAK) || IS_ENABLED(CONFIG_CLEAR_STACK_FORK)
> >  		/* Clear stale pointers from reused stack. */
> >  		memset(s->addr, 0, THREAD_SIZE);
> >  #endif
>
> Is there any good reason not to do it symmetrically for non-vmapped
> stacks as well (by passing __GFP_ZERO to alloc_pages_node())?

Ah, of course you already do, by extending THREADINFO_GFP. Sorry for the
noise.
On Tue, 16 Jan 2018 21:50:15 -0800 Kees Cook <keescook@chromium.org> wrote:

> One of the classes of kernel stack content leaks is exposing the contents
> of prior heap or stack contents when a new process stack is allocated.
> Normally, those stacks are not zeroed, and the old contents remain in
> place. With some types of stack content exposure flaws, those contents
> can leak to userspace. Kernels built with CONFIG_CLEAR_STACK_FORK will
> no longer be vulnerable to this, as the stack will be wiped each time
> a stack is assigned to a new process. There's not a meaningful change
> in runtime performance; it almost looks like it provides a benefit.
>
> Performing back-to-back kernel builds before:
> Run times: 157.86 157.09 158.90 160.94 160.80
> Mean: 159.12
> Std Dev: 1.54
>
> With CONFIG_CLEAR_STACK_FORK=y:
> Run times: 159.31 157.34 156.71 158.15 160.81
> Mean: 158.46
> Std Dev: 1.46
>
> ...
>
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -904,6 +904,14 @@ config VMAP_STACK
>  	  the stack to map directly to the KASAN shadow map using a formula
>  	  that is incorrect if the stack is in vmalloc space.
>
> +config CLEAR_STACK_FORK
> +	bool "Clear the kernel stack at each fork"
> +	help
> +	  To resist stack content leak flaws, this clears newly allocated
> +	  kernel stacks to keep previously freed heap or stack contents
> +	  from being present in the new stack. This has almost no
> +	  measurable performance impact.
> +

It would be much nicer to be able to control this at runtime rather than
compile-time. Why not a /proc tunable? We could always use more of those ;)
On Wed, Feb 21, 2018 at 12:31 AM, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Tue, 16 Jan 2018 21:50:15 -0800 Kees Cook <keescook@chromium.org> wrote:
>
>> One of the classes of kernel stack content leaks is exposing the contents
>> of prior heap or stack contents when a new process stack is allocated.
>> Normally, those stacks are not zeroed, and the old contents remain in
>> place. With some types of stack content exposure flaws, those contents
>> can leak to userspace. Kernels built with CONFIG_CLEAR_STACK_FORK will
>> no longer be vulnerable to this, as the stack will be wiped each time
>> a stack is assigned to a new process. There's not a meaningful change
>> in runtime performance; it almost looks like it provides a benefit.
>>
>> Performing back-to-back kernel builds before:
>> Run times: 157.86 157.09 158.90 160.94 160.80
>> Mean: 159.12
>> Std Dev: 1.54
>>
>> With CONFIG_CLEAR_STACK_FORK=y:
>> Run times: 159.31 157.34 156.71 158.15 160.81
>> Mean: 158.46
>> Std Dev: 1.46
>>
>> ...
>>
>> --- a/arch/Kconfig
>> +++ b/arch/Kconfig
>> @@ -904,6 +904,14 @@ config VMAP_STACK
>>  	  the stack to map directly to the KASAN shadow map using a formula
>>  	  that is incorrect if the stack is in vmalloc space.
>>
>> +config CLEAR_STACK_FORK
>> +	bool "Clear the kernel stack at each fork"
>> +	help
>> +	  To resist stack content leak flaws, this clears newly allocated
>> +	  kernel stacks to keep previously freed heap or stack contents
>> +	  from being present in the new stack. This has almost no
>> +	  measurable performance impact.
>> +
>
> It would be much nicer to be able to control this at runtime rather
> than compile-time. Why not a /proc tunable? We could always use more
> of those ;)

/proc/sys/kernel/hardening_features_that_cost_essentially_nothing?

Seriously, though, why don't we just enable it unconditionally? It wouldn't
surprise me if it really is a speedup on more workloads than it slows down:
it will fill the kernel stack into the CPU cache with exclusive ownership
very quickly (with streaming stores, and without actually reading from
memory, I imagine, at least on new enough CPUs) rather than grabbing each
cache line one by one as it gets used.
On Wed, Feb 21, 2018 at 01:56:33AM +0000, Andrew Lutomirski wrote:
> > It would be much nicer to be able to control this at runtime rather
> > than compile-time. Why not a /proc tunable? We could always use more
> > of those ;)
>
> /proc/sys/kernel/hardening_features_that_cost_essentially_nothing?
>
> Seriously, though, why don't we just enable it unconditionally? It
> wouldn't surprise me if it really is a speedup on more workloads than
> it slows down: it will fill the kernel stack into the CPU cache with
> exclusive ownership very quickly (with streaming stores, and without
> actually reading from memory, I imagine, at least on new enough CPUs)
> rather than grabbing each cache line one by one as it gets used.

Note that this is not unconditionally true; it depends on the calling
context that clears the page. If this happens during fork, then the parent
may be doing the clear (I didn't check), which means that when the child
wakes for the first time, it will quite likely not wake on the same CPU.
Until recently, on NUMA machines the child was almost guaranteed to be
running on a remote node (mitigated in tip for sched now). I'm not claiming
I've measured the overhead of this, just pointing out that "cache hotness"
may actually result in double the cache-line bounces -- first the clear,
then the write on first wake. If only zeroing pages was a bit faster :/
diff --git a/arch/Kconfig b/arch/Kconfig
index 400b9e1b2f27..42d56dad03ec 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -904,6 +904,14 @@ config VMAP_STACK
 	  the stack to map directly to the KASAN shadow map using a formula
 	  that is incorrect if the stack is in vmalloc space.
 
+config CLEAR_STACK_FORK
+	bool "Clear the kernel stack at each fork"
+	help
+	  To resist stack content leak flaws, this clears newly allocated
+	  kernel stacks to keep previously freed heap or stack contents
+	  from being present in the new stack. This has almost no
+	  measurable performance impact.
+
 config ARCH_OPTIONAL_KERNEL_RWX
 	def_bool n
 
diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
index 34f053a150a9..091f53fe31cc 100644
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -43,7 +43,9 @@ enum {
 #define THREAD_ALIGN	THREAD_SIZE
 #endif
 
-#if IS_ENABLED(CONFIG_DEBUG_STACK_USAGE) || IS_ENABLED(CONFIG_DEBUG_KMEMLEAK)
+#if IS_ENABLED(CONFIG_DEBUG_STACK_USAGE) || \
+    IS_ENABLED(CONFIG_DEBUG_KMEMLEAK) || \
+    IS_ENABLED(CONFIG_CLEAR_STACK_FORK)
 # define THREADINFO_GFP		(GFP_KERNEL_ACCOUNT | __GFP_ZERO)
 #else
 # define THREADINFO_GFP		(GFP_KERNEL_ACCOUNT)
diff --git a/kernel/fork.c b/kernel/fork.c
index 2295fc69717f..215b1ce2b2cd 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -215,7 +215,7 @@ static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node)
 		if (!s)
 			continue;
 
-#ifdef CONFIG_DEBUG_KMEMLEAK
+#if IS_ENABLED(CONFIG_DEBUG_KMEMLEAK) || IS_ENABLED(CONFIG_CLEAR_STACK_FORK)
 		/* Clear stale pointers from reused stack. */
 		memset(s->addr, 0, THREAD_SIZE);
 #endif
One of the classes of kernel stack content leaks is exposing the contents
of prior heap or stack contents when a new process stack is allocated.
Normally, those stacks are not zeroed, and the old contents remain in
place. With some types of stack content exposure flaws, those contents
can leak to userspace. Kernels built with CONFIG_CLEAR_STACK_FORK will
no longer be vulnerable to this, as the stack will be wiped each time
a stack is assigned to a new process. There's not a meaningful change
in runtime performance; it almost looks like it provides a benefit.

Performing back-to-back kernel builds before:
Run times: 157.86 157.09 158.90 160.94 160.80
Mean: 159.12
Std Dev: 1.54

With CONFIG_CLEAR_STACK_FORK=y:
Run times: 159.31 157.34 156.71 158.15 160.81
Mean: 158.46
Std Dev: 1.46

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/Kconfig                | 8 ++++++++
 include/linux/thread_info.h | 4 +++-
 kernel/fork.c               | 2 +-
 3 files changed, 12 insertions(+), 2 deletions(-)