[v2] mm: Add SLUB free list pointer obfuscation

Message ID 20170623015010.GA137429@beast (mailing list archive)
State New, archived

Commit Message

Kees Cook June 23, 2017, 1:50 a.m. UTC
This SLUB free list pointer obfuscation code is modified from Brad
Spengler/PaX Team's code in the last public patch of grsecurity/PaX based
on my understanding of the code. Changes or omissions from the original
code are mine and don't reflect the original grsecurity/PaX code.

This adds a per-cache random value to SLUB caches that is XORed with
their freelist pointers. This adds nearly zero overhead and frustrates the
very common heap overflow exploitation method of overwriting freelist
pointers. A recent example of the attack is written up here:
http://cyseclabs.com/blog/cve-2016-6187-heap-off-by-one-exploit
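The XOR scheme described above can be sketched in user space as follows. This is a minimal illustration, not the kernel code: `struct cache_sketch` and `freeptr_xor()` are hypothetical names, and `uintptr_t` stands in for the kernel's `unsigned long`.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-cache secret (s->random in the patch). */
struct cache_sketch {
	uintptr_t random;
};

/*
 * XOR is its own inverse, so the same helper both obfuscates a pointer
 * before it is stored and recovers it after it is loaded.
 * ptr_addr is the address of the slot that holds the free pointer.
 */
static uintptr_t freeptr_xor(const struct cache_sketch *s, uintptr_t ptr,
			     uintptr_t ptr_addr)
{
	return ptr ^ s->random ^ ptr_addr;
}
```

A raw address written into the slot by an overflow is decoded with the same XOR, so it comes out as an address the attacker did not choose unless they also know the secret and the slot address.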

This is based on patches by Daniel Micay, and refactored to avoid lots
of #ifdef code.

Suggested-by: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
v2:
- renamed Kconfig to SLAB_FREELIST_HARDENED; labbott.
---
 include/linux/slub_def.h |  4 ++++
 init/Kconfig             |  9 +++++++++
 mm/slub.c                | 32 +++++++++++++++++++++++++++-----
 3 files changed, 40 insertions(+), 5 deletions(-)

Comments

Kees Cook June 25, 2017, 7:56 p.m. UTC | #1
On Thu, Jun 22, 2017 at 6:50 PM, Kees Cook <keescook@chromium.org> wrote:
> This SLUB free list pointer obfuscation code is modified from Brad
> Spengler/PaX Team's code in the last public patch of grsecurity/PaX based
> on my understanding of the code. Changes or omissions from the original
> code are mine and don't reflect the original grsecurity/PaX code.
>
> This adds a per-cache random value to SLUB caches that is XORed with
> their freelist pointers. This adds nearly zero overhead and frustrates the
> very common heap overflow exploitation method of overwriting freelist
> pointers. A recent example of the attack is written up here:
> http://cyseclabs.com/blog/cve-2016-6187-heap-off-by-one-exploit

BTW, to quantify "nearly zero overhead", I ran multiple 200-run cycles
of "hackbench -g 20 -l 1000", and saw:

before:
mean 10.11882499999999999995
variance .03320378329145728642
stdev .18221905304181911048

after:
mean 10.12654000000000000014
variance .04700556623115577889
stdev .21680767106160192064

The difference gets lost in the noise, but if the above is sensible,
it's 0.07% slower. ;)
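As a rough check of the arithmetic behind that figure, the following sketch computes the relative slowdown from the two means, and also expresses the difference in units of one standard deviation. The helper names are hypothetical and the inputs are just the numbers quoted above.

```c
/* Relative slowdown of after_mean versus before_mean, as a percentage. */
static double slowdown_percent(double before_mean, double after_mean)
{
	return (after_mean - before_mean) / before_mean * 100.0;
}

/* Difference of the two means in units of one run-to-run stdev. */
static double diff_in_stdevs(double before_mean, double after_mean,
			     double stdev)
{
	return (after_mean - before_mean) / stdev;
}
```

With the means above, `slowdown_percent(10.118825, 10.12654)` comes out at roughly 0.076, and the difference is only about 0.04 of the "before" standard deviation, which is consistent with it being lost in the noise.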

-Kees
Christoph Lameter (Ampere) June 29, 2017, 5:05 p.m. UTC | #2
On Sun, 25 Jun 2017, Kees Cook wrote:

> The difference gets lost in the noise, but if the above is sensible,
> it's 0.07% slower. ;)

Hmmm... These differences add up. Also in a repetitive benchmark like that
you do not see the impact that the additional cacheline use in the cpu
cache has on larger workloads. Those may be pushed over the edge of l1 or
l2 capacity at some point which then causes drastic regressions.
Kees Cook June 29, 2017, 5:47 p.m. UTC | #3
On Thu, Jun 29, 2017 at 10:05 AM, Christoph Lameter <cl@linux.com> wrote:
> On Sun, 25 Jun 2017, Kees Cook wrote:
>
>> The difference gets lost in the noise, but if the above is sensible,
>> it's 0.07% slower. ;)
>
> Hmmm... These differences add up. Also in a repetitive benchmark like that
> you do not see the impact that the additional cacheline use in the cpu
> cache has on larger workloads. Those may be pushed over the edge of l1 or
> l2 capacity at some point which then causes drastic regressions.

Even if that is true, it may be worth it to some people to have the
protection. Given that it significantly hampers a large class of heap
overflow attacks[1], I think it's an important change to have. I'm not
suggesting this be on by default, it's cleanly behind
CONFIG-controlled macros, and is very limited in scope. If you can Ack
it we can let system builders decide if they want to risk a possible
performance hit. I'm pretty sure most distros would like to have this
protection.

Thanks for looking it over!

-Kees

[1] http://resources.infosecinstitute.com/exploiting-linux-kernel-heap-corruptions-slub-allocator/
Rik van Riel June 29, 2017, 5:54 p.m. UTC | #4
On Thu, 2017-06-29 at 10:47 -0700, Kees Cook wrote:
> On Thu, Jun 29, 2017 at 10:05 AM, Christoph Lameter <cl@linux.com>
> wrote:
> > On Sun, 25 Jun 2017, Kees Cook wrote:
> > 
> > > The difference gets lost in the noise, but if the above is
> > > sensible,
> > > it's 0.07% slower. ;)
> > 
> > Hmmm... These differences add up. Also in a repetitive benchmark
> > like that
> > you do not see the impact that the additional cacheline use in the
> > cpu
> > cache has on larger workloads. Those may be pushed over the edge of
> > l1 or
> > l2 capacity at some point which then causes drastic regressions.
> 
> Even if that is true, it may be worth it to some people to have the
> protection. Given that it significantly hampers a large class of heap
> overflow attacks[1], I think it's an important change to have. I'm
> not
> suggesting this be on by default, it's cleanly behind
> CONFIG-controlled macros, and is very limited in scope. If you can
> Ack
> it we can let system builders decide if they want to risk a possible
> performance hit. I'm pretty sure most distros would like to have this
> protection.

I could certainly see it being useful for all kinds of portable
and network-connected systems where security is simply much
more important than performance.
Tycho Andersen June 29, 2017, 5:56 p.m. UTC | #5
On Thu, Jun 29, 2017 at 01:54:13PM -0400, Rik van Riel wrote:
> On Thu, 2017-06-29 at 10:47 -0700, Kees Cook wrote:
> > On Thu, Jun 29, 2017 at 10:05 AM, Christoph Lameter <cl@linux.com>
> > wrote:
> > > On Sun, 25 Jun 2017, Kees Cook wrote:
> > > 
> > > > The difference gets lost in the noise, but if the above is
> > > > sensible,
> > > > it's 0.07% slower. ;)
> > > 
> > > Hmmm... These differences add up. Also in a repetitive benchmark
> > > like that
> > > you do not see the impact that the additional cacheline use in the
> > > cpu
> > > cache has on larger workloads. Those may be pushed over the edge of
> > > l1 or
> > > l2 capacity at some point which then causes drastic regressions.
> > 
> > Even if that is true, it may be worth it to some people to have the
> > protection. Given that it significantly hampers a large class of heap
> > overflow attacks[1], I think it's an important change to have. I'm
> > not
> > suggesting this be on by default, it's cleanly behind
> > CONFIG-controlled macros, and is very limited in scope. If you can
> > Ack
> > it we can let system builders decide if they want to risk a possible
> > performance hit. I'm pretty sure most distros would like to have this
> > protection.
> 
> I could certainly see it being useful for all kinds of portable
> and network-connected systems where security is simply much
> more important than performance.

Indeed, I believe we would enable this in our kernels.

Cheers,

Tycho
Kees Cook July 5, 2017, 11:30 p.m. UTC | #6
On Thu, Jun 29, 2017 at 10:56 AM, Tycho Andersen <tycho@docker.com> wrote:
> On Thu, Jun 29, 2017 at 01:54:13PM -0400, Rik van Riel wrote:
>> On Thu, 2017-06-29 at 10:47 -0700, Kees Cook wrote:
>> > On Thu, Jun 29, 2017 at 10:05 AM, Christoph Lameter <cl@linux.com>
>> > wrote:
>> > > On Sun, 25 Jun 2017, Kees Cook wrote:
>> > >
>> > > > The difference gets lost in the noise, but if the above is
>> > > > sensible,
>> > > > it's 0.07% slower. ;)
>> > >
>> > > Hmmm... These differences add up. Also in a repetitive benchmark
>> > > like that
>> > > you do not see the impact that the additional cacheline use in the
>> > > cpu
>> > > cache has on larger workloads. Those may be pushed over the edge of
>> > > l1 or
>> > > l2 capacity at some point which then causes drastic regressions.
>> >
>> > Even if that is true, it may be worth it to some people to have the
>> > protection. Given that it significantly hampers a large class of heap
>> > overflow attacks[1], I think it's an important change to have. I'm
>> > not
>> > suggesting this be on by default, it's cleanly behind
>> > CONFIG-controlled macros, and is very limited in scope. If you can
>> > Ack
>> > it we can let system builders decide if they want to risk a possible
>> > performance hit. I'm pretty sure most distros would like to have this
>> > protection.
>>
>> I could certainly see it being useful for all kinds of portable
>> and network-connected systems where security is simply much
>> more important than performance.
>
> Indeed, I believe we would enable this in our kernels.

Andrew and Christoph,

What do you think about carrying this for -mm, since people are
interested in it and it's a very narrow change behind a config (with a
large impact on reducing the exploitability of freelist pointer
overwrites)?

-Kees
Andrew Morton July 5, 2017, 11:39 p.m. UTC | #7
On Thu, 22 Jun 2017 18:50:10 -0700 Kees Cook <keescook@chromium.org> wrote:

> This SLUB free list pointer obfuscation code is modified from Brad
> Spengler/PaX Team's code in the last public patch of grsecurity/PaX based
> on my understanding of the code. Changes or omissions from the original
> code are mine and don't reflect the original grsecurity/PaX code.
> 
> This adds a per-cache random value to SLUB caches that is XORed with
> their freelist pointers. This adds nearly zero overhead and frustrates the
> very common heap overflow exploitation method of overwriting freelist
> pointers. A recent example of the attack is written up here:
> http://cyseclabs.com/blog/cve-2016-6187-heap-off-by-one-exploit
> 
> This is based on patches by Daniel Micay, and refactored to avoid lots
> of #ifdef code.
> 
> ...
>
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1900,6 +1900,15 @@ config SLAB_FREELIST_RANDOM
>  	  security feature reduces the predictability of the kernel slab
>  	  allocator against heap overflows.
>  
> +config SLAB_FREELIST_HARDENED
> +	bool "Harden slab freelist metadata"
> +	depends on SLUB
> +	help
> +	  Many kernel heap attacks try to target slab cache metadata and
> +	  other infrastructure. This option makes minor performance
> +	  sacrifices to harden the kernel slab allocator against common
> +	  freelist exploit methods.
> +

Well, it is optable-outable.

>  config SLUB_CPU_PARTIAL
>  	default y
>  	depends on SLUB && SMP
> diff --git a/mm/slub.c b/mm/slub.c
> index 57e5156f02be..590e7830aaed 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -34,6 +34,7 @@
>  #include <linux/stacktrace.h>
>  #include <linux/prefetch.h>
>  #include <linux/memcontrol.h>
> +#include <linux/random.h>
>  
>  #include <trace/events/kmem.h>
>  
> @@ -238,30 +239,50 @@ static inline void stat(const struct kmem_cache *s, enum stat_item si)
>   * 			Core slab cache functions
>   *******************************************************************/
>  
> +#ifdef CONFIG_SLAB_FREELIST_HARDENED
> +# define initialize_random(s)					\
> +		do {						\
> +			s->random = get_random_long();		\
> +		} while (0)
> +# define FREEPTR_VAL(ptr, ptr_addr, s)	\
> +		(void *)((unsigned long)(ptr) ^ s->random ^ (ptr_addr))
> +#else
> +# define initialize_random(s)		do { } while (0)
> +# define FREEPTR_VAL(ptr, addr, s)	((void *)(ptr))
> +#endif
> +#define FREELIST_ENTRY(ptr_addr, s)				\
> +		FREEPTR_VAL(*(unsigned long *)(ptr_addr),	\
> +			    (unsigned long)ptr_addr, s)
> +

That's a bit of an eyesore.  Is there any reason why we cannot
implement all of the above in nice, conventional C functions?

>
> ...
>
> @@ -3536,6 +3557,7 @@ static int kmem_cache_open(struct kmem_cache *s, unsigned long flags)
>  {
>  	s->flags = kmem_cache_flags(s->size, flags, s->name, s->ctor);
>  	s->reserved = 0;
> +	initialize_random(s);
>  
>  	if (need_reserve_slab_rcu && (s->flags & SLAB_TYPESAFE_BY_RCU))
>  		s->reserved = sizeof(struct rcu_head);

We regularly have issues where the random system just isn't ready
(enough) for clients to use it.  Are you sure the above is actually
useful for the boot-time caches?
Kees Cook July 5, 2017, 11:56 p.m. UTC | #8
On Wed, Jul 5, 2017 at 4:39 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Thu, 22 Jun 2017 18:50:10 -0700 Kees Cook <keescook@chromium.org> wrote:
>
>> This SLUB free list pointer obfuscation code is modified from Brad
>> Spengler/PaX Team's code in the last public patch of grsecurity/PaX based
>> on my understanding of the code. Changes or omissions from the original
>> code are mine and don't reflect the original grsecurity/PaX code.
>>
>> This adds a per-cache random value to SLUB caches that is XORed with
>> their freelist pointers. This adds nearly zero overhead and frustrates the
>> very common heap overflow exploitation method of overwriting freelist
>> pointers. A recent example of the attack is written up here:
>> http://cyseclabs.com/blog/cve-2016-6187-heap-off-by-one-exploit
>>
>> This is based on patches by Daniel Micay, and refactored to avoid lots
>> of #ifdef code.
>>
>> ...
>>
>> --- a/init/Kconfig
>> +++ b/init/Kconfig
>> @@ -1900,6 +1900,15 @@ config SLAB_FREELIST_RANDOM
>>         security feature reduces the predictability of the kernel slab
>>         allocator against heap overflows.
>>
>> +config SLAB_FREELIST_HARDENED
>> +     bool "Harden slab freelist metadata"
>> +     depends on SLUB
>> +     help
>> +       Many kernel heap attacks try to target slab cache metadata and
>> +       other infrastructure. This option makes minor performance
>> +       sacrifices to harden the kernel slab allocator against common
>> +       freelist exploit methods.
>> +
>
> Well, it is optable-outable.
>
>>  config SLUB_CPU_PARTIAL
>>       default y
>>       depends on SLUB && SMP
>> diff --git a/mm/slub.c b/mm/slub.c
>> index 57e5156f02be..590e7830aaed 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -34,6 +34,7 @@
>>  #include <linux/stacktrace.h>
>>  #include <linux/prefetch.h>
>>  #include <linux/memcontrol.h>
>> +#include <linux/random.h>
>>
>>  #include <trace/events/kmem.h>
>>
>> @@ -238,30 +239,50 @@ static inline void stat(const struct kmem_cache *s, enum stat_item si)
>>   *                   Core slab cache functions
>>   *******************************************************************/
>>
>> +#ifdef CONFIG_SLAB_FREELIST_HARDENED
>> +# define initialize_random(s)                                        \
>> +             do {                                            \
>> +                     s->random = get_random_long();          \
>> +             } while (0)
>> +# define FREEPTR_VAL(ptr, ptr_addr, s)       \
>> +             (void *)((unsigned long)(ptr) ^ s->random ^ (ptr_addr))
>> +#else
>> +# define initialize_random(s)                do { } while (0)
>> +# define FREEPTR_VAL(ptr, addr, s)   ((void *)(ptr))
>> +#endif
>> +#define FREELIST_ENTRY(ptr_addr, s)                          \
>> +             FREEPTR_VAL(*(unsigned long *)(ptr_addr),       \
>> +                         (unsigned long)ptr_addr, s)
>> +
>
> That's a bit of an eyesore.  Is there any reason why we cannot
> implement all of the above in nice, conventional C functions?

I could rework it using static inlines. I was mainly avoiding #ifdef
blocks in the freelist manipulation functions, but I could push them
up to these functions instead. I'll send a v3.
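One possible shape for such a rework, sketched in user space: the `#ifdef` lives in a single static inline helper and the freelist accessors call it unconditionally. All names here (`SKETCH_FREELIST_HARDENED`, `struct kmem_cache_sketch`, the `*_sketch` functions) are hypothetical stand-ins and are not taken from any later revision of the patch.

```c
#include <stdint.h>

/* Stand-in for CONFIG_SLAB_FREELIST_HARDENED; remove to disable. */
#define SKETCH_FREELIST_HARDENED 1

struct kmem_cache_sketch {
	uintptr_t random;	/* per-cache secret */
	unsigned int offset;	/* free pointer offset within an object */
};

/* The only place that needs to know whether hardening is enabled. */
static inline void *freelist_ptr_sketch(const struct kmem_cache_sketch *s,
					void *ptr, uintptr_t ptr_addr)
{
#ifdef SKETCH_FREELIST_HARDENED
	return (void *)((uintptr_t)ptr ^ s->random ^ ptr_addr);
#else
	(void)s;
	(void)ptr_addr;
	return ptr;
#endif
}

/* Load the stored value from the object and decode it. */
static inline void *get_freepointer_sketch(const struct kmem_cache_sketch *s,
					   void *object)
{
	uintptr_t ptr_addr = (uintptr_t)object + s->offset;

	return freelist_ptr_sketch(s, *(void **)ptr_addr, ptr_addr);
}

/* Encode the next-free pointer and store it in the object. */
static inline void set_freepointer_sketch(const struct kmem_cache_sketch *s,
					  void *object, void *fp)
{
	uintptr_t ptr_addr = (uintptr_t)object + s->offset;

	*(void **)ptr_addr = freelist_ptr_sketch(s, fp, ptr_addr);
}
```

Compared with the macros, the helper gets type checking on its arguments and evaluates each of them exactly once, while the accessors themselves stay free of `#ifdef` blocks.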

>> ...
>>
>> @@ -3536,6 +3557,7 @@ static int kmem_cache_open(struct kmem_cache *s, unsigned long flags)
>>  {
>>       s->flags = kmem_cache_flags(s->size, flags, s->name, s->ctor);
>>       s->reserved = 0;
>> +     initialize_random(s);
>>
>>       if (need_reserve_slab_rcu && (s->flags & SLAB_TYPESAFE_BY_RCU))
>>               s->reserved = sizeof(struct rcu_head);
>
> We regularly have issues where the random system just isn't ready
> (enough) for clients to use it.  Are you sure the above is actually
> useful for the boot-time caches?

IMO, this isn't reason enough to block this since we have similar
problems in other places (e.g. the stack canary itself). The random
infrastructure is aware of these problems and is continually improving
(e.g. on x86 without enough pool entropy, this will fall back to
RDRAND, IIUC). Additionally for systems that are chronically short on
entropy they can build with the latent_entropy GCC plugin to at least
bump the seeding around a bit. So, it _can_ be low entropy, but adding
this still improves the situation since the address of the freelist
pointer itself is part of the XORing, so even if the random number was
static, it would still require info exposures to work around the
obfuscation.
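The point about the slot address being part of the XOR can be illustrated with a small sketch (hypothetical helper names, user-space types): even with a fixed, known `random`, the same next-free pointer is stored as a different value in each slot, so forging a usable entry still requires knowing where the slot lives.

```c
#include <stdint.h>

/* Same XOR as in the patch, with the secret passed in explicitly. */
static uintptr_t obfuscate(uintptr_t ptr, uintptr_t random, uintptr_t ptr_addr)
{
	return ptr ^ random ^ ptr_addr;
}

/*
 * Returns 1 when one pointer value, stored at two different slot
 * addresses under the same secret, yields two different stored values.
 */
static int stored_values_differ(uintptr_t ptr, uintptr_t random,
				uintptr_t slot_a, uintptr_t slot_b)
{
	return obfuscate(ptr, random, slot_a) != obfuscate(ptr, random, slot_b);
}
```

For any two distinct slot addresses the stored values differ by `slot_a ^ slot_b`, regardless of how weak `random` is.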

Shorter version: yeah, it'll still be useful. :)

-Kees

Patch

diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index 07ef550c6627..d7990a83b416 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -93,6 +93,10 @@  struct kmem_cache {
 #endif
 #endif
 
+#ifdef CONFIG_SLAB_FREELIST_HARDENED
+	unsigned long random;
+#endif
+
 #ifdef CONFIG_NUMA
 	/*
 	 * Defragmentation by allocating from a remote node.
diff --git a/init/Kconfig b/init/Kconfig
index 1d3475fc9496..04ee3e507b9e 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1900,6 +1900,15 @@  config SLAB_FREELIST_RANDOM
 	  security feature reduces the predictability of the kernel slab
 	  allocator against heap overflows.
 
+config SLAB_FREELIST_HARDENED
+	bool "Harden slab freelist metadata"
+	depends on SLUB
+	help
+	  Many kernel heap attacks try to target slab cache metadata and
+	  other infrastructure. This option makes minor performance
+	  sacrifices to harden the kernel slab allocator against common
+	  freelist exploit methods.
+
 config SLUB_CPU_PARTIAL
 	default y
 	depends on SLUB && SMP
diff --git a/mm/slub.c b/mm/slub.c
index 57e5156f02be..590e7830aaed 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -34,6 +34,7 @@ 
 #include <linux/stacktrace.h>
 #include <linux/prefetch.h>
 #include <linux/memcontrol.h>
+#include <linux/random.h>
 
 #include <trace/events/kmem.h>
 
@@ -238,30 +239,50 @@  static inline void stat(const struct kmem_cache *s, enum stat_item si)
  * 			Core slab cache functions
  *******************************************************************/
 
+#ifdef CONFIG_SLAB_FREELIST_HARDENED
+# define initialize_random(s)					\
+		do {						\
+			(s)->random = get_random_long();	\
+		} while (0)
+# define FREEPTR_VAL(ptr, ptr_addr, s)	\
+		((void *)((unsigned long)(ptr) ^ (s)->random ^ (ptr_addr)))
+#else
+# define initialize_random(s)		do { } while (0)
+# define FREEPTR_VAL(ptr, ptr_addr, s)	((void *)(ptr))
+#endif
+#define FREELIST_ENTRY(ptr_addr, s)				\
+		FREEPTR_VAL(*(unsigned long *)(ptr_addr),	\
+			    (unsigned long)(ptr_addr), s)
+
 static inline void *get_freepointer(struct kmem_cache *s, void *object)
 {
-	return *(void **)(object + s->offset);
+	return FREELIST_ENTRY(object + s->offset, s);
 }
 
 static void prefetch_freepointer(const struct kmem_cache *s, void *object)
 {
-	prefetch(object + s->offset);
+	if (object)
+		prefetch(FREELIST_ENTRY(object + s->offset, s));
 }
 
 static inline void *get_freepointer_safe(struct kmem_cache *s, void *object)
 {
+	unsigned long freepointer_addr;
 	void *p;
 
 	if (!debug_pagealloc_enabled())
 		return get_freepointer(s, object);
 
-	probe_kernel_read(&p, (void **)(object + s->offset), sizeof(p));
-	return p;
+	freepointer_addr = (unsigned long)object + s->offset;
+	probe_kernel_read(&p, (void **)freepointer_addr, sizeof(p));
+	return FREEPTR_VAL(p, freepointer_addr, s);
 }
 
 static inline void set_freepointer(struct kmem_cache *s, void *object, void *fp)
 {
-	*(void **)(object + s->offset) = fp;
+	unsigned long freeptr_addr = (unsigned long)object + s->offset;
+
+	*(void **)freeptr_addr = FREEPTR_VAL(fp, freeptr_addr, s);
 }
 
 /* Loop over all objects in a slab */
@@ -3536,6 +3557,7 @@  static int kmem_cache_open(struct kmem_cache *s, unsigned long flags)
 {
 	s->flags = kmem_cache_flags(s->size, flags, s->name, s->ctor);
 	s->reserved = 0;
+	initialize_random(s);
 
 	if (need_reserve_slab_rcu && (s->flags & SLAB_TYPESAFE_BY_RCU))
 		s->reserved = sizeof(struct rcu_head);