[RFC] Randomized slab caches for kmalloc()

Message ID	20230315095459.186113-1-gongruiqi1@huawei.com (mailing list archive)
State	New
Headers
Return-Path: <owner-linux-mm@kvack.org>
From: "GONG, Ruiqi" <gongruiqi1@huawei.com>
To: Dennis Zhou <dennis@kernel.org>, Tejun Heo <tj@kernel.org>, Christoph Lameter <cl@linux.com>, Pekka Enberg <penberg@kernel.org>, David Rientjes <rientjes@google.com>, Joonsoo Kim <iamjoonsoo.kim@lge.com>, Andrew Morton <akpm@linux-foundation.org>, Vlastimil Babka <vbabka@suse.cz>
CC: Roman Gushchin <roman.gushchin@linux.dev>, Hyeonggon Yoo <42.hyeyoo@gmail.com>, Alexander Potapenko <glider@google.com>, Marco Elver <elver@google.com>, Dmitry Vyukov <dvyukov@google.com>, <linux-mm@kvack.org>, <linux-kernel@vger.kernel.org>, <kasan-dev@googlegroups.com>, Kees Cook <keescook@chromium.org>, <linux-hardening@vger.kernel.org>, Paul Moore <paul@paul-moore.com>, <linux-security-module@vger.kernel.org>, James Morris <jmorris@namei.org>, Wang Weiyang <wangweiyang2@huawei.com>, Xiu Jianfeng <xiujianfeng@huawei.com>
Subject: [PATCH RFC] Randomized slab caches for kmalloc()
Date: Wed, 15 Mar 2023 17:54:59 +0800
Message-ID: <20230315095459.186113-1-gongruiqi1@huawei.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Gong Ruiqi March 15, 2023, 9:54 a.m. UTC

When exploiting memory vulnerabilities, "heap spraying" is a common
technique targeting those related to dynamic memory allocation (i.e. the
"heap"), and it plays an important role in a successful exploitation.
Basically, it overwrites the memory area of a vulnerable object by
triggering allocations in other subsystems or modules, thereby
getting a reference to the targeted memory location. It is usable on
various types of vulnerability, including use after free (UAF) and heap
out-of-bounds writes.

There are (at least) two reasons why the heap can be sprayed: 1) generic
slab caches are shared among different subsystems and modules, and
2) dedicated slab caches could be merged with the generic ones.
Currently these two factors cannot be prevented at a low cost: the first
one is a widely used memory allocation mechanism, and shutting down slab
merging completely via `slub_nomerge` would be overkill.

To efficiently prevent heap spraying, we propose the following approach:
create multiple copies of the generic slab caches that will never be
merged, and use a random one of them at allocation. The random
selection is based on the location of the code that calls `kmalloc()`,
which means it is static at runtime (rather than dynamically determined
at each allocation, which could be bypassed by repeatedly spraying
in brute force). In this way, the vulnerable object and memory allocated
in other subsystems and modules will (most probably) be on different
slab caches, which prevents the object from being sprayed.

Signed-off-by: GONG, Ruiqi <gongruiqi1@huawei.com>
---

v0:
The current implementation only randomizes slab caches for KMALLOC_NORMAL
allocations. Besides the patch itself, we would also like to hear the
community's opinion on whether it is necessary to extend this
randomization to all KMALLOC_* types, and if so, whether implementing a
three-dimensional `kmalloc_caches` would be a better choice.


 include/linux/percpu.h  | 12 +++++++++---
 include/linux/slab.h    | 24 +++++++++++++++++++-----
 mm/Kconfig              | 20 ++++++++++++++++++++
 mm/kfence/kfence_test.c |  4 ++--
 mm/slab.c               |  2 +-
 mm/slab.h               |  3 ++-
 mm/slab_common.c        | 40 +++++++++++++++++++++++++++++++++++-----
 7 files changed, 88 insertions(+), 17 deletions(-)
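
For illustration, the call-site-based selection described in the commit
message can be sketched in userspace C. This is a minimal sketch, not the
patch's code: `pick_cache()` and `nr_copies` are hypothetical names, and the
caller address is passed in as a plain integer instead of coming from
`_RET_IP_`.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace sketch of call-site-based cache selection.
 * pick_cache() and nr_copies are hypothetical names; the RFC's
 * equivalent is "KMALLOC_RANDOM_START + caller % KMALLOC_RANDOM_NR"
 * with KMALLOC_RANDOM_START == 0.
 */
static inline unsigned int pick_cache(uintptr_t caller, unsigned int nr_copies)
{
	assert(nr_copies > 0);	/* at least one cache copy must exist */
	return (unsigned int)(caller % nr_copies);
}
```

The same call site always maps to the same copy, which is what makes the
selection static at runtime: an attacker who repeats an allocation from one
code path cannot reach a different copy by retrying.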

Gong Ruiqi April 3, 2023, 12:06 p.m. UTC | #1

Hi all,

Friendly ping. Any suggestions are welcome.

Thanks.
Ruiqi

On 2023/03/15 17:54, GONG, Ruiqi wrote:
> ...
> 
>  include/linux/percpu.h  | 12 +++++++++---
>  include/linux/slab.h    | 24 +++++++++++++++++++-----
>  mm/Kconfig              | 20 ++++++++++++++++++++
>  mm/kfence/kfence_test.c |  4 ++--
>  mm/slab.c               |  2 +-
>  mm/slab.h               |  3 ++-
>  mm/slab_common.c        | 40 +++++++++++++++++++++++++++++++++++-----
>  7 files changed, 88 insertions(+), 17 deletions(-)
> 
> diff --git a/include/linux/percpu.h b/include/linux/percpu.h
> index 1338ea2aa720..6cee6425951f 100644
> --- a/include/linux/percpu.h
> +++ b/include/linux/percpu.h
> @@ -34,6 +34,12 @@
>  #define PCPU_BITMAP_BLOCK_BITS		(PCPU_BITMAP_BLOCK_SIZE >>	\
>  					 PCPU_MIN_ALLOC_SHIFT)
>  
> +#ifdef CONFIG_RANDOM_KMALLOC_CACHES
> +#define PERCPU_DYNAMIC_SIZE_SHIFT      13
> +#else
> +#define PERCPU_DYNAMIC_SIZE_SHIFT      10
> +#endif
> +
>  /*
>   * Percpu allocator can serve percpu allocations before slab is
>   * initialized which allows slab to depend on the percpu allocator.
> @@ -41,7 +47,7 @@
>   * for this.  Keep PERCPU_DYNAMIC_RESERVE equal to or larger than
>   * PERCPU_DYNAMIC_EARLY_SIZE.
>   */
> -#define PERCPU_DYNAMIC_EARLY_SIZE	(20 << 10)
> +#define PERCPU_DYNAMIC_EARLY_SIZE	(20 << PERCPU_DYNAMIC_SIZE_SHIFT)
>  
>  /*
>   * PERCPU_DYNAMIC_RESERVE indicates the amount of free area to piggy
> @@ -55,9 +61,9 @@
>   * intelligent way to determine this would be nice.
>   */
>  #if BITS_PER_LONG > 32
> -#define PERCPU_DYNAMIC_RESERVE		(28 << 10)
> +#define PERCPU_DYNAMIC_RESERVE		(28 << PERCPU_DYNAMIC_SIZE_SHIFT)
>  #else
> -#define PERCPU_DYNAMIC_RESERVE		(20 << 10)
> +#define PERCPU_DYNAMIC_RESERVE		(20 << PERCPU_DYNAMIC_SIZE_SHIFT)
>  #endif
>  
>  extern void *pcpu_base_addr;
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 87d687c43d8c..fea7644a1985 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -106,6 +106,12 @@
>  /* Avoid kmemleak tracing */
>  #define SLAB_NOLEAKTRACE	((slab_flags_t __force)0x00800000U)
>  
> +#ifdef CONFIG_RANDOM_KMALLOC_CACHES
> +# define SLAB_RANDOMSLAB	((slab_flags_t __force)0x01000000U)
> +#else
> +# define SLAB_RANDOMSLAB	0
> +#endif
> +
>  /* Fault injection mark */
>  #ifdef CONFIG_FAILSLAB
>  # define SLAB_FAILSLAB		((slab_flags_t __force)0x02000000U)
> @@ -336,6 +342,12 @@ static inline unsigned int arch_slab_minalign(void)
>  #define SLAB_OBJ_MIN_SIZE      (KMALLOC_MIN_SIZE < 16 ? \
>                                 (KMALLOC_MIN_SIZE) : 16)
>  
> +#ifdef CONFIG_RANDOM_KMALLOC_CACHES
> +#define KMALLOC_RANDOM_NR CONFIG_RANDOM_KMALLOC_CACHES_NR
> +#else
> +#define KMALLOC_RANDOM_NR 1
> +#endif
> +
>  /*
>   * Whenever changing this, take care of that kmalloc_type() and
>   * create_kmalloc_caches() still work as intended.
> @@ -345,7 +357,9 @@ static inline unsigned int arch_slab_minalign(void)
>   * kmem caches can have both accounted and unaccounted objects.
>   */
>  enum kmalloc_cache_type {
> -	KMALLOC_NORMAL = 0,
> +	KMALLOC_RANDOM_START = 0,
> +	KMALLOC_RANDOM_END = KMALLOC_RANDOM_START + KMALLOC_RANDOM_NR - 1,
> +	KMALLOC_NORMAL = KMALLOC_RANDOM_END,
>  #ifndef CONFIG_ZONE_DMA
>  	KMALLOC_DMA = KMALLOC_NORMAL,
>  #endif
> @@ -378,14 +392,14 @@ kmalloc_caches[NR_KMALLOC_TYPES][KMALLOC_SHIFT_HIGH + 1];
>  	(IS_ENABLED(CONFIG_ZONE_DMA)   ? __GFP_DMA : 0) |	\
>  	(IS_ENABLED(CONFIG_MEMCG_KMEM) ? __GFP_ACCOUNT : 0))
>  
> -static __always_inline enum kmalloc_cache_type kmalloc_type(gfp_t flags)
> +static __always_inline enum kmalloc_cache_type kmalloc_type(gfp_t flags, unsigned long caller)
>  {
>  	/*
>  	 * The most common case is KMALLOC_NORMAL, so test for it
>  	 * with a single branch for all the relevant flags.
>  	 */
>  	if (likely((flags & KMALLOC_NOT_NORMAL_BITS) == 0))
> -		return KMALLOC_NORMAL;
> +		return KMALLOC_RANDOM_START + caller % KMALLOC_RANDOM_NR;
>  
>  	/*
>  	 * At least one of the flags has to be set. Their priorities in
> @@ -578,7 +592,7 @@ static __always_inline __alloc_size(1) void *kmalloc(size_t size, gfp_t flags)
>  
>  		index = kmalloc_index(size);
>  		return kmalloc_trace(
> -				kmalloc_caches[kmalloc_type(flags)][index],
> +				kmalloc_caches[kmalloc_type(flags, _RET_IP_)][index],
>  				flags, size);
>  	}
>  	return __kmalloc(size, flags);
> @@ -604,7 +618,7 @@ static __always_inline __alloc_size(1) void *kmalloc_node(size_t size, gfp_t fla
>  
>  		index = kmalloc_index(size);
>  		return kmalloc_node_trace(
> -				kmalloc_caches[kmalloc_type(flags)][index],
> +				kmalloc_caches[kmalloc_type(flags, _RET_IP_)][index],
>  				flags, node, size);
>  	}
>  	return __kmalloc_node(size, flags, node);
> diff --git a/mm/Kconfig b/mm/Kconfig
> index bc828f640cd9..0b116bd8fdf0 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -333,6 +333,26 @@ config SLUB_CPU_PARTIAL
>  	  which requires the taking of locks that may cause latency spikes.
>  	  Typically one would choose no for a realtime system.
>  
> +config RANDOM_KMALLOC_CACHES
> +	default n
> +	depends on SLUB
> +	bool "Random slab caches for normal kmalloc"
> +	help
> +	  A hardening feature that creates multiple copies of slab caches for
> +	  normal kmalloc allocation and makes kmalloc randomly pick one based
> +	  on code address, which makes the attackers unable to spray vulnerable
> +	  memory objects on the heap for exploiting memory vulnerabilities.
> +
> +config RANDOM_KMALLOC_CACHES_NR
> +	int "Number of random slab caches copies"
> +	default 16
> +	range 4 16
> +	depends on RANDOM_KMALLOC_CACHES
> +	help
> +	  The number of copies of random slab caches. Bigger value makes the
> +	  potentially vulnerable memory object less likely to collide with
> +	  objects allocated from other subsystems or modules.
> +
>  endmenu # SLAB allocator options
>  
>  config SHUFFLE_PAGE_ALLOCATOR
> diff --git a/mm/kfence/kfence_test.c b/mm/kfence/kfence_test.c
> index b5d66a69200d..316d12af7202 100644
> --- a/mm/kfence/kfence_test.c
> +++ b/mm/kfence/kfence_test.c
> @@ -213,7 +213,7 @@ static void test_cache_destroy(void)
>  
>  static inline size_t kmalloc_cache_alignment(size_t size)
>  {
> -	return kmalloc_caches[kmalloc_type(GFP_KERNEL)][__kmalloc_index(size, false)]->align;
> +	return kmalloc_caches[kmalloc_type(GFP_KERNEL, _RET_IP_)][__kmalloc_index(size, false)]->align;
>  }
>  
>  /* Must always inline to match stack trace against caller. */
> @@ -284,7 +284,7 @@ static void *test_alloc(struct kunit *test, size_t size, gfp_t gfp, enum allocat
>  		if (is_kfence_address(alloc)) {
>  			struct slab *slab = virt_to_slab(alloc);
>  			struct kmem_cache *s = test_cache ?:
> -					kmalloc_caches[kmalloc_type(GFP_KERNEL)][__kmalloc_index(size, false)];
> +					kmalloc_caches[kmalloc_type(GFP_KERNEL, _RET_IP_)][__kmalloc_index(size, false)];
>  
>  			/*
>  			 * Verify that various helpers return the right values
> diff --git a/mm/slab.c b/mm/slab.c
> index dabc2a671fc6..8dc7e183dcc5 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -1675,7 +1675,7 @@ static size_t calculate_slab_order(struct kmem_cache *cachep,
>  			if (freelist_size > KMALLOC_MAX_CACHE_SIZE) {
>  				freelist_cache_size = PAGE_SIZE << get_order(freelist_size);
>  			} else {
> -				freelist_cache = kmalloc_slab(freelist_size, 0u);
> +				freelist_cache = kmalloc_slab(freelist_size, 0u, _RET_IP_);
>  				if (!freelist_cache)
>  					continue;
>  				freelist_cache_size = freelist_cache->size;
> diff --git a/mm/slab.h b/mm/slab.h
> index 43966aa5fadf..4f4caf422b77 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -280,7 +280,7 @@ void setup_kmalloc_cache_index_table(void);
>  void create_kmalloc_caches(slab_flags_t);
>  
>  /* Find the kmalloc slab corresponding for a certain size */
> -struct kmem_cache *kmalloc_slab(size_t, gfp_t);
> +struct kmem_cache *kmalloc_slab(size_t, gfp_t, unsigned long);
>  
>  void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags,
>  			      int node, size_t orig_size,
> @@ -374,6 +374,7 @@ static inline bool is_kmalloc_cache(struct kmem_cache *s)
>  			      SLAB_TEMPORARY | \
>  			      SLAB_ACCOUNT | \
>  			      SLAB_KMALLOC | \
> +			      SLAB_RANDOMSLAB | \
>  			      SLAB_NO_USER_FLAGS)
>  
>  bool __kmem_cache_empty(struct kmem_cache *);
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index bf4e777cfe90..895a3edb82d4 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -47,6 +47,7 @@ static DECLARE_WORK(slab_caches_to_rcu_destroy_work,
>   */
>  #define SLAB_NEVER_MERGE (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER | \
>  		SLAB_TRACE | SLAB_TYPESAFE_BY_RCU | SLAB_NOLEAKTRACE | \
> +		SLAB_RANDOMSLAB | \
>  		SLAB_FAILSLAB | kasan_never_merge())
>  
>  #define SLAB_MERGE_SAME (SLAB_RECLAIM_ACCOUNT | SLAB_CACHE_DMA | \
> @@ -722,7 +723,7 @@ static inline unsigned int size_index_elem(unsigned int bytes)
>   * Find the kmem_cache structure that serves a given size of
>   * allocation
>   */
> -struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags)
> +struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags, unsigned long caller)
>  {
>  	unsigned int index;
>  
> @@ -737,7 +738,7 @@ struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags)
>  		index = fls(size - 1);
>  	}
>  
> -	return kmalloc_caches[kmalloc_type(flags)][index];
> +	return kmalloc_caches[kmalloc_type(flags, caller)][index];
>  }
>  
>  size_t kmalloc_size_roundup(size_t size)
> @@ -755,7 +756,7 @@ size_t kmalloc_size_roundup(size_t size)
>  		return PAGE_SIZE << get_order(size);
>  
>  	/* The flags don't matter since size_index is common to all. */
> -	c = kmalloc_slab(size, GFP_KERNEL);
> +	c = kmalloc_slab(size, GFP_KERNEL, _RET_IP_);
>  	return c ? c->object_size : 0;
>  }
>  EXPORT_SYMBOL(kmalloc_size_roundup);
> @@ -778,12 +779,36 @@ EXPORT_SYMBOL(kmalloc_size_roundup);
>  #define KMALLOC_RCL_NAME(sz)
>  #endif
>  
> +#ifdef CONFIG_RANDOM_KMALLOC_CACHES
> +#define __KMALLOC_RANDOM_CONCAT(a, b, c) a ## b ## c
> +#define KMALLOC_RANDOM_NAME(N, sz) __KMALLOC_RANDOM_CONCAT(KMALLOC_RANDOM_, N, _NAME)(sz)
> +#define KMALLOC_RANDOM_1_NAME(sz)                             .name[KMALLOC_RANDOM_START +  0] = "kmalloc-random-01-" #sz,
> +#define KMALLOC_RANDOM_2_NAME(sz)  KMALLOC_RANDOM_1_NAME(sz)  .name[KMALLOC_RANDOM_START +  1] = "kmalloc-random-02-" #sz,
> +#define KMALLOC_RANDOM_3_NAME(sz)  KMALLOC_RANDOM_2_NAME(sz)  .name[KMALLOC_RANDOM_START +  2] = "kmalloc-random-03-" #sz,
> +#define KMALLOC_RANDOM_4_NAME(sz)  KMALLOC_RANDOM_3_NAME(sz)  .name[KMALLOC_RANDOM_START +  3] = "kmalloc-random-04-" #sz,
> +#define KMALLOC_RANDOM_5_NAME(sz)  KMALLOC_RANDOM_4_NAME(sz)  .name[KMALLOC_RANDOM_START +  4] = "kmalloc-random-05-" #sz,
> +#define KMALLOC_RANDOM_6_NAME(sz)  KMALLOC_RANDOM_5_NAME(sz)  .name[KMALLOC_RANDOM_START +  5] = "kmalloc-random-06-" #sz,
> +#define KMALLOC_RANDOM_7_NAME(sz)  KMALLOC_RANDOM_6_NAME(sz)  .name[KMALLOC_RANDOM_START +  6] = "kmalloc-random-07-" #sz,
> +#define KMALLOC_RANDOM_8_NAME(sz)  KMALLOC_RANDOM_7_NAME(sz)  .name[KMALLOC_RANDOM_START +  7] = "kmalloc-random-08-" #sz,
> +#define KMALLOC_RANDOM_9_NAME(sz)  KMALLOC_RANDOM_8_NAME(sz)  .name[KMALLOC_RANDOM_START +  8] = "kmalloc-random-09-" #sz,
> +#define KMALLOC_RANDOM_10_NAME(sz) KMALLOC_RANDOM_9_NAME(sz)  .name[KMALLOC_RANDOM_START +  9] = "kmalloc-random-10-" #sz,
> +#define KMALLOC_RANDOM_11_NAME(sz) KMALLOC_RANDOM_10_NAME(sz) .name[KMALLOC_RANDOM_START + 10] = "kmalloc-random-11-" #sz,
> +#define KMALLOC_RANDOM_12_NAME(sz) KMALLOC_RANDOM_11_NAME(sz) .name[KMALLOC_RANDOM_START + 11] = "kmalloc-random-12-" #sz,
> +#define KMALLOC_RANDOM_13_NAME(sz) KMALLOC_RANDOM_12_NAME(sz) .name[KMALLOC_RANDOM_START + 12] = "kmalloc-random-13-" #sz,
> +#define KMALLOC_RANDOM_14_NAME(sz) KMALLOC_RANDOM_13_NAME(sz) .name[KMALLOC_RANDOM_START + 13] = "kmalloc-random-14-" #sz,
> +#define KMALLOC_RANDOM_15_NAME(sz) KMALLOC_RANDOM_14_NAME(sz) .name[KMALLOC_RANDOM_START + 14] = "kmalloc-random-15-" #sz,
> +#define KMALLOC_RANDOM_16_NAME(sz) KMALLOC_RANDOM_15_NAME(sz) .name[KMALLOC_RANDOM_START + 15] = "kmalloc-random-16-" #sz,
> +#else
> +#define KMALLOC_RANDOM_NAME(N, sz)
> +#endif
> +
>  #define INIT_KMALLOC_INFO(__size, __short_size)			\
>  {								\
>  	.name[KMALLOC_NORMAL]  = "kmalloc-" #__short_size,	\
>  	KMALLOC_RCL_NAME(__short_size)				\
>  	KMALLOC_CGROUP_NAME(__short_size)			\
>  	KMALLOC_DMA_NAME(__short_size)				\
> +	KMALLOC_RANDOM_NAME(CONFIG_RANDOM_KMALLOC_CACHES_NR, __short_size)	\
>  	.size = __size,						\
>  }
>  
> @@ -879,6 +904,11 @@ new_kmalloc_cache(int idx, enum kmalloc_cache_type type, slab_flags_t flags)
>  		flags |= SLAB_CACHE_DMA;
>  	}
>  
> +#ifdef CONFIG_RANDOM_KMALLOC_CACHES
> +	if (type >= KMALLOC_RANDOM_START && type <= KMALLOC_RANDOM_END)
> +		flags |= SLAB_RANDOMSLAB;
> +#endif
> +
>  	kmalloc_caches[type][idx] = create_kmalloc_cache(
>  					kmalloc_info[idx].name[type],
>  					kmalloc_info[idx].size, flags, 0,
> @@ -905,7 +935,7 @@ void __init create_kmalloc_caches(slab_flags_t flags)
>  	/*
>  	 * Including KMALLOC_CGROUP if CONFIG_MEMCG_KMEM defined
>  	 */
> -	for (type = KMALLOC_NORMAL; type < NR_KMALLOC_TYPES; type++) {
> +	for (type = KMALLOC_RANDOM_START; type < NR_KMALLOC_TYPES; type++) {
>  		for (i = KMALLOC_SHIFT_LOW; i <= KMALLOC_SHIFT_HIGH; i++) {
>  			if (!kmalloc_caches[type][i])
>  				new_kmalloc_cache(i, type, flags);
> @@ -958,7 +988,7 @@ void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller
>  		return ret;
>  	}
>  
> -	s = kmalloc_slab(size, flags);
> +	s = kmalloc_slab(size, flags, caller);
>  
>  	if (unlikely(ZERO_OR_NULL_PTR(s)))
>  		return s;

Hyeonggon Yoo April 5, 2023, 12:26 p.m. UTC | #2

On 3/15/2023 6:54 PM, GONG, Ruiqi wrote:
> ...

I'm not yet sure if this feature is appropriate for the mainline kernel.

I have a few questions:

1) What is the cost of this configuration, in terms of memory overhead
or execution time?

2) The actual cache depends on the caller, which is static at build
time, not runtime.

     What about using (caller ^ (some subsystem-wide random sequence)),
     which is static at runtime?
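
The XOR idea in question 2 could look roughly like the sketch below. It
assumes a single seed value chosen once at boot; `pick_cache_seeded()`,
`seed` and `nr_copies` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Mix a per-boot seed into the caller address before reducing it to
 * a cache index. For a fixed seed the result is stable per call
 * site, but the mapping changes when the seed changes.
 */
static inline unsigned int pick_cache_seeded(uintptr_t caller, uintptr_t seed,
					     unsigned int nr_copies)
{
	assert(nr_copies > 0);
	return (unsigned int)((caller ^ seed) % nr_copies);
}
```

Note that with a power-of-two `nr_copies` only the low bits of `caller` and
`seed` influence the result, so a plain XOR followed by a modulo is a weak
mixer.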

Alexander Lobakin April 5, 2023, 3:15 p.m. UTC | #3

From: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Date: Wed, 5 Apr 2023 21:26:47 +0900

> On 3/15/2023 6:54 PM, GONG, Ruiqi wrote:
>> ...
> 
> I'm not yet sure if this feature is appropriate for mainline kernel.
> 
> I have few questions:
> 
> 1) What is cost of this configuration, in terms of memory overhead, or
> execution time?
> 
> 
> 2) The actual cache depends on caller which is static at build time, not
> runtime.
> 
>     What about using (caller ^ (some subsystem-wide random sequence)),
> 
>     which is static at runtime?

Why can't we just do

	get_random_u32_below(CONFIG_RANDOM_KMALLOC_CACHES_NR)

?
It's fast enough according to Jason... `_RET_IP_ % nr` doesn't sound
"secure" to me. It really is a compile-time constant, which can be
calculated (or not?) manually. Even if it wasn't, `% nr` doesn't sound
good; there should be at least hash_32().
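
To make the concern about `% nr` concrete, here is a small userspace
sketch. `mult_hash32()` is a hypothetical reimplementation of a 32-bit
multiplicative hash in the style of the kernel's generic hash_32()
(multiply by a golden-ratio constant, keep the top bits); it is an
assumption for illustration, not the kernel function itself, and
`pick_mod()` is likewise a made-up helper.

```c
#include <assert.h>
#include <stdint.h>

#define GOLDEN_RATIO_32 0x61C88647u

/* Multiplicative hash: multiply, then keep the top "bits" bits. */
static inline uint32_t mult_hash32(uint32_t val, unsigned int bits)
{
	assert(bits >= 1 && bits <= 32);
	return (uint32_t)(val * GOLDEN_RATIO_32) >> (32 - bits);
}

/* Plain modulo: only the low bits of the address matter. */
static inline uint32_t pick_mod(uint32_t caller, uint32_t nr)
{
	assert(nr > 0);
	return caller % nr;
}
```

Addresses that share their low bits, such as 0x1000, 0x2000 and 0x3000, all
land in index 0 under `% 16`, while the multiplicative hash spreads them
across different indices.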

Thanks,
Olek

Gong Ruiqi April 24, 2023, 2:54 a.m. UTC | #4

Sorry for the late reply. I just came back from my paternity leave :)

On 2023/04/05 23:15, Alexander Lobakin wrote:
> From: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> Date: Wed, 5 Apr 2023 21:26:47 +0900
> 
>> ...
>>
>> I'm not yet sure if this feature is appropriate for mainline kernel.
>>
>> I have few questions:
>>
>> 1) What is cost of this configuration, in terms of memory overhead, or
>> execution time?
>>
>>
>> 2) The actual cache depends on caller which is static at build time, not
>> runtime.
>>
>>     What about using (caller ^ (some subsystem-wide random sequence)),
>>
>>     which is static at runtime?
> 
> Why can't we just do
> 
> 	get_random_u32_below(CONFIG_RANDOM_KMALLOC_CACHES_NR)
> 
> ?

This makes the cache selection "dynamic", i.e. each kmalloc() will
randomly pick a different cache each time it's executed. The problem
with this approach is that it only reduces the probability of the cache
being sprayed by the attacker, and the attacker can bypass it by simply
repeating the attack multiple times in a brute-force manner.

Our proposal is to make the randomness depend on the code address
rather than on time, i.e. allocations in different code paths would
most likely pick different caches, although the kmalloc() at each call
site would use the same cache copy whenever it is executed. In this way,
the code path that the attacker uses would most likely pick a different
cache from the one the targeted subsystem/driver would pick, which means
that in most cases heap spraying is unachievable.
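
The brute-force argument against per-allocation randomness can be put in
numbers with a simplified model. Assume each sprayed allocation picks one
of n caches uniformly and independently; then the chance that at least one
of k allocations lands in the victim's cache is 1 - (1 - 1/n)^k. The
function name and parameters below are hypothetical, and the model ignores
everything else an exploit needs.

```c
#include <assert.h>

/*
 * Probability that at least one of "attempts" sprayed allocations
 * hits the victim's cache when each one picks uniformly among
 * "nr_copies" caches (simplified, independent-trials model).
 */
static inline double spray_hit_probability(unsigned int nr_copies,
					   unsigned int attempts)
{
	double miss = 1.0;
	unsigned int i;

	assert(nr_copies > 0);
	for (i = 0; i < attempts; i++)
		miss *= 1.0 - 1.0 / nr_copies;
	return 1.0 - miss;
}
```

Under this model, 16 copies give a single attempt a 1/16 chance, but 100
attempts succeed with probability above 0.99, which is why dynamic
selection only slows the attacker down.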

> It's fast enough according to Jason... `_RET_IP_ % nr` doesn't sound
> "secure" to me. It really is a compile-time constant, which can be
> calculated (or not?) manually. Even if it wasn't, `% nr` doesn't sound
> good, there should be at least hash_32().

Yes, `_RET_IP_ % nr` is a bit naive. Currently the patch is more like a
PoC so I wrote this. Indeed a proper hash function should be used here.

And yes _RET_IP_ could somehow be manually determined especially for
kernels without KASLR, and I think adding a per-boot random seed into
the selection could solve this.

I will implement these in v2. Thanks!

> 
> Thanks,
> Olek
>

Gong Ruiqi April 24, 2023, 3:15 a.m. UTC | #5

Sorry for the late reply. I just came back from my paternity leave :)

On 2023/04/05 20:26, Hyeonggon Yoo wrote:
> On 3/15/2023 6:54 PM, GONG, Ruiqi wrote:
>> ...
> 
> I'm not yet sure if this feature is appropriate for mainline kernel.
> 
> I have few questions:
> 
> 1) What is cost of this configuration, in terms of memory overhead, or
> execution time? 

I haven't done a thorough test of the runtime overhead yet, but in
theory it won't be large, because in essence what it does is create
some additional `struct kmem_cache` instances and split the management
of slab objects from the original single cache across all these caches.

But indeed the test is necessary. I will do it based on the v2 patch.
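
Until measurements exist, the static part of the cost can at least be
estimated with simple arithmetic. The sketch below is an assumption-laden
illustration: the helper names are hypothetical, and the number of kmalloc
size classes (13 in the usage note) is a placeholder that depends on the
kernel configuration. The percpu figures use the shift values from the
posted diff (10 without the option, 13 with it).

```c
#include <assert.h>

/* Extra kmem_cache instances compared with one KMALLOC_NORMAL set. */
static inline unsigned int extra_kmem_caches(unsigned int nr_copies,
					     unsigned int nr_size_classes)
{
	assert(nr_copies >= 1);
	return (nr_copies - 1) * nr_size_classes;
}

/* Percpu reserve in bytes, computed as "units << shift". */
static inline unsigned long percpu_reserve_bytes(unsigned long units,
						 unsigned int shift)
{
	assert(shift < 32);
	return units << shift;
}
```

With 16 copies and an assumed 13 size classes this gives 195 additional
caches, and the 64-bit PERCPU_DYNAMIC_RESERVE grows from 28 << 10 (28 KiB)
to 28 << 13 (224 KiB). Neither number says anything about the runtime
cost or about slab fragmentation, which still need to be measured.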

> 
> 2) The actual cache depends on caller which is static at build time, not
> runtime.
> 
>     What about using (caller ^ (some subsystem-wide random sequence)),
> 
>     which is static at runtime?

Yes, it could be better. As I said in my reply to Alexander, I will add
a per-boot random seed in v2, and I think it's basically the `(some
subsystem-wide random sequence)` you mentioned here.

Alexander Lobakin April 24, 2023, 1:46 p.m. UTC | #6

From: Gong, Ruiqi <gongruiqi1@huawei.com>
Date: Mon, 24 Apr 2023 10:54:33 +0800

> Sorry for the late reply. I just came back from my paternity leave :)
> 
> On 2023/04/05 23:15, Alexander Lobakin wrote:
>> From: Hyeonggon Yoo <42.hyeyoo@gmail.com>
>> Date: Wed, 5 Apr 2023 21:26:47 +0900
>>
>>> ...
>>>
>>> I'm not yet sure if this feature is appropriate for mainline kernel.
>>>
>>> I have few questions:
>>>
>>> 1) What is cost of this configuration, in terms of memory overhead, or
>>> execution time?
>>>
>>>
>>> 2) The actual cache depends on caller which is static at build time, not
>>> runtime.
>>>
>>>     What about using (caller ^ (some subsystem-wide random sequence)),
>>>
>>>     which is static at runtime?
>>
>> Why can't we just do
>>
>> 	get_random_u32_below(CONFIG_RANDOM_KMALLOC_CACHES_NR)
>>
>> ?
> 
> This makes the cache selection "dynamic", i.e. each kmalloc() will
> randomly pick a different cache at each time it's executed. The problem
> of this approach is that it only reduces the probability of the cache
> being sprayed by the attacker, and the attacker can bypass it by simply
> repeating the attack multiple times in a brute-force manner.
> 
> Our proposal is to make the randomness be with respect to the code
> address rather than time, i.e. allocations in different code paths would
> most likely pick different caches, although kmalloc() at each place
> would use the same cache copy whenever it is executed. In this way, the
> code path that the attacker uses would most likely pick a different
> cache than which the targeted subsystem/driver would pick, which means
> in most of cases the heap spraying is unachievable.

Ah, I see now. Thanks for the explanation, made it really clear.

> 
>> It's fast enough according to Jason... `_RET_IP_ % nr` doesn't sound
>> "secure" to me. It really is a compile-time constant, which can be
>> calculated (or not?) manually. Even if it wasn't, `% nr` doesn't sound
>> good, there should be at least hash_32().
> 
> Yes, `_RET_IP_ % nr` is a bit naive. Currently the patch is more like a
> PoC so I wrote this. Indeed a proper hash function should be used here.
> 
> And yes _RET_IP_ could somehow be manually determined especially for
> kernels without KASLR, and I think adding a per-boot random seed into
> the selection could solve this.

I recall how it is done for kCFI/FineIBT in the x86 code -- it also uses
a per-boot random seed (although it gets patched into the code itself
each time, when applying alternatives). So it should probably be optimal
enough. The only thing I'm wondering is where to store this per-boot
seed :D It's generic code, so you can't patch it directly. OTOH storing
it in .data/.bss can make it vulnerable to attacks... Can't it?

> 
> I will implement these in v2. Thanks!
> 
>>
>> Thanks,
>> Olek
>>

Thanks,
Olek

Gong Ruiqi April 25, 2023, 3:55 a.m. UTC | #7

On 2023/04/24 21:46, Alexander Lobakin wrote:
> From: Gong, Ruiqi <gongruiqi1@huawei.com>
> Date: Mon, 24 Apr 2023 10:54:33 +0800
> 
> ...
> 
>>
>>> It's fast enough according to Jason... `_RET_IP_ % nr` doesn't sound
>>> "secure" to me. It really is a compile-time constant, which can be
>>> calculated (or not?) manually. Even if it wasn't, `% nr` doesn't sound
>>> good, there should be at least hash_32().
>>
>> Yes, `_RET_IP_ % nr` is a bit naive. Currently the patch is more like a
>> PoC so I wrote this. Indeed a proper hash function should be used here.
>>
>> And yes _RET_IP_ could somehow be manually determined especially for
>> kernels without KASLR, and I think adding a per-boot random seed into
>> the selection could solve this.
> 
> I recall how it is done for kCFI/FineIBT in the x86 code -- it also uses
> per-boot random seed (although it gets patched into the code itself each
> time, when applying alternatives). So probably should be optimal enough.
> The only thing I'm wondering is where to store this per-boot seed :D
> It's generic code, so you can't patch it directly. OTOH storing it in
> .data/.bss can make it vulnerable to attacks... Can't it?

I think marking the seed with __ro_after_init is enough, since we don't
mind that it could be read by the attacker.

Given that the code paths the attacker can utilize to spray the heap are
limited, our address-related randomness in most cases prevents
kmalloc()s on these paths from picking the same cache the vulnerable
subsystem/module would pick. Although the _RET_IP_ of kmalloc()s could
be known, without tampering with the source code and rebuilding the
image, the attacker can't do anything to make those caches collide if
the cache selection algorithm says they don't.

So from my perspective the per-boot random seed is more like an
enhancement: if one day, by analyzing the open source code, the attacker
does find a usable kmalloc() that happens to pick the same cache as the
vulnerable subsystem/module, the seed could make his/her effort wasted ;)

> 
>>
>> I will implement these in v2. Thanks!
>>
>>>
>>> Thanks,
>>> Olek
>>>
> 
> Thanks,
> Olek
