diff mbox series

[v5,2/6] mm/slab: Plumb kmem_buckets into __do_kmalloc_node()

Message ID 20240619193357.1333772-2-kees@kernel.org (mailing list archive)
State Superseded
Headers show
Series slab: Introduce dedicated bucket allocator | expand

Checks

Context Check Description
netdev/tree_selection success Not a local patch

Commit Message

Kees Cook June 19, 2024, 7:33 p.m. UTC
Introduce CONFIG_SLAB_BUCKETS which provides the infrastructure to
support separated kmalloc buckets (in the following kmem_buckets_create()
patches and future codetag-based separation). Since this will provide
a mitigation for a very common case of exploits, enable it by default.

To be able to choose which buckets to allocate from, make the buckets
available to the internal kmalloc interfaces by adding them as the
first argument, rather than depending on the buckets being chosen from
the fixed set of global buckets. Where the bucket is not available,
pass NULL, which means "use the default system kmalloc bucket set"
(the prior existing behavior), as implemented in kmalloc_slab().

To avoid adding the extra argument when !CONFIG_SLAB_BUCKETS, only the
top-level macros and static inlines use the buckets argument (where
they are stripped out and compiled out respectively). The actual extern
functions can then been built without the argument, and the internals
fall back to the global kmalloc buckets unconditionally.

Co-developed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Kees Cook <kees@kernel.org>
---
 include/linux/slab.h | 27 ++++++++++++++++++++++-----
 mm/Kconfig           | 16 ++++++++++++++++
 mm/slab.h            |  6 ++++--
 mm/slab_common.c     |  2 +-
 mm/slub.c            | 20 ++++++++++----------
 5 files changed, 53 insertions(+), 18 deletions(-)

Comments

Vlastimil Babka June 20, 2024, 1:08 p.m. UTC | #1
On 6/19/24 9:33 PM, Kees Cook wrote:
> Introduce CONFIG_SLAB_BUCKETS which provides the infrastructure to
> support separated kmalloc buckets (in the following kmem_buckets_create()
> patches and future codetag-based separation). Since this will provide
> a mitigation for a very common case of exploits, enable it by default.

No longer "enable it by default".

> 
> To be able to choose which buckets to allocate from, make the buckets
> available to the internal kmalloc interfaces by adding them as the
> first argument, rather than depending on the buckets being chosen from

second argument now

> the fixed set of global buckets. Where the bucket is not available,
> pass NULL, which means "use the default system kmalloc bucket set"
> (the prior existing behavior), as implemented in kmalloc_slab().
> 
> To avoid adding the extra argument when !CONFIG_SLAB_BUCKETS, only the
> top-level macros and static inlines use the buckets argument (where
> they are stripped out and compiled out respectively). The actual extern
> functions can then been built without the argument, and the internals
> fall back to the global kmalloc buckets unconditionally.

Also describes the previous implementation and not the new one?

> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -273,6 +273,22 @@ config SLAB_FREELIST_HARDENED
>  	  sacrifices to harden the kernel slab allocator against common
>  	  freelist exploit methods.
>  
> +config SLAB_BUCKETS
> +	bool "Support allocation from separate kmalloc buckets"
> +	depends on !SLUB_TINY
> +	help
> +	  Kernel heap attacks frequently depend on being able to create
> +	  specifically-sized allocations with user-controlled contents
> +	  that will be allocated into the same kmalloc bucket as a
> +	  target object. To avoid sharing these allocation buckets,
> +	  provide an explicitly separated set of buckets to be used for
> +	  user-controlled allocations. This may very slightly increase
> +	  memory fragmentation, though in practice it's only a handful
> +	  of extra pages since the bulk of user-controlled allocations
> +	  are relatively long-lived.
> +
> +	  If unsure, say Y.

I was wondering why I don't see the buckets in slabinfo and turns out it was
SLAB_MERGE_DEFAULT. It would probably make sense for SLAB_MERGE_DEFAULT to
depends on !SLAB_BUCKETS now as the merging defeats the purpose, wdyt?
Vlastimil Babka June 20, 2024, 1:37 p.m. UTC | #2
On 6/20/24 3:08 PM, Vlastimil Babka wrote:
> On 6/19/24 9:33 PM, Kees Cook wrote:
> I was wondering why I don't see the buckets in slabinfo and turns out it was
> SLAB_MERGE_DEFAULT. It would probably make sense for SLAB_MERGE_DEFAULT to
> depends on !SLAB_BUCKETS now as the merging defeats the purpose, wdyt?

Hm I might have been just blind, can see them there now. Anyway it probably
doesn't make much sense to have SLAB_BUCKETS and/or RANDOM_KMALLOC_CACHES
together with SLAB_MERGE_DEFAULT?
Kees Cook June 20, 2024, 6:41 p.m. UTC | #3
On Thu, Jun 20, 2024 at 03:08:32PM +0200, Vlastimil Babka wrote:
> On 6/19/24 9:33 PM, Kees Cook wrote:
> > Introduce CONFIG_SLAB_BUCKETS which provides the infrastructure to
> > support separated kmalloc buckets (in the following kmem_buckets_create()
> > patches and future codetag-based separation). Since this will provide
> > a mitigation for a very common case of exploits, enable it by default.
> 
> No longer "enable it by default".

Whoops! Yes, thank you.

> 
> > 
> > To be able to choose which buckets to allocate from, make the buckets
> > available to the internal kmalloc interfaces by adding them as the
> > first argument, rather than depending on the buckets being chosen from
> 
> second argument now

Fixed.

> 
> > the fixed set of global buckets. Where the bucket is not available,
> > pass NULL, which means "use the default system kmalloc bucket set"
> > (the prior existing behavior), as implemented in kmalloc_slab().
> > 
> > To avoid adding the extra argument when !CONFIG_SLAB_BUCKETS, only the
> > top-level macros and static inlines use the buckets argument (where
> > they are stripped out and compiled out respectively). The actual extern
> > functions can then been built without the argument, and the internals
> > fall back to the global kmalloc buckets unconditionally.
> 
> Also describes the previous implementation and not the new one?

I think this still describes the implementation: the macros are doing
this work now. I wanted to explain in the commit log why the "static
inline"s still have explicit arguments (they will vanish during
inlining), as they are needed to detect the need for the using the
global buckets.

> > --- a/mm/Kconfig
> > +++ b/mm/Kconfig
> > @@ -273,6 +273,22 @@ config SLAB_FREELIST_HARDENED
> >  	  sacrifices to harden the kernel slab allocator against common
> >  	  freelist exploit methods.
> >  
> > +config SLAB_BUCKETS
> > +	bool "Support allocation from separate kmalloc buckets"
> > +	depends on !SLUB_TINY
> > +	help
> > +	  Kernel heap attacks frequently depend on being able to create
> > +	  specifically-sized allocations with user-controlled contents
> > +	  that will be allocated into the same kmalloc bucket as a
> > +	  target object. To avoid sharing these allocation buckets,
> > +	  provide an explicitly separated set of buckets to be used for
> > +	  user-controlled allocations. This may very slightly increase
> > +	  memory fragmentation, though in practice it's only a handful
> > +	  of extra pages since the bulk of user-controlled allocations
> > +	  are relatively long-lived.
> > +
> > +	  If unsure, say Y.
> 
> I was wondering why I don't see the buckets in slabinfo and turns out it was
> SLAB_MERGE_DEFAULT. It would probably make sense for SLAB_MERGE_DEFAULT to
> depends on !SLAB_BUCKETS now as the merging defeats the purpose, wdyt?

You mention this was a misunderstanding in the next email, but just to
reply here: I explicitly use SLAB_NO_MERGE, so if it ever DOES become
invisible, then yes, that would be unexpected!

Thanks for the review!
Kees Cook June 20, 2024, 6:46 p.m. UTC | #4
On Thu, Jun 20, 2024 at 03:37:31PM +0200, Vlastimil Babka wrote:
> On 6/20/24 3:08 PM, Vlastimil Babka wrote:
> > On 6/19/24 9:33 PM, Kees Cook wrote:
> > I was wondering why I don't see the buckets in slabinfo and turns out it was
> > SLAB_MERGE_DEFAULT. It would probably make sense for SLAB_MERGE_DEFAULT to
> > depends on !SLAB_BUCKETS now as the merging defeats the purpose, wdyt?
> 
> Hm I might have been just blind, can see them there now. Anyway it probably
> doesn't make much sense to have SLAB_BUCKETS and/or RANDOM_KMALLOC_CACHES
> together with SLAB_MERGE_DEFAULT?

It's already handled so that the _other_ caches can still be merged if
people want it. See new_kmalloc_cache():

#ifdef CONFIG_RANDOM_KMALLOC_CACHES
        if (type >= KMALLOC_RANDOM_START && type <= KMALLOC_RANDOM_END)
                flags |= SLAB_NO_MERGE;
#endif
Vlastimil Babka June 20, 2024, 8:44 p.m. UTC | #5
On 6/20/24 8:46 PM, Kees Cook wrote:
> On Thu, Jun 20, 2024 at 03:37:31PM +0200, Vlastimil Babka wrote:
>> On 6/20/24 3:08 PM, Vlastimil Babka wrote:
>> > On 6/19/24 9:33 PM, Kees Cook wrote:
>> > I was wondering why I don't see the buckets in slabinfo and turns out it was
>> > SLAB_MERGE_DEFAULT. It would probably make sense for SLAB_MERGE_DEFAULT to
>> > depends on !SLAB_BUCKETS now as the merging defeats the purpose, wdyt?
>> 
>> Hm I might have been just blind, can see them there now. Anyway it probably
>> doesn't make much sense to have SLAB_BUCKETS and/or RANDOM_KMALLOC_CACHES
>> together with SLAB_MERGE_DEFAULT?
> 
> It's already handled so that the _other_ caches can still be merged if
> people want it. See new_kmalloc_cache():
> 
> #ifdef CONFIG_RANDOM_KMALLOC_CACHES
>         if (type >= KMALLOC_RANDOM_START && type <= KMALLOC_RANDOM_END)
>                 flags |= SLAB_NO_MERGE;
> #endif

OK
diff mbox series

Patch

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 8a006fac57c6..708bde6039f0 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -570,6 +570,21 @@  void *kmem_cache_alloc_node_noprof(struct kmem_cache *s, gfp_t flags,
 				   int node) __assume_slab_alignment __malloc;
 #define kmem_cache_alloc_node(...)	alloc_hooks(kmem_cache_alloc_node_noprof(__VA_ARGS__))
 
+/*
+ * These macros allow declaring a kmem_buckets * parameter alongside size, which
+ * can be compiled out with CONFIG_SLAB_BUCKETS=n so that a large number of call
+ * sites don't have to pass NULL.
+ */
+#ifdef CONFIG_SLAB_BUCKETS
+#define DECL_BUCKET_PARAMS(_size, _b)	size_t (_size), kmem_buckets *(_b)
+#define PASS_BUCKET_PARAMS(_size, _b)	(_size), (_b)
+#define PASS_BUCKET_PARAM(_b)		(_b)
+#else
+#define DECL_BUCKET_PARAMS(_size, _b)	size_t (_size)
+#define PASS_BUCKET_PARAMS(_size, _b)	(_size)
+#define PASS_BUCKET_PARAM(_b)		NULL
+#endif
+
 /*
  * The following functions are not to be used directly and are intended only
  * for internal use from kmalloc() and kmalloc_node()
@@ -579,7 +594,7 @@  void *kmem_cache_alloc_node_noprof(struct kmem_cache *s, gfp_t flags,
 void *__kmalloc_noprof(size_t size, gfp_t flags)
 				__assume_kmalloc_alignment __alloc_size(1);
 
-void *__kmalloc_node_noprof(size_t size, gfp_t flags, int node)
+void *__kmalloc_node_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags, int node)
 				__assume_kmalloc_alignment __alloc_size(1);
 
 void *__kmalloc_cache_noprof(struct kmem_cache *s, gfp_t flags, size_t size)
@@ -679,7 +694,7 @@  static __always_inline __alloc_size(1) void *kmalloc_node_noprof(size_t size, gf
 				kmalloc_caches[kmalloc_type(flags, _RET_IP_)][index],
 				flags, node, size);
 	}
-	return __kmalloc_node_noprof(size, flags, node);
+	return __kmalloc_node_noprof(PASS_BUCKET_PARAMS(size, NULL), flags, node);
 }
 #define kmalloc_node(...)			alloc_hooks(kmalloc_node_noprof(__VA_ARGS__))
 
@@ -730,8 +745,10 @@  static inline __realloc_size(2, 3) void * __must_check krealloc_array_noprof(voi
  */
 #define kcalloc(n, size, flags)		kmalloc_array(n, size, (flags) | __GFP_ZERO)
 
-void *kmalloc_node_track_caller_noprof(size_t size, gfp_t flags, int node,
-				  unsigned long caller) __alloc_size(1);
+void *__kmalloc_node_track_caller_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags, int node,
+					 unsigned long caller) __alloc_size(1);
+#define kmalloc_node_track_caller_noprof(size, flags, node, caller) \
+	__kmalloc_node_track_caller_noprof(PASS_BUCKET_PARAMS(size, NULL), flags, node, caller)
 #define kmalloc_node_track_caller(...)		\
 	alloc_hooks(kmalloc_node_track_caller_noprof(__VA_ARGS__, _RET_IP_))
 
@@ -757,7 +774,7 @@  static inline __alloc_size(1, 2) void *kmalloc_array_node_noprof(size_t n, size_
 		return NULL;
 	if (__builtin_constant_p(n) && __builtin_constant_p(size))
 		return kmalloc_node_noprof(bytes, flags, node);
-	return __kmalloc_node_noprof(bytes, flags, node);
+	return __kmalloc_node_noprof(PASS_BUCKET_PARAMS(bytes, NULL), flags, node);
 }
 #define kmalloc_array_node(...)			alloc_hooks(kmalloc_array_node_noprof(__VA_ARGS__))
 
diff --git a/mm/Kconfig b/mm/Kconfig
index b4cb45255a54..20bb71e241c3 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -273,6 +273,22 @@  config SLAB_FREELIST_HARDENED
 	  sacrifices to harden the kernel slab allocator against common
 	  freelist exploit methods.
 
+config SLAB_BUCKETS
+	bool "Support allocation from separate kmalloc buckets"
+	depends on !SLUB_TINY
+	help
+	  Kernel heap attacks frequently depend on being able to create
+	  specifically-sized allocations with user-controlled contents
+	  that will be allocated into the same kmalloc bucket as a
+	  target object. To avoid sharing these allocation buckets,
+	  provide an explicitly separated set of buckets to be used for
+	  user-controlled allocations. This may very slightly increase
+	  memory fragmentation, though in practice it's only a handful
+	  of extra pages since the bulk of user-controlled allocations
+	  are relatively long-lived.
+
+	  If unsure, say Y.
+
 config SLUB_STATS
 	default n
 	bool "Enable performance statistics"
diff --git a/mm/slab.h b/mm/slab.h
index b16e63191578..d5e8034af9d5 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -403,16 +403,18 @@  static inline unsigned int size_index_elem(unsigned int bytes)
  * KMALLOC_MAX_CACHE_SIZE and the caller must check that.
  */
 static inline struct kmem_cache *
-kmalloc_slab(size_t size, gfp_t flags, unsigned long caller)
+kmalloc_slab(size_t size, kmem_buckets *b, gfp_t flags, unsigned long caller)
 {
 	unsigned int index;
 
+	if (!b)
+		b = &kmalloc_caches[kmalloc_type(flags, caller)];
 	if (size <= 192)
 		index = kmalloc_size_index[size_index_elem(size)];
 	else
 		index = fls(size - 1);
 
-	return kmalloc_caches[kmalloc_type(flags, caller)][index];
+	return (*b)[index];
 }
 
 gfp_t kmalloc_fix_flags(gfp_t flags);
diff --git a/mm/slab_common.c b/mm/slab_common.c
index e0b1c109bed2..9b0f2ef951f1 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -702,7 +702,7 @@  size_t kmalloc_size_roundup(size_t size)
 		 * The flags don't matter since size_index is common to all.
 		 * Neither does the caller for just getting ->object_size.
 		 */
-		return kmalloc_slab(size, GFP_KERNEL, 0)->object_size;
+		return kmalloc_slab(size, NULL, GFP_KERNEL, 0)->object_size;
 	}
 
 	/* Above the smaller buckets, size is a multiple of page size. */
diff --git a/mm/slub.c b/mm/slub.c
index 3d19a0ee411f..80f0a51242d1 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4117,7 +4117,7 @@  void *__kmalloc_large_node_noprof(size_t size, gfp_t flags, int node)
 EXPORT_SYMBOL(__kmalloc_large_node_noprof);
 
 static __always_inline
-void *__do_kmalloc_node(size_t size, gfp_t flags, int node,
+void *__do_kmalloc_node(size_t size, kmem_buckets *b, gfp_t flags, int node,
 			unsigned long caller)
 {
 	struct kmem_cache *s;
@@ -4133,32 +4133,32 @@  void *__do_kmalloc_node(size_t size, gfp_t flags, int node,
 	if (unlikely(!size))
 		return ZERO_SIZE_PTR;
 
-	s = kmalloc_slab(size, flags, caller);
+	s = kmalloc_slab(size, b, flags, caller);
 
 	ret = slab_alloc_node(s, NULL, flags, node, caller, size);
 	ret = kasan_kmalloc(s, ret, size, flags);
 	trace_kmalloc(caller, ret, size, s->size, flags, node);
 	return ret;
 }
-
-void *__kmalloc_node_noprof(size_t size, gfp_t flags, int node)
+void *__kmalloc_node_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags, int node)
 {
-	return __do_kmalloc_node(size, flags, node, _RET_IP_);
+	return __do_kmalloc_node(size, PASS_BUCKET_PARAM(b), flags, node, _RET_IP_);
 }
 EXPORT_SYMBOL(__kmalloc_node_noprof);
 
 void *__kmalloc_noprof(size_t size, gfp_t flags)
 {
-	return __do_kmalloc_node(size, flags, NUMA_NO_NODE, _RET_IP_);
+	return __do_kmalloc_node(size, NULL, flags, NUMA_NO_NODE, _RET_IP_);
 }
 EXPORT_SYMBOL(__kmalloc_noprof);
 
-void *kmalloc_node_track_caller_noprof(size_t size, gfp_t flags,
-				       int node, unsigned long caller)
+void *__kmalloc_node_track_caller_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags,
+					 int node, unsigned long caller)
 {
-	return __do_kmalloc_node(size, flags, node, caller);
+	return __do_kmalloc_node(size, PASS_BUCKET_PARAM(b), flags, node, caller);
+
 }
-EXPORT_SYMBOL(kmalloc_node_track_caller_noprof);
+EXPORT_SYMBOL(__kmalloc_node_track_caller_noprof);
 
 void *__kmalloc_cache_noprof(struct kmem_cache *s, gfp_t gfpflags, size_t size)
 {