Message ID | 20240305101026.694758-4-keescook@chromium.org |
---|---|
State | New |
Series | slab: Introduce dedicated bucket allocator |
On Tue, Mar 05, 2024 at 02:10:20AM -0800, Kees Cook wrote:
> [...]
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index f26ac9a6ef9f..058d0e3cd181 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -493,6 +493,11 @@ void *kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
>  			   gfp_t gfpflags) __assume_slab_alignment __malloc;
>  void kmem_cache_free(struct kmem_cache *s, void *objp);
>  
> +kmem_buckets *kmem_buckets_create(const char *name, unsigned int align,
> +				  slab_flags_t flags,
> +				  unsigned int useroffset, unsigned int usersize,
> +				  void (*ctor)(void *));

I'd prefer an API that initializes an object over one that allocates it
- that is, prefer

	kmem_buckets_init(kmem_buckets *buckets, ...)

By forcing it to be separately allocated, you're adding a pointer deref
to every access.

That would also allow for kmem_buckets to be lazily initialized, which
would play nicely with declaring the kmem_buckets in the alloc_hooks() macro.

I'm curious what all the arguments to kmem_buckets_create() are needed
for, if this is supposed to be a replacement for kmalloc() users.
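Kent's request contrasts two API styles. The first is a create-style API that returns separately allocated bucket storage. The second is an init-style API that fills storage the caller already owns. The small userspace model below illustrates the difference. It is not kernel code.

It assumes that kmem_buckets is an array of cache pointers. That inference comes from the (*b)[idx] accesses and sizeof(kmem_buckets) in the patch. The names toy_cache, toy_buckets, toy_buckets_init(), toy_buckets_create() and toy_subsystem are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical bucket count; the patch sizes the array from the kmalloc caches. */
#define NR_BUCKETS 4

/* Hypothetical stand-in for struct kmem_cache. */
struct toy_cache {
	unsigned int object_size;
};

/* Assumed shape of kmem_buckets: an array of cache pointers. */
typedef struct toy_cache *toy_buckets[NR_BUCKETS];

/* init style: fill storage that the caller already owns. */
static void toy_buckets_init(toy_buckets *b, struct toy_cache *caches)
{
	assert(b && caches);
	for (int i = 0; i < NR_BUCKETS; i++)
		(*b)[i] = &caches[i];
}

/* create style: allocate the bucket array separately and hand back a pointer. */
static toy_buckets *toy_buckets_create(struct toy_cache *caches)
{
	toy_buckets *b = calloc(1, sizeof(*b));

	if (b)
		toy_buckets_init(b, caches);
	return b;
}

/* A hypothetical owner that embeds its buckets instead of pointing at them. */
struct toy_subsystem {
	int other_state;
	toy_buckets buckets;
};
```

With the init style, an owner such as the hypothetical toy_subsystem carries the bucket array inline. No extra allocation is needed, and no extra pointer has to be followed to reach it.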
On Mon, Mar 25, 2024 at 03:40:51PM -0400, Kent Overstreet wrote:
> On Tue, Mar 05, 2024 at 02:10:20AM -0800, Kees Cook wrote:
> > [...]
> > +kmem_buckets *kmem_buckets_create(const char *name, unsigned int align,
> > +				  slab_flags_t flags,
> > +				  unsigned int useroffset, unsigned int usersize,
> > +				  void (*ctor)(void *));
>
> I'd prefer an API that initializes an object over one that allocates it
> - that is, prefer
>
> 	kmem_buckets_init(kmem_buckets *buckets, ...)

Sure, that can work. kmem_cache_init() would need to exist for the same
reason, though.

> By forcing it to be separately allocated, you're adding a pointer deref
> to every access.

I don't understand what you mean here. "every access"? I take a guess
below...

> That would also allow for kmem_buckets to be lazily initialized, which
> would play nicely with declaring the kmem_buckets in the alloc_hooks() macro.

Sure, I think it'll depend on how the per-site allocations get wired up.
I think you mean to include a full copy of the kmem cache/bucket
struct with the codetag instead of just a pointer? I don't think that'll
work well to make it runtime selectable, and I don't see it using an
extra deref -- allocations already get the struct from somewhere and
deref it. The only change is where to find the struct.

> I'm curious what all the arguments to kmem_buckets_create() are needed
> for, if this is supposed to be a replacement for kmalloc() users.

Are you confusing kmem_buckets_create() with kmem_buckets_alloc()? These
args are needed to initialize the per-bucket caches, just as is
already done for the global kmalloc per-bucket caches. This mirrors
kmem_cache_create(). (Or, more specifically, it calls kmem_cache_create()
for each bucket size, so the args need to be passed through.)

If you mean "why expose these arguments when they can just use the
existing defaults already used by the global kmalloc caches?", then I
would say it's to gain the benefit here of narrowing the scope of the
usercopy offsets. Right now kmalloc is forced to allow the full usercopy
window into an allocation, but we don't have to do this any more. For
example, see patch 8, where struct msg_msg doesn't need to expose the
header to userspace:

	msg_buckets = kmem_buckets_create("msg_msg", 0, SLAB_ACCOUNT,
					  sizeof(struct msg_msg),
					  DATALEN_MSG, NULL);

Only DATALEN_MSG bytes, starting at sizeof(struct msg_msg), will be
allowed to be copied in/out of userspace. Before, it was unbounded.

-Kees
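kmem_buckets_create() in the patch narrows the caller's useroffset/usersize to each bucket's object size. If useroffset falls at or beyond the end of a bucket, that bucket gets no usercopy window at all. Otherwise the window is capped at the space left in the object.

A minimal userspace sketch of that clamping follows. The 48-byte header and 4 KiB page are assumptions chosen only to produce msg_msg-style numbers; they are not values stated in the thread. bucket_window() and the TOY_* macros are hypothetical names.

```c
#include <assert.h>

struct usercopy_window {
	unsigned int offset;
	unsigned int size;
};

/* Clamp a caller's usercopy window to one bucket, mirroring the patch's logic. */
static struct usercopy_window bucket_window(unsigned int bucket_size,
					    unsigned int useroffset,
					    unsigned int usersize)
{
	struct usercopy_window w = { 0, 0 };	/* default: nothing copyable */

	if (useroffset < bucket_size) {
		unsigned int room = bucket_size - useroffset;

		w.offset = useroffset;
		w.size = room < usersize ? room : usersize;
	}
	/* The window never extends past the end of the object. */
	assert(w.offset + w.size <= bucket_size);
	return w;
}

/* Hypothetical numbers, assuming a 4 KiB page and a 48-byte header. */
#define TOY_HDR_SIZE	48u
#define TOY_DATALEN	(4096u - TOY_HDR_SIZE)
```

With those assumed numbers:

- a 32-byte bucket gets an empty window;
- a 64-byte bucket exposes only bytes 48 through 63;
- the 4096- and 8192-byte buckets expose at most TOY_DATALEN bytes after the header.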
On Mon, Mar 25, 2024 at 01:40:34PM -0700, Kees Cook wrote:
> On Mon, Mar 25, 2024 at 03:40:51PM -0400, Kent Overstreet wrote:
> > On Tue, Mar 05, 2024 at 02:10:20AM -0800, Kees Cook wrote:
> > > [...]
> > > +kmem_buckets *kmem_buckets_create(const char *name, unsigned int align,
> > > +				  slab_flags_t flags,
> > > +				  unsigned int useroffset, unsigned int usersize,
> > > +				  void (*ctor)(void *));
> >
> > I'd prefer an API that initializes an object over one that allocates it
> > - that is, prefer
> >
> > 	kmem_buckets_init(kmem_buckets *buckets, ...)
>
> Sure, that can work. kmem_cache_init() would need to exist for the same
> reason, though.

That'll be a very worthwhile addition too; IPC running kernel code is
always crap, and dependent loads are a big part of that.

I did mempool_init() and bioset_init() a while back, so it's someone
else's turn for this one :)

> Sure, I think it'll depend on how the per-site allocations get wired up.
> I think you mean to include a full copy of the kmem cache/bucket
> struct with the codetag instead of just a pointer? I don't think that'll
> work well to make it runtime selectable, and I don't see it using an
> extra deref -- allocations already get the struct from somewhere and
> deref it. The only change is where to find the struct.

The codetags are in their own dedicated ELF sections already, so if you
put the kmem_buckets in the codetag, the entire ELF section can be
discarded if it's not in use.

Also, the issue isn't derefs - it's dependent loads and locality. Taking
the address of the kmem_buckets to pass it is fine; the data referred to
will still get pulled into cache when we touch the codetag.

If it's behind a pointer, we have to pull the codetag into cache, wait
for that so we can get the kmem_buckets pointer, then start to pull in
the kmem_buckets itself. If it's a cache miss, you just slowed the
entire allocation down by around 30 ns.

> > I'm curious what all the arguments to kmem_buckets_create() are needed
> > for, if this is supposed to be a replacement for kmalloc() users.
>
> Are you confusing kmem_buckets_create() with kmem_buckets_alloc()? These
> args are needed to initialize the per-bucket caches, just as is
> already done for the global kmalloc per-bucket caches. This mirrors
> kmem_cache_create(). (Or, more specifically, it calls kmem_cache_create()
> for each bucket size, so the args need to be passed through.)
>
> If you mean "why expose these arguments when they can just use the
> existing defaults already used by the global kmalloc caches?", then I
> would say it's to gain the benefit here of narrowing the scope of the
> usercopy offsets. Right now kmalloc is forced to allow the full usercopy
> window into an allocation, but we don't have to do this any more. For
> example, see patch 8, where struct msg_msg doesn't need to expose the
> header to userspace:

"usercopy window"? You're now annotating which data can be copied to
userspace?

I'm skeptical; this looks like defensive programming run amok to me.

> 	msg_buckets = kmem_buckets_create("msg_msg", 0, SLAB_ACCOUNT,
> 					  sizeof(struct msg_msg),
> 					  DATALEN_MSG, NULL);
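Kent distinguishes embedding the buckets from storing a pointer to them; the difference is dependent loads, not derefs. A hedged userspace sketch of that distinction follows.

- struct tag_embedded and struct tag_pointer are hypothetical stand-ins for a codetag that either embeds its buckets or points at them. Nothing here reflects the real codetag layout.
- Both lookups return the same cache.
- The pointer version must finish loading tag->buckets before it knows which address to read next.

```c
#include <assert.h>
#include <stddef.h>

#define NR_BUCKETS 4	/* hypothetical */

/* Hypothetical stand-in for struct kmem_cache. */
struct toy_cache {
	unsigned int object_size;
};

/* Assumed shape of kmem_buckets: an array of cache pointers. */
typedef struct toy_cache *toy_buckets[NR_BUCKETS];

/* Hypothetical per-call-site tag that embeds its buckets. */
struct tag_embedded {
	const char *site;
	toy_buckets buckets;
};

/* Hypothetical per-call-site tag that only points at its buckets. */
struct tag_pointer {
	const char *site;
	toy_buckets *buckets;
};

static struct toy_cache *pick_embedded(const struct tag_embedded *tag, size_t idx)
{
	assert(idx < NR_BUCKETS);
	/* The element's address is computed from the tag's own address,
	 * so only the element itself is loaded, from memory next to the tag. */
	return tag->buckets[idx];
}

static struct toy_cache *pick_pointer(const struct tag_pointer *tag, size_t idx)
{
	assert(idx < NR_BUCKETS);
	/* Two dependent loads: tag->buckets must arrive before the address
	 * of (*tag->buckets)[idx] is known. */
	return (*tag->buckets)[idx];
}
```

The sketch only shows the shape of the memory accesses. The cost of the extra dependent load (Kent estimates around 30 ns on a cache miss) is not something this code measures.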
On Mon, Mar 25, 2024 at 05:49:49PM -0400, Kent Overstreet wrote:
> The codetags are in their own dedicated ELF sections already, so if you
> put the kmem_buckets in the codetag, the entire ELF section can be
> discarded if it's not in use.

Gotcha. Yeah, sounds good. Once codetags and this series land, I can
start working on the per-site series.

> "usercopy window"? You're now annotating which data can be copied to
> userspace?

Hm? Yes. That's been there for over 7 years. :) It's just that it was
only meaningful for kmem_cache_create() users, since the proposed
GFP_USERCOPY for kmalloc() never landed[1].

-Kees

[1] https://lore.kernel.org/lkml/1497915397-93805-23-git-send-email-keescook@chromium.org/
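The usercopy window Kees refers to is checked when data is copied between userspace and a slab object. As a hedged sketch, that check can be modelled as follows: a copy of n bytes starting at a given offset is permitted only if it lies entirely inside [useroffset, useroffset + usersize).

usercopy_allowed() below is a hypothetical userspace function. It is loosely modelled on the kernel's hardened-usercopy slab check and is not a copy of it.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy usercopy bounds check: allow copying n bytes at byte `offset` of an
 * object only if the whole range is inside [useroffset, useroffset + usersize).
 */
static bool usercopy_allowed(unsigned int useroffset, unsigned int usersize,
			     unsigned long offset, unsigned long n)
{
	if (offset < useroffset)
		return false;			/* starts before the window */
	if (offset - useroffset > usersize)
		return false;			/* starts after the window */
	return n <= usersize - (offset - useroffset);	/* must end inside it */
}
```

Per the thread, plain kmalloc caches allow the whole object (useroffset 0, usersize equal to the object size). A dedicated bucket created as in the msg_msg example can instead keep the header out of reach.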
diff --git a/include/linux/slab.h b/include/linux/slab.h
index f26ac9a6ef9f..058d0e3cd181 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -493,6 +493,11 @@ void *kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
 			   gfp_t gfpflags) __assume_slab_alignment __malloc;
 void kmem_cache_free(struct kmem_cache *s, void *objp);
 
+kmem_buckets *kmem_buckets_create(const char *name, unsigned int align,
+				  slab_flags_t flags,
+				  unsigned int useroffset, unsigned int usersize,
+				  void (*ctor)(void *));
+
 /*
  * Bulk allocation and freeing operations. These are accelerated in an
  * allocator specific way to avoid taking locks repeatedly or building
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 1d0f25b6ae91..03ba9aac96b6 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -392,6 +392,74 @@ kmem_cache_create(const char *name, unsigned int size, unsigned int align,
 }
 EXPORT_SYMBOL(kmem_cache_create);
 
+static struct kmem_cache *kmem_buckets_cache __ro_after_init;
+
+kmem_buckets *kmem_buckets_create(const char *name, unsigned int align,
+				  slab_flags_t flags,
+				  unsigned int useroffset,
+				  unsigned int usersize,
+				  void (*ctor)(void *))
+{
+	kmem_buckets *b;
+	int idx;
+
+	if (WARN_ON(!kmem_buckets_cache))
+		return NULL;
+
+	b = kmem_cache_alloc(kmem_buckets_cache, GFP_KERNEL|__GFP_ZERO);
+	if (WARN_ON(!b))
+		return NULL;
+
+	flags |= SLAB_NO_MERGE;
+
+	for (idx = 0; idx < ARRAY_SIZE(kmalloc_caches[KMALLOC_NORMAL]); idx++) {
+		char *short_size, *cache_name;
+		unsigned int cache_useroffset, cache_usersize;
+		unsigned int size;
+
+		if (!kmalloc_caches[KMALLOC_NORMAL][idx])
+			continue;
+
+		size = kmalloc_caches[KMALLOC_NORMAL][idx]->object_size;
+		if (!size)
+			continue;
+
+		short_size = strchr(kmalloc_caches[KMALLOC_NORMAL][idx]->name, '-');
+		if (WARN_ON(!short_size))
+			goto fail;
+
+		cache_name = kasprintf(GFP_KERNEL, "%s-%s", name, short_size + 1);
+		if (WARN_ON(!cache_name))
+			goto fail;
+
+		if (useroffset >= size) {
+			cache_useroffset = 0;
+			cache_usersize = 0;
+		} else {
+			cache_useroffset = useroffset;
+			cache_usersize = min(size - cache_useroffset, usersize);
+		}
+		(*b)[idx] = kmem_cache_create_usercopy(cache_name, size,
+					align, flags, cache_useroffset,
+					cache_usersize, ctor);
+		kfree(cache_name);
+		if (WARN_ON(!(*b)[idx]))
+			goto fail;
+	}
+
+	return b;
+
+fail:
+	for (idx = 0; idx < ARRAY_SIZE(kmalloc_caches[KMALLOC_NORMAL]); idx++) {
+		if ((*b)[idx])
+			kmem_cache_destroy((*b)[idx]);
+	}
+	kfree(b);
+
+	return NULL;
+}
+EXPORT_SYMBOL(kmem_buckets_create);
+
 #ifdef SLAB_SUPPORTS_SYSFS
 /*
  * For a given kmem_cache, kmem_cache_destroy() should only be called
@@ -933,6 +1001,10 @@ void __init create_kmalloc_caches(slab_flags_t flags)
 
 	/* Kmalloc array is now usable */
 	slab_state = UP;
+
+	kmem_buckets_cache = kmem_cache_create("kmalloc_buckets",
+					       sizeof(kmem_buckets),
+					       0, 0, NULL);
 }
 
 /**
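Each dedicated cache is named from the corresponding kmalloc cache. The code takes the size suffix after the first '-' of the kmalloc cache name and prefixes it with the caller's name, so "msg_msg" plus "kmalloc-64" yields "msg_msg-64".

A userspace sketch of just that naming step follows. bucket_cache_name() is hypothetical, and snprintf() into a fixed buffer stands in for kasprintf().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "<prefix>-<suffix>" where <suffix> is everything after the first
 * '-' in a kmalloc cache name. Returns 0 on success, or -1 if the name has
 * no '-' or the output buffer is too small.
 */
static int bucket_cache_name(char *out, size_t len, const char *prefix,
			     const char *kmalloc_name)
{
	const char *short_size;
	int n;

	assert(out && prefix && kmalloc_name);
	short_size = strchr(kmalloc_name, '-');
	if (!short_size)
		return -1;	/* the kernel code WARNs and bails out here */
	n = snprintf(out, len, "%s-%s", prefix, short_size + 1);
	if (n < 0 || (size_t)n >= len)
		return -1;	/* kasprintf() sizes its own buffer; a fixed one must check */
	return 0;
}
```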
Dedicated caches are available for fixed size allocations via
kmem_cache_alloc(), but for dynamically sized allocations there is only
the global kmalloc API's set of buckets available. This means it isn't
possible to separate specific sets of dynamically sized allocations into
a separate collection of caches.

This leads to a use-after-free exploitation weakness in the Linux
kernel since many heap memory spraying/grooming attacks depend on using
userspace-controllable dynamically sized allocations to collide with
fixed size allocations that end up in the same cache.

While CONFIG_RANDOM_KMALLOC_CACHES provides a probabilistic defense
against these kinds of "type confusion" attacks, including for fixed
same-size heap objects, we can create a complementary deterministic
defense for dynamically sized allocations.

In order to isolate user-controllable sized allocations from system
allocations, introduce kmem_buckets_create(), which behaves like
kmem_cache_create(). (The next patch will introduce kmem_buckets_alloc(),
which behaves like kmem_cache_alloc().)

This allows for confining allocations to a dedicated set of sized caches
(which have the same layout as the kmalloc caches).

This can also be used in the future, once codetag allocation annotations
exist, to implement per-caller allocation cache isolation[1] even for
dynamic allocations.

Link: https://lore.kernel.org/lkml/202402211449.401382D2AF@keescook [1]
Signed-off-by: Kees Cook <keescook@chromium.org>
---
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: linux-mm@kvack.org
---
 include/linux/slab.h |  5 +++
 mm/slab_common.c     | 72 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 77 insertions(+)
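The isolation idea can be sketched with a toy allocator front end. The same request size maps to the same bucket index in every bucket set, but each set owns distinct caches. As a result, a user-controlled allocation and a system allocation of the same size no longer share a cache.

This is a userspace model with two stated simplifications:

- It uses only power-of-two buckets from 8 to 8192 bytes. Real kmalloc also has 96- and 192-byte caches.
- toy_index(), toy_buckets_fill() and toy_pick() are hypothetical. They are not kmem_buckets_alloc(), which the next patch introduces.

```c
#include <assert.h>
#include <stddef.h>

#define MIN_SHIFT 3			/* smallest toy bucket: 8 bytes */
#define MAX_SHIFT 13			/* largest toy bucket: 8192 bytes */
#define NR_BUCKETS (MAX_SHIFT + 1)

struct toy_cache {
	const char *name;		/* which bucket set owns this cache */
	unsigned int size;		/* object size served by this cache */
};

typedef struct toy_cache *toy_buckets[NR_BUCKETS];

/* Map a request size to a power-of-two bucket index, or -1 if unsupported. */
static int toy_index(size_t size)
{
	int shift = MIN_SHIFT;

	if (size == 0 || size > (1ul << MAX_SHIFT))
		return -1;
	while ((1ul << shift) < size)
		shift++;
	return shift;
}

/* Populate one independent bucket set, using caller-provided storage. */
static void toy_buckets_fill(toy_buckets *b, struct toy_cache *storage,
			     const char *name)
{
	assert(b && storage && name);
	for (int i = 0; i < NR_BUCKETS; i++) {
		if (i < MIN_SHIFT) {
			(*b)[i] = NULL;		/* no toy bucket this small */
			continue;
		}
		storage[i].name = name;
		storage[i].size = 1u << i;
		(*b)[i] = &storage[i];
	}
}

/* Pick the cache in a given bucket set that would serve `size` bytes. */
static struct toy_cache *toy_pick(toy_buckets *b, size_t size)
{
	int idx = toy_index(size);

	return idx < 0 ? NULL : (*b)[idx];
}
```

For example, a 40-byte request selects index 6 (64 bytes) in both a "kmalloc" set and a "msg_msg" set, but the two sets return different cache objects.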