
[v6,0/6] slab: Introduce dedicated bucket allocator

Message ID 20240701190152.it.631-kees@kernel.org

Message

Kees Cook July 1, 2024, 7:12 p.m. UTC
Hi,

 v6:
  - update commit logs:
    - update description of Kconfig default and API details
    - call out usage of SLAB_NO_MERGE
    - improve description of cross-cache attacks
    - fix typos
  - add kernel-doc parsing of DECL_BUCKET_PARAMS macro
  - add kernel-doc to kmem_buckets_create()
  - have CONFIG_SLAB_BUCKETS default to CONFIG_SLAB_FREELIST_HARDENED
  - add CONFIG_SLAB_BUCKETS to hardening.config
  - drop alignment argument from kmem_buckets_create()
 v5: https://lore.kernel.org/lkml/20240619192131.do.115-kees@kernel.org
 v4: https://lore.kernel.org/lkml/20240531191304.it.853-kees@kernel.org/
 v3: https://lore.kernel.org/lkml/20240424213019.make.366-kees@kernel.org/
 v2: https://lore.kernel.org/lkml/20240305100933.it.923-kees@kernel.org/
 v1: https://lore.kernel.org/lkml/20240304184252.work.496-kees@kernel.org/

For the cover letter, I'm repeating the commit log for patch 4 here,
which has the more complete rationale:

    Dedicated caches are available for fixed size allocations via
    kmem_cache_alloc(), but for dynamically sized allocations there is only
    the global kmalloc API's set of buckets available. This means it isn't
    possible to separate specific sets of dynamically sized allocations into
    a separate collection of caches.

    This leads to a use-after-free exploitation weakness in the Linux
    kernel since many heap memory spraying/grooming attacks depend on using
    userspace-controllable dynamically sized allocations to collide with
    fixed size allocations that end up in the same cache.

    While CONFIG_RANDOM_KMALLOC_CACHES provides a probabilistic defense
    against these kinds of "type confusion" attacks, including for fixed
    same-size heap objects, we can create a complementary deterministic
    defense for dynamically sized allocations that are directly user
    controlled. The number of cases needing isolation is limited in scope, so
    isolating these kinds of interfaces will not become an unbounded game of
    whack-a-mole. For
    example, many pass through memdup_user(), making isolation there very
    effective.

    In order to isolate user-controllable dynamically-sized
    allocations from the common system kmalloc allocations, introduce
    kmem_buckets_create(), which behaves like kmem_cache_create(). Introduce
    kmem_buckets_alloc(), which behaves like kmem_cache_alloc(). Introduce
    kmem_buckets_alloc_track_caller() for where caller tracking is
    needed. Introduce kmem_buckets_valloc() for cases where vmalloc fallback
    is needed. Note that these caches are specifically flagged with
    SLAB_NO_MERGE, since merging would defeat the entire purpose of the
    mitigation.

    This can also be used in the future to extend allocation profiling's use
    of code tagging to implement per-caller allocation cache isolation[1]
    even for dynamic allocations.

    Memory allocation pinning[2] is still needed to plug the use-after-free
    cross-allocator weakness (where attackers can arrange to free an
    entire slab page and have it reallocated to a different cache),
    but that is an existing and separate issue which is complementary
    to this improvement. Development continues for that feature via the
    SLAB_VIRTUAL[3] series (which could also provide guard pages -- another
    complementary improvement).

    Link: https://lore.kernel.org/lkml/202402211449.401382D2AF@keescook [1]
    Link: https://googleprojectzero.blogspot.com/2021/10/how-simple-linux-kernel-memory.html [2]
    Link: https://lore.kernel.org/lkml/20230915105933.495735-1-matteorizzo@google.com/ [3]
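
To make the shape of the new API concrete, here is a rough sketch of the
interfaces named above. These are not the authoritative declarations (patch 4
adds those to include/linux/slab.h, and several are implemented as macros
there); the argument list for kmem_buckets_create() is assumed to mirror
kmem_cache_create() minus the alignment argument dropped in v6:

    /* Sketch only; see patch 4 for the real declarations. */
    kmem_buckets *kmem_buckets_create(const char *name, slab_flags_t flags,
                                      unsigned int useroffset,
                                      unsigned int usersize,
                                      void (*ctor)(void *));

    /* Like kmalloc(), but served from the given dedicated bucket set. */
    void *kmem_buckets_alloc(kmem_buckets *b, size_t size, gfp_t flags);

    /* As above, but records the caller, like kmalloc_track_caller(). */
    void *kmem_buckets_alloc_track_caller(kmem_buckets *b, size_t size,
                                          gfp_t flags);

    /* Like kvmalloc(): falls back to vmalloc() for large allocations. */
    void *kmem_buckets_valloc(kmem_buckets *b, size_t size, gfp_t flags);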

After the core implementation come two patches that cover the most heavily
abused "repeat offenders" used in exploits. Repeating those details here:

    The msg subsystem is a common target for exploiting[1][2][3][4][5][6][7]
    use-after-free type confusion flaws in the kernel for both read and
    write primitives. Avoid having a user-controlled size cache share the
    global kmalloc allocator by using a separate set of kmalloc buckets.

    Link: https://blog.hacktivesecurity.com/index.php/2022/06/13/linux-kernel-exploit-development-1day-case-study/ [1]
    Link: https://hardenedvault.net/blog/2022-11-13-msg_msg-recon-mitigation-ved/ [2]
    Link: https://www.willsroot.io/2021/08/corctf-2021-fire-of-salvation-writeup.html [3]
    Link: https://a13xp0p0v.github.io/2021/02/09/CVE-2021-26708.html [4]
    Link: https://google.github.io/security-research/pocs/linux/cve-2021-22555/writeup.html [5]
    Link: https://zplin.me/papers/ELOISE.pdf [6]
    Link: https://syst3mfailure.io/wall-of-perdition/ [7]
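
For illustration, the alloc_msg() conversion boils down to creating a
dedicated bucket set at init time and allocating the user-sized message
buffers from it. The snippet below is a simplified sketch of patch 5; the
flags, usercopy window, and variable names shown are illustrative rather
than copied from the patch:

    static kmem_buckets *msg_buckets __ro_after_init;

    static int __init init_msg_buckets(void)
    {
            /* SLAB_NO_MERGE is applied internally by kmem_buckets_create(). */
            msg_buckets = kmem_buckets_create("msg_msg", SLAB_ACCOUNT,
                                              sizeof(struct msg_msg),
                                              DATALEN_MSG, NULL);
            return 0;
    }
    subsys_initcall(init_msg_buckets);

    /* ... and in alloc_msg(), the user-sized allocation moves off the
     * global kmalloc buckets: */
    msg = kmem_buckets_alloc(msg_buckets, len, GFP_KERNEL_ACCOUNT);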

    Both memdup_user() and vmemdup_user() handle allocations that are
    regularly used for exploiting use-after-free type confusion flaws in
    the kernel (e.g. prctl() PR_SET_VMA_ANON_NAME[1] and setxattr[2][3][4]
    respectively).

    Since both are designed for contents coming from userspace, they allow
    for userspace-controlled allocation sizes. Use a dedicated set of kmalloc
    buckets so these allocations do not share caches with the global kmalloc
    buckets.

    Link: https://starlabs.sg/blog/2023/07-prctl-anon_vma_name-an-amusing-heap-spray/ [1]
    Link: https://duasynt.com/blog/linux-kernel-heap-spray [2]
    Link: https://etenal.me/archives/1336 [3]
    Link: https://github.com/a13xp0p0v/kernel-hack-drill/blob/master/drill_exploit_uaf.c [4]
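
For memdup_user() and vmemdup_user() the conversion is similarly mechanical:
a shared bucket set is created at init time, and the kmalloc_track_caller()
and kvmalloc() calls are replaced with the bucket-aware variants. A
simplified sketch of patch 6 follows (names, usercopy bounds, and flag
choices here are illustrative):

    static kmem_buckets *user_buckets __ro_after_init;

    static int __init init_user_buckets(void)
    {
            user_buckets = kmem_buckets_create("memdup_user", 0,
                                               0, INT_MAX, NULL);
            return 0;
    }
    subsys_initcall(init_user_buckets);

    /* memdup_user(): formerly kmalloc_track_caller(len, ...) */
    p = kmem_buckets_alloc_track_caller(user_buckets, len,
                                        GFP_USER | __GFP_NOWARN);

    /* vmemdup_user(): formerly kvmalloc(len, ...), keeping the vmalloc
     * fallback for large, physically non-contiguous copies. */
    p = kmem_buckets_valloc(user_buckets, len, GFP_USER);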

Thanks!

-Kees

Kees Cook (6):
  mm/slab: Introduce kmem_buckets typedef
  mm/slab: Plumb kmem_buckets into __do_kmalloc_node()
  mm/slab: Introduce kvmalloc_buckets_node() that can take kmem_buckets
    argument
  mm/slab: Introduce kmem_buckets_create() and family
  ipc, msg: Use dedicated slab buckets for alloc_msg()
  mm/util: Use dedicated slab buckets for memdup_user()

 include/linux/slab.h            |  48 ++++++++++++---
 ipc/msgutil.c                   |  13 +++-
 kernel/configs/hardening.config |   1 +
 mm/Kconfig                      |  17 ++++++
 mm/slab.h                       |   6 +-
 mm/slab_common.c                | 101 +++++++++++++++++++++++++++++++-
 mm/slub.c                       |  20 +++----
 mm/util.c                       |  23 ++++++--
 scripts/kernel-doc              |   1 +
 9 files changed, 200 insertions(+), 30 deletions(-)

Comments

Vlastimil Babka July 2, 2024, 9:24 a.m. UTC | #1
On 7/1/24 9:12 PM, Kees Cook wrote:
> 
> Kees Cook (6):
>   mm/slab: Introduce kmem_buckets typedef
>   mm/slab: Plumb kmem_buckets into __do_kmalloc_node()
>   mm/slab: Introduce kvmalloc_buckets_node() that can take kmem_buckets
>     argument
>   mm/slab: Introduce kmem_buckets_create() and family
>   ipc, msg: Use dedicated slab buckets for alloc_msg()
>   mm/util: Use dedicated slab buckets for memdup_user()

pushed to slab/for-6.11/buckets, slab/for-next

Thanks!

> 
>  include/linux/slab.h            |  48 ++++++++++++---
>  ipc/msgutil.c                   |  13 +++-
>  kernel/configs/hardening.config |   1 +
>  mm/Kconfig                      |  17 ++++++
>  mm/slab.h                       |   6 +-
>  mm/slab_common.c                | 101 +++++++++++++++++++++++++++++++-
>  mm/slub.c                       |  20 +++----
>  mm/util.c                       |  23 ++++++--
>  scripts/kernel-doc              |   1 +
>  9 files changed, 200 insertions(+), 30 deletions(-)
>
Kees Cook July 2, 2024, 8:12 p.m. UTC | #2
On Tue, Jul 02, 2024 at 11:24:57AM +0200, Vlastimil Babka wrote:
> On 7/1/24 9:12 PM, Kees Cook wrote:
> > 
> > Kees Cook (6):
> >   mm/slab: Introduce kmem_buckets typedef
> >   mm/slab: Plumb kmem_buckets into __do_kmalloc_node()
> >   mm/slab: Introduce kvmalloc_buckets_node() that can take kmem_buckets
> >     argument
> >   mm/slab: Introduce kmem_buckets_create() and family
> >   ipc, msg: Use dedicated slab buckets for alloc_msg()
> >   mm/util: Use dedicated slab buckets for memdup_user()
> 
> pushed to slab/for-6.11/buckets, slab/for-next

Great! Thanks for the review and improvements! :) I'll get started on
the next step: getting it hooked up to the codetag bits...