Message ID: 20240911064535.557650-1-feng.tang@intel.com (mailing list archive)
Series: mm/slub: Improve data handling of krealloc() when orig_size is enabled
On 9/11/24 08:45, Feng Tang wrote:
> Danilo Krummrich's patch [1] raised one problem about krealloc() that
> its caller doesn't pass the old request size, say the object is 64
> bytes kmalloc one, but caller originally only requested 48 bytes. Then
> when krealloc() shrinks or grows in the same object, or allocate a new
> bigger object, it lacks this 'original size' information to do accurate
> data preserving or zeroing (when __GFP_ZERO is set).
>
> Thus with slub debug redzone and object tracking enabled, parts of the
> object after krealloc() might contain redzone data instead of zeroes,
> which is violating the __GFP_ZERO guarantees. Good thing is in this
> case, kmalloc caches do have this 'orig_size' feature, which could be
> used to improve the situation here.
>
> To make the 'orig_size' accurate, we adjust some kasan/slub meta data
> handling. Also add a slub kunit test case for krealloc().
>
> This patchset has dependency over patches in both -mm tree and -slab
> trees, so it is written based on linux-next tree '20240910' version.
>
> [1]. https://lore.kernel.org/lkml/20240812223707.32049-1-dakr@kernel.org/

Thanks, added to slab/for-next

> Thanks,
> Feng
>
> Changelog:
>
> Since v1:
>   * Drop the patch changing generic kunit code from this patchset,
>     and will send it separately.
>   * Separate the krealloc moving from slab_common.c to slub.c to a
>     new patch for better review (Danilo/Vlastimil)
>   * Improve commit log and comments (Vlastimil/Danilo)
>   * Rework the kunit test case to remove its dependency over
>     slub_debug (which is incomplete in v1) (Vlastimil)
>   * Add ack and review tags from developers.
>
> Feng Tang (5):
>   mm/kasan: Don't store metadata inside kmalloc object when
>     slub_debug_orig_size is on
>   mm/slub: Consider kfence case for get_orig_size()
>   mm/slub: Move krealloc() and related code to slub.c
>   mm/slub: Improve redzone check and zeroing for krealloc()
>   mm/slub, kunit: Add testcase for krealloc redzone and zeroing
>
>  lib/slub_kunit.c   |  42 +++++++++++++++
>  mm/kasan/generic.c |   7 ++-
>  mm/slab.h          |   6 +++
>  mm/slab_common.c   |  84 ------------------------------
>  mm/slub.c          | 125 ++++++++++++++++++++++++++++++++++++++-------
>  5 files changed, 160 insertions(+), 104 deletions(-)
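[To make the failure mode described in the cover letter concrete, here is a
minimal, hypothetical caller; it assumes slub_debug redzoning is enabled and
the allocation is served from kmalloc-64:]

	/*
	 * Illustration only (not from the series): a 48-byte request is
	 * served from a 64-byte kmalloc object, so orig_size = 48.
	 */
	void *p = kmalloc(48, GFP_KERNEL);

	/*
	 * Grows within the same 64-byte object. Without accurate
	 * orig_size information the allocator cannot tell that only
	 * bytes 0..47 hold caller data, so bytes 48..55 may still
	 * contain redzone bytes rather than the zeroes that __GFP_ZERO
	 * guarantees.
	 */
	p = krealloc(p, 56, GFP_KERNEL | __GFP_ZERO);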
On Wed, 2 Oct 2024 at 12:42, Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 9/11/24 08:45, Feng Tang wrote:
> > Danilo Krummrich's patch [1] raised one problem about krealloc() that
> > its caller doesn't pass the old request size, say the object is 64
> > bytes kmalloc one, but caller originally only requested 48 bytes. Then
> > when krealloc() shrinks or grows in the same object, or allocate a new
> > bigger object, it lacks this 'original size' information to do accurate
> > data preserving or zeroing (when __GFP_ZERO is set).
> >
> > Thus with slub debug redzone and object tracking enabled, parts of the
> > object after krealloc() might contain redzone data instead of zeroes,
> > which is violating the __GFP_ZERO guarantees. Good thing is in this
> > case, kmalloc caches do have this 'orig_size' feature, which could be
> > used to improve the situation here.
> >
> > To make the 'orig_size' accurate, we adjust some kasan/slub meta data
> > handling. Also add a slub kunit test case for krealloc().
> >
> > This patchset has dependency over patches in both -mm tree and -slab
> > trees, so it is written based on linux-next tree '20240910' version.
> >
> > [1]. https://lore.kernel.org/lkml/20240812223707.32049-1-dakr@kernel.org/
>
> Thanks, added to slab/for-next

This series just hit -next, and we're seeing several "KFENCE: memory
corruption ..." reports. Here's one:
https://lore.kernel.org/all/66ff8bf6.050a0220.49194.0453.GAE@google.com/

One more (no link):

> ==================================================================
> BUG: KFENCE: memory corruption in xfs_iext_destroy_node+0xab/0x670 fs/xfs/libxfs/xfs_iext_tree.c:1051
>
> Corrupted memory at 0xffff88823bf5a0d0 [ 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 ] (in kfence-#172):
>  xfs_iext_destroy_node+0xab/0x670 fs/xfs/libxfs/xfs_iext_tree.c:1051
>  xfs_iext_destroy+0x66/0x100 fs/xfs/libxfs/xfs_iext_tree.c:1062
>  xfs_inode_free_callback+0x91/0x1d0 fs/xfs/xfs_icache.c:145
>  rcu_do_batch kernel/rcu/tree.c:2567 [inline]
[...]
>
> kfence-#172: 0xffff88823bf5a000-0xffff88823bf5a0cf, size=208, cache=kmalloc-256
>
> allocated by task 5494 on cpu 0 at 101.266046s (0.409225s ago):
>  __do_krealloc mm/slub.c:4784 [inline]
>  krealloc_noprof+0xd6/0x2e0 mm/slub.c:4838
>  xfs_iext_realloc_root fs/xfs/libxfs/xfs_iext_tree.c:613 [inline]
[...]
>
> freed by task 16 on cpu 0 at 101.573936s (0.186416s ago):
>  xfs_iext_destroy_node+0xab/0x670 fs/xfs/libxfs/xfs_iext_tree.c:1051
>  xfs_iext_destroy+0x66/0x100 fs/xfs/libxfs/xfs_iext_tree.c:1062
>  xfs_inode_free_callback+0x91/0x1d0 fs/xfs/xfs_icache.c:145
[...]
>
> CPU: 0 UID: 0 PID: 16 Comm: ksoftirqd/0 Not tainted 6.12.0-rc1-next-20241003-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
> ==================================================================

Unfortunately there's no reproducer yet, it seems. Unless it's
immediately obvious what's wrong, is it possible to take this series
out of -next to confirm this series is causing the memory corruptions?
Syzbot should then stop finding these crashes.

Thanks,
-- Marco
On 10/4/24 08:44, Marco Elver wrote:
> On Wed, 2 Oct 2024 at 12:42, Vlastimil Babka <vbabka@suse.cz> wrote:
>>
>> On 9/11/24 08:45, Feng Tang wrote:
>> > Danilo Krummrich's patch [1] raised one problem about krealloc() that
>> > its caller doesn't pass the old request size, say the object is 64
>> > bytes kmalloc one, but caller originally only requested 48 bytes. Then
>> > when krealloc() shrinks or grows in the same object, or allocate a new
>> > bigger object, it lacks this 'original size' information to do accurate
>> > data preserving or zeroing (when __GFP_ZERO is set).
>> >
>> > Thus with slub debug redzone and object tracking enabled, parts of the
>> > object after krealloc() might contain redzone data instead of zeroes,
>> > which is violating the __GFP_ZERO guarantees. Good thing is in this
>> > case, kmalloc caches do have this 'orig_size' feature, which could be
>> > used to improve the situation here.
>> >
>> > To make the 'orig_size' accurate, we adjust some kasan/slub meta data
>> > handling. Also add a slub kunit test case for krealloc().
>> >
>> > This patchset has dependency over patches in both -mm tree and -slab
>> > trees, so it is written based on linux-next tree '20240910' version.
>> >
>> > [1]. https://lore.kernel.org/lkml/20240812223707.32049-1-dakr@kernel.org/
>>
>> Thanks, added to slab/for-next
>
> This series just hit -next, and we're seeing several "KFENCE: memory
> corruption ..." reports. Here's one:
> https://lore.kernel.org/all/66ff8bf6.050a0220.49194.0453.GAE@google.com/
>
> One more (no link):
>
>> ==================================================================
>> BUG: KFENCE: memory corruption in xfs_iext_destroy_node+0xab/0x670 fs/xfs/libxfs/xfs_iext_tree.c:1051
>>
>> Corrupted memory at 0xffff88823bf5a0d0 [ 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 ] (in kfence-#172):
>>  xfs_iext_destroy_node+0xab/0x670 fs/xfs/libxfs/xfs_iext_tree.c:1051
>>  xfs_iext_destroy+0x66/0x100 fs/xfs/libxfs/xfs_iext_tree.c:1062
>>  xfs_inode_free_callback+0x91/0x1d0 fs/xfs/xfs_icache.c:145
>>  rcu_do_batch kernel/rcu/tree.c:2567 [inline]
> [...]
>>
>> kfence-#172: 0xffff88823bf5a000-0xffff88823bf5a0cf, size=208, cache=kmalloc-256
>>
>> allocated by task 5494 on cpu 0 at 101.266046s (0.409225s ago):
>>  __do_krealloc mm/slub.c:4784 [inline]
>>  krealloc_noprof+0xd6/0x2e0 mm/slub.c:4838
>>  xfs_iext_realloc_root fs/xfs/libxfs/xfs_iext_tree.c:613 [inline]
> [...]
>>
>> freed by task 16 on cpu 0 at 101.573936s (0.186416s ago):
>>  xfs_iext_destroy_node+0xab/0x670 fs/xfs/libxfs/xfs_iext_tree.c:1051
>>  xfs_iext_destroy+0x66/0x100 fs/xfs/libxfs/xfs_iext_tree.c:1062
>>  xfs_inode_free_callback+0x91/0x1d0 fs/xfs/xfs_icache.c:145
> [...]
>>
>> CPU: 0 UID: 0 PID: 16 Comm: ksoftirqd/0 Not tainted 6.12.0-rc1-next-20241003-syzkaller #0
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
>> ==================================================================
>
> Unfortunately there's no reproducer yet, it seems. Unless it's
> immediately obvious what's wrong, is it possible to take this series
> out of -next to confirm this series is causing the memory corruptions?
> Syzbot should then stop finding these crashes.

I think it's commit d0a38fad51cc7, which does this in __do_krealloc():

-	ks = ksize(p);
+
+	s = virt_to_cache(p);
+	orig_size = get_orig_size(s, (void *)p);
+	ks = s->object_size;

so for kfence objects we don't get their actual allocation size but the
potentially larger bucket size?

I guess we could do:

	ks = kfence_ksize(p) ?: s->object_size;

?

> Thanks,
> -- Marco
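[In context, the one-liner above relies on kfence_ksize() returning 0 for
non-KFENCE addresses. A minimal sketch of the size lookup it implies; the
helper name is hypothetical:]

	/*
	 * Sketch only: kfence_ksize() returns the object's real
	 * allocation size for KFENCE addresses and 0 otherwise, so the
	 * ?: falls back to the slab bucket size only for regular slab
	 * objects.
	 */
	static size_t krealloc_avail_size(struct kmem_cache *s, const void *p)
	{
		return kfence_ksize(p) ?: s->object_size;
	}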
On 10/4/24 11:18, Vlastimil Babka wrote:
> On 10/4/24 08:44, Marco Elver wrote:
>
> I think it's commit d0a38fad51cc7, which does this in __do_krealloc():
>
> -	ks = ksize(p);
> +
> +	s = virt_to_cache(p);
> +	orig_size = get_orig_size(s, (void *)p);
> +	ks = s->object_size;
>
> so for kfence objects we don't get their actual allocation size but the
> potentially larger bucket size?
>
> I guess we could do:
>
> 	ks = kfence_ksize(p) ?: s->object_size;
>
> ?

Hmm, this probably is not the whole story, we also have:

-	memcpy(ret, kasan_reset_tag(p), ks);
+	if (orig_size)
+		memcpy(ret, kasan_reset_tag(p), orig_size);

orig_size for kfence will again be s->object_size, so the memcpy might be a
(read) buffer overflow from a kfence allocation.

I think get_orig_size() should perhaps return kfence_ksize(p) for kfence
allocations, in addition to the change above.

Or alternatively we don't change get_orig_size() (in a different commit) at
all, but __do_krealloc() will have an "if is_kfence_address()" that sets
both orig_size and ks to kfence_ksize(p) appropriately. That might be easier
to follow.

But either way means rewriting 2 commits. I think it's indeed better to drop
the series now from -next and submit a v3.

Vlastimil

>> Thanks,
>> -- Marco
On Fri, Oct 04, 2024 at 05:52:10PM +0800, Vlastimil Babka wrote:
> On 10/4/24 11:18, Vlastimil Babka wrote:
> > On 10/4/24 08:44, Marco Elver wrote:
> >
> > I think it's commit d0a38fad51cc7, which does this in __do_krealloc():
> >
> > -	ks = ksize(p);
> > +
> > +	s = virt_to_cache(p);
> > +	orig_size = get_orig_size(s, (void *)p);
> > +	ks = s->object_size;
> >
> > so for kfence objects we don't get their actual allocation size but the
> > potentially larger bucket size?
> >
> > I guess we could do:
> >
> > 	ks = kfence_ksize(p) ?: s->object_size;
> >
> > ?
>
> Hmm, this probably is not the whole story, we also have:
>
> -	memcpy(ret, kasan_reset_tag(p), ks);
> +	if (orig_size)
> +		memcpy(ret, kasan_reset_tag(p), orig_size);
>
> orig_size for kfence will again be s->object_size, so the memcpy might be a
> (read) buffer overflow from a kfence allocation.
>
> I think get_orig_size() should perhaps return kfence_ksize(p) for kfence
> allocations, in addition to the change above.
>
> Or alternatively we don't change get_orig_size() (in a different commit) at
> all, but __do_krealloc() will have an "if is_kfence_address()" that sets
> both orig_size and ks to kfence_ksize(p) appropriately. That might be easier
> to follow.
>
> But either way means rewriting 2 commits. I think it's indeed better to drop
> the series now from -next and submit a v3.

Yes, we can revert now. Sorry for the inconvenience.

Thanks,
Feng
On Fri, Oct 04, 2024 at 05:52:10PM +0800, Vlastimil Babka wrote:
> On 10/4/24 11:18, Vlastimil Babka wrote:
> > On 10/4/24 08:44, Marco Elver wrote:
> >
> > I think it's commit d0a38fad51cc7, which does this in __do_krealloc():
> >
> > -	ks = ksize(p);
> > +
> > +	s = virt_to_cache(p);
> > +	orig_size = get_orig_size(s, (void *)p);
> > +	ks = s->object_size;
> >
> > so for kfence objects we don't get their actual allocation size but the
> > potentially larger bucket size?
> >
> > I guess we could do:
> >
> > 	ks = kfence_ksize(p) ?: s->object_size;
> >
> > ?
>
> Hmm, this probably is not the whole story, we also have:
>
> -	memcpy(ret, kasan_reset_tag(p), ks);
> +	if (orig_size)
> +		memcpy(ret, kasan_reset_tag(p), orig_size);
>
> orig_size for kfence will again be s->object_size, so the memcpy might be a
> (read) buffer overflow from a kfence allocation.
>
> I think get_orig_size() should perhaps return kfence_ksize(p) for kfence
> allocations, in addition to the change above.
>
> Or alternatively we don't change get_orig_size() (in a different commit) at
> all, but __do_krealloc() will have an "if is_kfence_address()" that sets
> both orig_size and ks to kfence_ksize(p) appropriately. That might be easier
> to follow.

Thanks for the suggestion!

As there were error report about the NULL slab for big kmalloc object, how
about the following code for

__do_krealloc(const void *p, size_t new_size, gfp_t flags)
{
	void *ret;
	size_t ks = 0;
	int orig_size = 0;
	struct kmem_cache *s = NULL;

	/* Check for double-free. */
	if (likely(!ZERO_OR_NULL_PTR(p))) {
		if (!kasan_check_byte(p))
			return NULL;

		ks = ksize(p);

		/* Some objects have no orig_size, like big kmalloc case */
		if (is_kfence_address(p)) {
			orig_size = kfence_ksize(p);
		} else if (virt_to_slab(p)) {
			s = virt_to_cache(p);
			orig_size = get_orig_size(s, (void *)p);
		}
	} else {
		goto alloc_new;
	}

	/* If the object doesn't fit, allocate a bigger one */
	if (new_size > ks)
		goto alloc_new;

	/* Zero out spare memory. */
	if (want_init_on_alloc(flags)) {
		kasan_disable_current();
		if (orig_size && orig_size < new_size)
			memset((void *)p + orig_size, 0, new_size - orig_size);
		else
			memset((void *)p + new_size, 0, ks - new_size);
		kasan_enable_current();
	}

	/* Setup kmalloc redzone when needed */
	if (s && slub_debug_orig_size(s) && !is_kfence_address(p)) {
		set_orig_size(s, (void *)p, new_size);
		if (s->flags & SLAB_RED_ZONE && new_size < ks)
			memset_no_sanitize_memory((void *)p + new_size,
					SLUB_RED_ACTIVE, ks - new_size);
	}

	p = kasan_krealloc((void *)p, new_size, flags);
	return (void *)p;

alloc_new:
	ret = kmalloc_node_track_caller_noprof(new_size, flags, NUMA_NO_NODE, _RET_IP_);
	if (ret && p) {
		/* Disable KASAN checks as the object's redzone is accessed. */
		kasan_disable_current();
		memcpy(ret, kasan_reset_tag(p), orig_size ?: ks);
		kasan_enable_current();
	}

	return ret;
}

I've run it with the reproducer of syzbot, so far the issue hasn't been
reproduced on my local machine.

Thanks,
Feng

>
> But either way means rewriting 2 commits. I think it's indeed better to drop
> the series now from -next and submit a v3.
>
> Vlastimil
>
> >> Thanks,
> >> -- Marco
On 10/14/24 09:52, Feng Tang wrote:
> On Fri, Oct 04, 2024 at 05:52:10PM +0800, Vlastimil Babka wrote:
> Thanks for the suggestion!
>
> As there were error report about the NULL slab for big kmalloc object, how
> about the following code for
>
> __do_krealloc(const void *p, size_t new_size, gfp_t flags)
> {
> 	void *ret;
> 	size_t ks = 0;
> 	int orig_size = 0;
> 	struct kmem_cache *s = NULL;
>
> 	/* Check for double-free. */
> 	if (likely(!ZERO_OR_NULL_PTR(p))) {
> 		if (!kasan_check_byte(p))
> 			return NULL;
>
> 		ks = ksize(p);

I think this will result in __ksize() doing
skip_orig_size_check(folio_slab(folio)->slab_cache, object);
and we don't want that?

Also the checks below repeat some of the checks of ksize().

So I think in __do_krealloc() we should do things manually to determine ks
and not call ksize(). Just not break any of the cases ksize() handles
(kfence, large kmalloc).

>
> 		/* Some objects have no orig_size, like big kmalloc case */
> 		if (is_kfence_address(p)) {
> 			orig_size = kfence_ksize(p);
> 		} else if (virt_to_slab(p)) {
> 			s = virt_to_cache(p);
> 			orig_size = get_orig_size(s, (void *)p);
> 		}
> 	} else {
> 		goto alloc_new;
> 	}
>
> 	/* If the object doesn't fit, allocate a bigger one */
> 	if (new_size > ks)
> 		goto alloc_new;
>
> 	/* Zero out spare memory. */
> 	if (want_init_on_alloc(flags)) {
> 		kasan_disable_current();
> 		if (orig_size && orig_size < new_size)
> 			memset((void *)p + orig_size, 0, new_size - orig_size);
> 		else
> 			memset((void *)p + new_size, 0, ks - new_size);
> 		kasan_enable_current();
> 	}
>
> 	/* Setup kmalloc redzone when needed */
> 	if (s && slub_debug_orig_size(s) && !is_kfence_address(p)) {
> 		set_orig_size(s, (void *)p, new_size);
> 		if (s->flags & SLAB_RED_ZONE && new_size < ks)
> 			memset_no_sanitize_memory((void *)p + new_size,
> 					SLUB_RED_ACTIVE, ks - new_size);
> 	}
>
> 	p = kasan_krealloc((void *)p, new_size, flags);
> 	return (void *)p;
>
> alloc_new:
> 	ret = kmalloc_node_track_caller_noprof(new_size, flags, NUMA_NO_NODE, _RET_IP_);
> 	if (ret && p) {
> 		/* Disable KASAN checks as the object's redzone is accessed. */
> 		kasan_disable_current();
> 		memcpy(ret, kasan_reset_tag(p), orig_size ?: ks);
> 		kasan_enable_current();
> 	}
>
> 	return ret;
> }
>
> I've run it with the reproducer of syzbot, so far the issue hasn't been
> reproduced on my local machine.
>
> Thanks,
> Feng
>
>>
>> But either way means rewriting 2 commits. I think it's indeed better to drop
>> the series now from -next and submit a v3.
>>
>> Vlastimil
>>
>> >> Thanks,
>> >> -- Marco
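[For reference, here is skip_orig_size_check(), paraphrased from mm/slub.c
and quoted from memory, so treat it as a sketch. It shows why calling
ksize() before get_orig_size() is harmful in this path:]

	/*
	 * ksize() calls this so that callers who legitimately write up
	 * to the bucket size don't trip the orig_size redzone check. It
	 * does so by rewriting the recorded orig_size to the full
	 * object size, destroying the very information __do_krealloc()
	 * wants to read afterwards.
	 */
	void skip_orig_size_check(struct kmem_cache *s, const void *object)
	{
		set_orig_size(s, (void *)object, s->object_size);
	}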
On Mon, Oct 14, 2024 at 10:53:32AM +0200, Vlastimil Babka wrote:
> On 10/14/24 09:52, Feng Tang wrote:
> > On Fri, Oct 04, 2024 at 05:52:10PM +0800, Vlastimil Babka wrote:
> > Thanks for the suggestion!
> >
> > As there were error report about the NULL slab for big kmalloc object, how
> > about the following code for
> >
> > __do_krealloc(const void *p, size_t new_size, gfp_t flags)
> > {
> > 	void *ret;
> > 	size_t ks = 0;
> > 	int orig_size = 0;
> > 	struct kmem_cache *s = NULL;
> >
> > 	/* Check for double-free. */
> > 	if (likely(!ZERO_OR_NULL_PTR(p))) {
> > 		if (!kasan_check_byte(p))
> > 			return NULL;
> >
> > 		ks = ksize(p);
>
> I think this will result in __ksize() doing
> skip_orig_size_check(folio_slab(folio)->slab_cache, object);
> and we don't want that?

I think that's fine. As later code will re-set the orig_size anyway.

> Also the checks below repeat some of the checks of ksize().

Yes, there is some redundancy, mostly the virt_to_slab()

> So I think in __do_krealloc() we should do things manually to determine ks
> and not call ksize(). Just not break any of the cases ksize() handles
> (kfence, large kmalloc).

OK, originally I tried not to expose internals of __ksize(). Let me
try this way.

Thanks,
Feng

>
> > 		/* Some objects have no orig_size, like big kmalloc case */
> > 		if (is_kfence_address(p)) {
> > 			orig_size = kfence_ksize(p);
> > 		} else if (virt_to_slab(p)) {
> > 			s = virt_to_cache(p);
> > 			orig_size = get_orig_size(s, (void *)p);
> > 		}
> > 	} else {
> > 		goto alloc_new;
> > 	}
> >
> > 	/* If the object doesn't fit, allocate a bigger one */
> > 	if (new_size > ks)
> > 		goto alloc_new;
> >
> > 	/* Zero out spare memory. */
> > 	if (want_init_on_alloc(flags)) {
> > 		kasan_disable_current();
> > 		if (orig_size && orig_size < new_size)
> > 			memset((void *)p + orig_size, 0, new_size - orig_size);
> > 		else
> > 			memset((void *)p + new_size, 0, ks - new_size);
> > 		kasan_enable_current();
> > 	}
> >
> > 	/* Setup kmalloc redzone when needed */
> > 	if (s && slub_debug_orig_size(s) && !is_kfence_address(p)) {
> > 		set_orig_size(s, (void *)p, new_size);
> > 		if (s->flags & SLAB_RED_ZONE && new_size < ks)
> > 			memset_no_sanitize_memory((void *)p + new_size,
> > 					SLUB_RED_ACTIVE, ks - new_size);
> > 	}
> >
> > 	p = kasan_krealloc((void *)p, new_size, flags);
> > 	return (void *)p;
> >
> > alloc_new:
> > 	ret = kmalloc_node_track_caller_noprof(new_size, flags, NUMA_NO_NODE, _RET_IP_);
> > 	if (ret && p) {
> > 		/* Disable KASAN checks as the object's redzone is accessed. */
> > 		kasan_disable_current();
> > 		memcpy(ret, kasan_reset_tag(p), orig_size ?: ks);
> > 		kasan_enable_current();
> > 	}
> >
> > 	return ret;
> > }
On 10/14/24 14:52, Feng Tang wrote:
> On Mon, Oct 14, 2024 at 10:53:32AM +0200, Vlastimil Babka wrote:
>> On 10/14/24 09:52, Feng Tang wrote:
>> > On Fri, Oct 04, 2024 at 05:52:10PM +0800, Vlastimil Babka wrote:
>> > Thanks for the suggestion!
>> >
>> > As there were error report about the NULL slab for big kmalloc object, how
>> > about the following code for
>> >
>> > __do_krealloc(const void *p, size_t new_size, gfp_t flags)
>> > {
>> > 	void *ret;
>> > 	size_t ks = 0;
>> > 	int orig_size = 0;
>> > 	struct kmem_cache *s = NULL;
>> >
>> > 	/* Check for double-free. */
>> > 	if (likely(!ZERO_OR_NULL_PTR(p))) {
>> > 		if (!kasan_check_byte(p))
>> > 			return NULL;
>> >
>> > 		ks = ksize(p);
>>
>> I think this will result in __ksize() doing
>> skip_orig_size_check(folio_slab(folio)->slab_cache, object);
>> and we don't want that?
>
> I think that's fine. As later code will re-set the orig_size anyway.

But you also read it first.

>> > 		/* Some objects have no orig_size, like big kmalloc case */
>> > 		if (is_kfence_address(p)) {
>> > 			orig_size = kfence_ksize(p);
>> > 		} else if (virt_to_slab(p)) {
>> > 			s = virt_to_cache(p);
>> > 			orig_size = get_orig_size(s, (void *)p);

here.

>> > 		}

>> Also the checks below repeat some of the checks of ksize().
>
> Yes, there is some redundancy, mostly the virt_to_slab()
>
>> So I think in __do_krealloc() we should do things manually to determine ks
>> and not call ksize(). Just not break any of the cases ksize() handles
>> (kfence, large kmalloc).
>
> OK, originally I tried not to expose internals of __ksize(). Let me
> try this way.

ksize() makes assumptions that a user outside of slab itself is calling it.

But we (well mostly Kees) also introduced kmalloc_size_roundup() to avoid
querying ksize() for the purposes of writing beyond the original
kmalloc(size) up to the bucket size. So maybe we can also investigate if the
skip_orig_size_check() mechanism can be removed now?

Still I think __do_krealloc() should rather do its own thing and not call
ksize().

> Thanks,
> Feng
>
>>
>> > 	} else {
>> > 		goto alloc_new;
>> > 	}
>> >
>> > 	/* If the object doesn't fit, allocate a bigger one */
>> > 	if (new_size > ks)
>> > 		goto alloc_new;
>> >
>> > 	/* Zero out spare memory. */
>> > 	if (want_init_on_alloc(flags)) {
>> > 		kasan_disable_current();
>> > 		if (orig_size && orig_size < new_size)
>> > 			memset((void *)p + orig_size, 0, new_size - orig_size);
>> > 		else
>> > 			memset((void *)p + new_size, 0, ks - new_size);
>> > 		kasan_enable_current();
>> > 	}
>> >
>> > 	/* Setup kmalloc redzone when needed */
>> > 	if (s && slub_debug_orig_size(s) && !is_kfence_address(p)) {
>> > 		set_orig_size(s, (void *)p, new_size);
>> > 		if (s->flags & SLAB_RED_ZONE && new_size < ks)
>> > 			memset_no_sanitize_memory((void *)p + new_size,
>> > 					SLUB_RED_ACTIVE, ks - new_size);
>> > 	}
>> >
>> > 	p = kasan_krealloc((void *)p, new_size, flags);
>> > 	return (void *)p;
>> >
>> > alloc_new:
>> > 	ret = kmalloc_node_track_caller_noprof(new_size, flags, NUMA_NO_NODE, _RET_IP_);
>> > 	if (ret && p) {
>> > 		/* Disable KASAN checks as the object's redzone is accessed. */
>> > 		kasan_disable_current();
>> > 		memcpy(ret, kasan_reset_tag(p), orig_size ?: ks);
>> > 		kasan_enable_current();
>> > 	}
>> >
>> > 	return ret;
>> > }
On Mon, Oct 14, 2024 at 03:12:09PM +0200, Vlastimil Babka wrote:
> On 10/14/24 14:52, Feng Tang wrote:
> > On Mon, Oct 14, 2024 at 10:53:32AM +0200, Vlastimil Babka wrote:
> >> On 10/14/24 09:52, Feng Tang wrote:
> >> > On Fri, Oct 04, 2024 at 05:52:10PM +0800, Vlastimil Babka wrote:
> >> > Thanks for the suggestion!
> >> >
> >> > As there were error report about the NULL slab for big kmalloc object, how
> >> > about the following code for
> >> >
> >> > __do_krealloc(const void *p, size_t new_size, gfp_t flags)
> >> > {
> >> > 	void *ret;
> >> > 	size_t ks = 0;
> >> > 	int orig_size = 0;
> >> > 	struct kmem_cache *s = NULL;
> >> >
> >> > 	/* Check for double-free. */
> >> > 	if (likely(!ZERO_OR_NULL_PTR(p))) {
> >> > 		if (!kasan_check_byte(p))
> >> > 			return NULL;
> >> >
> >> > 		ks = ksize(p);
> >>
> >> I think this will result in __ksize() doing
> >> skip_orig_size_check(folio_slab(folio)->slab_cache, object);
> >> and we don't want that?
> >
> > I think that's fine. As later code will re-set the orig_size anyway.
>
> But you also read it first.
>
> >> > 		/* Some objects have no orig_size, like big kmalloc case */
> >> > 		if (is_kfence_address(p)) {
> >> > 			orig_size = kfence_ksize(p);
> >> > 		} else if (virt_to_slab(p)) {
> >> > 			s = virt_to_cache(p);
> >> > 			orig_size = get_orig_size(s, (void *)p);
>
> here.

Aha, you are right!

> >> > 		}
>
> >> Also the checks below repeat some of the checks of ksize().
> >
> > Yes, there is some redundancy, mostly the virt_to_slab()
> >
> >> So I think in __do_krealloc() we should do things manually to determine ks
> >> and not call ksize(). Just not break any of the cases ksize() handles
> >> (kfence, large kmalloc).
> >
> > OK, originally I tried not to expose internals of __ksize(). Let me
> > try this way.
>
> ksize() makes assumptions that a user outside of slab itself is calling it.
>
> But we (well mostly Kees) also introduced kmalloc_size_roundup() to avoid
> querying ksize() for the purposes of writing beyond the original
> kmalloc(size) up to the bucket size. So maybe we can also investigate if the
> skip_orig_size_check() mechanism can be removed now?

I did a quick grep, and fortunately it seems that the ksize() users are
much fewer than before. We used to see some trouble in network code, which
is now very clean without the need to skip the orig_size check. Will check
other call sites later.

> Still I think __do_krealloc() should rather do its own thing and not call
> ksize().

Yes. I made some changes:

static __always_inline __realloc_size(2) void *
__do_krealloc(const void *p, size_t new_size, gfp_t flags)
{
	void *ret;
	size_t ks = 0;
	int orig_size = 0;
	struct kmem_cache *s = NULL;

	/* Check for double-free. */
	if (unlikely(ZERO_OR_NULL_PTR(p)))
		goto alloc_new;

	if (!kasan_check_byte(p))
		return NULL;

	if (is_kfence_address(p)) {
		ks = orig_size = kfence_ksize(p);
	} else {
		struct folio *folio;

		folio = virt_to_folio(p);
		if (unlikely(!folio_test_slab(folio))) {
			/* Big kmalloc object */
			WARN_ON(folio_size(folio) <= KMALLOC_MAX_CACHE_SIZE);
			WARN_ON(p != folio_address(folio));
			ks = folio_size(folio);
		} else {
			s = folio_slab(folio)->slab_cache;
			orig_size = get_orig_size(s, (void *)p);
			ks = s->object_size;
		}
	}

	/* If the old object doesn't fit, allocate a bigger one */
	if (new_size > ks)
		goto alloc_new;

	/* Zero out spare memory. */
	if (want_init_on_alloc(flags)) {
		kasan_disable_current();
		if (orig_size && orig_size < new_size)
			memset((void *)p + orig_size, 0, new_size - orig_size);
		else
			memset((void *)p + new_size, 0, ks - new_size);
		kasan_enable_current();
	}

	/* Setup kmalloc redzone when needed */
	if (s && slub_debug_orig_size(s)) {
		set_orig_size(s, (void *)p, new_size);
		if (s->flags & SLAB_RED_ZONE && new_size < ks)
			memset_no_sanitize_memory((void *)p + new_size,
						  SLUB_RED_ACTIVE, ks - new_size);
	}

	p = kasan_krealloc((void *)p, new_size, flags);
	return (void *)p;

alloc_new:
	ret = kmalloc_node_track_caller_noprof(new_size, flags, NUMA_NO_NODE, _RET_IP_);
	if (ret && p) {
		/* Disable KASAN checks as the object's redzone is accessed. */
		kasan_disable_current();
		memcpy(ret, kasan_reset_tag(p), orig_size ?: ks);
		kasan_enable_current();
	}

	return ret;
}

Thanks,
Feng
On Mon, Oct 14, 2024 at 03:12:09PM +0200, Vlastimil Babka wrote:
> On 10/14/24 14:52, Feng Tang wrote:
> > On Mon, Oct 14, 2024 at 10:53:32AM +0200, Vlastimil Babka wrote:
> >> On 10/14/24 09:52, Feng Tang wrote:
> >> > On Fri, Oct 04, 2024 at 05:52:10PM +0800, Vlastimil Babka wrote:
> >> > Thanks for the suggestion!
> >> >
> >> > As there were error report about the NULL slab for big kmalloc object, how
> >> > about the following code for
> >> >
> >> > __do_krealloc(const void *p, size_t new_size, gfp_t flags)
> >> > {
> >> > 	void *ret;
> >> > 	size_t ks = 0;
> >> > 	int orig_size = 0;
> >> > 	struct kmem_cache *s = NULL;
> >> >
> >> > 	/* Check for double-free. */
> >> > 	if (likely(!ZERO_OR_NULL_PTR(p))) {
> >> > 		if (!kasan_check_byte(p))
> >> > 			return NULL;
> >> >
> >> > 		ks = ksize(p);
> >>
> >> I think this will result in __ksize() doing
> >> skip_orig_size_check(folio_slab(folio)->slab_cache, object);
> >> and we don't want that?
> >
> > I think that's fine. As later code will re-set the orig_size anyway.
>
> But you also read it first.
>
> >> > 		/* Some objects have no orig_size, like big kmalloc case */
> >> > 		if (is_kfence_address(p)) {
> >> > 			orig_size = kfence_ksize(p);
> >> > 		} else if (virt_to_slab(p)) {
> >> > 			s = virt_to_cache(p);
> >> > 			orig_size = get_orig_size(s, (void *)p);
>
> here.
>
> >> > 		}
>
> >> Also the checks below repeat some of the checks of ksize().
> >
> > Yes, there is some redundancy, mostly the virt_to_slab()
> >
> >> So I think in __do_krealloc() we should do things manually to determine ks
> >> and not call ksize(). Just not break any of the cases ksize() handles
> >> (kfence, large kmalloc).
> >
> > OK, originally I tried not to expose internals of __ksize(). Let me
> > try this way.
>
> ksize() makes assumptions that a user outside of slab itself is calling it.
>
> But we (well mostly Kees) also introduced kmalloc_size_roundup() to avoid
> querying ksize() for the purposes of writing beyond the original
> kmalloc(size) up to the bucket size. So maybe we can also investigate if the
> skip_orig_size_check() mechanism can be removed now?
>
> Still I think __do_krealloc() should rather do its own thing and not call
> ksize().

The goal was to avoid having users of the allocation APIs change the sizes
of allocations without calling into realloc. This is because otherwise the
"alloc_size" attribute used by compilers to inform
__builtin_dynamic_object_size() can get confused:

	ptr = alloc(less_than_bucket_size);
	...
	size = ksize(ptr); /* larger size! */
	memcpy(ptr, src, size); /* compiler instrumentation doesn't see that ptr "grows" */

So the callers use kmalloc_size_roundup() to just allocate the rounded-up
size immediately. Internally, the allocator can do what it wants.
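[A minimal sketch of the pattern Kees describes; kmalloc_size_roundup() is
the real kernel API, while the variable names here are illustrative:]

	/*
	 * Round the request up front so the compiler's view of the
	 * object size matches what the code may later write.
	 */
	size_t want = 48;
	size_t alloc = kmalloc_size_roundup(want);	/* e.g. 64 for kmalloc-64 */
	u8 *buf = kmalloc(alloc, GFP_KERNEL);

	/*
	 * __builtin_dynamic_object_size(buf, 0) now tracks 'alloc', so
	 * writes of up to 'alloc' bytes stay visible to FORTIFY-style
	 * instrumentation, with no later ksize() "growth" needed.
	 */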
On Mon, Oct 14, 2024 at 10:20:36PM +0800, Feng Tang wrote:
> On Mon, Oct 14, 2024 at 03:12:09PM +0200, Vlastimil Babka wrote:
> > On 10/14/24 14:52, Feng Tang wrote:
> > > On Mon, Oct 14, 2024 at 10:53:32AM +0200, Vlastimil Babka wrote:
> > >> On 10/14/24 09:52, Feng Tang wrote:
> > > OK, originally I tried not to expose internals of __ksize(). Let me
> > > try this way.
> >
> > ksize() makes assumptions that a user outside of slab itself is calling it.
> >
> > But we (well mostly Kees) also introduced kmalloc_size_roundup() to avoid
> > querying ksize() for the purposes of writing beyond the original
> > kmalloc(size) up to the bucket size. So maybe we can also investigate if the
> > skip_orig_size_check() mechanism can be removed now?
>
> I did a quick grep, and fortunately it seems that the ksize() users are
> much fewer than before. We used to see some trouble in network code, which
> is now very clean without the need to skip the orig_size check. Will check
> other call sites later.

Right -- only things that are performing a reallocation should be using
ksize(), e.g. see __slab_build_skb().
On Mon, Oct 14, 2024 at 10:20:36PM +0800, Tang, Feng wrote:
> On Mon, Oct 14, 2024 at 03:12:09PM +0200, Vlastimil Babka wrote:
> >
> > >> So I think in __do_krealloc() we should do things manually to determine ks
> > >> and not call ksize(). Just not break any of the cases ksize() handles
> > >> (kfence, large kmalloc).
> > >
> > > OK, originally I tried not to expose internals of __ksize(). Let me
> > > try this way.
> >
> > ksize() makes assumptions that a user outside of slab itself is calling it.
> >
> > But we (well mostly Kees) also introduced kmalloc_size_roundup() to avoid
> > querying ksize() for the purposes of writing beyond the original
> > kmalloc(size) up to the bucket size. So maybe we can also investigate if the
> > skip_orig_size_check() mechanism can be removed now?
>
> I did a quick grep, and fortunately it seems that the ksize() users are
> much fewer than before. We used to see some trouble in network code, which
> is now very clean without the need to skip the orig_size check. Will check
> other call sites later.

I did some further checking of ksize() usage, and there are still some
places to be handled. The thing that stands out is kfree_sensitive(), and
another potential one is sound/soc/codecs/cs-amp-lib-test.c.

Some details:

* Thanks to Kees Cook, who has cured many cases of ksize() as below:

	drivers/base/devres.c: total_old_size = ksize(container_of(ptr, struct devres, data));
	drivers/net/ethernet/intel/igb/igb_main.c: } else if (size > ksize(q_vector)) {
	net/core/skbuff.c: *size = ksize(data);
	net/openvswitch/flow_netlink.c: new_acts_size = max(next_offset + req_size, ksize(*sfa) * 2);
	kernel/bpf/verifier.c: alloc_bytes = max(ksize(orig), kmalloc_size_roundup(bytes));

* Some callers use ksize() mostly for calculation or sanity checks,
  and not for accessing the extra space, which is fine:

	drivers/gpu/drm/drm_managed.c: WARN_ON(dev + 1 > (struct drm_device *) (container + ksize(container)));
	lib/kunit/string-stream-test.c: actual_bytes_used = ksize(stream);
	lib/kunit/string-stream-test.c: actual_bytes_used += ksize(frag_container);
	lib/kunit/string-stream-test.c: actual_bytes_used += ksize(frag_container->fragment);
	mm/nommu.c: return ksize(objp);
	mm/util.c: memcpy(n, kasan_reset_tag(p), ksize(p));
	security/tomoyo/gc.c: tomoyo_memory_used[TOMOYO_MEMORY_POLICY] -= ksize(ptr);
	security/tomoyo/memory.c: const size_t s = ksize(ptr);
	drivers/md/dm-vdo/memory-alloc.c: add_kmalloc_block(ksize(p));
	drivers/md/dm-vdo/memory-alloc.c: add_kmalloc_block(ksize(p));
	drivers/md/dm-vdo/memory-alloc.c: remove_kmalloc_block(ksize(ptr));

* One usage may need to be handled:

	sound/soc/codecs/cs-amp-lib-test.c: KUNIT_ASSERT_GE_MSG(test, ksize(buf), priv->cal_blob->size, "Buffer to small");

* A bigger problem is kfree_sensitive(), which uses ksize() to get
  the total size and then zeroes all of it.

  One solution could be to get the kmem_cache first and do the
  skip_orig_size_check().

Thanks,
Feng
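[For context, here is kfree_sensitive(), paraphrased from mm/slab_common.c
and quoted from memory, so treat it as a sketch. It shows why this caller
genuinely needs the full usable size rather than orig_size:]

	/*
	 * The whole point is to scrub everything the object may ever
	 * have held, so it must zero the full usable size, i.e. it
	 * legitimately needs ksize() with the orig_size check out of
	 * the way.
	 */
	void kfree_sensitive(const void *p)
	{
		size_t ks;
		void *mem = (void *)p;

		ks = ksize(mem);
		if (ks) {
			kasan_unpoison_range(mem, ks);
			memzero_explicit(mem, ks);	/* cannot be optimized away */
		}
		kfree(mem);
	}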
On 11/4/24 12:28, Feng Tang wrote:
> On Mon, Oct 14, 2024 at 10:20:36PM +0800, Tang, Feng wrote:
>> On Mon, Oct 14, 2024 at 03:12:09PM +0200, Vlastimil Babka wrote:
>> > >
>> > >> So I think in __do_krealloc() we should do things manually to determine ks
>> > >> and not call ksize(). Just not break any of the cases ksize() handles
>> > >> (kfence, large kmalloc).
>> > >
>> > > OK, originally I tried not to expose internals of __ksize(). Let me
>> > > try this way.
>> >
>> > ksize() makes assumptions that a user outside of slab itself is calling it.
>> >
>> > But we (well mostly Kees) also introduced kmalloc_size_roundup() to avoid
>> > querying ksize() for the purposes of writing beyond the original
>> > kmalloc(size) up to the bucket size. So maybe we can also investigate if the
>> > skip_orig_size_check() mechanism can be removed now?
>>
>> I did a quick grep, and fortunately it seems that the ksize() users are
>> much fewer than before. We used to see some trouble in network code, which
>> is now very clean without the need to skip the orig_size check. Will check
>> other call sites later.
>
> I did some further checking of ksize() usage, and there are still some
> places to be handled. The thing that stands out is kfree_sensitive(), and
> another potential one is sound/soc/codecs/cs-amp-lib-test.c.
>
> Some details:
>
> * Thanks to Kees Cook, who has cured many cases of ksize() as below:
>
> 	drivers/base/devres.c: total_old_size = ksize(container_of(ptr, struct devres, data));
> 	drivers/net/ethernet/intel/igb/igb_main.c: } else if (size > ksize(q_vector)) {
> 	net/core/skbuff.c: *size = ksize(data);
> 	net/openvswitch/flow_netlink.c: new_acts_size = max(next_offset + req_size, ksize(*sfa) * 2);
> 	kernel/bpf/verifier.c: alloc_bytes = max(ksize(orig), kmalloc_size_roundup(bytes));
>
> * Some callers use ksize() mostly for calculation or sanity checks,
>   and not for accessing the extra space, which is fine:
>
> 	drivers/gpu/drm/drm_managed.c: WARN_ON(dev + 1 > (struct drm_device *) (container + ksize(container)));
> 	lib/kunit/string-stream-test.c: actual_bytes_used = ksize(stream);
> 	lib/kunit/string-stream-test.c: actual_bytes_used += ksize(frag_container);
> 	lib/kunit/string-stream-test.c: actual_bytes_used += ksize(frag_container->fragment);
> 	mm/nommu.c: return ksize(objp);
> 	mm/util.c: memcpy(n, kasan_reset_tag(p), ksize(p));
> 	security/tomoyo/gc.c: tomoyo_memory_used[TOMOYO_MEMORY_POLICY] -= ksize(ptr);
> 	security/tomoyo/memory.c: const size_t s = ksize(ptr);
> 	drivers/md/dm-vdo/memory-alloc.c: add_kmalloc_block(ksize(p));
> 	drivers/md/dm-vdo/memory-alloc.c: add_kmalloc_block(ksize(p));
> 	drivers/md/dm-vdo/memory-alloc.c: remove_kmalloc_block(ksize(ptr));
>
> * One usage may need to be handled:
>
> 	sound/soc/codecs/cs-amp-lib-test.c: KUNIT_ASSERT_GE_MSG(test, ksize(buf), priv->cal_blob->size, "Buffer to small");
>
> * A bigger problem is kfree_sensitive(), which uses ksize() to get
>   the total size and then zeroes all of it.
>
>   One solution could be to get the kmem_cache first and do the
>   skip_orig_size_check().

Maybe add a parameter for __ksize() that controls if we do
skip_orig_size_check(): current ksize() will pass "false" to it (once the
remaining wrong users are handled), then another ksize_internal() variant
will pass "true" and be used from kfree_sensitive()?

> Thanks,
> Feng
On Mon, Nov 04, 2024 at 12:45:51PM +0100, Vlastimil Babka wrote:
> On 11/4/24 12:28, Feng Tang wrote:
> > On Mon, Oct 14, 2024 at 10:20:36PM +0800, Tang, Feng wrote:
> >> On Mon, Oct 14, 2024 at 03:12:09PM +0200, Vlastimil Babka wrote:
> >> > >
> >> > >> So I think in __do_krealloc() we should do things manually to determine ks
> >> > >> and not call ksize(). Just not break any of the cases ksize() handles
> >> > >> (kfence, large kmalloc).
> >> > >
> >> > > OK, originally I tried not to expose internals of __ksize(). Let me
> >> > > try this way.
> >> >
> >> > ksize() makes assumptions that a user outside of slab itself is calling it.
> >> >
> >> > But we (well mostly Kees) also introduced kmalloc_size_roundup() to avoid
> >> > querying ksize() for the purposes of writing beyond the original
> >> > kmalloc(size) up to the bucket size. So maybe we can also investigate if the
> >> > skip_orig_size_check() mechanism can be removed now?
> >>
> >> I did a quick grep, and fortunately it seems that the ksize() users are
> >> much fewer than before. We used to see some trouble in network code, which
> >> is now very clean without the need to skip the orig_size check. Will check
> >> other call sites later.
> >
> > I did some further checking of ksize() usage, and there are still some
> > places to be handled. The thing that stands out is kfree_sensitive(), and
> > another potential one is sound/soc/codecs/cs-amp-lib-test.c.
> >
> > Some details:
> >
> > * Thanks to Kees Cook, who has cured many cases of ksize() as below:
> >
> > 	drivers/base/devres.c: total_old_size = ksize(container_of(ptr, struct devres, data));
> > 	drivers/net/ethernet/intel/igb/igb_main.c: } else if (size > ksize(q_vector)) {
> > 	net/core/skbuff.c: *size = ksize(data);
> > 	net/openvswitch/flow_netlink.c: new_acts_size = max(next_offset + req_size, ksize(*sfa) * 2);
> > 	kernel/bpf/verifier.c: alloc_bytes = max(ksize(orig), kmalloc_size_roundup(bytes));
> >
> > * Some callers use ksize() mostly for calculation or sanity checks,
> >   and not for accessing the extra space, which is fine:
> >
> > 	drivers/gpu/drm/drm_managed.c: WARN_ON(dev + 1 > (struct drm_device *) (container + ksize(container)));
> > 	lib/kunit/string-stream-test.c: actual_bytes_used = ksize(stream);
> > 	lib/kunit/string-stream-test.c: actual_bytes_used += ksize(frag_container);
> > 	lib/kunit/string-stream-test.c: actual_bytes_used += ksize(frag_container->fragment);
> > 	mm/nommu.c: return ksize(objp);
> > 	mm/util.c: memcpy(n, kasan_reset_tag(p), ksize(p));
> > 	security/tomoyo/gc.c: tomoyo_memory_used[TOMOYO_MEMORY_POLICY] -= ksize(ptr);
> > 	security/tomoyo/memory.c: const size_t s = ksize(ptr);
> > 	drivers/md/dm-vdo/memory-alloc.c: add_kmalloc_block(ksize(p));
> > 	drivers/md/dm-vdo/memory-alloc.c: add_kmalloc_block(ksize(p));
> > 	drivers/md/dm-vdo/memory-alloc.c: remove_kmalloc_block(ksize(ptr));
> >
> > * One usage may need to be handled:
> >
> > 	sound/soc/codecs/cs-amp-lib-test.c: KUNIT_ASSERT_GE_MSG(test, ksize(buf), priv->cal_blob->size, "Buffer to small");
> >
> > * A bigger problem is kfree_sensitive(), which uses ksize() to get
> >   the total size and then zeroes all of it.
> >
> >   One solution could be to get the kmem_cache first and do the
> >   skip_orig_size_check().
>
> Maybe add a parameter for __ksize() that controls if we do
> skip_orig_size_check(): current ksize() will pass "false" to it (once the
> remaining wrong users are handled), then another ksize_internal() variant
> will pass "true" and be used from kfree_sensitive()?

Sounds good to me!

And for future wrong usages of ksize(), we can fix them case by case when
they are detected.

Thanks,
Feng
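[A minimal sketch of what that split could look like. The bodies are
paraphrased and simplified from mm/slab_common.c; the skip_orig_check
parameter and the ksize_internal() name follow the suggestion above and are
hypothetical:]

	static size_t __ksize(const void *object, bool skip_orig_check)
	{
		struct folio *folio;

		if (unlikely(object == ZERO_SIZE_PTR))
			return 0;

		folio = virt_to_folio(object);
		if (unlikely(!folio_test_slab(folio)))
			return folio_size(folio);	/* big kmalloc */

		if (skip_orig_check)
			skip_orig_size_check(folio_slab(folio)->slab_cache, object);

		return slab_ksize(folio_slab(folio)->slab_cache);
	}

	/* External callers: leave the recorded orig_size alone. */
	size_t ksize(const void *object)
	{
		return __ksize(object, false);
	}

	/* Internal variant for kfree_sensitive(), which scrubs the whole bucket. */
	static size_t ksize_internal(const void *object)
	{
		return __ksize(object, true);
	}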