Message ID: 20240911064535.557650-1-feng.tang@intel.com (mailing list archive)
Series: mm/slub: Improve data handling of krealloc() when orig_size is enabled
On 9/11/24 08:45, Feng Tang wrote:
> Danilo Krummrich's patch [1] raised one problem about krealloc() that
> its caller doesn't pass the old request size, say the object is 64
> bytes kmalloc one, but caller originally only requested 48 bytes. Then
> when krealloc() shrinks or grows in the same object, or allocate a new
> bigger object, it lacks this 'original size' information to do accurate
> data preserving or zeroing (when __GFP_ZERO is set).
>
> Thus with slub debug redzone and object tracking enabled, parts of the
> object after krealloc() might contain redzone data instead of zeroes,
> which is violating the __GFP_ZERO guarantees. Good thing is in this
> case, kmalloc caches do have this 'orig_size' feature, which could be
> used to improve the situation here.
>
> To make the 'orig_size' accurate, we adjust some kasan/slub meta data
> handling. Also add a slub kunit test case for krealloc().
>
> This patchset has dependency over patches in both -mm tree and -slab
> trees, so it is written based on linux-next tree '20240910' version.
>
> [1]. https://lore.kernel.org/lkml/20240812223707.32049-1-dakr@kernel.org/

Thanks, added to slab/for-next

> Thanks,
> Feng
>
> Changelog:
>
> Since v1:
>   * Drop the patch changing generic kunit code from this patchset,
>     and will send it separately.
>   * Separate the krealloc moving from slab_common.c to slub.c to a
>     new patch for better review (Danilo/Vlastimil)
>   * Improve commit log and comments (Vlastimil/Danilo)
>   * Rework the kunit test case to remove its dependency over
>     slub_debug (which is incomplete in v1) (Vlastimil)
>   * Add ack and review tags from developers.
>
> Feng Tang (5):
>   mm/kasan: Don't store metadata inside kmalloc object when
>     slub_debug_orig_size is on
>   mm/slub: Consider kfence case for get_orig_size()
>   mm/slub: Move krealloc() and related code to slub.c
>   mm/slub: Improve redzone check and zeroing for krealloc()
>   mm/slub, kunit: Add testcase for krealloc redzone and zeroing
>
>  lib/slub_kunit.c   |  42 +++++++++++++++
>  mm/kasan/generic.c |   7 ++-
>  mm/slab.h          |   6 +++
>  mm/slab_common.c   |  84 ------------------------------
>  mm/slub.c          | 125 ++++++++++++++++++++++++++++++++++++++-------
>  5 files changed, 160 insertions(+), 104 deletions(-)
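[To make the failure mode described in the cover letter concrete, here is a
minimal, hypothetical caller; it assumes slub_debug redzoning is enabled and
the allocation is served from kmalloc-64:]

	/*
	 * Illustration only (not from the series): a 48-byte request is
	 * served from a 64-byte kmalloc object, so orig_size = 48.
	 */
	void *p = kmalloc(48, GFP_KERNEL);

	/*
	 * Grows within the same 64-byte object. Without accurate
	 * orig_size information the allocator cannot tell that only
	 * bytes 0..47 hold caller data, so bytes 48..55 may still
	 * contain redzone bytes rather than the zeroes that __GFP_ZERO
	 * guarantees.
	 */
	p = krealloc(p, 56, GFP_KERNEL | __GFP_ZERO);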
On Wed, 2 Oct 2024 at 12:42, Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 9/11/24 08:45, Feng Tang wrote:
> > Danilo Krummrich's patch [1] raised one problem about krealloc() that
> > its caller doesn't pass the old request size, say the object is 64
> > bytes kmalloc one, but caller originally only requested 48 bytes. Then
> > when krealloc() shrinks or grows in the same object, or allocate a new
> > bigger object, it lacks this 'original size' information to do accurate
> > data preserving or zeroing (when __GFP_ZERO is set).
> >
> > Thus with slub debug redzone and object tracking enabled, parts of the
> > object after krealloc() might contain redzone data instead of zeroes,
> > which is violating the __GFP_ZERO guarantees. Good thing is in this
> > case, kmalloc caches do have this 'orig_size' feature, which could be
> > used to improve the situation here.
> >
> > To make the 'orig_size' accurate, we adjust some kasan/slub meta data
> > handling. Also add a slub kunit test case for krealloc().
> >
> > This patchset has dependency over patches in both -mm tree and -slab
> > trees, so it is written based on linux-next tree '20240910' version.
> >
> > [1]. https://lore.kernel.org/lkml/20240812223707.32049-1-dakr@kernel.org/
>
> Thanks, added to slab/for-next

This series just hit -next, and we're seeing several "KFENCE: memory
corruption ..." reports. Here's one:
https://lore.kernel.org/all/66ff8bf6.050a0220.49194.0453.GAE@google.com/

One more (no link):

> ==================================================================
> BUG: KFENCE: memory corruption in xfs_iext_destroy_node+0xab/0x670 fs/xfs/libxfs/xfs_iext_tree.c:1051
>
> Corrupted memory at 0xffff88823bf5a0d0 [ 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 ] (in kfence-#172):
>  xfs_iext_destroy_node+0xab/0x670 fs/xfs/libxfs/xfs_iext_tree.c:1051
>  xfs_iext_destroy+0x66/0x100 fs/xfs/libxfs/xfs_iext_tree.c:1062
>  xfs_inode_free_callback+0x91/0x1d0 fs/xfs/xfs_icache.c:145
>  rcu_do_batch kernel/rcu/tree.c:2567 [inline]
[...]
>
> kfence-#172: 0xffff88823bf5a000-0xffff88823bf5a0cf, size=208, cache=kmalloc-256
>
> allocated by task 5494 on cpu 0 at 101.266046s (0.409225s ago):
>  __do_krealloc mm/slub.c:4784 [inline]
>  krealloc_noprof+0xd6/0x2e0 mm/slub.c:4838
>  xfs_iext_realloc_root fs/xfs/libxfs/xfs_iext_tree.c:613 [inline]
[...]
>
> freed by task 16 on cpu 0 at 101.573936s (0.186416s ago):
>  xfs_iext_destroy_node+0xab/0x670 fs/xfs/libxfs/xfs_iext_tree.c:1051
>  xfs_iext_destroy+0x66/0x100 fs/xfs/libxfs/xfs_iext_tree.c:1062
>  xfs_inode_free_callback+0x91/0x1d0 fs/xfs/xfs_icache.c:145
[...]
>
> CPU: 0 UID: 0 PID: 16 Comm: ksoftirqd/0 Not tainted 6.12.0-rc1-next-20241003-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
> ==================================================================

Unfortunately there's no reproducer yet, it seems. Unless it's
immediately obvious what's wrong, is it possible to take this series
out of -next to confirm this series is causing the memory corruptions?
Syzbot should then stop finding these crashes.

Thanks,
-- Marco
On 10/4/24 08:44, Marco Elver wrote:
> On Wed, 2 Oct 2024 at 12:42, Vlastimil Babka <vbabka@suse.cz> wrote:
>>
>> On 9/11/24 08:45, Feng Tang wrote:
>> > Danilo Krummrich's patch [1] raised one problem about krealloc() that
>> > its caller doesn't pass the old request size, say the object is 64
>> > bytes kmalloc one, but caller originally only requested 48 bytes. Then
>> > when krealloc() shrinks or grows in the same object, or allocate a new
>> > bigger object, it lacks this 'original size' information to do accurate
>> > data preserving or zeroing (when __GFP_ZERO is set).
>> >
>> > Thus with slub debug redzone and object tracking enabled, parts of the
>> > object after krealloc() might contain redzone data instead of zeroes,
>> > which is violating the __GFP_ZERO guarantees. Good thing is in this
>> > case, kmalloc caches do have this 'orig_size' feature, which could be
>> > used to improve the situation here.
>> >
>> > To make the 'orig_size' accurate, we adjust some kasan/slub meta data
>> > handling. Also add a slub kunit test case for krealloc().
>> >
>> > This patchset has dependency over patches in both -mm tree and -slab
>> > trees, so it is written based on linux-next tree '20240910' version.
>> >
>> > [1]. https://lore.kernel.org/lkml/20240812223707.32049-1-dakr@kernel.org/
>>
>> Thanks, added to slab/for-next
>
> This series just hit -next, and we're seeing several "KFENCE: memory
> corruption ..." reports. Here's one:
> https://lore.kernel.org/all/66ff8bf6.050a0220.49194.0453.GAE@google.com/
>
> One more (no link):
>
>> ==================================================================
>> BUG: KFENCE: memory corruption in xfs_iext_destroy_node+0xab/0x670 fs/xfs/libxfs/xfs_iext_tree.c:1051
>>
>> Corrupted memory at 0xffff88823bf5a0d0 [ 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 ] (in kfence-#172):
>>  xfs_iext_destroy_node+0xab/0x670 fs/xfs/libxfs/xfs_iext_tree.c:1051
>>  xfs_iext_destroy+0x66/0x100 fs/xfs/libxfs/xfs_iext_tree.c:1062
>>  xfs_inode_free_callback+0x91/0x1d0 fs/xfs/xfs_icache.c:145
>>  rcu_do_batch kernel/rcu/tree.c:2567 [inline]
> [...]
>>
>> kfence-#172: 0xffff88823bf5a000-0xffff88823bf5a0cf, size=208, cache=kmalloc-256
>>
>> allocated by task 5494 on cpu 0 at 101.266046s (0.409225s ago):
>>  __do_krealloc mm/slub.c:4784 [inline]
>>  krealloc_noprof+0xd6/0x2e0 mm/slub.c:4838
>>  xfs_iext_realloc_root fs/xfs/libxfs/xfs_iext_tree.c:613 [inline]
> [...]
>>
>> freed by task 16 on cpu 0 at 101.573936s (0.186416s ago):
>>  xfs_iext_destroy_node+0xab/0x670 fs/xfs/libxfs/xfs_iext_tree.c:1051
>>  xfs_iext_destroy+0x66/0x100 fs/xfs/libxfs/xfs_iext_tree.c:1062
>>  xfs_inode_free_callback+0x91/0x1d0 fs/xfs/xfs_icache.c:145
> [...]
>>
>> CPU: 0 UID: 0 PID: 16 Comm: ksoftirqd/0 Not tainted 6.12.0-rc1-next-20241003-syzkaller #0
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
>> ==================================================================
>
> Unfortunately there's no reproducer yet, it seems. Unless it's
> immediately obvious what's wrong, is it possible to take this series
> out of -next to confirm this series is causing the memory corruptions?
> Syzbot should then stop finding these crashes.

I think it's commit d0a38fad51cc7, which does this in __do_krealloc():

-	ks = ksize(p);
+
+	s = virt_to_cache(p);
+	orig_size = get_orig_size(s, (void *)p);
+	ks = s->object_size;

so for kfence objects we don't get their actual allocation size but the
potentially larger bucket size?

I guess we could do:

	ks = kfence_ksize(p) ?: s->object_size;

?

> Thanks,
> -- Marco
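[In context, the one-liner above relies on kfence_ksize() returning 0 for
non-KFENCE addresses. A minimal sketch of the size lookup it implies; the
helper name is hypothetical:]

	/*
	 * Sketch only: kfence_ksize() returns the object's real
	 * allocation size for KFENCE addresses and 0 otherwise, so the
	 * ?: falls back to the slab bucket size only for regular slab
	 * objects.
	 */
	static size_t krealloc_avail_size(struct kmem_cache *s, const void *p)
	{
		return kfence_ksize(p) ?: s->object_size;
	}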
On 10/4/24 11:18, Vlastimil Babka wrote:
> On 10/4/24 08:44, Marco Elver wrote:
>
> I think it's commit d0a38fad51cc7, which does this in __do_krealloc():
>
> -	ks = ksize(p);
> +
> +	s = virt_to_cache(p);
> +	orig_size = get_orig_size(s, (void *)p);
> +	ks = s->object_size;
>
> so for kfence objects we don't get their actual allocation size but the
> potentially larger bucket size?
>
> I guess we could do:
>
> 	ks = kfence_ksize(p) ?: s->object_size;
>
> ?

Hmm, this probably is not the whole story, we also have:

-	memcpy(ret, kasan_reset_tag(p), ks);
+	if (orig_size)
+		memcpy(ret, kasan_reset_tag(p), orig_size);

orig_size for kfence will again be s->object_size, so the memcpy might be a
(read) buffer overflow from a kfence allocation.

I think get_orig_size() should perhaps return kfence_ksize(p) for kfence
allocations, in addition to the change above.

Or alternatively we don't change get_orig_size() (in a different commit) at
all, but __do_krealloc() will have an "if is_kfence_address()" that sets
both orig_size and ks to kfence_ksize(p) appropriately. That might be easier
to follow.

But either way means rewriting 2 commits. I think it's indeed better to drop
the series now from -next and submit a v3.

Vlastimil

>> Thanks,
>> -- Marco
On Fri, Oct 04, 2024 at 05:52:10PM +0800, Vlastimil Babka wrote:
> On 10/4/24 11:18, Vlastimil Babka wrote:
> > On 10/4/24 08:44, Marco Elver wrote:
> >
> > I think it's commit d0a38fad51cc7, which does this in __do_krealloc():
> >
> > -	ks = ksize(p);
> > +
> > +	s = virt_to_cache(p);
> > +	orig_size = get_orig_size(s, (void *)p);
> > +	ks = s->object_size;
> >
> > so for kfence objects we don't get their actual allocation size but the
> > potentially larger bucket size?
> >
> > I guess we could do:
> >
> > 	ks = kfence_ksize(p) ?: s->object_size;
> >
> > ?
>
> Hmm, this probably is not the whole story, we also have:
>
> -	memcpy(ret, kasan_reset_tag(p), ks);
> +	if (orig_size)
> +		memcpy(ret, kasan_reset_tag(p), orig_size);
>
> orig_size for kfence will again be s->object_size, so the memcpy might be a
> (read) buffer overflow from a kfence allocation.
>
> I think get_orig_size() should perhaps return kfence_ksize(p) for kfence
> allocations, in addition to the change above.
>
> Or alternatively we don't change get_orig_size() (in a different commit) at
> all, but __do_krealloc() will have an "if is_kfence_address()" that sets
> both orig_size and ks to kfence_ksize(p) appropriately. That might be easier
> to follow.
>
> But either way means rewriting 2 commits. I think it's indeed better to drop
> the series now from -next and submit a v3.

Yes, we can revert now. Sorry for the inconvenience.

Thanks,
Feng
On Fri, Oct 04, 2024 at 05:52:10PM +0800, Vlastimil Babka wrote:
> On 10/4/24 11:18, Vlastimil Babka wrote:
> > On 10/4/24 08:44, Marco Elver wrote:
> >
> > I think it's commit d0a38fad51cc7, which does this in __do_krealloc():
> >
> > -	ks = ksize(p);
> > +
> > +	s = virt_to_cache(p);
> > +	orig_size = get_orig_size(s, (void *)p);
> > +	ks = s->object_size;
> >
> > so for kfence objects we don't get their actual allocation size but the
> > potentially larger bucket size?
> >
> > I guess we could do:
> >
> > 	ks = kfence_ksize(p) ?: s->object_size;
> >
> > ?
>
> Hmm, this probably is not the whole story, we also have:
>
> -	memcpy(ret, kasan_reset_tag(p), ks);
> +	if (orig_size)
> +		memcpy(ret, kasan_reset_tag(p), orig_size);
>
> orig_size for kfence will again be s->object_size, so the memcpy might be a
> (read) buffer overflow from a kfence allocation.
>
> I think get_orig_size() should perhaps return kfence_ksize(p) for kfence
> allocations, in addition to the change above.
>
> Or alternatively we don't change get_orig_size() (in a different commit) at
> all, but __do_krealloc() will have an "if is_kfence_address()" that sets
> both orig_size and ks to kfence_ksize(p) appropriately. That might be easier
> to follow.

Thanks for the suggestion!

As there were error report about the NULL slab for big kmalloc object, how
about the following code for

__do_krealloc(const void *p, size_t new_size, gfp_t flags)
{
	void *ret;
	size_t ks = 0;
	int orig_size = 0;
	struct kmem_cache *s = NULL;

	/* Check for double-free. */
	if (likely(!ZERO_OR_NULL_PTR(p))) {
		if (!kasan_check_byte(p))
			return NULL;

		ks = ksize(p);

		/* Some objects have no orig_size, like big kmalloc case */
		if (is_kfence_address(p)) {
			orig_size = kfence_ksize(p);
		} else if (virt_to_slab(p)) {
			s = virt_to_cache(p);
			orig_size = get_orig_size(s, (void *)p);
		}
	} else {
		goto alloc_new;
	}

	/* If the object doesn't fit, allocate a bigger one */
	if (new_size > ks)
		goto alloc_new;

	/* Zero out spare memory. */
	if (want_init_on_alloc(flags)) {
		kasan_disable_current();
		if (orig_size && orig_size < new_size)
			memset((void *)p + orig_size, 0, new_size - orig_size);
		else
			memset((void *)p + new_size, 0, ks - new_size);
		kasan_enable_current();
	}

	/* Setup kmalloc redzone when needed */
	if (s && slub_debug_orig_size(s) && !is_kfence_address(p)) {
		set_orig_size(s, (void *)p, new_size);
		if (s->flags & SLAB_RED_ZONE && new_size < ks)
			memset_no_sanitize_memory((void *)p + new_size,
					SLUB_RED_ACTIVE, ks - new_size);
	}

	p = kasan_krealloc((void *)p, new_size, flags);
	return (void *)p;

alloc_new:
	ret = kmalloc_node_track_caller_noprof(new_size, flags, NUMA_NO_NODE, _RET_IP_);
	if (ret && p) {
		/* Disable KASAN checks as the object's redzone is accessed. */
		kasan_disable_current();
		memcpy(ret, kasan_reset_tag(p), orig_size ?: ks);
		kasan_enable_current();
	}

	return ret;
}

I've run it with the reproducer of syzbot, so far the issue hasn't been
reproduced on my local machine.

Thanks,
Feng

>
> But either way means rewriting 2 commits. I think it's indeed better to drop
> the series now from -next and submit a v3.
>
> Vlastimil
>
> >> Thanks,
> >> -- Marco
On 10/14/24 09:52, Feng Tang wrote:
> On Fri, Oct 04, 2024 at 05:52:10PM +0800, Vlastimil Babka wrote:
> Thanks for the suggestion!
>
> As there were error report about the NULL slab for big kmalloc object, how
> about the following code for
>
> __do_krealloc(const void *p, size_t new_size, gfp_t flags)
> {
> 	void *ret;
> 	size_t ks = 0;
> 	int orig_size = 0;
> 	struct kmem_cache *s = NULL;
>
> 	/* Check for double-free. */
> 	if (likely(!ZERO_OR_NULL_PTR(p))) {
> 		if (!kasan_check_byte(p))
> 			return NULL;
>
> 		ks = ksize(p);

I think this will result in __ksize() doing
skip_orig_size_check(folio_slab(folio)->slab_cache, object);
and we don't want that?

Also the checks below repeat some of the checks of ksize().

So I think in __do_krealloc() we should do things manually to determine ks
and not call ksize(). Just not break any of the cases ksize() handles
(kfence, large kmalloc).

>
> 		/* Some objects have no orig_size, like big kmalloc case */
> 		if (is_kfence_address(p)) {
> 			orig_size = kfence_ksize(p);
> 		} else if (virt_to_slab(p)) {
> 			s = virt_to_cache(p);
> 			orig_size = get_orig_size(s, (void *)p);
> 		}
> 	} else {
> 		goto alloc_new;
> 	}
>
> 	/* If the object doesn't fit, allocate a bigger one */
> 	if (new_size > ks)
> 		goto alloc_new;
>
> 	/* Zero out spare memory. */
> 	if (want_init_on_alloc(flags)) {
> 		kasan_disable_current();
> 		if (orig_size && orig_size < new_size)
> 			memset((void *)p + orig_size, 0, new_size - orig_size);
> 		else
> 			memset((void *)p + new_size, 0, ks - new_size);
> 		kasan_enable_current();
> 	}
>
> 	/* Setup kmalloc redzone when needed */
> 	if (s && slub_debug_orig_size(s) && !is_kfence_address(p)) {
> 		set_orig_size(s, (void *)p, new_size);
> 		if (s->flags & SLAB_RED_ZONE && new_size < ks)
> 			memset_no_sanitize_memory((void *)p + new_size,
> 					SLUB_RED_ACTIVE, ks - new_size);
> 	}
>
> 	p = kasan_krealloc((void *)p, new_size, flags);
> 	return (void *)p;
>
> alloc_new:
> 	ret = kmalloc_node_track_caller_noprof(new_size, flags, NUMA_NO_NODE, _RET_IP_);
> 	if (ret && p) {
> 		/* Disable KASAN checks as the object's redzone is accessed. */
> 		kasan_disable_current();
> 		memcpy(ret, kasan_reset_tag(p), orig_size ?: ks);
> 		kasan_enable_current();
> 	}
>
> 	return ret;
> }
>
> I've run it with the reproducer of syzbot, so far the issue hasn't been
> reproduced on my local machine.
>
> Thanks,
> Feng
>
>>
>> But either way means rewriting 2 commits. I think it's indeed better to drop
>> the series now from -next and submit a v3.
>>
>> Vlastimil
>>
>> >> Thanks,
>> >> -- Marco
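[For reference, here is skip_orig_size_check(), paraphrased from mm/slub.c
and quoted from memory, so treat it as a sketch. It shows why calling
ksize() before get_orig_size() is harmful in this path:]

	/*
	 * ksize() calls this so that callers who legitimately write up
	 * to the bucket size don't trip the orig_size redzone check. It
	 * does so by rewriting the recorded orig_size to the full
	 * object size, destroying the very information __do_krealloc()
	 * wants to read afterwards.
	 */
	void skip_orig_size_check(struct kmem_cache *s, const void *object)
	{
		set_orig_size(s, (void *)object, s->object_size);
	}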
On Mon, Oct 14, 2024 at 10:53:32AM +0200, Vlastimil Babka wrote:
> On 10/14/24 09:52, Feng Tang wrote:
> > On Fri, Oct 04, 2024 at 05:52:10PM +0800, Vlastimil Babka wrote:
> > Thanks for the suggestion!
> >
> > As there were error report about the NULL slab for big kmalloc object, how
> > about the following code for
> >
> > __do_krealloc(const void *p, size_t new_size, gfp_t flags)
> > {
> > 	void *ret;
> > 	size_t ks = 0;
> > 	int orig_size = 0;
> > 	struct kmem_cache *s = NULL;
> >
> > 	/* Check for double-free. */
> > 	if (likely(!ZERO_OR_NULL_PTR(p))) {
> > 		if (!kasan_check_byte(p))
> > 			return NULL;
> >
> > 		ks = ksize(p);
>
> I think this will result in __ksize() doing
> skip_orig_size_check(folio_slab(folio)->slab_cache, object);
> and we don't want that?

I think that's fine. As later code will re-set the orig_size anyway.

> Also the checks below repeat some of the checks of ksize().

Yes, there is some redundancy, mostly the virt_to_slab()

> So I think in __do_krealloc() we should do things manually to determine ks
> and not call ksize(). Just not break any of the cases ksize() handles
> (kfence, large kmalloc).

OK, originally I tried not to expose internals of __ksize(). Let me
try this way.

Thanks,
Feng

>
> > 		/* Some objects have no orig_size, like big kmalloc case */
> > 		if (is_kfence_address(p)) {
> > 			orig_size = kfence_ksize(p);
> > 		} else if (virt_to_slab(p)) {
> > 			s = virt_to_cache(p);
> > 			orig_size = get_orig_size(s, (void *)p);
> > 		}
> > 	} else {
> > 		goto alloc_new;
> > 	}
> >
> > 	/* If the object doesn't fit, allocate a bigger one */
> > 	if (new_size > ks)
> > 		goto alloc_new;
> >
> > 	/* Zero out spare memory. */
> > 	if (want_init_on_alloc(flags)) {
> > 		kasan_disable_current();
> > 		if (orig_size && orig_size < new_size)
> > 			memset((void *)p + orig_size, 0, new_size - orig_size);
> > 		else
> > 			memset((void *)p + new_size, 0, ks - new_size);
> > 		kasan_enable_current();
> > 	}
> >
> > 	/* Setup kmalloc redzone when needed */
> > 	if (s && slub_debug_orig_size(s) && !is_kfence_address(p)) {
> > 		set_orig_size(s, (void *)p, new_size);
> > 		if (s->flags & SLAB_RED_ZONE && new_size < ks)
> > 			memset_no_sanitize_memory((void *)p + new_size,
> > 					SLUB_RED_ACTIVE, ks - new_size);
> > 	}
> >
> > 	p = kasan_krealloc((void *)p, new_size, flags);
> > 	return (void *)p;
> >
> > alloc_new:
> > 	ret = kmalloc_node_track_caller_noprof(new_size, flags, NUMA_NO_NODE, _RET_IP_);
> > 	if (ret && p) {
> > 		/* Disable KASAN checks as the object's redzone is accessed. */
> > 		kasan_disable_current();
> > 		memcpy(ret, kasan_reset_tag(p), orig_size ?: ks);
> > 		kasan_enable_current();
> > 	}
> >
> > 	return ret;
> > }
On 10/14/24 14:52, Feng Tang wrote:
> On Mon, Oct 14, 2024 at 10:53:32AM +0200, Vlastimil Babka wrote:
>> On 10/14/24 09:52, Feng Tang wrote:
>> > On Fri, Oct 04, 2024 at 05:52:10PM +0800, Vlastimil Babka wrote:
>> > Thanks for the suggestion!
>> >
>> > As there were error report about the NULL slab for big kmalloc object, how
>> > about the following code for
>> >
>> > __do_krealloc(const void *p, size_t new_size, gfp_t flags)
>> > {
>> > 	void *ret;
>> > 	size_t ks = 0;
>> > 	int orig_size = 0;
>> > 	struct kmem_cache *s = NULL;
>> >
>> > 	/* Check for double-free. */
>> > 	if (likely(!ZERO_OR_NULL_PTR(p))) {
>> > 		if (!kasan_check_byte(p))
>> > 			return NULL;
>> >
>> > 		ks = ksize(p);
>>
>> I think this will result in __ksize() doing
>> skip_orig_size_check(folio_slab(folio)->slab_cache, object);
>> and we don't want that?
>
> I think that's fine. As later code will re-set the orig_size anyway.

But you also read it first.

>> > 		/* Some objects have no orig_size, like big kmalloc case */
>> > 		if (is_kfence_address(p)) {
>> > 			orig_size = kfence_ksize(p);
>> > 		} else if (virt_to_slab(p)) {
>> > 			s = virt_to_cache(p);
>> > 			orig_size = get_orig_size(s, (void *)p);

here.

>> > 		}

>> Also the checks below repeat some of the checks of ksize().
>
> Yes, there is some redundancy, mostly the virt_to_slab()
>
>> So I think in __do_krealloc() we should do things manually to determine ks
>> and not call ksize(). Just not break any of the cases ksize() handles
>> (kfence, large kmalloc).
>
> OK, originally I tried not to expose internals of __ksize(). Let me
> try this way.

ksize() makes assumptions that a user outside of slab itself is calling it.

But we (well mostly Kees) also introduced kmalloc_size_roundup() to avoid
querying ksize() for the purposes of writing beyond the original
kmalloc(size) up to the bucket size. So maybe we can also investigate if the
skip_orig_size_check() mechanism can be removed now?

Still I think __do_krealloc() should rather do its own thing and not call
ksize().

> Thanks,
> Feng
>
>>
>> > 	} else {
>> > 		goto alloc_new;
>> > 	}
>> >
>> > 	/* If the object doesn't fit, allocate a bigger one */
>> > 	if (new_size > ks)
>> > 		goto alloc_new;
>> >
>> > 	/* Zero out spare memory. */
>> > 	if (want_init_on_alloc(flags)) {
>> > 		kasan_disable_current();
>> > 		if (orig_size && orig_size < new_size)
>> > 			memset((void *)p + orig_size, 0, new_size - orig_size);
>> > 		else
>> > 			memset((void *)p + new_size, 0, ks - new_size);
>> > 		kasan_enable_current();
>> > 	}
>> >
>> > 	/* Setup kmalloc redzone when needed */
>> > 	if (s && slub_debug_orig_size(s) && !is_kfence_address(p)) {
>> > 		set_orig_size(s, (void *)p, new_size);
>> > 		if (s->flags & SLAB_RED_ZONE && new_size < ks)
>> > 			memset_no_sanitize_memory((void *)p + new_size,
>> > 					SLUB_RED_ACTIVE, ks - new_size);
>> > 	}
>> >
>> > 	p = kasan_krealloc((void *)p, new_size, flags);
>> > 	return (void *)p;
>> >
>> > alloc_new:
>> > 	ret = kmalloc_node_track_caller_noprof(new_size, flags, NUMA_NO_NODE, _RET_IP_);
>> > 	if (ret && p) {
>> > 		/* Disable KASAN checks as the object's redzone is accessed. */
>> > 		kasan_disable_current();
>> > 		memcpy(ret, kasan_reset_tag(p), orig_size ?: ks);
>> > 		kasan_enable_current();
>> > 	}
>> >
>> > 	return ret;
>> > }
On Mon, Oct 14, 2024 at 03:12:09PM +0200, Vlastimil Babka wrote:
> On 10/14/24 14:52, Feng Tang wrote:
> > On Mon, Oct 14, 2024 at 10:53:32AM +0200, Vlastimil Babka wrote:
> >> On 10/14/24 09:52, Feng Tang wrote:
> >> > On Fri, Oct 04, 2024 at 05:52:10PM +0800, Vlastimil Babka wrote:
> >> > Thanks for the suggestion!
> >> >
> >> > As there were error report about the NULL slab for big kmalloc object, how
> >> > about the following code for
> >> >
> >> > __do_krealloc(const void *p, size_t new_size, gfp_t flags)
> >> > {
> >> > 	void *ret;
> >> > 	size_t ks = 0;
> >> > 	int orig_size = 0;
> >> > 	struct kmem_cache *s = NULL;
> >> >
> >> > 	/* Check for double-free. */
> >> > 	if (likely(!ZERO_OR_NULL_PTR(p))) {
> >> > 		if (!kasan_check_byte(p))
> >> > 			return NULL;
> >> >
> >> > 		ks = ksize(p);
> >>
> >> I think this will result in __ksize() doing
> >> skip_orig_size_check(folio_slab(folio)->slab_cache, object);
> >> and we don't want that?
> >
> > I think that's fine. As later code will re-set the orig_size anyway.
>
> But you also read it first.
>
> >> > 		/* Some objects have no orig_size, like big kmalloc case */
> >> > 		if (is_kfence_address(p)) {
> >> > 			orig_size = kfence_ksize(p);
> >> > 		} else if (virt_to_slab(p)) {
> >> > 			s = virt_to_cache(p);
> >> > 			orig_size = get_orig_size(s, (void *)p);
>
> here.

Aha, you are right!

> >> > 		}
>
> >> Also the checks below repeat some of the checks of ksize().
> >
> > Yes, there is some redundancy, mostly the virt_to_slab()
> >
> >> So I think in __do_krealloc() we should do things manually to determine ks
> >> and not call ksize(). Just not break any of the cases ksize() handles
> >> (kfence, large kmalloc).
> >
> > OK, originally I tried not to expose internals of __ksize(). Let me
> > try this way.
>
> ksize() makes assumptions that a user outside of slab itself is calling it.
>
> But we (well mostly Kees) also introduced kmalloc_size_roundup() to avoid
> querying ksize() for the purposes of writing beyond the original
> kmalloc(size) up to the bucket size. So maybe we can also investigate if the
> skip_orig_size_check() mechanism can be removed now?

I did a quick grep, and fortunately it seems that the ksize() users are
much fewer than before. We used to see some trouble in network code, which
is now very clean without the need to skip the orig_size check. Will check
other call sites later.

> Still I think __do_krealloc() should rather do its own thing and not call
> ksize().

Yes. I made some changes:

static __always_inline __realloc_size(2) void *
__do_krealloc(const void *p, size_t new_size, gfp_t flags)
{
	void *ret;
	size_t ks = 0;
	int orig_size = 0;
	struct kmem_cache *s = NULL;

	/* Check for double-free. */
	if (unlikely(ZERO_OR_NULL_PTR(p)))
		goto alloc_new;

	if (!kasan_check_byte(p))
		return NULL;

	if (is_kfence_address(p)) {
		ks = orig_size = kfence_ksize(p);
	} else {
		struct folio *folio;

		folio = virt_to_folio(p);
		if (unlikely(!folio_test_slab(folio))) {
			/* Big kmalloc object */
			WARN_ON(folio_size(folio) <= KMALLOC_MAX_CACHE_SIZE);
			WARN_ON(p != folio_address(folio));
			ks = folio_size(folio);
		} else {
			s = folio_slab(folio)->slab_cache;
			orig_size = get_orig_size(s, (void *)p);
			ks = s->object_size;
		}
	}

	/* If the old object doesn't fit, allocate a bigger one */
	if (new_size > ks)
		goto alloc_new;

	/* Zero out spare memory. */
	if (want_init_on_alloc(flags)) {
		kasan_disable_current();
		if (orig_size && orig_size < new_size)
			memset((void *)p + orig_size, 0, new_size - orig_size);
		else
			memset((void *)p + new_size, 0, ks - new_size);
		kasan_enable_current();
	}

	/* Setup kmalloc redzone when needed */
	if (s && slub_debug_orig_size(s)) {
		set_orig_size(s, (void *)p, new_size);
		if (s->flags & SLAB_RED_ZONE && new_size < ks)
			memset_no_sanitize_memory((void *)p + new_size,
						  SLUB_RED_ACTIVE, ks - new_size);
	}

	p = kasan_krealloc((void *)p, new_size, flags);
	return (void *)p;

alloc_new:
	ret = kmalloc_node_track_caller_noprof(new_size, flags, NUMA_NO_NODE, _RET_IP_);
	if (ret && p) {
		/* Disable KASAN checks as the object's redzone is accessed. */
		kasan_disable_current();
		memcpy(ret, kasan_reset_tag(p), orig_size ?: ks);
		kasan_enable_current();
	}

	return ret;
}

Thanks,
Feng
On Mon, Oct 14, 2024 at 03:12:09PM +0200, Vlastimil Babka wrote:
> On 10/14/24 14:52, Feng Tang wrote:
> > On Mon, Oct 14, 2024 at 10:53:32AM +0200, Vlastimil Babka wrote:
> >> On 10/14/24 09:52, Feng Tang wrote:
> >> > On Fri, Oct 04, 2024 at 05:52:10PM +0800, Vlastimil Babka wrote:
> >> > Thanks for the suggestion!
> >> >
> >> > As there were error report about the NULL slab for big kmalloc object, how
> >> > about the following code for
> >> >
> >> > __do_krealloc(const void *p, size_t new_size, gfp_t flags)
> >> > {
> >> > 	void *ret;
> >> > 	size_t ks = 0;
> >> > 	int orig_size = 0;
> >> > 	struct kmem_cache *s = NULL;
> >> >
> >> > 	/* Check for double-free. */
> >> > 	if (likely(!ZERO_OR_NULL_PTR(p))) {
> >> > 		if (!kasan_check_byte(p))
> >> > 			return NULL;
> >> >
> >> > 		ks = ksize(p);
> >>
> >> I think this will result in __ksize() doing
> >> skip_orig_size_check(folio_slab(folio)->slab_cache, object);
> >> and we don't want that?
> >
> > I think that's fine. As later code will re-set the orig_size anyway.
>
> But you also read it first.
>
> >> > 		/* Some objects have no orig_size, like big kmalloc case */
> >> > 		if (is_kfence_address(p)) {
> >> > 			orig_size = kfence_ksize(p);
> >> > 		} else if (virt_to_slab(p)) {
> >> > 			s = virt_to_cache(p);
> >> > 			orig_size = get_orig_size(s, (void *)p);
>
> here.
>
> >> > 		}
>
> >> Also the checks below repeat some of the checks of ksize().
> >
> > Yes, there is some redundancy, mostly the virt_to_slab()
> >
> >> So I think in __do_krealloc() we should do things manually to determine ks
> >> and not call ksize(). Just not break any of the cases ksize() handles
> >> (kfence, large kmalloc).
> >
> > OK, originally I tried not to expose internals of __ksize(). Let me
> > try this way.
>
> ksize() makes assumptions that a user outside of slab itself is calling it.
>
> But we (well mostly Kees) also introduced kmalloc_size_roundup() to avoid
> querying ksize() for the purposes of writing beyond the original
> kmalloc(size) up to the bucket size. So maybe we can also investigate if the
> skip_orig_size_check() mechanism can be removed now?
>
> Still I think __do_krealloc() should rather do its own thing and not call
> ksize().

The goal was to avoid having users of the allocation APIs change the sizes
of allocations without calling into realloc. This is because otherwise the
"alloc_size" attribute used by compilers to inform
__builtin_dynamic_object_size() can get confused:

	ptr = alloc(less_than_bucket_size);
	...
	size = ksize(ptr); /* larger size! */
	memcpy(ptr, src, size); /* compiler instrumentation doesn't see that ptr "grows" */

So the callers use kmalloc_size_roundup() to just allocate the rounded-up
size immediately. Internally, the allocator can do what it wants.
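[A minimal sketch of the pattern Kees describes; kmalloc_size_roundup() is
the real kernel API, while the variable names here are illustrative:]

	/*
	 * Round the request up front so the compiler's view of the
	 * object size matches what the code may later write.
	 */
	size_t want = 48;
	size_t alloc = kmalloc_size_roundup(want);	/* e.g. 64 for kmalloc-64 */
	u8 *buf = kmalloc(alloc, GFP_KERNEL);

	/*
	 * __builtin_dynamic_object_size(buf, 0) now tracks 'alloc', so
	 * writes of up to 'alloc' bytes stay visible to FORTIFY-style
	 * instrumentation, with no later ksize() "growth" needed.
	 */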
On Mon, Oct 14, 2024 at 10:20:36PM +0800, Feng Tang wrote:
> On Mon, Oct 14, 2024 at 03:12:09PM +0200, Vlastimil Babka wrote:
> > On 10/14/24 14:52, Feng Tang wrote:
> > > On Mon, Oct 14, 2024 at 10:53:32AM +0200, Vlastimil Babka wrote:
> > >> On 10/14/24 09:52, Feng Tang wrote:
> > > OK, originally I tried not to expose internals of __ksize(). Let me
> > > try this way.
> >
> > ksize() makes assumptions that a user outside of slab itself is calling it.
> >
> > But we (well mostly Kees) also introduced kmalloc_size_roundup() to avoid
> > querying ksize() for the purposes of writing beyond the original
> > kmalloc(size) up to the bucket size. So maybe we can also investigate if the
> > skip_orig_size_check() mechanism can be removed now?
>
> I did a quick grep, and fortunately it seems that the ksize() users are
> much fewer than before. We used to see some trouble in network code, which
> is now very clean without the need to skip the orig_size check. Will check
> other call sites later.

Right -- only things that are performing a reallocation should be using
ksize(), e.g. see __slab_build_skb().
On Mon, Oct 14, 2024 at 10:20:36PM +0800, Tang, Feng wrote:
> On Mon, Oct 14, 2024 at 03:12:09PM +0200, Vlastimil Babka wrote:
> >
> > >> So I think in __do_krealloc() we should do things manually to determine ks
> > >> and not call ksize(). Just not break any of the cases ksize() handles
> > >> (kfence, large kmalloc).
> > >
> > > OK, originally I tried not to expose internals of __ksize(). Let me
> > > try this way.
> >
> > ksize() makes assumptions that a user outside of slab itself is calling it.
> >
> > But we (well mostly Kees) also introduced kmalloc_size_roundup() to avoid
> > querying ksize() for the purposes of writing beyond the original
> > kmalloc(size) up to the bucket size. So maybe we can also investigate if the
> > skip_orig_size_check() mechanism can be removed now?
>
> I did a quick grep, and fortunately it seems that the ksize() users are
> much fewer than before. We used to see some trouble in network code, which
> is now very clean without the need to skip the orig_size check. Will check
> other call sites later.

I did some further checking of ksize() usage, and there are still some
places to be handled. The thing that stands out is kfree_sensitive(), and
another potential one is sound/soc/codecs/cs-amp-lib-test.c.

Some details:

* Thanks to Kees Cook, who has cured many cases of ksize() as below:

	drivers/base/devres.c: total_old_size = ksize(container_of(ptr, struct devres, data));
	drivers/net/ethernet/intel/igb/igb_main.c: } else if (size > ksize(q_vector)) {
	net/core/skbuff.c: *size = ksize(data);
	net/openvswitch/flow_netlink.c: new_acts_size = max(next_offset + req_size, ksize(*sfa) * 2);
	kernel/bpf/verifier.c: alloc_bytes = max(ksize(orig), kmalloc_size_roundup(bytes));

* Some callers use ksize() mostly for calculation or sanity checks,
  and not for accessing the extra space, which is fine:

	drivers/gpu/drm/drm_managed.c: WARN_ON(dev + 1 > (struct drm_device *) (container + ksize(container)));
	lib/kunit/string-stream-test.c: actual_bytes_used = ksize(stream);
	lib/kunit/string-stream-test.c: actual_bytes_used += ksize(frag_container);
	lib/kunit/string-stream-test.c: actual_bytes_used += ksize(frag_container->fragment);
	mm/nommu.c: return ksize(objp);
	mm/util.c: memcpy(n, kasan_reset_tag(p), ksize(p));
	security/tomoyo/gc.c: tomoyo_memory_used[TOMOYO_MEMORY_POLICY] -= ksize(ptr);
	security/tomoyo/memory.c: const size_t s = ksize(ptr);
	drivers/md/dm-vdo/memory-alloc.c: add_kmalloc_block(ksize(p));
	drivers/md/dm-vdo/memory-alloc.c: add_kmalloc_block(ksize(p));
	drivers/md/dm-vdo/memory-alloc.c: remove_kmalloc_block(ksize(ptr));

* One usage may need to be handled:

	sound/soc/codecs/cs-amp-lib-test.c: KUNIT_ASSERT_GE_MSG(test, ksize(buf), priv->cal_blob->size, "Buffer to small");

* A bigger problem is kfree_sensitive(), which uses ksize() to get
  the total size and then zeroes all of it.

  One solution could be to get the kmem_cache first and do the
  skip_orig_size_check().

Thanks,
Feng
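[For context, here is kfree_sensitive(), paraphrased from mm/slab_common.c
and quoted from memory, so treat it as a sketch. It shows why this caller
genuinely needs the full usable size rather than orig_size:]

	/*
	 * The whole point is to scrub everything the object may ever
	 * have held, so it must zero the full usable size, i.e. it
	 * legitimately needs ksize() with the orig_size check out of
	 * the way.
	 */
	void kfree_sensitive(const void *p)
	{
		size_t ks;
		void *mem = (void *)p;

		ks = ksize(mem);
		if (ks) {
			kasan_unpoison_range(mem, ks);
			memzero_explicit(mem, ks);	/* cannot be optimized away */
		}
		kfree(mem);
	}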
On 11/4/24 12:28, Feng Tang wrote:
> On Mon, Oct 14, 2024 at 10:20:36PM +0800, Tang, Feng wrote:
>> On Mon, Oct 14, 2024 at 03:12:09PM +0200, Vlastimil Babka wrote:
>> > >
>> > >> So I think in __do_krealloc() we should do things manually to determine ks
>> > >> and not call ksize(). Just not break any of the cases ksize() handles
>> > >> (kfence, large kmalloc).
>> > >
>> > > OK, originally I tried not to expose internals of __ksize(). Let me
>> > > try this way.
>> >
>> > ksize() makes assumptions that a user outside of slab itself is calling it.
>> >
>> > But we (well mostly Kees) also introduced kmalloc_size_roundup() to avoid
>> > querying ksize() for the purposes of writing beyond the original
>> > kmalloc(size) up to the bucket size. So maybe we can also investigate if the
>> > skip_orig_size_check() mechanism can be removed now?
>>
>> I did a quick grep, and fortunately it seems that the ksize() users are
>> much fewer than before. We used to see some trouble in network code, which
>> is now very clean without the need to skip the orig_size check. Will check
>> other call sites later.
>
> I did some further checking of ksize() usage, and there are still some
> places to be handled. The thing that stands out is kfree_sensitive(), and
> another potential one is sound/soc/codecs/cs-amp-lib-test.c.
>
> Some details:
>
> * Thanks to Kees Cook, who has cured many cases of ksize() as below:
>
> 	drivers/base/devres.c: total_old_size = ksize(container_of(ptr, struct devres, data));
> 	drivers/net/ethernet/intel/igb/igb_main.c: } else if (size > ksize(q_vector)) {
> 	net/core/skbuff.c: *size = ksize(data);
> 	net/openvswitch/flow_netlink.c: new_acts_size = max(next_offset + req_size, ksize(*sfa) * 2);
> 	kernel/bpf/verifier.c: alloc_bytes = max(ksize(orig), kmalloc_size_roundup(bytes));
>
> * Some callers use ksize() mostly for calculation or sanity checks,
>   and not for accessing the extra space, which is fine:
>
> 	drivers/gpu/drm/drm_managed.c: WARN_ON(dev + 1 > (struct drm_device *) (container + ksize(container)));
> 	lib/kunit/string-stream-test.c: actual_bytes_used = ksize(stream);
> 	lib/kunit/string-stream-test.c: actual_bytes_used += ksize(frag_container);
> 	lib/kunit/string-stream-test.c: actual_bytes_used += ksize(frag_container->fragment);
> 	mm/nommu.c: return ksize(objp);
> 	mm/util.c: memcpy(n, kasan_reset_tag(p), ksize(p));
> 	security/tomoyo/gc.c: tomoyo_memory_used[TOMOYO_MEMORY_POLICY] -= ksize(ptr);
> 	security/tomoyo/memory.c: const size_t s = ksize(ptr);
> 	drivers/md/dm-vdo/memory-alloc.c: add_kmalloc_block(ksize(p));
> 	drivers/md/dm-vdo/memory-alloc.c: add_kmalloc_block(ksize(p));
> 	drivers/md/dm-vdo/memory-alloc.c: remove_kmalloc_block(ksize(ptr));
>
> * One usage may need to be handled:
>
> 	sound/soc/codecs/cs-amp-lib-test.c: KUNIT_ASSERT_GE_MSG(test, ksize(buf), priv->cal_blob->size, "Buffer to small");
>
> * A bigger problem is kfree_sensitive(), which uses ksize() to get
>   the total size and then zeroes all of it.
>
>   One solution could be to get the kmem_cache first and do the
>   skip_orig_size_check().

Maybe add a parameter for __ksize() that controls if we do
skip_orig_size_check(): current ksize() will pass "false" to it (once the
remaining wrong users are handled), then another ksize_internal() variant
will pass "true" and be used from kfree_sensitive()?

> Thanks,
> Feng
On Mon, Nov 04, 2024 at 12:45:51PM +0100, Vlastimil Babka wrote:
> On 11/4/24 12:28, Feng Tang wrote:
> > On Mon, Oct 14, 2024 at 10:20:36PM +0800, Tang, Feng wrote:
> >> On Mon, Oct 14, 2024 at 03:12:09PM +0200, Vlastimil Babka wrote:
> >> > >
> >> > >> So I think in __do_krealloc() we should do things manually to determine ks
> >> > >> and not call ksize(). Just not break any of the cases ksize() handles
> >> > >> (kfence, large kmalloc).
> >> > >
> >> > > OK, originally I tried not to expose internals of __ksize(). Let me
> >> > > try this way.
> >> >
> >> > ksize() makes assumptions that a user outside of slab itself is calling it.
> >> >
> >> > But we (well mostly Kees) also introduced kmalloc_size_roundup() to avoid
> >> > querying ksize() for the purposes of writing beyond the original
> >> > kmalloc(size) up to the bucket size. So maybe we can also investigate if the
> >> > skip_orig_size_check() mechanism can be removed now?
> >>
> >> I did a quick grep, and fortunately it seems that the ksize() users are
> >> much fewer than before. We used to see some trouble in network code, which
> >> is now very clean without the need to skip the orig_size check. Will check
> >> other call sites later.
> >
> > I did some further checking of ksize() usage, and there are still some
> > places to be handled. The thing that stands out is kfree_sensitive(), and
> > another potential one is sound/soc/codecs/cs-amp-lib-test.c.
> >
> > Some details:
> >
> > * Thanks to Kees Cook, who has cured many cases of ksize() as below:
> >
> > 	drivers/base/devres.c: total_old_size = ksize(container_of(ptr, struct devres, data));
> > 	drivers/net/ethernet/intel/igb/igb_main.c: } else if (size > ksize(q_vector)) {
> > 	net/core/skbuff.c: *size = ksize(data);
> > 	net/openvswitch/flow_netlink.c: new_acts_size = max(next_offset + req_size, ksize(*sfa) * 2);
> > 	kernel/bpf/verifier.c: alloc_bytes = max(ksize(orig), kmalloc_size_roundup(bytes));
> >
> > * Some callers use ksize() mostly for calculation or sanity checks,
> >   and not for accessing the extra space, which is fine:
> >
> > 	drivers/gpu/drm/drm_managed.c: WARN_ON(dev + 1 > (struct drm_device *) (container + ksize(container)));
> > 	lib/kunit/string-stream-test.c: actual_bytes_used = ksize(stream);
> > 	lib/kunit/string-stream-test.c: actual_bytes_used += ksize(frag_container);
> > 	lib/kunit/string-stream-test.c: actual_bytes_used += ksize(frag_container->fragment);
> > 	mm/nommu.c: return ksize(objp);
> > 	mm/util.c: memcpy(n, kasan_reset_tag(p), ksize(p));
> > 	security/tomoyo/gc.c: tomoyo_memory_used[TOMOYO_MEMORY_POLICY] -= ksize(ptr);
> > 	security/tomoyo/memory.c: const size_t s = ksize(ptr);
> > 	drivers/md/dm-vdo/memory-alloc.c: add_kmalloc_block(ksize(p));
> > 	drivers/md/dm-vdo/memory-alloc.c: add_kmalloc_block(ksize(p));
> > 	drivers/md/dm-vdo/memory-alloc.c: remove_kmalloc_block(ksize(ptr));
> >
> > * One usage may need to be handled:
> >
> > 	sound/soc/codecs/cs-amp-lib-test.c: KUNIT_ASSERT_GE_MSG(test, ksize(buf), priv->cal_blob->size, "Buffer to small");
> >
> > * A bigger problem is kfree_sensitive(), which uses ksize() to get
> >   the total size and then zeroes all of it.
> >
> >   One solution could be to get the kmem_cache first and do the
> >   skip_orig_size_check().
>
> Maybe add a parameter for __ksize() that controls if we do
> skip_orig_size_check(): current ksize() will pass "false" to it (once the
> remaining wrong users are handled), then another ksize_internal() variant
> will pass "true" and be used from kfree_sensitive()?

Sounds good to me!

And for future wrong usages of ksize(), we can fix them case by case when
they are detected.

Thanks,
Feng
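[A minimal sketch of what that split could look like. The bodies are
paraphrased and simplified from mm/slab_common.c; the skip_orig_check
parameter and the ksize_internal() name follow the suggestion above and are
hypothetical:]

	static size_t __ksize(const void *object, bool skip_orig_check)
	{
		struct folio *folio;

		if (unlikely(object == ZERO_SIZE_PTR))
			return 0;

		folio = virt_to_folio(object);
		if (unlikely(!folio_test_slab(folio)))
			return folio_size(folio);	/* big kmalloc */

		if (skip_orig_check)
			skip_orig_size_check(folio_slab(folio)->slab_cache, object);

		return slab_ksize(folio_slab(folio)->slab_cache);
	}

	/* External callers: leave the recorded orig_size alone. */
	size_t ksize(const void *object)
	{
		return __ksize(object, false);
	}

	/* Internal variant for kfree_sensitive(), which scrubs the whole bucket. */
	static size_t ksize_internal(const void *object)
	{
		return __ksize(object, true);
	}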