[v2,00/23] common kmalloc for SLUB and SLAB v2

Message ID 20220414085727.643099-1-42.hyeyoo@gmail.com

Message

Hyeonggon Yoo April 14, 2022, 8:57 a.m. UTC
Changes since v1:
- The code is basically the same, but some patches were split to make
them easier to review.
- Adjusted comments and added Reviewed-by from Vlastimil. Thanks!

Hello, this series is a cleanup of the kmalloc code. After this series,
the kmalloc subsystem is fully generalized across SLAB and SLUB.

This series is not small, but each patch is easy to review and
I bet you will like it :)

Many thanks to Matthew, Marco, and Vlastimil, who gave comments
on the previous series.

Any feedback will be appreciated.
Thanks!

======== series description ========

Patches 1-2 make slab_alloc_node() in SLAB available for non-NUMA
configurations, for further cleanup.

Patches 3-12 remove duplicate code in the common kmalloc functions.

Patch 13 makes SLAB pass requests larger than an order-1 page to the
page allocator. This is useful for further generalization.

Patches 14-16 unify the tracepoints. After them, only the
kmem_cache_alloc and kmem_cache_free tracepoints are used in the slab
allocators.

Patches 17-18 generalize the whole kmalloc subsystem across SLAB/SLUB
(see the rough sketch after this list).

Patches 19-20 factor out the common code of __kmalloc_node()
and __kmalloc_node_track_caller().

Patch 21 removes the now-unnecessary kmem_cache_alloc_node_trace().

Patches 22-23 are small improvements to __ksize(). They are not part of
the generalization but depend on this series.
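
Roughly, the common kmalloc path after the series looks like this
(a simplified sketch, not the exact patched code;
__kmem_cache_alloc_node() stands in for the per-allocator entry point):

  static __always_inline
  void *__do_kmalloc_node(size_t size, gfp_t flags, int node,
                          unsigned long caller)
  {
          struct kmem_cache *s;
          void *ret;

          /* both SLAB and SLUB now hand large requests to the page allocator */
          if (unlikely(size > KMALLOC_MAX_CACHE_SIZE))
                  return kmalloc_large_node(size, flags, node);

          s = kmalloc_slab(size, flags);
          if (unlikely(ZERO_OR_NULL_PTR(s)))
                  return s;

          ret = __kmem_cache_alloc_node(s, flags, node, size, caller);
          ret = kasan_kmalloc(s, ret, size, flags);
          return ret;
  }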

Hyeonggon Yoo (23):
  mm/slab: move NUMA-related code to __do_cache_alloc()
  mm/slab: cleanup slab_alloc() and slab_alloc_node()
  mm/slab_common: remove CONFIG_NUMA ifdefs for common kmalloc functions
  mm/slab_common: cleanup kmalloc_track_caller()
  mm/slab_common: cleanup __kmalloc()
  mm/sl[auo]b: fold kmalloc_order_trace() into kmalloc_large()
  mm/slub: move kmalloc_large_node() to slab_common.c
  mm/slab_common: make kmalloc_large_node() consistent with
    kmalloc_large()
  mm/slab_common: cleanup kmalloc_large()
  mm/slab_common: cleanup kmem_cache_alloc{,node,lru}
  mm/slab_common: kmalloc_node: pass large requests to page allocator
  mm/slab_common: cleanup kmalloc()
  mm/slab: kmalloc: pass requests larger than order-1 page to page
    allocator
  mm/slab_common: print cache name in tracepoints
  mm/slab_common: use same tracepoint in kmalloc and normal caches
  mm/slab_common: rename tracepoint
  mm/slab_common: implement __kmem_cache_free()
  mm/sl[au]b: generalize kmalloc subsystem
  mm/slab_common: add kasan_kmalloc() in __kmalloc_node_track_caller()
  mm/slab_common: factor out __do_kmalloc_node()
  mm/sl[au]b: remove kmem_cache_alloc_node_trace()
  mm/sl[auo]b: move definition of __ksize() to mm/slab.h
  mm/sl[au]b: check if large object is valid in __ksize()

 include/linux/slab.h        | 245 ++++++++++++-------------
 include/trace/events/kmem.h | 109 ++---------
 mm/slab.c                   | 354 ++++++------------------------------
 mm/slab.h                   |   9 +
 mm/slab_common.c            | 146 ++++++++++++---
 mm/slob.c                   |  78 +++-----
 mm/slub.c                   | 264 ++-------------------------
 7 files changed, 350 insertions(+), 855 deletions(-)

Comments

Hyeonggon Yoo April 14, 2022, 12:36 p.m. UTC | #1
On Thu, Apr 14, 2022 at 05:57:04PM +0900, Hyeonggon Yoo wrote:
> Changes since v1:
> - The code is basically the same, but some patches were split to make
> them easier to review.
> - Adjusted comments and added Reviewed-by from Vlastimil. Thanks!
> 
> Hello, this series is a cleanup of the kmalloc code. After this series,
> the kmalloc subsystem is fully generalized across SLAB and SLUB.
> 
> This series is not small, but each patch is easy to review and
> I bet you will like it :)
> 
> Many thanks to Matthew, Marco, and Vlastimil, who gave comments
> on the previous series.
> 
> Any feedback will be appreciated.
> Thanks!

I mistakenly sent a wrong version of the series, before fixing a build
bug and a build warning. The latest version is available at:

https://github.com/hygoni/linux/tree/slab-common-v5.19-v1r1

And below is the difference between the series in this thread
and the tree above:

diff --git a/0018-mm-sl-au-b-generalize-kmalloc-subsystem.patch b/0018-mm-sl-au-b-generalize-kmalloc-subsystem-v2.patch
index 2389975..510f7fd 100644
--- a/0018-mm-sl-au-b-generalize-kmalloc-subsystem.patch
+++ b/0018-mm-sl-au-b-generalize-kmalloc-subsystem-v2.patch
@@ -207,7 +207,7 @@ index 3cd5d7a47ec7..daf626e25e12 100644
 +
 +      slab = folio_slab(folio);
 +      s = slab->slab_cache;
-+      __kmem_cache_free(s, object, _RET_IP_);
++      __kmem_cache_free(s, (void *)object, _RET_IP_);
 +}
 +EXPORT_SYMBOL(kfree);
 +

Fixed the build warning here.

diff --git a/0020-mm-slab_common-factor-out-__do_kmalloc_node.patch b/0020-mm-slab_common-factor-out-__do_kmalloc_node-v2.patch
index 133d293..30d9ca2 100644
--- a/0020-mm-slab_common-factor-out-__do_kmalloc_node.patch
+++ b/0020-mm-slab_common-factor-out-__do_kmalloc_node-v2.patch
@@ -60,7 +60,7 @@ index 6abe7f61c197..af563e64e8aa 100644
 -      ret = kasan_kmalloc(s, ret, size, gfpflags);
 -
 -      return ret;
-+      return __do_kmalloc_node(size, flags, node, caller);
++      return __do_kmalloc_node(size, gfpflags, node, caller);
  }
  EXPORT_SYMBOL(__kmalloc_node_track_caller);

And fixed the build bug here.

Everything else is unchanged.
Thanks!
Vlastimil Babka April 26, 2022, 5:15 p.m. UTC | #2
On 4/14/22 10:57, Hyeonggon Yoo wrote:
> Move tracepoints into kmalloc_large_node() and add missing flag fix code.
> 
> Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>

Hm so there's a problem with the tracepoint's caller.

kmalloc_large() is only called from kmalloc(), which is inline, so the
callsite of kmalloc() calls kmalloc_large() directly. So when
kmalloc_large() does "trace_kmalloc(_RET_IP_, ...)", the _RET_IP_ is the
callsite of kmalloc(), which is what we want.
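
To illustrate (simplified from slab.h):

  static __always_inline void *kmalloc(size_t size, gfp_t flags)
  {
          if (__builtin_constant_p(size) && size > KMALLOC_MAX_CACHE_SIZE)
                  return kmalloc_large(size, flags); /* inlined at the callsite */
          ...
  }

  void *kmalloc_large(size_t size, gfp_t flags)
  {
          ...
          /* the return address here is the original kmalloc() callsite */
          trace_kmalloc(_RET_IP_, ...);
  }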

But with kmalloc_large_node()...

> ---
>  mm/slab_common.c |  6 ++++++
>  mm/slub.c        | 22 ++++------------------
>  2 files changed, 10 insertions(+), 18 deletions(-)
> 
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index e72089515030..cf17be8cd9ad 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -955,6 +955,9 @@ void *kmalloc_large_node(size_t size, gfp_t flags, int node)
>  	void *ptr = NULL;
>  	unsigned int order = get_order(size);
>  
> +	if (unlikely(flags & GFP_SLAB_BUG_MASK))
> +		flags = kmalloc_fix_flags(flags);
> +
>  	flags |= __GFP_COMP;
>  	page = alloc_pages_node(node, flags, order);
>  	if (page) {
> @@ -966,6 +969,9 @@ void *kmalloc_large_node(size_t size, gfp_t flags, int node)
>  	ptr = kasan_kmalloc_large(ptr, size, flags);
>  	/* As ptr might get tagged, call kmemleak hook after KASAN. */
>  	kmemleak_alloc(ptr, size, 1, flags);
> +	trace_kmalloc_node(_RET_IP_, ptr,
> +			   size, PAGE_SIZE << order,
> +			   flags, node);

... the _RET_IP_ here would be __kmalloc_node(), which is not useful.

>  	return ptr;
>  }
> diff --git a/mm/slub.c b/mm/slub.c
> index 640712706f2b..f10a892f1772 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -4396,15 +4396,8 @@ void *__kmalloc_node(size_t size, gfp_t flags, int node)
>  	struct kmem_cache *s;
>  	void *ret;
>  
> -	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
> -		ret = kmalloc_large_node(size, flags, node);
> -
> -		trace_kmalloc_node(_RET_IP_, ret,
> -				   size, PAGE_SIZE << get_order(size),
> -				   flags, node);

Here it was OK because __kmalloc_node() is called from inline wrappers
in slab.h, so _RET_IP_ points at the real callsite.

> -
> -		return ret;
> -	}
> +	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE))
> +		return kmalloc_large_node(size, flags, node);
>  
>  	s = kmalloc_slab(size, flags);
>  
> @@ -4861,15 +4854,8 @@ void *__kmalloc_node_track_caller(size_t size, gfp_t gfpflags,
>  	struct kmem_cache *s;
>  	void *ret;
>  
> -	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
> -		ret = kmalloc_large_node(size, gfpflags, node);
> -
> -		trace_kmalloc_node(caller, ret,
> -				   size, PAGE_SIZE << get_order(size),
> -				   gfpflags, node);
> -
> -		return ret;
> -	}
> +	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE))
> +		return kmalloc_large_node(size, gfpflags, node);

And here it even forgets the 'caller'.

>  
>  	s = kmalloc_slab(size, gfpflags);
>
Hyeonggon Yoo April 28, 2022, 6:35 a.m. UTC | #3
On Tue, Apr 26, 2022 at 07:15:06PM +0200, Vlastimil Babka wrote:
> On 4/14/22 10:57, Hyeonggon Yoo wrote:
> > Move tracepoints into kmalloc_large_node() and add missing flag fix code.
> > 
> > Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> 

Hello Vlastimil, thanks for the review! ;-)

> Hm so there's a problem with the tracepoint's caller.
> 
> kmalloc_large() is only called from kmalloc(), which is inline, so the
> callsite of kmalloc() calls kmalloc_large() directly. So when
> kmalloc_large() does "trace_kmalloc(_RET_IP_, ...)", the _RET_IP_ is the
> callsite of kmalloc(), which is what we want.

kmalloc_large() had exactly this problem before my series when called from __kmalloc().

On top of current slab/for-next:
  [000] .....    43.172574: kmalloc: call_site=__kmalloc+0x2aa/0x300 ptr=ffff88e2183a0000 bytes_req=12368 bytes_alloc=16384 gfp_flags=GFP_KERNEL

Considering the different use cases of kmalloc_large_node() (called from
kmalloc_node() or __kmalloc_node()), I think we need trace and notrace
versions of kmalloc_large_node().
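
Something like this is what I have in mind (names are tentative):

  /* no tracepoint here; page allocation + kasan/kmemleak hooks only */
  static void *__kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
  {
          ...
  }

  void *kmalloc_large_node(size_t size, gfp_t flags, int node)
  {
          void *ptr = __kmalloc_large_node_notrace(size, flags, node);

          trace_kmalloc_node(_RET_IP_, ptr, size,
                             PAGE_SIZE << get_order(size), flags, node);
          return ptr;
  }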

> 
> But with kmalloc_large_node()...
> 
> > ---
> >  mm/slab_common.c |  6 ++++++
> >  mm/slub.c        | 22 ++++------------------
> >  2 files changed, 10 insertions(+), 18 deletions(-)
> > 
> > diff --git a/mm/slab_common.c b/mm/slab_common.c
> > index e72089515030..cf17be8cd9ad 100644
> > --- a/mm/slab_common.c
> > +++ b/mm/slab_common.c
> > @@ -955,6 +955,9 @@ void *kmalloc_large_node(size_t size, gfp_t flags, int node)
> >  	void *ptr = NULL;
> >  	unsigned int order = get_order(size);
> >  
> > +	if (unlikely(flags & GFP_SLAB_BUG_MASK))
> > +		flags = kmalloc_fix_flags(flags);
> > +
> >  	flags |= __GFP_COMP;
> >  	page = alloc_pages_node(node, flags, order);
> >  	if (page) {
> > @@ -966,6 +969,9 @@ void *kmalloc_large_node(size_t size, gfp_t flags, int node)
> >  	ptr = kasan_kmalloc_large(ptr, size, flags);
> >  	/* As ptr might get tagged, call kmemleak hook after KASAN. */
> >  	kmemleak_alloc(ptr, size, 1, flags);
> > +	trace_kmalloc_node(_RET_IP_, ptr,
> > +			   size, PAGE_SIZE << order,
> > +			   flags, node);
> 
> ... the _RET_IP_ here would be __kmalloc_node(), which is not useful.
> 
> >  	return ptr;
> >  }
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 640712706f2b..f10a892f1772 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -4396,15 +4396,8 @@ void *__kmalloc_node(size_t size, gfp_t flags, int node)
> >  	struct kmem_cache *s;
> >  	void *ret;
> >  
> > -	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
> > -		ret = kmalloc_large_node(size, flags, node);
> > -
> > -		trace_kmalloc_node(_RET_IP_, ret,
> > -				   size, PAGE_SIZE << get_order(size),
> > -				   flags, node);
> 
> Here it was OK because __kmalloc_node() is called from inline wrappers
> in slab.h, so _RET_IP_ points at the real callsite.
> 
> > -
> > -		return ret;
> > -	}
> > +	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE))
> > +		return kmalloc_large_node(size, flags, node);
> >  
> >  	s = kmalloc_slab(size, flags);
> >  
> > @@ -4861,15 +4854,8 @@ void *__kmalloc_node_track_caller(size_t size, gfp_t gfpflags,
> >  	struct kmem_cache *s;
> >  	void *ret;
> >  
> > -	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
> > -		ret = kmalloc_large_node(size, gfpflags, node);
> > -
> > -		trace_kmalloc_node(caller, ret,
> > -				   size, PAGE_SIZE << get_order(size),
> > -				   gfpflags, node);
> > -
> > -		return ret;
> > -	}
> > +	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE))
> > +		return kmalloc_large_node(size, gfpflags, node);
> 
> And here it even forgets the 'caller'.
>

Thanks for catching this.
I think the notrace version plus an explicit tracepoint would fit here too.
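
Roughly (again, the helper name is tentative):

  if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
          ret = __kmalloc_large_node_notrace(size, gfpflags, node);
          trace_kmalloc_node(caller, ret, size,
                             PAGE_SIZE << get_order(size), gfpflags, node);
          return ret;
  }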

> >  
> >  	s = kmalloc_slab(size, gfpflags);
> >  
>