Message ID: 20240826232908.4076417-1-shakeel.butt@linux.dev (mailing list archive)
State: Superseded
Delegated to: Netdev Maintainers
Series: [v1] memcg: add charging of already allocated slab objects
Context | Check | Description
---|---|---
netdev/tree_selection | success | Not a local patch
On Mon, Aug 26, 2024 at 04:29:08PM -0700, Shakeel Butt wrote: > At the moment, the slab objects are charged to the memcg at the > allocation time. However there are cases where slab objects are > allocated at the time where the right target memcg to charge it to is > not known. One such case is the network sockets for the incoming > connection which are allocated in the softirq context. > > Couple hundred thousand connections are very normal on large loaded > server and almost all of those sockets underlying those connections get > allocated in the softirq context and thus not charged to any memcg. > However later at the accept() time we know the right target memcg to > charge. Let's add new API to charge already allocated objects, so we can > have better accounting of the memory usage. > > To measure the performance impact of this change, tcp_crr is used from > the neper [1] performance suite. Basically it is a network ping pong > test with new connection for each ping pong. > > The server and the client are run inside 3 level of cgroup hierarchy > using the following commands: > > Server: > $ tcp_crr -6 > > Client: > $ tcp_crr -6 -c -H ${server_ip} > > If the client and server run on different machines with 50 GBPS NIC, > there is no visible impact of the change. > > For the same machine experiment with v6.11-rc5 as base. > > base (throughput) with-patch > tcp_crr 14545 (+- 80) 14463 (+- 56) > > It seems like the performance impact is within the noise. > > Link: https://github.com/google/neper [1] > Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev> Hi Shakeel, I like the idea and performance numbers look good. However some comments on the implementation: > --- > > Changes since the RFC: > - Added check for already charged slab objects. > - Added performance results from neper's tcp_crr > > include/linux/slab.h | 1 + > mm/slub.c | 54 +++++++++++++++++++++++++++++++++ > net/ipv4/inet_connection_sock.c | 5 +-- > 3 files changed, 58 insertions(+), 2 deletions(-) > > diff --git a/include/linux/slab.h b/include/linux/slab.h > index eb2bf4629157..05cfab107c72 100644 > --- a/include/linux/slab.h > +++ b/include/linux/slab.h > @@ -547,6 +547,7 @@ void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru, > gfp_t gfpflags) __assume_slab_alignment __malloc; > #define kmem_cache_alloc_lru(...) alloc_hooks(kmem_cache_alloc_lru_noprof(__VA_ARGS__)) > > +bool kmem_cache_charge(void *objp, gfp_t gfpflags); > void kmem_cache_free(struct kmem_cache *s, void *objp); > > kmem_buckets *kmem_buckets_create(const char *name, slab_flags_t flags, > diff --git a/mm/slub.c b/mm/slub.c > index c9d8a2497fd6..580683597b5c 100644 > --- a/mm/slub.c > +++ b/mm/slub.c > @@ -2185,6 +2185,16 @@ void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p, > > __memcg_slab_free_hook(s, slab, p, objects, obj_exts); > } > + > +static __fastpath_inline > +bool memcg_slab_post_charge(struct kmem_cache *s, void *p, gfp_t flags) > +{ > + if (likely(!memcg_kmem_online())) > + return true; We do have this check in kmem_cache_charge(), why do we need to check it again? 
> + > + return __memcg_slab_post_alloc_hook(s, NULL, flags, 1, &p); > +} > + > #else /* CONFIG_MEMCG */ > static inline bool memcg_slab_post_alloc_hook(struct kmem_cache *s, > struct list_lru *lru, > @@ -2198,6 +2208,13 @@ static inline void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab, > void **p, int objects) > { > } > + > +static inline bool memcg_slab_post_charge(struct kmem_cache *s, > + void *p, > + gfp_t flags) > +{ > + return true; > +} > #endif /* CONFIG_MEMCG */ > > /* > @@ -4062,6 +4079,43 @@ void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru, > } > EXPORT_SYMBOL(kmem_cache_alloc_lru_noprof); > > +#define KMALLOC_TYPE (SLAB_KMALLOC | SLAB_CACHE_DMA | \ > + SLAB_ACCOUNT | SLAB_RECLAIM_ACCOUNT) > + > +bool kmem_cache_charge(void *objp, gfp_t gfpflags) > +{ > + struct slabobj_ext *slab_exts; > + struct kmem_cache *s; > + struct folio *folio; > + struct slab *slab; > + unsigned long off; > + > + if (!memcg_kmem_online()) > + return true; > + > + folio = virt_to_folio(objp); > + if (unlikely(!folio_test_slab(folio))) > + return false; Does it handle the case of a too-big-to-be-a-slab-object allocation? I think it's better to handle it properly. Also, why return false here? > + > + slab = folio_slab(folio); > + s = slab->slab_cache; > + > + /* Ignore KMALLOC_NORMAL cache to avoid circular dependency. */ > + if ((s->flags & KMALLOC_TYPE) == SLAB_KMALLOC) > + return true; And true here? It seems to be a bit inconsistent. Also, if we have this check here, it means your function won't handle kmallocs at all? Because !KMALLOC_NORMAL allocations won't get here. > + > + /* Ignore already charged objects. */ > + slab_exts = slab_obj_exts(slab); > + if (slab_exts) { > + off = obj_to_index(s, slab, objp); > + if (unlikely(slab_exts[off].objcg)) > + return true; > + } > + > + return memcg_slab_post_charge(s, objp, gfpflags); > +} > +EXPORT_SYMBOL(kmem_cache_charge); > + > /** > * kmem_cache_alloc_node - Allocate an object on the specified node > * @s: The cache to allocate from. > diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c > index 64d07b842e73..3c13ca8c11fb 100644 > --- a/net/ipv4/inet_connection_sock.c > +++ b/net/ipv4/inet_connection_sock.c > @@ -715,6 +715,7 @@ struct sock *inet_csk_accept(struct sock *sk, struct proto_accept_arg *arg) > release_sock(sk); > if (newsk && mem_cgroup_sockets_enabled) { > int amt = 0; > + gfp_t gfp = GFP_KERNEL | __GFP_NOFAIL; > > /* atomically get the memory usage, set and charge the > * newsk->sk_memcg. > @@ -731,8 +732,8 @@ struct sock *inet_csk_accept(struct sock *sk, struct proto_accept_arg *arg) > } > > if (amt) > - mem_cgroup_charge_skmem(newsk->sk_memcg, amt, > - GFP_KERNEL | __GFP_NOFAIL); > + mem_cgroup_charge_skmem(newsk->sk_memcg, amt, gfp); > + kmem_cache_charge(newsk, gfp); Wait, so we assume that newsk->sk_memcg === current memcg? Or we're ok with them being different? Thanks!
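For reference, the flow under discussion, condensed from the patch (a sketch only: everything except the two calls is elided, and sk_prot_alloc() is named as a stand-in for the actual socket allocation path):

/* Softirq context, e.g. creating the socket for an incoming
 * connection: "current" is whatever task happened to be interrupted,
 * so the allocation is left uncharged. */
newsk = sk_prot_alloc(prot, GFP_ATOMIC, family);

/* Later, in inet_csk_accept(), running in the accepting task's
 * context: charge the already-allocated object to the right memcg. */
kmem_cache_charge(newsk, GFP_KERNEL | __GFP_NOFAIL);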
Hi Shakeel,

kernel test robot noticed the following build errors:

[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on linus/master v6.11-rc5 next-20240827]
[cannot apply to vbabka-slab/for-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Shakeel-Butt/memcg-add-charging-of-already-allocated-slab-objects/20240827-073150
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20240826232908.4076417-1-shakeel.butt%40linux.dev
patch subject: [PATCH v1] memcg: add charging of already allocated slab objects
config: openrisc-allnoconfig (https://download.01.org/0day-ci/archive/20240827/202408272111.T6bMZmU9-lkp@intel.com/config)
compiler: or1k-linux-gcc (GCC) 14.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240827/202408272111.T6bMZmU9-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202408272111.T6bMZmU9-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from include/asm-generic/bug.h:5,
                    from arch/openrisc/include/asm/bug.h:5,
                    from include/linux/bug.h:5,
                    from include/linux/mmdebug.h:5,
                    from include/linux/mm.h:6,
                    from mm/slub.c:13:
   mm/slub.c: In function 'kmem_cache_charge':
>> mm/slub.c:4115:44: error: 'struct slabobj_ext' has no member named 'objcg'
    4115 |         if (unlikely(slab_exts[off].objcg))
         |                                     ^
   include/linux/compiler.h:77:45: note: in definition of macro 'unlikely'
      77 | # define unlikely(x) __builtin_expect(!!(x), 0)
         |                                          ^

vim +4115 mm/slub.c

  4085
  4086  #define KMALLOC_TYPE (SLAB_KMALLOC | SLAB_CACHE_DMA | \
  4087                        SLAB_ACCOUNT | SLAB_RECLAIM_ACCOUNT)
  4088
  4089  bool kmem_cache_charge(void *objp, gfp_t gfpflags)
  4090  {
  4091          struct slabobj_ext *slab_exts;
  4092          struct kmem_cache *s;
  4093          struct folio *folio;
  4094          struct slab *slab;
  4095          unsigned long off;
  4096
  4097          if (!memcg_kmem_online())
  4098                  return true;
  4099
  4100          folio = virt_to_folio(objp);
  4101          if (unlikely(!folio_test_slab(folio)))
  4102                  return false;
  4103
  4104          slab = folio_slab(folio);
  4105          s = slab->slab_cache;
  4106
  4107          /* Ignore KMALLOC_NORMAL cache to avoid circular dependency. */
  4108          if ((s->flags & KMALLOC_TYPE) == SLAB_KMALLOC)
  4109                  return true;
  4110
  4111          /* Ignore already charged objects. */
  4112          slab_exts = slab_obj_exts(slab);
  4113          if (slab_exts) {
  4114                  off = obj_to_index(s, slab, objp);
> 4115                  if (unlikely(slab_exts[off].objcg))
  4116                          return true;
  4117          }
  4118
  4119          return memcg_slab_post_charge(s, objp, gfpflags);
  4120  }
  4121  EXPORT_SYMBOL(kmem_cache_charge);
  4122
Hi Shakeel,

kernel test robot noticed the following build errors:

[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on linus/master v6.11-rc5 next-20240827]
[cannot apply to vbabka-slab/for-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Shakeel-Butt/memcg-add-charging-of-already-allocated-slab-objects/20240827-073150
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20240826232908.4076417-1-shakeel.butt%40linux.dev
patch subject: [PATCH v1] memcg: add charging of already allocated slab objects
config: s390-allnoconfig (https://download.01.org/0day-ci/archive/20240827/202408272341.k4cl3jz0-lkp@intel.com/config)
compiler: clang version 20.0.0git (https://github.com/llvm/llvm-project 08e5a1de8227512d4774a534b91cb2353cef6284)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240827/202408272341.k4cl3jz0-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202408272341.k4cl3jz0-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from mm/slub.c:13:
   In file included from include/linux/mm.h:2198:
   include/linux/vmstat.h:518:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
     518 |         return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
         |                               ~~~~~~~~~~~ ^ ~~~
   In file included from mm/slub.c:49:
   In file included from mm/internal.h:13:
   include/linux/mm_inline.h:47:41: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
      47 |         __mod_lruvec_state(lruvec, NR_LRU_BASE + lru, nr_pages);
         |                                    ~~~~~~~~~~~ ^ ~~~
   include/linux/mm_inline.h:49:22: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
      49 |                         NR_ZONE_LRU_BASE + lru, nr_pages);
         |                         ~~~~~~~~~~~~~~~~ ^ ~~~
>> mm/slub.c:4115:31: error: no member named 'objcg' in 'struct slabobj_ext'
    4115 |         if (unlikely(slab_exts[off].objcg))
         |                      ~~~~~~~~~~~~~~ ^
   include/linux/compiler.h:77:42: note: expanded from macro 'unlikely'
      77 | # define unlikely(x) __builtin_expect(!!(x), 0)
         |                                          ^
   3 warnings and 1 error generated.

vim +4115 mm/slub.c

  4085
  4086  #define KMALLOC_TYPE (SLAB_KMALLOC | SLAB_CACHE_DMA | \
  4087                        SLAB_ACCOUNT | SLAB_RECLAIM_ACCOUNT)
  4088
  4089  bool kmem_cache_charge(void *objp, gfp_t gfpflags)
  4090  {
  4091          struct slabobj_ext *slab_exts;
  4092          struct kmem_cache *s;
  4093          struct folio *folio;
  4094          struct slab *slab;
  4095          unsigned long off;
  4096
  4097          if (!memcg_kmem_online())
  4098                  return true;
  4099
  4100          folio = virt_to_folio(objp);
  4101          if (unlikely(!folio_test_slab(folio)))
  4102                  return false;
  4103
  4104          slab = folio_slab(folio);
  4105          s = slab->slab_cache;
  4106
  4107          /* Ignore KMALLOC_NORMAL cache to avoid circular dependency. */
  4108          if ((s->flags & KMALLOC_TYPE) == SLAB_KMALLOC)
  4109                  return true;
  4110
  4111          /* Ignore already charged objects. */
  4112          slab_exts = slab_obj_exts(slab);
  4113          if (slab_exts) {
  4114                  off = obj_to_index(s, slab, objp);
> 4115                  if (unlikely(slab_exts[off].objcg))
  4116                          return true;
  4117          }
  4118
  4119          return memcg_slab_post_charge(s, objp, gfpflags);
  4120  }
  4121  EXPORT_SYMBOL(kmem_cache_charge);
  4122
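Both robot reports point at the same root cause: struct slabobj_ext only has an objcg member when CONFIG_MEMCG is enabled, yet kmem_cache_charge() dereferences it unconditionally. One possible shape of the rearrangement discussed below, sketched with a hypothetical helper (slab_obj_is_charged() is made up purely for illustration):

#ifdef CONFIG_MEMCG
/* The objcg member exists here, so the check can be real. */
static bool slab_obj_is_charged(struct kmem_cache *s, struct slab *slab,
				void *objp)
{
	struct slabobj_ext *slab_exts = slab_obj_exts(slab);

	if (!slab_exts)
		return false;

	return slab_exts[obj_to_index(s, slab, objp)].objcg != NULL;
}
#else
/* Without memcg, nothing is ever charged. */
static bool slab_obj_is_charged(struct kmem_cache *s, struct slab *slab,
				void *objp)
{
	return false;
}
#endif /* CONFIG_MEMCG */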
On Tue, Aug 27, 2024 at 03:06:32AM GMT, Roman Gushchin wrote: > On Mon, Aug 26, 2024 at 04:29:08PM -0700, Shakeel Butt wrote: > > At the moment, the slab objects are charged to the memcg at the > > allocation time. However there are cases where slab objects are > > allocated at the time where the right target memcg to charge it to is > > not known. One such case is the network sockets for the incoming > > connection which are allocated in the softirq context. > > > > Couple hundred thousand connections are very normal on large loaded > > server and almost all of those sockets underlying those connections get > > allocated in the softirq context and thus not charged to any memcg. > > However later at the accept() time we know the right target memcg to > > charge. Let's add new API to charge already allocated objects, so we can > > have better accounting of the memory usage. > > > > To measure the performance impact of this change, tcp_crr is used from > > the neper [1] performance suite. Basically it is a network ping pong > > test with new connection for each ping pong. > > > > The server and the client are run inside 3 level of cgroup hierarchy > > using the following commands: > > > > Server: > > $ tcp_crr -6 > > > > Client: > > $ tcp_crr -6 -c -H ${server_ip} > > > > If the client and server run on different machines with 50 GBPS NIC, > > there is no visible impact of the change. > > > > For the same machine experiment with v6.11-rc5 as base. > > > > base (throughput) with-patch > > tcp_crr 14545 (+- 80) 14463 (+- 56) > > > > It seems like the performance impact is within the noise. > > > > Link: https://github.com/google/neper [1] > > Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev> > > Hi Shakeel, > > I like the idea and performance numbers look good. However some comments on > the implementation: > Thanks for taking a look. > > --- > > > > Changes since the RFC: > > - Added check for already charged slab objects. > > - Added performance results from neper's tcp_crr > > > > include/linux/slab.h | 1 + > > mm/slub.c | 54 +++++++++++++++++++++++++++++++++ > > net/ipv4/inet_connection_sock.c | 5 +-- > > 3 files changed, 58 insertions(+), 2 deletions(-) > > > > diff --git a/include/linux/slab.h b/include/linux/slab.h > > index eb2bf4629157..05cfab107c72 100644 > > --- a/include/linux/slab.h > > +++ b/include/linux/slab.h > > @@ -547,6 +547,7 @@ void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru, > > gfp_t gfpflags) __assume_slab_alignment __malloc; > > #define kmem_cache_alloc_lru(...) alloc_hooks(kmem_cache_alloc_lru_noprof(__VA_ARGS__)) > > > > +bool kmem_cache_charge(void *objp, gfp_t gfpflags); > > void kmem_cache_free(struct kmem_cache *s, void *objp); > > > > kmem_buckets *kmem_buckets_create(const char *name, slab_flags_t flags, > > diff --git a/mm/slub.c b/mm/slub.c > > index c9d8a2497fd6..580683597b5c 100644 > > --- a/mm/slub.c > > +++ b/mm/slub.c > > @@ -2185,6 +2185,16 @@ void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p, > > > > __memcg_slab_free_hook(s, slab, p, objects, obj_exts); > > } > > + > > +static __fastpath_inline > > +bool memcg_slab_post_charge(struct kmem_cache *s, void *p, gfp_t flags) > > +{ > > + if (likely(!memcg_kmem_online())) > > + return true; > > We do have this check in kmem_cache_charge(), why do we need to check it again? > I missed to remove this one. I am going to rearrange the code bit more in these functions to avoid the build errors in non MEMCG builds. 
> > + > > + return __memcg_slab_post_alloc_hook(s, NULL, flags, 1, &p); > > +} > > + > > #else /* CONFIG_MEMCG */ > > static inline bool memcg_slab_post_alloc_hook(struct kmem_cache *s, > > struct list_lru *lru, > > @@ -2198,6 +2208,13 @@ static inline void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab, > > void **p, int objects) > > { > > } > > + > > +static inline bool memcg_slab_post_charge(struct kmem_cache *s, > > + void *p, > > + gfp_t flags) > > +{ > > + return true; > > +} > > #endif /* CONFIG_MEMCG */ > > > > /* > > @@ -4062,6 +4079,43 @@ void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru, > > } > > EXPORT_SYMBOL(kmem_cache_alloc_lru_noprof); > > > > +#define KMALLOC_TYPE (SLAB_KMALLOC | SLAB_CACHE_DMA | \ > > + SLAB_ACCOUNT | SLAB_RECLAIM_ACCOUNT) > > + > > +bool kmem_cache_charge(void *objp, gfp_t gfpflags) > > +{ > > + struct slabobj_ext *slab_exts; > > + struct kmem_cache *s; > > + struct folio *folio; > > + struct slab *slab; > > + unsigned long off; > > + > > + if (!memcg_kmem_online()) > > + return true; > > + > > + folio = virt_to_folio(objp); > > + if (unlikely(!folio_test_slab(folio))) > > + return false; > > Does it handle the case of a too-big-to-be-a-slab-object allocation? > I think it's better to handle it properly. Also, why return false here? > Yes I will fix the too-big-to-be-a-slab-object allocations. I presume I should just follow the kfree() hanlding on !folio_test_slab() i.e. that the given object is the large or too-big-to-be-a-slab-object. > > + > > + slab = folio_slab(folio); > > + s = slab->slab_cache; > > + > > + /* Ignore KMALLOC_NORMAL cache to avoid circular dependency. */ > > + if ((s->flags & KMALLOC_TYPE) == SLAB_KMALLOC) > > + return true; > > And true here? It seems to be a bit inconsistent. Will be consistent after handling of the too-big-to-be-a-slab-object. > Also, if we have this check here, it means your function won't handle kmallocs > at all? Because !KMALLOC_NORMAL allocations won't get here. The non-KMALLOC_NORMAL kmalloc caches should also have one of SLAB_CACHE_DMA, SLAB_ACCOUNT and SLAB_RECLAIM_ACCOUNT flag, so the above check will only be true for KMALLOC_NORMAL caches. > > > + > > + /* Ignore already charged objects. */ > > + slab_exts = slab_obj_exts(slab); > > + if (slab_exts) { > > + off = obj_to_index(s, slab, objp); > > + if (unlikely(slab_exts[off].objcg)) > > + return true; > > + } > > + > > + return memcg_slab_post_charge(s, objp, gfpflags); > > +} > > +EXPORT_SYMBOL(kmem_cache_charge); > > + > > /** > > * kmem_cache_alloc_node - Allocate an object on the specified node > > * @s: The cache to allocate from. > > diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c > > index 64d07b842e73..3c13ca8c11fb 100644 > > --- a/net/ipv4/inet_connection_sock.c > > +++ b/net/ipv4/inet_connection_sock.c > > @@ -715,6 +715,7 @@ struct sock *inet_csk_accept(struct sock *sk, struct proto_accept_arg *arg) > > release_sock(sk); > > if (newsk && mem_cgroup_sockets_enabled) { > > int amt = 0; > > + gfp_t gfp = GFP_KERNEL | __GFP_NOFAIL; > > > > /* atomically get the memory usage, set and charge the > > * newsk->sk_memcg. > > @@ -731,8 +732,8 @@ struct sock *inet_csk_accept(struct sock *sk, struct proto_accept_arg *arg) > > } > > > > if (amt) > > - mem_cgroup_charge_skmem(newsk->sk_memcg, amt, > > - GFP_KERNEL | __GFP_NOFAIL); > > + mem_cgroup_charge_skmem(newsk->sk_memcg, amt, gfp); > > + kmem_cache_charge(newsk, gfp); > > Wait, so we assume that newsk->sk_memcg === current memcg? 
> Or we're ok with them being different?

We set newsk->sk_memcg in the same function (see the call to mem_cgroup_sk_alloc(newsk) a couple of lines above), so newsk->sk_memcg will be equal to the current memcg.

Thanks a lot for the valuable feedback.
Shakeel
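Shakeel's point about the flag mask is easy to sanity-check with a small self-contained userspace model (the bit values below are made up; only the boolean logic mirrors the patch):

#include <stdio.h>

#define SLAB_KMALLOC         0x1u
#define SLAB_CACHE_DMA       0x2u
#define SLAB_ACCOUNT         0x4u
#define SLAB_RECLAIM_ACCOUNT 0x8u
#define KMALLOC_TYPE (SLAB_KMALLOC | SLAB_CACHE_DMA | \
		      SLAB_ACCOUNT | SLAB_RECLAIM_ACCOUNT)

/* True only when SLAB_KMALLOC is the sole mask bit set, i.e. a
 * KMALLOC_NORMAL cache. */
static int is_kmalloc_normal(unsigned int flags)
{
	return (flags & KMALLOC_TYPE) == SLAB_KMALLOC;
}

int main(void)
{
	printf("normal: %d\n", is_kmalloc_normal(SLAB_KMALLOC));                        /* 1 */
	printf("cg:     %d\n", is_kmalloc_normal(SLAB_KMALLOC | SLAB_ACCOUNT));         /* 0 */
	printf("dma:    %d\n", is_kmalloc_normal(SLAB_KMALLOC | SLAB_CACHE_DMA));       /* 0 */
	printf("rcl:    %d\n", is_kmalloc_normal(SLAB_KMALLOC | SLAB_RECLAIM_ACCOUNT)); /* 0 */
	printf("cache:  %d\n", is_kmalloc_normal(SLAB_ACCOUNT)); /* 0: not a kmalloc cache */
	return 0;
}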
> On Aug 28, 2024, at 01:23, Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> On Tue, Aug 27, 2024 at 03:06:32AM GMT, Roman Gushchin wrote:
>> On Mon, Aug 26, 2024 at 04:29:08PM -0700, Shakeel Butt wrote:
> [...]
>>> + folio = virt_to_folio(objp);
>>> + if (unlikely(!folio_test_slab(folio)))
>>> + return false;
>>
>> Does it handle the case of a too-big-to-be-a-slab-object allocation?
>> I think it's better to handle it properly. Also, why return false here?
>>
>
> Yes I will fix the too-big-to-be-a-slab-object allocations. I presume I
> should just follow the kfree() hanlding on !folio_test_slab() i.e. that
> the given object is the large or too-big-to-be-a-slab-object. I am going
> to rearrange the code a bit more in these functions to avoid the build
> errors in non-MEMCG builds.

Hi Shakeel,

If we decide to do this, I suppose you will use memcg_kmem_charge_page
to charge big-object. To be consistent, I suggest renaming kmem_cache_charge
to memcg_kmem_charge to handle both slab object and big-object. And I saw
all the functions related to object charging is moved to memcontrol.c (e.g.
__memcg_slab_post_alloc_hook), so maybe we should also do this for
memcg_kmem_charge?

Muchun,
Thanks.
Hi Muchun, On Wed, Aug 28, 2024 at 10:36:06AM GMT, Muchun Song wrote: > > > > On Aug 28, 2024, at 01:23, Shakeel Butt <shakeel.butt@linux.dev> wrote: > > [...] > >> > >> Does it handle the case of a too-big-to-be-a-slab-object allocation? > >> I think it's better to handle it properly. Also, why return false here? > >> > > > > Yes I will fix the too-big-to-be-a-slab-object allocations. I presume I > > should just follow the kfree() hanlding on !folio_test_slab() i.e. that > > the given object is the large or too-big-to-be-a-slab-object. > > Hi Shakeel, > > If we decide to do this, I suppose you will use memcg_kmem_charge_page > to charge big-object. To be consistent, I suggest renaming kmem_cache_charge > to memcg_kmem_charge to handle both slab object and big-object. And I saw > all the functions related to object charging is moved to memcontrol.c (e.g. > __memcg_slab_post_alloc_hook), so maybe we should also do this for > memcg_kmem_charge? > If I understand you correctly, you are suggesting to handle the general kmem charging and slab's large kmalloc (size > KMALLOC_MAX_CACHE_SIZE) together with memcg_kmem_charge(). However that is not possible due to slab path updating NR_SLAB_UNRECLAIMABLE_B stats while no updates for this stat in the general kmem charging path (__memcg_kmem_charge_page in page allocation code path). Also this general kmem charging path is used by many other users like vmalloc, kernel stack and thus we can not just plainly stuck updates to NR_SLAB_UNRECLAIMABLE_B in that path. Thanks for taking a look. Shakeel
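Shakeel's accounting argument can be restated as a minimal userspace stand-in (all names here are ad-hoc models, not kernel APIs): the slab charge path bumps a slab-specific per-memcg stat on top of the plain charge, which is exactly what generic kmem users such as vmalloc and kernel stacks must not get:

#include <stdio.h>

/* Toy model of a memcg's counters. */
struct memcg {
	long kmem_charged;            /* bytes charged to the cgroup */
	long nr_slab_unreclaimable_b; /* slab-only stat, in bytes */
};

/* Models the generic path (__memcg_kmem_charge_page): shared by
 * vmalloc, kernel stacks, etc., so it must not touch slab stats. */
static void kmem_charge(struct memcg *mc, long bytes)
{
	mc->kmem_charged += bytes;
}

/* Models the slab path: the plain charge plus the slab stat. */
static void slab_charge(struct memcg *mc, long bytes)
{
	kmem_charge(mc, bytes);
	mc->nr_slab_unreclaimable_b += bytes;
}

int main(void)
{
	struct memcg mc = { 0, 0 };

	slab_charge(&mc, 4096);   /* e.g. a large kmalloc object */
	kmem_charge(&mc, 16384);  /* e.g. a kernel stack */

	/* Only the slab charge moved the slab stat. */
	printf("charged=%ld slab=%ld\n",
	       mc.kmem_charged, mc.nr_slab_unreclaimable_b);
	return 0;
}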
> On Aug 29, 2024, at 03:03, Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> Hi Muchun,
>
> On Wed, Aug 28, 2024 at 10:36:06AM GMT, Muchun Song wrote:
> [...]
>
> If I understand you correctly, you are suggesting to handle the general
> kmem charging and slab's large kmalloc (size > KMALLOC_MAX_CACHE_SIZE)
> together with memcg_kmem_charge(). However that is not possible due to
> slab path updating NR_SLAB_UNRECLAIMABLE_B stats while no updates for
> this stat in the general kmem charging path (__memcg_kmem_charge_page in
> page allocation code path).
>
> Also this general kmem charging path is used by many other users like
> vmalloc, kernel stack and thus we can not just plainly stuck updates to
> NR_SLAB_UNRECLAIMABLE_B in that path.

Sorry, maybe I was not clear. To make sure we are on the same page, let
me clarify my thought. In your v2, I thought we could rename
kmem_cache_charge() to memcg_kmem_charge(), since kmem_cache_charge()
already handles both the big-slab-object (size > KMALLOC_MAX_CACHE_SIZE)
and small-slab-object cases. You know, we have the function
memcg_kmem_charge_page(), which could be used for charging a
big-slab-object but not a small-slab-object. So I thought maybe
memcg_kmem_charge() is a good name for it to handle both cases. And if
we do this, how about moving this new function to memcontrol.c, since
all memcg charging functions are moved to memcontrol.c instead of
slub.c?

Muchun,
Thanks.
On Thu, Aug 29, 2024 at 10:36:01AM GMT, Muchun Song wrote:
> [...]
>
> Sorry, maybe I was not clear. To make sure we are on the same page, let
> me clarify my thought. In your v2, I thought we could rename
> kmem_cache_charge() to memcg_kmem_charge(), since kmem_cache_charge()
> already handles both the big-slab-object (size > KMALLOC_MAX_CACHE_SIZE)
> and small-slab-object cases. You know, we have the function
> memcg_kmem_charge_page(), which could be used for charging a
> big-slab-object but not a small-slab-object. So I thought maybe
> memcg_kmem_charge() is a good name for it to handle both cases. And if
> we do this, how about moving this new function to memcontrol.c, since
> all memcg charging functions are moved to memcontrol.c instead of
> slub.c?
>

Oh you want the core function to be in memcontrol.c. I don't have any
strong opinion on where the code should live, but I do want the
interface to still be kmem_cache_charge(), because that is what we are
providing to the users for charging slab objects. Yes, some of those
might be big-slab-objects, but that is transparent to the users.

Anyways, for now I will go with my current approach, but in a followup
I will explore and discuss with you which code should live in which
file. I hope that is acceptable to you.

thanks,
Shakeel
> On Aug 29, 2024, at 23:49, Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> On Thu, Aug 29, 2024 at 10:36:01AM GMT, Muchun Song wrote:
> [...]
>
> Oh you want the core function to be in memcontrol.c. I don't have any
> strong opinion on where the code should live, but I do want the
> interface to still be kmem_cache_charge(), because that is what we are
> providing to the users for charging slab objects. Yes, some of those
> might be big-slab-objects, but that is transparent to the users.
>
> Anyways, for now I will go with my current approach, but in a followup
> I will explore and discuss with you which code should live in which
> file. I hope that is acceptable to you.

Fine. No problem. Thanks.
diff --git a/include/linux/slab.h b/include/linux/slab.h
index eb2bf4629157..05cfab107c72 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -547,6 +547,7 @@ void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
 			   gfp_t gfpflags) __assume_slab_alignment __malloc;
 #define kmem_cache_alloc_lru(...)	alloc_hooks(kmem_cache_alloc_lru_noprof(__VA_ARGS__))
 
+bool kmem_cache_charge(void *objp, gfp_t gfpflags);
 void kmem_cache_free(struct kmem_cache *s, void *objp);
 
 kmem_buckets *kmem_buckets_create(const char *name, slab_flags_t flags,
diff --git a/mm/slub.c b/mm/slub.c
index c9d8a2497fd6..580683597b5c 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2185,6 +2185,16 @@ void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
 
 	__memcg_slab_free_hook(s, slab, p, objects, obj_exts);
 }
+
+static __fastpath_inline
+bool memcg_slab_post_charge(struct kmem_cache *s, void *p, gfp_t flags)
+{
+	if (likely(!memcg_kmem_online()))
+		return true;
+
+	return __memcg_slab_post_alloc_hook(s, NULL, flags, 1, &p);
+}
+
 #else /* CONFIG_MEMCG */
 static inline bool memcg_slab_post_alloc_hook(struct kmem_cache *s,
 					      struct list_lru *lru,
@@ -2198,6 +2208,13 @@ static inline void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
 					void **p, int objects)
 {
 }
+
+static inline bool memcg_slab_post_charge(struct kmem_cache *s,
+					  void *p,
+					  gfp_t flags)
+{
+	return true;
+}
 #endif /* CONFIG_MEMCG */
 
 /*
@@ -4062,6 +4079,43 @@ void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
 }
 EXPORT_SYMBOL(kmem_cache_alloc_lru_noprof);
 
+#define KMALLOC_TYPE (SLAB_KMALLOC | SLAB_CACHE_DMA | \
+		      SLAB_ACCOUNT | SLAB_RECLAIM_ACCOUNT)
+
+bool kmem_cache_charge(void *objp, gfp_t gfpflags)
+{
+	struct slabobj_ext *slab_exts;
+	struct kmem_cache *s;
+	struct folio *folio;
+	struct slab *slab;
+	unsigned long off;
+
+	if (!memcg_kmem_online())
+		return true;
+
+	folio = virt_to_folio(objp);
+	if (unlikely(!folio_test_slab(folio)))
+		return false;
+
+	slab = folio_slab(folio);
+	s = slab->slab_cache;
+
+	/* Ignore KMALLOC_NORMAL cache to avoid circular dependency. */
+	if ((s->flags & KMALLOC_TYPE) == SLAB_KMALLOC)
+		return true;
+
+	/* Ignore already charged objects. */
+	slab_exts = slab_obj_exts(slab);
+	if (slab_exts) {
+		off = obj_to_index(s, slab, objp);
+		if (unlikely(slab_exts[off].objcg))
+			return true;
+	}
+
+	return memcg_slab_post_charge(s, objp, gfpflags);
+}
+EXPORT_SYMBOL(kmem_cache_charge);
+
 /**
  * kmem_cache_alloc_node - Allocate an object on the specified node
  * @s: The cache to allocate from.
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 64d07b842e73..3c13ca8c11fb 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -715,6 +715,7 @@ struct sock *inet_csk_accept(struct sock *sk, struct proto_accept_arg *arg)
 	release_sock(sk);
 	if (newsk && mem_cgroup_sockets_enabled) {
 		int amt = 0;
+		gfp_t gfp = GFP_KERNEL | __GFP_NOFAIL;
 
 		/* atomically get the memory usage, set and charge the
 		 * newsk->sk_memcg.
@@ -731,8 +732,8 @@ struct sock *inet_csk_accept(struct sock *sk, struct proto_accept_arg *arg)
 		}
 
 		if (amt)
-			mem_cgroup_charge_skmem(newsk->sk_memcg, amt,
-						GFP_KERNEL | __GFP_NOFAIL);
+			mem_cgroup_charge_skmem(newsk->sk_memcg, amt, gfp);
+		kmem_cache_charge(newsk, gfp);
 
 		release_sock(newsk);
 	}
At the moment, slab objects are charged to the memcg at allocation time. However, there are cases where slab objects are allocated at a time when the right target memcg to charge them to is not known. One such case is the network sockets for incoming connections, which are allocated in softirq context.

A couple hundred thousand connections are very normal on a large loaded server, and almost all of the sockets underlying those connections get allocated in softirq context and thus are not charged to any memcg. However, later at accept() time we know the right target memcg to charge. Let's add a new API to charge already allocated objects, so we can have better accounting of the memory usage.

To measure the performance impact of this change, tcp_crr is used from the neper [1] performance suite. Basically it is a network ping-pong test with a new connection for each ping-pong.

The server and the client are run inside a 3-level cgroup hierarchy using the following commands:

Server:
$ tcp_crr -6

Client:
$ tcp_crr -6 -c -H ${server_ip}

If the client and server run on different machines with a 50 Gbps NIC, there is no visible impact of the change.

For the same-machine experiment, with v6.11-rc5 as base:

          base (throughput)   with-patch
tcp_crr   14545 (+- 80)       14463 (+- 56)

It seems like the performance impact is within the noise.

Link: https://github.com/google/neper [1]
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---

Changes since the RFC:
- Added check for already charged slab objects.
- Added performance results from neper's tcp_crr

 include/linux/slab.h            |  1 +
 mm/slub.c                       | 54 +++++++++++++++++++++++++++++++++
 net/ipv4/inet_connection_sock.c |  5 +--
 3 files changed, 58 insertions(+), 2 deletions(-)