Message ID: 20240826232908.4076417-1-shakeel.butt@linux.dev (mailing list archive)
State: Superseded
Delegated to: Netdev Maintainers
Series: [v1] memcg: add charging of already allocated slab objects
Context | Check | Description
---|---|---
netdev/tree_selection | success | Not a local patch
On Mon, Aug 26, 2024 at 04:29:08PM -0700, Shakeel Butt wrote: > At the moment, the slab objects are charged to the memcg at the > allocation time. However there are cases where slab objects are > allocated at the time where the right target memcg to charge it to is > not known. One such case is the network sockets for the incoming > connection which are allocated in the softirq context. > > Couple hundred thousand connections are very normal on large loaded > server and almost all of those sockets underlying those connections get > allocated in the softirq context and thus not charged to any memcg. > However later at the accept() time we know the right target memcg to > charge. Let's add new API to charge already allocated objects, so we can > have better accounting of the memory usage. > > To measure the performance impact of this change, tcp_crr is used from > the neper [1] performance suite. Basically it is a network ping pong > test with new connection for each ping pong. > > The server and the client are run inside 3 level of cgroup hierarchy > using the following commands: > > Server: > $ tcp_crr -6 > > Client: > $ tcp_crr -6 -c -H ${server_ip} > > If the client and server run on different machines with 50 GBPS NIC, > there is no visible impact of the change. > > For the same machine experiment with v6.11-rc5 as base. > > base (throughput) with-patch > tcp_crr 14545 (+- 80) 14463 (+- 56) > > It seems like the performance impact is within the noise. > > Link: https://github.com/google/neper [1] > Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev> Hi Shakeel, I like the idea and performance numbers look good. However some comments on the implementation: > --- > > Changes since the RFC: > - Added check for already charged slab objects. > - Added performance results from neper's tcp_crr > > include/linux/slab.h | 1 + > mm/slub.c | 54 +++++++++++++++++++++++++++++++++ > net/ipv4/inet_connection_sock.c | 5 +-- > 3 files changed, 58 insertions(+), 2 deletions(-) > > diff --git a/include/linux/slab.h b/include/linux/slab.h > index eb2bf4629157..05cfab107c72 100644 > --- a/include/linux/slab.h > +++ b/include/linux/slab.h > @@ -547,6 +547,7 @@ void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru, > gfp_t gfpflags) __assume_slab_alignment __malloc; > #define kmem_cache_alloc_lru(...) alloc_hooks(kmem_cache_alloc_lru_noprof(__VA_ARGS__)) > > +bool kmem_cache_charge(void *objp, gfp_t gfpflags); > void kmem_cache_free(struct kmem_cache *s, void *objp); > > kmem_buckets *kmem_buckets_create(const char *name, slab_flags_t flags, > diff --git a/mm/slub.c b/mm/slub.c > index c9d8a2497fd6..580683597b5c 100644 > --- a/mm/slub.c > +++ b/mm/slub.c > @@ -2185,6 +2185,16 @@ void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p, > > __memcg_slab_free_hook(s, slab, p, objects, obj_exts); > } > + > +static __fastpath_inline > +bool memcg_slab_post_charge(struct kmem_cache *s, void *p, gfp_t flags) > +{ > + if (likely(!memcg_kmem_online())) > + return true; We do have this check in kmem_cache_charge(), why do we need to check it again? 
> + > + return __memcg_slab_post_alloc_hook(s, NULL, flags, 1, &p); > +} > + > #else /* CONFIG_MEMCG */ > static inline bool memcg_slab_post_alloc_hook(struct kmem_cache *s, > struct list_lru *lru, > @@ -2198,6 +2208,13 @@ static inline void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab, > void **p, int objects) > { > } > + > +static inline bool memcg_slab_post_charge(struct kmem_cache *s, > + void *p, > + gfp_t flags) > +{ > + return true; > +} > #endif /* CONFIG_MEMCG */ > > /* > @@ -4062,6 +4079,43 @@ void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru, > } > EXPORT_SYMBOL(kmem_cache_alloc_lru_noprof); > > +#define KMALLOC_TYPE (SLAB_KMALLOC | SLAB_CACHE_DMA | \ > + SLAB_ACCOUNT | SLAB_RECLAIM_ACCOUNT) > + > +bool kmem_cache_charge(void *objp, gfp_t gfpflags) > +{ > + struct slabobj_ext *slab_exts; > + struct kmem_cache *s; > + struct folio *folio; > + struct slab *slab; > + unsigned long off; > + > + if (!memcg_kmem_online()) > + return true; > + > + folio = virt_to_folio(objp); > + if (unlikely(!folio_test_slab(folio))) > + return false; Does it handle the case of a too-big-to-be-a-slab-object allocation? I think it's better to handle it properly. Also, why return false here? > + > + slab = folio_slab(folio); > + s = slab->slab_cache; > + > + /* Ignore KMALLOC_NORMAL cache to avoid circular dependency. */ > + if ((s->flags & KMALLOC_TYPE) == SLAB_KMALLOC) > + return true; And true here? It seems to be a bit inconsistent. Also, if we have this check here, it means your function won't handle kmallocs at all? Because !KMALLOC_NORMAL allocations won't get here. > + > + /* Ignore already charged objects. */ > + slab_exts = slab_obj_exts(slab); > + if (slab_exts) { > + off = obj_to_index(s, slab, objp); > + if (unlikely(slab_exts[off].objcg)) > + return true; > + } > + > + return memcg_slab_post_charge(s, objp, gfpflags); > +} > +EXPORT_SYMBOL(kmem_cache_charge); > + > /** > * kmem_cache_alloc_node - Allocate an object on the specified node > * @s: The cache to allocate from. > diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c > index 64d07b842e73..3c13ca8c11fb 100644 > --- a/net/ipv4/inet_connection_sock.c > +++ b/net/ipv4/inet_connection_sock.c > @@ -715,6 +715,7 @@ struct sock *inet_csk_accept(struct sock *sk, struct proto_accept_arg *arg) > release_sock(sk); > if (newsk && mem_cgroup_sockets_enabled) { > int amt = 0; > + gfp_t gfp = GFP_KERNEL | __GFP_NOFAIL; > > /* atomically get the memory usage, set and charge the > * newsk->sk_memcg. > @@ -731,8 +732,8 @@ struct sock *inet_csk_accept(struct sock *sk, struct proto_accept_arg *arg) > } > > if (amt) > - mem_cgroup_charge_skmem(newsk->sk_memcg, amt, > - GFP_KERNEL | __GFP_NOFAIL); > + mem_cgroup_charge_skmem(newsk->sk_memcg, amt, gfp); > + kmem_cache_charge(newsk, gfp); Wait, so we assume that newsk->sk_memcg === current memcg? Or we're ok with them being different? Thanks!
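For reference, the flow under discussion, condensed from the patch (a sketch only: everything except the two calls is elided, and sk_prot_alloc() is named as a stand-in for the actual socket allocation path):

/* Softirq context, e.g. creating the socket for an incoming
 * connection: "current" is whatever task happened to be interrupted,
 * so the allocation is left uncharged. */
newsk = sk_prot_alloc(prot, GFP_ATOMIC, family);

/* Later, in inet_csk_accept(), running in the accepting task's
 * context: charge the already-allocated object to the right memcg. */
kmem_cache_charge(newsk, GFP_KERNEL | __GFP_NOFAIL);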
Hi Shakeel,

kernel test robot noticed the following build errors:

[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on linus/master v6.11-rc5 next-20240827]
[cannot apply to vbabka-slab/for-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Shakeel-Butt/memcg-add-charging-of-already-allocated-slab-objects/20240827-073150
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20240826232908.4076417-1-shakeel.butt%40linux.dev
patch subject: [PATCH v1] memcg: add charging of already allocated slab objects
config: openrisc-allnoconfig (https://download.01.org/0day-ci/archive/20240827/202408272111.T6bMZmU9-lkp@intel.com/config)
compiler: or1k-linux-gcc (GCC) 14.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240827/202408272111.T6bMZmU9-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202408272111.T6bMZmU9-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from include/asm-generic/bug.h:5,
                    from arch/openrisc/include/asm/bug.h:5,
                    from include/linux/bug.h:5,
                    from include/linux/mmdebug.h:5,
                    from include/linux/mm.h:6,
                    from mm/slub.c:13:
   mm/slub.c: In function 'kmem_cache_charge':
>> mm/slub.c:4115:44: error: 'struct slabobj_ext' has no member named 'objcg'
    4115 |         if (unlikely(slab_exts[off].objcg))
         |                                     ^
   include/linux/compiler.h:77:45: note: in definition of macro 'unlikely'
      77 | # define unlikely(x) __builtin_expect(!!(x), 0)
         |                                          ^

vim +4115 mm/slub.c

  4085
  4086  #define KMALLOC_TYPE (SLAB_KMALLOC | SLAB_CACHE_DMA | \
  4087                        SLAB_ACCOUNT | SLAB_RECLAIM_ACCOUNT)
  4088
  4089  bool kmem_cache_charge(void *objp, gfp_t gfpflags)
  4090  {
  4091          struct slabobj_ext *slab_exts;
  4092          struct kmem_cache *s;
  4093          struct folio *folio;
  4094          struct slab *slab;
  4095          unsigned long off;
  4096
  4097          if (!memcg_kmem_online())
  4098                  return true;
  4099
  4100          folio = virt_to_folio(objp);
  4101          if (unlikely(!folio_test_slab(folio)))
  4102                  return false;
  4103
  4104          slab = folio_slab(folio);
  4105          s = slab->slab_cache;
  4106
  4107          /* Ignore KMALLOC_NORMAL cache to avoid circular dependency. */
  4108          if ((s->flags & KMALLOC_TYPE) == SLAB_KMALLOC)
  4109                  return true;
  4110
  4111          /* Ignore already charged objects. */
  4112          slab_exts = slab_obj_exts(slab);
  4113          if (slab_exts) {
  4114                  off = obj_to_index(s, slab, objp);
> 4115                  if (unlikely(slab_exts[off].objcg))
  4116                          return true;
  4117          }
  4118
  4119          return memcg_slab_post_charge(s, objp, gfpflags);
  4120  }
  4121  EXPORT_SYMBOL(kmem_cache_charge);
  4122
Hi Shakeel,

kernel test robot noticed the following build errors:

[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on linus/master v6.11-rc5 next-20240827]
[cannot apply to vbabka-slab/for-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Shakeel-Butt/memcg-add-charging-of-already-allocated-slab-objects/20240827-073150
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20240826232908.4076417-1-shakeel.butt%40linux.dev
patch subject: [PATCH v1] memcg: add charging of already allocated slab objects
config: s390-allnoconfig (https://download.01.org/0day-ci/archive/20240827/202408272341.k4cl3jz0-lkp@intel.com/config)
compiler: clang version 20.0.0git (https://github.com/llvm/llvm-project 08e5a1de8227512d4774a534b91cb2353cef6284)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240827/202408272341.k4cl3jz0-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202408272341.k4cl3jz0-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from mm/slub.c:13:
   In file included from include/linux/mm.h:2198:
   include/linux/vmstat.h:518:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
     518 |         return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
         |                               ~~~~~~~~~~~ ^ ~~~
   In file included from mm/slub.c:49:
   In file included from mm/internal.h:13:
   include/linux/mm_inline.h:47:41: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
      47 |         __mod_lruvec_state(lruvec, NR_LRU_BASE + lru, nr_pages);
         |                                    ~~~~~~~~~~~ ^ ~~~
   include/linux/mm_inline.h:49:22: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
      49 |                         NR_ZONE_LRU_BASE + lru, nr_pages);
         |                         ~~~~~~~~~~~~~~~~ ^ ~~~
>> mm/slub.c:4115:31: error: no member named 'objcg' in 'struct slabobj_ext'
    4115 |         if (unlikely(slab_exts[off].objcg))
         |                      ~~~~~~~~~~~~~~ ^
   include/linux/compiler.h:77:42: note: expanded from macro 'unlikely'
      77 | # define unlikely(x) __builtin_expect(!!(x), 0)
         |                                          ^
   3 warnings and 1 error generated.

vim +4115 mm/slub.c

  4085
  4086  #define KMALLOC_TYPE (SLAB_KMALLOC | SLAB_CACHE_DMA | \
  4087                        SLAB_ACCOUNT | SLAB_RECLAIM_ACCOUNT)
  4088
  4089  bool kmem_cache_charge(void *objp, gfp_t gfpflags)
  4090  {
  4091          struct slabobj_ext *slab_exts;
  4092          struct kmem_cache *s;
  4093          struct folio *folio;
  4094          struct slab *slab;
  4095          unsigned long off;
  4096
  4097          if (!memcg_kmem_online())
  4098                  return true;
  4099
  4100          folio = virt_to_folio(objp);
  4101          if (unlikely(!folio_test_slab(folio)))
  4102                  return false;
  4103
  4104          slab = folio_slab(folio);
  4105          s = slab->slab_cache;
  4106
  4107          /* Ignore KMALLOC_NORMAL cache to avoid circular dependency. */
  4108          if ((s->flags & KMALLOC_TYPE) == SLAB_KMALLOC)
  4109                  return true;
  4110
  4111          /* Ignore already charged objects. */
  4112          slab_exts = slab_obj_exts(slab);
  4113          if (slab_exts) {
  4114                  off = obj_to_index(s, slab, objp);
> 4115                  if (unlikely(slab_exts[off].objcg))
  4116                          return true;
  4117          }
  4118
  4119          return memcg_slab_post_charge(s, objp, gfpflags);
  4120  }
  4121  EXPORT_SYMBOL(kmem_cache_charge);
  4122
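Both robot reports point at the same root cause: struct slabobj_ext only has an objcg member when CONFIG_MEMCG is enabled, yet kmem_cache_charge() dereferences it unconditionally. One possible shape of the rearrangement discussed below, sketched with a hypothetical helper (slab_obj_is_charged() is made up purely for illustration):

#ifdef CONFIG_MEMCG
/* The objcg member exists here, so the check can be real. */
static bool slab_obj_is_charged(struct kmem_cache *s, struct slab *slab,
				void *objp)
{
	struct slabobj_ext *slab_exts = slab_obj_exts(slab);

	if (!slab_exts)
		return false;

	return slab_exts[obj_to_index(s, slab, objp)].objcg != NULL;
}
#else
/* Without memcg, nothing is ever charged. */
static bool slab_obj_is_charged(struct kmem_cache *s, struct slab *slab,
				void *objp)
{
	return false;
}
#endif /* CONFIG_MEMCG */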
On Tue, Aug 27, 2024 at 03:06:32AM GMT, Roman Gushchin wrote: > On Mon, Aug 26, 2024 at 04:29:08PM -0700, Shakeel Butt wrote: > > At the moment, the slab objects are charged to the memcg at the > > allocation time. However there are cases where slab objects are > > allocated at the time where the right target memcg to charge it to is > > not known. One such case is the network sockets for the incoming > > connection which are allocated in the softirq context. > > > > Couple hundred thousand connections are very normal on large loaded > > server and almost all of those sockets underlying those connections get > > allocated in the softirq context and thus not charged to any memcg. > > However later at the accept() time we know the right target memcg to > > charge. Let's add new API to charge already allocated objects, so we can > > have better accounting of the memory usage. > > > > To measure the performance impact of this change, tcp_crr is used from > > the neper [1] performance suite. Basically it is a network ping pong > > test with new connection for each ping pong. > > > > The server and the client are run inside 3 level of cgroup hierarchy > > using the following commands: > > > > Server: > > $ tcp_crr -6 > > > > Client: > > $ tcp_crr -6 -c -H ${server_ip} > > > > If the client and server run on different machines with 50 GBPS NIC, > > there is no visible impact of the change. > > > > For the same machine experiment with v6.11-rc5 as base. > > > > base (throughput) with-patch > > tcp_crr 14545 (+- 80) 14463 (+- 56) > > > > It seems like the performance impact is within the noise. > > > > Link: https://github.com/google/neper [1] > > Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev> > > Hi Shakeel, > > I like the idea and performance numbers look good. However some comments on > the implementation: > Thanks for taking a look. > > --- > > > > Changes since the RFC: > > - Added check for already charged slab objects. > > - Added performance results from neper's tcp_crr > > > > include/linux/slab.h | 1 + > > mm/slub.c | 54 +++++++++++++++++++++++++++++++++ > > net/ipv4/inet_connection_sock.c | 5 +-- > > 3 files changed, 58 insertions(+), 2 deletions(-) > > > > diff --git a/include/linux/slab.h b/include/linux/slab.h > > index eb2bf4629157..05cfab107c72 100644 > > --- a/include/linux/slab.h > > +++ b/include/linux/slab.h > > @@ -547,6 +547,7 @@ void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru, > > gfp_t gfpflags) __assume_slab_alignment __malloc; > > #define kmem_cache_alloc_lru(...) alloc_hooks(kmem_cache_alloc_lru_noprof(__VA_ARGS__)) > > > > +bool kmem_cache_charge(void *objp, gfp_t gfpflags); > > void kmem_cache_free(struct kmem_cache *s, void *objp); > > > > kmem_buckets *kmem_buckets_create(const char *name, slab_flags_t flags, > > diff --git a/mm/slub.c b/mm/slub.c > > index c9d8a2497fd6..580683597b5c 100644 > > --- a/mm/slub.c > > +++ b/mm/slub.c > > @@ -2185,6 +2185,16 @@ void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p, > > > > __memcg_slab_free_hook(s, slab, p, objects, obj_exts); > > } > > + > > +static __fastpath_inline > > +bool memcg_slab_post_charge(struct kmem_cache *s, void *p, gfp_t flags) > > +{ > > + if (likely(!memcg_kmem_online())) > > + return true; > > We do have this check in kmem_cache_charge(), why do we need to check it again? > I missed to remove this one. I am going to rearrange the code bit more in these functions to avoid the build errors in non MEMCG builds. 
> > + > > + return __memcg_slab_post_alloc_hook(s, NULL, flags, 1, &p); > > +} > > + > > #else /* CONFIG_MEMCG */ > > static inline bool memcg_slab_post_alloc_hook(struct kmem_cache *s, > > struct list_lru *lru, > > @@ -2198,6 +2208,13 @@ static inline void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab, > > void **p, int objects) > > { > > } > > + > > +static inline bool memcg_slab_post_charge(struct kmem_cache *s, > > + void *p, > > + gfp_t flags) > > +{ > > + return true; > > +} > > #endif /* CONFIG_MEMCG */ > > > > /* > > @@ -4062,6 +4079,43 @@ void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru, > > } > > EXPORT_SYMBOL(kmem_cache_alloc_lru_noprof); > > > > +#define KMALLOC_TYPE (SLAB_KMALLOC | SLAB_CACHE_DMA | \ > > + SLAB_ACCOUNT | SLAB_RECLAIM_ACCOUNT) > > + > > +bool kmem_cache_charge(void *objp, gfp_t gfpflags) > > +{ > > + struct slabobj_ext *slab_exts; > > + struct kmem_cache *s; > > + struct folio *folio; > > + struct slab *slab; > > + unsigned long off; > > + > > + if (!memcg_kmem_online()) > > + return true; > > + > > + folio = virt_to_folio(objp); > > + if (unlikely(!folio_test_slab(folio))) > > + return false; > > Does it handle the case of a too-big-to-be-a-slab-object allocation? > I think it's better to handle it properly. Also, why return false here? > Yes I will fix the too-big-to-be-a-slab-object allocations. I presume I should just follow the kfree() hanlding on !folio_test_slab() i.e. that the given object is the large or too-big-to-be-a-slab-object. > > + > > + slab = folio_slab(folio); > > + s = slab->slab_cache; > > + > > + /* Ignore KMALLOC_NORMAL cache to avoid circular dependency. */ > > + if ((s->flags & KMALLOC_TYPE) == SLAB_KMALLOC) > > + return true; > > And true here? It seems to be a bit inconsistent. Will be consistent after handling of the too-big-to-be-a-slab-object. > Also, if we have this check here, it means your function won't handle kmallocs > at all? Because !KMALLOC_NORMAL allocations won't get here. The non-KMALLOC_NORMAL kmalloc caches should also have one of SLAB_CACHE_DMA, SLAB_ACCOUNT and SLAB_RECLAIM_ACCOUNT flag, so the above check will only be true for KMALLOC_NORMAL caches. > > > + > > + /* Ignore already charged objects. */ > > + slab_exts = slab_obj_exts(slab); > > + if (slab_exts) { > > + off = obj_to_index(s, slab, objp); > > + if (unlikely(slab_exts[off].objcg)) > > + return true; > > + } > > + > > + return memcg_slab_post_charge(s, objp, gfpflags); > > +} > > +EXPORT_SYMBOL(kmem_cache_charge); > > + > > /** > > * kmem_cache_alloc_node - Allocate an object on the specified node > > * @s: The cache to allocate from. > > diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c > > index 64d07b842e73..3c13ca8c11fb 100644 > > --- a/net/ipv4/inet_connection_sock.c > > +++ b/net/ipv4/inet_connection_sock.c > > @@ -715,6 +715,7 @@ struct sock *inet_csk_accept(struct sock *sk, struct proto_accept_arg *arg) > > release_sock(sk); > > if (newsk && mem_cgroup_sockets_enabled) { > > int amt = 0; > > + gfp_t gfp = GFP_KERNEL | __GFP_NOFAIL; > > > > /* atomically get the memory usage, set and charge the > > * newsk->sk_memcg. > > @@ -731,8 +732,8 @@ struct sock *inet_csk_accept(struct sock *sk, struct proto_accept_arg *arg) > > } > > > > if (amt) > > - mem_cgroup_charge_skmem(newsk->sk_memcg, amt, > > - GFP_KERNEL | __GFP_NOFAIL); > > + mem_cgroup_charge_skmem(newsk->sk_memcg, amt, gfp); > > + kmem_cache_charge(newsk, gfp); > > Wait, so we assume that newsk->sk_memcg === current memcg? 
> Or we're ok with them being different?

We set newsk->sk_memcg in the same function (see the call to mem_cgroup_sk_alloc(newsk) a couple of lines above), so newsk->sk_memcg will be equal to the current memcg.

Thanks a lot for the valuable feedback.
Shakeel
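Shakeel's point about the flag mask is easy to sanity-check with a small self-contained userspace model (the bit values below are made up; only the boolean logic mirrors the patch):

#include <stdio.h>

#define SLAB_KMALLOC         0x1u
#define SLAB_CACHE_DMA       0x2u
#define SLAB_ACCOUNT         0x4u
#define SLAB_RECLAIM_ACCOUNT 0x8u
#define KMALLOC_TYPE (SLAB_KMALLOC | SLAB_CACHE_DMA | \
		      SLAB_ACCOUNT | SLAB_RECLAIM_ACCOUNT)

/* True only when SLAB_KMALLOC is the sole mask bit set, i.e. a
 * KMALLOC_NORMAL cache. */
static int is_kmalloc_normal(unsigned int flags)
{
	return (flags & KMALLOC_TYPE) == SLAB_KMALLOC;
}

int main(void)
{
	printf("normal: %d\n", is_kmalloc_normal(SLAB_KMALLOC));                        /* 1 */
	printf("cg:     %d\n", is_kmalloc_normal(SLAB_KMALLOC | SLAB_ACCOUNT));         /* 0 */
	printf("dma:    %d\n", is_kmalloc_normal(SLAB_KMALLOC | SLAB_CACHE_DMA));       /* 0 */
	printf("rcl:    %d\n", is_kmalloc_normal(SLAB_KMALLOC | SLAB_RECLAIM_ACCOUNT)); /* 0 */
	printf("cache:  %d\n", is_kmalloc_normal(SLAB_ACCOUNT)); /* 0: not a kmalloc cache */
	return 0;
}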
> On Aug 28, 2024, at 01:23, Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> On Tue, Aug 27, 2024 at 03:06:32AM GMT, Roman Gushchin wrote:
>> On Mon, Aug 26, 2024 at 04:29:08PM -0700, Shakeel Butt wrote:
> [...]
>>> + folio = virt_to_folio(objp);
>>> + if (unlikely(!folio_test_slab(folio)))
>>> + return false;
>>
>> Does it handle the case of a too-big-to-be-a-slab-object allocation?
>> I think it's better to handle it properly. Also, why return false here?
>>
>
> Yes I will fix the too-big-to-be-a-slab-object allocations. I presume I
> should just follow the kfree() hanlding on !folio_test_slab() i.e. that
> the given object is the large or too-big-to-be-a-slab-object. I am going
> to rearrange the code a bit more in these functions to avoid the build
> errors in non-MEMCG builds.

Hi Shakeel,

If we decide to do this, I suppose you will use memcg_kmem_charge_page
to charge big-object. To be consistent, I suggest renaming kmem_cache_charge
to memcg_kmem_charge to handle both slab object and big-object. And I saw
all the functions related to object charging is moved to memcontrol.c (e.g.
__memcg_slab_post_alloc_hook), so maybe we should also do this for
memcg_kmem_charge?

Muchun,
Thanks.
Hi Muchun, On Wed, Aug 28, 2024 at 10:36:06AM GMT, Muchun Song wrote: > > > > On Aug 28, 2024, at 01:23, Shakeel Butt <shakeel.butt@linux.dev> wrote: > > [...] > >> > >> Does it handle the case of a too-big-to-be-a-slab-object allocation? > >> I think it's better to handle it properly. Also, why return false here? > >> > > > > Yes I will fix the too-big-to-be-a-slab-object allocations. I presume I > > should just follow the kfree() hanlding on !folio_test_slab() i.e. that > > the given object is the large or too-big-to-be-a-slab-object. > > Hi Shakeel, > > If we decide to do this, I suppose you will use memcg_kmem_charge_page > to charge big-object. To be consistent, I suggest renaming kmem_cache_charge > to memcg_kmem_charge to handle both slab object and big-object. And I saw > all the functions related to object charging is moved to memcontrol.c (e.g. > __memcg_slab_post_alloc_hook), so maybe we should also do this for > memcg_kmem_charge? > If I understand you correctly, you are suggesting to handle the general kmem charging and slab's large kmalloc (size > KMALLOC_MAX_CACHE_SIZE) together with memcg_kmem_charge(). However that is not possible due to slab path updating NR_SLAB_UNRECLAIMABLE_B stats while no updates for this stat in the general kmem charging path (__memcg_kmem_charge_page in page allocation code path). Also this general kmem charging path is used by many other users like vmalloc, kernel stack and thus we can not just plainly stuck updates to NR_SLAB_UNRECLAIMABLE_B in that path. Thanks for taking a look. Shakeel
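Shakeel's accounting argument can be restated as a minimal userspace stand-in (all names here are ad-hoc models, not kernel APIs): the slab charge path bumps a slab-specific per-memcg stat on top of the plain charge, which is exactly what generic kmem users such as vmalloc and kernel stacks must not get:

#include <stdio.h>

/* Toy model of a memcg's counters. */
struct memcg {
	long kmem_charged;            /* bytes charged to the cgroup */
	long nr_slab_unreclaimable_b; /* slab-only stat, in bytes */
};

/* Models the generic path (__memcg_kmem_charge_page): shared by
 * vmalloc, kernel stacks, etc., so it must not touch slab stats. */
static void kmem_charge(struct memcg *mc, long bytes)
{
	mc->kmem_charged += bytes;
}

/* Models the slab path: the plain charge plus the slab stat. */
static void slab_charge(struct memcg *mc, long bytes)
{
	kmem_charge(mc, bytes);
	mc->nr_slab_unreclaimable_b += bytes;
}

int main(void)
{
	struct memcg mc = { 0, 0 };

	slab_charge(&mc, 4096);   /* e.g. a large kmalloc object */
	kmem_charge(&mc, 16384);  /* e.g. a kernel stack */

	/* Only the slab charge moved the slab stat. */
	printf("charged=%ld slab=%ld\n",
	       mc.kmem_charged, mc.nr_slab_unreclaimable_b);
	return 0;
}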
> On Aug 29, 2024, at 03:03, Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> Hi Muchun,
>
> On Wed, Aug 28, 2024 at 10:36:06AM GMT, Muchun Song wrote:
> [...]
>
> If I understand you correctly, you are suggesting to handle the general
> kmem charging and slab's large kmalloc (size > KMALLOC_MAX_CACHE_SIZE)
> together with memcg_kmem_charge(). However that is not possible due to
> slab path updating NR_SLAB_UNRECLAIMABLE_B stats while no updates for
> this stat in the general kmem charging path (__memcg_kmem_charge_page in
> page allocation code path).
>
> Also this general kmem charging path is used by many other users like
> vmalloc, kernel stack and thus we can not just plainly stuck updates to
> NR_SLAB_UNRECLAIMABLE_B in that path.

Sorry, maybe I was not clear. To make sure we are on the same page, let
me clarify my thought. In your v2, I thought we could rename
kmem_cache_charge() to memcg_kmem_charge(), since kmem_cache_charge()
already handles both the big-slab-object (size > KMALLOC_MAX_CACHE_SIZE)
and small-slab-object cases. You know, we have the function
memcg_kmem_charge_page(), which could be used for charging a
big-slab-object but not a small-slab-object. So I thought maybe
memcg_kmem_charge() is a good name for it to handle both cases. And if
we do this, how about moving this new function to memcontrol.c, since
all memcg charging functions are moved to memcontrol.c instead of
slub.c?

Muchun,
Thanks.
On Thu, Aug 29, 2024 at 10:36:01AM GMT, Muchun Song wrote:
> [...]
>
> Sorry, maybe I was not clear. To make sure we are on the same page, let
> me clarify my thought. In your v2, I thought we could rename
> kmem_cache_charge() to memcg_kmem_charge(), since kmem_cache_charge()
> already handles both the big-slab-object (size > KMALLOC_MAX_CACHE_SIZE)
> and small-slab-object cases. You know, we have the function
> memcg_kmem_charge_page(), which could be used for charging a
> big-slab-object but not a small-slab-object. So I thought maybe
> memcg_kmem_charge() is a good name for it to handle both cases. And if
> we do this, how about moving this new function to memcontrol.c, since
> all memcg charging functions are moved to memcontrol.c instead of
> slub.c?
>

Oh you want the core function to be in memcontrol.c. I don't have any
strong opinion on where the code should live, but I do want the
interface to still be kmem_cache_charge(), because that is what we are
providing to the users for charging slab objects. Yes, some of those
might be big-slab-objects, but that is transparent to the users.

Anyways, for now I will go with my current approach, but in a followup
I will explore and discuss with you which code should live in which
file. I hope that is acceptable to you.

thanks,
Shakeel
> On Aug 29, 2024, at 23:49, Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> On Thu, Aug 29, 2024 at 10:36:01AM GMT, Muchun Song wrote:
> [...]
>
> Oh you want the core function to be in memcontrol.c. I don't have any
> strong opinion on where the code should live, but I do want the
> interface to still be kmem_cache_charge(), because that is what we are
> providing to the users for charging slab objects. Yes, some of those
> might be big-slab-objects, but that is transparent to the users.
>
> Anyways, for now I will go with my current approach, but in a followup
> I will explore and discuss with you which code should live in which
> file. I hope that is acceptable to you.

Fine. No problem. Thanks.
diff --git a/include/linux/slab.h b/include/linux/slab.h
index eb2bf4629157..05cfab107c72 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -547,6 +547,7 @@ void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
 			   gfp_t gfpflags) __assume_slab_alignment __malloc;
 #define kmem_cache_alloc_lru(...)	alloc_hooks(kmem_cache_alloc_lru_noprof(__VA_ARGS__))
 
+bool kmem_cache_charge(void *objp, gfp_t gfpflags);
 void kmem_cache_free(struct kmem_cache *s, void *objp);
 
 kmem_buckets *kmem_buckets_create(const char *name, slab_flags_t flags,
diff --git a/mm/slub.c b/mm/slub.c
index c9d8a2497fd6..580683597b5c 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2185,6 +2185,16 @@ void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
 
 	__memcg_slab_free_hook(s, slab, p, objects, obj_exts);
 }
+
+static __fastpath_inline
+bool memcg_slab_post_charge(struct kmem_cache *s, void *p, gfp_t flags)
+{
+	if (likely(!memcg_kmem_online()))
+		return true;
+
+	return __memcg_slab_post_alloc_hook(s, NULL, flags, 1, &p);
+}
+
 #else /* CONFIG_MEMCG */
 static inline bool memcg_slab_post_alloc_hook(struct kmem_cache *s,
 					      struct list_lru *lru,
@@ -2198,6 +2208,13 @@ static inline void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
 					void **p, int objects)
 {
 }
+
+static inline bool memcg_slab_post_charge(struct kmem_cache *s,
+					  void *p,
+					  gfp_t flags)
+{
+	return true;
+}
 #endif /* CONFIG_MEMCG */
 
 /*
@@ -4062,6 +4079,43 @@ void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
 }
 EXPORT_SYMBOL(kmem_cache_alloc_lru_noprof);
 
+#define KMALLOC_TYPE (SLAB_KMALLOC | SLAB_CACHE_DMA | \
+		      SLAB_ACCOUNT | SLAB_RECLAIM_ACCOUNT)
+
+bool kmem_cache_charge(void *objp, gfp_t gfpflags)
+{
+	struct slabobj_ext *slab_exts;
+	struct kmem_cache *s;
+	struct folio *folio;
+	struct slab *slab;
+	unsigned long off;
+
+	if (!memcg_kmem_online())
+		return true;
+
+	folio = virt_to_folio(objp);
+	if (unlikely(!folio_test_slab(folio)))
+		return false;
+
+	slab = folio_slab(folio);
+	s = slab->slab_cache;
+
+	/* Ignore KMALLOC_NORMAL cache to avoid circular dependency. */
+	if ((s->flags & KMALLOC_TYPE) == SLAB_KMALLOC)
+		return true;
+
+	/* Ignore already charged objects. */
+	slab_exts = slab_obj_exts(slab);
+	if (slab_exts) {
+		off = obj_to_index(s, slab, objp);
+		if (unlikely(slab_exts[off].objcg))
+			return true;
+	}
+
+	return memcg_slab_post_charge(s, objp, gfpflags);
+}
+EXPORT_SYMBOL(kmem_cache_charge);
+
 /**
  * kmem_cache_alloc_node - Allocate an object on the specified node
  * @s: The cache to allocate from.
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 64d07b842e73..3c13ca8c11fb 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -715,6 +715,7 @@ struct sock *inet_csk_accept(struct sock *sk, struct proto_accept_arg *arg)
 	release_sock(sk);
 	if (newsk && mem_cgroup_sockets_enabled) {
 		int amt = 0;
+		gfp_t gfp = GFP_KERNEL | __GFP_NOFAIL;
 
 		/* atomically get the memory usage, set and charge the
 		 * newsk->sk_memcg.
@@ -731,8 +732,8 @@ struct sock *inet_csk_accept(struct sock *sk, struct proto_accept_arg *arg)
 		}
 
 		if (amt)
-			mem_cgroup_charge_skmem(newsk->sk_memcg, amt,
-						GFP_KERNEL | __GFP_NOFAIL);
+			mem_cgroup_charge_skmem(newsk->sk_memcg, amt, gfp);
+		kmem_cache_charge(newsk, gfp);
 
 		release_sock(newsk);
 	}
At the moment, slab objects are charged to the memcg at allocation time. However, there are cases where slab objects are allocated at a time when the right target memcg to charge them to is not known. One such case is the network sockets for incoming connections, which are allocated in softirq context.

A couple hundred thousand connections are very normal on a large loaded server, and almost all of the sockets underlying those connections get allocated in softirq context and thus are not charged to any memcg. However, later at accept() time we know the right target memcg to charge. Let's add a new API to charge already allocated objects, so we can have better accounting of the memory usage.

To measure the performance impact of this change, tcp_crr is used from the neper [1] performance suite. Basically it is a network ping-pong test with a new connection for each ping-pong.

The server and the client are run inside a 3-level cgroup hierarchy using the following commands:

Server:
$ tcp_crr -6

Client:
$ tcp_crr -6 -c -H ${server_ip}

If the client and server run on different machines with a 50 Gbps NIC, there is no visible impact of the change.

For the same-machine experiment, with v6.11-rc5 as base:

          base (throughput)   with-patch
tcp_crr   14545 (+- 80)       14463 (+- 56)

It seems like the performance impact is within the noise.

Link: https://github.com/google/neper [1]
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---

Changes since the RFC:
- Added check for already charged slab objects.
- Added performance results from neper's tcp_crr

 include/linux/slab.h            |  1 +
 mm/slub.c                       | 54 +++++++++++++++++++++++++++++++++
 net/ipv4/inet_connection_sock.c |  5 +--
 3 files changed, 58 insertions(+), 2 deletions(-)