
[v4] memcg: add charging of already allocated slab objects

Message ID: 20240905173422.1565480-1-shakeel.butt@linux.dev (mailing list archive)
State: New
Series: [v4] memcg: add charging of already allocated slab objects

Commit Message

Shakeel Butt Sept. 5, 2024, 5:34 p.m. UTC
At the moment, slab objects are charged to the memcg at allocation
time. However, there are cases where slab objects are allocated at a
time when the right target memcg to charge them to is not known. One
such case is network sockets for incoming connections, which are
allocated in softirq context.

A couple hundred thousand connections are very normal on a large,
loaded server, and almost all of the sockets underlying those
connections get allocated in softirq context and thus are not charged
to any memcg. However, later at accept() time we know the right target
memcg to charge. Let's add a new API to charge already allocated
objects, so we can have better accounting of the memory usage.
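
In rough sketch form (the alloc call below is illustrative; the actual
call site added by this patch is in inet_csk_accept()):

  /* softirq context: the right memcg to charge is not known yet */
  newsk = kmem_cache_alloc(prot->slab, GFP_ATOMIC);
  ...
  /* later, at accept() time, in the context of the accepting task */
  kmem_cache_charge(newsk, GFP_KERNEL | __GFP_NOFAIL);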

To measure the performance impact of this change, tcp_crr from the
neper [1] performance suite is used. It is essentially a network
ping-pong test with a new connection for each ping-pong.

The server and the client are run inside a 3-level cgroup hierarchy
using the following commands:

Server:
 $ tcp_crr -6

Client:
 $ tcp_crr -6 -c -H ${server_ip}

If the client and server run on different machines with a 50 Gbps NIC,
there is no visible impact from the change.

For the same-machine experiment, with v6.11-rc5 as the base:

          base (throughput)     with-patch
tcp_crr   14545 (+- 80)         14463 (+- 56)

The performance impact seems to be within the noise (the ~0.6%
throughput delta is comparable to the reported variation).

Link: https://github.com/google/neper [1]
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
---
v3: https://lore.kernel.org/all/20240829175339.2424521-1-shakeel.butt@linux.dev/
Changes since v3:
- Add kernel doc for kmem_cache_charge.

v2: https://lore.kernel.org/all/20240827235228.1591842-1-shakeel.butt@linux.dev/
Changes since v2:
- Add handling of already charged large kmalloc objects.
- Move the normal kmalloc cache check into a function.

v1: https://lore.kernel.org/all/20240826232908.4076417-1-shakeel.butt@linux.dev/
Changes since v1:
- Correctly handle large allocations which bypass slab
- Rearrange code to avoid compilation errors for !CONFIG_MEMCG builds

RFC: https://lore.kernel.org/all/20240824010139.1293051-1-shakeel.butt@linux.dev/
Changes since the RFC:
- Added check for already charged slab objects.
- Added performance results from neper's tcp_crr


 include/linux/slab.h            | 20 ++++++++++++++
 mm/slab.h                       |  7 +++++
 mm/slub.c                       | 49 +++++++++++++++++++++++++++++++++
 net/ipv4/inet_connection_sock.c |  5 ++--
 4 files changed, 79 insertions(+), 2 deletions(-)

Comments

Yosry Ahmed Sept. 5, 2024, 5:48 p.m. UTC | #1
On Thu, Sep 5, 2024 at 10:34 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
[..]

LGTM from an MM perspective with a few nits below. FWIW:
Reviewed-by: Yosry Ahmed <yosryahmed@google.com>

[..]
> +/**
> + * kmem_cache_charge - memcg charge an already allocated slab memory
> + * @objp: address of the slab object to memcg charge.
> + * @gfpflags: describe the allocation context
> + *
> + * kmem_cache_charge is the normal method to charge a slab object to the current
> + * memcg. The objp should be pointer returned by the slab allocator functions
> + * like kmalloc or kmem_cache_alloc. The memcg charge behavior can be controller

s/controller/controlled

[..]
> +static __fastpath_inline
> +bool memcg_slab_post_charge(void *p, gfp_t flags)
> +{
> +       struct slabobj_ext *slab_exts;
> +       struct kmem_cache *s;
> +       struct folio *folio;
> +       struct slab *slab;
> +       unsigned long off;
> +
> +       folio = virt_to_folio(p);
> +       if (!folio_test_slab(folio)) {
> +               return folio_memcg_kmem(folio) ||

If the folio is charged user memory, we will still double charge here,
but that would be a bug. We can put a warning in this case or use
folio_memcg() instead to avoid double charges in that case as well.

> +                       (__memcg_kmem_charge_page(folio_page(folio, 0), flags,
> +                                                 folio_order(folio)) == 0);
> +       }
> +
> +       slab = folio_slab(folio);
> +       s = slab->slab_cache;
> +
> +       /* Ignore KMALLOC_NORMAL cache to avoid circular dependency. */

Is it possible to point to the commit that has the explanation here?
The one you pointed me to before? Otherwise it's not really obvious
where the circular dependency comes from (at least to me).

[..]
Shakeel Butt Sept. 5, 2024, 6:48 p.m. UTC | #2
On Thu, Sep 05, 2024 at 10:48:50AM GMT, Yosry Ahmed wrote:
> On Thu, Sep 5, 2024 at 10:34 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
[..]
> 
> LGTM from an MM perspective with a few nits below. FWIW:
> Reviewed-by: Yosry Ahmed <yosryahmed@google.com>

Thanks.

[..]
> > +/**
> > + * kmem_cache_charge - memcg charge an already allocated slab memory
> > + * @objp: address of the slab object to memcg charge.
> > + * @gfpflags: describe the allocation context
> > + *
> > + * kmem_cache_charge is the normal method to charge a slab object to the current
> > + * memcg. The objp should be pointer returned by the slab allocator functions
> > + * like kmalloc or kmem_cache_alloc. The memcg charge behavior can be controller
> 
> s/controller/controlled

Thanks. Vlastimil please fix this when you pick this up.

[..]
> > +static __fastpath_inline
> > +bool memcg_slab_post_charge(void *p, gfp_t flags)
> > +{
> > +       struct slabobj_ext *slab_exts;
> > +       struct kmem_cache *s;
> > +       struct folio *folio;
> > +       struct slab *slab;
> > +       unsigned long off;
> > +
> > +       folio = virt_to_folio(p);
> > +       if (!folio_test_slab(folio)) {
> > +               return folio_memcg_kmem(folio) ||
> 
> If the folio is charged user memory, we will still double charge here,
> but that would be a bug. We can put a warning in this case or use
> folio_memcg() instead to avoid double charges in that case as well.
>

I don't think we need to do anything for such scenarios, similar to how
other kmem functions handle them. For example, passing user memory to
kfree() is treated the same way, and there is no warning there either.
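
For reference, a simplified sketch of kfree()'s handling in mm/slub.c
(tracing and the regular slab free path elided):

	void kfree(const void *object)
	{
		struct folio *folio;

		if (unlikely(ZERO_OR_NULL_PTR(object)))
			return;

		folio = virt_to_folio(object);
		if (unlikely(!folio_test_slab(folio))) {
			/* not a slab object: freed as a large kmalloc, no warning */
			free_large_kmalloc(folio, (void *)object);
			return;
		}
		/* ... regular slab free path ... */
	}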

> > +                       (__memcg_kmem_charge_page(folio_page(folio, 0), flags,
> > +                                                 folio_order(folio)) == 0);
> > +       }
> > +
> > +       slab = folio_slab(folio);
> > +       s = slab->slab_cache;
> > +
> > +       /* Ignore KMALLOC_NORMAL cache to avoid circular dependency. */
> 
> Is it possible to point to the commit that has the explanation here?
> The one you pointed me to before? Otherwise it's not really obvious
> where the circular dependency comes from (at least to me).
> 

Not sure about the commit reference. We can add more text here.
Vlastimil, how much detail do you prefer?

thanks,
Shakeel
Vlastimil Babka Sept. 6, 2024, 8:52 a.m. UTC | #3
On 9/5/24 20:48, Shakeel Butt wrote:
[..]
>> > +/**
>> > + * kmem_cache_charge - memcg charge an already allocated slab memory
>> > + * @objp: address of the slab object to memcg charge.
>> > + * @gfpflags: describe the allocation context
>> > + *
>> > + * kmem_cache_charge is the normal method to charge a slab object to the current

what is "normal method"? 

>> > + * memcg. The objp should be pointer returned by the slab allocator functions
>> > + * like kmalloc or kmem_cache_alloc. The memcg charge behavior can be controller
>> 
>> s/controller/controlled
> 
> Thanks. Vlastimil please fix this when you pick this up.

I felt it could be improved more, so ended up with this. Thoughts?

/**
 * kmem_cache_charge - memcg charge an already allocated slab memory
 * @objp: address of the slab object to memcg charge
 * @gfpflags: describe the allocation context
 *
 * kmem_cache_charge allows charging a slab object to the current memcg,
 * primarily in cases where charging at allocation time might not be possible
 * because the target memcg is not known (i.e. softirq context)
 *
 * The objp should be pointer returned by the slab allocator functions like
 * kmalloc (with __GFP_ACCOUNT in flags) or kmem_cache_alloc. The memcg charge
 * behavior can be controlled through gfpflags parameter, which affects how the
 * necessary internal metadata can be allocated. Including __GFP_NOFAIL denotes
 * that overcharging is requested instead of failure, but is not applied for the
 * internal metadata allocation.
 *
 * There are several cases where it will return true even if the charging
 * was not done. More specifically:
 *
 * 1. For !CONFIG_MEMCG or cgroup_disable=memory systems.
 * 2. Already charged slab objects.
 * 3. For slab objects from KMALLOC_NORMAL caches - allocated by kmalloc()
 *    without __GFP_ACCOUNT
 * 4. Allocating internal metadata has failed
 *
 * Return: true if charge was successful otherwise false.
 */
 
>> > +
>> > +       /* Ignore KMALLOC_NORMAL cache to avoid circular dependency. */
>> 
>> Is it possible to point to the commit that has the explanation here?
>> The one you pointed me to before? Otherwise it's not really obvious
>> where the circular dependency comes from (at least to me).
>> 
> 
> Not sure about the commit reference. We can add more text here.
> Vlastimil, how much detail do you prefer?

What about:

        /*
         * Ignore KMALLOC_NORMAL cache to avoid possible circular dependency
         * of slab_obj_exts being allocated from the same slab and thus the slab
         * becoming effectively unfreeable.
         */

 
> thanks,
> Shakeel
Shakeel Butt Sept. 6, 2024, 4:03 p.m. UTC | #4
On Fri, Sep 06, 2024 at 10:52:04AM GMT, Vlastimil Babka wrote:
> On 9/5/24 20:48, Shakeel Butt wrote:
[..]
> >> > +/**
> >> > + * kmem_cache_charge - memcg charge an already allocated slab memory
> >> > + * @objp: address of the slab object to memcg charge.
> >> > + * @gfpflags: describe the allocation context
> >> > + *
> >> > + * kmem_cache_charge is the normal method to charge a slab object to the current
> 
> what is "normal method"? 

This is just a copy-paste from the kmalloc() documentation.

[..]
> 
> I felt it could be improved more, so ended up with this. Thoughts?
> 
> /**
>  * kmem_cache_charge - memcg charge an already allocated slab memory
>  * @objp: address of the slab object to memcg charge
>  * @gfpflags: describe the allocation context
>  *
>  * kmem_cache_charge allows charging a slab object to the current memcg,
>  * primarily in cases where charging at allocation time might not be possible
>  * because the target memcg is not known (i.e. softirq context)
>  *
>  * The objp should be pointer returned by the slab allocator functions like
>  * kmalloc (with __GFP_ACCOUNT in flags) or kmem_cache_alloc. The memcg charge
>  * behavior can be controlled through gfpflags parameter, which affects how the
>  * necessary internal metadata can be allocated. Including __GFP_NOFAIL denotes
>  * that overcharging is requested instead of failure, but is not applied for the
>  * internal metadata allocation.
>  *
>  * There are several cases where it will return true even if the charging
>  * was not done. More specifically:
>  *
>  * 1. For !CONFIG_MEMCG or cgroup_disable=memory systems.
>  * 2. Already charged slab objects.
>  * 3. For slab objects from KMALLOC_NORMAL caches - allocated by kmalloc()
>  *    without __GFP_ACCOUNT
>  * 4. Allocating internal metadata has failed
>  *
>  * Return: true if charge was successful otherwise false.
>  */
>  

Yes, this is much better.

> >> > +
> >> > +       /* Ignore KMALLOC_NORMAL cache to avoid circular dependency. */
> >> 
> >> Is it possible to point to the commit that has the explanation here?
> >> The one you pointed me to before? Otherwise it's not really obvious
> >> where the circular dependency comes from (at least to me).
> >> 
> > 
> > Not sure about the commit reference. We can add more text here.
> > Vlastimil, how much detail do you prefer?
> 
> What about:
> 
>         /*
>          * Ignore KMALLOC_NORMAL cache to avoid possible circular dependency
>          * of slab_obj_exts being allocated from the same slab and thus the slab
>          * becoming effectively unfreeable.
>          */
> 

Looks great to me.

thanks,
Shakeel
Yosry Ahmed Sept. 6, 2024, 5:19 p.m. UTC | #5
[..]
> I felt it could be improved more, so ended up with this. Thoughts?
>
> /**
>  * kmem_cache_charge - memcg charge an already allocated slab memory
>  * @objp: address of the slab object to memcg charge
>  * @gfpflags: describe the allocation context
>  *
>  * kmem_cache_charge allows charging a slab object to the current memcg,
>  * primarily in cases where charging at allocation time might not be possible
>  * because the target memcg is not known (i.e. softirq context)
>  *
>  * The objp should be pointer returned by the slab allocator functions like
>  * kmalloc (with __GFP_ACCOUNT in flags) or kmem_cache_alloc. The memcg charge

Aren't allocations done with kmalloc(__GFP_ACCOUNT) already accounted?
Why would we need to call kmem_cache_charge() for those?

I am assuming what you are referring to is kmalloc() allocations that
are not fulfilled from KMALLOC_NORMAL caches, but I am not sure how to
capture this here.

[..]
Vlastimil Babka Sept. 6, 2024, 5:28 p.m. UTC | #6
On 9/6/24 19:19, Yosry Ahmed wrote:
> [..]
>> I felt it could be improved more, so ended up with this. Thoughts?
>>
>> /**
>>  * kmem_cache_charge - memcg charge an already allocated slab memory
>>  * @objp: address of the slab object to memcg charge
>>  * @gfpflags: describe the allocation context
>>  *
>>  * kmem_cache_charge allows charging a slab object to the current memcg,
>>  * primarily in cases where charging at allocation time might not be possible
>>  * because the target memcg is not known (i.e. softirq context)
>>  *
>>  * The objp should be pointer returned by the slab allocator functions like
>>  * kmalloc (with __GFP_ACCOUNT in flags) or kmem_cache_alloc. The memcg charge
> 
> Aren't allocations done with kmalloc(__GFP_ACCOUNT) already accounted?
> Why would we need to call kmem_cache_charge() for those?

AFAIU current_obj_cgroup() returns NULL because we're in the interrupt
context and no remote memcg context has been set. Thus the charging is
skipped. The patch commit log describes such a scenario for network receive.
But in the case of kmalloc() the allocation must still have been attempted
with __GFP_ACCOUNT, so a kmalloc-cg cache is used even if the charging fails.

If there's another usage for kmem_cache_charge() where the memcg is
available but we don't want to charge immediately on purpose (such as
Linus' idea for struct file), we might need to find another way to tell
kmalloc() to use the kmalloc-cg cache but not charge immediately...
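
For reference, the bucket selection boils down to something like this
(a simplified sketch of the kmalloc_type() helper; the real code uses
bit tricks):

	if ((flags & (__GFP_DMA | __GFP_RECLAIMABLE | __GFP_ACCOUNT)) == 0)
		return KMALLOC_NORMAL;
	if (flags & __GFP_DMA)
		return KMALLOC_DMA;
	if (flags & __GFP_RECLAIMABLE)
		return KMALLOC_RECLAIM;
	return KMALLOC_CGROUP;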

[..]
Yosry Ahmed Sept. 6, 2024, 5:38 p.m. UTC | #7
On Fri, Sep 6, 2024 at 10:29 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 9/6/24 19:19, Yosry Ahmed wrote:
> > [..]
> >> I felt it could be improved more, so ended up with this. Thoughts?
> >>
> >> /**
> >>  * kmem_cache_charge - memcg charge an already allocated slab memory
> >>  * @objp: address of the slab object to memcg charge
> >>  * @gfpflags: describe the allocation context
> >>  *
> >>  * kmem_cache_charge allows charging a slab object to the current memcg,
> >>  * primarily in cases where charging at allocation time might not be possible
> >>  * because the target memcg is not known (i.e. softirq context)
> >>  *
> >>  * The objp should be pointer returned by the slab allocator functions like
> >>  * kmalloc (with __GFP_ACCOUNT in flags) or kmem_cache_alloc. The memcg charge
> >
> > Aren't allocations done with kmalloc(__GFP_ACCOUNT) already accounted?
> > Why would we need to call kmem_cache_charge() for those?
>
> AFAIU current_obj_cgroup() returns NULL because we're in the interrupt
> context and no remote memcg context has been set. Thus the charging is
> skipped. The patch commit log describes such a scenario for network receive.

Oh yeah I missed that part. I thought the networking allocations in
interrupt context are made without __GFP_ACCOUNT to begin with.

> But in the case of kmalloc() the allocation must still have been attempted
> with __GFP_ACCOUNT, so a kmalloc-cg cache is used even if the charging fails.

It is still possible that the initial allocation did not have
__GFP_ACCOUNT, but not from a KMALLOC_NORMAL cache (e.g. KMALLOC_DMA
or KMALLOC_RECLAIM). In this case kmem_cache_charge() should still
work, right?

>
> If there's another usage for kmem_cache_charge() where the memcg is
> available but we don't want to charge immediately on purpose (such as
> Linus' idea for struct file), we might need to find another way to tell
> kmalloc() to use the kmalloc-cg cache but not charge immediately...

Can we just use a dedicated kmem_cache for this instead?
Shakeel Butt Sept. 6, 2024, 7:04 p.m. UTC | #8
On Fri, Sep 06, 2024 at 07:28:56PM GMT, Vlastimil Babka wrote:
> On 9/6/24 19:19, Yosry Ahmed wrote:
> > [..]
> >> I felt it could be improved more, so ended up with this. Thoughts?
> >>
> >> /**
> >>  * kmem_cache_charge - memcg charge an already allocated slab memory
> >>  * @objp: address of the slab object to memcg charge
> >>  * @gfpflags: describe the allocation context
> >>  *
> >>  * kmem_cache_charge allows charging a slab object to the current memcg,
> >>  * primarily in cases where charging at allocation time might not be possible
> >>  * because the target memcg is not known (i.e. softirq context)
> >>  *
> >>  * The objp should be pointer returned by the slab allocator functions like
> >>  * kmalloc (with __GFP_ACCOUNT in flags) or kmem_cache_alloc. The memcg charge
> > 
> > Aren't allocations done with kmalloc(__GFP_ACCOUNT) already accounted?
> > Why would we need to call kmem_cache_charge() for those?
> 
> AFAIU current_obj_cgroup() returns NULL because we're in the interrupt
> context and no remote memcg context has been set. Thus the charging is
> skipped. The patch commit log describes such a scenario for network receive.
> But in the case of kmalloc() the allocation must still have been attempted
> with __GFP_ACCOUNT, so a kmalloc-cg cache is used even if the charging fails.
> 
> If there's another usage for kmem_cache_charge() where the memcg is
> available but we don't want to charge immediately on purpose (such as
> Linus' idea for struct file), we might need to find another way to tell
> kmalloc() to use the kmalloc-cg cache but not charge immediately...
> 

For struct file, we already have a dedicated kmem_cache (filp_cachep),
so no additional handling would be needed. However, in the future we
might have cases where we want kmalloc allocations to happen from
non-normal kmalloc caches, and then we can add a mechanism to support
such cases.
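
A rough sketch of what that could look like for struct file
(hypothetical, not part of this patch; it assumes SLAB_ACCOUNT would be
dropped from filp_cachep):

	/* allocate without charging */
	f = kmem_cache_zalloc(filp_cachep, GFP_KERNEL);
	...
	/* charge later, once we know the file will stick around */
	kmem_cache_charge(f, GFP_KERNEL);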
Vlastimil Babka Sept. 9, 2024, 7:59 a.m. UTC | #9
On 9/6/24 19:38, Yosry Ahmed wrote:
>> But in the case of kmalloc() the allocation must still have been attempted
>> with __GFP_ACCOUNT, so a kmalloc-cg cache is used even if the charging fails.
> 
> It is still possible that the initial allocation did not have
> __GFP_ACCOUNT, but not from a KMALLOC_NORMAL cache (e.g. KMALLOC_DMA
> or KMALLOC_RECLAIM). In this case kmem_cache_charge() should still
> work, right?

Yeah it would work, but that's rather a corner case implementation detail so
it's better to just require __GFP_ACCOUNT for kmalloc() in the comment.

>>
>> If there's another usage for kmem_cache_charge() where the memcg is
>> available but we don't want to charge immediately on purpose (such as the
>> Linus' idea for struct file), we might need to find another way to tell
>> kmalloc() to use the kmalloc-cg cache but not charge immediately...
> 
> Can we just use a dedicated kmem_cache for this instead?

Right, as Shakeel mentioned, that would be the case for struct file. If
all such cases in the future are fine with a dedicated cache (i.e. not
variable-sized allocations), it should be fine.
Yosry Ahmed Sept. 9, 2024, 5:20 p.m. UTC | #10
On Mon, Sep 9, 2024 at 12:59 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 9/6/24 19:38, Yosry Ahmed wrote:
> >> But in the case of kmalloc() the allocation must still have been attempted
> >> with __GFP_ACCOUNT, so a kmalloc-cg cache is used even if the charging fails.
> >
> > It is still possible that the initial allocation did not have
> > __GFP_ACCOUNT, but not from a KMALLOC_NORMAL cache (e.g. KMALLOC_DMA
> > or KMALLOC_RECLAIM). In this case kmem_cache_charge() should still
> > work, right?
>
> Yeah it would work, but that's rather a corner case implementation detail so
> it's better to just require __GFP_ACCOUNT for kmalloc() in the comment.

Fair enough, thanks!
Paolo Abeni Sept. 10, 2024, 8:26 a.m. UTC | #11
On 9/5/24 19:34, Shakeel Butt wrote:
[..]
> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> index 64d07b842e73..3c13ca8c11fb 100644
> --- a/net/ipv4/inet_connection_sock.c
> +++ b/net/ipv4/inet_connection_sock.c
> @@ -715,6 +715,7 @@ struct sock *inet_csk_accept(struct sock *sk, struct proto_accept_arg *arg)
>   	release_sock(sk);
>   	if (newsk && mem_cgroup_sockets_enabled) {
>   		int amt = 0;
> +		gfp_t gfp = GFP_KERNEL | __GFP_NOFAIL;
>   
>   		/* atomically get the memory usage, set and charge the
>   		 * newsk->sk_memcg.
> @@ -731,8 +732,8 @@ struct sock *inet_csk_accept(struct sock *sk, struct proto_accept_arg *arg)
>   		}
>   
>   		if (amt)
> -			mem_cgroup_charge_skmem(newsk->sk_memcg, amt,
> -						GFP_KERNEL | __GFP_NOFAIL);
> +			mem_cgroup_charge_skmem(newsk->sk_memcg, amt, gfp);
> +		kmem_cache_charge(newsk, gfp);
>   
>   		release_sock(newsk);
>   	}

The networking bits look sane to me, with a very minor nit about the
reverse xmas tree ordering of the variable declarations above.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Vlastimil Babka Sept. 10, 2024, 9:19 a.m. UTC | #12
On 9/10/24 10:26, Paolo Abeni wrote:
> On 9/5/24 19:34, Shakeel Butt wrote:
>>    * kmem_cache_alloc_node - Allocate an object on the specified node
>>    * @s: The cache to allocate from.
>> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
>> index 64d07b842e73..3c13ca8c11fb 100644
>> --- a/net/ipv4/inet_connection_sock.c
>> +++ b/net/ipv4/inet_connection_sock.c
>> @@ -715,6 +715,7 @@ struct sock *inet_csk_accept(struct sock *sk, struct proto_accept_arg *arg)
>>   	release_sock(sk);
>>   	if (newsk && mem_cgroup_sockets_enabled) {
>>   		int amt = 0;
>> +		gfp_t gfp = GFP_KERNEL | __GFP_NOFAIL;
>>   
>>   		/* atomically get the memory usage, set and charge the
>>   		 * newsk->sk_memcg.
>> @@ -731,8 +732,8 @@ struct sock *inet_csk_accept(struct sock *sk, struct proto_accept_arg *arg)
>>   		}
>>   
>>   		if (amt)
>> -			mem_cgroup_charge_skmem(newsk->sk_memcg, amt,
>> -						GFP_KERNEL | __GFP_NOFAIL);
>> +			mem_cgroup_charge_skmem(newsk->sk_memcg, amt, gfp);
>> +		kmem_cache_charge(newsk, gfp);
>>   
>>   		release_sock(newsk);
>>   	}
> 
> The networking bits look sane to me, with a very minor nit about the
> reverse xmas tree ordering of the variable declarations above.
> 
> Acked-by: Paolo Abeni <pabeni@redhat.com>

Great, thanks, I will adjust the ordering.
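
I.e. with the longer declaration line first, per the networking reverse
xmas tree convention:

		gfp_t gfp = GFP_KERNEL | __GFP_NOFAIL;
		int amt = 0;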

Patch

diff --git a/include/linux/slab.h b/include/linux/slab.h
index eb2bf4629157..68789c79a530 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -547,6 +547,26 @@  void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
 			    gfp_t gfpflags) __assume_slab_alignment __malloc;
 #define kmem_cache_alloc_lru(...)	alloc_hooks(kmem_cache_alloc_lru_noprof(__VA_ARGS__))
 
+/**
+ * kmem_cache_charge - memcg charge an already allocated slab memory
+ * @objp: address of the slab object to memcg charge.
+ * @gfpflags: describe the allocation context
+ *
+ * kmem_cache_charge is the normal method to charge a slab object to the current
+ * memcg. The objp should be pointer returned by the slab allocator functions
+ * like kmalloc or kmem_cache_alloc. The memcg charge behavior can be controller
+ * through gfpflags parameter.
+ *
+ * There are several cases where it will return true regardless. More
+ * specifically:
+ *
+ * 1. For !CONFIG_MEMCG or cgroup_disable=memory systems.
+ * 2. Already charged slab objects.
+ * 3. For slab objects from KMALLOC_NORMAL caches.
+ *
+ * Return: true if charge was successful otherwise false.
+ */
+bool kmem_cache_charge(void *objp, gfp_t gfpflags);
 void kmem_cache_free(struct kmem_cache *s, void *objp);
 
 kmem_buckets *kmem_buckets_create(const char *name, slab_flags_t flags,
diff --git a/mm/slab.h b/mm/slab.h
index dcdb56b8e7f5..9f907e930609 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -443,6 +443,13 @@  static inline bool is_kmalloc_cache(struct kmem_cache *s)
 	return (s->flags & SLAB_KMALLOC);
 }
 
+static inline bool is_kmalloc_normal(struct kmem_cache *s)
+{
+	if (!is_kmalloc_cache(s))
+		return false;
+	return !(s->flags & (SLAB_CACHE_DMA|SLAB_ACCOUNT|SLAB_RECLAIM_ACCOUNT));
+}
+
 /* Legal flag mask for kmem_cache_create(), for various configurations */
 #define SLAB_CORE_FLAGS (SLAB_HWCACHE_ALIGN | SLAB_CACHE_DMA | \
 			 SLAB_CACHE_DMA32 | SLAB_PANIC | \
diff --git a/mm/slub.c b/mm/slub.c
index c9d8a2497fd6..3f2a89f7a23a 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2185,6 +2185,41 @@  void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
 
 	__memcg_slab_free_hook(s, slab, p, objects, obj_exts);
 }
+
+static __fastpath_inline
+bool memcg_slab_post_charge(void *p, gfp_t flags)
+{
+	struct slabobj_ext *slab_exts;
+	struct kmem_cache *s;
+	struct folio *folio;
+	struct slab *slab;
+	unsigned long off;
+
+	folio = virt_to_folio(p);
+	if (!folio_test_slab(folio)) {
+		return folio_memcg_kmem(folio) ||
+			(__memcg_kmem_charge_page(folio_page(folio, 0), flags,
+						  folio_order(folio)) == 0);
+	}
+
+	slab = folio_slab(folio);
+	s = slab->slab_cache;
+
+	/* Ignore KMALLOC_NORMAL cache to avoid circular dependency. */
+	if (is_kmalloc_normal(s))
+		return true;
+
+	/* Ignore already charged objects. */
+	slab_exts = slab_obj_exts(slab);
+	if (slab_exts) {
+		off = obj_to_index(s, slab, p);
+		if (unlikely(slab_exts[off].objcg))
+			return true;
+	}
+
+	return __memcg_slab_post_alloc_hook(s, NULL, flags, 1, &p);
+}
+
 #else /* CONFIG_MEMCG */
 static inline bool memcg_slab_post_alloc_hook(struct kmem_cache *s,
 					      struct list_lru *lru,
@@ -2198,6 +2233,11 @@  static inline void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
 					void **p, int objects)
 {
 }
+
+static inline bool memcg_slab_post_charge(void *p, gfp_t flags)
+{
+	return true;
+}
 #endif /* CONFIG_MEMCG */
 
 /*
@@ -4062,6 +4102,15 @@  void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
 }
 EXPORT_SYMBOL(kmem_cache_alloc_lru_noprof);
 
+bool kmem_cache_charge(void *objp, gfp_t gfpflags)
+{
+	if (!memcg_kmem_online())
+		return true;
+
+	return memcg_slab_post_charge(objp, gfpflags);
+}
+EXPORT_SYMBOL(kmem_cache_charge);
+
 /**
  * kmem_cache_alloc_node - Allocate an object on the specified node
  * @s: The cache to allocate from.
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 64d07b842e73..3c13ca8c11fb 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -715,6 +715,7 @@  struct sock *inet_csk_accept(struct sock *sk, struct proto_accept_arg *arg)
 	release_sock(sk);
 	if (newsk && mem_cgroup_sockets_enabled) {
 		int amt = 0;
+		gfp_t gfp = GFP_KERNEL | __GFP_NOFAIL;
 
 		/* atomically get the memory usage, set and charge the
 		 * newsk->sk_memcg.
@@ -731,8 +732,8 @@  struct sock *inet_csk_accept(struct sock *sk, struct proto_accept_arg *arg)
 		}
 
 		if (amt)
-			mem_cgroup_charge_skmem(newsk->sk_memcg, amt,
-						GFP_KERNEL | __GFP_NOFAIL);
+			mem_cgroup_charge_skmem(newsk->sk_memcg, amt, gfp);
+		kmem_cache_charge(newsk, gfp);
 
 		release_sock(newsk);
 	}