[v4,bpf-next,1/3] bpf: Add kmem_cache iterator

Message ID	20241002180956.1781008-2-namhyung@kernel.org (mailing list archive)
State	New
Headers	show Return-Path: <owner-linux-mm@kvack.org> From: Namhyung Kim <namhyung@kernel.org> To: Alexei Starovoitov <ast@kernel.org>, Daniel Borkmann <daniel@iogearbox.net>, Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <martin.lau@linux.dev>, Eduard Zingerman <eddyz87@gmail.com>, Song Liu <song@kernel.org>, Yonghong Song <yonghong.song@linux.dev>, John Fastabend <john.fastabend@gmail.com>, KP Singh <kpsingh@kernel.org>, Stanislav Fomichev <sdf@fomichev.me>, Hao Luo <haoluo@google.com>, Jiri Olsa <jolsa@kernel.org>, LKML <linux-kernel@vger.kernel.org>, bpf@vger.kernel.org, Andrew Morton <akpm@linux-foundation.org>, Christoph Lameter <cl@linux.com>, Pekka Enberg <penberg@kernel.org>, David Rientjes <rientjes@google.com>, Joonsoo Kim <iamjoonsoo.kim@lge.com>, Vlastimil Babka <vbabka@suse.cz>, Roman Gushchin <roman.gushchin@linux.dev>, Hyeonggon Yoo <42.hyeyoo@gmail.com>, linux-mm@kvack.org, Arnaldo Carvalho de Melo <acme@kernel.org>, Kees Cook <kees@kernel.org> Subject: [PATCH v4 bpf-next 1/3] bpf: Add kmem_cache iterator Date: Wed, 2 Oct 2024 11:09:54 -0700 Message-ID: <20241002180956.1781008-2-namhyung@kernel.org> In-Reply-To: <20241002180956.1781008-1-namhyung@kernel.org> References: <20241002180956.1781008-1-namhyung@kernel.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: owner-linux-mm@kvack.org Precedence: bulk
Series	bpf: Add kmem_cache iterator and kfunc \| expand [v4,bpf-next,0/3] bpf: Add kmem_cache iterator and kfunc [v4,bpf-next,1/3] bpf: Add kmem_cache iterator [v4,bpf-next,2/3] mm/bpf: Add bpf_get_kmem_cache() kfunc [v4,bpf-next,3/3] selftests/bpf: Add a test for kmem_cache_iter

Namhyung Kim Oct. 2, 2024, 6:09 p.m. UTC

The new "kmem_cache" iterator will traverse the list of slab caches
and call attached BPF programs for each entry.  It should check the
argument (ctx.s) if it's NULL before using it.

Now the iteration grabs the slab_mutex only if it traverse the list and
releases the mutex when it runs the BPF program.  The kmem_cache entry
is protected by a refcount during the execution.

It includes the internal "mm/slab.h" header to access kmem_cache,
slab_caches and slab_mutex.  Hope it's ok to mm folks.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
I've removed the Acked-by's from Roman and Vlastimil since it's changed
not to hold the slab_mutex and to manage the refcount.  Please review
this change again!

 include/linux/btf_ids.h      |   1 +
 kernel/bpf/Makefile          |   1 +
 kernel/bpf/kmem_cache_iter.c | 174 +++++++++++++++++++++++++++++++++++
 3 files changed, 176 insertions(+)
 create mode 100644 kernel/bpf/kmem_cache_iter.c

Vlastimil Babka Oct. 3, 2024, 7:35 a.m. UTC | #1

On 10/2/24 20:09, Namhyung Kim wrote:
> The new "kmem_cache" iterator will traverse the list of slab caches
> and call attached BPF programs for each entry.  It should check the
> argument (ctx.s) if it's NULL before using it.
> 
> Now the iteration grabs the slab_mutex only if it traverse the list and
> releases the mutex when it runs the BPF program.  The kmem_cache entry
> is protected by a refcount during the execution.
> 
> It includes the internal "mm/slab.h" header to access kmem_cache,
> slab_caches and slab_mutex.  Hope it's ok to mm folks.
> 
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz> #mm/slab

Song Liu Oct. 4, 2024, 8:33 p.m. UTC | #2

On Wed, Oct 2, 2024 at 11:09 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
[...]
> +
> +       mutex_lock(&slab_mutex);
> +
> +       /*
> +        * Find an entry at the given position in the slab_caches list instead

Nit: style of multi-line comment: "/* Find ...".

> +        * of keeping a reference (of the last visited entry, if any) out of
> +        * slab_mutex. It might miss something if one is deleted in the middle
> +        * while it releases the lock.  But it should be rare and there's not
> +        * much we can do about it.
> +        */
> +       list_for_each_entry(s, &slab_caches, list) {
> +               if (cnt == *pos) {
> +                       /*
> +                        * Make sure this entry remains in the list by getting
> +                        * a new reference count.  Note that boot_cache entries
> +                        * have a negative refcount, so don't touch them.
> +                        */
> +                       if (s->refcount > 0)
> +                               s->refcount++;
> +                       found = true;
> +                       break;
> +               }
> +               cnt++;
> +       }
> +       mutex_unlock(&slab_mutex);
> +
> +       if (!found)
> +               return NULL;
> +
> +       ++*pos;
> +       return s;
> +}
> +
> +static void kmem_cache_iter_seq_stop(struct seq_file *seq, void *v)
> +{
> +       struct bpf_iter_meta meta;
> +       struct bpf_iter__kmem_cache ctx = {
> +               .meta = &meta,
> +               .s = v,
> +       };
> +       struct bpf_prog *prog;
> +       bool destroy = false;
> +
> +       meta.seq = seq;
> +       prog = bpf_iter_get_info(&meta, true);
> +       if (prog)
> +               bpf_iter_run_prog(prog, &ctx);
> +
> +       if (ctx.s == NULL)
> +               return;
> +
> +       mutex_lock(&slab_mutex);
> +
> +       /* Skip kmem_cache_destroy() for active entries */
> +       if (ctx.s->refcount > 1)
> +               ctx.s->refcount--;
> +       else if (ctx.s->refcount == 1)
> +               destroy = true;
> +
> +       mutex_unlock(&slab_mutex);
> +
> +       if (destroy)
> +               kmem_cache_destroy(ctx.s);
> +}
> +
> +static void *kmem_cache_iter_seq_next(struct seq_file *seq, void *v, loff_t *pos)
> +{
> +       struct kmem_cache *s = v;
> +       struct kmem_cache *next = NULL;
> +       bool destroy = false;
> +
> +       ++*pos;
> +
> +       mutex_lock(&slab_mutex);
> +
> +       if (list_last_entry(&slab_caches, struct kmem_cache, list) != s) {
> +               next = list_next_entry(s, list);
> +               if (next->refcount > 0)
> +                       next->refcount++;

What if next->refcount <=0? Shall we find next of next?

> +       }
> +
> +       /* Skip kmem_cache_destroy() for active entries */
> +       if (s->refcount > 1)
> +               s->refcount--;
> +       else if (s->refcount == 1)
> +               destroy = true;
> +
> +       mutex_unlock(&slab_mutex);
> +
> +       if (destroy)
> +               kmem_cache_destroy(s);
> +
> +       return next;
> +}
[...]

Song Liu Oct. 4, 2024, 8:45 p.m. UTC | #3

On Wed, Oct 2, 2024 at 11:09 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
[...]
> +
> +static void *kmem_cache_iter_seq_start(struct seq_file *seq, loff_t *pos)
> +{
> +       loff_t cnt = 0;
> +       bool found = false;
> +       struct kmem_cache *s;
> +
> +       mutex_lock(&slab_mutex);
> +
> +       /*
> +        * Find an entry at the given position in the slab_caches list instead
> +        * of keeping a reference (of the last visited entry, if any) out of
> +        * slab_mutex. It might miss something if one is deleted in the middle
> +        * while it releases the lock.  But it should be rare and there's not
> +        * much we can do about it.
> +        */
> +       list_for_each_entry(s, &slab_caches, list) {
> +               if (cnt == *pos) {
> +                       /*
> +                        * Make sure this entry remains in the list by getting
> +                        * a new reference count.  Note that boot_cache entries
> +                        * have a negative refcount, so don't touch them.
> +                        */
> +                       if (s->refcount > 0)
> +                               s->refcount++;
> +                       found = true;
> +                       break;
> +               }
> +               cnt++;
> +       }
> +       mutex_unlock(&slab_mutex);
> +
> +       if (!found)
> +               return NULL;
> +
> +       ++*pos;

This should be

if (*pos == 0)
    ++*pos;

> +       return s;
> +}
> +
> +static void kmem_cache_iter_seq_stop(struct seq_file *seq, void *v)
[...]

Namhyung Kim Oct. 4, 2024, 9:37 p.m. UTC | #4

Hi Song,

On Fri, Oct 04, 2024 at 01:33:19PM -0700, Song Liu wrote:
> On Wed, Oct 2, 2024 at 11:09 AM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> [...]
> > +
> > +       mutex_lock(&slab_mutex);
> > +
> > +       /*
> > +        * Find an entry at the given position in the slab_caches list instead
> 
> Nit: style of multi-line comment: "/* Find ...".

Ok, will update.

> 
> > +        * of keeping a reference (of the last visited entry, if any) out of
> > +        * slab_mutex. It might miss something if one is deleted in the middle
> > +        * while it releases the lock.  But it should be rare and there's not
> > +        * much we can do about it.
> > +        */
> > +       list_for_each_entry(s, &slab_caches, list) {
> > +               if (cnt == *pos) {
> > +                       /*
> > +                        * Make sure this entry remains in the list by getting
> > +                        * a new reference count.  Note that boot_cache entries
> > +                        * have a negative refcount, so don't touch them.
> > +                        */
> > +                       if (s->refcount > 0)
> > +                               s->refcount++;
> > +                       found = true;
> > +                       break;
> > +               }
> > +               cnt++;
> > +       }
> > +       mutex_unlock(&slab_mutex);
> > +
> > +       if (!found)
> > +               return NULL;
> > +
> > +       ++*pos;
> > +       return s;
> > +}
> > +
> > +static void kmem_cache_iter_seq_stop(struct seq_file *seq, void *v)
> > +{
> > +       struct bpf_iter_meta meta;
> > +       struct bpf_iter__kmem_cache ctx = {
> > +               .meta = &meta,
> > +               .s = v,
> > +       };
> > +       struct bpf_prog *prog;
> > +       bool destroy = false;
> > +
> > +       meta.seq = seq;
> > +       prog = bpf_iter_get_info(&meta, true);
> > +       if (prog)
> > +               bpf_iter_run_prog(prog, &ctx);
> > +
> > +       if (ctx.s == NULL)
> > +               return;
> > +
> > +       mutex_lock(&slab_mutex);
> > +
> > +       /* Skip kmem_cache_destroy() for active entries */
> > +       if (ctx.s->refcount > 1)
> > +               ctx.s->refcount--;
> > +       else if (ctx.s->refcount == 1)
> > +               destroy = true;
> > +
> > +       mutex_unlock(&slab_mutex);
> > +
> > +       if (destroy)
> > +               kmem_cache_destroy(ctx.s);
> > +}
> > +
> > +static void *kmem_cache_iter_seq_next(struct seq_file *seq, void *v, loff_t *pos)
> > +{
> > +       struct kmem_cache *s = v;
> > +       struct kmem_cache *next = NULL;
> > +       bool destroy = false;
> > +
> > +       ++*pos;
> > +
> > +       mutex_lock(&slab_mutex);
> > +
> > +       if (list_last_entry(&slab_caches, struct kmem_cache, list) != s) {
> > +               next = list_next_entry(s, list);
> > +               if (next->refcount > 0)
> > +                       next->refcount++;
> 
> What if next->refcount <=0? Shall we find next of next?

The slab_mutex should protect refcount == 0 case so it won't see that.
The negative refcount means it's a boot_cache and we shouldn't touch the
refcount.

Thanks,
Namhyung

> 
> > +       }
> > +
> > +       /* Skip kmem_cache_destroy() for active entries */
> > +       if (s->refcount > 1)
> > +               s->refcount--;
> > +       else if (s->refcount == 1)
> > +               destroy = true;
> > +
> > +       mutex_unlock(&slab_mutex);
> > +
> > +       if (destroy)
> > +               kmem_cache_destroy(s);
> > +
> > +       return next;
> > +}
> [...]

Namhyung Kim Oct. 4, 2024, 9:42 p.m. UTC | #5

On Fri, Oct 04, 2024 at 01:45:09PM -0700, Song Liu wrote:
> On Wed, Oct 2, 2024 at 11:09 AM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> [...]
> > +
> > +static void *kmem_cache_iter_seq_start(struct seq_file *seq, loff_t *pos)
> > +{
> > +       loff_t cnt = 0;
> > +       bool found = false;
> > +       struct kmem_cache *s;
> > +
> > +       mutex_lock(&slab_mutex);
> > +
> > +       /*
> > +        * Find an entry at the given position in the slab_caches list instead
> > +        * of keeping a reference (of the last visited entry, if any) out of
> > +        * slab_mutex. It might miss something if one is deleted in the middle
> > +        * while it releases the lock.  But it should be rare and there's not
> > +        * much we can do about it.
> > +        */
> > +       list_for_each_entry(s, &slab_caches, list) {
> > +               if (cnt == *pos) {
> > +                       /*
> > +                        * Make sure this entry remains in the list by getting
> > +                        * a new reference count.  Note that boot_cache entries
> > +                        * have a negative refcount, so don't touch them.
> > +                        */
> > +                       if (s->refcount > 0)
> > +                               s->refcount++;
> > +                       found = true;
> > +                       break;
> > +               }
> > +               cnt++;
> > +       }
> > +       mutex_unlock(&slab_mutex);
> > +
> > +       if (!found)
> > +               return NULL;
> > +
> > +       ++*pos;
> 
> This should be
> 
> if (*pos == 0)
>     ++*pos;

Oh, I thought there's check for seq->count after the seq->op->show()
for the ->start().  I need to check this logic again, thanks for
pointing this out.

Thanks,
Namhyung

> 
> > +       return s;
> > +}
> > +
> > +static void kmem_cache_iter_seq_stop(struct seq_file *seq, void *v)
> [...]

Song Liu Oct. 4, 2024, 9:46 p.m. UTC | #6

On Fri, Oct 4, 2024 at 2:37 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Hi Song,
>
> On Fri, Oct 04, 2024 at 01:33:19PM -0700, Song Liu wrote:
[...]
> > > +
> > > +static void *kmem_cache_iter_seq_next(struct seq_file *seq, void *v, loff_t *pos)
> > > +{
> > > +       struct kmem_cache *s = v;
> > > +       struct kmem_cache *next = NULL;
> > > +       bool destroy = false;
> > > +
> > > +       ++*pos;
> > > +
> > > +       mutex_lock(&slab_mutex);
> > > +
> > > +       if (list_last_entry(&slab_caches, struct kmem_cache, list) != s) {
> > > +               next = list_next_entry(s, list);
> > > +               if (next->refcount > 0)
> > > +                       next->refcount++;
> >
> > What if next->refcount <=0? Shall we find next of next?
>
> The slab_mutex should protect refcount == 0 case so it won't see that.
> The negative refcount means it's a boot_cache and we shouldn't touch the
> refcount.

I see. Thanks for the explanation!

Please add a comment here, and maybe also add

  WARN_ON_ONCE(next ->refcount == 0).

Song

Namhyung Kim Oct. 4, 2024, 11:29 p.m. UTC | #7

On Fri, Oct 04, 2024 at 02:46:43PM -0700, Song Liu wrote:
> On Fri, Oct 4, 2024 at 2:37 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > Hi Song,
> >
> > On Fri, Oct 04, 2024 at 01:33:19PM -0700, Song Liu wrote:
> [...]
> > > > +
> > > > +static void *kmem_cache_iter_seq_next(struct seq_file *seq, void *v, loff_t *pos)
> > > > +{
> > > > +       struct kmem_cache *s = v;
> > > > +       struct kmem_cache *next = NULL;
> > > > +       bool destroy = false;
> > > > +
> > > > +       ++*pos;
> > > > +
> > > > +       mutex_lock(&slab_mutex);
> > > > +
> > > > +       if (list_last_entry(&slab_caches, struct kmem_cache, list) != s) {
> > > > +               next = list_next_entry(s, list);
> > > > +               if (next->refcount > 0)
> > > > +                       next->refcount++;
> > >
> > > What if next->refcount <=0? Shall we find next of next?
> >
> > The slab_mutex should protect refcount == 0 case so it won't see that.
> > The negative refcount means it's a boot_cache and we shouldn't touch the
> > refcount.
> 
> I see. Thanks for the explanation!
> 
> Please add a comment here, and maybe also add
> 
>   WARN_ON_ONCE(next ->refcount == 0).

Sure, thanks for your review!
Namhyung

[v4,bpf-next,1/3] bpf: Add kmem_cache iterator

Commit Message

Comments

Patch