Message ID | 20241010232505.1339892-2-namhyung@kernel.org (mailing list archive) |
---|---|
State | New |
Series | bpf: Add kmem_cache iterator and kfunc |
On Thu, Oct 10, 2024 at 4:25 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> The new "kmem_cache" iterator will traverse the list of slab caches
> and call attached BPF programs for each entry. It should check the
> argument (ctx.s) if it's NULL before using it.
>
> Now the iteration grabs the slab_mutex only if it traverse the list and

traverses

> releases the mutex when it runs the BPF program. The kmem_cache entry
> is protected by a refcount during the execution.
>
> It includes the internal "mm/slab.h" header to access kmem_cache,
> slab_caches and slab_mutex. Hope it's ok to mm folks.

What was the reason you dropped Vlastimil's and Roman's acks
from this patch while keeping them in patch 2 ?

Folks pls Ack again if it looks ok.

I'm ready to apply, but would like the acks first.

Also I'd like to remove the above paragraph
from mm/slab.h from the commit log.
It was good to ask during v1, but looks odd at v5.

On Thu, Oct 10, 2024 at 4:25 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> +struct bpf_iter__kmem_cache {
> +	__bpf_md_ptr(struct bpf_iter_meta *, meta);
> +	__bpf_md_ptr(struct kmem_cache *, s);
> +};

Just noticed this.
Not your fault. You're copy pasting from bpf_iter__*.
It looks like tech debt.

Andrii, Song,

do you remember why all iters are using this?
__bpf_md_ptr() wrap was necessary in uapi/bpf.h,
but this is kernel iters that go into vmlinux.h
It should be fine to remove them all and
progs wouldn't need to do the ugly dance of:

#define bpf_iter__ksym bpf_iter__ksym___not_used
#include "vmlinux.h"
#undef bpf_iter__ksym

Hello Alexei,

On Fri, Oct 11, 2024 at 11:33:31AM -0700, Alexei Starovoitov wrote:
> On Thu, Oct 10, 2024 at 4:25 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > The new "kmem_cache" iterator will traverse the list of slab caches
> > and call attached BPF programs for each entry. It should check the
> > argument (ctx.s) if it's NULL before using it.
> >
> > Now the iteration grabs the slab_mutex only if it traverse the list and
>
> traverses
>
> > releases the mutex when it runs the BPF program. The kmem_cache entry
> > is protected by a refcount during the execution.
> >
> > It includes the internal "mm/slab.h" header to access kmem_cache,
> > slab_caches and slab_mutex. Hope it's ok to mm folks.
>
> What was the reason you dropped Vlastimil's and Roman's acks
> from this patch while keeping them in patch 2 ?

I wanted to make sure the slab maintainers agree with the refcounting
and the locking logic changes. But I forgot to add back Vlastimil's
Acked for the v4 which is the same in this regard.

>
> Folks pls Ack again if it looks ok.
>
> I'm ready to apply, but would like the acks first.
>
> Also I'd like to remove the above paragraph
> from mm/slab.h from the commit log.
> It was good to ask during v1, but looks odd at v5.

Sure, feel free to make any changes.

Thanks,
Namhyung

On Fri, Oct 11, 2024 at 11:44 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Oct 10, 2024 at 4:25 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > +struct bpf_iter__kmem_cache {
> > +	__bpf_md_ptr(struct bpf_iter_meta *, meta);
> > +	__bpf_md_ptr(struct kmem_cache *, s);
> > +};
>
> Just noticed this.
> Not your fault. You're copy pasting from bpf_iter__*.
> It looks like tech debt.
>
> Andrii, Song,
>
> do you remember why all iters are using this?

I don't *know*, but I suspect we are doing this because of 32-bit host
architecture. BPF-side is always 64-bit, so to make memory layout
inside the kernel and in BPF programs compatible we have to do this
for pointers, no?

> __bpf_md_ptr() wrap was necessary in uapi/bpf.h,
> but this is kernel iters that go into vmlinux.h
> It should be fine to remove them all and
> progs wouldn't need to do the ugly dance of:
>
> #define bpf_iter__ksym bpf_iter__ksym___not_used
> #include "vmlinux.h"
> #undef bpf_iter__ksym

I don't think __bpf_md_ptr is why we are doing this ___not_used dance.
At some point we probably didn't want to rely on having the very
latest vmlinux.h available in BPF selftests, so we chose to define
local versions of all relevant context types.

I think we can drop all that ___not_used dance regardless (and remove
local definitions in progs/bpf_iter.h).

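The 32-bit layout argument above can be checked in user space. The following is only a sketch, not kernel code: `MD_PTR` is a local copy of the shape `__bpf_md_ptr()` has in include/uapi/linux/bpf.h (a union of the pointer with an unnamed 64-bit bitfield, aligned to 8 bytes), and `meta_stub`, `cache_stub`, `ctx_padded`, `ctx_plain` and the two helper functions are hypothetical names introduced for illustration. It assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Local copy of the shape of __bpf_md_ptr(): the unnamed 64-bit bitfield
 * forces every wrapped pointer to occupy 8 bytes, whatever sizeof(void *)
 * is on the host.
 */
#define MD_PTR(type, name)		\
union {					\
	type name;			\
	uint64_t :64;			\
} __attribute__((aligned(8)))

/* Hypothetical opaque stand-ins for bpf_iter_meta and kmem_cache. */
struct meta_stub;
struct cache_stub;

/* Context struct laid out the way the patch declares it. */
struct ctx_padded {
	MD_PTR(struct meta_stub *, meta);
	MD_PTR(struct cache_stub *, s);
};

/* The same struct with bare pointers, as it would look without the wrap. */
struct ctx_plain {
	struct meta_stub *meta;
	struct cache_stub *s;
};

/* The padded layout is fixed: these hold on 32-bit and 64-bit hosts. */
static_assert(sizeof(struct ctx_padded) == 16, "two 8-byte slots");
static_assert(offsetof(struct ctx_padded, s) == 8, "s sits in the second slot");

static inline size_t padded_s_offset(void)
{
	return offsetof(struct ctx_padded, s);
}

/* Depends on the host: sizeof(void *), so 4 on a 32-bit build. */
static inline size_t plain_s_offset(void)
{
	return offsetof(struct ctx_plain, s);
}
```

On a 64-bit build both layouts coincide, so the difference only becomes visible when compiling for a 32-bit target (for example with `-m32`), where `plain_s_offset()` would return 4 while the padded offset stays at 8.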
On Fri, Oct 11, 2024 at 12:41 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Fri, Oct 11, 2024 at 11:44 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Thu, Oct 10, 2024 at 4:25 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > >
> > > +struct bpf_iter__kmem_cache {
> > > +	__bpf_md_ptr(struct bpf_iter_meta *, meta);
> > > +	__bpf_md_ptr(struct kmem_cache *, s);
> > > +};

BTW, do we want/need to define an open-coded iterator version of this,
so that this iteration can be done from other BPF programs? Seems like
it has to be a sleepable BPF program, but that's probably fine?

> >
> > Just noticed this.
> > Not your fault. You're copy pasting from bpf_iter__*.
> > It looks like tech debt.
> >
> > Andrii, Song,
> >
> > do you remember why all iters are using this?
>
> I don't *know*, but I suspect we are doing this because of 32-bit host
> architecture. BPF-side is always 64-bit, so to make memory layout
> inside the kernel and in BPF programs compatible we have to do this
> for pointers, no?
>
> > __bpf_md_ptr() wrap was necessary in uapi/bpf.h,
> > but this is kernel iters that go into vmlinux.h
> > It should be fine to remove them all and
> > progs wouldn't need to do the ugly dance of:
> >
> > #define bpf_iter__ksym bpf_iter__ksym___not_used
> > #include "vmlinux.h"
> > #undef bpf_iter__ksym
>
> I don't think __bpf_md_ptr is why we are doing this ___not_used dance.
> At some point we probably didn't want to rely on having the very
> latest vmlinux.h available in BPF selftests, so we chose to define
> local versions of all relevant context types.
>
> I think we can drop all that ___not_used dance regardless (and remove
> local definitions in progs/bpf_iter.h).

On 10/11/24 01:25, Namhyung Kim wrote:
> The new "kmem_cache" iterator will traverse the list of slab caches
> and call attached BPF programs for each entry. It should check the
> argument (ctx.s) if it's NULL before using it.
>
> Now the iteration grabs the slab_mutex only if it traverse the list and
> releases the mutex when it runs the BPF program. The kmem_cache entry
> is protected by a refcount during the execution.
>
> It includes the internal "mm/slab.h" header to access kmem_cache,
> slab_caches and slab_mutex. Hope it's ok to mm folks.

Yeah this paragraph can be dropped.

> Signed-off-by: Namhyung Kim <namhyung@kernel.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz> #slab

diff --git a/include/linux/btf_ids.h b/include/linux/btf_ids.h
index c0e3e1426a82f5c4..139bdececdcfaefb 100644
--- a/include/linux/btf_ids.h
+++ b/include/linux/btf_ids.h
@@ -283,5 +283,6 @@ extern u32 btf_tracing_ids[];
 extern u32 bpf_cgroup_btf_id[];
 extern u32 bpf_local_storage_map_btf_id[];
 extern u32 btf_bpf_map_id[];
+extern u32 bpf_kmem_cache_btf_id[];
 
 #endif
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 9b9c151b5c826b31..105328f0b9c04e37 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -52,3 +52,4 @@ obj-$(CONFIG_BPF_PRELOAD) += preload/
 obj-$(CONFIG_BPF_SYSCALL) += relo_core.o
 obj-$(CONFIG_BPF_SYSCALL) += btf_iter.o
 obj-$(CONFIG_BPF_SYSCALL) += btf_relocate.o
+obj-$(CONFIG_BPF_SYSCALL) += kmem_cache_iter.o
diff --git a/kernel/bpf/kmem_cache_iter.c b/kernel/bpf/kmem_cache_iter.c
new file mode 100644
index 0000000000000000..2de0682c6d4c773f
--- /dev/null
+++ b/kernel/bpf/kmem_cache_iter.c
@@ -0,0 +1,175 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2024 Google */
+#include <linux/bpf.h>
+#include <linux/btf_ids.h>
+#include <linux/slab.h>
+#include <linux/kernel.h>
+#include <linux/seq_file.h>
+
+#include "../../mm/slab.h" /* kmem_cache, slab_caches and slab_mutex */
+
+struct bpf_iter__kmem_cache {
+	__bpf_md_ptr(struct bpf_iter_meta *, meta);
+	__bpf_md_ptr(struct kmem_cache *, s);
+};
+
+static void *kmem_cache_iter_seq_start(struct seq_file *seq, loff_t *pos)
+{
+	loff_t cnt = 0;
+	bool found = false;
+	struct kmem_cache *s;
+
+	mutex_lock(&slab_mutex);
+
+	/* Find an entry at the given position in the slab_caches list instead
+	 * of keeping a reference (of the last visited entry, if any) out of
+	 * slab_mutex. It might miss something if one is deleted in the middle
+	 * while it releases the lock. But it should be rare and there's not
+	 * much we can do about it.
+	 */
+	list_for_each_entry(s, &slab_caches, list) {
+		if (cnt == *pos) {
+			/* Make sure this entry remains in the list by getting
+			 * a new reference count. Note that boot_cache entries
+			 * have a negative refcount, so don't touch them.
+			 */
+			if (s->refcount > 0)
+				s->refcount++;
+			found = true;
+			break;
+		}
+		cnt++;
+	}
+	mutex_unlock(&slab_mutex);
+
+	if (!found)
+		return NULL;
+
+	return s;
+}
+
+static void kmem_cache_iter_seq_stop(struct seq_file *seq, void *v)
+{
+	struct bpf_iter_meta meta;
+	struct bpf_iter__kmem_cache ctx = {
+		.meta = &meta,
+		.s = v,
+	};
+	struct bpf_prog *prog;
+	bool destroy = false;
+
+	meta.seq = seq;
+	prog = bpf_iter_get_info(&meta, true);
+	if (prog && !ctx.s)
+		bpf_iter_run_prog(prog, &ctx);
+
+	if (ctx.s == NULL)
+		return;
+
+	mutex_lock(&slab_mutex);
+
+	/* Skip kmem_cache_destroy() for active entries */
+	if (ctx.s->refcount > 1)
+		ctx.s->refcount--;
+	else if (ctx.s->refcount == 1)
+		destroy = true;
+
+	mutex_unlock(&slab_mutex);
+
+	if (destroy)
+		kmem_cache_destroy(ctx.s);
+}
+
+static void *kmem_cache_iter_seq_next(struct seq_file *seq, void *v, loff_t *pos)
+{
+	struct kmem_cache *s = v;
+	struct kmem_cache *next = NULL;
+	bool destroy = false;
+
+	++*pos;
+
+	mutex_lock(&slab_mutex);
+
+	if (list_last_entry(&slab_caches, struct kmem_cache, list) != s) {
+		next = list_next_entry(s, list);
+
+		WARN_ON_ONCE(next->refcount == 0);
+
+		/* boot_caches have negative refcount, don't touch them */
+		if (next->refcount > 0)
+			next->refcount++;
+	}
+
+	/* Skip kmem_cache_destroy() for active entries */
+	if (s->refcount > 1)
+		s->refcount--;
+	else if (s->refcount == 1)
+		destroy = true;
+
+	mutex_unlock(&slab_mutex);
+
+	if (destroy)
+		kmem_cache_destroy(s);
+
+	return next;
+}
+
+static int kmem_cache_iter_seq_show(struct seq_file *seq, void *v)
+{
+	struct bpf_iter_meta meta;
+	struct bpf_iter__kmem_cache ctx = {
+		.meta = &meta,
+		.s = v,
+	};
+	struct bpf_prog *prog;
+	int ret = 0;
+
+	meta.seq = seq;
+	prog = bpf_iter_get_info(&meta, false);
+	if (prog)
+		ret = bpf_iter_run_prog(prog, &ctx);
+
+	return ret;
+}
+
+static const struct seq_operations kmem_cache_iter_seq_ops = {
+	.start = kmem_cache_iter_seq_start,
+	.next = kmem_cache_iter_seq_next,
+	.stop = kmem_cache_iter_seq_stop,
+	.show = kmem_cache_iter_seq_show,
+};
+
+BTF_ID_LIST_GLOBAL_SINGLE(bpf_kmem_cache_btf_id, struct, kmem_cache)
+
+static const struct bpf_iter_seq_info kmem_cache_iter_seq_info = {
+	.seq_ops = &kmem_cache_iter_seq_ops,
+};
+
+static void bpf_iter_kmem_cache_show_fdinfo(const struct bpf_iter_aux_info *aux,
+					    struct seq_file *seq)
+{
+	seq_puts(seq, "kmem_cache iter\n");
+}
+
+DEFINE_BPF_ITER_FUNC(kmem_cache, struct bpf_iter_meta *meta,
+		     struct kmem_cache *s)
+
+static struct bpf_iter_reg bpf_kmem_cache_reg_info = {
+	.target = "kmem_cache",
+	.feature = BPF_ITER_RESCHED,
+	.show_fdinfo = bpf_iter_kmem_cache_show_fdinfo,
+	.ctx_arg_info_size = 1,
+	.ctx_arg_info = {
+		{ offsetof(struct bpf_iter__kmem_cache, s),
+		  PTR_TO_BTF_ID_OR_NULL | PTR_UNTRUSTED },
+	},
+	.seq_info = &kmem_cache_iter_seq_info,
+};
+
+static int __init bpf_kmem_cache_iter_init(void)
+{
+	bpf_kmem_cache_reg_info.ctx_arg_info[0].btf_id = bpf_kmem_cache_btf_id[0];
+	return bpf_iter_reg_target(&bpf_kmem_cache_reg_info);
+}
+
+late_initcall(bpf_kmem_cache_iter_init);

The new "kmem_cache" iterator will traverse the list of slab caches
and call attached BPF programs for each entry. It should check the
argument (ctx.s) if it's NULL before using it.

Now the iteration grabs the slab_mutex only if it traverse the list and
releases the mutex when it runs the BPF program. The kmem_cache entry
is protected by a refcount during the execution.

It includes the internal "mm/slab.h" header to access kmem_cache,
slab_caches and slab_mutex. Hope it's ok to mm folks.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 include/linux/btf_ids.h      |   1 +
 kernel/bpf/Makefile          |   1 +
 kernel/bpf/kmem_cache_iter.c | 175 +++++++++++++++++++++++++++++++++++
 3 files changed, 177 insertions(+)
 create mode 100644 kernel/bpf/kmem_cache_iter.c
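
The locking scheme described in the commit message (take the lock only to walk the list and pin an entry, drop it while the callback runs, and call the callback once more with NULL at the end) can be modelled in user space. This is a rough sketch under stated assumptions, not the kernel implementation: `struct cache`, `list_lock`, `cache_list`, `pin_at()`, `unpin()`, `visit_all()` and `count_visit()` are all hypothetical stand-ins, a C11 `atomic_flag` spinlock replaces `slab_mutex`, and each step looks the entry up by position again instead of following `list_next_entry()` as the patch does in its `next` callback.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kmem_cache. */
struct cache {
	const char *name;
	int refcount;		/* negative models a boot cache: never touched */
	struct cache *next;
};

static atomic_flag list_lock = ATOMIC_FLAG_INIT;	/* stands in for slab_mutex */
static struct cache *cache_list;			/* stands in for slab_caches */

static void lock_list(void)
{
	while (atomic_flag_test_and_set_explicit(&list_lock, memory_order_acquire))
		;	/* spin until the flag is released */
}

static void unlock_list(void)
{
	atomic_flag_clear_explicit(&list_lock, memory_order_release);
}

/* Append at the tail so iteration order matches insertion order. */
static void add_cache(struct cache *c)
{
	struct cache **p;

	lock_list();
	for (p = &cache_list; *p; p = &(*p)->next)
		;
	c->next = NULL;
	*p = c;
	unlock_list();
}

/* Find the entry at @pos and pin it with an extra reference, under the lock. */
static struct cache *pin_at(long pos)
{
	struct cache *c;
	long cnt = 0;

	lock_list();
	for (c = cache_list; c; c = c->next) {
		if (cnt == pos) {
			if (c->refcount > 0)
				c->refcount++;
			break;
		}
		cnt++;
	}
	unlock_list();
	return c;
}

/* Drop the pin. Returns 1 if the caller holds the last reference. */
static int unpin(struct cache *c)
{
	int destroy = 0;

	assert(c != NULL);
	lock_list();
	if (c->refcount > 1)
		c->refcount--;
	else if (c->refcount == 1)
		destroy = 1;
	unlock_list();
	return destroy;
}

typedef void (*visit_fn)(const struct cache *c, void *arg);

static void visit_all(visit_fn fn, void *arg)
{
	struct cache *c;
	long pos = 0;

	while ((c = pin_at(pos++)) != NULL) {
		fn(c, arg);	/* callback runs with the lock dropped */
		(void)unpin(c);	/* the patch destroys the cache when this is 1 */
	}
	fn(NULL, arg);		/* final call with NULL, as the stop callback does */
}

struct visit_stats { int entries; int end_calls; };

/* Example callback: checks for NULL first, like an attached program must. */
static void count_visit(const struct cache *c, void *arg)
{
	struct visit_stats *st = arg;

	if (!c)
		st->end_calls++;
	else
		st->entries++;
}
```

Calling `visit_all(count_visit, &stats)` over a populated list leaves every refcount as it was before the walk, because each pin is matched by one unpin. The case where `unpin()` returns 1 corresponds to the owner having dropped its reference while the entry was pinned, which is where the patch calls `kmem_cache_destroy()`.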