Message ID | 20220510001807.4132027-9-yosryahmed@google.com (mailing list archive)
---|---
State | RFC
Delegated to: | BPF
Series | bpf: cgroup hierarchical stats collection

Context | Check | Description |
---|---|---|
bpf/vmtest-bpf-next-PR | fail | merge-conflict |
netdev/tree_selection | success | Clearly marked for bpf-next, async |
netdev/apply | fail | Patch does not apply to bpf-next |
On Mon, May 9, 2022 at 5:18 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> From: Hao Luo <haoluo@google.com>
>
> Introduce a new type of iter prog: cgroup. Unlike other bpf_iter, this
> iter doesn't iterate a set of kernel objects. Instead, it is supposed to
> be parameterized by a cgroup id and prints only that cgroup. So one
> needs to specify a target cgroup id when attaching this iter. The target
> cgroup's state can be read out via a link of this iter.
>
> Signed-off-by: Hao Luo <haoluo@google.com>
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> ---
>  include/linux/bpf.h            |   2 +
>  include/uapi/linux/bpf.h       |   6 ++
>  kernel/bpf/Makefile            |   2 +-
>  kernel/bpf/cgroup_iter.c       | 148 +++++++++++++++++++++++++++++++++
>  tools/include/uapi/linux/bpf.h |   6 ++
>  5 files changed, 163 insertions(+), 1 deletion(-)
>  create mode 100644 kernel/bpf/cgroup_iter.c
>

Thanks Yosry for posting this patch! Dear reviewers, this is v2 of the
cgroup_iter change I sent previously at
https://lore.kernel.org/bpf/20220225234339.2386398-9-haoluo@google.com/

v1 -> v2:
- Get the cgroup's reference at the time of attaching, instead of at the
  time of iterating. (Yonghong) (context [1])
- Remove the .init_seq_private and .fini_seq_private callbacks for
  cgroup_iter. They are not needed now. (Yonghong)

[1] https://lore.kernel.org/bpf/f780fc3a-dbc2-986c-d5a0-6b0ef1c4311f@fb.com/

Hao
Hello,

On Tue, May 10, 2022 at 12:18:06AM +0000, Yosry Ahmed wrote:
> From: Hao Luo <haoluo@google.com>
>
> Introduce a new type of iter prog: cgroup. Unlike other bpf_iter, this
> iter doesn't iterate a set of kernel objects. Instead, it is supposed to
> be parameterized by a cgroup id and prints only that cgroup. So one
> needs to specify a target cgroup id when attaching this iter. The target
> cgroup's state can be read out via a link of this iter.

Is there a reason why this can't be a proper iterator which supports
lseek64() to locate a specific cgroup?

Thanks.
Hello Tejun,

On Tue, May 10, 2022 at 11:54 AM Tejun Heo <tj@kernel.org> wrote:
>
> Hello,
>
> On Tue, May 10, 2022 at 12:18:06AM +0000, Yosry Ahmed wrote:
> > From: Hao Luo <haoluo@google.com>
> >
> > Introduce a new type of iter prog: cgroup. Unlike other bpf_iter, this
> > iter doesn't iterate a set of kernel objects. Instead, it is supposed to
> > be parameterized by a cgroup id and prints only that cgroup. So one
> > needs to specify a target cgroup id when attaching this iter. The target
> > cgroup's state can be read out via a link of this iter.
>
> Is there a reason why this can't be a proper iterator which supports
> lseek64() to locate a specific cgroup?
>

There are two reasons:

- bpf_iter assumes no_llseek. I haven't looked closely at why this is
  so and whether we can add support for it.

- Second, the name 'iter' in this patch is misleading. What this patch
  really does is reuse the dumping functionality of bpf_iter. 'Dumper'
  is a better name. We want to create one file in bpffs for each
  cgroup. We are essentially just iterating a set of one single
  element.

> Thanks.
>
> --
> tejun
Hello,

On Tue, May 10, 2022 at 02:12:16PM -0700, Hao Luo wrote:
> > Is there a reason why this can't be a proper iterator which supports
> > lseek64() to locate a specific cgroup?
>
> There are two reasons:
>
> - bpf_iter assumes no_llseek. I haven't looked closely at why this is
>   so and whether we can add support for it.
>
> - Second, the name 'iter' in this patch is misleading. What this patch
>   really does is reuse the dumping functionality of bpf_iter. 'Dumper'
>   is a better name. We want to create one file in bpffs for each
>   cgroup. We are essentially just iterating a set of one single
>   element.

I see. I'm just shooting in the dark without context, but at least in
principle there's no reason why cgroups wouldn't be iterable, so it
might be something worth at least thinking about before baking in the
interface.

Thanks.
On Tue, May 10, 2022 at 3:07 PM Tejun Heo <tj@kernel.org> wrote:
>
> Hello,
>
> On Tue, May 10, 2022 at 02:12:16PM -0700, Hao Luo wrote:
> > > Is there a reason why this can't be a proper iterator which supports
> > > lseek64() to locate a specific cgroup?
> >
> > There are two reasons:
> >
> > - bpf_iter assumes no_llseek. I haven't looked closely at why this is
> >   so and whether we can add support for it.
> >
> > - Second, the name 'iter' in this patch is misleading. What this patch
> >   really does is reuse the dumping functionality of bpf_iter. 'Dumper'
> >   is a better name. We want to create one file in bpffs for each
> >   cgroup. We are essentially just iterating a set of one single
> >   element.
>
> I see. I'm just shooting in the dark without context, but at least in
> principle there's no reason why cgroups wouldn't be iterable, so it
> might be something worth at least thinking about before baking in the
> interface.
>

Yep. Conceptually there should be no problem iterating the cgroups in
the system. It may be better to have two independent bpf objects:
bpf_iter and bpf_dumper. In our use case, we want bpf_dumper, which
just exports data out through an fs interface.

Hao
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f6fa35ffe311..f472f43521d2 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -43,6 +43,7 @@ struct kobject;
 struct mem_cgroup;
 struct module;
 struct bpf_func_state;
+struct cgroup;
 
 extern struct idr btf_idr;
 extern spinlock_t btf_idr_lock;
@@ -1601,6 +1602,7 @@ int bpf_obj_get_user(const char __user *pathname, int flags);
 
 struct bpf_iter_aux_info {
 	struct bpf_map *map;
+	struct cgroup *cgroup;
 };
 
 typedef int (*bpf_iter_attach_target_t)(struct bpf_prog *prog,
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 015ed402c642..096c521e34de 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -91,6 +91,9 @@ union bpf_iter_link_info {
 	struct {
 		__u32	map_fd;
 	} map;
+	struct {
+		__u64	cgroup_id;
+	} cgroup;
 };
 
 /* BPF syscall commands, see bpf(2) man-page for more details. */
@@ -5963,6 +5966,9 @@ struct bpf_link_info {
 			struct {
 				__u32 map_id;
 			} map;
+			struct {
+				__u64 cgroup_id;
+			} cgroup;
 		};
 	} iter;
 	struct {
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 6caf4a61e543..07a715b54190 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -8,7 +8,7 @@ CFLAGS_core.o += $(call cc-disable-warning, override-init) $(cflags-nogcse-yy)
 
 obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o bpf_iter.o map_iter.o task_iter.o prog_iter.o
 obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o bloom_filter.o
-obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
+obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o cgroup_iter.o
 obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o
 obj-${CONFIG_BPF_LSM} += bpf_inode_storage.o
 obj-$(CONFIG_BPF_SYSCALL) += disasm.o
diff --git a/kernel/bpf/cgroup_iter.c b/kernel/bpf/cgroup_iter.c
new file mode 100644
index 000000000000..86bdfe135d24
--- /dev/null
+++ b/kernel/bpf/cgroup_iter.c
@@ -0,0 +1,148 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2022 Google */
+#include <linux/bpf.h>
+#include <linux/btf_ids.h>
+#include <linux/cgroup.h>
+#include <linux/kernel.h>
+#include <linux/seq_file.h>
+
+struct bpf_iter__cgroup {
+	__bpf_md_ptr(struct bpf_iter_meta *, meta);
+	__bpf_md_ptr(struct cgroup *, cgroup);
+};
+
+static void *cgroup_iter_seq_start(struct seq_file *seq, loff_t *pos)
+{
+	/* Only one session is supported. */
+	if (*pos > 0)
+		return NULL;
+
+	if (*pos == 0)
+		++*pos;
+
+	return *(struct cgroup **)seq->private;
+}
+
+static void *cgroup_iter_seq_next(struct seq_file *seq, void *v, loff_t *pos)
+{
+	++*pos;
+	return NULL;
+}
+
+static int cgroup_iter_seq_show(struct seq_file *seq, void *v)
+{
+	struct bpf_iter__cgroup ctx;
+	struct bpf_iter_meta meta;
+	struct bpf_prog *prog;
+	int ret = 0;
+
+	ctx.meta = &meta;
+	ctx.cgroup = v;
+	meta.seq = seq;
+	prog = bpf_iter_get_info(&meta, false);
+	if (prog)
+		ret = bpf_iter_run_prog(prog, &ctx);
+
+	return ret;
+}
+
+static void cgroup_iter_seq_stop(struct seq_file *seq, void *v)
+{
+}
+
+static const struct seq_operations cgroup_iter_seq_ops = {
+	.start = cgroup_iter_seq_start,
+	.next = cgroup_iter_seq_next,
+	.stop = cgroup_iter_seq_stop,
+	.show = cgroup_iter_seq_show,
+};
+
+BTF_ID_LIST_SINGLE(bpf_cgroup_btf_id, struct, cgroup)
+
+static int cgroup_iter_seq_init(void *priv_data, struct bpf_iter_aux_info *aux)
+{
+	*(struct cgroup **)priv_data = aux->cgroup;
+	return 0;
+}
+
+static const struct bpf_iter_seq_info cgroup_iter_seq_info = {
+	.seq_ops		= &cgroup_iter_seq_ops,
+	.init_seq_private	= cgroup_iter_seq_init,
+	.seq_priv_size		= sizeof(struct cgroup *),
+};
+
+static int bpf_iter_attach_cgroup(struct bpf_prog *prog,
+				  union bpf_iter_link_info *linfo,
+				  struct bpf_iter_aux_info *aux)
+{
+	struct cgroup *cgroup;
+
+	cgroup = cgroup_get_from_id(linfo->cgroup.cgroup_id);
+	if (!cgroup)
+		return -EBUSY;
+
+	aux->cgroup = cgroup;
+	return 0;
+}
+
+static void bpf_iter_detach_cgroup(struct bpf_iter_aux_info *aux)
+{
+	if (aux->cgroup)
+		cgroup_put(aux->cgroup);
+}
+
+static void bpf_iter_cgroup_show_fdinfo(const struct bpf_iter_aux_info *aux,
+					struct seq_file *seq)
+{
+	char *buf;
+
+	seq_printf(seq, "cgroup_id:\t%llu\n", cgroup_id(aux->cgroup));
+
+	buf = kmalloc(PATH_MAX, GFP_KERNEL);
+	if (!buf) {
+		seq_puts(seq, "cgroup_path:\n");
+		return;
+	}
+
+	/* If cgroup_path_ns() fails, buf will be an empty string and
+	 * cgroup_path will print nothing.
+	 *
+	 * cgroup_path is the path in the calling process's cgroup
+	 * namespace.
+	 */
+	cgroup_path_ns(aux->cgroup, buf, PATH_MAX,
+		       current->nsproxy->cgroup_ns);
+	seq_printf(seq, "cgroup_path:\t%s\n", buf);
+	kfree(buf);
+}
+
+static int bpf_iter_cgroup_fill_link_info(const struct bpf_iter_aux_info *aux,
+					  struct bpf_link_info *info)
+{
+	info->iter.cgroup.cgroup_id = cgroup_id(aux->cgroup);
+	return 0;
+}
+
+DEFINE_BPF_ITER_FUNC(cgroup, struct bpf_iter_meta *meta,
+		     struct cgroup *cgroup)
+
+static struct bpf_iter_reg bpf_cgroup_reg_info = {
+	.target			= "cgroup",
+	.attach_target		= bpf_iter_attach_cgroup,
+	.detach_target		= bpf_iter_detach_cgroup,
+	.show_fdinfo		= bpf_iter_cgroup_show_fdinfo,
+	.fill_link_info		= bpf_iter_cgroup_fill_link_info,
+	.ctx_arg_info_size	= 1,
+	.ctx_arg_info		= {
+		{ offsetof(struct bpf_iter__cgroup, cgroup),
+		  PTR_TO_BTF_ID },
+	},
+	.seq_info		= &cgroup_iter_seq_info,
+};
+
+static int __init bpf_cgroup_iter_init(void)
+{
+	bpf_cgroup_reg_info.ctx_arg_info[0].btf_id = bpf_cgroup_btf_id[0];
+	return bpf_iter_reg_target(&bpf_cgroup_reg_info);
+}
+
+late_initcall(bpf_cgroup_iter_init);
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 015ed402c642..096c521e34de 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -91,6 +91,9 @@ union bpf_iter_link_info {
 	struct {
 		__u32	map_fd;
 	} map;
+	struct {
+		__u64	cgroup_id;
+	} cgroup;
 };
 
 /* BPF syscall commands, see bpf(2) man-page for more details. */
@@ -5963,6 +5966,9 @@ struct bpf_link_info {
 			struct {
 				__u32 map_id;
 			} map;
+			struct {
+				__u64 cgroup_id;
+			} cgroup;
 		};
 	} iter;
 	struct {