Message ID | 20230903142800.3870-2-laoar.shao@gmail.com (mailing list archive) |
---|---|
State | RFC |
Delegated to: | BPF |
Headers | show |
Series | bpf, cgroup: Enable cgroup_array map on cgroup1 | expand |
On Sun, Sep 03, 2023 at 02:27:56PM +0000, Yafang Shao wrote: > Currently, the function task_under_cgroup_hierarchy() allows us to > determine if a task resides exclusively within a cgroup2 hierarchy. > Nevertheless, given the continued prevalence of cgroup1, it's useful that > we make a minor adjustment to extend its functionality to cgroup1 as well. > Once this modification is implemented, we will have the ability to > effortlessly verify a task's cgroup membership within BPF programs. For > instance, we can easily check if a task belongs to a cgroup1 directory, > such as /sys/fs/cgroup/cpu,cpuacct/kubepods/burstable/ or > /sys/fs/cgroup/cpu,cpuacct/kubepods/besteffort/. > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com> > --- > include/linux/cgroup.h | 24 +++++++++++++++++++++--- > 1 file changed, 21 insertions(+), 3 deletions(-) > > diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h > index b307013..5414a2c 100644 > --- a/include/linux/cgroup.h > +++ b/include/linux/cgroup.h > @@ -543,15 +543,33 @@ static inline struct cgroup *cgroup_ancestor(struct cgroup *cgrp, > * @ancestor: possible ancestor of @task's cgroup > * > * Tests whether @task's default cgroup hierarchy is a descendant of @ancestor. > - * It follows all the same rules as cgroup_is_descendant, and only applies > - * to the default hierarchy. > + * It follows all the same rules as cgroup_is_descendant. > */ > static inline bool task_under_cgroup_hierarchy(struct task_struct *task, > struct cgroup *ancestor) > { > struct css_set *cset = task_css_set(task); > + struct cgroup *cgrp; > + bool ret = false; > + int ssid; > + > + if (ancestor->root == &cgrp_dfl_root) > + return cgroup_is_descendant(cset->dfl_cgrp, ancestor); > + > + for (ssid = 0; ssid < CGROUP_SUBSYS_COUNT; ssid++) { > + if (!ancestor->subsys[ssid]) > + continue; This looks wrong. I believe cgroup_mutex should be held to iterate. Tejun ? > > - return cgroup_is_descendant(cset->dfl_cgrp, ancestor); > + cgrp = task_css(task, ssid)->cgroup; > + if (!cgrp) > + continue; > + > + if (!cgroup_is_descendant(cgrp, ancestor)) > + return false; > + if (!ret) > + ret = true; > + } > + return ret; > } > > /* no synchronization, the result can only be used as a hint */ > -- > 1.8.3.1 >
Hello, On Sun, Sep 03, 2023 at 02:27:56PM +0000, Yafang Shao wrote: > static inline bool task_under_cgroup_hierarchy(struct task_struct *task, > struct cgroup *ancestor) > { > struct css_set *cset = task_css_set(task); > + struct cgroup *cgrp; > + bool ret = false; > + int ssid; > + > + if (ancestor->root == &cgrp_dfl_root) > + return cgroup_is_descendant(cset->dfl_cgrp, ancestor); > + > + for (ssid = 0; ssid < CGROUP_SUBSYS_COUNT; ssid++) { > + if (!ancestor->subsys[ssid]) > + continue; > > - return cgroup_is_descendant(cset->dfl_cgrp, ancestor); > + cgrp = task_css(task, ssid)->cgroup; > + if (!cgrp) > + continue; > + > + if (!cgroup_is_descendant(cgrp, ancestor)) > + return false; > + if (!ret) > + ret = true; > + } > + return ret; I feel ambivalent about adding support for this in cgroup1 especially given that this can only work for fd based interface which is worse than the ID based ones. Even if we're doing this, the above is definitely not what we want to do as it won't work for controller-less hierarchies like the one that systemd used to use. You'd have to lock css_set_lock and walk the cgpr_cset_links. Thanks.
On Thu, Sep 7, 2023 at 4:13 AM Tejun Heo <tj@kernel.org> wrote: > > Hello, > > On Sun, Sep 03, 2023 at 02:27:56PM +0000, Yafang Shao wrote: > > static inline bool task_under_cgroup_hierarchy(struct task_struct *task, > > struct cgroup *ancestor) > > { > > struct css_set *cset = task_css_set(task); > > + struct cgroup *cgrp; > > + bool ret = false; > > + int ssid; > > + > > + if (ancestor->root == &cgrp_dfl_root) > > + return cgroup_is_descendant(cset->dfl_cgrp, ancestor); > > + > > + for (ssid = 0; ssid < CGROUP_SUBSYS_COUNT; ssid++) { > > + if (!ancestor->subsys[ssid]) > > + continue; > > > > - return cgroup_is_descendant(cset->dfl_cgrp, ancestor); > > + cgrp = task_css(task, ssid)->cgroup; > > + if (!cgrp) > > + continue; > > + > > + if (!cgroup_is_descendant(cgrp, ancestor)) > > + return false; > > + if (!ret) > > + ret = true; > > + } > > + return ret; > > I feel ambivalent about adding support for this in cgroup1 especially given > that this can only work for fd based interface which is worse than the ID > based ones. The fd-based cgroup interface plays a crucial role in BPF programs, particularly in components such as cgroup_iter, bpf_cgrp_storage, and cgroup_array maps, as well as in the attachment and detachment of cgroups. However, it's important to note that as far as my knowledge goes, bpf_cgrp_storage, cgroup_array, and the attachment/detachment of cgroups are exclusively compatible with the cgroup fd-based interface. Unfortunately, all these functionalities are limited to cgroup2, which poses challenges in containerized environments. In our pursuit of enabling seamless BPF integration within our Kubernetes environment, we've been exploring the possibility of transitioning from cgroup1 to cgroup2. This transition, while desirable for its future-forward nature, presents complexities due to the need for numerous applications to adapt. We acknowledge that cgroup2 represents the future, but we also understand that such transitions require time and effort. Consequently, we are considering an alternative approach. Rather than migrating to cgroup2, we are contemplating modifications to the BPF kernel code to ensure compatibility with cgroup1. Moreover, it appears that these modifications may entail only minor adjustments, making this option more palatable. > Even if we're doing this, the above is definitely not what we > want to do as it won't work for controller-less hierarchies like the one > that systemd used to use. Right. It can't work for /sys/fs/cgroup/systemd/. > You'd have to lock css_set_lock and walk the > cgpr_cset_links. That seems better. Will investigate it. Thanks for your suggestion. -- Regards Yafang
On Thu, Sep 07, 2023 at 11:05:07AM +0800, Yafang Shao wrote: > The fd-based cgroup interface plays a crucial role in BPF programs, > particularly in components such as cgroup_iter, bpf_cgrp_storage, and > cgroup_array maps, as well as in the attachment and detachment of > cgroups. Yeah, I know they're used. It's just that they are inferior identifiers from cgroup's POV as they are ephemeral, can't easily be transferred across process boundaries or persisted beyond the lifetime of the fd-owing process or the cgroups which are being pointed to. Thanks.
On Sun, Sep 03, 2023 at 02:27:56PM +0000, Yafang Shao <laoar.shao@gmail.com> wrote: > static inline bool task_under_cgroup_hierarchy(struct task_struct *task, > struct cgroup *ancestor) > { > struct css_set *cset = task_css_set(task); > + struct cgroup *cgrp; > + bool ret = false; > + int ssid; > + > + if (ancestor->root == &cgrp_dfl_root) > + return cgroup_is_descendant(cset->dfl_cgrp, ancestor); > + > + for (ssid = 0; ssid < CGROUP_SUBSYS_COUNT; ssid++) { This loop were better an iteration over cset->cgrp_links to handle any v1 hierarchy (under css_set_lock :-/). > + if (!ancestor->subsys[ssid]) > + continue; > > - return cgroup_is_descendant(cset->dfl_cgrp, ancestor); > + cgrp = task_css(task, ssid)->cgroup; Does this pass on a lockdep-enabled kernel? See conditions in task_css_set_check(), it seems at least RCU read lock would be needed (if not going through cgrp_links mentioned above). HTH, Michal
On Mon, Sep 18, 2023 at 10:45 PM Michal Koutný <mkoutny@suse.com> wrote: > > On Sun, Sep 03, 2023 at 02:27:56PM +0000, Yafang Shao <laoar.shao@gmail.com> wrote: > > static inline bool task_under_cgroup_hierarchy(struct task_struct *task, > > struct cgroup *ancestor) > > { > > struct css_set *cset = task_css_set(task); > > + struct cgroup *cgrp; > > + bool ret = false; > > + int ssid; > > + > > + if (ancestor->root == &cgrp_dfl_root) > > + return cgroup_is_descendant(cset->dfl_cgrp, ancestor); > > + > > + for (ssid = 0; ssid < CGROUP_SUBSYS_COUNT; ssid++) { > > This loop were better an iteration over cset->cgrp_links to handle any > v1 hierarchy (under css_set_lock :-/). Agree. That is better. > > > + if (!ancestor->subsys[ssid]) > > + continue; > > > > - return cgroup_is_descendant(cset->dfl_cgrp, ancestor); > > + cgrp = task_css(task, ssid)->cgroup; > > Does this pass on a lockdep-enabled kernel? Yes, the lockdep is enabled. > > See conditions in task_css_set_check(), it seems at least RCU read lock > would be needed (if not going through cgrp_links mentioned above). All the call sites of it are already under RCU protection, so we don't need to explicitly set RCU read lock here.
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index b307013..5414a2c 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -543,15 +543,33 @@ static inline struct cgroup *cgroup_ancestor(struct cgroup *cgrp, * @ancestor: possible ancestor of @task's cgroup * * Tests whether @task's default cgroup hierarchy is a descendant of @ancestor. - * It follows all the same rules as cgroup_is_descendant, and only applies - * to the default hierarchy. + * It follows all the same rules as cgroup_is_descendant. */ static inline bool task_under_cgroup_hierarchy(struct task_struct *task, struct cgroup *ancestor) { struct css_set *cset = task_css_set(task); + struct cgroup *cgrp; + bool ret = false; + int ssid; + + if (ancestor->root == &cgrp_dfl_root) + return cgroup_is_descendant(cset->dfl_cgrp, ancestor); + + for (ssid = 0; ssid < CGROUP_SUBSYS_COUNT; ssid++) { + if (!ancestor->subsys[ssid]) + continue; - return cgroup_is_descendant(cset->dfl_cgrp, ancestor); + cgrp = task_css(task, ssid)->cgroup; + if (!cgrp) + continue; + + if (!cgroup_is_descendant(cgrp, ancestor)) + return false; + if (!ret) + ret = true; + } + return ret; } /* no synchronization, the result can only be used as a hint */
Currently, the function task_under_cgroup_hierarchy() allows us to determine if a task resides exclusively within a cgroup2 hierarchy. Nevertheless, given the continued prevalence of cgroup1, it's useful that we make a minor adjustment to extend its functionality to cgroup1 as well. Once this modification is implemented, we will have the ability to effortlessly verify a task's cgroup membership within BPF programs. For instance, we can easily check if a task belongs to a cgroup1 directory, such as /sys/fs/cgroup/cpu,cpuacct/kubepods/burstable/ or /sys/fs/cgroup/cpu,cpuacct/kubepods/besteffort/. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> --- include/linux/cgroup.h | 24 +++++++++++++++++++++--- 1 file changed, 21 insertions(+), 3 deletions(-)