Message ID | 20231007140304.4390-2-laoar.shao@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | BPF |
Series | bpf, cgroup: Add BPF support for cgroup1 hierarchy |
On Sat, Oct 07, 2023 at 02:02:57PM +0000, Yafang Shao wrote:
> The task cannot modify cgroups if we have already acquired the
> css_set_lock, thus eliminating the need to hold the cgroup_mutex. Following
> this change, task_cgroup_from_root() can be employed in non-sleepable contexts.
>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>

Maybe just drop lockdep_assert_held(&cgroup_mutex) from
cset_cgroup_from_root()?

Thanks.
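For reference, a minimal sketch of what Tejun's suggestion might look like. This is an assumption about the resulting code, not the patch as posted; the function body is as in kernel/cgroup/cgroup.c at the time:

```c
/*
 * Sketch of Tejun's suggestion (not the posted patch): keep only the
 * css_set_lock assertion, since holding css_set_lock already keeps the
 * cset -> cgroup mapping stable.
 */
static struct cgroup *cset_cgroup_from_root(struct css_set *cset,
					    struct cgroup_root *root)
{
	/* lockdep_assert_held(&cgroup_mutex) dropped here */
	lockdep_assert_held(&css_set_lock);

	return __cset_cgroup_from_root(cset, root);
}
```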
On Sat, Oct 7, 2023 at 11:50 PM Tejun Heo <tj@kernel.org> wrote:
>
> On Sat, Oct 07, 2023 at 02:02:57PM +0000, Yafang Shao wrote:
> > The task cannot modify cgroups if we have already acquired the
> > css_set_lock, thus eliminating the need to hold the cgroup_mutex. Following
> > this change, task_cgroup_from_root() can be employed in non-sleepable contexts.
> >
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
>
> Maybe just drop lockdep_assert_held(&cgroup_mutex) from
> cset_cgroup_from_root()?

It seems we can do that. Will do it in the next version.
On Sat, Oct 07, 2023 at 02:02:57PM +0000, Yafang Shao <laoar.shao@gmail.com> wrote:
> The task cannot modify cgroups if we have already acquired the
> css_set_lock, thus eliminating the need to hold the cgroup_mutex. Following
> this change, task_cgroup_from_root() can be employed in non-sleepable contexts.

IIRC, cset_cgroup_from_root() needs cgroup_mutex to make sure the `root`
doesn't disappear under our hands (synchronization with
cgroup_destroy_root()).

However, as I look at it now, cgroup_mutex won't synchronize against
cgroup_kill_sb(), so it may have worked by accident(?) so far (i.e. callers
pinned the root implicitly in another way).

Still, it'd be good to have an annotation that ensures the root is around
when using it. (An RCU read lock might be fine, but you'd need
cgroup_get_live() if passing it out of the RCU read section.)

Basically, the code must be made safe against cgroup v1 unmounts.

Michal
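A hedged sketch of the pinning pattern Michal describes. The helper name get_task_cgroup_pinned() is hypothetical; cgroup_get_live() and the irq-disabled css_set_lock locking are the existing conventions in kernel/cgroup/cgroup.c:

```c
/*
 * Hypothetical sketch: look the cgroup up under the RCU read lock and
 * take a reference with cgroup_get_live() before the pointer escapes
 * the read-side critical section. The caller is assumed to drop the
 * reference with cgroup_put() when done.
 */
static struct cgroup *get_task_cgroup_pinned(struct task_struct *task,
					     struct cgroup_root *root)
{
	struct cgroup *cgrp;

	rcu_read_lock();
	spin_lock_irq(&css_set_lock);
	cgrp = task_cgroup_from_root(task, root);
	cgroup_get_live(cgrp);	/* pin before leaving the RCU read section */
	spin_unlock_irq(&css_set_lock);
	rcu_read_unlock();

	return cgrp;
}
```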
On Mon, Oct 9, 2023 at 10:45 PM Michal Koutný <mkoutny@suse.com> wrote:
>
> On Sat, Oct 07, 2023 at 02:02:57PM +0000, Yafang Shao <laoar.shao@gmail.com> wrote:
> > The task cannot modify cgroups if we have already acquired the
> > css_set_lock, thus eliminating the need to hold the cgroup_mutex. Following
> > this change, task_cgroup_from_root() can be employed in non-sleepable contexts.
>
> IIRC, cset_cgroup_from_root() needs cgroup_mutex to make sure the `root`
> doesn't disappear under our hands (synchronization with
> cgroup_destroy_root()).

current_cgns_cgroup_from_root() doesn't hold the cgroup_mutex either.
Could this potentially lead to issues, such as triggering the BUG_ON() in
__cset_cgroup_from_root(), if the root has already been destroyed?

> However, as I look at it now, cgroup_mutex won't synchronize against
> cgroup_kill_sb(), so it may have worked by accident(?) so far (i.e. callers
> pinned the root implicitly in another way).
>
> Still, it'd be good to have an annotation that ensures the root is around
> when using it. (An RCU read lock might be fine, but you'd need
> cgroup_get_live() if passing it out of the RCU read section.)
>
> Basically, the code must be made safe against cgroup v1 unmounts.

What we aim to protect against is changes to the `root_list`, which occur
exclusively during the cgroup setup and destroy paths. Would it be
beneficial to introduce a dedicated root_list_lock specifically for this
purpose? This approach could potentially reduce the need for the broader
cgroup_mutex in other scenarios.

--
Regards
Yafang
Hi Yafang.

On Tue, Oct 10, 2023 at 11:58:14AM +0800, Yafang Shao <laoar.shao@gmail.com> wrote:
> current_cgns_cgroup_from_root() doesn't hold the cgroup_mutex either.
> Could this potentially lead to issues, such as triggering the BUG_ON() in
> __cset_cgroup_from_root(), if the root has already been destroyed?

current_cgns_cgroup_from_root() is a tricky one, see also
https://lore.kernel.org/r/20230502133847.14570-3-mkoutny@suse.com/

I argued there with the RCU read lock, but as I look at it now, it may not
be sufficient for the cgroup returned from current_cgns_cgroup_from_root().

The second half still applies: umount synchronization is ensured via the
VFS layer, so neither the cgroup_root nor its cgroup will go away in the
only caller, cgroup_show_path().

> Would it be beneficial to introduce a dedicated root_list_lock
> specifically for this purpose? This approach could potentially reduce
> the need for the broader cgroup_mutex in other scenarios.

It may be a convenience lock, but v2 (cgrp_dfl_root) could make do just
fine without it.

I'm keeping this discussion to illustrate the difficulties of adding
BPF support for cgroup v1. That is a benefit I see ;-)

Michal
On Tue, Oct 10, 2023 at 4:29 PM Michal Koutný <mkoutny@suse.com> wrote:
>
> Hi Yafang.
>
> On Tue, Oct 10, 2023 at 11:58:14AM +0800, Yafang Shao <laoar.shao@gmail.com> wrote:
> > current_cgns_cgroup_from_root() doesn't hold the cgroup_mutex either.
> > Could this potentially lead to issues, such as triggering the BUG_ON() in
> > __cset_cgroup_from_root(), if the root has already been destroyed?
>
> current_cgns_cgroup_from_root() is a tricky one, see also
> https://lore.kernel.org/r/20230502133847.14570-3-mkoutny@suse.com/
>
> I argued there with the RCU read lock, but as I look at it now, it may not
> be sufficient for the cgroup returned from current_cgns_cgroup_from_root().
>
> The second half still applies: umount synchronization is ensured via the
> VFS layer, so neither the cgroup_root nor its cgroup will go away in the
> only caller, cgroup_show_path().

Thanks for your explanation.

> > Would it be beneficial to introduce a dedicated root_list_lock
> > specifically for this purpose? This approach could potentially reduce
> > the need for the broader cgroup_mutex in other scenarios.
>
> It may be a convenience lock, but v2 (cgrp_dfl_root) could make do just
> fine without it.

Right. cgroup2 doesn't require it.

> I'm keeping this discussion to illustrate the difficulties of adding
> BPF support for cgroup v1. That is a benefit I see ;-)

A potentially more effective approach might be to employ RCU protection
for the root_list. That is, use
list_for_each_entry_rcu()/list_add_rcu()/list_del_rcu() instead.

--
Regards
Yafang
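A minimal sketch of the RCU conversion Yafang proposes, under the assumption that cgroup_roots and cgroup_root->root_list keep their existing names in kernel/cgroup/cgroup.c; the fragments below are illustrative, not a posted patch:

```c
/* Setup path (e.g. cgroup_setup_root()): publish the new root. */
list_add_rcu(&root->root_list, &cgroup_roots);

/* Destroy path (e.g. cgroup_destroy_root()): unpublish, then wait
 * for in-flight readers before the root can be freed.
 */
list_del_rcu(&root->root_list);
synchronize_rcu();

/* Readers: walk the list without taking cgroup_mutex. */
rcu_read_lock();
list_for_each_entry_rcu(root, &cgroup_roots, root_list) {
	/* ... inspect root ... */
}
rcu_read_unlock();
```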
```diff
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 1fb7f562289d..bd1692f48be5 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1453,7 +1453,7 @@ static struct cgroup *cset_cgroup_from_root(struct css_set *cset,
 
 /*
  * Return the cgroup for "task" from the given hierarchy. Must be
- * called with cgroup_mutex and css_set_lock held.
+ * called with css_set_lock held.
  */
 struct cgroup *task_cgroup_from_root(struct task_struct *task,
 				     struct cgroup_root *root)
@@ -1462,7 +1462,8 @@ struct cgroup *task_cgroup_from_root(struct task_struct *task,
 	 * No need to lock the task - since we hold css_set_lock the
 	 * task can't change groups.
 	 */
-	return cset_cgroup_from_root(task_css_set(task), root);
+	lockdep_assert_held(&css_set_lock);
+	return __cset_cgroup_from_root(task_css_set(task), root);
 }
 
 /*
```
The task cannot modify cgroups if we have already acquired the
css_set_lock, thus eliminating the need to hold the cgroup_mutex. Following
this change, task_cgroup_from_root() can be employed in non-sleepable contexts.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 kernel/cgroup/cgroup.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
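A usage sketch of what the patch enables, under the assumption that the caller takes css_set_lock itself (task and root stand in for whatever the caller already holds):

```c
/*
 * Sketch only: with this patch, task_cgroup_from_root() needs only
 * css_set_lock, so it can be called from a non-sleepable context
 * where taking the sleeping cgroup_mutex would be forbidden.
 */
struct cgroup *cgrp;

spin_lock_irq(&css_set_lock);
cgrp = task_cgroup_from_root(task, root);
spin_unlock_irq(&css_set_lock);
```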