[RFC,bpf-next,1/8] cgroup: Don't have to hold cgroup_mutex in task_cgroup_from_root()

Message ID 20231007140304.4390-2-laoar.shao@gmail.com (mailing list archive)
State Superseded
Delegated to: BPF
Series bpf, cgroup: Add BPF support for cgroup1 hierarchy

Checks

Context Check Description
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next-VM_Test-0 success Logs for ShellCheck
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for bpf-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1349 this patch: 1349
netdev/cc_maintainers success CCed 5 of 5 maintainers
netdev/build_clang success Errors and warnings before: 1364 this patch: 1364
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1372 this patch: 1372
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 17 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Yafang Shao Oct. 7, 2023, 2:02 p.m. UTC
The task cannot change cgroups while css_set_lock is held, which
eliminates the need to hold cgroup_mutex. Following this change,
task_cgroup_from_root() can be used in non-sleepable contexts.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 kernel/cgroup/cgroup.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
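For illustration, a minimal sketch of the kind of non-sleepable caller this
change enables. The function below is hypothetical (it is not part of the
patch) and would live in in-kernel cgroup code, since task_cgroup_from_root()
is declared in kernel/cgroup/cgroup-internal.h; it relies only on
css_set_lock, per the relaxed rule above.

/* Hypothetical caller -- sketch only, not part of this patch. */
static u64 example_task_cgroup_id(struct task_struct *task,
				  struct cgroup_root *root)
{
	struct cgroup *cgrp;
	u64 id;

	spin_lock_irq(&css_set_lock);	/* the task cannot change css_sets now */
	cgrp = task_cgroup_from_root(task, root);
	id = cgroup_id(cgrp);		/* cgrp is stable while the lock is held */
	spin_unlock_irq(&css_set_lock);

	return id;
}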

Comments

Tejun Heo Oct. 7, 2023, 3:50 p.m. UTC | #1
On Sat, Oct 07, 2023 at 02:02:57PM +0000, Yafang Shao wrote:
> The task cannot change cgroups while css_set_lock is held, which
> eliminates the need to hold cgroup_mutex. Following this change,
> task_cgroup_from_root() can be used in non-sleepable contexts.
> 
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>

Maybe just drop lockdep_assert_held(&cgroup_mutex) from
cset_cgroup_from_root()?

Thanks.
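For reference, a sketch of what Tejun's suggestion amounts to; the body of
cset_cgroup_from_root() is paraphrased from kernel/cgroup/cgroup.c, and the
posted patch below ends up taking a slightly different route by calling
__cset_cgroup_from_root() directly from task_cgroup_from_root().

static struct cgroup *cset_cgroup_from_root(struct css_set *cset,
					    struct cgroup_root *root)
{
	/* lockdep_assert_held(&cgroup_mutex) dropped, per the suggestion */
	lockdep_assert_held(&css_set_lock);

	return __cset_cgroup_from_root(cset, root);
}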
Yafang Shao Oct. 8, 2023, 2:32 a.m. UTC | #2
On Sat, Oct 7, 2023 at 11:50 PM Tejun Heo <tj@kernel.org> wrote:
>
> On Sat, Oct 07, 2023 at 02:02:57PM +0000, Yafang Shao wrote:
> > The task cannot change cgroups while css_set_lock is held, which
> > eliminates the need to hold cgroup_mutex. Following this change,
> > task_cgroup_from_root() can be used in non-sleepable contexts.
> >
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
>
> Maybe just drop lockdep_assert_held(&cgroup_mutex) from
> cset_cgroup_from_root()?

It seems we can do it.
Will do it in the next version.
Michal Koutný Oct. 9, 2023, 2:45 p.m. UTC | #3
On Sat, Oct 07, 2023 at 02:02:57PM +0000, Yafang Shao <laoar.shao@gmail.com> wrote:
> The task cannot change cgroups while css_set_lock is held, which
> eliminates the need to hold cgroup_mutex. Following this change,
> task_cgroup_from_root() can be used in non-sleepable contexts.

IIRC, cset_cgroup_from_root() needs cgroup_mutex to make sure the `root`
doesn't disappear from under us (synchronization with
cgroup_destroy_root()).
However, as I look at it now, cgroup_mutex won't synchronize against
cgroup_kill_sb(), so it may have worked by accident(?) so far (i.e. callers
pinned the root implicitly in another way).

Still, it'd be good to have an annotation that ensures the root is around
while it is being used. (An RCU read lock might be fine, but you'd need
cgroup_get_live() if passing it out of the RCU read section.)

Basically, the code must be made safe against cgroup v1 unmounts.


Michal
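A rough sketch of the kind of annotation Michal is asking for; whether
pinning the cgroup this way is actually sufficient against cgroup_kill_sb()
is exactly the open question in this thread. cgroup_get_live() is internal
to kernel/cgroup/cgroup.c, so a helper like this would have to live there,
and the function name is made up for illustration.

static struct cgroup *example_get_task_cgroup(struct task_struct *task,
					      struct cgroup_root *root)
{
	struct cgroup *cgrp;

	rcu_read_lock();
	spin_lock_irq(&css_set_lock);
	cgrp = task_cgroup_from_root(task, root);
	/* take a reference before leaving the RCU read section */
	cgroup_get_live(cgrp);
	spin_unlock_irq(&css_set_lock);
	rcu_read_unlock();

	return cgrp;	/* caller drops the reference with cgroup_put() */
}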
Yafang Shao Oct. 10, 2023, 3:58 a.m. UTC | #4
On Mon, Oct 9, 2023 at 10:45 PM Michal Koutný <mkoutny@suse.com> wrote:
>
> On Sat, Oct 07, 2023 at 02:02:57PM +0000, Yafang Shao <laoar.shao@gmail.com> wrote:
> > The task cannot change cgroups while css_set_lock is held, which
> > eliminates the need to hold cgroup_mutex. Following this change,
> > task_cgroup_from_root() can be used in non-sleepable contexts.
>
> IIRC, cset_cgroup_from_root() needs cgroup_mutex to make sure the `root`
> doesn't disappear from under us (synchronization with
> cgroup_destroy_root()).

current_cgns_cgroup_from_root() doesn't hold the cgroup_mutex either.
Could this potentially lead to issues, such as triggering the
BUG_ON() in __cset_cgroup_from_root(), if the root has already been
destroyed?

> However, as I look at it now, cgroup_mutex won't synchronize against
> cgroup_kill_sb(), so it may have worked by accident(?) so far (i.e. callers
> pinned the root implicitly in another way).
>
> Still, it'd be good to have an annotation that ensures the root is around
> while it is being used. (An RCU read lock might be fine, but you'd need
> cgroup_get_live() if passing it out of the RCU read section.)
>
> Basically, the code must be made safe against cgroup v1 unmounts.

What we aim to protect against is changes to the `root_list`, which
occur exclusively during cgroup setup and destroy paths. Would it be
beneficial to introduce a dedicated root_list_lock specifically for
this purpose? This approach could potentially reduce the need for the
broader cgroup_mutex in other scenarios.

--
Regards
Yafang
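A sketch of the dedicated-lock idea floated above; the lock name and the
hook points are assumptions, not part of the posted series.

static DEFINE_SPINLOCK(root_list_lock);

/* setup/destroy paths would take the new lock around root_list updates */
static void example_link_root(struct cgroup_root *root)
{
	spin_lock(&root_list_lock);
	list_add(&root->root_list, &cgroup_roots);
	cgroup_root_count++;
	spin_unlock(&root_list_lock);
}

/*
 * Readers such as cset_cgroup_from_root() would then need only this lock
 * (plus css_set_lock) rather than the much broader cgroup_mutex.
 */
static void example_walk_hierarchies(void)
{
	struct cgroup_root *root;

	spin_lock(&root_list_lock);
	for_each_root(root) {
		/* root cannot be unlinked while root_list_lock is held */
	}
	spin_unlock(&root_list_lock);
}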
Michal Koutný Oct. 10, 2023, 8:29 a.m. UTC | #5
Hi Yafang.

On Tue, Oct 10, 2023 at 11:58:14AM +0800, Yafang Shao <laoar.shao@gmail.com> wrote:
> current_cgns_cgroup_from_root() doesn't hold the cgroup_mutex either.
> Could this potentially lead to issues, such as triggering the
> BUG_ON() in __cset_cgroup_from_root(), if the root has already been
> destroyed?

current_cgns_cgroup_from_root() is a tricky one, see also
https://lore.kernel.org/r/20230502133847.14570-3-mkoutny@suse.com/

I argued there for the RCU read lock, but when I look at it now, it may not be
sufficient for the cgroup returned from current_cgns_cgroup_from_root().

The 2nd half still applies: umount synchronization is ensured via the VFS
layer, so neither the cgroup_root nor its cgroup can go away in the
only caller, cgroup_show_path().


> Would it be beneficial to introduce a dedicated root_list_lock
> specifically for this purpose? This approach could potentially reduce
> the need for the broader cgroup_mutex in other scenarios.

It may be a convenience lock, but v2 (cgrp_dfl_root) could make do just
fine without it.

I'm keeping this discussion to illustrate the difficulties of adding
BPF support for cgroup v1. That is a benefit I see ;-)

Michal
Yafang Shao Oct. 10, 2023, 11:58 a.m. UTC | #6
On Tue, Oct 10, 2023 at 4:29 PM Michal Koutný <mkoutny@suse.com> wrote:
>
> Hi Yafang.
>
> On Tue, Oct 10, 2023 at 11:58:14AM +0800, Yafang Shao <laoar.shao@gmail.com> wrote:
> > current_cgns_cgroup_from_root() doesn't hold the cgroup_mutex either.
> > Could this potentially lead to issues, such as triggering the
> > BUG_ON() in __cset_cgroup_from_root(), if the root has already been
> > destroyed?
>
> current_cgns_cgroup_from_root() is a tricky one, see also
> https://lore.kernel.org/r/20230502133847.14570-3-mkoutny@suse.com/
>
> I argued there for the RCU read lock, but when I look at it now, it may not be
> sufficient for the cgroup returned from current_cgns_cgroup_from_root().
>
> The 2nd half still applies: umount synchronization is ensured via the VFS
> layer, so neither the cgroup_root nor its cgroup can go away in the
> only caller, cgroup_show_path().

Thanks for your explanation.

>
>
> > Would it be beneficial to introduce a dedicated root_list_lock
> > specifically for this purpose? This approach could potentially reduce
> > the need for the broader cgroup_mutex in other scenarios.
>
> It may be a convenience lock, but v2 (cgrp_dfl_root) could make do just
> fine without it.

Right. cgroup2 doesn't require it.

>
> I'm keeping this discussion to illustrate the difficulties of adding
> BPF support for cgroup v1. That is a benefit I see ;-)

A potentially more effective approach might be to employ RCU
protection for the cgroup_list. That is, use
list_for_each_entry_rcu()/list_add_rcu()/list_del_rcu() instead.

--
Regards

Yafang
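A sketch of the RCU variant suggested above, assuming writers stay
serialized by cgroup_mutex on the mount/umount paths and only the readers
change; the helper names are made up, and the shape of the final series may
differ.

/* writer side: unlink under cgroup_mutex, free only after a grace period */
static void example_unlink_root(struct cgroup_root *root)
{
	lockdep_assert_held(&cgroup_mutex);
	list_del_rcu(&root->root_list);
}

/* reader side: no cgroup_mutex needed any more */
static struct cgroup *example_lookup(struct task_struct *task, int hier_id)
{
	struct cgroup_root *root;
	struct cgroup *cgrp = NULL;

	rcu_read_lock();
	list_for_each_entry_rcu(root, &cgroup_roots, root_list) {
		if (root->hierarchy_id != hier_id)
			continue;
		spin_lock_irq(&css_set_lock);
		cgrp = task_cgroup_from_root(task, root);
		spin_unlock_irq(&css_set_lock);
		break;
	}
	rcu_read_unlock();

	return cgrp;
}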

Patch

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 1fb7f562289d..bd1692f48be5 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1453,7 +1453,7 @@ static struct cgroup *cset_cgroup_from_root(struct css_set *cset,
 
 /*
  * Return the cgroup for "task" from the given hierarchy. Must be
- * called with cgroup_mutex and css_set_lock held.
+ * called with css_set_lock held.
  */
 struct cgroup *task_cgroup_from_root(struct task_struct *task,
 				     struct cgroup_root *root)
@@ -1462,7 +1462,8 @@ struct cgroup *task_cgroup_from_root(struct task_struct *task,
 	 * No need to lock the task - since we hold css_set_lock the
 	 * task can't change groups.
 	 */
-	return cset_cgroup_from_root(task_css_set(task), root);
+	lockdep_assert_held(&css_set_lock);
+	return __cset_cgroup_from_root(task_css_set(task), root);
 }
 
 /*