[RFC,bpf-next,v2,2/9] cgroup: Eliminate the need for cgroup_mutex in proc_cgroup_show()

Message ID 20231017124546.24608-3-laoar.shao@gmail.com (mailing list archive)
State Superseded
Delegated to: BPF
Series bpf, cgroup: Add BPF support for cgroup1 hierarchy

Checks

Context Check Description
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next-VM_Test-0 success Logs for ShellCheck
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for bpf-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1369 this patch: 1369
netdev/cc_maintainers success CCed 5 of 5 maintainers
netdev/build_clang success Errors and warnings before: 1386 this patch: 1386
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1394 this patch: 1394
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 26 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Yafang Shao Oct. 17, 2023, 12:45 p.m. UTC
The cgroup root_list is already RCU-safe. Therefore, we can replace the
cgroup_mutex with the RCU read lock in some particular paths. This change
will be particularly beneficial for frequent operations, such as
`cat /proc/self/cgroup`, in a cgroup1-based container environment.
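
The previous patch in this series makes the root_list traversal itself
RCU-safe. As a minimal sketch of that pattern (illustrative only; the
exact lockdep condition is an assumption, not a quote of that patch):

  /*
   * Readers walk cgroup_roots under rcu_read_lock(); writers still
   * serialize on cgroup_mutex, which satisfies the lockdep check.
   */
  #define for_each_root(root)                                        \
          list_for_each_entry_rcu((root), &cgroup_roots, root_list,  \
                                  lockdep_is_held(&cgroup_mutex))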

I did stress tests with this change, as outlined below
(with CONFIG_PROVE_RCU_LIST enabled):

- Continuously mounting and unmounting named cgroups in some tasks,
  for example:

  cgrp_name=$1
  mkdir -p /$cgrp_name    # the mount point must exist
  while true
  do
      mount -t cgroup -o none,name=$cgrp_name none /$cgrp_name
      umount /$cgrp_name
  done

- Continuously triggering proc_cgroup_show() in some tasks concurrently,
  for example:
  while true; do cat /proc/self/cgroup > /dev/null; done

These tasks ran successfully with this change, with no RCU
warnings in dmesg. It's worth noting that this change can also catch
deleted cgroups, as demonstrated by running the following task at the
same time:

  while true; do grep deleted /proc/self/cgroup; done

Results in output like:

  7995:name=cgrp2: (deleted)
  8594:name=cgrp1: (deleted)

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 kernel/cgroup/cgroup.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Comments

Michal Koutný Oct. 17, 2023, 2:04 p.m. UTC | #1
Hi.

I'd like this proc_cgroup_show de-contention.
(Provided the previous patch and this one can be worked out somehow.)

On Tue, Oct 17, 2023 at 12:45:39PM +0000, Yafang Shao <laoar.shao@gmail.com> wrote:
> These tasks ran successfully with this change, with no RCU
> warnings in dmesg. It's worth noting that this change can also catch
> deleted cgroups, as demonstrated by running the following task at the
> same time:

Can those be other than v1 root cgroups? A suffix "(unmounted)" may be
more informative then.

(Non-zombie tasks prevent their cgroup removal, zombie tasks won't have
any non-trivial path rendered.)


> @@ -6256,7 +6256,7 @@ int proc_cgroup_show(struct seq_file *m, struct pid_namespace *ns,
>  	if (!buf)
>  		goto out;
>  
> -	cgroup_lock();
> +	rcu_read_lock();

What about the cgroup_path_ns_locked() that prints the path?

(I argue above that no non-trivial path is rendered but I'm not sure
whether rcu_read_lock() is sufficient sync wrt cgroup rmdir.)
Yafang Shao Oct. 18, 2023, 3:12 a.m. UTC | #2
On Tue, Oct 17, 2023 at 10:04 PM Michal Koutný <mkoutny@suse.com> wrote:
>
> Hi.
>
> I'd like this proc_cgroup_show de-contention.
> (Provided the previous patch and this one can be worked out somehow.)

Thanks

>
> On Tue, Oct 17, 2023 at 12:45:39PM +0000, Yafang Shao <laoar.shao@gmail.com> wrote:
> > These tasks ran successfully with this change, with no RCU
> > warnings in dmesg. It's worth noting that this change can also catch
> > deleted cgroups, as demonstrated by running the following task at the
> > same time:
>
> Can those be other than v1 root cgroups? A suffix "(unmounted)" may be
> more informative then.

They can only be v1 root cgroups. I will use "(unmounted)"
instead. Thanks for your suggestion.

>
> (Non-zombie tasks prevent their cgroup removal, zombie tasks won't have
> any non-trivial path rendered.)
>
>
> > @@ -6256,7 +6256,7 @@ int proc_cgroup_show(struct seq_file *m, struct pid_namespace *ns,
> >       if (!buf)
> >               goto out;
> >
> > -     cgroup_lock();
> > +     rcu_read_lock();
>
> What about the cgroup_path_ns_locked() that prints the path?

I believe we can further enhance cgroup_path_ns() by replacing
cgroup_lock() with rcu_read_lock() there as well. However, we would
need to explicitly handle the NULL root case, which may require some
refactoring. Perhaps that can be done in a separate patchset?
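
For illustration only, that direction might look roughly like the
sketch below (hypothetical and untested; the NULL-root check and the
error code are placeholders, not an actual patch):

  int cgroup_path_ns(struct cgroup *cgrp, char *buf, size_t buflen,
                     struct cgroup_namespace *ns)
  {
          int ret;

          rcu_read_lock();
          spin_lock_irq(&css_set_lock);
          if (!cgrp->root)
                  ret = -ENODEV;  /* root went away; placeholder errno */
          else
                  ret = cgroup_path_ns_locked(cgrp, buf, buflen, ns);
          spin_unlock_irq(&css_set_lock);
          rcu_read_unlock();

          return ret;
  }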

>
> (I argue above that no non-trivial path is rendered but I'm not sure
> whether rcu_read_lock() is sufficient sync wrt cgroup rmdir.)
>
Tejun Heo Oct. 18, 2023, 9:38 a.m. UTC | #3
On Tue, Oct 17, 2023 at 12:45:39PM +0000, Yafang Shao wrote:
> Results in output like:
> 
>   7995:name=cgrp2: (deleted)
>   8594:name=cgrp1: (deleted)

Just skip them? What would userspace do with the above information?

Thanks.
Yafang Shao Oct. 19, 2023, 6:39 a.m. UTC | #4
On Wed, Oct 18, 2023 at 5:38 PM Tejun Heo <tj@kernel.org> wrote:
>
> On Tue, Oct 17, 2023 at 12:45:39PM +0000, Yafang Shao wrote:
> > Results in output like:
> >
> >   7995:name=cgrp2: (deleted)
> >   8594:name=cgrp1: (deleted)
>
> Just skip them? What would userspace do with the above information?

It should be useless to userspace.
I will skip it.
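
For reference, the follow-up would presumably reshape the hunk along
these lines (an illustrative sketch, not the actual v3 change):

  		cgrp = task_cgroup_from_root(tsk, root);
  		/*
  		 * The root was unmounted concurrently; skip it. The
  		 * lookup must happen before any seq_printf() for this
  		 * root so that no partial line is emitted.
  		 */
  		if (!cgrp)
  			continue;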

Patch

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index bae8f9f27792..30bdb3bf1dcd 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -6256,7 +6256,7 @@ int proc_cgroup_show(struct seq_file *m, struct pid_namespace *ns,
 	if (!buf)
 		goto out;
 
-	cgroup_lock();
+	rcu_read_lock();
 	spin_lock_irq(&css_set_lock);
 
 	for_each_root(root) {
@@ -6279,6 +6279,10 @@ int proc_cgroup_show(struct seq_file *m, struct pid_namespace *ns,
 		seq_putc(m, ':');
 
 		cgrp = task_cgroup_from_root(tsk, root);
+		if (!cgrp) {
+			seq_puts(m, " (deleted)\n");
+			continue;
+		}
 
 		/*
 		 * On traditional hierarchies, all zombie tasks show up as
@@ -6311,7 +6315,7 @@ int proc_cgroup_show(struct seq_file *m, struct pid_namespace *ns,
 	retval = 0;
 out_unlock:
 	spin_unlock_irq(&css_set_lock);
-	cgroup_unlock();
+	rcu_read_unlock();
 	kfree(buf);
 out:
 	return retval;