| Message ID | 20231017124546.24608-3-laoar.shao@gmail.com (mailing list archive) |
|---|---|
| State | Superseded |
| Delegated to | BPF |
| Series | bpf, cgroup: Add BPF support for cgroup1 hierarchy |
Hi.

I'd like this proc_cgroup_show de-contention.
(Provided the previous patch and this one can be worked out somehow.)

On Tue, Oct 17, 2023 at 12:45:39PM +0000, Yafang Shao <laoar.shao@gmail.com> wrote:
> They all ran successfully after implementing this change, with no RCU
> warnings in dmesg. It's worth noting that this change can also catch
> deleted cgroups, as demonstrated by running the following task at the
> same time:

Can those be other than v1 root cgroups? A suffix "(unmounted)" may be
more informative then.

(Non-zombie tasks prevent their cgroup removal, zombie tasks won't have
any non-trivial path rendered.)

> @@ -6256,7 +6256,7 @@ int proc_cgroup_show(struct seq_file *m, struct pid_namespace *ns,
>  	if (!buf)
>  		goto out;
>
> -	cgroup_lock();
> +	rcu_read_lock();

What about the cgroup_path_ns_locked() that prints the path?

(I argue above that no non-trivial path is rendered but I'm not sure
whether rcu_read_lock() is sufficient sync wrt cgroup rmdir.)
On Tue, Oct 17, 2023 at 10:04 PM Michal Koutný <mkoutny@suse.com> wrote:
>
> Hi.
>
> I'd like this proc_cgroup_show de-contention.
> (Provided the previous patch and this one can be worked out somehow.)

Thanks

> On Tue, Oct 17, 2023 at 12:45:39PM +0000, Yafang Shao <laoar.shao@gmail.com> wrote:
> > They all ran successfully after implementing this change, with no RCU
> > warnings in dmesg. It's worth noting that this change can also catch
> > deleted cgroups, as demonstrated by running the following task at the
> > same time:
>
> Can those be other than v1 root cgroups? A suffix "(unmounted)" may be
> more informative then.

They can only be v1 root cgroups. Will use "(unmounted)" instead.
Thanks for your suggestion.

> (Non-zombie tasks prevent their cgroup removal, zombie tasks won't have
> any non-trivial path rendered.)
>
> > @@ -6256,7 +6256,7 @@ int proc_cgroup_show(struct seq_file *m, struct pid_namespace *ns,
> >  	if (!buf)
> >  		goto out;
> >
> > -	cgroup_lock();
> > +	rcu_read_lock();
>
> What about the cgroup_path_ns_locked() that prints the path?

I believe we can further enhance cgroup_path_ns() by replacing the
cgroup_lock with rcu_read_lock. However, we need to explicitly address
the NULL root case, which may necessitate some refactoring. Perhaps this
can be improved in a separate patchset?

> (I argue above that no non-trivial path is rendered but I'm not sure
> whether rcu_read_lock() is sufficient sync wrt cgroup rmdir.)
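For context, a rough sketch of the kind of cgroup_path_ns() refactoring described above, assuming the conversion mirrors the proc_cgroup_show() change and that a hierarchy that has gone away can be detected via a NULL root; the placement of the NULL check and the -ENODEV return are hypothetical, not code from this series:

```c
/*
 * Hypothetical sketch only -- not part of this series.  It assumes
 * cgroup_path_ns() can rely on rcu_read_lock() plus css_set_lock,
 * and that a removed hierarchy is detectable via a NULL root.
 */
int cgroup_path_ns(struct cgroup *cgrp, char *buf, size_t buflen,
		   struct cgroup_namespace *ns)
{
	int ret;

	rcu_read_lock();		/* was: cgroup_lock() */
	spin_lock_irq(&css_set_lock);

	if (!cgrp->root) {
		/* hierarchy already gone; error choice is hypothetical */
		ret = -ENODEV;
	} else {
		ret = cgroup_path_ns_locked(cgrp, buf, buflen, ns);
	}

	spin_unlock_irq(&css_set_lock);
	rcu_read_unlock();		/* was: cgroup_unlock() */

	return ret;
}
```

Keeping the NULL check under css_set_lock would mirror how the patch below handles task_cgroup_from_root() returning NULL in proc_cgroup_show().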
On Tue, Oct 17, 2023 at 12:45:39PM +0000, Yafang Shao wrote:
> Results in output like:
>
> 7995:name=cgrp2: (deleted)
> 8594:name=cgrp1: (deleted)

Just skip them? What would userspace do with the above information?

Thanks.
On Wed, Oct 18, 2023 at 5:38 PM Tejun Heo <tj@kernel.org> wrote:
>
> On Tue, Oct 17, 2023 at 12:45:39PM +0000, Yafang Shao wrote:
> > Results in output like:
> >
> > 7995:name=cgrp2: (deleted)
> > 8594:name=cgrp1: (deleted)
>
> Just skip them? What would userspace do with the above information?

It should be useless to userspace. Will skip it.
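For illustration, one way the skip could look in proc_cgroup_show(): resolving the cgroup before printing the "<id>:<controllers>:" prefix, so a vanished root emits no partial line. This is a sketch of the idea only, not the patch that was eventually applied, and the elided parts are marked:

```c
/*
 * Sketch of the "just skip" approach (not the actual follow-up patch):
 * look up the cgroup first, so a removed root produces no output at
 * all rather than a dangling "<id>:<controllers>:" prefix.
 */
for_each_root(root) {
	struct cgroup *cgrp;

	/* ... visibility checks on root elided ... */

	cgrp = task_cgroup_from_root(tsk, root);
	if (!cgrp)
		continue;	/* root went away; emit nothing */

	seq_printf(m, "%d:", root->hierarchy_id);
	/* ... print controllers/name, then the path via
	 * cgroup_path_ns_locked(cgrp, buf, PATH_MAX, ns) ...
	 */
	seq_putc(m, '\n');
}
```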
```diff
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index bae8f9f27792..30bdb3bf1dcd 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -6256,7 +6256,7 @@ int proc_cgroup_show(struct seq_file *m, struct pid_namespace *ns,
 	if (!buf)
 		goto out;
 
-	cgroup_lock();
+	rcu_read_lock();
 	spin_lock_irq(&css_set_lock);
 
 	for_each_root(root) {
@@ -6279,6 +6279,10 @@ int proc_cgroup_show(struct seq_file *m, struct pid_namespace *ns,
 		seq_putc(m, ':');
 
 		cgrp = task_cgroup_from_root(tsk, root);
+		if (!cgrp) {
+			seq_puts(m, " (deleted)\n");
+			continue;
+		}
 
 		/*
 		 * On traditional hierarchies, all zombie tasks show up as
@@ -6311,7 +6315,7 @@ int proc_cgroup_show(struct seq_file *m, struct pid_namespace *ns,
 	retval = 0;
 out_unlock:
 	spin_unlock_irq(&css_set_lock);
-	cgroup_unlock();
+	rcu_read_unlock();
 	kfree(buf);
 out:
 	return retval;
```
The cgroup root_list is already RCU-safe. Therefore, we can replace the
cgroup_mutex with the RCU read lock in some particular paths. This change
will be particularly beneficial for frequent operations, such as
`cat /proc/self/cgroup`, in a cgroup1-based container environment.

I did stress tests with this change, as outlined below (with
CONFIG_PROVE_RCU_LIST enabled):

- Continuously mounting and unmounting named cgroups in some tasks,
  for example:

      cgrp_name=$1
      while true
      do
              mount -t cgroup -o none,name=$cgrp_name none /$cgrp_name
              umount /$cgrp_name
      done

- Continuously triggering proc_cgroup_show() in some tasks concurrently,
  for example:

      while true; do cat /proc/self/cgroup > /dev/null; done

They all ran successfully after implementing this change, with no RCU
warnings in dmesg. It's worth noting that this change can also catch
deleted cgroups, as demonstrated by running the following task at the
same time:

      while true; do grep deleted /proc/self/cgroup; done

Results in output like:

      7995:name=cgrp2: (deleted)
      8594:name=cgrp1: (deleted)

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 kernel/cgroup/cgroup.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)
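As background to the "root_list is already RCU-safe" premise: iterating an RCU-protected list takes the `_rcu` list primitives inside an rcu_read_lock() section. A minimal sketch of what an RCU-aware for_each_root() could look like follows; the exact definition used by the series may differ, and the lockdep expression shown is an assumption:

```c
/*
 * Sketch only: an RCU-aware traversal of cgroup_roots.  The optional
 * lockdep expression also permits the traversal while cgroup_mutex is
 * held, for the slow paths that still take it.
 */
#define for_each_root(root)						\
	list_for_each_entry_rcu((root), &cgroup_roots, root_list,	\
				lockdep_is_held(&cgroup_mutex))
```

A caller would then bracket the loop with rcu_read_lock()/rcu_read_unlock(), as the patch above does in proc_cgroup_show().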