| Message ID | 20231017124546.24608-2-laoar.shao@gmail.com (mailing list archive) |
|---|---|
| State | Superseded |
| Delegated to: | BPF |
| Series | bpf, cgroup: Add BPF support for cgroup1 hierarchy |
On Tue, Oct 17, 2023 at 12:45:38PM +0000, Yafang Shao <laoar.shao@gmail.com> wrote:
> Therefore, making it RCU-safe would be beneficial.

Notice that the whole of cgroup_destroy_root() is actually an RCU callback (via
css_free_rwork_fn()). So the list traversal under RCU should already be
OK wrt its stability. Do you see a loophole in this argument?

> /* iterate across the hierarchies */
> #define for_each_root(root)						\
> -	list_for_each_entry((root), &cgroup_roots, root_list)
> +	list_for_each_entry_rcu((root), &cgroup_roots, root_list,	\
> +				!lockdep_is_held(&cgroup_mutex))

The extra condition should be constant false (regardless of
cgroup_mutex). IOW, the RCU read lock is always required.

> @@ -1386,13 +1386,15 @@ static inline struct cgroup *__cset_cgroup_from_root(struct css_set *cset,
> 		}
> 	}
>
> -	BUG_ON(!res_cgroup);
> +	WARN_ON_ONCE(!res_cgroup && lockdep_is_held(&cgroup_mutex));
> 	return res_cgroup;

Hm, this would benefit from a comment on why !res_cgroup is conditionally
acceptable.

> }
>
> /*
>  * look up cgroup associated with current task's cgroup namespace on the
> - * specified hierarchy
> + * specified hierarchy. Umount synchronization is ensured via VFS layer,
> + * so we don't have to hold cgroup_mutex to prevent the root from being
> + * destroyed.

I tried something similar via an explicit lockdep invocation (in a thread I
linked to you previously) and the VFS folks weren't enthusiastic about it.

You cannot hide this synchronization assumption in a mere comment.

Thanks,
Michal
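
For reference, the optional fourth argument of list_for_each_entry_rcu() feeds a lockdep check roughly along the following lines (a paraphrased sketch of the logic in include/linux/rculist.h; the wrapper name here is invented for illustration). It shows why the negated lockdep_is_held() is backwards: the condition is meant to name the locking context that makes a non-RCU traversal legitimate.

/* Paraphrased sketch: a lockless traversal is considered legitimate when
 * either the caller-supplied condition holds or an RCU read-side critical
 * section is active. Passing !lockdep_is_held(&cgroup_mutex) makes the
 * condition false exactly when cgroup_mutex *is* held, so mutex holders
 * get flagged -- the opposite of the intent. */
#define check_rcu_list_traversal(cond)					\
	RCU_LOCKDEP_WARN(!(cond) && !rcu_read_lock_any_held(),		\
			 "RCU-list traversed in non-reader section!!")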
On Tue, Oct 17, 2023 at 9:20 PM Michal Koutný <mkoutny@suse.com> wrote:
>
> On Tue, Oct 17, 2023 at 12:45:38PM +0000, Yafang Shao <laoar.shao@gmail.com> wrote:
> > Therefore, making it RCU-safe would be beneficial.
>
> Notice that the whole of cgroup_destroy_root() is actually an RCU callback (via
> css_free_rwork_fn()). So the list traversal under RCU should already be
> OK wrt its stability. Do you see a loophole in this argument?

css_free_rwork_fn() is designed for handling the cgroup's CSS. When the
RCU callback is executed, that specific CSS is no longer accessible to
other tasks. However, the cgroup root remains accessible: other tasks can
still traverse the cgroup root_list and reach the cgroup root that is
currently being destroyed. Hence, we must take explicit measures to make
access to the cgroup root RCU-safe.

>
> > /* iterate across the hierarchies */
> > #define for_each_root(root)						\
> > -	list_for_each_entry((root), &cgroup_roots, root_list)
> > +	list_for_each_entry_rcu((root), &cgroup_roots, root_list,	\
> > +				!lockdep_is_held(&cgroup_mutex))
>
> The extra condition should be constant false (regardless of
> cgroup_mutex). IOW, the RCU read lock is always required.

IIUC, the RCU read lock is not required when the cgroup_mutex is held;
holding cgroup_mutex is itself sufficient for traversing the cgroup
root_list, as in the case of cgroup_attach_task_all().

>
> > @@ -1386,13 +1386,15 @@ static inline struct cgroup *__cset_cgroup_from_root(struct css_set *cset,
> > 		}
> > 	}
> >
> > -	BUG_ON(!res_cgroup);
> > +	WARN_ON_ONCE(!res_cgroup && lockdep_is_held(&cgroup_mutex));
> > 	return res_cgroup;
>
> Hm, this would benefit from a comment on why !res_cgroup is conditionally
> acceptable.

The cgrp_cset_link is freed before we remove the cgroup root from the
root_list. Consequently, when accessing a cgroup root, the cset_link may
have already been freed, resulting in a NULL res_cgroup. However, by
holding the cgroup_mutex, we ensure that res_cgroup cannot be NULL. Will
add a comment in the next version.

>
> > }
> >
> > /*
> >  * look up cgroup associated with current task's cgroup namespace on the
> > - * specified hierarchy
> > + * specified hierarchy. Umount synchronization is ensured via VFS layer,
> > + * so we don't have to hold cgroup_mutex to prevent the root from being
> > + * destroyed.
>
> I tried something similar via an explicit lockdep invocation (in a thread I
> linked to you previously) and the VFS folks weren't enthusiastic about it.

Thanks for the information; I will take a look at it.

>
> You cannot hide this synchronization assumption in a mere comment.

I will think about how to address it.

--
Regards
Yafang
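
The scenario Yafang describes is the standard RCU publish/retire pattern applied to cgroup_roots. A minimal sketch of the lockless reader the series wants to permit (names taken from the patch; this assumes the final, un-negated lockdep condition discussed below):

	rcu_read_lock();
	for_each_root(root) {
		/* root may be concurrently unlinked by umount; its memory
		 * stays valid until the RCU grace period elapses, but any
		 * "is this root still live" check must be made explicitly. */
	}
	rcu_read_unlock();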
On Tue, Oct 17, 2023 at 12:45:38PM +0000, Yafang Shao wrote:
> #define for_each_root(root)						\
> -	list_for_each_entry((root), &cgroup_roots, root_list)
> +	list_for_each_entry_rcu((root), &cgroup_roots, root_list,	\
> +				!lockdep_is_held(&cgroup_mutex))

Shouldn't that be lockdep_is_held() without the leading negation?

> @@ -1386,13 +1386,15 @@ static inline struct cgroup *__cset_cgroup_from_root(struct css_set *cset,
> 		}
> 	}
>
> -	BUG_ON(!res_cgroup);
> +	WARN_ON_ONCE(!res_cgroup && lockdep_is_held(&cgroup_mutex));

This doesn't work. lockdep_is_held() is always true if !PROVE_LOCKING.

> 	return res_cgroup;
> }
>
> /*
>  * look up cgroup associated with current task's cgroup namespace on the
> - * specified hierarchy
> + * specified hierarchy. Umount synchronization is ensured via VFS layer,
> + * so we don't have to hold cgroup_mutex to prevent the root from being
> + * destroyed.
>  */

Yeah, as Michal said, let's not do it this way.

Thanks.
On Wed, Oct 18, 2023 at 5:35 PM Tejun Heo <tj@kernel.org> wrote:
>
> On Tue, Oct 17, 2023 at 12:45:38PM +0000, Yafang Shao wrote:
> > #define for_each_root(root)						\
> > -	list_for_each_entry((root), &cgroup_roots, root_list)
> > +	list_for_each_entry_rcu((root), &cgroup_roots, root_list,	\
> > +				!lockdep_is_held(&cgroup_mutex))
>
> Shouldn't that be lockdep_is_held() without the leading negation?

Right, will fix it.

>
> > @@ -1386,13 +1386,15 @@ static inline struct cgroup *__cset_cgroup_from_root(struct css_set *cset,
> > 		}
> > 	}
> >
> > -	BUG_ON(!res_cgroup);
> > +	WARN_ON_ONCE(!res_cgroup && lockdep_is_held(&cgroup_mutex));
>
> This doesn't work. lockdep_is_held() is always true if !PROVE_LOCKING.

Will use mutex_is_locked() instead.

>
> > 	return res_cgroup;
> > }
> >
> > /*
> >  * look up cgroup associated with current task's cgroup namespace on the
> > - * specified hierarchy
> > + * specified hierarchy. Umount synchronization is ensured via VFS layer,
> > + * so we don't have to hold cgroup_mutex to prevent the root from being
> > + * destroyed.
> >  */
>
> Yeah, as Michal said, let's not do it this way.

Sure, will do.
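
For clarity, the corrected iterator implied by Tejun's remark would look like the sketch below (only the negation changes): the fourth argument names the condition under which a non-RCU traversal is legitimate, namely holding cgroup_mutex.

/* Sketch of the corrected form: cgroup_mutex holders may walk the list
 * without the RCU read lock, so the condition is passed un-negated. */
#define for_each_root(root)						\
	list_for_each_entry_rcu((root), &cgroup_roots, root_list,	\
				lockdep_is_held(&cgroup_mutex))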
On Thu, Oct 19, 2023 at 02:38:52PM +0800, Yafang Shao wrote:
> > > -	BUG_ON(!res_cgroup);
> > > +	WARN_ON_ONCE(!res_cgroup && lockdep_is_held(&cgroup_mutex));
> >
> > This doesn't work. lockdep_is_held() is always true if !PROVE_LOCKING.
>
> Will use mutex_is_locked() instead.

But then, someone else can hold the lock and trigger the condition
spuriously. The kernel doesn't track who's holding the lock unless lockdep
is enabled.

Thanks.
On 10/19/23 15:08, Tejun Heo wrote:
> On Thu, Oct 19, 2023 at 02:38:52PM +0800, Yafang Shao wrote:
>>>> -	BUG_ON(!res_cgroup);
>>>> +	WARN_ON_ONCE(!res_cgroup && lockdep_is_held(&cgroup_mutex));
>>> This doesn't work. lockdep_is_held() is always true if !PROVE_LOCKING.
>> Will use mutex_is_locked() instead.
> But then, someone else can hold the lock and trigger the condition
> spuriously. The kernel doesn't track who's holding the lock unless lockdep
> is enabled.

It is actually possible to detect whether the current process is the owner
of a mutex, since there is an owner field in the mutex structure. However,
the owner field also contains additional information that needs to be
masked off before comparing with "current". If such functionality is
really needed, we will have to add a helper function, mutex_is_held() for
example, to kernel/locking/mutex.c.

Cheers,
Longman
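
A sketch of the helper Waiman describes, for illustration only: mutex_is_held() does not exist in mainline, and the flag mask below is an assumption based on the low bits of ->owner carrying MUTEX_FLAG_* state.

#include <linux/atomic.h>
#include <linux/mutex.h>
#include <linux/sched.h>

/* Hypothetical helper: report whether the current task owns @lock by
 * masking the flag bits out of the owner word before comparing it with
 * the task pointer. (Assumed mask: the three low MUTEX_FLAG_* bits.) */
static inline bool mutex_is_held(struct mutex *lock)
{
	return (atomic_long_read(&lock->owner) & ~0x07UL) ==
	       (unsigned long)current;
}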
On Fri, Oct 20, 2023 at 3:43 AM Waiman Long <longman@redhat.com> wrote:
>
> On 10/19/23 15:08, Tejun Heo wrote:
> > On Thu, Oct 19, 2023 at 02:38:52PM +0800, Yafang Shao wrote:
> >>>> -	BUG_ON(!res_cgroup);
> >>>> +	WARN_ON_ONCE(!res_cgroup && lockdep_is_held(&cgroup_mutex));
> >>> This doesn't work. lockdep_is_held() is always true if !PROVE_LOCKING.
> >> Will use mutex_is_locked() instead.
> > But then, someone else can hold the lock and trigger the condition
> > spuriously. The kernel doesn't track who's holding the lock unless lockdep
> > is enabled.
>
> It is actually possible to detect whether the current process is the owner
> of a mutex, since there is an owner field in the mutex structure. However,
> the owner field also contains additional information that needs to be
> masked off before comparing with "current". If such functionality is
> really needed, we will have to add a helper function, mutex_is_held() for
> example, to kernel/locking/mutex.c.

Agreed. We should first introduce mutex_is_held(). Thanks for your
suggestion.
On Fri, Oct 20, 2023 at 05:36:57PM +0800, Yafang Shao wrote:
> On Fri, Oct 20, 2023 at 3:43 AM Waiman Long <longman@redhat.com> wrote:
> >
> > On 10/19/23 15:08, Tejun Heo wrote:
> > > On Thu, Oct 19, 2023 at 02:38:52PM +0800, Yafang Shao wrote:
> > >>>> -	BUG_ON(!res_cgroup);
> > >>>> +	WARN_ON_ONCE(!res_cgroup && lockdep_is_held(&cgroup_mutex));
> > >>> This doesn't work. lockdep_is_held() is always true if !PROVE_LOCKING.
> > >> Will use mutex_is_locked() instead.
> > > But then, someone else can hold the lock and trigger the condition
> > > spuriously. The kernel doesn't track who's holding the lock unless lockdep
> > > is enabled.
> >
> > It is actually possible to detect whether the current process is the owner
> > of a mutex, since there is an owner field in the mutex structure. However,
> > the owner field also contains additional information that needs to be
> > masked off before comparing with "current". If such functionality is
> > really needed, we will have to add a helper function, mutex_is_held() for
> > example, to kernel/locking/mutex.c.
>
> Agreed. We should first introduce mutex_is_held(). Thanks for your
> suggestion.

I'm not sure this is the right occasion to add such a thing. It's just a
WARN_ON; we can either pass in the necessary condition from the callers or
just drop the warning.

Thanks.
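
One shape the "pass it in from the callers" alternative could take (hypothetical signature and parameter name; as the follow-up below confirms, the warning was ultimately just dropped instead):

/* Hypothetical variant: the caller, which knows its own locking context,
 * states whether a NULL result would indicate a real bug. */
static inline struct cgroup *__cset_cgroup_from_root(struct css_set *cset,
						     struct cgroup_root *root,
						     bool root_pinned)
{
	struct cgroup *res_cgroup = NULL;

	/* ... the existing cset-to-cgroup lookup is unchanged ... */

	/* Under cgroup_mutex the root cannot be mid-destruction, so NULL is
	 * a genuine bug; for an RCU reader it is an expected umount race. */
	WARN_ON_ONCE(!res_cgroup && root_pinned);
	return res_cgroup;
}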
On Sat, Oct 21, 2023 at 1:51 AM Tejun Heo <tj@kernel.org> wrote:
>
> On Fri, Oct 20, 2023 at 05:36:57PM +0800, Yafang Shao wrote:
> > On Fri, Oct 20, 2023 at 3:43 AM Waiman Long <longman@redhat.com> wrote:
> > >
> > > On 10/19/23 15:08, Tejun Heo wrote:
> > > > On Thu, Oct 19, 2023 at 02:38:52PM +0800, Yafang Shao wrote:
> > > >>>> -	BUG_ON(!res_cgroup);
> > > >>>> +	WARN_ON_ONCE(!res_cgroup && lockdep_is_held(&cgroup_mutex));
> > > >>> This doesn't work. lockdep_is_held() is always true if !PROVE_LOCKING.
> > > >> Will use mutex_is_locked() instead.
> > > > But then, someone else can hold the lock and trigger the condition
> > > > spuriously. The kernel doesn't track who's holding the lock unless
> > > > lockdep is enabled.
> > >
> > > It is actually possible to detect whether the current process is the owner
> > > of a mutex, since there is an owner field in the mutex structure. However,
> > > the owner field also contains additional information that needs to be
> > > masked off before comparing with "current". If such functionality is
> > > really needed, we will have to add a helper function, mutex_is_held() for
> > > example, to kernel/locking/mutex.c.
> >
> > Agreed. We should first introduce mutex_is_held(). Thanks for your
> > suggestion.
>
> I'm not sure this is the right occasion to add such a thing. It's just a
> WARN_ON; we can either pass in the necessary condition from the callers or
> just drop the warning.

OK, will just drop the warning.
Hello,

kernel test robot noticed "WARNING:suspicious_RCU_usage" on:

commit: bf652e038501425f21e30c78c210ce892b9747c5 ("[RFC PATCH bpf-next v2 1/9] cgroup: Make operations on the cgroup root_list RCU safe")
url: https://github.com/intel-lab-lkp/linux/commits/Yafang-Shao/cgroup-Make-operations-on-the-cgroup-root_list-RCU-safe/20231017-204729
base: https://git.kernel.org/cgit/linux/kernel/git/bpf/bpf-next.git master
patch link: https://lore.kernel.org/all/20231017124546.24608-2-laoar.shao@gmail.com/
patch subject: [RFC PATCH bpf-next v2 1/9] cgroup: Make operations on the cgroup root_list RCU safe

in testcase: boot

compiler: gcc-12
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)

+------------------------------------------------------------------+------------+------------+
|                                                                  | 137df1189d | bf652e0385 |
+------------------------------------------------------------------+------------+------------+
| WARNING:suspicious_RCU_usage                                     | 0          | 6          |
| kernel/cgroup/cgroup#c:#RCU-list_traversed_in_non-reader_section | 0          | 6          |
+------------------------------------------------------------------+------------+------------+

If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202310251500.9683d034-oliver.sang@intel.com

[ 245.792697][ T1] WARNING: suspicious RCU usage
[ 245.793945][ T1] 6.6.0-rc5-01395-gbf652e038501 #1 Not tainted
[ 245.795294][ T1] -----------------------------
[ 245.796458][ T1] kernel/cgroup/cgroup-v1.c:1171 RCU-list traversed in non-reader section!!
[ 245.798005][ T1]
[ 245.798005][ T1] other info that might help us debug this:
[ 245.798005][ T1]
[ 245.800905][ T1]
[ 245.800905][ T1] rcu_scheduler_active = 2, debug_locks = 1
[ 245.815563][ T1] 1 lock held by init/1:
[ 245.816666][ T1]  #0: c3bfffb8 (cgroup_mutex){+.+.}-{3:3}, at: cgroup_lock_and_drain_offline (kernel/cgroup/cgroup.c:3066)
[ 245.818306][ T1]
[ 245.818306][ T1] stack backtrace:
[ 245.820360][ T1] CPU: 0 PID: 1 Comm: init Not tainted 6.6.0-rc5-01395-gbf652e038501 #1
[ 245.821860][ T1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 245.823506][ T1] Call Trace:
[ 245.824459][ T1]  dump_stack_lvl (lib/dump_stack.c:107)
[ 245.825654][ T1]  dump_stack (lib/dump_stack.c:114)
[ 245.826665][ T1]  lockdep_rcu_suspicious (include/linux/context_tracking.h:153 kernel/locking/lockdep.c:6712)
[ 245.827845][ T1]  cgroup1_root_to_use (kernel/cgroup/cgroup-v1.c:1171 (discriminator 9))
[ 245.828999][ T1]  cgroup1_get_tree (kernel/cgroup/cgroup-v1.c:1245)
[ 245.830101][ T1]  vfs_get_tree (fs/super.c:1750)
[ 245.831172][ T1]  do_new_mount (fs/namespace.c:3335)
[ 245.832199][ T1]  path_mount (fs/namespace.c:3662)
[ 245.833147][ T1]  ? user_path_at_empty (fs/namei.c:2914)
[ 245.834198][ T1]  __ia32_sys_mount (fs/namespace.c:3675 fs/namespace.c:3884 fs/namespace.c:3861 fs/namespace.c:3861)
[ 245.835242][ T1]  do_int80_syscall_32 (arch/x86/entry/common.c:112 arch/x86/entry/common.c:132)
[ 245.836343][ T1]  ? kfree (mm/slab_common.c:1073)
[ 245.837335][ T1]  ? __ia32_sys_mount (fs/namespace.c:3861)
[ 245.838404][ T1]  ? do_int80_syscall_32 (arch/x86/entry/common.c:136)
[ 245.839479][ T1]  ? syscall_exit_to_user_mode (kernel/entry/common.c:131 kernel/entry/common.c:298)
[ 245.840593][ T1]  ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4565)
[ 245.841824][ T1]  ? syscall_exit_to_user_mode (kernel/entry/common.c:299)
[ 245.842967][ T1]  ? do_int80_syscall_32 (arch/x86/entry/common.c:136)
[ 245.843964][ T1]  ? __ia32_sys_mount (fs/namespace.c:3861)
[ 245.844935][ T1]  ? do_int80_syscall_32 (arch/x86/entry/common.c:136)
[ 245.845958][ T1]  ? syscall_exit_to_user_mode (kernel/entry/common.c:131 kernel/entry/common.c:298)
[ 245.847004][ T1]  ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4565)
[ 245.848103][ T1]  ? syscall_exit_to_user_mode (kernel/entry/common.c:299)
[ 245.849159][ T1]  ? do_int80_syscall_32 (arch/x86/entry/common.c:136)
[ 245.850163][ T1]  ? syscall_exit_to_user_mode (kernel/entry/common.c:299)
[ 245.851209][ T1]  ? do_int80_syscall_32 (arch/x86/entry/common.c:136)
[ 245.852179][ T1]  ? lock_release (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5775)
[ 245.853126][ T1]  ? irqentry_exit (kernel/entry/common.c:445)
[ 245.858431][ T1]  ? irqentry_exit_to_user_mode (kernel/entry/common.c:131 kernel/entry/common.c:311)
[ 245.859731][ T1]  ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4565)
[ 245.860985][ T1]  ? irqentry_exit_to_user_mode (kernel/entry/common.c:312)
[ 245.862110][ T1]  ? irqentry_exit (kernel/entry/common.c:445)
[ 245.863099][ T1]  ? exc_page_fault (arch/x86/mm/fault.c:1565)
[ 245.864073][ T1]  entry_INT80_32 (arch/x86/entry/entry_32.S:947)
[ 245.865070][ T1] EIP: 0xb7f895ed
[ 245.865966][ T1] Code: 8b 7c 24 0c 50 e8 06 00 00 00 89 da 5b 5b 5f c3 8b 04 24 05 77 ec 04 00 8b 00 85 c0 74 06 50 8b 44 24 08 c3 8b 44 24 04 cd 80 <c3> 55 50 8b 6c 24 0c 8b 45 00 8b 6d 04 50 8b 44 24 04 e8 b9 ff ff

All code
========
   0:	8b 7c 24 0c          	mov    0xc(%rsp),%edi
   4:	50                   	push   %rax
   5:	e8 06 00 00 00       	call   0x10
   a:	89 da                	mov    %ebx,%edx
   c:	5b                   	pop    %rbx
   d:	5b                   	pop    %rbx
   e:	5f                   	pop    %rdi
   f:	c3                   	ret
  10:	8b 04 24             	mov    (%rsp),%eax
  13:	05 77 ec 04 00       	add    $0x4ec77,%eax
  18:	8b 00                	mov    (%rax),%eax
  1a:	85 c0                	test   %eax,%eax
  1c:	74 06                	je     0x24
  1e:	50                   	push   %rax
  1f:	8b 44 24 08          	mov    0x8(%rsp),%eax
  23:	c3                   	ret
  24:	8b 44 24 04          	mov    0x4(%rsp),%eax
  28:	cd 80                	int    $0x80
  2a:*	c3                   	ret		<-- trapping instruction
  2b:	55                   	push   %rbp
  2c:	50                   	push   %rax
  2d:	8b 6c 24 0c          	mov    0xc(%rsp),%ebp
  31:	8b 45 00             	mov    0x0(%rbp),%eax
  34:	8b 6d 04             	mov    0x4(%rbp),%ebp
  37:	50                   	push   %rax
  38:	8b 44 24 04          	mov    0x4(%rsp),%eax
  3c:	e8                   	.byte 0xe8
  3d:	b9                   	.byte 0xb9
  3e:	ff                   	(bad)
  3f:	ff                   	.byte 0xff

Code starting with the faulting instruction
===========================================
   0:	c3                   	ret
   1:	55                   	push   %rbp
   2:	50                   	push   %rax
   3:	8b 6c 24 0c          	mov    0xc(%rsp),%ebp
   7:	8b 45 00             	mov    0x0(%rbp),%eax
   a:	8b 6d 04             	mov    0x4(%rbp),%ebp
   d:	50                   	push   %rax
   e:	8b 44 24 04          	mov    0x4(%rsp),%eax
  12:	e8                   	.byte 0xe8
  13:	b9                   	.byte 0xb9
  14:	ff                   	(bad)
  15:	ff                   	.byte 0xff

[ 245.868936][ T1] EAX: ffffffda EBX: 0804a431 ECX: 0804a429 EDX: 0804a431
[ 245.870279][ T1] ESI: 0000000e EDI: 00000000 EBP: bffa7bd8 ESP: bffa7bb8
[ 245.871623][ T1] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000246
[ 245.935209][ T1] init: Console is alive

The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231025/202310251500.9683d034-oliver.sang@intel.com
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index f1b3151ac30b..8505eeae6e41 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -558,6 +558,7 @@ struct cgroup_root {
 
 	/* A list running through the active hierarchies */
 	struct list_head root_list;
+	struct rcu_head rcu;
 
 	/* Hierarchy-specific flags */
 	unsigned int flags;
diff --git a/kernel/cgroup/cgroup-internal.h b/kernel/cgroup/cgroup-internal.h
index c56071f150f2..321af20ea15f 100644
--- a/kernel/cgroup/cgroup-internal.h
+++ b/kernel/cgroup/cgroup-internal.h
@@ -170,7 +170,8 @@ extern struct list_head cgroup_roots;
 
 /* iterate across the hierarchies */
 #define for_each_root(root)						\
-	list_for_each_entry((root), &cgroup_roots, root_list)
+	list_for_each_entry_rcu((root), &cgroup_roots, root_list,	\
+				!lockdep_is_held(&cgroup_mutex))
 
 /**
  * for_each_subsys - iterate all enabled cgroup subsystems
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 1fb7f562289d..bae8f9f27792 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1313,7 +1313,7 @@ static void cgroup_exit_root_id(struct cgroup_root *root)
 
 void cgroup_free_root(struct cgroup_root *root)
 {
-	kfree(root);
+	kfree_rcu(root, rcu);
 }
 
 static void cgroup_destroy_root(struct cgroup_root *root)
@@ -1346,7 +1346,7 @@ static void cgroup_destroy_root(struct cgroup_root *root)
 	spin_unlock_irq(&css_set_lock);
 
 	if (!list_empty(&root->root_list)) {
-		list_del(&root->root_list);
+		list_del_rcu(&root->root_list);
 		cgroup_root_count--;
 	}
 
@@ -1386,13 +1386,15 @@ static inline struct cgroup *__cset_cgroup_from_root(struct css_set *cset,
 		}
 	}
 
-	BUG_ON(!res_cgroup);
+	WARN_ON_ONCE(!res_cgroup && lockdep_is_held(&cgroup_mutex));
 	return res_cgroup;
 }
 
 /*
  * look up cgroup associated with current task's cgroup namespace on the
- * specified hierarchy
+ * specified hierarchy. Umount synchronization is ensured via VFS layer,
+ * so we don't have to hold cgroup_mutex to prevent the root from being
+ * destroyed.
  */
 static struct cgroup *
 current_cgns_cgroup_from_root(struct cgroup_root *root)
@@ -1445,7 +1447,6 @@ static struct cgroup *current_cgns_cgroup_dfl(void)
 static struct cgroup *cset_cgroup_from_root(struct css_set *cset,
 					    struct cgroup_root *root)
 {
-	lockdep_assert_held(&cgroup_mutex);
 	lockdep_assert_held(&css_set_lock);
 
 	return __cset_cgroup_from_root(cset, root);
@@ -1453,7 +1454,9 @@ static struct cgroup *cset_cgroup_from_root(struct css_set *cset,
 
 /*
  * Return the cgroup for "task" from the given hierarchy. Must be
- * called with cgroup_mutex and css_set_lock held.
+ * called with css_set_lock held to prevent task's groups from being modified.
+ * Must be called with either cgroup_mutex or rcu read lock to prevent the
+ * cgroup root from being destroyed.
  */
 struct cgroup *task_cgroup_from_root(struct task_struct *task,
 				     struct cgroup_root *root)
@@ -2097,7 +2100,7 @@ int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask)
 	 * care of subsystems' refcounts, which are explicitly dropped in
 	 * the failure exit path.
 	 */
-	list_add(&root->root_list, &cgroup_roots);
+	list_add_rcu(&root->root_list, &cgroup_roots);
 	cgroup_root_count++;
 
 	/*
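
Taken together, the locking contract the patch documents for task_cgroup_from_root() would be exercised like this by a lockless reader (a sketch; a cgroup_mutex holder would skip the RCU bracketing):

	struct cgroup *cgrp;

	rcu_read_lock();			/* pins the cgroup_root */
	spin_lock_irq(&css_set_lock);		/* pins the task's css_sets */
	cgrp = task_cgroup_from_root(task, root);
	spin_unlock_irq(&css_set_lock);
	rcu_read_unlock();

	/* With this patch, cgrp may be NULL if the root raced with umount --
	 * the reason the BUG_ON() was relaxed to a conditional warning. */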
At present, when we perform operations on the cgroup root_list, we must
hold the cgroup_mutex, which is a relatively heavyweight lock. In reality,
we can make operations on this list RCU-safe, eliminating the need to hold
the cgroup_mutex during traversal. Modifications to the list only occur in
the cgroup root setup and destroy paths, which should be infrequent in a
production environment. In contrast, traversal may occur frequently.
Therefore, making it RCU-safe would be beneficial.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 include/linux/cgroup-defs.h     |  1 +
 kernel/cgroup/cgroup-internal.h |  3 ++-
 kernel/cgroup/cgroup.c          | 17 ++++++++++-------
 3 files changed, 13 insertions(+), 8 deletions(-)
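
The writer side that makes such lockless readers safe is already visible in the diff above; spelled out, the destroy-path ordering is (a sketch using the patch's names):

	/* Unpublish first, then defer the free past an RCU grace period so
	 * that concurrent for_each_root() walkers never touch freed memory. */
	mutex_lock(&cgroup_mutex);
	list_del_rcu(&root->root_list);	/* new readers can no longer find it */
	cgroup_root_count--;
	mutex_unlock(&cgroup_mutex);

	kfree_rcu(root, rcu);		/* freed after existing readers finish */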