Message ID | 20230208033408.3997610-1-qiang1.zhang@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | rcu: Fix bind rcu related kthreads to housekeeping CPUs | expand |
Did the testing and checked the CPU affinities of RCU kthreads by running "ps -eo psr,comm| grep rcu" [1]: Without Zqiang's patch: ... 0 rcu_preempt 0 rcuog/0 0 rcuop/0 0 rcuop/1 0 rcuog/2 0 rcuop/2 2 rcuop/3 With Zqiang's patch: ... 3 rcu_preempt // on housekeeping CPU 3 2 rcuog/0 // on housekeeping CPU 2 3 rcuop/0 // on housekeeping CPU 3 3 rcuop/1 // on housekeeping CPU 3 0 rcuog/2 // on non-housekeeping CPU 0. [2] 0 rcuop/2 // on non-housekeeping CPU 0. [2] 2 rcuop/3 [1] The 1st column of the output is for CPU IDs and the 2nd column is for thread names. [2] Added debug messages into the two threads, and found that the two threads didn't run after all CPUs were online. So if they run again after all CPUs are online, they will also be moved to housekeeping CPUs by Zqiang's patch. > From: Zqiang <qiang1.zhang@intel.com> > Sent: Wednesday, February 8, 2023 11:34 AM > To: paulmck@kernel.org; frederic@kernel.org; joel@joelfernandes.org > Cc: rcu@vger.kernel.org; linux-kernel@vger.kernel.org > Subject: [PATCH] rcu: Fix bind rcu related kthreads to housekeeping CPUs > > For kernels built with CONFIG_NO_HZ_FULL=y and CONFIG_RCU_NOCB_CPU=y, > run the following tests: > > runqemu kvm slirp nographic qemuparams="-m 1024 -smp 4" > bootparams="console=ttyS0 isolcpus=0,1 nohz_full=0,1 rcu_nocbs=0,1" > > root@qemux86-64:~# ps -ef | grep "rcu_preempt" | grep -v grep | awk '{print > $2}' > 15 > root@qemux86-64:~# ps -ef | grep "rcuop/0" | grep -v grep | awk '{print $2}' > 17 > root@qemux86-64:~# taskset -p 15 > pid 15's current affinity mask: 1 > root@qemux86-64:~# taskset -p 17 > pid 17's current affinity mask: 1 > > The affinity of these rcu related kthreads is not set to housekeeping cpumask, > even if called rcu_bind_gp_kthread() when the rcu related kthread starts. > > set_cpus_allowed_ptr() > ->__set_cpus_allowed_ptr() > ->__set_cpus_allowed_ptr_locked > { > bool kthread = p->flags & PF_KTHREAD; > .... > if (kthread || is_migration_disabled(p)) > cpu_valid_mask = cpu_online_mask; > .... > dest_cpu = cpumask_any_and_distribute(cpu_valid_mask, ctx- > >new_mask); > if (dest_cpu >= nr_cpu_ids) { > ret = -EINVAL; > goto out; > } > .... > } > > Due to these rcu related kthreads be created before bringup other CPUS, so > when they running and set hosekeeping cpus affinity, found that only CPU0 is > online at this time and CPU0 is set to no_hz_full CPU, the ctx->new_mask not > contain CPU0 and this will cause dest_cpu in the above code snippet to be an > illegal value and return directly, ultimately, these rcu related kthreads failed to > bind housekeeping CPUS. s/so// s/hosekeeping/housekeeping/ ... > This commit therefore rebind these rcu related kthreads to housekeeping CPUs s/rebind/rebinds/ Could you tweak all the commit messages to fix the typos and the grammar errors?
>Did the testing and checked the CPU affinities of RCU kthreads by running "ps -eo psr,comm| grep rcu" [1]: > >Without Zqiang's patch: > ... > 0 rcu_preempt > 0 rcuog/0 > 0 rcuop/0 > 0 rcuop/1 > 0 rcuog/2 > 0 rcuop/2 > 2 rcuop/3 > >With Zqiang's patch: > ... > 3 rcu_preempt // on housekeeping CPU 3 > 2 rcuog/0 // on housekeeping CPU 2 > 3 rcuop/0 // on housekeeping CPU 3 > 3 rcuop/1 // on housekeeping CPU 3 > 0 rcuog/2 // on non-housekeeping CPU 0. [2] > 0 rcuop/2 // on non-housekeeping CPU 0. [2] > 2 rcuop/3 > >[1] The 1st column of the output is for CPU IDs and the 2nd column is for thread names. >[2] Added debug messages into the two threads, and found that the two threads didn't run after all CPUs were online. > So if they run again after all CPUs are online, they will also be moved to housekeeping CPUs by Zqiang's patch. > > From: Zqiang <qiang1.zhang@intel.com> > Sent: Wednesday, February 8, 2023 11:34 AM > To: paulmck@kernel.org; frederic@kernel.org; joel@joelfernandes.org > Cc: rcu@vger.kernel.org; linux-kernel@vger.kernel.org > Subject: [PATCH] rcu: Fix bind rcu related kthreads to housekeeping CPUs > > For kernels built with CONFIG_NO_HZ_FULL=y and CONFIG_RCU_NOCB_CPU=y, > run the following tests: > > runqemu kvm slirp nographic qemuparams="-m 1024 -smp 4" > bootparams="console=ttyS0 isolcpus=0,1 nohz_full=0,1 rcu_nocbs=0,1" > > root@qemux86-64:~# ps -ef | grep "rcu_preempt" | grep -v grep | awk '{print > $2}' > 15 > root@qemux86-64:~# ps -ef | grep "rcuop/0" | grep -v grep | awk '{print $2}' > 17 > root@qemux86-64:~# taskset -p 15 > pid 15's current affinity mask: 1 > root@qemux86-64:~# taskset -p 17 > pid 17's current affinity mask: 1 > > The affinity of these rcu related kthreads is not set to housekeeping cpumask, > even if called rcu_bind_gp_kthread() when the rcu related kthread starts. > > set_cpus_allowed_ptr() > ->__set_cpus_allowed_ptr() > ->__set_cpus_allowed_ptr_locked > { > bool kthread = p->flags & PF_KTHREAD; > .... > if (kthread || is_migration_disabled(p)) > cpu_valid_mask = cpu_online_mask; > .... > dest_cpu = cpumask_any_and_distribute(cpu_valid_mask, ctx- > >new_mask); > if (dest_cpu >= nr_cpu_ids) { > ret = -EINVAL; > goto out; > } > .... > } > > Due to these rcu related kthreads be created before bringup other CPUS, so > when they running and set hosekeeping cpus affinity, found that only CPU0 is > online at this time and CPU0 is set to no_hz_full CPU, the ctx->new_mask not > contain CPU0 and this will cause dest_cpu in the above code snippet to be an > illegal value and return directly, ultimately, these rcu related kthreads failed to > bind housekeeping CPUS. > >s/so// >s/hosekeeping/housekeeping/ >... > > This commit therefore rebind these rcu related kthreads to housekeeping CPUs > >s/rebind/rebinds/ > >Could you tweak all the commit messages to fix the typos and the grammar errors?
diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h index baf7ec178155..8b3530cca291 100644 --- a/kernel/rcu/tasks.h +++ b/kernel/rcu/tasks.h @@ -544,9 +544,8 @@ static void rcu_tasks_one_gp(struct rcu_tasks *rtp, bool midboot) static int __noreturn rcu_tasks_kthread(void *arg) { struct rcu_tasks *rtp = arg; + bool rcu_setaffinity_setup = false; - /* Run on housekeeping CPUs by default. Sysadm can move if desired. */ - housekeeping_affine(current, HK_TYPE_RCU); WRITE_ONCE(rtp->kthread_ptr, current); // Let GPs start! /* @@ -556,6 +555,10 @@ static int __noreturn rcu_tasks_kthread(void *arg) * This loop is terminated by the system going down. ;-) */ for (;;) { + if (!rcu_setaffinity_setup && rcu_inkernel_boot_has_ended()) { + set_cpus_allowed_ptr(current, housekeeping_cpumask(HK_TYPE_RCU)); + rcu_setaffinity_setup = true; + } // Wait for one grace period and invoke any callbacks // that are ready. rcu_tasks_one_gp(rtp, false); diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index ee27a03d7576..0ac47a773e13 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -1781,8 +1781,13 @@ static noinline void rcu_gp_cleanup(void) */ static int __noreturn rcu_gp_kthread(void *unused) { - rcu_bind_gp_kthread(); + bool rcu_setaffinity_setup = false; + for (;;) { + if (!rcu_setaffinity_setup && rcu_inkernel_boot_has_ended()) { + set_cpus_allowed_ptr(current, housekeeping_cpumask(HK_TYPE_RCU)); + rcu_setaffinity_setup = true; + } /* Handle grace-period start. */ for (;;) { diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h index 192536916f9a..391e3fae4ff5 100644 --- a/kernel/rcu/tree.h +++ b/kernel/rcu/tree.h @@ -495,7 +495,6 @@ do { \ #define rcu_nocb_lock_irqsave(rdp, flags) local_irq_save(flags) #endif /* #else #ifdef CONFIG_RCU_NOCB_CPU */ -static void rcu_bind_gp_kthread(void); static bool rcu_nohz_full_cpu(void); /* Forward declarations for tree_stall.h */ diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h index f2280616f9d5..254d0f631d57 100644 --- a/kernel/rcu/tree_nocb.h +++ b/kernel/rcu/tree_nocb.h @@ -894,8 +894,14 @@ static void nocb_gp_wait(struct rcu_data *my_rdp) static int rcu_nocb_gp_kthread(void *arg) { struct rcu_data *rdp = arg; + bool rcu_setaffinity_setup = false; for (;;) { + if (!rcu_setaffinity_setup && rcu_inkernel_boot_has_ended()) { + set_cpus_allowed_ptr(current, housekeeping_cpumask(HK_TYPE_RCU)); + rcu_setaffinity_setup = true; + } + WRITE_ONCE(rdp->nocb_gp_loops, rdp->nocb_gp_loops + 1); nocb_gp_wait(rdp); cond_resched_tasks_rcu_qs(); @@ -1002,10 +1008,15 @@ static void nocb_cb_wait(struct rcu_data *rdp) static int rcu_nocb_cb_kthread(void *arg) { struct rcu_data *rdp = arg; - + bool rcu_setaffinity_setup = false; // Each pass through this loop does one callback batch, and, // if there are no more ready callbacks, waits for them. for (;;) { + if (!rcu_setaffinity_setup && rcu_inkernel_boot_has_ended()) { + set_cpus_allowed_ptr(current, housekeeping_cpumask(HK_TYPE_RCU)); + rcu_setaffinity_setup = true; + } + nocb_cb_wait(rdp); cond_resched_tasks_rcu_qs(); } diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index 7b0fe741a088..fdde71ebb83e 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -1294,12 +1294,3 @@ static bool rcu_nohz_full_cpu(void) return false; } -/* - * Bind the RCU grace-period kthreads to the housekeeping CPU. - */ -static void rcu_bind_gp_kthread(void) -{ - if (!tick_nohz_full_enabled()) - return; - housekeeping_affine(current, HK_TYPE_RCU); -}
For kernels built with CONFIG_NO_HZ_FULL=y and CONFIG_RCU_NOCB_CPU=y, run the following tests: runqemu kvm slirp nographic qemuparams="-m 1024 -smp 4" bootparams="console=ttyS0 isolcpus=0,1 nohz_full=0,1 rcu_nocbs=0,1" root@qemux86-64:~# ps -ef | grep "rcu_preempt" | grep -v grep | awk '{print $2}' 15 root@qemux86-64:~# ps -ef | grep "rcuop/0" | grep -v grep | awk '{print $2}' 17 root@qemux86-64:~# taskset -p 15 pid 15's current affinity mask: 1 root@qemux86-64:~# taskset -p 17 pid 17's current affinity mask: 1 The affinity of these rcu related kthreads is not set to housekeeping cpumask, even if called rcu_bind_gp_kthread() when the rcu related kthread starts. set_cpus_allowed_ptr() ->__set_cpus_allowed_ptr() ->__set_cpus_allowed_ptr_locked { bool kthread = p->flags & PF_KTHREAD; .... if (kthread || is_migration_disabled(p)) cpu_valid_mask = cpu_online_mask; .... dest_cpu = cpumask_any_and_distribute(cpu_valid_mask, ctx->new_mask); if (dest_cpu >= nr_cpu_ids) { ret = -EINVAL; goto out; } .... } Due to these rcu related kthreads be created before bringup other CPUS, so when they running and set hosekeeping cpus affinity, found that only CPU0 is online at this time and CPU0 is set to no_hz_full CPU, the ctx->new_mask not contain CPU0 and this will cause dest_cpu in the above code snippet to be an illegal value and return directly, ultimately, these rcu related kthreads failed to bind housekeeping CPUS. This commit therefore rebind these rcu related kthreads to housekeeping CPUs after the kernel boot sequence ends, at this point all CPUs are online. Signed-off-by: Zqiang <qiang1.zhang@intel.com> --- kernel/rcu/tasks.h | 7 +++++-- kernel/rcu/tree.c | 7 ++++++- kernel/rcu/tree.h | 1 - kernel/rcu/tree_nocb.h | 13 ++++++++++++- kernel/rcu/tree_plugin.h | 9 --------- 5 files changed, 23 insertions(+), 14 deletions(-)