
[v2,2/2] rcu: Don't dump the stalled CPU on where RCU GP kthread last ran twice

Message ID 20230712151557.760-3-thunder.leizhen@huawei.com (mailing list archive)
State Accepted
Commit 5df5ceeff840fdc9bfae4bfb530b8b7e05d7728e
Series rcu: Don't dump the stalled CPU on where RCU GP kthread last ran twice

Commit Message

Leizhen (ThunderTown) July 12, 2023, 3:15 p.m. UTC
The stacks of all stalled CPUs are dumped in rcu_dump_cpu_stacks().
If the CPU on which the RCU GP kthread last ran is itself stalled, its
stack does not need to be dumped a second time. The corresponding
backtrace can be found by searching for the printed CPU ID.

For example:
[   87.328275] rcu: rcu_sched kthread starved for ... ->cpu=3  <--------|
... ...                                                                 |
[   89.385007] NMI backtrace for cpu 3                         <--------|
[   89.385179] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.10.0+ #22 <--|
[   89.385188] Hardware name: linux,dummy-virt (DT)
[   89.385196] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
[   89.385204] pc : arch_cpu_idle+0x40/0xc0
[   89.385211] lr : arch_cpu_idle+0x2c/0xc0
... ...
[   89.385566] Call trace:
[   89.385574]  arch_cpu_idle+0x40/0xc0
[   89.385581]  default_idle_call+0x100/0x450
[   89.385589]  cpuidle_idle_call+0x2f8/0x460
[   89.385596]  do_idle+0x1dc/0x3d0
[   89.385604]  cpu_startup_entry+0x5c/0xb0
[   89.385613]  secondary_start_kernel+0x35c/0x520

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/tree_stall.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Paul E. McKenney July 17, 2023, 4:29 p.m. UTC | #1
On Wed, Jul 12, 2023 at 11:15:57PM +0800, Zhen Lei wrote:
> The stacks of all stalled CPUs will be dumped in rcu_dump_cpu_stacks().
> If the CPU on where RCU GP kthread last ran is stalled, its stack does
> not need to be dumped again. We can search the corresponding backtrace
> based on the printed CPU ID.
> 
> For example:
> [   87.328275] rcu: rcu_sched kthread starved for ... ->cpu=3  <--------|
> ... ...                                                                 |
> [   89.385007] NMI backtrace for cpu 3                         <--------|
> [   89.385179] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.10.0+ #22 <--|
> [   89.385188] Hardware name: linux,dummy-virt (DT)
> [   89.385196] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
> [   89.385204] pc : arch_cpu_idle+0x40/0xc0
> [   89.385211] lr : arch_cpu_idle+0x2c/0xc0
> ... ...
> [   89.385566] Call trace:
> [   89.385574]  arch_cpu_idle+0x40/0xc0
> [   89.385581]  default_idle_call+0x100/0x450
> [   89.385589]  cpuidle_idle_call+0x2f8/0x460
> [   89.385596]  do_idle+0x1dc/0x3d0
> [   89.385604]  cpu_startup_entry+0x5c/0xb0
> [   89.385613]  secondary_start_kernel+0x35c/0x520
> 
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>

I queued both patches, thank you both!

							Thanx, Paul

> ---
>  kernel/rcu/tree_stall.h | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index dcfaa3d5db2cbc7..cc884cd49e026a3 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -534,12 +534,14 @@ static void rcu_check_gp_kthread_starvation(void)
>  		       data_race(READ_ONCE(rcu_state.gp_state)),
>  		       gpk ? data_race(READ_ONCE(gpk->__state)) : ~0, cpu);
>  		if (gpk) {
> +			struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> +
>  			pr_err("\tUnless %s kthread gets sufficient CPU time, OOM is now expected behavior.\n", rcu_state.name);
>  			pr_err("RCU grace-period kthread stack dump:\n");
>  			sched_show_task(gpk);
>  			if (cpu_is_offline(cpu)) {
>  				pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu);
> -			} else  {
> +			} else if (!(data_race(READ_ONCE(rdp->mynode->qsmask)) & rdp->grpmask)) {
>  				pr_err("Stack dump where RCU GP kthread last ran:\n");
>  				dump_cpu_task(cpu);
>  			}
> -- 
> 2.25.1
>

Patch

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index dcfaa3d5db2cbc7..cc884cd49e026a3 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -534,12 +534,14 @@  static void rcu_check_gp_kthread_starvation(void)
 		       data_race(READ_ONCE(rcu_state.gp_state)),
 		       gpk ? data_race(READ_ONCE(gpk->__state)) : ~0, cpu);
 		if (gpk) {
+			struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
+
 			pr_err("\tUnless %s kthread gets sufficient CPU time, OOM is now expected behavior.\n", rcu_state.name);
 			pr_err("RCU grace-period kthread stack dump:\n");
 			sched_show_task(gpk);
 			if (cpu_is_offline(cpu)) {
 				pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu);
-			} else  {
+			} else if (!(data_race(READ_ONCE(rdp->mynode->qsmask)) & rdp->grpmask)) {
 				pr_err("Stack dump where RCU GP kthread last ran:\n");
 				dump_cpu_task(cpu);
 			}
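
The new else-if condition can be modelled in userspace as a simple bitmask
test. The sketch below is only an illustration, not kernel code: the
model_node and model_data structs and both model_* functions are hypothetical
stand-ins for struct rcu_node, struct rcu_data and the patched branch, and it
assumes that a set bit in qsmask means the CPU still owes a quiescent state
and is therefore covered by the stalled-CPU dump.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the kernel's struct rcu_node. */
struct model_node {
	unsigned long qsmask;      /* bit set: that CPU still owes a quiescent state */
};

/* Hypothetical, simplified stand-in for the kernel's struct rcu_data. */
struct model_data {
	struct model_node *mynode; /* leaf node covering this CPU */
	unsigned long grpmask;     /* this CPU's bit within mynode->qsmask */
};

/* True if the CPU's bit is still set, i.e. it counts as stalled. */
static bool model_cpu_is_stalled(const struct model_data *rdp)
{
	return (rdp->mynode->qsmask & rdp->grpmask) != 0;
}

/*
 * Mirrors the patched branch: an offline CPU is only reported by number,
 * and an online CPU is dumped here only if it is not stalled.
 */
static bool model_should_dump_last_cpu(const struct model_data *rdp, bool offline)
{
	assert(rdp && rdp->mynode);
	if (offline)
		return false;
	return !model_cpu_is_stalled(rdp);
}
```

With qsmask = 0x8 (only CPU 3 stalled), model_should_dump_last_cpu() returns
false for CPU 3 and true for an online CPU 1, which corresponds to the
->cpu=3 example in the commit message.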