Message ID | 20230705073020.2030-3-thunder.leizhen@huawei.com |
---|---|
State | Superseded |
Series | rcu: Don't dump the stalled CPU on where RCU GP kthread last ran twice |
On 2023/7/5 15:30, Zhen Lei wrote:
> The stacks of all stalled CPUs will be dumped. If the CPU on where RCU GP
> kthread last ran is stalled, its stack does not need to be dumped again.

Should I add Fixes?

Fixes: 243027a3c805 ("rcu: For RCU grace-period kthread starvation, dump last CPU it ran on")

>
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> ---
>  kernel/rcu/tree_stall.h | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index dcfaa3d5db2cbc7..cc884cd49e026a3 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -534,12 +534,14 @@ static void rcu_check_gp_kthread_starvation(void)
>  		       data_race(READ_ONCE(rcu_state.gp_state)),
>  		       gpk ? data_race(READ_ONCE(gpk->__state)) : ~0, cpu);
>  		if (gpk) {
> +			struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> +
>  			pr_err("\tUnless %s kthread gets sufficient CPU time, OOM is now expected behavior.\n", rcu_state.name);
>  			pr_err("RCU grace-period kthread stack dump:\n");
>  			sched_show_task(gpk);
>  			if (cpu_is_offline(cpu)) {
>  				pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu);
> -			} else {
> +			} else if (!(data_race(READ_ONCE(rdp->mynode->qsmask)) & rdp->grpmask)) {
>  				pr_err("Stack dump where RCU GP kthread last ran:\n");
>  				dump_cpu_task(cpu);
>  			}
>

On Wed, Jul 05, 2023 at 03:30:20PM +0800, Zhen Lei wrote:
> The stacks of all stalled CPUs will be dumped. If the CPU on where RCU GP
> kthread last ran is stalled, its stack does not need to be dumped again.
>
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>

This one looks good. Please feel free to rebase it before 1/2 and repost.

Thanx, Paul

> ---
>  kernel/rcu/tree_stall.h | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index dcfaa3d5db2cbc7..cc884cd49e026a3 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -534,12 +534,14 @@ static void rcu_check_gp_kthread_starvation(void)
>  		       data_race(READ_ONCE(rcu_state.gp_state)),
>  		       gpk ? data_race(READ_ONCE(gpk->__state)) : ~0, cpu);
>  		if (gpk) {
> +			struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> +
>  			pr_err("\tUnless %s kthread gets sufficient CPU time, OOM is now expected behavior.\n", rcu_state.name);
>  			pr_err("RCU grace-period kthread stack dump:\n");
>  			sched_show_task(gpk);
>  			if (cpu_is_offline(cpu)) {
>  				pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu);
> -			} else {
> +			} else if (!(data_race(READ_ONCE(rdp->mynode->qsmask)) & rdp->grpmask)) {
>  				pr_err("Stack dump where RCU GP kthread last ran:\n");
>  				dump_cpu_task(cpu);
>  			}
> --
> 2.25.1
>

On Mon, Jul 10, 2023 at 3:06 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Wed, Jul 05, 2023 at 03:30:20PM +0800, Zhen Lei wrote:
> > The stacks of all stalled CPUs will be dumped. If the CPU on where RCU GP
> > kthread last ran is stalled, its stack does not need to be dumped again.
> >
> > Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>
> This one looks good. Please feel free to rebase it before 1/2 and repost.

Just a small comment:
I wondered if this would make it harder to identify which stack among
the various CPU stacks corresponds to the one the GP kthread is
running on. However, this line does print the CPU number of the
thread, so it is perhaps not an issue:

pr_err("%s kthread starved for %ld jiffies! g%ld f%#x %s(%d) ->state=%#x ->cpu=%d\n",

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>

Thanks.

On Mon, Jul 10, 2023 at 03:55:16PM -0400, Joel Fernandes wrote:
> On Mon, Jul 10, 2023 at 3:06 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Wed, Jul 05, 2023 at 03:30:20PM +0800, Zhen Lei wrote:
> > > The stacks of all stalled CPUs will be dumped. If the CPU on where RCU GP
> > > kthread last ran is stalled, its stack does not need to be dumped again.
> > >
> > > Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> >
> > This one looks good. Please feel free to rebase it before 1/2 and repost.
>
> Just a small comment:
> I wondered if this would make it harder to identify which stack among
> the various CPU stacks corresponds to the one the GP kthread is
> running on. However, this line does print the CPU number of the
> thread, so it is perhaps not an issue:
>
> pr_err("%s kthread starved for %ld jiffies! g%ld f%#x
> %s(%d) ->state=%#x ->cpu=%d\n",
>
> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>

Thank you!

Zhen Lei, please feel free to add Joel's Reviewed-by on your next posting.

Thanx, Paul

On 2023/7/11 4:32, Paul E. McKenney wrote:
> On Mon, Jul 10, 2023 at 03:55:16PM -0400, Joel Fernandes wrote:
>> On Mon, Jul 10, 2023 at 3:06 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>>>
>>> On Wed, Jul 05, 2023 at 03:30:20PM +0800, Zhen Lei wrote:
>>>> The stacks of all stalled CPUs will be dumped. If the CPU on where RCU GP
>>>> kthread last ran is stalled, its stack does not need to be dumped again.
>>>>
>>>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>>>
>>> This one looks good. Please feel free to rebase it before 1/2 and repost.
>>
>> Just a small comment:
>> I wondered if this would make it harder to identify which stack among
>> the various CPU stacks corresponds to the one the GP kthread is
>> running on. However, this line does print the CPU number of the
>> thread, so it is perhaps not an issue:
>>
>> pr_err("%s kthread starved for %ld jiffies! g%ld f%#x
>> %s(%d) ->state=%#x ->cpu=%d\n",
>>
>> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
>
> Thank you!

> Zhen Lei, please feel free to add Joel's Reviewed-by on your
> next posting.

OK

>
> Thanx, Paul
> .
>

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index dcfaa3d5db2cbc7..cc884cd49e026a3 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -534,12 +534,14 @@ static void rcu_check_gp_kthread_starvation(void)
 		       data_race(READ_ONCE(rcu_state.gp_state)),
 		       gpk ? data_race(READ_ONCE(gpk->__state)) : ~0, cpu);
 		if (gpk) {
+			struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
+
 			pr_err("\tUnless %s kthread gets sufficient CPU time, OOM is now expected behavior.\n", rcu_state.name);
 			pr_err("RCU grace-period kthread stack dump:\n");
 			sched_show_task(gpk);
 			if (cpu_is_offline(cpu)) {
 				pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu);
-			} else {
+			} else if (!(data_race(READ_ONCE(rdp->mynode->qsmask)) & rdp->grpmask)) {
 				pr_err("Stack dump where RCU GP kthread last ran:\n");
 				dump_cpu_task(cpu);
 			}

The stacks of all stalled CPUs will be dumped. If the CPU on where RCU GP
kthread last ran is stalled, its stack does not need to be dumped again.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 kernel/rcu/tree_stall.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)