[2/2] rcu: Don't dump the stalled CPU on where RCU GP kthread last ran twice

Message ID 20230705073020.2030-3-thunder.leizhen@huawei.com (mailing list archive)
State Superseded
Series rcu: Don't dump the stalled CPU on where RCU GP kthread last ran twice

Commit Message

Leizhen (ThunderTown) July 5, 2023, 7:30 a.m. UTC
The stacks of all stalled CPUs are already dumped. If the CPU on which the
RCU GP kthread last ran is itself stalled, its stack does not need to be
dumped again.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 kernel/rcu/tree_stall.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Leizhen (ThunderTown) July 5, 2023, 8:17 a.m. UTC | #1
On 2023/7/5 15:30, Zhen Lei wrote:
> The stacks of all stalled CPUs will be dumped. If the CPU on where RCU GP
> kthread last ran is stalled, its stack does not need to be dumped again.

Should I add a Fixes tag?

Fixes: 243027a3c805 ("rcu: For RCU grace-period kthread starvation, dump last CPU it ran on")

> 
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> ---
>  kernel/rcu/tree_stall.h | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index dcfaa3d5db2cbc7..cc884cd49e026a3 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -534,12 +534,14 @@ static void rcu_check_gp_kthread_starvation(void)
>  		       data_race(READ_ONCE(rcu_state.gp_state)),
>  		       gpk ? data_race(READ_ONCE(gpk->__state)) : ~0, cpu);
>  		if (gpk) {
> +			struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> +
>  			pr_err("\tUnless %s kthread gets sufficient CPU time, OOM is now expected behavior.\n", rcu_state.name);
>  			pr_err("RCU grace-period kthread stack dump:\n");
>  			sched_show_task(gpk);
>  			if (cpu_is_offline(cpu)) {
>  				pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu);
> -			} else  {
> +			} else if (!(data_race(READ_ONCE(rdp->mynode->qsmask)) & rdp->grpmask)) {
>  				pr_err("Stack dump where RCU GP kthread last ran:\n");
>  				dump_cpu_task(cpu);
>  			}
>
Paul E. McKenney July 10, 2023, 7:05 p.m. UTC | #2
On Wed, Jul 05, 2023 at 03:30:20PM +0800, Zhen Lei wrote:
> The stacks of all stalled CPUs will be dumped. If the CPU on where RCU GP
> kthread last ran is stalled, its stack does not need to be dumped again.
> 
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>

This one looks good.  Please feel free to rebase it before 1/2 and repost.

							Thanx, Paul

> ---
>  kernel/rcu/tree_stall.h | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index dcfaa3d5db2cbc7..cc884cd49e026a3 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -534,12 +534,14 @@ static void rcu_check_gp_kthread_starvation(void)
>  		       data_race(READ_ONCE(rcu_state.gp_state)),
>  		       gpk ? data_race(READ_ONCE(gpk->__state)) : ~0, cpu);
>  		if (gpk) {
> +			struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> +
>  			pr_err("\tUnless %s kthread gets sufficient CPU time, OOM is now expected behavior.\n", rcu_state.name);
>  			pr_err("RCU grace-period kthread stack dump:\n");
>  			sched_show_task(gpk);
>  			if (cpu_is_offline(cpu)) {
>  				pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu);
> -			} else  {
> +			} else if (!(data_race(READ_ONCE(rdp->mynode->qsmask)) & rdp->grpmask)) {
>  				pr_err("Stack dump where RCU GP kthread last ran:\n");
>  				dump_cpu_task(cpu);
>  			}
> -- 
> 2.25.1
>
Joel Fernandes July 10, 2023, 7:55 p.m. UTC | #3
On Mon, Jul 10, 2023 at 3:06 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Wed, Jul 05, 2023 at 03:30:20PM +0800, Zhen Lei wrote:
> > The stacks of all stalled CPUs will be dumped. If the CPU on where RCU GP
> > kthread last ran is stalled, its stack does not need to be dumped again.
> >
> > Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>
> This one looks good.  Please feel free to rebase it before 1/2 and repost.

Just a small comment:
I wondered if this would make it harder to identify which stack among
the various CPU stacks corresponds to the one the GP kthread is
running on. However, this line does print the CPU number of the
thread, so it is perhaps not an issue:

                pr_err("%s kthread starved for %ld jiffies! g%ld f%#x %s(%d) ->state=%#x ->cpu=%d\n",

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>

Thanks.
Paul E. McKenney July 10, 2023, 8:32 p.m. UTC | #4
On Mon, Jul 10, 2023 at 03:55:16PM -0400, Joel Fernandes wrote:
> On Mon, Jul 10, 2023 at 3:06 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Wed, Jul 05, 2023 at 03:30:20PM +0800, Zhen Lei wrote:
> > > The stacks of all stalled CPUs will be dumped. If the CPU on where RCU GP
> > > kthread last ran is stalled, its stack does not need to be dumped again.
> > >
> > > Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> >
> > This one looks good.  Please feel free to rebase it before 1/2 and repost.
> 
> Just a small comment:
> I wondered if this would make it harder to identify which stack among
> the various CPU stacks corresponds to the one the GP kthread is
> running on. However, this line does print the CPU number of the
> thread, so it is perhaps not an issue:
> 
>                 pr_err("%s kthread starved for %ld jiffies! g%ld f%#x %s(%d) ->state=%#x ->cpu=%d\n",
> 
> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>

Thank you!  Zhen Lei, please feel free to add Joel's Reviewed-by on your
next posting.

							Thanx, Paul
Leizhen (ThunderTown) July 11, 2023, 3:26 a.m. UTC | #5
On 2023/7/11 4:32, Paul E. McKenney wrote:
> On Mon, Jul 10, 2023 at 03:55:16PM -0400, Joel Fernandes wrote:
>> On Mon, Jul 10, 2023 at 3:06 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>>>
>>> On Wed, Jul 05, 2023 at 03:30:20PM +0800, Zhen Lei wrote:
>>>> The stacks of all stalled CPUs will be dumped. If the CPU on where RCU GP
>>>> kthread last ran is stalled, its stack does not need to be dumped again.
>>>>
>>>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>>>
>>> This one looks good.  Please feel free to rebase it before 1/2 and repost.
>>
>> Just a small comment:
>> I wondered if this would make it harder to identify which stack among
>> the various CPU stacks corresponds to the one the GP kthread is
>> running on. However, this line does print the CPU number of the
>> thread, so it is perhaps not an issue:
>>
>>                 pr_err("%s kthread starved for %ld jiffies! g%ld f%#x %s(%d) ->state=%#x ->cpu=%d\n",
>>
>> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> 
> Thank you!  Zhen Lei, please feel free to add Joel's Reviewed-by on your
> next posting.

OK

> 
> 							Thanx, Paul
> .
>
Patch

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index dcfaa3d5db2cbc7..cc884cd49e026a3 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -534,12 +534,14 @@  static void rcu_check_gp_kthread_starvation(void)
 		       data_race(READ_ONCE(rcu_state.gp_state)),
 		       gpk ? data_race(READ_ONCE(gpk->__state)) : ~0, cpu);
 		if (gpk) {
+			struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
+
 			pr_err("\tUnless %s kthread gets sufficient CPU time, OOM is now expected behavior.\n", rcu_state.name);
 			pr_err("RCU grace-period kthread stack dump:\n");
 			sched_show_task(gpk);
 			if (cpu_is_offline(cpu)) {
 				pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu);
-			} else  {
+			} else if (!(data_race(READ_ONCE(rdp->mynode->qsmask)) & rdp->grpmask)) {
 				pr_err("Stack dump where RCU GP kthread last ran:\n");
 				dump_cpu_task(cpu);
 			}
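
Note: the three-way decision made by the patched branch can be modelled in
plain userspace C, which may make the new condition easier to read. This is
only a sketch under stated assumptions, not kernel code: the model_* structs,
the enum and gp_kthread_cpu_action() are hypothetical names invented here, and
the data_race()/READ_ONCE() annotations are dropped because the model has no
concurrency. It relies on qsmask holding one bit per CPU that still owes a
quiescent state for the current grace period, and on those being the CPUs
whose stacks the stall warning already dumps.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct rcu_node: only the field the check reads. */
struct model_rcu_node {
	unsigned long qsmask;	/* bit set => that CPU has not yet reported a quiescent state */
};

/* Hypothetical stand-in for struct rcu_data. */
struct model_rcu_data {
	struct model_rcu_node *mynode;	/* leaf node covering this CPU */
	unsigned long grpmask;		/* this CPU's bit within mynode->qsmask */
};

/* What the starvation check does about the CPU the GP kthread last ran on. */
enum gp_cpu_action {
	REPORT_OFFLINE,		/* print "last ran on offline CPU" */
	DUMP_STACK,		/* CPU is not stalled, so dump its stack here */
	SKIP_ALREADY_DUMPED,	/* CPU is stalled, so its stack is dumped elsewhere */
};

static enum gp_cpu_action
gp_kthread_cpu_action(bool offline, const struct model_rcu_data *rdp)
{
	if (offline)
		return REPORT_OFFLINE;
	/* Same test as the patch: dump only if this CPU's qsmask bit is clear. */
	if (!(rdp->mynode->qsmask & rdp->grpmask))
		return DUMP_STACK;
	return SKIP_ALREADY_DUMPED;
}
```

With qsmask = 0x5, a CPU whose grpmask is 0x1 falls into the skip case and a
CPU whose grpmask is 0x2 still gets its stack dumped. As noted in comment #3,
the "kthread starved" line prints ->cpu, so the skipped stack can still be
matched to the GP kthread's CPU among the stalled-CPU dumps.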