diff mbox series

[v3,1/3] rcu/exp: Use NMI to get the backtrace of cpu_curr(other_cpu) first

Message ID 20220730102330.1255-2-thunder.leizhen@huawei.com (mailing list archive)
State Superseded
Headers show
Series rcu: Display registers of self-detected stall as far as possible | expand

Commit Message

Leizhen (ThunderTown) July 30, 2022, 10:23 a.m. UTC
The backtrace of cpu_curr(other_cpu) is unwinded based on the 'fp' saved
during its last switch-out. For the most part, it's out of date. So try
to use NMI to get the backtrace first, just like those functions in
"tree_stall.h" did. Such as rcu_dump_cpu_stacks().

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 kernel/rcu/tree_exp.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Paul E. McKenney Aug. 1, 2022, 11:14 p.m. UTC | #1
On Sat, Jul 30, 2022 at 06:23:28PM +0800, Zhen Lei wrote:
> The backtrace of cpu_curr(other_cpu) is unwinded based on the 'fp' saved
> during its last switch-out. For the most part, it's out of date. So try
> to use NMI to get the backtrace first, just like those functions in
> "tree_stall.h" did. Such as rcu_dump_cpu_stacks().
> 
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>

Much better, thank you!

> ---
>  kernel/rcu/tree_exp.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index 0f70f62039a9090..21381697de23f0b 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -665,7 +665,8 @@ static void synchronize_rcu_expedited_wait(void)
>  				mask = leaf_node_cpu_bit(rnp, cpu);
>  				if (!(READ_ONCE(rnp->expmask) & mask))
>  					continue;
> -				dump_cpu_task(cpu);
> +				if (!trigger_single_cpu_backtrace(cpu))
> +					dump_cpu_task(cpu);

But why not just leave this unchanged, rather than adding the call to
trigger_single_cpu_backtrace() in this patch and then removing it in
the next patch?

							Thanx, Paul

>  			}
>  		}
>  		jiffies_stall = 3 * rcu_exp_jiffies_till_stall_check() + 3;
> -- 
> 2.25.1
>
Leizhen (ThunderTown) Aug. 2, 2022, 2:06 a.m. UTC | #2
On 2022/8/2 7:14, Paul E. McKenney wrote:
> On Sat, Jul 30, 2022 at 06:23:28PM +0800, Zhen Lei wrote:
>> The backtrace of cpu_curr(other_cpu) is unwinded based on the 'fp' saved
>> during its last switch-out. For the most part, it's out of date. So try
>> to use NMI to get the backtrace first, just like those functions in
>> "tree_stall.h" did. Such as rcu_dump_cpu_stacks().
>>
>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> 
> Much better, thank you!
> 
>> ---
>>  kernel/rcu/tree_exp.h | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
>> index 0f70f62039a9090..21381697de23f0b 100644
>> --- a/kernel/rcu/tree_exp.h
>> +++ b/kernel/rcu/tree_exp.h
>> @@ -665,7 +665,8 @@ static void synchronize_rcu_expedited_wait(void)
>>  				mask = leaf_node_cpu_bit(rnp, cpu);
>>  				if (!(READ_ONCE(rnp->expmask) & mask))
>>  					continue;
>> -				dump_cpu_task(cpu);
>> +				if (!trigger_single_cpu_backtrace(cpu))
>> +					dump_cpu_task(cpu);
> 
> But why not just leave this unchanged, rather than adding the call to
> trigger_single_cpu_backtrace() in this patch and then removing it in
> the next patch?

To make the patch clear and easy to describe. Otherwise, I need to
give an additional description of it in the next patch, because I
searched all dump_cpu_task(). This seems to make the next patch
less simple.

Some of the patch sets I've seen have been done step by step like
this. But I can't find it now.

On the other hand, this patch is a small fix. Earlier versions may
only backport it, not the next cleanup patch.

> 
> 							Thanx, Paul
> 
>>  			}
>>  		}
>>  		jiffies_stall = 3 * rcu_exp_jiffies_till_stall_check() + 3;
>> -- 
>> 2.25.1
>>
> .
>
Paul E. McKenney Aug. 4, 2022, 12:07 a.m. UTC | #3
On Tue, Aug 02, 2022 at 10:06:00AM +0800, Leizhen (ThunderTown) wrote:
> 
> 
> On 2022/8/2 7:14, Paul E. McKenney wrote:
> > On Sat, Jul 30, 2022 at 06:23:28PM +0800, Zhen Lei wrote:
> >> The backtrace of cpu_curr(other_cpu) is unwinded based on the 'fp' saved
> >> during its last switch-out. For the most part, it's out of date. So try
> >> to use NMI to get the backtrace first, just like those functions in
> >> "tree_stall.h" did. Such as rcu_dump_cpu_stacks().
> >>
> >> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> > 
> > Much better, thank you!
> > 
> >> ---
> >>  kernel/rcu/tree_exp.h | 3 ++-
> >>  1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> >> index 0f70f62039a9090..21381697de23f0b 100644
> >> --- a/kernel/rcu/tree_exp.h
> >> +++ b/kernel/rcu/tree_exp.h
> >> @@ -665,7 +665,8 @@ static void synchronize_rcu_expedited_wait(void)
> >>  				mask = leaf_node_cpu_bit(rnp, cpu);
> >>  				if (!(READ_ONCE(rnp->expmask) & mask))
> >>  					continue;
> >> -				dump_cpu_task(cpu);
> >> +				if (!trigger_single_cpu_backtrace(cpu))
> >> +					dump_cpu_task(cpu);
> > 
> > But why not just leave this unchanged, rather than adding the call to
> > trigger_single_cpu_backtrace() in this patch and then removing it in
> > the next patch?
> 
> To make the patch clear and easy to describe. Otherwise, I need to
> give an additional description of it in the next patch, because I
> searched all dump_cpu_task(). This seems to make the next patch
> less simple.
> 
> Some of the patch sets I've seen have been done step by step like
> this. But I can't find it now.
> 
> On the other hand, this patch is a small fix. Earlier versions may
> only backport it, not the next cleanup patch.

You do have the option of doing a Cc to stable to control the backporting,
if that is a potential issue for you.

On the commit log, just say that the one use case already avoided doing
the trigger_single_cpu_backtrace(), and thus did not need to be updated.

So please resend the series, but without the undo/redo.  There would
thus be two patches rather than three, but there are plenty of other
things that need fixing anyway.

							Thanx, Paul

> >>  			}
> >>  		}
> >>  		jiffies_stall = 3 * rcu_exp_jiffies_till_stall_check() + 3;
> >> -- 
> >> 2.25.1
> >>
> > .
> > 
> 
> -- 
> Regards,
>   Zhen Lei
Leizhen (ThunderTown) Aug. 4, 2022, 1:34 a.m. UTC | #4
On 2022/8/4 8:07, Paul E. McKenney wrote:
> On Tue, Aug 02, 2022 at 10:06:00AM +0800, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2022/8/2 7:14, Paul E. McKenney wrote:
>>> On Sat, Jul 30, 2022 at 06:23:28PM +0800, Zhen Lei wrote:
>>>> The backtrace of cpu_curr(other_cpu) is unwinded based on the 'fp' saved
>>>> during its last switch-out. For the most part, it's out of date. So try
>>>> to use NMI to get the backtrace first, just like those functions in
>>>> "tree_stall.h" did. Such as rcu_dump_cpu_stacks().
>>>>
>>>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>>>
>>> Much better, thank you!
>>>
>>>> ---
>>>>  kernel/rcu/tree_exp.h | 3 ++-
>>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
>>>> index 0f70f62039a9090..21381697de23f0b 100644
>>>> --- a/kernel/rcu/tree_exp.h
>>>> +++ b/kernel/rcu/tree_exp.h
>>>> @@ -665,7 +665,8 @@ static void synchronize_rcu_expedited_wait(void)
>>>>  				mask = leaf_node_cpu_bit(rnp, cpu);
>>>>  				if (!(READ_ONCE(rnp->expmask) & mask))
>>>>  					continue;
>>>> -				dump_cpu_task(cpu);
>>>> +				if (!trigger_single_cpu_backtrace(cpu))
>>>> +					dump_cpu_task(cpu);
>>>
>>> But why not just leave this unchanged, rather than adding the call to
>>> trigger_single_cpu_backtrace() in this patch and then removing it in
>>> the next patch?
>>
>> To make the patch clear and easy to describe. Otherwise, I need to
>> give an additional description of it in the next patch, because I
>> searched all dump_cpu_task(). This seems to make the next patch
>> less simple.
>>
>> Some of the patch sets I've seen have been done step by step like
>> this. But I can't find it now.
>>
>> On the other hand, this patch is a small fix. Earlier versions may
>> only backport it, not the next cleanup patch.
> 
> You do have the option of doing a Cc to stable to control the backporting,
> if that is a potential issue for you.
> 
> On the commit log, just say that the one use case already avoided doing
> the trigger_single_cpu_backtrace(), and thus did not need to be updated.
> 
> So please resend the series, but without the undo/redo.  There would
> thus be two patches rather than three, but there are plenty of other
> things that need fixing anyway.

OK, thanks.

> 
> 							Thanx, Paul
> 
>>>>  			}
>>>>  		}
>>>>  		jiffies_stall = 3 * rcu_exp_jiffies_till_stall_check() + 3;
>>>> -- 
>>>> 2.25.1
>>>>
>>> .
>>>
>>
>> -- 
>> Regards,
>>   Zhen Lei
> .
>
diff mbox series

Patch

diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 0f70f62039a9090..21381697de23f0b 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -665,7 +665,8 @@  static void synchronize_rcu_expedited_wait(void)
 				mask = leaf_node_cpu_bit(rnp, cpu);
 				if (!(READ_ONCE(rnp->expmask) & mask))
 					continue;
-				dump_cpu_task(cpu);
+				if (!trigger_single_cpu_backtrace(cpu))
+					dump_cpu_task(cpu);
 			}
 		}
 		jiffies_stall = 3 * rcu_exp_jiffies_till_stall_check() + 3;