diff mbox series

ARM64: fixed dump_backtrace() when task running on another cpu

Message ID 1586425106-7369-1-git-send-email-wangqing@vivo.com (mailing list archive)
State New, archived
Headers show
Series ARM64: fixed dump_backtrace() when task running on another cpu | expand

Commit Message

王擎 April 9, 2020, 9:38 a.m. UTC
We cannot get FP and PC when the task is running on another CPU,
task->thread.cpu_context is the last time the task was switched out,
it's better to give a reminder than to provide wrong information.

Signed-off-by: Wang Qing <wangqing@vivo.com>
---
 arch/arm64/kernel/traps.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Mark Rutland April 9, 2020, 10:40 a.m. UTC | #1
Hi,

On Thu, Apr 09, 2020 at 05:38:16PM +0800, Wang Qing wrote:
> We cannot get FP and PC when the task is running on another CPU,
> task->thread.cpu_context is the last time the task was switched out,
> it's better to give a reminder than to provide wrong information.
> 
> Signed-off-by: Wang Qing <wangqing@vivo.com>

Are you seeing this happen anywhere in particular today?

> ---
>  arch/arm64/kernel/traps.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> index cf402be..c04e3e8 100644
> --- a/arch/arm64/kernel/traps.c
> +++ b/arch/arm64/kernel/traps.c
> @@ -106,6 +106,14 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
>  		start_backtrace(&frame,
>  				(unsigned long)__builtin_frame_address(0),
>  				(unsigned long)dump_backtrace);
> +	} else if (tsk->on_cpu) {
> +		/*
> +		 * The task is running in another cpu, so the call stack
> +		 * is changing and we cannot get it.
> +		 */
> +		pr_warn("tsk: %s is running in CPU%d, Don't call trace!\n",
> +			tsk->comm, tsk->cpu);

I believe that we can race with a concurrent write to tsk->cpu in both
cases above. We could use READ_ONCE() to get a snapshot, but we can
still race and miss cases where the task was runnning as we backtrace
it.

Thanks,
Mark.

> +		return;
>  	} else {
>  		/*
>  		 * task blocked in __switch_to
> -- 
> 2.7.4
>
王擎 April 13, 2020, 9:23 a.m. UTC | #2
>Hi,
>
>On Thu, Apr 09, 2020 at 05:38:16PM +0800, Wang Qing wrote:
>> We cannot get FP and PC when the task is running on another CPU,
>> task->thread.cpu_context is the last time the task was switched out,
>> it's better to give a reminder than to provide wrong information.
>> 
>> Signed-off-by: Wang Qing <wangqing@vivo.com>
>
>Are you seeing this happen anywhere in particular today?

This problem is not so obvious, because it will not cause any exceptions
but will show "old" stack always ending with switch_to, I finally confirmed
the problem through debugging.

For example:Task blocked in spinlock/interrupt/busy loop, I want to print
the backtrace when detected(like PSI in Android), the printing is wrong(old).

>
>> ---
>>  arch/arm64/kernel/traps.c | 8 ++++++++
>>  1 file changed, 8 insertions(+)
>> 
>> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
>> index cf402be..c04e3e8 100644
>> --- a/arch/arm64/kernel/traps.c
>> +++ b/arch/arm64/kernel/traps.c
>> @@ -106,6 +106,14 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
>>  		start_backtrace(&frame,
>>  				(unsigned long)__builtin_frame_address(0),
>>  				(unsigned long)dump_backtrace);
>> +	} else if (tsk->on_cpu) {
>> +		/*
>> +		 * The task is running in another cpu, so the call stack
>> +		 * is changing and we cannot get it.
>> +		 */
>> +		pr_warn("tsk: %s is running in CPU%d, Don't call trace!\n",
>> +			tsk->comm, tsk->cpu);
>
>I believe that we can race with a concurrent write to tsk->cpu in both
>cases above. We could use READ_ONCE() to get a snapshot, but we can
>still race and miss cases where the task was runnning as we backtrace
>it.
>
>Thanks,
>Mark.

I will use task_cpu(tsk) instead of tsk->cpu, and add task_running_oncpu() in
include/linux/sched.h instead of tsk->on_cpu, but as you said, by this patch,
we can still race and miss cases where the task was runnning as we backtrace.

But from the user's perspective, printing wrong backtrace is confused when
we call this function while task already running. However, it's reasonable to
print the last backtrace when task enter running during the function is called.

Thanks,
Wang Qing.

>
>> +		return;
>>  	} else {
>>  		/*
>>  		 * task blocked in __switch_to
>> -- 
>> 2.7.4
>>
Mark Rutland April 14, 2020, 12:54 p.m. UTC | #3
On Mon, Apr 13, 2020 at 05:23:08PM +0800, Wang Qing wrote:
> >Hi,
> >
> >On Thu, Apr 09, 2020 at 05:38:16PM +0800, Wang Qing wrote:
> >> We cannot get FP and PC when the task is running on another CPU,
> >> task->thread.cpu_context is the last time the task was switched out,
> >> it's better to give a reminder than to provide wrong information.
> >> 
> >> Signed-off-by: Wang Qing <wangqing@vivo.com>
> >
> >Are you seeing this happen anywhere in particular today?
> 
> This problem is not so obvious, because it will not cause any exceptions
> but will show "old" stack always ending with switch_to, I finally confirmed
> the problem through debugging.
> 
> For example:Task blocked in spinlock/interrupt/busy loop, I want to print
> the backtrace when detected(like PSI in Android), the printing is wrong(old).

Sure, but *where* are you seeing this?

Is this a problem in mainline, or only in code that you add?
 
> >
> >> ---
> >>  arch/arm64/kernel/traps.c | 8 ++++++++
> >>  1 file changed, 8 insertions(+)
> >> 
> >> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> >> index cf402be..c04e3e8 100644
> >> --- a/arch/arm64/kernel/traps.c
> >> +++ b/arch/arm64/kernel/traps.c
> >> @@ -106,6 +106,14 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
> >>  		start_backtrace(&frame,
> >>  				(unsigned long)__builtin_frame_address(0),
> >>  				(unsigned long)dump_backtrace);
> >> +	} else if (tsk->on_cpu) {
> >> +		/*
> >> +		 * The task is running in another cpu, so the call stack
> >> +		 * is changing and we cannot get it.
> >> +		 */
> >> +		pr_warn("tsk: %s is running in CPU%d, Don't call trace!\n",
> >> +			tsk->comm, tsk->cpu);
> >
> >I believe that we can race with a concurrent write to tsk->cpu in both
> >cases above. We could use READ_ONCE() to get a snapshot, but we can
> >still race and miss cases where the task was runnning as we backtrace
> >it.
> >
> >Thanks,
> >Mark.
> 
> I will use task_cpu(tsk) instead of tsk->cpu, and add task_running_oncpu() in
> include/linux/sched.h instead of tsk->on_cpu, but as you said, by this patch,
> we can still race and miss cases where the task was runnning as we backtrace.
> 
> But from the user's perspective, printing wrong backtrace is confused when
> we call this function while task already running. However, it's reasonable to
> print the last backtrace when task enter running during the function is called.

The contract of dump_backtrace() is that it's only called for either:

* the current task
* a task that is blocked in switch_to()

... so I don't think that the current behaviour is wrong as such, though
if it's easy to catch misuse I agree it would be nice to do so for
robustness.

However, we can alwatys race here, so detecting this case is best-effort
and not entirely reliable, and I don't want to leave the impression that
it is. I'm also not a fan of pr_warn() here, since this is indicative of
a kernel bug rather than a user/system issue, and can spam the console
due to a lack of ratelimiting.

So, based on whether this is a problem in existing code, I'd like that
code fix first, and then we can consider adding a WARN_ONCE() or
something ratelimited. Ideally something that'll give us the bactrace or
the code that's calling dump_backtrace() erroneously.

Thanks,
Mark.
diff mbox series

Patch

diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index cf402be..c04e3e8 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -106,6 +106,14 @@  void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
 		start_backtrace(&frame,
 				(unsigned long)__builtin_frame_address(0),
 				(unsigned long)dump_backtrace);
+	} else if (tsk->on_cpu) {
+		/*
+		 * The task is running in another cpu, so the call stack
+		 * is changing and we cannot get it.
+		 */
+		pr_warn("tsk: %s is running in CPU%d, Don't call trace!\n",
+			tsk->comm, tsk->cpu);
+		return;
 	} else {
 		/*
 		 * task blocked in __switch_to