arm64: fix infinite stacktrace

Message ID alpine.LRH.2.02.1806141457200.4243@file01.intranet.prod.int.rdu2.redhat.com (mailing list archive)
State New, archived
Commit Message

Mikulas Patocka June 14, 2018, 6:58 p.m. UTC
I've got this infinite stacktrace when debugging another problem:
[  908.795225] INFO: rcu_preempt detected stalls on CPUs/tasks:
[  908.796176]  1-...!: (1 GPs behind) idle=952/1/4611686018427387904 softirq=1462/1462 fqs=355
[  908.797692]  2-...!: (1 GPs behind) idle=f42/1/4611686018427387904 softirq=1550/1551 fqs=355
[  908.799189]  (detected by 0, t=2109 jiffies, g=130, c=129, q=235)
[  908.800284] Task dump for CPU 1:
[  908.800871] kworker/1:1     R  running task        0    32      2 0x00000022
[  908.802127] Workqueue: writecache-writeabck writecache_writeback [dm_writecache]
[  908.820285] Call trace:
[  908.824785]  __switch_to+0x68/0x90
[  908.837661]  0xfffffe00603afd90
[  908.844119]  0xfffffe00603afd90
[  908.850091]  0xfffffe00603afd90
[  908.854285]  0xfffffe00603afd90
[  908.863538]  0xfffffe00603afd90
[  908.865523]  0xfffffe00603afd90

The machine just locked up and kept on printing the same line over and
over again. This patch fixes it.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org

Comments

Mark Rutland June 15, 2018, 11:58 a.m. UTC | #1
Hi,

On Thu, Jun 14, 2018 at 02:58:21PM -0400, Mikulas Patocka wrote:
> I've got this infinite stacktrace when debugging another problem:
> [  908.795225] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [  908.796176]  1-...!: (1 GPs behind) idle=952/1/4611686018427387904 softirq=1462/1462 fqs=355
> [  908.797692]  2-...!: (1 GPs behind) idle=f42/1/4611686018427387904 softirq=1550/1551 fqs=355
> [  908.799189]  (detected by 0, t=2109 jiffies, g=130, c=129, q=235)
> [  908.800284] Task dump for CPU 1:
> [  908.800871] kworker/1:1     R  running task        0    32      2 0x00000022
> [  908.802127] Workqueue: writecache-writeabck writecache_writeback [dm_writecache]
> [  908.820285] Call trace:
> [  908.824785]  __switch_to+0x68/0x90
> [  908.837661]  0xfffffe00603afd90
> [  908.844119]  0xfffffe00603afd90
> [  908.850091]  0xfffffe00603afd90
> [  908.854285]  0xfffffe00603afd90
> [  908.863538]  0xfffffe00603afd90
> [  908.865523]  0xfffffe00603afd90
> 
> The machine just locked up and kept on printing the same line over and
> over again. This patch fixes it.
> 
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> Cc: stable@vger.kernel.org

Given this can only occur when there's a corrupted stack (where a frame
record points to itself), I'm not sure this requires a cc stable.

> Index: linux-2.6/arch/arm64/kernel/stacktrace.c
> ===================================================================
> --- linux-2.6.orig/arch/arm64/kernel/stacktrace.c
> +++ linux-2.6/arch/arm64/kernel/stacktrace.c
> @@ -56,6 +56,9 @@ int notrace unwind_frame(struct task_str
>  	frame->fp = READ_ONCE_NOCHECK(*(unsigned long *)(fp));
>  	frame->pc = READ_ONCE_NOCHECK(*(unsigned long *)(fp + 8));
>  
> +	if (frame->fp <= fp)
> +		return -EINVAL;
> +

Dave Martin had a series [1] which addressed this along with a number of
other cases where stack traces might not terminate.

Dave, do you plan to respin that?

Thanks,
Mark.

[1] https://lkml.kernel.org/r/1524503223-17576-1-git-send-email-Dave.Martin@arm.com

Will Deacon June 27, 2018, 4:41 p.m. UTC | #2
Hi all,

On Fri, Jun 15, 2018 at 12:58:23PM +0100, Mark Rutland wrote:
> On Thu, Jun 14, 2018 at 02:58:21PM -0400, Mikulas Patocka wrote:
> > I've got this infinite stacktrace when debugging another problem:
> > [  908.795225] INFO: rcu_preempt detected stalls on CPUs/tasks:
> > [  908.796176]  1-...!: (1 GPs behind) idle=952/1/4611686018427387904 softirq=1462/1462 fqs=355
> > [  908.797692]  2-...!: (1 GPs behind) idle=f42/1/4611686018427387904 softirq=1550/1551 fqs=355
> > [  908.799189]  (detected by 0, t=2109 jiffies, g=130, c=129, q=235)
> > [  908.800284] Task dump for CPU 1:
> > [  908.800871] kworker/1:1     R  running task        0    32      2 0x00000022
> > [  908.802127] Workqueue: writecache-writeabck writecache_writeback [dm_writecache]
> > [  908.820285] Call trace:
> > [  908.824785]  __switch_to+0x68/0x90
> > [  908.837661]  0xfffffe00603afd90
> > [  908.844119]  0xfffffe00603afd90
> > [  908.850091]  0xfffffe00603afd90
> > [  908.854285]  0xfffffe00603afd90
> > [  908.863538]  0xfffffe00603afd90
> > [  908.865523]  0xfffffe00603afd90
> > 
> > The machine just locked up and kept on printing the same line over and
> > over again. This patch fixes it.
> > 
> > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> > Cc: stable@vger.kernel.org
> 
> Given this can only occur when there's a corrupted stack (where a frame
> record points to itself), I'm not sure this requires a cc stable.
> 
> > Index: linux-2.6/arch/arm64/kernel/stacktrace.c
> > ===================================================================
> > --- linux-2.6.orig/arch/arm64/kernel/stacktrace.c
> > +++ linux-2.6/arch/arm64/kernel/stacktrace.c
> > @@ -56,6 +56,9 @@ int notrace unwind_frame(struct task_str
> >  	frame->fp = READ_ONCE_NOCHECK(*(unsigned long *)(fp));
> >  	frame->pc = READ_ONCE_NOCHECK(*(unsigned long *)(fp + 8));
> >  
> > +	if (frame->fp <= fp)
> > +		return -EINVAL;
> > +
> 
> Dave Martin had a series [1] which addressed this along with a number of
> other cases where stack traces might not terminate.
> 
> Dave, do you plan to respin that?

I'd be interested in an update on that; we clearly should be fixing this in
one way or another.

Mikulas -- would you be able to test and/or review it, please?

Will

Dave Martin June 28, 2018, 4:49 p.m. UTC | #3
On Wed, Jun 27, 2018 at 05:41:51PM +0100, Will Deacon wrote:
> Hi all,
> 
> On Fri, Jun 15, 2018 at 12:58:23PM +0100, Mark Rutland wrote:
> > On Thu, Jun 14, 2018 at 02:58:21PM -0400, Mikulas Patocka wrote:
> > > I've got this infinite stacktrace when debugging another problem:
> > > [  908.795225] INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > [  908.796176]  1-...!: (1 GPs behind) idle=952/1/4611686018427387904 softirq=1462/1462 fqs=355
> > > [  908.797692]  2-...!: (1 GPs behind) idle=f42/1/4611686018427387904 softirq=1550/1551 fqs=355
> > > [  908.799189]  (detected by 0, t=2109 jiffies, g=130, c=129, q=235)
> > > [  908.800284] Task dump for CPU 1:
> > > [  908.800871] kworker/1:1     R  running task        0    32      2 0x00000022
> > > [  908.802127] Workqueue: writecache-writeabck writecache_writeback [dm_writecache]
> > > [  908.820285] Call trace:
> > > [  908.824785]  __switch_to+0x68/0x90
> > > [  908.837661]  0xfffffe00603afd90
> > > [  908.844119]  0xfffffe00603afd90
> > > [  908.850091]  0xfffffe00603afd90
> > > [  908.854285]  0xfffffe00603afd90
> > > [  908.863538]  0xfffffe00603afd90
> > > [  908.865523]  0xfffffe00603afd90
> > > 
> > > The machine just locked up and kept on printing the same line over and
> > > over again. This patch fixes it.
> > > 
> > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> > > Cc: stable@vger.kernel.org
> > 
> > Given this can only occur when there's a corrupted stack (where a frame
> > record points to itself), I'm not sure this requires a cc stable.
> > 
> > > Index: linux-2.6/arch/arm64/kernel/stacktrace.c
> > > ===================================================================
> > > --- linux-2.6.orig/arch/arm64/kernel/stacktrace.c
> > > +++ linux-2.6/arch/arm64/kernel/stacktrace.c
> > > @@ -56,6 +56,9 @@ int notrace unwind_frame(struct task_str
> > >  	frame->fp = READ_ONCE_NOCHECK(*(unsigned long *)(fp));
> > >  	frame->pc = READ_ONCE_NOCHECK(*(unsigned long *)(fp + 8));
> > >  
> > > +	if (frame->fp <= fp)
> > > +		return -EINVAL;
> > > +
> > 
> > Dave Martin had a series [1] which addressed this along with a number of
> > other cases where stack traces might not terminate.
> > 
> > Dave, do you plan to respin that?
> 
> I'd be interested in an update on that; we clearly should be fixing this in
> one way or another.
> 
> Mikulas -- would you be able to test and/or review it, please?

My patch was arguably over-engineered, and broken in some way connected
with SDEI.  Unfortunately I've had too much other stuff to do...

I could take another look, but it may take time to get to it.

Alternatively, if someone wants to pick up my patch and take it forward,
I'm happy to comment.

Cheers
---Dave
Patch

Index: linux-2.6/arch/arm64/kernel/stacktrace.c
===================================================================
--- linux-2.6.orig/arch/arm64/kernel/stacktrace.c
+++ linux-2.6/arch/arm64/kernel/stacktrace.c
@@ -56,6 +56,9 @@  int notrace unwind_frame(struct task_str
 	frame->fp = READ_ONCE_NOCHECK(*(unsigned long *)(fp));
 	frame->pc = READ_ONCE_NOCHECK(*(unsigned long *)(fp + 8));
 
+	if (frame->fp <= fp)
+		return -EINVAL;
+
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 	if (tsk->ret_stack &&
 			(frame->pc == (unsigned long)return_to_handler)) {
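
For readers following the thread, the effect of the added check can be modelled outside the kernel. The sketch below is a userspace approximation, not the kernel code: struct frame_record, unwind_step(), walk_stack(), the enforce_order flag and the max_frames cap are hypothetical names introduced for illustration, and the alignment test and the CONFIG_FUNCTION_GRAPH_TRACER handling of the real unwind_frame() are left out. It assumes, as the patch does, that a caller's frame record always sits at a higher address than the callee's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an AArch64 frame record: saved fp, then saved lr. */
struct frame_record {
	uintptr_t fp;	/* address of the caller's frame record */
	uintptr_t pc;	/* return address into the caller */
};

struct stackframe {
	uintptr_t fp;
	uintptr_t pc;
};

/*
 * One unwind step over the range [low, high).
 * Returns 0 on success and -1 when the walk has to stop.
 * enforce_order != 0 enables the check proposed in the patch.
 */
static int unwind_step(struct stackframe *frame, uintptr_t low,
		       uintptr_t high, int enforce_order)
{
	uintptr_t fp = frame->fp;

	/* The record must lie entirely inside the modelled stack. */
	if (fp < low || fp > high - sizeof(struct frame_record))
		return -1;

	const struct frame_record *rec = (const struct frame_record *)fp;

	frame->fp = rec->fp;
	frame->pc = rec->pc;

	/* A record that points at itself or backwards cannot make progress. */
	if (enforce_order && frame->fp <= fp)
		return -1;

	return 0;
}

/* Count the successful unwind steps, giving up after max_frames. */
static int walk_stack(uintptr_t start_fp, uintptr_t low, uintptr_t high,
		      int enforce_order, int max_frames)
{
	struct stackframe frame = { .fp = start_fp, .pc = 0 };
	int steps = 0;

	assert(low < high);

	while (steps < max_frames &&
	       unwind_step(&frame, low, high, enforce_order) == 0)
		steps++;

	return steps;
}
```

If a record's fp field points at that same record, walk_stack() with enforce_order set stops at the step that reads the looping record. With enforce_order cleared it keeps returning the same record until max_frames is reached, which is consistent with the repeated 0xfffffe00603afd90 lines in the report. Note that in this model the check also ends the walk at a final record whose fp is 0, since 0 is below every valid record address.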