diff mbox series

arm64: debug: Make 'btc' and similar work in kdb

Message ID 20190730221800.28326-1-dianders@chromium.org (mailing list archive)
State New, archived
Headers show
Series arm64: debug: Make 'btc' and similar work in kdb | expand

Commit Message

Doug Anderson July 30, 2019, 10:18 p.m. UTC
In kdb when you do 'btc' (back trace on CPU) it doesn't give you the
right info.  This can be seen by this:

 echo SOFTLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
 # wait 2 seconds
 <sysrq>g

Here's what I see now on rk3399-gru-kevin.  I see the stack crawl for
the CPU that handled the sysrq but everything else just shows me stuck
in __switch_to() which is bogus:

======

[0]kdb> btc
btc: cpu status: Currently on cpu 0
Available cpus: 0, 1-3(I), 4, 5(I)
Stack traceback for pid 0
0xffffff801101a9c0        0        0  1    0   R  0xffffff801101b3b0 *swapper/0
Call trace:
 dump_backtrace+0x0/0x138
 ...
 kgdb_compiled_brk_fn+0x34/0x44
 ...
 sysrq_handle_dbg+0x34/0x5c
Stack traceback for pid 0
0xffffffc0f175a040        0        0  1    1   I  0xffffffc0f175aa30  swapper/1
Call trace:
 __switch_to+0x1e4/0x240
 0xffffffc0f65616c0
Stack traceback for pid 0
0xffffffc0f175d040        0        0  1    2   I  0xffffffc0f175da30  swapper/2
Call trace:
 __switch_to+0x1e4/0x240
 0xffffffc0f65806c0
Stack traceback for pid 0
0xffffffc0f175b040        0        0  1    3   I  0xffffffc0f175ba30  swapper/3
Call trace:
 __switch_to+0x1e4/0x240
 0xffffffc0f659f6c0
Stack traceback for pid 1474
0xffffffc0dde8b040     1474      727  1    4   R  0xffffffc0dde8ba30  bash
Call trace:
 __switch_to+0x1e4/0x240
 __schedule+0x464/0x618
 0xffffffc0dde8b040
Stack traceback for pid 0
0xffffffc0f17b0040        0        0  1    5   I  0xffffffc0f17b0a30  swapper/5
Call trace:
 __switch_to+0x1e4/0x240
 0xffffffc0f65dd6c0

===

The problem is that 'btc' eventually boils down to
  show_stack(task_struct, NULL);

...and show_stack() doesn't work for "running" CPUs because their
registers haven't been stashed.

On x86 things might work better (I haven't tested) because kdb has a
special case for x86 in kdb_show_stack() where it passes the stack
pointer to show_stack().  This wouldn't work on arm64 where the stack
crawling function seems needs the "fp" and "pc", not the "sp" which is
presumably why arm64's show_stack() function totally ignores the "sp"
parameter.

NOTE: we _can_ get a good stack dump for all the cpus if we manually
switch each one to the kdb master and do a back trace.  AKA:
  cpu 4
  bt
...will give the expected trace.  That's because now arm64's
dump_backtrace will now see that "tsk == current" and go through a
different path.

In this patch I fix the problems by stashing the "pt_regs" into the
"cpu_context" when a CPU enters the debugger.  Now all the normal stack
crawling code will be able to find it.  This is possible because:
* When a task is "running" nobody else is using the "cpu_context"
* The task isn't really "running" (it's in the debugger) so there are
  actually some sane registers to save.

This patch works without any extra kgdb API changes by just
implementing the weak kgdb_call_nmi_hook().  I don't try to address
the existing caveat in kgdb_call_nmi_hook() around pt_regs, so I copy
the comment from the generic code.

After this patch the same test shows much more sane stack crawls.  The
idle tasks now show:

Stack traceback for pid 0
0xffffffc0f175b040        0        0  1    3   I  0xffffffc0f175ba30  swapper/3
Call trace:
 cpuidle_enter_state+0x284/0x428
 cpuidle_enter+0x38/0x4c
 do_idle+0x168/0x29c
 cpu_startup_entry+0x24/0x28
 secondary_start_kernel+0x140/0x14c

...and the locked task:

Stack traceback for pid 1603
0xffffffc0d98c7040     1603      724  1    4   R  0xffffffc0d98c7a30  bash
Call trace:
 lkdtm_SOFTLOCKUP+0x1c/0x24
 lkdtm_do_action+0x24/0x44
 direct_entry+0x130/0x178
 full_proxy_write+0x60/0xb4
 __vfs_write+0x54/0x18c
 vfs_write+0xcc/0x174
 ksys_write+0x7c/0xe4
 __arm64_sys_write+0x20/0x2c
 el0_svc_common+0x9c/0x14c
 el0_svc_compat_handler+0x28/0x34
 el0_svc_compat+0x8/0x10

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 arch/arm64/kernel/kgdb.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

Comments

Will Deacon July 31, 2019, 12:57 p.m. UTC | #1
Hi Doug,

On Tue, Jul 30, 2019 at 03:18:00PM -0700, Douglas Anderson wrote:
> diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
> index 43119922341f..b666210fbc75 100644
> --- a/arch/arm64/kernel/kgdb.c
> +++ b/arch/arm64/kernel/kgdb.c
> @@ -148,6 +148,45 @@ sleeping_thread_to_gdb_regs(unsigned long *gdb_regs, struct task_struct *task)
>  	gdb_regs[32] = cpu_context->pc;
>  }
>  
> +void kgdb_call_nmi_hook(void *ignored)
> +{
> +	struct pt_regs *regs;
> +
> +	/*
> +	 * NOTE: get_irq_regs() is supposed to get the registers from
> +	 * before the IPI interrupt happened and so is supposed to
> +	 * show where the processor was.  In some situations it's
> +	 * possible we might be called without an IPI, so it might be
> +	 * safer to figure out how to make kgdb_breakpoint() work
> +	 * properly here.
> +	 */
> +	regs = get_irq_regs();
> +
> +	/*
> +	 * Some commands (like 'btc') assume that they can find info about
> +	 * a task in the 'cpu_context'.  Unfortunately that's only valid
> +	 * for sleeping tasks.  ...but let's make it work anyway by just
> +	 * writing the registers to the right place.  This is safe because
> +	 * nobody else is using the 'cpu_context' for a running task.
> +	 */
> +	current->thread.cpu_context.x19 = regs->regs[19];
> +	current->thread.cpu_context.x20 = regs->regs[20];
> +	current->thread.cpu_context.x21 = regs->regs[21];
> +	current->thread.cpu_context.x22 = regs->regs[22];
> +	current->thread.cpu_context.x23 = regs->regs[23];
> +	current->thread.cpu_context.x24 = regs->regs[24];
> +	current->thread.cpu_context.x25 = regs->regs[25];
> +	current->thread.cpu_context.x26 = regs->regs[26];
> +	current->thread.cpu_context.x27 = regs->regs[27];
> +	current->thread.cpu_context.x28 = regs->regs[28];
> +	current->thread.cpu_context.fp = regs->regs[29];
> +
> +	current->thread.cpu_context.sp = regs->sp;
> +	current->thread.cpu_context.pc = regs->pc;
> +
> +	kgdb_nmicallback(raw_smp_processor_id(), regs);
> +}

This is really gross... :/

Can you IPI the other CPUs instead and have them backtrace locally, like we
do for things like magic sysrq (sysrq_handle_showallcpus())?

Will
Doug Anderson July 31, 2019, 6:41 p.m. UTC | #2
Hi,

On Wed, Jul 31, 2019 at 5:57 AM Will Deacon <will@kernel.org> wrote:
>
> Hi Doug,
>
> On Tue, Jul 30, 2019 at 03:18:00PM -0700, Douglas Anderson wrote:
> > diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
> > index 43119922341f..b666210fbc75 100644
> > --- a/arch/arm64/kernel/kgdb.c
> > +++ b/arch/arm64/kernel/kgdb.c
> > @@ -148,6 +148,45 @@ sleeping_thread_to_gdb_regs(unsigned long *gdb_regs, struct task_struct *task)
> >       gdb_regs[32] = cpu_context->pc;
> >  }
> >
> > +void kgdb_call_nmi_hook(void *ignored)
> > +{
> > +     struct pt_regs *regs;
> > +
> > +     /*
> > +      * NOTE: get_irq_regs() is supposed to get the registers from
> > +      * before the IPI interrupt happened and so is supposed to
> > +      * show where the processor was.  In some situations it's
> > +      * possible we might be called without an IPI, so it might be
> > +      * safer to figure out how to make kgdb_breakpoint() work
> > +      * properly here.
> > +      */
> > +     regs = get_irq_regs();
> > +
> > +     /*
> > +      * Some commands (like 'btc') assume that they can find info about
> > +      * a task in the 'cpu_context'.  Unfortunately that's only valid
> > +      * for sleeping tasks.  ...but let's make it work anyway by just
> > +      * writing the registers to the right place.  This is safe because
> > +      * nobody else is using the 'cpu_context' for a running task.
> > +      */
> > +     current->thread.cpu_context.x19 = regs->regs[19];
> > +     current->thread.cpu_context.x20 = regs->regs[20];
> > +     current->thread.cpu_context.x21 = regs->regs[21];
> > +     current->thread.cpu_context.x22 = regs->regs[22];
> > +     current->thread.cpu_context.x23 = regs->regs[23];
> > +     current->thread.cpu_context.x24 = regs->regs[24];
> > +     current->thread.cpu_context.x25 = regs->regs[25];
> > +     current->thread.cpu_context.x26 = regs->regs[26];
> > +     current->thread.cpu_context.x27 = regs->regs[27];
> > +     current->thread.cpu_context.x28 = regs->regs[28];
> > +     current->thread.cpu_context.fp = regs->regs[29];
> > +
> > +     current->thread.cpu_context.sp = regs->sp;
> > +     current->thread.cpu_context.pc = regs->pc;
> > +
> > +     kgdb_nmicallback(raw_smp_processor_id(), regs);
> > +}
>
> This is really gross... :/

Well, sort of.  At first I definitely thought of it as a hack.  ...but
then I realized that it's actually not so terrible.  Although the
other processors (the ones that are not the kgdb master) are
technically "running" as far as Linux is concerned, you can also think
of them as "stopped" in the debugger.  It's convenient to think of
them the same way you'd think of sleeping tasks.

Said another way: normally for a "running" task you couldn't put
anything in the "cpu_context" because it'd be wrong the moment you put
it there.  ...but when a CPU is stopped in kgdb then there's actually
something quite sane to put there.

So with just a small shift in the concept of what "cpu_context" is for
then it becomes not so gross.


> Can you IPI the other CPUs instead and have them backtrace locally, like we
> do for things like magic sysrq (sysrq_handle_showallcpus())?

No, unfortunately.  All the other CPUs are in a tight loop (with
interrupts off) waiting to be released by the kgdb master.  Amusingly,
it's possible to simulate this.  You can run a sysrq from the kdb
prompt.  When I do "sr l" from kdb:

A) The CPU running the kgdb master provides a stack crawl but it's not
really what you want.  Presumably this doesn't matter (we wouldn't
want to send the IPI to the calling CPU), but it's interesting to look
at.  We end up in the fallback workqueue case in
sysrq_handle_showallcpus().  Then we will backtrace based on the regs
returned by "get_irq_regs()".  Thus instead of:

[0]kdb> bt
Stack traceback for pid 0
0xffffff801101a9c0        0        0  1    0   R  0xffffff801101b3b0 *swapper/0
Call trace:
 dump_backtrace+0x0/0x138
 show_stack+0x20/0x2c
 kdb_show_stack+0x60/0x84
 kdb_bt1+0xb8/0x100
 kdb_bt+0x24c/0x408
 kdb_parse+0x53c/0x664
 kdb_main_loop+0x7fc/0x888
 kdb_stub+0x2b0/0x3d0
 kgdb_cpu_enter+0x27c/0x5c4
 kgdb_handle_exception+0x198/0x1f4
 kgdb_compiled_brk_fn+0x34/0x44
 brk_handler+0x88/0xd0
 do_debug_exception+0xe0/0x128
 el1_dbg+0x18/0x8c
 kgdb_breakpoint+0x20/0x3c
 sysrq_handle_dbg+0x34/0x5c
 __handle_sysrq+0x14c/0x170
 handle_sysrq+0x38/0x44
 serial8250_handle_irq+0xe8/0xfc
 dw8250_handle_irq+0x94/0xd0
 serial8250_interrupt+0x48/0xa4
 __handle_irq_event_percpu+0x134/0x25c
 handle_irq_event_percpu+0x34/0x8c
 handle_irq_event+0x48/0x78
 handle_fasteoi_irq+0xd0/0x1a0
 __handle_domain_irq+0x84/0xc4
 gic_handle_irq+0x10c/0x180
 el1_irq+0xb8/0x180
 cpuidle_enter_state+0x284/0x428
 cpuidle_enter+0x38/0x4c
 do_idle+0x168/0x29c
 cpu_startup_entry+0x24/0x28
 rest_init+0xd4/0xe0
 arch_call_rest_init+0x10/0x18
 start_kernel+0x320/0x3a4

I get:

[0]kdb> sr l
sysrq: Show backtrace of all active CPUs
sysrq: CPU0:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-rc2+ #28
Hardware name: Google Kevin (DT)
pstate: 20000005 (nzCv daif -PAN -UAO)
pc : cpuidle_enter_state+0x284/0x428
lr : cpuidle_enter_state+0x274/0x428
sp : ffffff8011003e60
x29: ffffff8011003eb0 x28: ffffff8010f366b8
x27: ffffff8011010000 x26: 0000000000000001
x25: ffffff80110eb000 x24: 0000000000000000
x23: 00000024232e8f0a x22: 0000002420501a35
x21: 0000000000000002 x20: ffffffc0ee86e080
x19: ffffffc0f65426c0 x18: 0000000000000000
x17: 000000000000003e x16: 000000000000003f
x15: 0000000000000000 x14: ffffff801101a9c0
x13: 0000000000013320 x12: 0000000000000020
x11: 000000000624dd2f x10: 00000000ffffffff
x9 : 0000000100000001 x8 : 00000000000000c0
x7 : 00000032b5593519 x6 : 0000000000300000
x5 : 0000000000000000 x4 : 0000000000000101
x3 : 00000000ffffffff x2 : 0000000000000001
x1 : ffffffc0f6548d80 x0 : 0000000000000000
Call trace:
 cpuidle_enter_state+0x284/0x428
 cpuidle_enter+0x38/0x4c
 do_idle+0x168/0x29c
 cpu_startup_entry+0x24/0x28
 rest_init+0xd4/0xe0
 arch_call_rest_init+0x10/0x18
 start_kernel+0x320/0x3a4


B) All the other CPUs don't respond.  ...until you exit the debugger
and then they all print their stacks, a little too late.

---

The weird crawl for the master CPU made me think that maybe I could
use "show_regs()" to show the stacks of the other CPUs, but that
didn't work.  The arm64 stack crawling code really only works for a
sleeping task or for the current running task.

...this again gets back to the fact that we can really think of those
other CPUs stopped in the debugger as "sleeping".

=====

OK, so I think I managed to get something that maybe you're happy with:

https://lkml.kernel.org/r/20190731183732.178134-1-dianders@chromium.org

...I still think it's not such a hack to stash the state in the
"cpu_context" and I could imagine it might have other benefits when
debugging, but my new patch does have the advantage that it's more
platform agnostic.  ;-)  Let me know what you think...

-Doug
diff mbox series

Patch

diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
index 43119922341f..b666210fbc75 100644
--- a/arch/arm64/kernel/kgdb.c
+++ b/arch/arm64/kernel/kgdb.c
@@ -148,6 +148,45 @@  sleeping_thread_to_gdb_regs(unsigned long *gdb_regs, struct task_struct *task)
 	gdb_regs[32] = cpu_context->pc;
 }
 
+void kgdb_call_nmi_hook(void *ignored)
+{
+	struct pt_regs *regs;
+
+	/*
+	 * NOTE: get_irq_regs() is supposed to get the registers from
+	 * before the IPI interrupt happened and so is supposed to
+	 * show where the processor was.  In some situations it's
+	 * possible we might be called without an IPI, so it might be
+	 * safer to figure out how to make kgdb_breakpoint() work
+	 * properly here.
+	 */
+	regs = get_irq_regs();
+
+	/*
+	 * Some commands (like 'btc') assume that they can find info about
+	 * a task in the 'cpu_context'.  Unfortunately that's only valid
+	 * for sleeping tasks.  ...but let's make it work anyway by just
+	 * writing the registers to the right place.  This is safe because
+	 * nobody else is using the 'cpu_context' for a running task.
+	 */
+	current->thread.cpu_context.x19 = regs->regs[19];
+	current->thread.cpu_context.x20 = regs->regs[20];
+	current->thread.cpu_context.x21 = regs->regs[21];
+	current->thread.cpu_context.x22 = regs->regs[22];
+	current->thread.cpu_context.x23 = regs->regs[23];
+	current->thread.cpu_context.x24 = regs->regs[24];
+	current->thread.cpu_context.x25 = regs->regs[25];
+	current->thread.cpu_context.x26 = regs->regs[26];
+	current->thread.cpu_context.x27 = regs->regs[27];
+	current->thread.cpu_context.x28 = regs->regs[28];
+	current->thread.cpu_context.fp = regs->regs[29];
+
+	current->thread.cpu_context.sp = regs->sp;
+	current->thread.cpu_context.pc = regs->pc;
+
+	kgdb_nmicallback(raw_smp_processor_id(), regs);
+}
+
 void kgdb_arch_set_pc(struct pt_regs *regs, unsigned long pc)
 {
 	regs->pc = pc;