mbox series

[v2,00/10] arm64: stacktrace: improve unwind reporting

Message ID 20241017092538.1859841-1-mark.rutland@arm.com (mailing list archive)
Headers show
Series arm64: stacktrace: improve unwind reporting | expand

Message

Mark Rutland Oct. 17, 2024, 9:25 a.m. UTC
This series improves arm64's unwinder to explicitly identify exception
boundaries, reporting both pt_regs::pc and pt_regs::lr and explicitly
identifying the source of elements in the stacktrace. This is useful to
humans when reviewing a stacktrace, and serves as infrastructure that
can be used for RELIABLE_STACKTRACE in future.

The first 6 patches are preparatory work that are not intended to have
any functional impact, with patches 7 to 10 making the key changes.
Largely this involves teaching the unwinder to track metadata for each
unwind step, and modifying the way we manage pt_regs::stackframe so that
exception boundaries can be identifier explcitily.

With this series applied, the unwinder will report when unwind elements are not
simply the result of a frame pointer based unwind, e.g.

| Call trace:
|  show_stack+0x20/0x38 (CF)
|  dump_stack_lvl+0x60/0x80 (F)
|  dump_stack+0x18/0x28
|  nmi_cpu_backtrace+0xfc/0x140
|  nmi_trigger_cpumask_backtrace+0x1c8/0x200
|  arch_trigger_cpumask_backtrace+0x20/0x40
|  sysrq_handle_showallcpus+0x24/0x38 (F)
|  __handle_sysrq+0xa8/0x1b0 (F)
|  handle_sysrq+0x38/0x50 (F)
|  pl011_int+0x420/0x570 (F)
|  __handle_irq_event_percpu+0x60/0x220 (F)
|  handle_irq_event+0x54/0xc0 (F)
|  handle_fasteoi_irq+0xa8/0x1d0 (F)
|  generic_handle_domain_irq+0x34/0x58 (F)
|  gic_handle_irq+0x54/0x140 (FK)
|  call_on_irq_stack+0x24/0x58 (F)
|  do_interrupt_handler+0x88/0xa0
|  el1_interrupt+0x34/0x68 (F)
|  el1h_64_irq_handler+0x18/0x28
|  el1h_64_irq+0x6c/0x70
|  default_idle_call+0x34/0x180 (P)
|  default_idle_call+0x28/0x180 (L)
|  do_idle+0x204/0x268
|  cpu_startup_entry+0x3c/0x50 (F)
|  rest_init+0xe4/0xf0
|  start_kernel+0x738/0x740
|  __primary_switched+0x88/0x98

... where:

* "C" indicates that the first element of the trace was the caller of an unwind
  function (vs "T" for a blocked task's stave PC, or "P" for a pt_regs::pc).

* "F" indicates that the element was recovered from fgraph (and
  would otherwise have been reported as return_to_handler).

* "K" indicates that the element was recovered from kretprobes (and
  would otherwise have been reported as __kretprobe_trampoline).

* "P" indicates that the element was recovered from pt_regs::pc, and therefore
  this is the first element after an exception boundary.

* "L" indidates that the element was recovered from pt_regs::lr, and therefore
  this may or may not be reliable depending on whether the LR was live at the
  moment the exception was taken.

I've pushed the series to my arm64/stacktrace/metadata branch on kernel.org,
based on v6.12-rc2:

  https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/stacktrace/metadata
  git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git arm64/stacktrace/metadata

Since v1 [1]:
* Fix typos in commit messages
* Apply Reviewed-by tags

[1] https://lore.kernel.org/linux-arm-kernel/20241010101510.1487477-1-mark.rutland@arm.com/

Mark.

Mark Rutland (10):
  arm64: pt_regs: assert pt_regs is a multiple of 16 bytes
  arm64: pt_regs: remove stale big-endian layout
  arm64: pt_regs: rename "pmr_save" -> "pmr"
  arm64: pt_regs: swap 'unused' and 'pmr' fields
  arm64: use a common struct frame_record
  arm64: stacktrace: move dump_backtrace() to kunwind_stack_walk()
  arm64: stacktrace: report source of unwind data
  arm64: stacktrace: report recovered PCs
  arm64: stacktrace: split unwind_consume_stack()
  arm64: stacktrace: unwind exception boundaries

 arch/arm64/include/asm/daifflags.h         |   2 +-
 arch/arm64/include/asm/processor.h         |   2 +-
 arch/arm64/include/asm/ptrace.h            |  22 ++-
 arch/arm64/include/asm/stacktrace/common.h |  74 +++++----
 arch/arm64/include/asm/stacktrace/frame.h  |  48 ++++++
 arch/arm64/kernel/asm-offsets.c            |   3 +-
 arch/arm64/kernel/entry.S                  |  16 +-
 arch/arm64/kernel/head.S                   |   3 +
 arch/arm64/kernel/process.c                |   5 +-
 arch/arm64/kernel/signal.c                 |   5 -
 arch/arm64/kernel/stacktrace.c             | 176 +++++++++++++++++++--
 11 files changed, 287 insertions(+), 69 deletions(-)
 create mode 100644 arch/arm64/include/asm/stacktrace/frame.h

Comments

Catalin Marinas Oct. 17, 2024, 6 p.m. UTC | #1
On Thu, 17 Oct 2024 10:25:28 +0100, Mark Rutland wrote:
> This series improves arm64's unwinder to explicitly identify exception
> boundaries, reporting both pt_regs::pc and pt_regs::lr and explicitly
> identifying the source of elements in the stacktrace. This is useful to
> humans when reviewing a stacktrace, and serves as infrastructure that
> can be used for RELIABLE_STACKTRACE in future.
> 
> The first 6 patches are preparatory work that are not intended to have
> any functional impact, with patches 7 to 10 making the key changes.
> Largely this involves teaching the unwinder to track metadata for each
> unwind step, and modifying the way we manage pt_regs::stackframe so that
> exception boundaries can be identifier explcitily.
> 
> [...]

Applied to arm64 (for-next/stacktrace), thanks!

[01/10] arm64: pt_regs: assert pt_regs is a multiple of 16 bytes
        https://git.kernel.org/arm64/c/c87df9cb9a21
[02/10] arm64: pt_regs: remove stale big-endian layout
        https://git.kernel.org/arm64/c/2716d59bf483
[03/10] arm64: pt_regs: rename "pmr_save" -> "pmr"
        https://git.kernel.org/arm64/c/00d9597903d0
[04/10] arm64: pt_regs: swap 'unused' and 'pmr' fields
        https://git.kernel.org/arm64/c/1454363098a0
[05/10] arm64: use a common struct frame_record
        https://git.kernel.org/arm64/c/886c2b0ba820
[06/10] arm64: stacktrace: move dump_backtrace() to kunwind_stack_walk()
        https://git.kernel.org/arm64/c/b7794795c93d
[07/10] arm64: stacktrace: report source of unwind data
        https://git.kernel.org/arm64/c/bdf8eafbf7f5
[08/10] arm64: stacktrace: report recovered PCs
        https://git.kernel.org/arm64/c/8094df1cf092
[09/10] arm64: stacktrace: split unwind_consume_stack()
        https://git.kernel.org/arm64/c/f05a4a42de90
[10/10] arm64: stacktrace: unwind exception boundaries
        https://git.kernel.org/arm64/c/c2c6b27b5aa1