diff mbox series

arm64: disable ARCH_CORRECT_STACKTRACE_ON_KRETPROBE tests

Message ID 20241118120204.3961548-1-mark.rutland@arm.com (mailing list archive)
State New
Headers show
Series arm64: disable ARCH_CORRECT_STACKTRACE_ON_KRETPROBE tests | expand

Commit Message

Mark Rutland Nov. 18, 2024, 12:02 p.m. UTC
The kprobes_test suite's test_stacktrace_on_nested_kretprobe() test
currently fails on arm64, e.g.

| KTAP version 1
| 1..1
|     KTAP version 1
|     # Subtest: kprobes_test
|     # module: test_kprobes
|     1..7
|     ok 1 test_kprobe
|     ok 2 test_kprobes
|     ok 3 test_kprobe_missed
|     ok 4 test_kretprobe
|     ok 5 test_kretprobes
|     ok 6 test_stacktrace_on_kretprobe
|     # test_stacktrace_on_nested_kretprobe: EXPECTATION FAILED at lib/test_kprobes.c:327
|     Expected stack_buf[i + 1] == target_return_address[1], but
|         stack_buf[i + 1] == -96519936577004 (0xffffa83733777214)
|         target_return_address[1] == -96519936577136 (0xffffa83733777190)
|     # test_stacktrace_on_nested_kretprobe: EXPECTATION FAILED at lib/test_kprobes.c:338
|     Expected stack_buf[1] == target_return_address[1], but
|         stack_buf[1] == -96519936577004 (0xffffa83733777214)
|         target_return_address[1] == -96519936577136 (0xffffa83733777190)
|     not ok 7 test_stacktrace_on_nested_kretprobe
| # kprobes_test: pass:6 fail:1 skip:0 total:7
| # Totals: pass:6 fail:1 skip:0 total:7
| not ok 1 kprobes_test

The test assumes that when a stacktrace straddles an exception boundary,
no necessary entries will be omitted and no extraneous entries will be
reported, and when unwinding from a kretprobed callee, the next entry in
the trace will be its immediate caller (whether kretprobed or not).

Recently the arm64 stacktrace code was changed to always report the LR
at an exception boundary, where we don't know whether the LR is live.
In the case of the kretprobe trampoline the LR is not live at the time
the stacktrace is performed, and so the entry in the trace for the LR is
extraneous. This can be seen if a call to show_stack() is added to
stacktrace_internal_return_handler():

| Call trace:
|  show_stack+0x18/0x30 (C)
|  stacktrace_internal_return_handler+0x130/0x43c
|  __kretprobe_trampoline_handler+0xa0/0x130
|  kretprobe_breakpoint_handler+0x50/0x70
|  call_break_hook+0x74/0x8c
|  brk_handler+0x1c/0x60
|  do_debug_exception+0x68/0x114
|  el1_dbg+0x70/0x94
|  el1h_64_sync_handler+0xc4/0xe4
|  el1h_64_sync+0x6c/0x70
|  kprobe_stacktrace_target+0x34/0x48 (P)
|  kprobe_stacktrace_target+0x34/0x48 (LK) <-------- extra entry here
|  kprobe_stacktrace_driver+0x24/0x40 (K)
|  test_stacktrace_on_nested_kretprobe+0x84/0x160
|  kunit_try_run_case+0x6c/0x160
|  kunit_generic_run_threadfn_adapter+0x28/0x4c
|  kthread+0x110/0x114
|  ret_from_fork+0x10/0x20

This breaks test_stacktrace_on_nested_kretprobe() because while the
caller (kprobe_stacktrace_driver()) appears in the trace, it doesn't
occur *immediately* after the first instance of callee
(kprobe_stacktrace_target()).

While this behaviour is unfortunate for the kretprobes tests, the
behaviour is desirable elsewhere (e.g. anywhere a human will read the
trace), and is otherwise not harmful.

For the moment, deselect ARCH_CORRECT_STACKTRACE_ON_KRETPROBE on arm64
to disable the tests which depend on this behaviour. With
ARCH_CORRECT_STACKTRACE_ON_KRETPROBE deselected, the remaining tests
work as expected, e.g.

| KTAP version 1
| 1..1
|     KTAP version 1
|     # Subtest: kprobes_test
|     # module: test_kprobes
|     1..5
|     ok 1 test_kprobe
|     ok 2 test_kprobes
|     ok 3 test_kprobe_missed
|     ok 4 test_kretprobe
|     ok 5 test_kretprobes
| # kprobes_test: pass:5 fail:0 skip:0 total:5
| # Totals: pass:5 fail:0 skip:0 total:5
| ok 1 kprobes_test

In future we have several options to improve matters, e.g.

* Add metadata and update arm64's unwinder to skip the LR in this case.
  This is likely to happen as part of work for RELIABLE_STACKTRACE for
  other reasons, and might solve this case by coincidence.

* Modify the kretprobes tests to only require that the caller appears in
  the trace after the callee, rather than requiring that it is
  *immediately* after the callee. We might want separate
  strict/not-strict options for this.

* Use reliable stacktrace for these tests, so that architectures which
  cannot unwind across exception boundaries can explicitly handle this
  by returning an error.

Fixes: c2c6b27b5aa1 ("arm64: stacktrace: unwind exception boundaries")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Kristina Martsenko <kristina.martsenko@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Puranjay Mohan <puranjay12@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/Kconfig | 1 -
 1 file changed, 1 deletion(-)
diff mbox series

Patch

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 3e29b44d2d7bd..234d926290731 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -14,7 +14,6 @@  config ARM64
 	select ARCH_HAS_DEBUG_WX
 	select ARCH_BINFMT_ELF_EXTRA_PHDRS
 	select ARCH_BINFMT_ELF_STATE
-	select ARCH_CORRECT_STACKTRACE_ON_KRETPROBE
 	select ARCH_ENABLE_HUGEPAGE_MIGRATION if HUGETLB_PAGE && MIGRATION
 	select ARCH_ENABLE_MEMORY_HOTPLUG
 	select ARCH_ENABLE_MEMORY_HOTREMOVE