diff mbox series

riscv: fgraph: Fix stack layout to match __arch_ftrace_regs argument of ftrace_return_to_handler

Message ID 20250311132243.2178271-1-pulehui@huaweicloud.com (mailing list archive)
State New
Headers show
Series riscv: fgraph: Fix stack layout to match __arch_ftrace_regs argument of ftrace_return_to_handler | expand

Checks

Context Check Description
bjorn/pre-ci_am success Success
bjorn/build-rv32-defconfig success build-rv32-defconfig
bjorn/build-rv64-clang-allmodconfig success build-rv64-clang-allmodconfig
bjorn/build-rv64-gcc-allmodconfig success build-rv64-gcc-allmodconfig
bjorn/build-rv64-nommu-k210-defconfig success build-rv64-nommu-k210-defconfig
bjorn/build-rv64-nommu-k210-virt success build-rv64-nommu-k210-virt
bjorn/checkpatch success checkpatch
bjorn/dtb-warn-rv64 success dtb-warn-rv64
bjorn/header-inline success header-inline
bjorn/kdoc success kdoc
bjorn/module-param success module-param
bjorn/verify-fixes success verify-fixes
bjorn/verify-signedoff success verify-signedoff

Commit Message

Pu Lehui March 11, 2025, 1:22 p.m. UTC
From: Pu Lehui <pulehui@huawei.com>

Naresh Kamboju reported a "Bad frame pointer" kernel warning while
running LTP trace ftrace_stress_test.sh in riscv. We can reproduce the
same issue with the following command:

```
$ cd /sys/kernel/debug/tracing
$ echo 'f:myprobe do_nanosleep%return args1=$retval' > dynamic_events
$ echo 1 > events/fprobes/enable
$ echo 1 > tracing_on
$ sleep 1
```

And we can get the following kernel warning:

[  127.692888] ------------[ cut here ]------------
[  127.693755] Bad frame pointer: expected ff2000000065be50, received ba34c141e9594000
[  127.693755]   from func do_nanosleep return to ffffffff800ccb16
[  127.698699] WARNING: CPU: 1 PID: 129 at kernel/trace/fgraph.c:755 ftrace_return_to_handler+0x1b2/0x1be
[  127.699894] Modules linked in:
[  127.700908] CPU: 1 UID: 0 PID: 129 Comm: sleep Not tainted 6.14.0-rc3-g0ab191c74642 #32
[  127.701453] Hardware name: riscv-virtio,qemu (DT)
[  127.701859] epc : ftrace_return_to_handler+0x1b2/0x1be
[  127.702032]  ra : ftrace_return_to_handler+0x1b2/0x1be
[  127.702151] epc : ffffffff8013b5e0 ra : ffffffff8013b5e0 sp : ff2000000065bd10
[  127.702221]  gp : ffffffff819c12f8 tp : ff60000080853100 t0 : 6e00000000000000
[  127.702284]  t1 : 0000000000000020 t2 : 6e7566206d6f7266 s0 : ff2000000065bd80
[  127.702346]  s1 : ff60000081262000 a0 : 000000000000007b a1 : ffffffff81894f20
[  127.702408]  a2 : 0000000000000010 a3 : fffffffffffffffe a4 : 0000000000000000
[  127.702470]  a5 : 0000000000000000 a6 : 0000000000000008 a7 : 0000000000000038
[  127.702530]  s2 : ba34c141e9594000 s3 : 0000000000000000 s4 : ff2000000065bdd0
[  127.702591]  s5 : 00007fff8adcf400 s6 : 000055556dc1d8c0 s7 : 0000000000000068
[  127.702651]  s8 : 00007fff8adf5d10 s9 : 000000000000006d s10: 0000000000000001
[  127.702710]  s11: 00005555737377c8 t3 : ffffffff819d899e t4 : ffffffff819d899e
[  127.702769]  t5 : ffffffff819d89a0 t6 : ff2000000065bb18
[  127.702826] status: 0000000200000120 badaddr: 0000000000000000 cause: 0000000000000003
[  127.703292] [<ffffffff8013b5e0>] ftrace_return_to_handler+0x1b2/0x1be
[  127.703760] [<ffffffff80017bce>] return_to_handler+0x16/0x26
[  127.704009] [<ffffffff80017bb8>] return_to_handler+0x0/0x26
[  127.704057] [<ffffffff800d3352>] common_nsleep+0x42/0x54
[  127.704117] [<ffffffff800d44a2>] __riscv_sys_clock_nanosleep+0xba/0x10a
[  127.704176] [<ffffffff80901c56>] do_trap_ecall_u+0x188/0x218
[  127.704295] [<ffffffff8090cc3e>] handle_exception+0x14a/0x156
[  127.705436] ---[ end trace 0000000000000000 ]---

The reason is that the stack layout for constructing argument for the
ftrace_return_to_handler in the return_to_handler does not match the
__arch_ftrace_regs structure of riscv, leading to unexpected results.

Fixes: a3ed4157b7d8 ("fgraph: Replace fgraph_ret_regs with ftrace_regs")
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Closes: https://lore.kernel.org/all/CA+G9fYvp_oAxeDFj88Tk2rfEZ7jtYKAKSwfYS66=57Db9TBdyA@mail.gmail.com
Signed-off-by: Pu Lehui <pulehui@huawei.com>
---
 arch/riscv/kernel/mcount.S | 24 +++++++++++-------------
 1 file changed, 11 insertions(+), 13 deletions(-)

Comments

Alexandre Ghiti March 11, 2025, 2:14 p.m. UTC | #1
Hi Pu,

Thanks for working on this, we were talking about this issue with Björn 
this morning!

On 11/03/2025 14:22, Pu Lehui wrote:
> From: Pu Lehui <pulehui@huawei.com>
>
> Naresh Kamboju reported a "Bad frame pointer" kernel warning while
> running LTP trace ftrace_stress_test.sh in riscv. We can reproduce the
> same issue with the following command:
>
> ```
> $ cd /sys/kernel/debug/tracing
> $ echo 'f:myprobe do_nanosleep%return args1=$retval' > dynamic_events
> $ echo 1 > events/fprobes/enable
> $ echo 1 > tracing_on
> $ sleep 1
> ```
>
> And we can get the following kernel warning:
>
> [  127.692888] ------------[ cut here ]------------
> [  127.693755] Bad frame pointer: expected ff2000000065be50, received ba34c141e9594000
> [  127.693755]   from func do_nanosleep return to ffffffff800ccb16
> [  127.698699] WARNING: CPU: 1 PID: 129 at kernel/trace/fgraph.c:755 ftrace_return_to_handler+0x1b2/0x1be
> [  127.699894] Modules linked in:
> [  127.700908] CPU: 1 UID: 0 PID: 129 Comm: sleep Not tainted 6.14.0-rc3-g0ab191c74642 #32
> [  127.701453] Hardware name: riscv-virtio,qemu (DT)
> [  127.701859] epc : ftrace_return_to_handler+0x1b2/0x1be
> [  127.702032]  ra : ftrace_return_to_handler+0x1b2/0x1be
> [  127.702151] epc : ffffffff8013b5e0 ra : ffffffff8013b5e0 sp : ff2000000065bd10
> [  127.702221]  gp : ffffffff819c12f8 tp : ff60000080853100 t0 : 6e00000000000000
> [  127.702284]  t1 : 0000000000000020 t2 : 6e7566206d6f7266 s0 : ff2000000065bd80
> [  127.702346]  s1 : ff60000081262000 a0 : 000000000000007b a1 : ffffffff81894f20
> [  127.702408]  a2 : 0000000000000010 a3 : fffffffffffffffe a4 : 0000000000000000
> [  127.702470]  a5 : 0000000000000000 a6 : 0000000000000008 a7 : 0000000000000038
> [  127.702530]  s2 : ba34c141e9594000 s3 : 0000000000000000 s4 : ff2000000065bdd0
> [  127.702591]  s5 : 00007fff8adcf400 s6 : 000055556dc1d8c0 s7 : 0000000000000068
> [  127.702651]  s8 : 00007fff8adf5d10 s9 : 000000000000006d s10: 0000000000000001
> [  127.702710]  s11: 00005555737377c8 t3 : ffffffff819d899e t4 : ffffffff819d899e
> [  127.702769]  t5 : ffffffff819d89a0 t6 : ff2000000065bb18
> [  127.702826] status: 0000000200000120 badaddr: 0000000000000000 cause: 0000000000000003
> [  127.703292] [<ffffffff8013b5e0>] ftrace_return_to_handler+0x1b2/0x1be
> [  127.703760] [<ffffffff80017bce>] return_to_handler+0x16/0x26
> [  127.704009] [<ffffffff80017bb8>] return_to_handler+0x0/0x26
> [  127.704057] [<ffffffff800d3352>] common_nsleep+0x42/0x54
> [  127.704117] [<ffffffff800d44a2>] __riscv_sys_clock_nanosleep+0xba/0x10a
> [  127.704176] [<ffffffff80901c56>] do_trap_ecall_u+0x188/0x218
> [  127.704295] [<ffffffff8090cc3e>] handle_exception+0x14a/0x156
> [  127.705436] ---[ end trace 0000000000000000 ]---
>
> The reason is that the stack layout for constructing argument for the
> ftrace_return_to_handler in the return_to_handler does not match the
> __arch_ftrace_regs structure of riscv, leading to unexpected results.
>
> Fixes: a3ed4157b7d8 ("fgraph: Replace fgraph_ret_regs with ftrace_regs")
> Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
> Closes: https://lore.kernel.org/all/CA+G9fYvp_oAxeDFj88Tk2rfEZ7jtYKAKSwfYS66=57Db9TBdyA@mail.gmail.com
> Signed-off-by: Pu Lehui <pulehui@huawei.com>
> ---
>   arch/riscv/kernel/mcount.S | 24 +++++++++++-------------
>   1 file changed, 11 insertions(+), 13 deletions(-)
>
> diff --git a/arch/riscv/kernel/mcount.S b/arch/riscv/kernel/mcount.S
> index 068168046e0e..da4a4000e57e 100644
> --- a/arch/riscv/kernel/mcount.S
> +++ b/arch/riscv/kernel/mcount.S
> @@ -12,8 +12,6 @@
>   #include <asm/asm-offsets.h>
>   #include <asm/ftrace.h>
>   
> -#define ABI_SIZE_ON_STACK	80
> -
>   	.text
>   
>   	.macro SAVE_ABI_STATE
> @@ -28,12 +26,12 @@
>   	 * register if a0 was not saved.
>   	 */
>   	.macro SAVE_RET_ABI_STATE
> -	addi	sp, sp, -ABI_SIZE_ON_STACK
> -	REG_S	ra, 1*SZREG(sp)
> -	REG_S	s0, 8*SZREG(sp)
> -	REG_S	a0, 10*SZREG(sp)
> -	REG_S	a1, 11*SZREG(sp)
> -	addi	s0, sp, ABI_SIZE_ON_STACK
> +	addi	sp, sp, -FREGS_SIZE_ON_STACK
> +	REG_S	ra, FREGS_RA(sp)
> +	REG_S	s0, FREGS_S0(sp)
> +	REG_S	a0, FREGS_A0(sp)
> +	REG_S	a1, FREGS_A1(sp)
> +	addi	s0, sp, FREGS_SIZE_ON_STACK
>   	.endm
>   
>   	.macro RESTORE_ABI_STATE
> @@ -43,11 +41,11 @@
>   	.endm
>   
>   	.macro RESTORE_RET_ABI_STATE
> -	REG_L	ra, 1*SZREG(sp)
> -	REG_L	s0, 8*SZREG(sp)
> -	REG_L	a0, 10*SZREG(sp)
> -	REG_L	a1, 11*SZREG(sp)
> -	addi	sp, sp, ABI_SIZE_ON_STACK
> +	REG_L	ra, FREGS_RA(sp)
> +	REG_L	s0, FREGS_S0(sp)
> +	REG_L	a0, FREGS_A0(sp)
> +	REG_L	a1, FREGS_A1(sp)
> +	addi	sp, sp, FREGS_SIZE_ON_STACK
>   	.endm
>   
>   SYM_TYPED_FUNC_START(ftrace_stub)


It looks good to me, so:

Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>

Thanks,

Alex
Björn Töpel March 12, 2025, 7:35 p.m. UTC | #2
Pu Lehui <pulehui@huaweicloud.com> writes:

> From: Pu Lehui <pulehui@huawei.com>
>
> Naresh Kamboju reported a "Bad frame pointer" kernel warning while
> running LTP trace ftrace_stress_test.sh in riscv. We can reproduce the
> same issue with the following command:
>
> ```
> $ cd /sys/kernel/debug/tracing
> $ echo 'f:myprobe do_nanosleep%return args1=$retval' > dynamic_events
> $ echo 1 > events/fprobes/enable
> $ echo 1 > tracing_on
> $ sleep 1
> ```
>
> And we can get the following kernel warning:
>
> [  127.692888] ------------[ cut here ]------------
> [  127.693755] Bad frame pointer: expected ff2000000065be50, received ba34c141e9594000
> [  127.693755]   from func do_nanosleep return to ffffffff800ccb16
> [  127.698699] WARNING: CPU: 1 PID: 129 at kernel/trace/fgraph.c:755 ftrace_return_to_handler+0x1b2/0x1be
> [  127.699894] Modules linked in:
> [  127.700908] CPU: 1 UID: 0 PID: 129 Comm: sleep Not tainted 6.14.0-rc3-g0ab191c74642 #32
> [  127.701453] Hardware name: riscv-virtio,qemu (DT)
> [  127.701859] epc : ftrace_return_to_handler+0x1b2/0x1be
> [  127.702032]  ra : ftrace_return_to_handler+0x1b2/0x1be
> [  127.702151] epc : ffffffff8013b5e0 ra : ffffffff8013b5e0 sp : ff2000000065bd10
> [  127.702221]  gp : ffffffff819c12f8 tp : ff60000080853100 t0 : 6e00000000000000
> [  127.702284]  t1 : 0000000000000020 t2 : 6e7566206d6f7266 s0 : ff2000000065bd80
> [  127.702346]  s1 : ff60000081262000 a0 : 000000000000007b a1 : ffffffff81894f20
> [  127.702408]  a2 : 0000000000000010 a3 : fffffffffffffffe a4 : 0000000000000000
> [  127.702470]  a5 : 0000000000000000 a6 : 0000000000000008 a7 : 0000000000000038
> [  127.702530]  s2 : ba34c141e9594000 s3 : 0000000000000000 s4 : ff2000000065bdd0
> [  127.702591]  s5 : 00007fff8adcf400 s6 : 000055556dc1d8c0 s7 : 0000000000000068
> [  127.702651]  s8 : 00007fff8adf5d10 s9 : 000000000000006d s10: 0000000000000001
> [  127.702710]  s11: 00005555737377c8 t3 : ffffffff819d899e t4 : ffffffff819d899e
> [  127.702769]  t5 : ffffffff819d89a0 t6 : ff2000000065bb18
> [  127.702826] status: 0000000200000120 badaddr: 0000000000000000 cause: 0000000000000003
> [  127.703292] [<ffffffff8013b5e0>] ftrace_return_to_handler+0x1b2/0x1be
> [  127.703760] [<ffffffff80017bce>] return_to_handler+0x16/0x26
> [  127.704009] [<ffffffff80017bb8>] return_to_handler+0x0/0x26
> [  127.704057] [<ffffffff800d3352>] common_nsleep+0x42/0x54
> [  127.704117] [<ffffffff800d44a2>] __riscv_sys_clock_nanosleep+0xba/0x10a
> [  127.704176] [<ffffffff80901c56>] do_trap_ecall_u+0x188/0x218
> [  127.704295] [<ffffffff8090cc3e>] handle_exception+0x14a/0x156
> [  127.705436] ---[ end trace 0000000000000000 ]---
>
> The reason is that the stack layout for constructing argument for the
> ftrace_return_to_handler in the return_to_handler does not match the
> __arch_ftrace_regs structure of riscv, leading to unexpected results.
>
> Fixes: a3ed4157b7d8 ("fgraph: Replace fgraph_ret_regs with ftrace_regs")
> Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
> Closes: https://lore.kernel.org/all/CA+G9fYvp_oAxeDFj88Tk2rfEZ7jtYKAKSwfYS66=57Db9TBdyA@mail.gmail.com
> Signed-off-by: Pu Lehui <pulehui@huawei.com>

With this patch, I get (from ftrace startup tests):
  Testing tracer function_graph: PASSED
instead of the splat above! Thanks!

Tested-by: Björn Töpel <bjorn@rivosinc.com>
diff mbox series

Patch

diff --git a/arch/riscv/kernel/mcount.S b/arch/riscv/kernel/mcount.S
index 068168046e0e..da4a4000e57e 100644
--- a/arch/riscv/kernel/mcount.S
+++ b/arch/riscv/kernel/mcount.S
@@ -12,8 +12,6 @@ 
 #include <asm/asm-offsets.h>
 #include <asm/ftrace.h>
 
-#define ABI_SIZE_ON_STACK	80
-
 	.text
 
 	.macro SAVE_ABI_STATE
@@ -28,12 +26,12 @@ 
 	 * register if a0 was not saved.
 	 */
 	.macro SAVE_RET_ABI_STATE
-	addi	sp, sp, -ABI_SIZE_ON_STACK
-	REG_S	ra, 1*SZREG(sp)
-	REG_S	s0, 8*SZREG(sp)
-	REG_S	a0, 10*SZREG(sp)
-	REG_S	a1, 11*SZREG(sp)
-	addi	s0, sp, ABI_SIZE_ON_STACK
+	addi	sp, sp, -FREGS_SIZE_ON_STACK
+	REG_S	ra, FREGS_RA(sp)
+	REG_S	s0, FREGS_S0(sp)
+	REG_S	a0, FREGS_A0(sp)
+	REG_S	a1, FREGS_A1(sp)
+	addi	s0, sp, FREGS_SIZE_ON_STACK
 	.endm
 
 	.macro RESTORE_ABI_STATE
@@ -43,11 +41,11 @@ 
 	.endm
 
 	.macro RESTORE_RET_ABI_STATE
-	REG_L	ra, 1*SZREG(sp)
-	REG_L	s0, 8*SZREG(sp)
-	REG_L	a0, 10*SZREG(sp)
-	REG_L	a1, 11*SZREG(sp)
-	addi	sp, sp, ABI_SIZE_ON_STACK
+	REG_L	ra, FREGS_RA(sp)
+	REG_L	s0, FREGS_S0(sp)
+	REG_L	a0, FREGS_A0(sp)
+	REG_L	a1, FREGS_A1(sp)
+	addi	sp, sp, FREGS_SIZE_ON_STACK
 	.endm
 
 SYM_TYPED_FUNC_START(ftrace_stub)