diff mbox series

riscv: eliminate unreliable __builtin_frame_address(1)

Message ID 20220117154433.3124-1-changbin.du@gmail.com (mailing list archive)
State New, archived
Headers show
Series riscv: eliminate unreliable __builtin_frame_address(1) | expand

Commit Message

Changbin Du Jan. 17, 2022, 3:44 p.m. UTC
I tried different pieces of code which uses __builtin_frame_address(1)
(with both gcc version 7.5.0 and 10.3.0) to verify whether it works as
expected on riscv64. The result is negative.

What the compiler had generated is as below:
31                      fp = (unsigned long)__builtin_frame_address(1);
   0xffffffff80006024 <+200>:   ld      s1,0(s0)

It takes '0(s0)' as the address of frame 1 (caller), but the actual address
should be '-16(s0)'.

          |       ...       | <-+
          +-----------------+   |
          | return address  |   |
          | previous fp     |   |
          | saved registers |   |
          | local variables |   |
  $fp --> |       ...       |   |
          +-----------------+   |
          | return address  |   |
          | previous fp --------+
          | saved registers |
  $sp --> | local variables |
          +-----------------+

This leads the kernel can not dump the full stack trace on riscv.

[    7.222126][    T1] Call Trace:
[    7.222804][    T1] [<ffffffff80006058>] dump_backtrace+0x2c/0x3a

This problem is not exposed on most riscv builds just because the '0(s0)'
occasionally is the address frame 2 (caller's caller), if only ra and fp
are stored in frame 1 (caller).

          |       ...       | <-+
          +-----------------+   |
          | return address  |   |
  $fp --> | previous fp     |   |
          +-----------------+   |
          | return address  |   |
          | previous fp --------+
          | saved registers |
  $sp --> | local variables |
          +-----------------+

This could be a *bug* of gcc that should be fixed. But as noted in gcc
manual "Calling this function with a nonzero argument can have
unpredictable effects, including crashing the calling program.", let's
remove the '__builtin_frame_address(1)' in backtrace code.

With this fix now it can show full stack trace:
[   10.444838][    T1] Call Trace:
[   10.446199][    T1] [<ffffffff8000606c>] dump_backtrace+0x2c/0x3a
[   10.447711][    T1] [<ffffffff800060ac>] show_stack+0x32/0x3e
[   10.448710][    T1] [<ffffffff80a005c0>] dump_stack_lvl+0x58/0x7a
[   10.449941][    T1] [<ffffffff80a005f6>] dump_stack+0x14/0x1c
[   10.450929][    T1] [<ffffffff804c04ee>] ubsan_epilogue+0x10/0x5a
[   10.451869][    T1] [<ffffffff804c092e>] __ubsan_handle_load_invalid_value+0x6c/0x78
[   10.453049][    T1] [<ffffffff8018f834>] __pagevec_release+0x62/0x64
[   10.455476][    T1] [<ffffffff80190830>] truncate_inode_pages_range+0x132/0x5be
[   10.456798][    T1] [<ffffffff80190ce0>] truncate_inode_pages+0x24/0x30
[   10.457853][    T1] [<ffffffff8045bb04>] kill_bdev+0x32/0x3c
...

Signed-off-by: Changbin Du <changbin.du@gmail.com>
---
 arch/riscv/kernel/stacktrace.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

Comments

Andreas Schwab Jan. 17, 2022, 4:10 p.m. UTC | #1
On Jan 17 2022, Changbin Du wrote:

> I tried different pieces of code which uses __builtin_frame_address(1)
> (with both gcc version 7.5.0 and 10.3.0) to verify whether it works as
> expected on riscv64. The result is negative.
>
> What the compiler had generated is as below:
> 31                      fp = (unsigned long)__builtin_frame_address(1);
>    0xffffffff80006024 <+200>:   ld      s1,0(s0)
>
> It takes '0(s0)' as the address of frame 1 (caller), but the actual address
> should be '-16(s0)'.
>
>           |       ...       | <-+
>           +-----------------+   |
>           | return address  |   |
>           | previous fp     |   |
>           | saved registers |   |
>           | local variables |   |
>   $fp --> |       ...       |   |
>           +-----------------+   |
>           | return address  |   |
>           | previous fp --------+
>           | saved registers |
>   $sp --> | local variables |
>           +-----------------+
>
> This leads the kernel can not dump the full stack trace on riscv.
>
> [    7.222126][    T1] Call Trace:
> [    7.222804][    T1] [<ffffffff80006058>] dump_backtrace+0x2c/0x3a
>
> This problem is not exposed on most riscv builds just because the '0(s0)'
> occasionally is the address frame 2 (caller's caller), if only ra and fp
> are stored in frame 1 (caller).
>
>           |       ...       | <-+
>           +-----------------+   |
>           | return address  |   |
>   $fp --> | previous fp     |   |
>           +-----------------+   |
>           | return address  |   |
>           | previous fp --------+
>           | saved registers |
>   $sp --> | local variables |
>           +-----------------+
>
> This could be a *bug* of gcc that should be fixed.

Yes, it would be nice to get this fixed.  The riscv target does not
override DYNAMIC_CHAIN_ADDRESS, thus the default is used, which has the
noted effect.
Jessica Clarke Jan. 17, 2022, 5:33 p.m. UTC | #2
On 17 Jan 2022, at 15:44, Changbin Du <changbin.du@gmail.com> wrote:
> 
> I tried different pieces of code which uses __builtin_frame_address(1)
> (with both gcc version 7.5.0 and 10.3.0) to verify whether it works as
> expected on riscv64. The result is negative.
> 
> What the compiler had generated is as below:
> 31                      fp = (unsigned long)__builtin_frame_address(1);
>   0xffffffff80006024 <+200>:   ld      s1,0(s0)
> 
> It takes '0(s0)' as the address of frame 1 (caller), but the actual address
> should be '-16(s0)'.
> 
>          |       ...       | <-+
>          +-----------------+   |
>          | return address  |   |
>          | previous fp     |   |
>          | saved registers |   |
>          | local variables |   |
>  $fp --> |       ...       |   |
>          +-----------------+   |
>          | return address  |   |
>          | previous fp --------+
>          | saved registers |
>  $sp --> | local variables |
>          +-----------------+
> 
> This leads the kernel can not dump the full stack trace on riscv.
> 
> [    7.222126][    T1] Call Trace:
> [    7.222804][    T1] [<ffffffff80006058>] dump_backtrace+0x2c/0x3a
> 
> This problem is not exposed on most riscv builds just because the '0(s0)'
> occasionally is the address frame 2 (caller's caller), if only ra and fp
> are stored in frame 1 (caller).
> 
>          |       ...       | <-+
>          +-----------------+   |
>          | return address  |   |
>  $fp --> | previous fp     |   |
>          +-----------------+   |
>          | return address  |   |
>          | previous fp --------+
>          | saved registers |
>  $sp --> | local variables |
>          +-----------------+
> 
> This could be a *bug* of gcc that should be fixed. But as noted in gcc
> manual "Calling this function with a nonzero argument can have
> unpredictable effects, including crashing the calling program.", let's
> remove the '__builtin_frame_address(1)' in backtrace code.

Yes, this is a bug, that is always wrong. LLVM gets this right.

https://godbolt.org/z/MrhsoPPM6

Jess
Andreas Schwab Jan. 19, 2022, 10:58 a.m. UTC | #3
On Jan 17 2022, Jessica Clarke wrote:

> Yes, this is a bug, that is always wrong. LLVM gets this right.

Is that an ABI requirement?  In case of a leaf function, gcc saves the
caller's frame pointer in the first slot, not the second (it doesn't
save the return address).
Jessica Clarke Jan. 19, 2022, 7:05 p.m. UTC | #4
On 19 Jan 2022, at 10:58, Andreas Schwab <schwab@linux-m68k.org> wrote:
> 
> On Jan 17 2022, Jessica Clarke wrote:
> 
>> Yes, this is a bug, that is always wrong. LLVM gets this right.
> 
> Is that an ABI requirement?  In case of a leaf function, gcc saves the
> caller's frame pointer in the first slot, not the second (it doesn't
> save the return address).

Leaf functions by definition don’t have callees that are trying to read
their frame pointer so aren’t relevant here. The stack frame layout
isn’t specified by the ABI, only that the in-memory outgoing arguments
be at the bottom when calling other functions. However, GCC knows what
layout it uses, so it should be consistent and follow that layout for
walking back up frames. Especially for __builtin_frame_address(1), that
just pertains to the current function’s frame, which it clearly knows
without a doubt, so there’s no reason to get that wrong. Accessing
0(s0) is just straight up wrong, that’s accessing past the top of the
stack frame, which is never going to make any sense.

Jess
Andreas Schwab Jan. 19, 2022, 8:44 p.m. UTC | #5
On Jan 19 2022, Jessica Clarke wrote:

> Leaf functions by definition don’t have callees that are trying to read
> their frame pointer so aren’t relevant here.

??? __builtin_frame_address(1) is about the caller, not the callee.
Jessica Clarke Jan. 19, 2022, 8:48 p.m. UTC | #6
On 19 Jan 2022, at 20:44, Andreas Schwab <schwab@linux-m68k.org> wrote:
> 
> On Jan 19 2022, Jessica Clarke wrote:
> 
>> Leaf functions by definition don’t have callees that are trying to read
>> their frame pointer so aren’t relevant here.
> 
> ??? __builtin_frame_address(1) is about the caller, not the callee.

My point is that the only thing that can possibly read the incoming
frame pointer of a leaf function is the leaf function itself, and since
it knows where it’s putting it then there is no ABI issue, it just
remembers where it put it and loads it from there. The issue of whether
it’s part of the ABI or not only arises when you’re trying to read the
incoming frame pointer from a caller, which by definition is not a leaf
function.

Jess
Andreas Schwab Jan. 19, 2022, 9:07 p.m. UTC | #7
On Jan 19 2022, Jessica Clarke wrote:

> My point is that the only thing that can possibly read the incoming
> frame pointer of a leaf function is the leaf function itself, and since
> it knows where it’s putting it then there is no ABI issue, it just
> remembers where it put it and loads it from there.

llvm sidesteps that issue by always saving ra when creating a frame,
even in a leaf function, so it can use a constant offset.
Jessica Clarke Jan. 19, 2022, 9:27 p.m. UTC | #8
On 19 Jan 2022, at 21:07, Andreas Schwab <schwab@linux-m68k.org> wrote:
> 
> On Jan 19 2022, Jessica Clarke wrote:
> 
>> My point is that the only thing that can possibly read the incoming
>> frame pointer of a leaf function is the leaf function itself, and since
>> it knows where it’s putting it then there is no ABI issue, it just
>> remembers where it put it and loads it from there.
> 
> llvm sidesteps that issue by always saving ra when creating a frame,
> even in a leaf function, so it can use a constant offset.

What’s your point? That’s a correct implementation, just a simple one.
If it wanted to RISCVFrameLowering::spillCalleeSavedRegisters could
easily save that information, or recompute it when trying to load s0,
it just doesn’t because there’s no need. Also saving s0 alongside ra is
deliberate, it aids debugging when calling noreturn functions (e.g.
panic in an OS kernel). So yes, we avoid complexity in LLVM by not
doing things we don’t need to; there’s nothing wrong with that and it
doesn’t mean other toolchains that do need that to be correct should
just do something wrong.

Jess
Andreas Schwab Jan. 19, 2022, 11:53 p.m. UTC | #9
On Jan 19 2022, Jessica Clarke wrote:

> What’s your point?

LLVM doesn't have to deal with the extra complexity.

> doesn’t mean other toolchains that do need that to be correct should
> just do something wrong.

__builtin_frame_address with count > 0 is considered bad.  Nobody should
use it.

You don't have to be arrogant.
Palmer Dabbelt Jan. 20, 2022, 12:15 a.m. UTC | #10
On Wed, 19 Jan 2022 15:53:07 PST (-0800), schwab@linux-m68k.org wrote:
> On Jan 19 2022, Jessica Clarke wrote:
>
>> What’s your point?
>
> LLVM doesn't have to deal with the extra complexity.
>
>> doesn’t mean other toolchains that do need that to be correct should
>> just do something wrong.
>
> __builtin_frame_address with count > 0 is considered bad.  Nobody should
> use it.

The documentation is very clear about this.

I don't really see anything to argue about here: our code violates the 
spec and is producing results we don't like, though the spec allows for 
much worse.  We shouldn't have had that code in the first place, but it 
slipped through as these things sometimes do.  This is just a regular 
old bug that deserves to be fixed.  Just because one compiler produces 
answers we like doesn't mean it's valid code, that's the whole point of 
having a spec in the first place.

> You don't have to be arrogant.

This has been a persistent problem, it's really just not productive.  
We're still trying to dig out from the last two rounds of silliness, 
let's not have another one.

I don't see anything wrong with the patch in question, but these "stack 
trace without debug info" things are always tricky and thus warrant a 
proper look.  I'm in the middle of juggling some patches right now, I'll 
try to take a look but it's fairly far down the queue.

Always happy to have help looking these things over, let's try to keep 
things constructive, though.  We've already got enough work to do.
Palmer Dabbelt Feb. 4, 2022, 9:56 p.m. UTC | #11
On Mon, 17 Jan 2022 07:44:33 PST (-0800), changbin.du@gmail.com wrote:
> I tried different pieces of code which uses __builtin_frame_address(1)
> (with both gcc version 7.5.0 and 10.3.0) to verify whether it works as
> expected on riscv64. The result is negative.
>
> What the compiler had generated is as below:
> 31                      fp = (unsigned long)__builtin_frame_address(1);
>    0xffffffff80006024 <+200>:   ld      s1,0(s0)
>
> It takes '0(s0)' as the address of frame 1 (caller), but the actual address
> should be '-16(s0)'.
>
>           |       ...       | <-+
>           +-----------------+   |
>           | return address  |   |
>           | previous fp     |   |
>           | saved registers |   |
>           | local variables |   |
>   $fp --> |       ...       |   |
>           +-----------------+   |
>           | return address  |   |
>           | previous fp --------+
>           | saved registers |
>   $sp --> | local variables |
>           +-----------------+
>
> This leads the kernel can not dump the full stack trace on riscv.
>
> [    7.222126][    T1] Call Trace:
> [    7.222804][    T1] [<ffffffff80006058>] dump_backtrace+0x2c/0x3a
>
> This problem is not exposed on most riscv builds just because the '0(s0)'
> occasionally is the address frame 2 (caller's caller), if only ra and fp
> are stored in frame 1 (caller).
>
>           |       ...       | <-+
>           +-----------------+   |
>           | return address  |   |
>   $fp --> | previous fp     |   |
>           +-----------------+   |
>           | return address  |   |
>           | previous fp --------+
>           | saved registers |
>   $sp --> | local variables |
>           +-----------------+
>
> This could be a *bug* of gcc that should be fixed. But as noted in gcc
> manual "Calling this function with a nonzero argument can have
> unpredictable effects, including crashing the calling program.", let's
> remove the '__builtin_frame_address(1)' in backtrace code.
>
> With this fix now it can show full stack trace:
> [   10.444838][    T1] Call Trace:
> [   10.446199][    T1] [<ffffffff8000606c>] dump_backtrace+0x2c/0x3a
> [   10.447711][    T1] [<ffffffff800060ac>] show_stack+0x32/0x3e
> [   10.448710][    T1] [<ffffffff80a005c0>] dump_stack_lvl+0x58/0x7a
> [   10.449941][    T1] [<ffffffff80a005f6>] dump_stack+0x14/0x1c
> [   10.450929][    T1] [<ffffffff804c04ee>] ubsan_epilogue+0x10/0x5a
> [   10.451869][    T1] [<ffffffff804c092e>] __ubsan_handle_load_invalid_value+0x6c/0x78
> [   10.453049][    T1] [<ffffffff8018f834>] __pagevec_release+0x62/0x64
> [   10.455476][    T1] [<ffffffff80190830>] truncate_inode_pages_range+0x132/0x5be
> [   10.456798][    T1] [<ffffffff80190ce0>] truncate_inode_pages+0x24/0x30
> [   10.457853][    T1] [<ffffffff8045bb04>] kill_bdev+0x32/0x3c
> ...
>
> Signed-off-by: Changbin Du <changbin.du@gmail.com>
> ---
>  arch/riscv/kernel/stacktrace.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/arch/riscv/kernel/stacktrace.c b/arch/riscv/kernel/stacktrace.c
> index 201ee206fb57..14d2b53ec322 100644
> --- a/arch/riscv/kernel/stacktrace.c
> +++ b/arch/riscv/kernel/stacktrace.c
> @@ -22,15 +22,16 @@ void notrace walk_stackframe(struct task_struct *task, struct pt_regs *regs,
>  			     bool (*fn)(void *, unsigned long), void *arg)
>  {
>  	unsigned long fp, sp, pc;
> +	int level = 0;
>
>  	if (regs) {
>  		fp = frame_pointer(regs);
>  		sp = user_stack_pointer(regs);
>  		pc = instruction_pointer(regs);
>  	} else if (task == NULL || task == current) {
> -		fp = (unsigned long)__builtin_frame_address(1);
> -		sp = (unsigned long)__builtin_frame_address(0);
> -		pc = (unsigned long)__builtin_return_address(0);
> +		fp = (unsigned long)__builtin_frame_address(0);
> +		sp = sp_in_global;
> +		pc = (unsigned long)walk_stackframe;
>  	} else {
>  		/* task blocked in __switch_to */
>  		fp = task->thread.s[0];
> @@ -42,7 +43,7 @@ void notrace walk_stackframe(struct task_struct *task, struct pt_regs *regs,
>  		unsigned long low, high;
>  		struct stackframe *frame;
>
> -		if (unlikely(!__kernel_text_address(pc) || !fn(arg, pc)))
> +		if (unlikely(!__kernel_text_address(pc) || (level++ >= 1 && !fn(arg, pc))))
>  			break;
>
>  		/* Validate frame pointer */

Thanks, this is on fixes.
diff mbox series

Patch

diff --git a/arch/riscv/kernel/stacktrace.c b/arch/riscv/kernel/stacktrace.c
index 201ee206fb57..14d2b53ec322 100644
--- a/arch/riscv/kernel/stacktrace.c
+++ b/arch/riscv/kernel/stacktrace.c
@@ -22,15 +22,16 @@  void notrace walk_stackframe(struct task_struct *task, struct pt_regs *regs,
 			     bool (*fn)(void *, unsigned long), void *arg)
 {
 	unsigned long fp, sp, pc;
+	int level = 0;
 
 	if (regs) {
 		fp = frame_pointer(regs);
 		sp = user_stack_pointer(regs);
 		pc = instruction_pointer(regs);
 	} else if (task == NULL || task == current) {
-		fp = (unsigned long)__builtin_frame_address(1);
-		sp = (unsigned long)__builtin_frame_address(0);
-		pc = (unsigned long)__builtin_return_address(0);
+		fp = (unsigned long)__builtin_frame_address(0);
+		sp = sp_in_global;
+		pc = (unsigned long)walk_stackframe;
 	} else {
 		/* task blocked in __switch_to */
 		fp = task->thread.s[0];
@@ -42,7 +43,7 @@  void notrace walk_stackframe(struct task_struct *task, struct pt_regs *regs,
 		unsigned long low, high;
 		struct stackframe *frame;
 
-		if (unlikely(!__kernel_text_address(pc) || !fn(arg, pc)))
+		if (unlikely(!__kernel_text_address(pc) || (level++ >= 1 && !fn(arg, pc))))
 			break;
 
 		/* Validate frame pointer */