[3/3] arm64: vdso: Fix CFI directives in sigreturn trampoline

Message ID 20200519121818.14511-4-will@kernel.org (mailing list archive)
State New, archived
Series arm64 sigreturn unwinding fixes

Commit Message

Will Deacon May 19, 2020, 12:18 p.m. UTC
Daniel reports that the .cfi_startproc is misplaced for the sigreturn
trampoline, which causes LLVM's unwinder to misbehave:

  | I run into this with LLVM’s unwinder.
  | This combination was always broken.

This prompted Dave to realise that our CFI directives are contradictory,
as we specify both .cfi_signal_frame *and* .cfi_def_cfa, with the latter
unconditionally identifying the interrupted context as opposed to the
values in the sigcontext.

Rework the CFI directives so that we only use .cfi_signal_frame, and
include the "mysterious NOP" as part of the .cfi_{start,end}proc block.

Cc: Tamas Zsoldos <tamas.zsoldos@arm.com>
Reported-by: Dave Martin <dave.martin@arm.com>
Reported-by: Daniel Kiss <daniel.kiss@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kernel/vdso/sigreturn.S | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

Comments

Dave Martin May 19, 2020, 1:09 p.m. UTC | #1
On Tue, May 19, 2020 at 01:18:18PM +0100, Will Deacon wrote:
> Daniel reports that the .cfi_startproc is misplaced for the sigreturn
> trampoline, which causes LLVM's unwinder to misbehave:
> 
>   | I run into this with LLVM’s unwinder.
>   | This combination was always broken.
> 
> This prompted Dave to realise that our CFI directives are contradictory,
> as we specify both .cfi_signal_frame *and* .cfi_def_cfa, with the latter
> unconditionally identifying the interrupted context as opposed to the
> values in the sigcontext.
> 
> Rework the CFI directives so that we only use .cfi_signal_frame, and
> include the "mysterious NOP" as part of the .cfi_{start,end}proc block.
> 
> Cc: Tamas Zsoldos <tamas.zsoldos@arm.com>
> Reported-by: Dave Martin <dave.martin@arm.com>
> Reported-by: Daniel Kiss <daniel.kiss@arm.com>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>  arch/arm64/kernel/vdso/sigreturn.S | 8 +++-----
>  1 file changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/kernel/vdso/sigreturn.S b/arch/arm64/kernel/vdso/sigreturn.S
> index 7853fa9692f6..28b33f7d0604 100644
> --- a/arch/arm64/kernel/vdso/sigreturn.S
> +++ b/arch/arm64/kernel/vdso/sigreturn.S
> @@ -14,6 +14,9 @@
>  
>  	.text
>  
> +/* Ensure that the mysterious NOP can be associated with a function. */
> +	.cfi_startproc
> +	.cfi_signal_frame
>  /*
>   * This mysterious NOP is required for some unwinders that subtract one from
>   * the return address in order to identify the calling function.
> @@ -28,11 +31,6 @@
>   * is perfectly fine.
>   */
>  SYM_START(__kernel_rt_sigreturn, SYM_L_GLOBAL, SYM_A_ALIGN)
> -	.cfi_startproc
> -	.cfi_signal_frame
> -	.cfi_def_cfa	x29, 0
> -	.cfi_offset	x29, 0 * 8
> -	.cfi_offset	x30, 1 * 8

Having thought about this again, I think it might be better to stick to
the original version.

If the signal handler is halfway through mungeing the sigcontext then
backtracing using sigcontext won't be reliable.

In any case, if something in the interrupted code caused the signal, the
backtrace of the old stack is likely to be more useful, and that's what
x29 will give us.  If there's no old stack because we blew it away,
that's too bad.


Plus, in the absence of any spec that says exactly what
.cfi_signal_frame means*, we probably don't want to rock the boat.

Cheers
---Dave


[*] assumption, but the arch ABI and DWARF specs are unlikely to cover
this, and Linux doesn't go in for specs.
Will Deacon May 19, 2020, 1:39 p.m. UTC | #2
On Tue, May 19, 2020 at 02:09:31PM +0100, Dave P Martin wrote:
> On Tue, May 19, 2020 at 01:18:18PM +0100, Will Deacon wrote:
> > Daniel reports that the .cfi_startproc is misplaced for the sigreturn
> > trampoline, which causes LLVM's unwinder to misbehave:
> > 
> >   | I run into this with LLVM’s unwinder.
> >   | This combination was always broken.
> > 
> > This prompted Dave to realise that our CFI directives are contradictory,
> > as we specify both .cfi_signal_frame *and* .cfi_def_cfa, with the latter
> > unconditionally identifying the interrupted context as opposed to the
> > values in the sigcontext.
> > 
> > Rework the CFI directives so that we only use .cfi_signal_frame, and
> > include the "mysterious NOP" as part of the .cfi_{start,end}proc block.
> > 
> > Cc: Tamas Zsoldos <tamas.zsoldos@arm.com>
> > Reported-by: Dave Martin <dave.martin@arm.com>
> > Reported-by: Daniel Kiss <daniel.kiss@arm.com>
> > Signed-off-by: Will Deacon <will@kernel.org>
> > ---
> >  arch/arm64/kernel/vdso/sigreturn.S | 8 +++-----
> >  1 file changed, 3 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/arm64/kernel/vdso/sigreturn.S b/arch/arm64/kernel/vdso/sigreturn.S
> > index 7853fa9692f6..28b33f7d0604 100644
> > --- a/arch/arm64/kernel/vdso/sigreturn.S
> > +++ b/arch/arm64/kernel/vdso/sigreturn.S
> > @@ -14,6 +14,9 @@
> >  
> >  	.text
> >  
> > +/* Ensure that the mysterious NOP can be associated with a function. */
> > +	.cfi_startproc
> > +	.cfi_signal_frame
> >  /*
> >   * This mysterious NOP is required for some unwinders that subtract one from
> >   * the return address in order to identify the calling function.
> > @@ -28,11 +31,6 @@
> >   * is perfectly fine.
> >   */
> >  SYM_START(__kernel_rt_sigreturn, SYM_L_GLOBAL, SYM_A_ALIGN)
> > -	.cfi_startproc
> > -	.cfi_signal_frame
> > -	.cfi_def_cfa	x29, 0
> > -	.cfi_offset	x29, 0 * 8
> > -	.cfi_offset	x30, 1 * 8
> 
> Having thought about this again, I think it might be better to stick to
> the original version.
> 
> If the signal handler is halfway through mungeing the sigcontext then
> backtracing using sigcontext won't be reliable.

I suppose, but then what does .cfi_signal_frame do? I'll see if I can
find something that uses it. The frame record is still sitting on the
stack, so it does feel redundant to say both '.cfi_signal_frame' and
'.cfi_def_cfa' (and other architectures, e.g. riscv, don't do this).

But I'm also happy to play it safe if I can stick a comment in here
saying what it does.

> Plus, in the absence of any spec that says exactly what
> .cfi_signal_frame means*, we probably don't want to rock the boat.

The gas docs say:

	"Mark current function as signal trampoline."

which is really informative.

Will
Dave Martin May 19, 2020, 1:55 p.m. UTC | #3
On Tue, May 19, 2020 at 02:39:41PM +0100, Will Deacon wrote:
> On Tue, May 19, 2020 at 02:09:31PM +0100, Dave P Martin wrote:
> > On Tue, May 19, 2020 at 01:18:18PM +0100, Will Deacon wrote:
> > > Daniel reports that the .cfi_startproc is misplaced for the sigreturn
> > > trampoline, which causes LLVM's unwinder to misbehave:
> > > 
> > >   | I run into this with LLVM’s unwinder.
> > >   | This combination was always broken.
> > > 
> > > This prompted Dave to realise that our CFI directives are contradictory,
> > > as we specify both .cfi_signal_frame *and* .cfi_def_cfa, with the latter
> > > unconditionally identifying the interrupted context as opposed to the
> > > values in the sigcontext.
> > > 
> > > Rework the CFI directives so that we only use .cfi_signal_frame, and
> > > include the "mysterious NOP" as part of the .cfi_{start,end}proc block.
> > > 
> > > Cc: Tamas Zsoldos <tamas.zsoldos@arm.com>
> > > Reported-by: Dave Martin <dave.martin@arm.com>
> > > Reported-by: Daniel Kiss <daniel.kiss@arm.com>
> > > Signed-off-by: Will Deacon <will@kernel.org>
> > > ---
> > >  arch/arm64/kernel/vdso/sigreturn.S | 8 +++-----
> > >  1 file changed, 3 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/arch/arm64/kernel/vdso/sigreturn.S b/arch/arm64/kernel/vdso/sigreturn.S
> > > index 7853fa9692f6..28b33f7d0604 100644
> > > --- a/arch/arm64/kernel/vdso/sigreturn.S
> > > +++ b/arch/arm64/kernel/vdso/sigreturn.S
> > > @@ -14,6 +14,9 @@
> > >  
> > >  	.text
> > >  
> > > +/* Ensure that the mysterious NOP can be associated with a function. */
> > > +	.cfi_startproc
> > > +	.cfi_signal_frame
> > >  /*
> > >   * This mysterious NOP is required for some unwinders that subtract one from
> > >   * the return address in order to identify the calling function.
> > > @@ -28,11 +31,6 @@
> > >   * is perfectly fine.
> > >   */
> > >  SYM_START(__kernel_rt_sigreturn, SYM_L_GLOBAL, SYM_A_ALIGN)
> > > -	.cfi_startproc
> > > -	.cfi_signal_frame
> > > -	.cfi_def_cfa	x29, 0
> > > -	.cfi_offset	x29, 0 * 8
> > > -	.cfi_offset	x30, 1 * 8
> > 
> > Having thought about this again, I think it might be better to stick to
> > the original version.
> > 
> > If the signal handler is halfway through mungeing the sigcontext then
> > backtracing using sigcontext won't be reliable.
> 
> I suppose, but then what does .cfi_signal_frame do? I'll see if I can
> find something that uses it. The frame record is still sitting on the
> stack, so it does feel redundant to say both '.cfi_signal_frame' and
> '.cfi_def_cfa' (and other architectures, e.g. riscv don't do this).
> 
> But I'm also happy to play it safe if I can stick a comment in here
> saying what it does.
> 
> > Plus, in the absence of any spec that says exactly what
> > .cfi_signal_frame means*, we probably don't want to rock the boat.
> 
> The gas docs say:
> 
> 	"Mark current function as signal trampoline."
> 
> which is really informative.

Well, we've demonstrated that identifying the signal frame is a gross
bodge.  The cfi annotation should provide a reliable way to identify the
signal frame, but I guess it was too poorly specified or came too late
to prevent the bodges from spreading.

Since this seems to be a nonstandard invention, I wouldn't hold out
much hope of finding a usable spec.

Of course, something might be using it now, so I guess we have to leave
it.

---Dave
Will Deacon May 19, 2020, 3:24 p.m. UTC | #4
On Tue, May 19, 2020 at 02:55:37PM +0100, Dave Martin wrote:
> On Tue, May 19, 2020 at 02:39:41PM +0100, Will Deacon wrote:
> > The gas docs say:
> > 
> > 	"Mark current function as signal trampoline."
> > 
> > which is really informative.
> 
> Well, we've demonstrated that identifying the signal frame is a gross
> bodge.  The cfi annotation should provide a reliable way to identify the
> signal frame, but I guess it was too poorly specified or came too late
> to prevent the bodges from spreading.
> 
> Since this seems to be a nonstandard invention, I wouldn't hold out
> much hope of finding a usable spec.
> 
> Of course, something might be using it now, so I guess we have to leave
> it.

I had a quick look at libstdc++ (the horror!) and it looks like the DWARF
backend in there /does/ make use of this information as part of the
_Unwind_GetIPInfo() function:

https://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/baselib--unwind-getipinfo.html

*ip_before_insn is set to 1 or 0 depending on whether or not the PC
corresponds to a function annotated with .cfi_signal_frame. So I think
the code in libstdc++-v3/libsupc++/eh_personality.cc doesn't need the
mysterious NOP at all!

Unfortunately, it looks like the LLVM libc++ doesn't use this, and instead
calls _Unwind_GetIP():

https://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/baselib--unwind-getip.html

and unconditionally subtracts 1 in libcxxabi/src/cxa_personality.cpp,
meaning that the NOP is necessary.

So, after giving myself a splitting headache, it looks like:

  1. We need the mysterious NOP for LLVM
  2. We could drop .cfi_signal_frame but it's harmless to keep it
  3. We need the .cfi_def_cfa directive to locate the frame record, as
     .cfi_signal_frame doesn't do very much at all.

Make sense? If so, I'll spin a v2 of the patches along with a comment
trying to summarise some of this.

Cheers,

Will
Daniel Kiss May 19, 2020, 3:30 p.m. UTC | #5
> On 19 May 2020, at 15:55, Dave Martin <Dave.Martin@arm.com> wrote:
> 
> On Tue, May 19, 2020 at 02:39:41PM +0100, Will Deacon wrote:
>> On Tue, May 19, 2020 at 02:09:31PM +0100, Dave P Martin wrote:
>>> On Tue, May 19, 2020 at 01:18:18PM +0100, Will Deacon wrote:
>>>> Daniel reports that the .cfi_startproc is misplaced for the sigreturn
>>>> trampoline, which causes LLVM's unwinder to misbehave:
>>>> 
>>>>  | I run into this with LLVM’s unwinder.
>>>>  | This combination was always broken.
>>>> 
>>>> This prompted Dave to realise that our CFI directives are contradictory,
>>>> as we specify both .cfi_signal_frame *and* .cfi_def_cfa, with the latter
>>>> unconditionally identifying the interrupted context as opposed to the
>>>> values in the sigcontext.
>>>> 
>>>> Rework the CFI directives so that we only use .cfi_signal_frame, and
>>>> include the "mysterious NOP" as part of the .cfi_{start,end}proc block.
>>>> 
>>>> Cc: Tamas Zsoldos <tamas.zsoldos@arm.com>
>>>> Reported-by: Dave Martin <dave.martin@arm.com>
>>>> Reported-by: Daniel Kiss <daniel.kiss@arm.com>
>>>> Signed-off-by: Will Deacon <will@kernel.org>
>>>> ---
>>>> arch/arm64/kernel/vdso/sigreturn.S | 8 +++-----
>>>> 1 file changed, 3 insertions(+), 5 deletions(-)
>>>> 
>>>> diff --git a/arch/arm64/kernel/vdso/sigreturn.S b/arch/arm64/kernel/vdso/sigreturn.S
>>>> index 7853fa9692f6..28b33f7d0604 100644
>>>> --- a/arch/arm64/kernel/vdso/sigreturn.S
>>>> +++ b/arch/arm64/kernel/vdso/sigreturn.S
>>>> @@ -14,6 +14,9 @@
>>>> 
>>>> 	.text
>>>> 
>>>> +/* Ensure that the mysterious NOP can be associated with a function. */
>>>> +	.cfi_startproc
>>>> +	.cfi_signal_frame
>>>> /*
>>>>  * This mysterious NOP is required for some unwinders that subtract one from
>>>>  * the return address in order to identify the calling function.
>>>> @@ -28,11 +31,6 @@
>>>>  * is perfectly fine.
>>>>  */
>>>> SYM_START(__kernel_rt_sigreturn, SYM_L_GLOBAL, SYM_A_ALIGN)
>>>> -	.cfi_startproc
>>>> -	.cfi_signal_frame
>>>> -	.cfi_def_cfa	x29, 0
>>>> -	.cfi_offset	x29, 0 * 8
>>>> -	.cfi_offset	x30, 1 * 8
LLVM’s unwinder does not like this version of the CFI. It needs a bit more
information, because .cfi_signal_frame is not used for finding the frame.

>>> 
>>> Having thought about this again, I think it might be better to stick to
>>> the original version.
>>> 
>>> If the signal handler is halfway through mungeing the sigcontext then
>>> backtracing using sigcontext won't be reliable.
>> 
>> I suppose, but then what does .cfi_signal_frame do? I'll see if I can
>> find something that uses it. The frame record is still sitting on the
>> stack, so it does feel redundant to say both '.cfi_signal_frame' and
>> '.cfi_def_cfa' (and other architectures, e.g. riscv don't do this).
>> 
>> But I'm also happy to play it safe if I can stick a comment in here
>> saying what it does.
Sounds good to me.

>> 
>>> Plus, in the absence of any spec that says exactly what
>>> .cfi_signal_frame means*, we probably don't want to rock the boat.
>> 
>> The gas docs say:
>> 
>> 	"Mark current function as signal trampoline."
>> 
>> which is really informative.
> 
> Well, we've demonstrated that identifying the signal frame is a gross
> bodge.  The cfi annotation should provide a reliable way to identify the
> signal frame, but I guess it was too poorly specified or came too late
> to prevent the bodges from spreading.
> 
> Since this seems to be a nonstandard invention, I wouldn't hold out
> much hope of finding a usable spec.
> 
> Of course, something might be using it now, so I guess we have to leave
> it.
> 
> ---Dave
Will Deacon May 19, 2020, 3:55 p.m. UTC | #6
On Tue, May 19, 2020 at 03:30:57PM +0000, Daniel Kiss wrote:
> > On 19 May 2020, at 15:55, Dave Martin <Dave.Martin@arm.com> wrote:
> > On Tue, May 19, 2020 at 02:39:41PM +0100, Will Deacon wrote:
> >> On Tue, May 19, 2020 at 02:09:31PM +0100, Dave P Martin wrote:
> >>> On Tue, May 19, 2020 at 01:18:18PM +0100, Will Deacon wrote:
> >>>> Daniel reports that the .cfi_startproc is misplaced for the sigreturn
> >>>> trampoline, which causes LLVM's unwinder to misbehave:
> >>>> 
> >>>>  | I run into this with LLVM’s unwinder.
> >>>>  | This combination was always broken.
> >>>> 
> >>>> This prompted Dave to realise that our CFI directives are contradictory,
> >>>> as we specify both .cfi_signal_frame *and* .cfi_def_cfa, with the latter
> >>>> unconditionally identifying the interrupted context as opposed to the
> >>>> values in the sigcontext.
> >>>> 
> >>>> Rework the CFI directives so that we only use .cfi_signal_frame, and
> >>>> include the "mysterious NOP" as part of the .cfi_{start,end}proc block.
> >>>> 
> >>>> Cc: Tamas Zsoldos <tamas.zsoldos@arm.com>
> >>>> Reported-by: Dave Martin <dave.martin@arm.com>
> >>>> Reported-by: Daniel Kiss <daniel.kiss@arm.com>
> >>>> Signed-off-by: Will Deacon <will@kernel.org>
> >>>> ---
> >>>> arch/arm64/kernel/vdso/sigreturn.S | 8 +++-----
> >>>> 1 file changed, 3 insertions(+), 5 deletions(-)
> >>>> 
> >>>> diff --git a/arch/arm64/kernel/vdso/sigreturn.S b/arch/arm64/kernel/vdso/sigreturn.S
> >>>> index 7853fa9692f6..28b33f7d0604 100644
> >>>> --- a/arch/arm64/kernel/vdso/sigreturn.S
> >>>> +++ b/arch/arm64/kernel/vdso/sigreturn.S
> >>>> @@ -14,6 +14,9 @@
> >>>> 
> >>>> 	.text
> >>>> 
> >>>> +/* Ensure that the mysterious NOP can be associated with a function. */
> >>>> +	.cfi_startproc
> >>>> +	.cfi_signal_frame
> >>>> /*
> >>>>  * This mysterious NOP is required for some unwinders that subtract one from
> >>>>  * the return address in order to identify the calling function.
> >>>> @@ -28,11 +31,6 @@
> >>>>  * is perfectly fine.
> >>>>  */
> >>>> SYM_START(__kernel_rt_sigreturn, SYM_L_GLOBAL, SYM_A_ALIGN)
> >>>> -	.cfi_startproc
> >>>> -	.cfi_signal_frame
> >>>> -	.cfi_def_cfa	x29, 0
> >>>> -	.cfi_offset	x29, 0 * 8
> >>>> -	.cfi_offset	x30, 1 * 8
> LLVM’s unwinder does not like this version of the CFI. It needs a bit more information, 
> the cfi_signal_frame is not used for finding the frame.

Thanks, Daniel. That is, at least, aligned with my current understanding
of how this is supposed to work.

I'll send out a v2 in a bit.

Will
Patch

diff --git a/arch/arm64/kernel/vdso/sigreturn.S b/arch/arm64/kernel/vdso/sigreturn.S
index 7853fa9692f6..28b33f7d0604 100644
--- a/arch/arm64/kernel/vdso/sigreturn.S
+++ b/arch/arm64/kernel/vdso/sigreturn.S
@@ -14,6 +14,9 @@ 
 
 	.text
 
+/* Ensure that the mysterious NOP can be associated with a function. */
+	.cfi_startproc
+	.cfi_signal_frame
 /*
  * This mysterious NOP is required for some unwinders that subtract one from
  * the return address in order to identify the calling function.
@@ -28,11 +31,6 @@ 
  * is perfectly fine.
  */
 SYM_START(__kernel_rt_sigreturn, SYM_L_GLOBAL, SYM_A_ALIGN)
-	.cfi_startproc
-	.cfi_signal_frame
-	.cfi_def_cfa	x29, 0
-	.cfi_offset	x29, 0 * 8
-	.cfi_offset	x30, 1 * 8
 	mov	x8, #__NR_rt_sigreturn
 	svc	#0
 	.cfi_endproc