mbox series

[RFC,00/17] objtool: add base support for arm64

Message ID 20210120173800.1660730-1-jthierry@redhat.com (mailing list archive)
Headers show
Series objtool: add base support for arm64 | expand

Message

Julien Thierry Jan. 20, 2021, 5:37 p.m. UTC
Hi,

This series enables objtool to start doing stack validation on arm64
kernel builds. It relies on the previous series I sent, refactoring
the arm64 decoder [1].

First, the aarch64 instruction decoder needed to be made available
to code under tools/. This is done in a similar manner to x86
instruction decoded. One limitation I encountered there is that most
of aarch64 instruction decoder is __kprobe annotated. To bypass that
it remove the kprobe include and had to add an empty __kprobe
definition, but I'd welcome a proper solution to that.

Then instruction semantics are progressively added so objtool can track
the stack state through the execution flow.
There are a few things that needed consideration:
- Generation of constants within executable sections, these either
  caused objtool to fail decoding or to wrongly decode constants
  as jumps or other instructions affecting execution flow and
  causing confusion. To solve this, tracking locations referenced
  by instructions using literals was needed.
- Jump tables from switch statements in aarch64 don't have enough
  information to link branches with the branch instruction leading to
  them. For this, we use a gcc plugin to add some information to establish
  those missing links in a format that is already supported by objtool

With this, there are still some errors when building with objtool. A
number of cleanups/annotations are needed on the arm64, as well as
handling SYM_DATA objects in objtool.

Those changes can be found on top of this branch here:
git clone https://github.com/julien-thierry/linux.git -b objtoolxarm64-latest

But it would be nice to have some feedback on this before I start submitting everyting.

[1] https://lkml.org/lkml/2021/1/20/791

Thanks,

Julien

-->

Julien Thierry (15):
  tools: Add some generic functions and headers
  tools: arm64: Make aarch64 instruction decoder available to tools
  tools: bug: Remove duplicate definition
  objtool: arm64: Add base definition for arm64 backend
  objtool: arm64: Decode add/sub instructions
  objtool: arm64: Decode jump and call related instructions
  objtool: arm64: Decode other system instructions
  objtool: arm64: Decode load/store instructions
  objtool: arm64: Decode LDR instructions
  objtool: arm64: Accept padding in code sections
  efi: libstub: Ignore relocations for .discard sections
  objtool: arm64: Implement functions to add switch tables alternatives
  objtool: arm64: Cache section with switch table information
  objtool: arm64: Handle supported relocations in alternatives
  objtool: arm64: Ignore replacement section for alternative callback

Raphael Gault (2):
  gcc-plugins: objtool: Add plugin to detect switch table on arm64
  objtool: arm64: Enable stack validation for arm64

 arch/arm64/Kconfig                            |    2 +
 drivers/firmware/efi/libstub/Makefile         |    2 +-
 scripts/Makefile.gcc-plugins                  |    2 +
 scripts/gcc-plugins/Kconfig                   |    4 +
 .../arm64_switch_table_detection_plugin.c     |   85 +
 tools/arch/arm64/include/asm/aarch64-insn.h   |  551 +++++++
 tools/arch/arm64/lib/aarch64-insn.c           | 1425 +++++++++++++++++
 tools/include/asm-generic/bitops/__ffs.h      |   11 +
 tools/include/linux/bug.h                     |    6 +-
 tools/include/linux/kernel.h                  |   21 +
 tools/include/linux/printk.h                  |   40 +
 tools/objtool/Makefile                        |    5 +
 tools/objtool/arch/arm64/Build                |    8 +
 tools/objtool/arch/arm64/decode.c             |  471 ++++++
 .../arch/arm64/include/arch/cfi_regs.h        |   14 +
 tools/objtool/arch/arm64/include/arch/elf.h   |    6 +
 .../arch/arm64/include/arch/endianness.h      |    9 +
 .../objtool/arch/arm64/include/arch/special.h |   23 +
 tools/objtool/arch/arm64/special.c            |  134 ++
 tools/objtool/arch/x86/decode.c               |    5 +
 tools/objtool/check.c                         |    6 +
 tools/objtool/include/objtool/arch.h          |    3 +
 tools/objtool/sync-check.sh                   |    5 +
 23 files changed, 2832 insertions(+), 6 deletions(-)
 create mode 100644 scripts/gcc-plugins/arm64_switch_table_detection_plugin.c
 create mode 100644 tools/arch/arm64/include/asm/aarch64-insn.h
 create mode 100644 tools/arch/arm64/lib/aarch64-insn.c
 create mode 100644 tools/include/linux/printk.h
 create mode 100644 tools/objtool/arch/arm64/Build
 create mode 100644 tools/objtool/arch/arm64/decode.c
 create mode 100644 tools/objtool/arch/arm64/include/arch/cfi_regs.h
 create mode 100644 tools/objtool/arch/arm64/include/arch/elf.h
 create mode 100644 tools/objtool/arch/arm64/include/arch/endianness.h
 create mode 100644 tools/objtool/arch/arm64/include/arch/special.h
 create mode 100644 tools/objtool/arch/arm64/special.c

--
2.25.4

Comments

Ard Biesheuvel Jan. 21, 2021, 9:03 a.m. UTC | #1
Hello Julien,

On Wed, 20 Jan 2021 at 18:38, Julien Thierry <jthierry@redhat.com> wrote:
>
> Hi,
>
> This series enables objtool to start doing stack validation on arm64
> kernel builds.

Could we elaborate on this point, please? 'Stack validation' means
getting an accurate picture of all kernel code that will be executed
at some point in the future, due to the fact that there are stack
frames pointing to them. And this ability is essential in order to do
live patching safely?

If this is the goal, I wonder whether this is the right approach for
arm64 (or for any other architecture, for that matter)

Parsing/decoding the object code and even worse, relying on GCC
plugins to annotate some of the idioms as they are being generated, in
order to infer intent on the part of the compiler goes *way* beyond
what we should be comfortable with. The whole point of this exercise
is to guarantee that there are no false positives when it comes to
deciding whether the kernel is in a live patchable state, and I don't
see how we can ever provide such a guarantee when it is built on such
a fragile foundation.

If we want to ensure that the stack contents are always an accurate
reflection of the real call stack, we should work with the toolchain
folks to identify issues that may interfere with this, and implement
controls over these behaviors that we can decide to use in the build.
In the past, I have already proposed adding a 'kernel' code model to
the AArch64 compiler that guarantees certain things, such as adrp/add
for symbol references, and no GOT indirections for position
independent code. Inhibiting optimizations that may impact our ability
to infer the real call stack from the stack contents is something we
might add here as well.

Another thing that occurred to me is that inferring which kernel code
is actually live in terms of pending function returns could be
inferred much more easily from a shadow call stack, which is a thing
we already implement for Clang builds.

In summary, I would not be in favor of enabling objtool on arm64 at
all until we have exhausted other options for providing the
functionality that we need it for (given that objtool provides many
other things that only x86 cares about, IIUC)
Julien Thierry Jan. 21, 2021, 10:26 a.m. UTC | #2
Hi Ard,

On 1/21/21 10:03 AM, Ard Biesheuvel wrote:
> Hello Julien,
> 
> On Wed, 20 Jan 2021 at 18:38, Julien Thierry <jthierry@redhat.com> wrote:
>>
>> Hi,
>>
>> This series enables objtool to start doing stack validation on arm64
>> kernel builds.
> 
> Could we elaborate on this point, please? 'Stack validation' means
> getting an accurate picture of all kernel code that will be executed
> at some point in the future, due to the fact that there are stack
> frames pointing to them. And this ability is essential in order to do
> live patching safely?
> 
> If this is the goal, I wonder whether this is the right approach for
> arm64 (or for any other architecture, for that matter)
> 
> Parsing/decoding the object code and even worse, relying on GCC
> plugins to annotate some of the idioms as they are being generated, in
> order to infer intent on the part of the compiler goes *way* beyond
> what we should be comfortable with. The whole point of this exercise
> is to guarantee that there are no false positives when it comes to
> deciding whether the kernel is in a live patchable state, and I don't
> see how we can ever provide such a guarantee when it is built on such
> a fragile foundation.
> 
> If we want to ensure that the stack contents are always an accurate
> reflection of the real call stack, we should work with the toolchain
> folks to identify issues that may interfere with this, and implement
> controls over these behaviors that we can decide to use in the build.
> In the past, I have already proposed adding a 'kernel' code model to
> the AArch64 compiler that guarantees certain things, such as adrp/add
> for symbol references, and no GOT indirections for position
> independent code. Inhibiting optimizations that may impact our ability
> to infer the real call stack from the stack contents is something we
> might add here as well.
> 

I'm not familiar with toolcahin code models, but would this approach be 
able to validate assembly code (either inline or in assembly files?)

> Another thing that occurred to me is that inferring which kernel code
> is actually live in terms of pending function returns could be
> inferred much more easily from a shadow call stack, which is a thing
> we already implement for Clang builds.
> 

I was not familiar with the shadow call stack. If I understand correctly 
that would be a stack of return addresses of function currently on the 
call stack, is that correct?

That would indeed be a simpler approach, however I guess the 
instrumentation has a cost. Is the instrumentation also available with 
GCC? And is this instrumentation efficient enough to be suitable for 
production builds?

If we can rely on shadow call stack to implement the reliable unwinder, 
I guess this could be the way to go.

> In summary, I would not be in favor of enabling objtool on arm64 at
> all until we have exhausted other options for providing the
> functionality that we need it for (given that objtool provides many
> other things that only x86 cares about, IIUC)
> 
I understand the concern and appreciate the suggestion. I guess this 
does need some thorough discussions for the right approach.

Thanks,
Ard Biesheuvel Jan. 21, 2021, 11:08 a.m. UTC | #3
On Thu, 21 Jan 2021 at 11:26, Julien Thierry <jthierry@redhat.com> wrote:
>
> Hi Ard,
>
> On 1/21/21 10:03 AM, Ard Biesheuvel wrote:
> > Hello Julien,
> >
> > On Wed, 20 Jan 2021 at 18:38, Julien Thierry <jthierry@redhat.com> wrote:
> >>
> >> Hi,
> >>
> >> This series enables objtool to start doing stack validation on arm64
> >> kernel builds.
> >
> > Could we elaborate on this point, please? 'Stack validation' means
> > getting an accurate picture of all kernel code that will be executed
> > at some point in the future, due to the fact that there are stack
> > frames pointing to them. And this ability is essential in order to do
> > live patching safely?
> >
> > If this is the goal, I wonder whether this is the right approach for
> > arm64 (or for any other architecture, for that matter)
> >
> > Parsing/decoding the object code and even worse, relying on GCC
> > plugins to annotate some of the idioms as they are being generated, in
> > order to infer intent on the part of the compiler goes *way* beyond
> > what we should be comfortable with. The whole point of this exercise
> > is to guarantee that there are no false positives when it comes to
> > deciding whether the kernel is in a live patchable state, and I don't
> > see how we can ever provide such a guarantee when it is built on such
> > a fragile foundation.
> >
> > If we want to ensure that the stack contents are always an accurate
> > reflection of the real call stack, we should work with the toolchain
> > folks to identify issues that may interfere with this, and implement
> > controls over these behaviors that we can decide to use in the build.
> > In the past, I have already proposed adding a 'kernel' code model to
> > the AArch64 compiler that guarantees certain things, such as adrp/add
> > for symbol references, and no GOT indirections for position
> > independent code. Inhibiting optimizations that may impact our ability
> > to infer the real call stack from the stack contents is something we
> > might add here as well.
> >
>
> I'm not familiar with toolcahin code models, but would this approach be
> able to validate assembly code (either inline or in assembly files?)
>

No, it would not. But those files are part of the code base, and can
be reviewed and audited.

> > Another thing that occurred to me is that inferring which kernel code
> > is actually live in terms of pending function returns could be
> > inferred much more easily from a shadow call stack, which is a thing
> > we already implement for Clang builds.
> >
>
> I was not familiar with the shadow call stack. If I understand correctly
> that would be a stack of return addresses of function currently on the
> call stack, is that correct?
>
> That would indeed be a simpler approach, however I guess the
> instrumentation has a cost. Is the instrumentation also available with
> GCC? And is this instrumentation efficient enough to be suitable for
> production builds?
>

I am not aware of any plans to enable this in GCC, but the Clang
implementation is definitely intended for production use (it's a CFI
feature for ROP/JOP mitigation)

> If we can rely on shadow call stack to implement the reliable unwinder,
> I guess this could be the way to go.
>
> > In summary, I would not be in favor of enabling objtool on arm64 at
> > all until we have exhausted other options for providing the
> > functionality that we need it for (given that objtool provides many
> > other things that only x86 cares about, IIUC)
> >
> I understand the concern and appreciate the suggestion. I guess this
> does need some thorough discussions for the right approach.
>

Agreed.
Peter Zijlstra Jan. 21, 2021, 11:23 a.m. UTC | #4
On Thu, Jan 21, 2021 at 12:08:23PM +0100, Ard Biesheuvel wrote:
> On Thu, 21 Jan 2021 at 11:26, Julien Thierry <jthierry@redhat.com> wrote:

> > I'm not familiar with toolcahin code models, but would this approach be
> > able to validate assembly code (either inline or in assembly files?)
> >
> 
> No, it would not. But those files are part of the code base, and can
> be reviewed and audited.

x86 has a long history if failing at exactly that.
Ard Biesheuvel Jan. 21, 2021, 11:48 a.m. UTC | #5
On Thu, 21 Jan 2021 at 12:23, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Jan 21, 2021 at 12:08:23PM +0100, Ard Biesheuvel wrote:
> > On Thu, 21 Jan 2021 at 11:26, Julien Thierry <jthierry@redhat.com> wrote:
>
> > > I'm not familiar with toolcahin code models, but would this approach be
> > > able to validate assembly code (either inline or in assembly files?)
> > >
> >
> > No, it would not. But those files are part of the code base, and can
> > be reviewed and audited.
>
> x86 has a long history if failing at exactly that.

That's a fair point. But on the flip side, maintaining objtool does
not look like it has been a walk in the park either.

What i am especially concerned about is things like 3193c0836f20,
where we actually have to disable certain compiler optimizations
because they interfere with objtool's ability to understand the
resulting object code. Correctness and performance are challenging
enough as requirements for generated code.

Mind you, I am not saying it is not worth it *for x86*, where there is
a lot of other stuff going on. But on arm64, we don't care about ORC,
about -fomit-frame-pointer, about retpolines or about any of the other
things objtool enables.

On arm64, all it currently seems to provide is a way to capture the
call stack accurately, and given that it needs a GCC plugin for this
(which needs to be maintained as well, which is non-trivial, and also
bars us from using objtool with Clang builds), my current position is
simply that opening this can of worms at this point is just not worth
it.
Julien Thierry Jan. 21, 2021, 1:23 p.m. UTC | #6
On 1/21/21 12:08 PM, Ard Biesheuvel wrote:
> On Thu, 21 Jan 2021 at 11:26, Julien Thierry <jthierry@redhat.com> wrote:
>>
>> Hi Ard,
>>
>> On 1/21/21 10:03 AM, Ard Biesheuvel wrote:
>>> Hello Julien,
>>>
>>> On Wed, 20 Jan 2021 at 18:38, Julien Thierry <jthierry@redhat.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> This series enables objtool to start doing stack validation on arm64
>>>> kernel builds.
>>>
>>> Could we elaborate on this point, please? 'Stack validation' means
>>> getting an accurate picture of all kernel code that will be executed
>>> at some point in the future, due to the fact that there are stack
>>> frames pointing to them. And this ability is essential in order to do
>>> live patching safely?
>>>
>>> If this is the goal, I wonder whether this is the right approach for
>>> arm64 (or for any other architecture, for that matter)
>>>
>>> Parsing/decoding the object code and even worse, relying on GCC
>>> plugins to annotate some of the idioms as they are being generated, in
>>> order to infer intent on the part of the compiler goes *way* beyond
>>> what we should be comfortable with. The whole point of this exercise
>>> is to guarantee that there are no false positives when it comes to
>>> deciding whether the kernel is in a live patchable state, and I don't
>>> see how we can ever provide such a guarantee when it is built on such
>>> a fragile foundation.
>>>
>>> If we want to ensure that the stack contents are always an accurate
>>> reflection of the real call stack, we should work with the toolchain
>>> folks to identify issues that may interfere with this, and implement
>>> controls over these behaviors that we can decide to use in the build.
>>> In the past, I have already proposed adding a 'kernel' code model to
>>> the AArch64 compiler that guarantees certain things, such as adrp/add
>>> for symbol references, and no GOT indirections for position
>>> independent code. Inhibiting optimizations that may impact our ability
>>> to infer the real call stack from the stack contents is something we
>>> might add here as well.
>>>
>>
>> I'm not familiar with toolcahin code models, but would this approach be
>> able to validate assembly code (either inline or in assembly files?)
>>
> 
> No, it would not. But those files are part of the code base, and can
> be reviewed and audited.
> 

That means that every actor maintaining their own stable version of the 
kernel have to do their own audit when they do backports (assuming the 
audit would be done for upstream) to be able to provide a safe 
livepatching feature in their kernel.

>>> Another thing that occurred to me is that inferring which kernel code
>>> is actually live in terms of pending function returns could be
>>> inferred much more easily from a shadow call stack, which is a thing
>>> we already implement for Clang builds.
>>>
>>
>> I was not familiar with the shadow call stack. If I understand correctly
>> that would be a stack of return addresses of function currently on the
>> call stack, is that correct?
>>
>> That would indeed be a simpler approach, however I guess the
>> instrumentation has a cost. Is the instrumentation also available with
>> GCC? And is this instrumentation efficient enough to be suitable for
>> production builds?
>>
> 
> I am not aware of any plans to enable this in GCC, but the Clang
> implementation is definitely intended for production use (it's a CFI
> feature for ROP/JOP mitigation)
> 

I think most people interested in livepatching are using GCC built 
kernels, but I could be mistaken (althought in the long run, both 
compilers should be supported, and yes, I realize the objtool solution 
currently only would support GCC).

I don't know how feasible it will be to get it into GCC if people decide 
to go with that. Also, now that I think about it, it will probably come 
with similar limitations as stackframes where the unwinder would need to 
know when/where the shadow call stack is unavailable for some reason and 
the stack trace is not reliable. (it might be a bit simpler to audit 
than stack frame setting and maybe have less limitations, but I guess 
there will still be cases where we can't rely on it)
Mark Brown Jan. 21, 2021, 2:23 p.m. UTC | #7
On Thu, Jan 21, 2021 at 02:23:53PM +0100, Julien Thierry wrote:
> On 1/21/21 12:08 PM, Ard Biesheuvel wrote:

> > I am not aware of any plans to enable this in GCC, but the Clang
> > implementation is definitely intended for production use (it's a CFI
> > feature for ROP/JOP mitigation)

> I think most people interested in livepatching are using GCC built kernels,
> but I could be mistaken (althought in the long run, both compilers should be
> supported, and yes, I realize the objtool solution currently only would
> support GCC).

There definitely seem to be some users interested in both livepatch and
clang built kernels so it might come up relatively quickly.
Josh Poimboeuf Jan. 21, 2021, 6:54 p.m. UTC | #8
Ard,

Sorry, I was late to the party, attempting to reply to the entire thread
at once.  Also, adding the live-patching ML.

I agree with a lot of your concerns.  Reverse engineering the control
flow of the compiled binary is kind of ridiculous.  I was always
surprised that it works.  I still am!  But I think it's more robust than
you give it credit for.

Most of the existing code just works, with (annual) tweaks for new
compiler versions.  In fact now it works well with both GCC and Clang,
across several versions.  Soon it will work with LTO.

It has grown many uses beyond stack validation: ORC, static calls,
retpolines validation, noinstr validation, SMAP validation.  It has
found a *lot* of compiler bugs.  And there will probably be more use
cases when we get vmlinux validation running fast enough.

But there is indeed a maintenance burden.  I often ask myself if it's
worth it.  So far the answer has been yes :-)  Particularly because it
has influenced many positive changes to the kernel.  And it helps now
that even more people are contributing and adding useful features.

But you should definitely think twice before letting it loose on your
arch, especially if you have some other way to ensure reliable stack
metadata, and if you don't have a need for the other objtool features.


Regarding your other proposals:

1) I'm doubtful we can ever rely on the toolchain to ensure reliable
   unwind metadata, because:

   a) most of the problems are in asm and inline-asm; good luck getting
      the toolchain to care about those.

   b) the toolchain is fragile; do we want to entrust the integrity of
      live patching to the compiler's frame pointer generation (or other
      unwind metadata) without having some sort of checking mechanism?


2) The shadow stack idea sounds promising -- how hard would it be to
   make a prototype reliable unwinder?


More comments below:


On Thu, Jan 21, 2021 at 12:48:43PM +0100, Ard Biesheuvel wrote:
> On Thu, 21 Jan 2021 at 12:23, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Thu, Jan 21, 2021 at 12:08:23PM +0100, Ard Biesheuvel wrote:
> > > On Thu, 21 Jan 2021 at 11:26, Julien Thierry <jthierry@redhat.com> wrote:
> >
> > > > I'm not familiar with toolcahin code models, but would this approach be
> > > > able to validate assembly code (either inline or in assembly files?)
> > > >
> > >
> > > No, it would not. But those files are part of the code base, and can
> > > be reviewed and audited.
> >
> > x86 has a long history if failing at exactly that.
> 
> That's a fair point. But on the flip side, maintaining objtool does
> not look like it has been a walk in the park either.

I think you missed Peter's point: it's not that it's *hard* for humans
to continuously review/audit all asm and inline-asm; it's just not
feasible to do it 100% correctly, 100% of the time.

Like any other code, objtool requires maintenance, but its analysis is
orders of magnitude more robust than any human.

> What i am especially concerned about is things like 3193c0836f20,
> where we actually have to disable certain compiler optimizations
> because they interfere with objtool's ability to understand the
> resulting object code. Correctness and performance are challenging
> enough as requirements for generated code.

Well, you managed to find the worst case scenario.  I think that's the
only time we ever had to do that.  Please don't think that's normal, or
even a generally realistic concern.  Objtool tries really hard to stay
out of the way.

Long term we really want to prevent that type of thing with the help of
annotations from compiler plugins, similar to what Julien did here.

Yes, it would mean two objtool compiler plugins (GCC and Clang), but it
would ease the objtool maintenance burden and risk in many ways.  And
prevent situations like that commit you found.  It may sound fragile,
but it will actually make things simpler overall: less reverse
engineering of GCC switch jump tables and __noreturn functions is a good
thing.

> Mind you, I am not saying it is not worth it *for x86*, where there is
> a lot of other stuff going on. But on arm64, we don't care about ORC,
> about -fomit-frame-pointer, about retpolines or about any of the other
> things objtool enables.
> 
> On arm64, all it currently seems to provide is a way to capture the
> call stack accurately, and given that it needs a GCC plugin for this
> (which needs to be maintained as well, which is non-trivial, and also
> bars us from using objtool with Clang builds), my current position is
> simply that opening this can of worms at this point is just not worth
> it.

As far as GCC plugins go, it looks pretty basic to me.  Also this
doesn't *at all* prevent Clang from being used for live patching.  If
anybody actually tries running Julien's patches on a Clang-built kernel,
it might just work.  But if not, and the switch tables turn out to be
unparseable like on GCC, we could have a Clang plugin.  As I mentioned,
we'll probably end up having one anyway for x86.
Mark Brown Jan. 22, 2021, 5:43 p.m. UTC | #9
On Thu, Jan 21, 2021 at 12:54:52PM -0600, Josh Poimboeuf wrote:

> 2) The shadow stack idea sounds promising -- how hard would it be to
>    make a prototype reliable unwinder?

In theory it doesn't look too hard and I can't see a particular reason
not to try doing this - there's going to be edge cases but hopefully for
reliable stack trace they're all in areas where we would be happy to
just decide the stack isn't reliable anyway, things like nesting which
allocates separate shadow stacks for each nested level for example.
I'll take a look.
Ard Biesheuvel Jan. 22, 2021, 5:54 p.m. UTC | #10
On Fri, 22 Jan 2021 at 18:44, Mark Brown <broonie@kernel.org> wrote:
>
> On Thu, Jan 21, 2021 at 12:54:52PM -0600, Josh Poimboeuf wrote:
>
> > 2) The shadow stack idea sounds promising -- how hard would it be to
> >    make a prototype reliable unwinder?
>
> In theory it doesn't look too hard and I can't see a particular reason
> not to try doing this - there's going to be edge cases but hopefully for
> reliable stack trace they're all in areas where we would be happy to
> just decide the stack isn't reliable anyway, things like nesting which
> allocates separate shadow stacks for each nested level for example.
> I'll take a look.

This reminds me - a while ago, I had a stab at writing a rudimentary
GCC plugin that pushes/pops return addresses to a shadow call stack
pointed to by x18 [0]
I am by no means suggesting that we should rely on a GCC plugin for
this, only that it does seem rather straight-forward for the compiler
to manage a stack with return addresses like that (although the devil
is probably in the details, as usual)

[0] https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=arm64-scs-gcc
Madhavan T. Venkataraman Jan. 22, 2021, 9:15 p.m. UTC | #11
On 1/22/21 11:43 AM, Mark Brown wrote:
> On Thu, Jan 21, 2021 at 12:54:52PM -0600, Josh Poimboeuf wrote:
> 
>> 2) The shadow stack idea sounds promising -- how hard would it be to
>>    make a prototype reliable unwinder?
> 
> In theory it doesn't look too hard and I can't see a particular reason
> not to try doing this - there's going to be edge cases but hopefully for
> reliable stack trace they're all in areas where we would be happy to
> just decide the stack isn't reliable anyway, things like nesting which
> allocates separate shadow stacks for each nested level for example.
> I'll take a look.
> 

I am a new comer to this discussion and I am learning. Just have some
questions. Pardon me if they are obvious or if they have already been
asked and answered.

Doesn't Clang already have support for a shadow stack implementation for ARM64?
We could take a look at how Clang does it.

Will there not be a significant performance hit? May be, some of it can be
mitigated by using a parallel shadow stack rather than a compact one.

Are there any longjmp style situations in the kernel where the stack is
unwound by several frames? In these cases, the shadow stack must be unwound
accordingly.

Madhavan

> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
Madhavan T. Venkataraman Jan. 22, 2021, 9:16 p.m. UTC | #12
On 1/22/21 11:43 AM, Mark Brown wrote:
> On Thu, Jan 21, 2021 at 12:54:52PM -0600, Josh Poimboeuf wrote:
> 
>> 2) The shadow stack idea sounds promising -- how hard would it be to
>>    make a prototype reliable unwinder?
> 
> In theory it doesn't look too hard and I can't see a particular reason
> not to try doing this - there's going to be edge cases but hopefully for
> reliable stack trace they're all in areas where we would be happy to
> just decide the stack isn't reliable anyway, things like nesting which
> allocates separate shadow stacks for each nested level for example.
> I'll take a look.
> 

I am a new comer to this discussion and I am learning. Just have some
questions. Pardon me if they are obvious or if they have already been
asked and answered.

Doesn't Clang already have support for a shadow stack implementation for ARM64?
We could take a look at how Clang does it.

Will there not be a significant performance hit? May be, some of it can be
mitigated by using a parallel shadow stack rather than a compact one.

Are there any longjmp style situations in the kernel where the stack is
unwound by several frames? In these cases, the shadow stack must be unwound
accordingly.

Madhavan

> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
Ard Biesheuvel Jan. 22, 2021, 9:43 p.m. UTC | #13
On Fri, 22 Jan 2021 at 22:15, Madhavan T. Venkataraman
<madvenka@linux.microsoft.com> wrote:
>
>
>
> On 1/22/21 11:43 AM, Mark Brown wrote:
> > On Thu, Jan 21, 2021 at 12:54:52PM -0600, Josh Poimboeuf wrote:
> >
> >> 2) The shadow stack idea sounds promising -- how hard would it be to
> >>    make a prototype reliable unwinder?
> >
> > In theory it doesn't look too hard and I can't see a particular reason
> > not to try doing this - there's going to be edge cases but hopefully for
> > reliable stack trace they're all in areas where we would be happy to
> > just decide the stack isn't reliable anyway, things like nesting which
> > allocates separate shadow stacks for each nested level for example.
> > I'll take a look.
> >
>
> I am a new comer to this discussion and I am learning. Just have some
> questions. Pardon me if they are obvious or if they have already been
> asked and answered.
>
> Doesn't Clang already have support for a shadow stack implementation for ARM64?
> We could take a look at how Clang does it.
>
> Will there not be a significant performance hit? May be, some of it can be
> mitigated by using a parallel shadow stack rather than a compact one.
>
> Are there any longjmp style situations in the kernel where the stack is
> unwound by several frames? In these cases, the shadow stack must be unwound
> accordingly.
>

Hello Madhavan,

Let's discuss the details of shadow call stacks on a separate thread,
instead of further hijacking Julien's series.
Madhavan T. Venkataraman Jan. 22, 2021, 9:44 p.m. UTC | #14
On 1/22/21 3:43 PM, Ard Biesheuvel wrote:
> On Fri, 22 Jan 2021 at 22:15, Madhavan T. Venkataraman
> <madvenka@linux.microsoft.com> wrote:
>>
>>
>>
>> On 1/22/21 11:43 AM, Mark Brown wrote:
>>> On Thu, Jan 21, 2021 at 12:54:52PM -0600, Josh Poimboeuf wrote:
>>>
>>>> 2) The shadow stack idea sounds promising -- how hard would it be to
>>>>    make a prototype reliable unwinder?
>>>
>>> In theory it doesn't look too hard and I can't see a particular reason
>>> not to try doing this - there's going to be edge cases but hopefully for
>>> reliable stack trace they're all in areas where we would be happy to
>>> just decide the stack isn't reliable anyway, things like nesting which
>>> allocates separate shadow stacks for each nested level for example.
>>> I'll take a look.
>>>
>>
>> I am a new comer to this discussion and I am learning. Just have some
>> questions. Pardon me if they are obvious or if they have already been
>> asked and answered.
>>
>> Doesn't Clang already have support for a shadow stack implementation for ARM64?
>> We could take a look at how Clang does it.
>>
>> Will there not be a significant performance hit? May be, some of it can be
>> mitigated by using a parallel shadow stack rather than a compact one.
>>
>> Are there any longjmp style situations in the kernel where the stack is
>> unwound by several frames? In these cases, the shadow stack must be unwound
>> accordingly.
>>
> 
> Hello Madhavan,
> 
> Let's discuss the details of shadow call stacks on a separate thread,
> instead of further hijacking Julien's series.
> 

OK. Sounds good.
Josh Poimboeuf Jan. 25, 2021, 9:19 p.m. UTC | #15
On Fri, Jan 22, 2021 at 10:43:09PM +0100, Ard Biesheuvel wrote:
> On Fri, 22 Jan 2021 at 22:15, Madhavan T. Venkataraman <madvenka@linux.microsoft.com> wrote:
> > On 1/22/21 11:43 AM, Mark Brown wrote:
> > > On Thu, Jan 21, 2021 at 12:54:52PM -0600, Josh Poimboeuf wrote:
> > >
> > >> 2) The shadow stack idea sounds promising -- how hard would it be to
> > >>    make a prototype reliable unwinder?
> > >
> > > In theory it doesn't look too hard and I can't see a particular reason
> > > not to try doing this - there's going to be edge cases but hopefully for
> > > reliable stack trace they're all in areas where we would be happy to
> > > just decide the stack isn't reliable anyway, things like nesting which
> > > allocates separate shadow stacks for each nested level for example.
> > > I'll take a look.
> > >
> >
> > I am a new comer to this discussion and I am learning. Just have some
> > questions. Pardon me if they are obvious or if they have already been
> > asked and answered.
> >
> > Doesn't Clang already have support for a shadow stack implementation for ARM64?
> > We could take a look at how Clang does it.
> >
> > Will there not be a significant performance hit? May be, some of it can be
> > mitigated by using a parallel shadow stack rather than a compact one.
> >
> > Are there any longjmp style situations in the kernel where the stack is
> > unwound by several frames? In these cases, the shadow stack must be unwound
> > accordingly.
> >
> 
> Hello Madhavan,
> 
> Let's discuss the details of shadow call stacks on a separate thread,
> instead of further hijacking Julien's series.

It's quite relevant to this thread.  There's no need to consider merging
Julien's patches if you have a better approach.  Why not discuss it
here?  I'm also interested in the answers to Madhavan's questions.
Madhavan T. Venkataraman Jan. 28, 2021, 10:10 p.m. UTC | #16
Hi,

I sent this suggestion to linux-arm-kernel in response to the Reliable Stacktrace RFC from Mark Brown
and Mark Rutland. I am repeating it here for two reasons:

- It involves objtool.

- There are many more recipients in this thread that may be interested in this topic.

Please let me know if this suggestion is acceptable. If it is not, please let me know why.
Thanks.

Also, I apologize to all of you who have received this more than once.

FP and no-FP functions
=====================

I have a suggestion for objtool and the unwinder for ARM64.

IIUC, objtool is responsible for walking all the code paths (except unreachable
and ignored ones) and making sure that every function has proper frame pointer
code (prolog, epilog, etc). If a function is found to not have it, the kernel
build is failed. Is this understanding correct?

If so, can we take a different approach for ARM64?

Instead of failing the kernel build, can we just mark the functions as:

	FP	Functions that have proper FP code
	no-FP	Functions that don't

May be, we can add an "FP" flag in the symbol table entry for this.

Then, the unwinder can check the functions it encounters in the stack trace and
inform the caller if it found any no-FP functions. The caller of the unwinder can
decide what he wants to do with that information.

	- the caller can ignore it

	- the caller can print the stack trace with a warning that no-FP functions
	  were found

	- if the caller is livepatch, the caller can retry until the no-FP functions
	  disappear from the stack trace. This way, we can have live patching even
	  when some of the functions in the kernel are no-FP.

Does this make any sense? Is this acceptable? What are the pitfalls?

If we can do this, the unwinder could detect cases such as:

- If gcc thinks that a function is a leaf function but the function contains
  inline assembly code that calls another function.

- If a call to a function bounces through some intermediate code such as a
  trampoline.

- etc.

For specific no-FP functions, the unwinder might be able to deduce the original
caller. In these cases, the stack trace would still be reliable. For all the others,
the stack trace would be considered unreliable.

Compiler instead of objtool
===========================

If the above suggestion is acceptable, I have another suggestion.

It is a lot of work for every new architecture to add frame pointer verification
support in objtool. Can we get some help from the compiler?

The compiler knows which C functions it generates the FP prolog and epilog for. It can
mark those functions as FP. As for assembly functions, kernel developers could manually
annotate functions that have proper FP code. The compiler/assembler would mark them
as FP. Only a small subset of assembly functions would even have FP prolog and epilog.

Is this acceptable? What are the pitfalls?

This can be implemented easily for all architectures for which the compiler generates
FP code.

Can this be implemented using a GCC plugin? I know squat about GCC plugins.

Thanks!

Madhavan
Mark Brown Jan. 29, 2021, 3:47 p.m. UTC | #17
On Thu, Jan 28, 2021 at 04:10:51PM -0600, Madhavan T. Venkataraman wrote:

> I sent this suggestion to linux-arm-kernel in response to the Reliable Stacktrace RFC from Mark Brown
> and Mark Rutland. I am repeating it here for two reasons:

> - It involves objtool.

> - There are many more recipients in this thread that may be interested in this topic.

> Please let me know if this suggestion is acceptable. If it is not, please let me know why.
> Thanks.

There was some feedback on the other thread raising concerns, including
questions about the benefits of trusting the compiler to flag
non-standard code without expectation mismatches and the utility of
allowing the compiler to do so in the first place.