diff mbox series

[RFC,12/17] gcc-plugins: objtool: Add plugin to detect switch table on arm64

Message ID 20210120173800.1660730-13-jthierry@redhat.com (mailing list archive)
State New, archived
Headers show
Series objtool: add base support for arm64 | expand

Commit Message

Julien Thierry Jan. 20, 2021, 5:37 p.m. UTC
From: Raphael Gault <raphael.gault@arm.com>

This plugins comes into play before the final 2 RTL passes of GCC and
detects switch-tables that are to be outputed in the ELF and writes
information in an ".discard.switch_table_info" section which will be
used by objtool.

Signed-off-by: Raphael Gault <raphael.gault@arm.com>
[J.T.: Change section name to store switch table information,
       Make plugin Kconfig be selected rather than opt-in by user,
       Add a relocation in the switch_table_info that points to
       the jump operation itself]
Signed-off-by: Julien Thierry <jthierry@redhat.com>
---
 arch/arm64/Kconfig                            |  1 +
 scripts/Makefile.gcc-plugins                  |  2 +
 scripts/gcc-plugins/Kconfig                   |  4 +
 .../arm64_switch_table_detection_plugin.c     | 85 +++++++++++++++++++
 4 files changed, 92 insertions(+)
 create mode 100644 scripts/gcc-plugins/arm64_switch_table_detection_plugin.c

Comments

Nick Desaulniers Jan. 27, 2021, 10:15 p.m. UTC | #1
> From: Raphael Gault <raphael.gault@arm.com>
> 
> This plugins comes into play before the final 2 RTL passes of GCC and
> detects switch-tables that are to be outputed in the ELF and writes
> information in an ".discard.switch_table_info" section which will be
> used by objtool.
> 
> Signed-off-by: Raphael Gault <raphael.gault@arm.com>
> [J.T.: Change section name to store switch table information,
>        Make plugin Kconfig be selected rather than opt-in by user,
>        Add a relocation in the switch_table_info that points to
>        the jump operation itself]
> Signed-off-by: Julien Thierry <jthierry@redhat.com>

Rather than tightly couple this feature to a particular toolchain via
plugin, it might be nice to consider what features could be spec'ed out
for toolchains to implement (perhaps via a -f flag).

Distributions (like Android, CrOS) wont be able to use such a feature as
is.
Josh Poimboeuf Jan. 27, 2021, 11:26 p.m. UTC | #2
On Wed, Jan 27, 2021 at 02:15:57PM -0800, Nick Desaulniers wrote:
> > From: Raphael Gault <raphael.gault@arm.com>
> > 
> > This plugins comes into play before the final 2 RTL passes of GCC and
> > detects switch-tables that are to be outputed in the ELF and writes
> > information in an ".discard.switch_table_info" section which will be
> > used by objtool.
> > 
> > Signed-off-by: Raphael Gault <raphael.gault@arm.com>
> > [J.T.: Change section name to store switch table information,
> >        Make plugin Kconfig be selected rather than opt-in by user,
> >        Add a relocation in the switch_table_info that points to
> >        the jump operation itself]
> > Signed-off-by: Julien Thierry <jthierry@redhat.com>
> 
> Rather than tightly couple this feature to a particular toolchain via
> plugin, it might be nice to consider what features could be spec'ed out
> for toolchains to implement (perhaps via a -f flag).

The problem is being able to detect switch statement jump table vectors.

For a given indirect branch (due to a switch statement), what are all
the corresponding jump targets?

We would need the compiler to annotate that information somehow.

> Distributions (like Android, CrOS) wont be able to use such a feature as
> is.

Would a Clang plugin be out of the question?
Nick Desaulniers Jan. 29, 2021, 6:10 p.m. UTC | #3
On Wed, Jan 27, 2021 at 3:27 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>
> On Wed, Jan 27, 2021 at 02:15:57PM -0800, Nick Desaulniers wrote:
> > > From: Raphael Gault <raphael.gault@arm.com>
> > >
> > > This plugins comes into play before the final 2 RTL passes of GCC and
> > > detects switch-tables that are to be outputed in the ELF and writes
> > > information in an ".discard.switch_table_info" section which will be
> > > used by objtool.
> > >
> > > Signed-off-by: Raphael Gault <raphael.gault@arm.com>
> > > [J.T.: Change section name to store switch table information,
> > >        Make plugin Kconfig be selected rather than opt-in by user,
> > >        Add a relocation in the switch_table_info that points to
> > >        the jump operation itself]
> > > Signed-off-by: Julien Thierry <jthierry@redhat.com>
> >
> > Rather than tightly couple this feature to a particular toolchain via
> > plugin, it might be nice to consider what features could be spec'ed out
> > for toolchains to implement (perhaps via a -f flag).
>
> The problem is being able to detect switch statement jump table vectors.
>
> For a given indirect branch (due to a switch statement), what are all
> the corresponding jump targets?
>
> We would need the compiler to annotate that information somehow.

Makes sense, the compiler should have this information.  How is this
problem solved on x86?

>
> > Distributions (like Android, CrOS) wont be able to use such a feature as
> > is.
>
> Would a Clang plugin be out of the question?

Generally, we frown on out of tree kernel modules for a couple reasons.

Maintaining ABI compatibility when the core kernel changes is
generally not instantaneous; someone has to notice the ABI has changed
which will be more delayed than if the module was in tree.  Worse is
when semantics subtly change.  While we must not break userspace, we
provide no such guarantees within the kernel proper.

Also, it's less likely that out of tree kernel modules have been
reviewed by kernel developers.  They may not have the same quality,
use the recommended interfaces, follow coding conventions, etc..

Oh, did I say "out of tree kernel modules?"  I meant "compiler
plugins."  But it's two different sides of the same coin to me.

FWIW, I think the approach taken by -mstack-protector-guard-reg= is a
useful case study.  It was prototyped as a GCC extension, then added
to GCC proper, then added to LLVM (currently only x86, but most of the
machinery is in place in the compiler to get it running on arm64).  My
recommendation is to skip the plugin part and work on a standard
interface for compilers to implement, with input from compiler
developers.
Josh Poimboeuf Feb. 1, 2021, 9:44 p.m. UTC | #4
On Fri, Jan 29, 2021 at 10:10:01AM -0800, Nick Desaulniers wrote:
> On Wed, Jan 27, 2021 at 3:27 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >
> > On Wed, Jan 27, 2021 at 02:15:57PM -0800, Nick Desaulniers wrote:
> > > > From: Raphael Gault <raphael.gault@arm.com>
> > > >
> > > > This plugins comes into play before the final 2 RTL passes of GCC and
> > > > detects switch-tables that are to be outputed in the ELF and writes
> > > > information in an ".discard.switch_table_info" section which will be
> > > > used by objtool.
> > > >
> > > > Signed-off-by: Raphael Gault <raphael.gault@arm.com>
> > > > [J.T.: Change section name to store switch table information,
> > > >        Make plugin Kconfig be selected rather than opt-in by user,
> > > >        Add a relocation in the switch_table_info that points to
> > > >        the jump operation itself]
> > > > Signed-off-by: Julien Thierry <jthierry@redhat.com>
> > >
> > > Rather than tightly couple this feature to a particular toolchain via
> > > plugin, it might be nice to consider what features could be spec'ed out
> > > for toolchains to implement (perhaps via a -f flag).
> >
> > The problem is being able to detect switch statement jump table vectors.
> >
> > For a given indirect branch (due to a switch statement), what are all
> > the corresponding jump targets?
> >
> > We would need the compiler to annotate that information somehow.
> 
> Makes sense, the compiler should have this information.  How is this
> problem solved on x86?

Thus far we've been able to successfully reverse engineer it on x86,
though it hasn't been easy.

There were some particulars for arm64 which made doing so impossible.
(I don't remember the details.)


> > > Distributions (like Android, CrOS) wont be able to use such a feature as
> > > is.
> >
> > Would a Clang plugin be out of the question?
> 
> Generally, we frown on out of tree kernel modules for a couple reasons.
> 
> Maintaining ABI compatibility when the core kernel changes is
> generally not instantaneous; someone has to notice the ABI has changed
> which will be more delayed than if the module was in tree.  Worse is
> when semantics subtly change.  While we must not break userspace, we
> provide no such guarantees within the kernel proper.
> 
> Also, it's less likely that out of tree kernel modules have been
> reviewed by kernel developers.  They may not have the same quality,
> use the recommended interfaces, follow coding conventions, etc..
> 
> Oh, did I say "out of tree kernel modules?"  I meant "compiler
> plugins."  But it's two different sides of the same coin to me.

I thought Android already relied on OOT modules.

GCC plugins generally enforce the exact same GCC version for OOT
modules.  So there's no ABI to worry about.  I assume Clang does the
same?

Or did I miss your point?

> FWIW, I think the approach taken by -mstack-protector-guard-reg= is a
> useful case study.  It was prototyped as a GCC extension, then added
> to GCC proper, then added to LLVM (currently only x86, but most of the
> machinery is in place in the compiler to get it running on arm64).  My
> recommendation is to skip the plugin part and work on a standard
> interface for compilers to implement, with input from compiler
> developers.

I like the idea.  Is there a recommended forum for such discussions?
Just an email to GCC/Clang development lists?
Nick Desaulniers Feb. 1, 2021, 11:17 p.m. UTC | #5
On Mon, Feb 1, 2021 at 1:44 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>
> On Fri, Jan 29, 2021 at 10:10:01AM -0800, Nick Desaulniers wrote:
> > On Wed, Jan 27, 2021 at 3:27 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > >
> > > On Wed, Jan 27, 2021 at 02:15:57PM -0800, Nick Desaulniers wrote:
> > > > > From: Raphael Gault <raphael.gault@arm.com>
> > > > >
> > > > > This plugins comes into play before the final 2 RTL passes of GCC and
> > > > > detects switch-tables that are to be outputed in the ELF and writes
> > > > > information in an ".discard.switch_table_info" section which will be
> > > > > used by objtool.
> > > > >
> > > > > Signed-off-by: Raphael Gault <raphael.gault@arm.com>
> > > > > [J.T.: Change section name to store switch table information,
> > > > >        Make plugin Kconfig be selected rather than opt-in by user,
> > > > >        Add a relocation in the switch_table_info that points to
> > > > >        the jump operation itself]
> > > > > Signed-off-by: Julien Thierry <jthierry@redhat.com>
> > > >
> > > > Rather than tightly couple this feature to a particular toolchain via
> > > > plugin, it might be nice to consider what features could be spec'ed out
> > > > for toolchains to implement (perhaps via a -f flag).
> > >
> > > The problem is being able to detect switch statement jump table vectors.
> > >
> > > For a given indirect branch (due to a switch statement), what are all
> > > the corresponding jump targets?
> > >
> > > We would need the compiler to annotate that information somehow.
> >
> > Makes sense, the compiler should have this information.  How is this
> > problem solved on x86?
>
> Thus far we've been able to successfully reverse engineer it on x86,
> though it hasn't been easy.
>
> There were some particulars for arm64 which made doing so impossible.
> (I don't remember the details.)

I think the details are pertinent to finding a portable solution.  The
commit message of this commit in particular doesn't document such
details, such as why such an approach is necessary or how the data is
laid out for objtool to consume it.

> > > > Distributions (like Android, CrOS) wont be able to use such a feature as
> > > > is.
> > >
> > > Would a Clang plugin be out of the question?
> >
> > Generally, we frown on out of tree kernel modules for a couple reasons.
> >
> > Maintaining ABI compatibility when the core kernel changes is
> > generally not instantaneous; someone has to notice the ABI has changed
> > which will be more delayed than if the module was in tree.  Worse is
> > when semantics subtly change.  While we must not break userspace, we
> > provide no such guarantees within the kernel proper.
> >
> > Also, it's less likely that out of tree kernel modules have been
> > reviewed by kernel developers.  They may not have the same quality,
> > use the recommended interfaces, follow coding conventions, etc..
> >
> > Oh, did I say "out of tree kernel modules?"  I meant "compiler
> > plugins."  But it's two different sides of the same coin to me.
>
> I thought Android already relied on OOT modules.

Android Common Kernel does not *rely* on OOT modules or compiler
plugins.  Android doesn't forbid vendors from using OOT modules,
though, just as the mainline kernel does not prevent modules from
being built out of tree, or users from insmod'ing them.  It's
certainly a pragmatic approach; idealism doesn't work for all OEMs.

Personally, I lean more towards idealistic; I prefer in-tree modules,
dislike compiler plugins (for much the same reasons), and idealize
loose coupling of software components. This series looks to me like it
tightly couples arm64 live patching to objtool and GCC.

On the earlier thread, Julien writes:

>> I think most people interested in livepatching are using GCC built
>> kernels, but I could be mistaken (althought in the long run, both
>> compilers should be supported, and yes, I realize the objtool solution
>> currently only would support GCC).

Google's production kernels are using livepatching and are built with
Clang.  Getting similar functionality working for arm64 would be of
interest.

> GCC plugins generally enforce the exact same GCC version for OOT
> modules.  So there's no ABI to worry about.  I assume Clang does the
> same?
>
> Or did I miss your point?

I think so.  It would seem that the current solution for stack
validation would require multiple different compiler plugins to work
reliably.  If jump tables are an issue, I don't see why you wouldn't
pursue a more portable option like -fno-jump-tables first, then work
to claw back any performance loss from that, if relevant?  Then
there's no complaint about toolchain specific implementations or tight
coupling.

Objtool support on arm64 is interesting to me though, because it has
found bugs in LLVM codegen. That alone is extremely valuable.  But not
it's not helpful if it's predicated or tightly coupled to GCC, as this
series appears to do.  The idea of rebuilding control flow from binary
analysis and using that to find codegen bugs is a really cool idea
(novel, even? idk), and I wish we had some analog for userspace
binaries that could perform similar checks.

>
> > FWIW, I think the approach taken by -mstack-protector-guard-reg= is a
> > useful case study.  It was prototyped as a GCC extension, then added
> > to GCC proper, then added to LLVM (currently only x86, but most of the
> > machinery is in place in the compiler to get it running on arm64).  My
> > recommendation is to skip the plugin part and work on a standard
> > interface for compilers to implement, with input from compiler
> > developers.
>
> I like the idea.  Is there a recommended forum for such discussions?
> Just an email to GCC/Clang development lists?

linux-toolchains@vger.kernel.org is probably a good start.

--
Thanks,
~Nick Desaulniers
Josh Poimboeuf Feb. 2, 2021, 12:02 a.m. UTC | #6
On Mon, Feb 01, 2021 at 03:17:40PM -0800, Nick Desaulniers wrote:
> > > > > Rather than tightly couple this feature to a particular toolchain via
> > > > > plugin, it might be nice to consider what features could be spec'ed out
> > > > > for toolchains to implement (perhaps via a -f flag).
> > > >
> > > > The problem is being able to detect switch statement jump table vectors.
> > > >
> > > > For a given indirect branch (due to a switch statement), what are all
> > > > the corresponding jump targets?
> > > >
> > > > We would need the compiler to annotate that information somehow.
> > >
> > > Makes sense, the compiler should have this information.  How is this
> > > problem solved on x86?
> >
> > Thus far we've been able to successfully reverse engineer it on x86,
> > though it hasn't been easy.
> >
> > There were some particulars for arm64 which made doing so impossible.
> > (I don't remember the details.)
> 
> I think the details are pertinent to finding a portable solution.  The
> commit message of this commit in particular doesn't document such
> details, such as why such an approach is necessary or how the data is
> laid out for objtool to consume it.

Agreed, the commit message needs more details.

> > > > > Distributions (like Android, CrOS) wont be able to use such a feature as
> > > > > is.
> > > >
> > > > Would a Clang plugin be out of the question?
> > >
> > > Generally, we frown on out of tree kernel modules for a couple reasons.
> > >
> > > Maintaining ABI compatibility when the core kernel changes is
> > > generally not instantaneous; someone has to notice the ABI has changed
> > > which will be more delayed than if the module was in tree.  Worse is
> > > when semantics subtly change.  While we must not break userspace, we
> > > provide no such guarantees within the kernel proper.
> > >
> > > Also, it's less likely that out of tree kernel modules have been
> > > reviewed by kernel developers.  They may not have the same quality,
> > > use the recommended interfaces, follow coding conventions, etc..
> > >
> > > Oh, did I say "out of tree kernel modules?"  I meant "compiler
> > > plugins."  But it's two different sides of the same coin to me.
> >
> > I thought Android already relied on OOT modules.
> 
> Android Common Kernel does not *rely* on OOT modules or compiler
> plugins.  Android doesn't forbid vendors from using OOT modules,
> though, just as the mainline kernel does not prevent modules from
> being built out of tree, or users from insmod'ing them.  It's
> certainly a pragmatic approach; idealism doesn't work for all OEMs.

No judgement, RHEL is similar.  Linux is nothing if not pragmatic.

> Personally, I lean more towards idealistic; I prefer in-tree modules,
> dislike compiler plugins (for much the same reasons), and idealize
> loose coupling of software components. This series looks to me like it
> tightly couples arm64 live patching to objtool and GCC.
> 
> On the earlier thread, Julien writes:
> 
> >> I think most people interested in livepatching are using GCC built
> >> kernels, but I could be mistaken (althought in the long run, both
> >> compilers should be supported, and yes, I realize the objtool solution
> >> currently only would support GCC).
> 
> Google's production kernels are using livepatching and are built with
> Clang.  Getting similar functionality working for arm64 would be of
> interest.

Well, that's cool.  I had no idea.

I'm curious how they're generating livepatch modules?  Because
kpatch-build doesn't support Clang (AFAIK), and if they're not using
kpatch-build then there are some traps to look out for.

> > GCC plugins generally enforce the exact same GCC version for OOT
> > modules.  So there's no ABI to worry about.  I assume Clang does the
> > same?
> >
> > Or did I miss your point?
> 
> I think so.  It would seem that the current solution for stack
> validation would require multiple different compiler plugins to work
> reliably.  If jump tables are an issue, I don't see why you wouldn't
> pursue a more portable option like -fno-jump-tables first, then work
> to claw back any performance loss from that, if relevant?  Then
> there's no complaint about toolchain specific implementations or tight
> coupling.
> 
> Objtool support on arm64 is interesting to me though, because it has
> found bugs in LLVM codegen. That alone is extremely valuable.  But not
> it's not helpful if it's predicated or tightly coupled to GCC, as this
> series appears to do.

I agree 100%, if there are actual Clang livepatch users (which it sounds
like there are) then we should target both compilers.

And yes, objtool has been pretty good at finding compiler bugs, so the
more coverage the better.

I like the -fno-jump-tables suggestion, assuming it doesn't affect
performance significantly.

> The idea of rebuilding control flow from binary analysis and using
> that to find codegen bugs is a really cool idea (novel, even? idk),
> and I wish we had some analog for userspace binaries that could
> perform similar checks.

Objtool is generic in many ways -- in fact I recently heard from a PhD
candidate who used it successfully on another kernel for an ORC
unwinder.

It could probably be used on user space without much effort.  That was
an early original stated goal but I definitely don't have the bandwidth
or incentive to work on it.

> > > FWIW, I think the approach taken by -mstack-protector-guard-reg= is a
> > > useful case study.  It was prototyped as a GCC extension, then added
> > > to GCC proper, then added to LLVM (currently only x86, but most of the
> > > machinery is in place in the compiler to get it running on arm64).  My
> > > recommendation is to skip the plugin part and work on a standard
> > > interface for compilers to implement, with input from compiler
> > > developers.
> >
> > I like the idea.  Is there a recommended forum for such discussions?
> > Just an email to GCC/Clang development lists?
> 
> linux-toolchains@vger.kernel.org is probably a good start.

Thanks, I'll write something up (maybe a more specific proposal) and
post it.
Julien Thierry Feb. 2, 2021, 8:57 a.m. UTC | #7
On 2/2/21 12:17 AM, Nick Desaulniers wrote:
> On Mon, Feb 1, 2021 at 1:44 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>>
>> On Fri, Jan 29, 2021 at 10:10:01AM -0800, Nick Desaulniers wrote:
>>> On Wed, Jan 27, 2021 at 3:27 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>>>>
>>>> On Wed, Jan 27, 2021 at 02:15:57PM -0800, Nick Desaulniers wrote:
>>>>>> From: Raphael Gault <raphael.gault@arm.com>
>>>>>>
>>>>>> This plugins comes into play before the final 2 RTL passes of GCC and
>>>>>> detects switch-tables that are to be outputed in the ELF and writes
>>>>>> information in an ".discard.switch_table_info" section which will be
>>>>>> used by objtool.
>>>>>>
>>>>>> Signed-off-by: Raphael Gault <raphael.gault@arm.com>
>>>>>> [J.T.: Change section name to store switch table information,
>>>>>>         Make plugin Kconfig be selected rather than opt-in by user,
>>>>>>         Add a relocation in the switch_table_info that points to
>>>>>>         the jump operation itself]
>>>>>> Signed-off-by: Julien Thierry <jthierry@redhat.com>
>>>>>
>>>>> Rather than tightly couple this feature to a particular toolchain via
>>>>> plugin, it might be nice to consider what features could be spec'ed out
>>>>> for toolchains to implement (perhaps via a -f flag).
>>>>
>>>> The problem is being able to detect switch statement jump table vectors.
>>>>
>>>> For a given indirect branch (due to a switch statement), what are all
>>>> the corresponding jump targets?
>>>>
>>>> We would need the compiler to annotate that information somehow.
>>>
>>> Makes sense, the compiler should have this information.  How is this
>>> problem solved on x86?
>>
>> Thus far we've been able to successfully reverse engineer it on x86,
>> though it hasn't been easy.
>>
>> There were some particulars for arm64 which made doing so impossible.
>> (I don't remember the details.)

The main issue is that the tables for arm64 have more indirection than x86.

On x86, the dispatching jump instruction fetches the target address from 
a contiguous array of addresses based on a given offset. So the list of 
potential targets of the jump is neatly organized in a table (and sure, 
before link time these are just relocation, but still processable).

On arm64 (with GCC at least), what is stored in a table is an array of 
candidate offsets from the jump instruction. And because arm64 is 
limited to 32bit instructions, the encoding often requires multiple 
instructions to compute the target address:

ldr<*>  x_offset, [x_offsets_table, x_index, ...]  // load offset
adr     x_dest_base, <addr>          // load target branch for offset 0
add     x_dest, x_target_base, x_offset, ...  // compute final address
br      x_dest        // jump

Where this gets trickier is that (with GCC) the offsets stored in the 
table might or might not be signed constants (and this can be seen in 
GCC intermediate representations, but I do not believe this information 
is output in the final object file). And on top of that, GCC might 
decide to use offsets that are seen as unsigned during intermediate 
representation as signed offset by sign extending them in the add 
instruction.

So, to handle this we'd have to track the different operation done with 
the offset, from the load to the final jump, decoding the instructions
and deducing the potential target instructions from the table of offsets.

But that is error prone as we don't really know how many instructions 
can be between the ones doing the address computation, and I remember 
some messy case of a jump table inside a jump table where tracking the 
instruction touching one or the other offset would need a lot of corner 
case handling.

And this of course is just for GCC, I haven't looked at what it all 
looks like on Clang's end.


> 
> I think the details are pertinent to finding a portable solution.  The
> commit message of this commit in particular doesn't document such
> details, such as why such an approach is necessary or how the data is
> laid out for objtool to consume it.
> 

Sorry, I will need to make that clearer. The next patch explains it a 
bit [1]

Basically, for simplicity, the plugin creates a new section containing 
tables (one per jump table) of references to the jump targets, similar 
to what x86 has, except that in this case this table isn't actually used 
by runtime code and is discarded at link time. I only chose this to 
minimize what needed to be changed in objtool and because the format 
seemed simple enough.

But I'm open on some alternative, whether it's a -fjump-table-info 
option added to compilers with a different format to do the links. The 
important requirement is to be able to know all the candidate targets 
for a "br <reg>" instruction.

[1] https://lkml.org/lkml/2021/1/20/910

Thanks,
David Laight Feb. 2, 2021, 2:24 p.m. UTC | #8
Stirring more goop into the hole ....

Requiring gcc-plugins, matching compiler versions and the
same 'dwarf' format for OOT modules is probably very painful.

In many cases (and this may include drivers released by some
distributions) an OOT driver has two separate parts.

One part is C source that is compiled when the module is built
on the target system and against the installed kernel headers.
Getting this to match 'just' relies on having the correct
compiler (etc) installed and in $PATH.

The second part is much more problematic.
This is just an object file compiled by a third party.
It doesn't directly depend on anything defined in the
kernel headers - so can (currently) be linked into any
kernel version.

In the past some graphics drivers have had a third party
object file.
I think some of the laptop wifi drivers might as well.

Now I some people think everything should be free source.
But there are various commercial and practical reasons
for both OOT drivers and object file 'blobs' in OOT drivers.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Nick Desaulniers Feb. 2, 2021, 10:33 p.m. UTC | #9
On Mon, Feb 1, 2021 at 4:02 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>
> On Mon, Feb 01, 2021 at 03:17:40PM -0800, Nick Desaulniers wrote:
> > On the earlier thread, Julien writes:
> >
> > >> I think most people interested in livepatching are using GCC built
> > >> kernels, but I could be mistaken (althought in the long run, both
> > >> compilers should be supported, and yes, I realize the objtool solution
> > >> currently only would support GCC).
> >
> > Google's production kernels are using livepatching and are built with
> > Clang.  Getting similar functionality working for arm64 would be of
> > interest.
>
> Well, that's cool.  I had no idea.
>
> I'm curious how they're generating livepatch modules?  Because
> kpatch-build doesn't support Clang (AFAIK), and if they're not using
> kpatch-build then there are some traps to look out for.

Ok, I just met with a bunch of folks that are actively working on
this.  Let me intro
Yonghyun Hwang <yonghyun@google.com>
Pete Swain <swine@google.com>
who will be the folks on point for this from Google.

My understanding after some clarifications today is that Google is
currently using a proprietary kernel patching mechanism that developed
around a decade ago, "pre-ksplice Oracle acquisition."  But we are
looking to transition to kpatch, and help towards supporting arm64.
Live patching is important for deploying kernel fixes faster than
predetermined scheduled draining of jobs in clusters.

The first steps for kpatch transition is supporting builds with Clang.
Yonghyun is working on that and my hope is he will have patches for
you for that soon.

Curiously, the proprietary mechanism doesn't rely on stack validation.
I think that such dependency was questioned on the cover letter
patch's thread as well.  Maybe there's "some traps to look out for"
you're referring to there?  I'm not privy to the details, though I
would guess it has to do with ensuring kernel threads aren't executing
(or planning to return through) code regions that are trying to be
patched/unpatched.  I am curious about frame pointers never being
omitted for arm64; is frame pointer chasing is unreliable in certain
contexts?

The internal functionality has been used heavily in production for
almost a decade, though without it being public or supporting arm64;
I'm not sure precisely how they solve such issues (or how others might
review such an approach).

Either way, the dependencies for live patching are less important, so
long as they are toolchain portable.  The ability to live patch kernel
images is ___important___ to Google.

> > Objtool support on arm64 is interesting to me though, because it has
> > found bugs in LLVM codegen. That alone is extremely valuable.  But not
> > it's not helpful if it's predicated or tightly coupled to GCC, as this
> > series appears to do.
>
> I agree 100%, if there are actual Clang livepatch users (which it sounds
> like there are) then we should target both compilers.

Or will be. (Sorry, I didn't know we hadn't completed the transition
to kpatch yet.  It is "the opposite side of the house" from where I
work; I literally have 8 bosses, not kidding).

Though if kpatch moves to requiring GCC plugins for architectures we
use extensively or would like to use more of, that's probably going to
throw a wrench in multiple transition plans.  (The fleet's transition
to Clang is done, I'm not worried about that).

> And yes, objtool has been pretty good at finding compiler bugs, so the
> more coverage the better.
> > The idea of rebuilding control flow from binary analysis and using
> > that to find codegen bugs is a really cool idea (novel, even? idk),
> > and I wish we had some analog for userspace binaries that could
> > perform similar checks.
>
> Objtool is generic in many ways -- in fact I recently heard from a PhD
> candidate who used it successfully on another kernel for an ORC
> unwinder.

That's pretty cool!  Reuse outside the initial context is always a
good sign that something was designed right.

> It could probably be used on user space without much effort.  That was
> an early original stated goal but I definitely don't have the bandwidth
> or incentive to work on it.

Heh.  I'm a big fan of game theory; carrot or stick, right?
Nick Desaulniers Feb. 2, 2021, 11:01 p.m. UTC | #10
On Tue, Feb 2, 2021 at 12:57 AM Julien Thierry <jthierry@redhat.com> wrote:
>
>
>
> On 2/2/21 12:17 AM, Nick Desaulniers wrote:
> > On Mon, Feb 1, 2021 at 1:44 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >>
> >> On Fri, Jan 29, 2021 at 10:10:01AM -0800, Nick Desaulniers wrote:
> >>> On Wed, Jan 27, 2021 at 3:27 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >>>>
> >>>> On Wed, Jan 27, 2021 at 02:15:57PM -0800, Nick Desaulniers wrote:
> >>>>>> From: Raphael Gault <raphael.gault@arm.com>
> >>>>>>
> >>>>>> This plugins comes into play before the final 2 RTL passes of GCC and
> >>>>>> detects switch-tables that are to be outputed in the ELF and writes
> >>>>>> information in an ".discard.switch_table_info" section which will be
> >>>>>> used by objtool.
> >>>>>>
> >>>>>> Signed-off-by: Raphael Gault <raphael.gault@arm.com>
> >>>>>> [J.T.: Change section name to store switch table information,
> >>>>>>         Make plugin Kconfig be selected rather than opt-in by user,
> >>>>>>         Add a relocation in the switch_table_info that points to
> >>>>>>         the jump operation itself]
> >>>>>> Signed-off-by: Julien Thierry <jthierry@redhat.com>
> >>>>>
> >>>>> Rather than tightly couple this feature to a particular toolchain via
> >>>>> plugin, it might be nice to consider what features could be spec'ed out
> >>>>> for toolchains to implement (perhaps via a -f flag).
> >>>>
> >>>> The problem is being able to detect switch statement jump table vectors.
> >>>>
> >>>> For a given indirect branch (due to a switch statement), what are all
> >>>> the corresponding jump targets?
> >>>>
> >>>> We would need the compiler to annotate that information somehow.
> >>>
> >>> Makes sense, the compiler should have this information.  How is this
> >>> problem solved on x86?
> >>
> >> Thus far we've been able to successfully reverse engineer it on x86,
> >> though it hasn't been easy.
> >>
> >> There were some particulars for arm64 which made doing so impossible.
> >> (I don't remember the details.)
>
> The main issue is that the tables for arm64 have more indirection than x86.

I wonder if PAC or BTI also make this slightly more complex?  PAC at
least has implications for unwinders, IIUC.

>
> On x86, the dispatching jump instruction fetches the target address from
> a contiguous array of addresses based on a given offset. So the list of
> potential targets of the jump is neatly organized in a table (and sure,
> before link time these are just relocation, but still processable).
>
> On arm64 (with GCC at least), what is stored in a table is an array of
> candidate offsets from the jump instruction. And because arm64 is
> limited to 32bit instructions, the encoding often requires multiple
> instructions to compute the target address:
>
> ldr<*>  x_offset, [x_offsets_table, x_index, ...]  // load offset
> adr     x_dest_base, <addr>          // load target branch for offset 0
> add     x_dest, x_target_base, x_offset, ...  // compute final address
> br      x_dest        // jump
>
> Where this gets trickier is that (with GCC) the offsets stored in the
> table might or might not be signed constants (and this can be seen in
> GCC intermediate representations, but I do not believe this information
> is output in the final object file). And on top of that, GCC might
> decide to use offsets that are seen as unsigned during intermediate
> representation as signed offset by sign extending them in the add
> instruction.
>
> So, to handle this we'd have to track the different operation done with
> the offset, from the load to the final jump, decoding the instructions
> and deducing the potential target instructions from the table of offsets.
>
> But that is error prone as we don't really know how many instructions
> can be between the ones doing the address computation, and I remember
> some messy case of a jump table inside a jump table where tracking the
> instruction touching one or the other offset would need a lot of corner
> case handling.
>
> And this of course is just for GCC, I haven't looked at what it all
> looks like on Clang's end.

Sure, but this is what production unwinders do, and they don't require
compiler plugins, right?  I don't doubt unwinders can be made simpler
with changes to toolchain output; please work with your compiler
vendor on making such changes rather than relying on compiler plugins
to do so.

> > I think the details are pertinent to finding a portable solution.  The
> > commit message of this commit in particular doesn't document such
> > details, such as why such an approach is necessary or how the data is
> > laid out for objtool to consume it.
> >
>
> Sorry, I will need to make that clearer. The next patch explains it a
> bit [1]
>
> Basically, for simplicity, the plugin creates a new section containing

Right, this takes a focus on simplicity, at the cost of alienating a toolchain.

Ard's point about 3193c0836f20 relating to -fgcse is that when
presented with tricky cases to unwind, the simplest approach is taken.
There it was disabling a compiler specific compiler optimization, here
it's either a compiler specific compiler plugin (or disabling another
compiler optimization).  The pattern seems to be "Objtool isn't smart
enough" ... "compiler optimization disabled" or "compiler plugin
dependency."

> tables (one per jump table) of references to the jump targets, similar
> to what x86 has, except that in this case this table isn't actually used
> by runtime code and is discarded at link time. I only chose this to
> minimize what needed to be changed in objtool and because the format
> seemed simple enough.
>
> But I'm open on some alternative, whether it's a -fjump-table-info

Yes, I think we could spec out something like that.  But I would
appreciate revisiting open questions around stack validation (frame
pointers), preventing the generation of jump tables to begin with
(-fno-jump-tables) in place of making objtool more robust, or
generally the need to depend on compiler plugins.

> option added to compilers with a different format to do the links. The
> important requirement is to be able to know all the candidate targets
> for a "br <reg>" instruction.
>
> [1] https://lkml.org/lkml/2021/1/20/910
>
> Thanks,
>
> --
> Julien Thierry
>
Josh Poimboeuf Feb. 2, 2021, 11:36 p.m. UTC | #11
On Tue, Feb 02, 2021 at 02:33:38PM -0800, Nick Desaulniers wrote:
> On Mon, Feb 1, 2021 at 4:02 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >
> > On Mon, Feb 01, 2021 at 03:17:40PM -0800, Nick Desaulniers wrote:
> > > On the earlier thread, Julien writes:
> > >
> > > >> I think most people interested in livepatching are using GCC built
> > > >> kernels, but I could be mistaken (althought in the long run, both
> > > >> compilers should be supported, and yes, I realize the objtool solution
> > > >> currently only would support GCC).
> > >
> > > Google's production kernels are using livepatching and are built with
> > > Clang.  Getting similar functionality working for arm64 would be of
> > > interest.
> >
> > Well, that's cool.  I had no idea.
> >
> > I'm curious how they're generating livepatch modules?  Because
> > kpatch-build doesn't support Clang (AFAIK), and if they're not using
> > kpatch-build then there are some traps to look out for.
> 
> Ok, I just met with a bunch of folks that are actively working on
> this.  Let me intro
> Yonghyun Hwang <yonghyun@google.com>
> Pete Swain <swine@google.com>
> who will be the folks on point for this from Google.

Nice to meet you all.  Adding the live-patching ML sub-thread.

> My understanding after some clarifications today is that Google is
> currently using a proprietary kernel patching mechanism that developed
> around a decade ago, "pre-ksplice Oracle acquisition."  But we are
> looking to transition to kpatch, and help towards supporting arm64.
> Live patching is important for deploying kernel fixes faster than
> predetermined scheduled draining of jobs in clusters.
> 
> The first steps for kpatch transition is supporting builds with Clang.
> Yonghyun is working on that and my hope is he will have patches for
> you for that soon.

That would be great!

> Curiously, the proprietary mechanism doesn't rely on stack validation.

If this proprietary mechanism relies on stack traces, that could
problematic.  Livepatch originally made the same assumption, but it was
shot down quickly:

  https://lwn.net/Articles/634649/
  https://lwn.net/Articles/658333/

> I think that such dependency was questioned on the cover letter
> patch's thread as well.

Yes, though it's generally agreed that unvalidated compiler-generated
unwinder metadata isn't going to be robust enough for kernel live
patching.

> Maybe there's "some traps to look out for" you're referring to there?

The "traps" are more about how the patches are generated.  If they're
built with source code, like a normal kernel module, you have to be
extra careful because of function ABI nastiness.  kpatch-build avoids
this problem.  Unfortunately this still isn't documented.

> I'm not privy to the details, though I would guess it has to do with
> ensuring kernel threads aren't executing (or planning to return
> through) code regions that are trying to be patched/unpatched.

Right.  There are some good details in
Documentation/livepatch/livepatch.rst.

> I am curious about frame pointers never being omitted for arm64; is
> frame pointer chasing is unreliable in certain contexts?

Yes, problematic areas are interrupts, exceptions, inline asm,
hand-coded asm.  A nice document was recently added in
Documentation/livepatch/reliable-stacktrace.rst which covers a lot of
this stuff.

> The internal functionality has been used heavily in production for
> almost a decade, though without it being public or supporting arm64;
> I'm not sure precisely how they solve such issues (or how others might
> review such an approach).

Very impressive to run it in production that long.  Their experience and
expertise is definitely welcome.

> Either way, the dependencies for live patching are less important, so
> long as they are toolchain portable.  The ability to live patch kernel
> images is ___important___ to Google.
> 
> > > Objtool support on arm64 is interesting to me though, because it has
> > > found bugs in LLVM codegen. That alone is extremely valuable.  But not
> > > it's not helpful if it's predicated or tightly coupled to GCC, as this
> > > series appears to do.
> >
> > I agree 100%, if there are actual Clang livepatch users (which it sounds
> > like there are) then we should target both compilers.
> 
> Or will be. (Sorry, I didn't know we hadn't completed the transition
> to kpatch yet.  It is "the opposite side of the house" from where I
> work; I literally have 8 bosses, not kidding).
> 
> Though if kpatch moves to requiring GCC plugins for architectures we
> use extensively or would like to use more of, that's probably going to
> throw a wrench in multiple transition plans.  (The fleet's transition
> to Clang is done, I'm not worried about that).

Hopefully we can just forget the GCC plugin idea.

It would be really nice to see some performance numbers for
-fno-jump-tables so we can justify doing that instead, at least in the
short-term.  I'd suspect the difference isn't measurable in the real
world.

(In the case of GCC+retpolines, it would be a performance improvement.)

> > And yes, objtool has been pretty good at finding compiler bugs, so the
> > more coverage the better.
> > > The idea of rebuilding control flow from binary analysis and using
> > > that to find codegen bugs is a really cool idea (novel, even? idk),
> > > and I wish we had some analog for userspace binaries that could
> > > perform similar checks.
> >
> > Objtool is generic in many ways -- in fact I recently heard from a PhD
> > candidate who used it successfully on another kernel for an ORC
> > unwinder.
> 
> That's pretty cool!  Reuse outside the initial context is always a
> good sign that something was designed right.

So basically you're saying objtool is both useful and well-designed.  I
will quote you on that!
Nick Desaulniers Feb. 2, 2021, 11:52 p.m. UTC | #12
On Tue, Feb 2, 2021 at 3:36 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>
> On Tue, Feb 02, 2021 at 02:33:38PM -0800, Nick Desaulniers wrote:
> > On Mon, Feb 1, 2021 at 4:02 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > >
> > > On Mon, Feb 01, 2021 at 03:17:40PM -0800, Nick Desaulniers wrote:
> > > And yes, objtool has been pretty good at finding compiler bugs, so the
> > > more coverage the better.
> > > > The idea of rebuilding control flow from binary analysis and using
> > > > that to find codegen bugs is a really cool idea (novel, even? idk),
> > > > and I wish we had some analog for userspace binaries that could
> > > > perform similar checks.
> > >
> > > Objtool is generic in many ways -- in fact I recently heard from a PhD
> > > candidate who used it successfully on another kernel for an ORC
> > > unwinder.
> >
> > That's pretty cool!  Reuse outside the initial context is always a
> > good sign that something was designed right.
>
> So basically you're saying objtool is both useful and well-designed.  I
> will quote you on that!

Haha, all I'm saying is that while I'm not proud that it did find bugs
in LLVM (and I do have existing bugs found by it to fix on my plate),
I don't see who else or how else those would have been spotted, and I
can appreciate that.  I think the tools given to us are broken (by
design, perhaps), so anything that can help us spot issues might help
our code live longer than we do.

I also think that there's room for improvement and experimentation in
debug info formats, though there is currently a proliferation to
support.  Live patching and eBPF seem to have some functional overlap
IIUC, strengths/weaknesses, and their own unique debug info formats to
go with it.  Supporting each one does require some level of toolchain
support or coordination (or complexity, even).
Josh Poimboeuf Feb. 3, 2021, 12:14 a.m. UTC | #13
On Tue, Feb 02, 2021 at 03:01:22PM -0800, Nick Desaulniers wrote:
> > >> Thus far we've been able to successfully reverse engineer it on x86,
> > >> though it hasn't been easy.
> > >>
> > >> There were some particulars for arm64 which made doing so impossible.
> > >> (I don't remember the details.)
> >
> > The main issue is that the tables for arm64 have more indirection than x86.
> 
> I wonder if PAC or BTI also make this slightly more complex?  PAC at
> least has implications for unwinders, IIUC.

What is PAC/BTI?

> > On x86, the dispatching jump instruction fetches the target address from
> > a contiguous array of addresses based on a given offset. So the list of
> > potential targets of the jump is neatly organized in a table (and sure,
> > before link time these are just relocation, but still processable).
> >
> > On arm64 (with GCC at least), what is stored in a table is an array of
> > candidate offsets from the jump instruction. And because arm64 is
> > limited to 32bit instructions, the encoding often requires multiple
> > instructions to compute the target address:
> >
> > ldr<*>  x_offset, [x_offsets_table, x_index, ...]  // load offset
> > adr     x_dest_base, <addr>          // load target branch for offset 0
> > add     x_dest, x_target_base, x_offset, ...  // compute final address
> > br      x_dest        // jump
> >
> > Where this gets trickier is that (with GCC) the offsets stored in the
> > table might or might not be signed constants (and this can be seen in
> > GCC intermediate representations, but I do not believe this information
> > is output in the final object file). And on top of that, GCC might
> > decide to use offsets that are seen as unsigned during intermediate
> > representation as signed offset by sign extending them in the add
> > instruction.
> >
> > So, to handle this we'd have to track the different operation done with
> > the offset, from the load to the final jump, decoding the instructions
> > and deducing the potential target instructions from the table of offsets.
> >
> > But that is error prone as we don't really know how many instructions
> > can be between the ones doing the address computation, and I remember
> > some messy case of a jump table inside a jump table where tracking the
> > instruction touching one or the other offset would need a lot of corner
> > case handling.
> >
> > And this of course is just for GCC, I haven't looked at what it all
> > looks like on Clang's end.
> 
> Sure, but this is what production unwinders do, and they don't require
> compiler plugins, right?

What do you mean by "production unwinders"?  Generally unwinders rely on
either frame pointers or DWARF, but (without validation) those aren't
robust enough for live patching in the kernel, so I'm not sure how this
is relevant.

> > > I think the details are pertinent to finding a portable solution.  The
> > > commit message of this commit in particular doesn't document such
> > > details, such as why such an approach is necessary or how the data is
> > > laid out for objtool to consume it.
> > >
> >
> > Sorry, I will need to make that clearer. The next patch explains it a
> > bit [1]
> >
> > Basically, for simplicity, the plugin creates a new section containing
> 
> Right, this takes a focus on simplicity, at the cost of alienating a toolchain.
> 
> Ard's point about 3193c0836f20 relating to -fgcse is that when
> presented with tricky cases to unwind, the simplest approach is taken.
> There it was disabling a compiler specific compiler optimization, here
> it's either a compiler specific compiler plugin (or disabling another
> compiler optimization).  The pattern seems to be "Objtool isn't smart
> enough" ... "compiler optimization disabled" or "compiler plugin
> dependency."

You're taking the two absolute worst case scenarios (one of which is
just a patch which doesn't look like it's going to get merged anyway)
and drawing a false narrative.

In this case the simplest approach would have been to just give up and
disable jump tables.

We try as hard as possible (beyond turning objtool into a full emulator)
to avoid doing that kind of thing because objtool isn't supposed to
dictate kernel optimizations.  Otherwise we would have disabled jump
tables (even for non-retpolines) a long time ago, because that's been a
serious PITA.

You might not like the plugin -- I don't like it either -- but the goal
was to avoid penalizing the kernel with "objtool-friendly"
optimizations.

That said, jump tables are such a pain for objtool (and currently
impossible to deal with for arm64) that I'm completely open to just
disabling them if they're shown to have negligible benefit for the
kernel.
Julien Thierry Feb. 3, 2021, 8:11 a.m. UTC | #14
On 2/3/21 12:01 AM, Nick Desaulniers wrote:
> On Tue, Feb 2, 2021 at 12:57 AM Julien Thierry <jthierry@redhat.com> wrote:
>>
>>
>>
>> On 2/2/21 12:17 AM, Nick Desaulniers wrote:
>>> On Mon, Feb 1, 2021 at 1:44 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>>>>
>>>> On Fri, Jan 29, 2021 at 10:10:01AM -0800, Nick Desaulniers wrote:
>>>>> On Wed, Jan 27, 2021 at 3:27 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>>>>>>
>>>>>> On Wed, Jan 27, 2021 at 02:15:57PM -0800, Nick Desaulniers wrote:
>>>>>>>> From: Raphael Gault <raphael.gault@arm.com>
>>>>>>>>
>>>>>>>> This plugins comes into play before the final 2 RTL passes of GCC and
>>>>>>>> detects switch-tables that are to be outputed in the ELF and writes
>>>>>>>> information in an ".discard.switch_table_info" section which will be
>>>>>>>> used by objtool.
>>>>>>>>
>>>>>>>> Signed-off-by: Raphael Gault <raphael.gault@arm.com>
>>>>>>>> [J.T.: Change section name to store switch table information,
>>>>>>>>          Make plugin Kconfig be selected rather than opt-in by user,
>>>>>>>>          Add a relocation in the switch_table_info that points to
>>>>>>>>          the jump operation itself]
>>>>>>>> Signed-off-by: Julien Thierry <jthierry@redhat.com>
>>>>>>>
>>>>>>> Rather than tightly couple this feature to a particular toolchain via
>>>>>>> plugin, it might be nice to consider what features could be spec'ed out
>>>>>>> for toolchains to implement (perhaps via a -f flag).
>>>>>>
>>>>>> The problem is being able to detect switch statement jump table vectors.
>>>>>>
>>>>>> For a given indirect branch (due to a switch statement), what are all
>>>>>> the corresponding jump targets?
>>>>>>
>>>>>> We would need the compiler to annotate that information somehow.
>>>>>
>>>>> Makes sense, the compiler should have this information.  How is this
>>>>> problem solved on x86?
>>>>
>>>> Thus far we've been able to successfully reverse engineer it on x86,
>>>> though it hasn't been easy.
>>>>
>>>> There were some particulars for arm64 which made doing so impossible.
>>>> (I don't remember the details.)
>>
>> The main issue is that the tables for arm64 have more indirection than x86.
> 
> I wonder if PAC or BTI also make this slightly more complex?  PAC at
> least has implications for unwinders, IIUC.
> 
>>
>> On x86, the dispatching jump instruction fetches the target address from
>> a contiguous array of addresses based on a given offset. So the list of
>> potential targets of the jump is neatly organized in a table (and sure,
>> before link time these are just relocation, but still processable).
>>
>> On arm64 (with GCC at least), what is stored in a table is an array of
>> candidate offsets from the jump instruction. And because arm64 is
>> limited to 32bit instructions, the encoding often requires multiple
>> instructions to compute the target address:
>>
>> ldr<*>  x_offset, [x_offsets_table, x_index, ...]  // load offset
>> adr     x_dest_base, <addr>          // load target branch for offset 0
>> add     x_dest, x_target_base, x_offset, ...  // compute final address
>> br      x_dest        // jump
>>
>> Where this gets trickier is that (with GCC) the offsets stored in the
>> table might or might not be signed constants (and this can be seen in
>> GCC intermediate representations, but I do not believe this information
>> is output in the final object file). And on top of that, GCC might
>> decide to use offsets that are seen as unsigned during intermediate
>> representation as signed offset by sign extending them in the add
>> instruction.
>>
>> So, to handle this we'd have to track the different operation done with
>> the offset, from the load to the final jump, decoding the instructions
>> and deducing the potential target instructions from the table of offsets.
>>
>> But that is error prone as we don't really know how many instructions
>> can be between the ones doing the address computation, and I remember
>> some messy case of a jump table inside a jump table where tracking the
>> instruction touching one or the other offset would need a lot of corner
>> case handling.
>>
>> And this of course is just for GCC, I haven't looked at what it all
>> looks like on Clang's end.
> 
> Sure, but this is what production unwinders do, and they don't require
> compiler plugins, right?  I don't doubt unwinders can be made simpler
> with changes to toolchain output; please work with your compiler
> vendor on making such changes rather than relying on compiler plugins
> to do so.
> 

I think there is a small confusion. The plugin nor the data it generates 
is not to be used by a kernel unwinder. It's here to allow objtool to 
assess whether the code being checked can be unwound (?) reliably (not 
omitting functions). Part of this is checking that a branch/jump in a 
function does not end up in some code that is not related to the 
function without setting up a call frame.

This is about static validation rather than functionality.

>>> I think the details are pertinent to finding a portable solution.  The
>>> commit message of this commit in particular doesn't document such
>>> details, such as why such an approach is necessary or how the data is
>>> laid out for objtool to consume it.
>>>
>>
>> Sorry, I will need to make that clearer. The next patch explains it a
>> bit [1]
>>
>> Basically, for simplicity, the plugin creates a new section containing
> 
> Right, this takes a focus on simplicity, at the cost of alienating a toolchain.
> 
> Ard's point about 3193c0836f20 relating to -fgcse is that when
> presented with tricky cases to unwind, the simplest approach is taken.
> There it was disabling a compiler specific compiler optimization, here
> it's either a compiler specific compiler plugin (or disabling another
> compiler optimization).  The pattern seems to be "Objtool isn't smart
> enough" ... "compiler optimization disabled" or "compiler plugin
> dependency."
> 
>> tables (one per jump table) of references to the jump targets, similar
>> to what x86 has, except that in this case this table isn't actually used
>> by runtime code and is discarded at link time. I only chose this to
>> minimize what needed to be changed in objtool and because the format
>> seemed simple enough.
>>
>> But I'm open on some alternative, whether it's a -fjump-table-info
> 
> Yes, I think we could spec out something like that.  But I would
> appreciate revisiting open questions around stack validation (frame
> pointers), preventing the generation of jump tables to begin with
> (-fno-jump-tables) in place of making objtool more robust, or
> generally the need to depend on compiler plugins.
> 

I'll give it a try at least for the arm64 side.

Thanks,
Peter Zijlstra Feb. 3, 2021, 11:57 a.m. UTC | #15
On Tue, Feb 02, 2021 at 06:14:14PM -0600, Josh Poimboeuf wrote:
> > Sure, but this is what production unwinders do, and they don't require
> > compiler plugins, right?
> 
> What do you mean by "production unwinders"?  Generally unwinders rely on
> either frame pointers or DWARF, but (without validation) those aren't
> robust enough for live patching in the kernel, so I'm not sure how this
> is relevant.

Not to mention that DWARF and consequently it's unders are horribly
large, complex and above all fragile things.

There's a reason ORC got invented, DWARF is simlpy unacceptable and
inadequate.

Now, one avenue that has been mentioned in the past, but I've not seen
recently, is to have objtool use DWARF as input to help it understand
the code. At least in userspace we can rely on DWARF libs. But I'm
fairly sure people aren't jumping up and down for having to always build
their kernel with DWARFs on, compile speed etc..
Mark Brown Feb. 3, 2021, 1:04 p.m. UTC | #16
On Tue, Feb 02, 2021 at 06:14:14PM -0600, Josh Poimboeuf wrote:
> On Tue, Feb 02, 2021 at 03:01:22PM -0800, Nick Desaulniers wrote:

> > I wonder if PAC or BTI also make this slightly more complex?  PAC at
> > least has implications for unwinders, IIUC.

> What is PAC/BTI?

PAC and BTI are ARM architecture extensions.  PAC uses a tag in pointers
to sign and verify them, presenting a barrier to ROP, and when BTI is
active only specific instructions can be branched to.  Since PAC
modifies pointers when it is active the unwinder has to undo the tagging
to understand what's being pointed to, that's already there.
Mark Rutland Feb. 3, 2021, 1:58 p.m. UTC | #17
On Tue, Feb 02, 2021 at 06:14:14PM -0600, Josh Poimboeuf wrote:
> On Tue, Feb 02, 2021 at 03:01:22PM -0800, Nick Desaulniers wrote:
> > > >> Thus far we've been able to successfully reverse engineer it on x86,
> > > >> though it hasn't been easy.
> > > >>
> > > >> There were some particulars for arm64 which made doing so impossible.
> > > >> (I don't remember the details.)
> > >
> > > The main issue is that the tables for arm64 have more indirection than x86.
> > 
> > I wonder if PAC or BTI also make this slightly more complex?  PAC at
> > least has implications for unwinders, IIUC.
> 
> What is PAC/BTI?

PAC is "Pointer Authentication Codes". The gist is that we munge some
bits in pointers when they get stored in memory (called "signing"), and
undo that with a check (called "authentication") when reading from
memory, in order to detect unexpected modification. There's some new
instructions that may exist in function prologues and epilogues, etc.

There's a basic introduction at:

https://events.static.linuxfound.org/sites/events/files/slides/slides_23.pdf
https://www.kernel.org/doc/html/latest/arm64/pointer-authentication.html

Return address signing/authentication uses the SP as an input, so
without knowing the SP something was signed against it's not possible to
alter it reliably (or to check it). The arm64 unwinder ignores the PAC
bits, and ftrace uses patchable-function-entry so that we don't have to
do anything special to manipulate the return address.

Today the ABI used by the kernel doesn't mess with the pointers used in
jump tables, but that may come in future as toolchain folk are working
to define an ABI that might.

BTI is "Branch Target Identification", which is a bit like CET's
indirect branch tracking -- indirect branches need to land on a specific
instruction, or they'll raise an exception.

Thanks,
Mark.
Daniel Kiss Feb. 9, 2021, 4:30 p.m. UTC | #18
> On 3 Feb 2021, at 00:01, Nick Desaulniers <ndesaulniers@google.com> wrote:
> 
> I wonder if PAC or BTI also make this slightly more complex?  PAC at
> least has implications for unwinders, IIUC.

BTI has effect on the jump target\case because a landing pad is required for each and every target.
As I see that would not hurt here.

The unwinder shall take care of the PAC signed return address if reads the LR from the stack.
DWARF contains an entry where the LR is got signed\authenticated.

I’m wondering would be simpler or possible to transform the DWARD to ORC.
diff mbox series

Patch

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 05e17351e4f3..93a320cc8e03 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -100,6 +100,7 @@  config ARM64
 	select DMA_DIRECT_REMAP
 	select EDAC_SUPPORT
 	select FRAME_POINTER
+	select GCC_PLUGIN_SWITCH_TABLES if STACK_VALIDATION
 	select GENERIC_ALLOCATOR
 	select GENERIC_ARCH_TOPOLOGY
 	select GENERIC_CLOCKEVENTS_BROADCAST
diff --git a/scripts/Makefile.gcc-plugins b/scripts/Makefile.gcc-plugins
index 952e46876329..8af322311f6b 100644
--- a/scripts/Makefile.gcc-plugins
+++ b/scripts/Makefile.gcc-plugins
@@ -46,6 +46,8 @@  ifdef CONFIG_GCC_PLUGIN_ARM_SSP_PER_TASK
 endif
 export DISABLE_ARM_SSP_PER_TASK_PLUGIN
 
+gcc-plugin-$(CONFIG_GCC_PLUGIN_SWITCH_TABLES)	+= arm64_switch_table_detection_plugin.so
+
 # All the plugin CFLAGS are collected here in case a build target needs to
 # filter them out of the KBUILD_CFLAGS.
 GCC_PLUGINS_CFLAGS := $(strip $(addprefix -fplugin=$(objtree)/scripts/gcc-plugins/, $(gcc-plugin-y)) $(gcc-plugin-cflags-y))
diff --git a/scripts/gcc-plugins/Kconfig b/scripts/gcc-plugins/Kconfig
index ab9eb4cbe33a..76efbb97d223 100644
--- a/scripts/gcc-plugins/Kconfig
+++ b/scripts/gcc-plugins/Kconfig
@@ -104,4 +104,8 @@  config GCC_PLUGIN_ARM_SSP_PER_TASK
 	bool
 	depends on GCC_PLUGINS && ARM
 
+config GCC_PLUGIN_SWITCH_TABLES
+	bool
+	depends on GCC_PLUGINS && ARM64
+
 endif
diff --git a/scripts/gcc-plugins/arm64_switch_table_detection_plugin.c b/scripts/gcc-plugins/arm64_switch_table_detection_plugin.c
new file mode 100644
index 000000000000..60ef00ff2c5b
--- /dev/null
+++ b/scripts/gcc-plugins/arm64_switch_table_detection_plugin.c
@@ -0,0 +1,85 @@ 
+// SPDX-License-Identifier: GPL-2.0
+
+#include <stdio.h>
+#include "gcc-common.h"
+
+__visible int plugin_is_GPL_compatible;
+
+#define GEN_QUAD(rtx)	assemble_integer_with_op(".quad ", rtx)
+
+/*
+ * Create an array of metadata for each jump table found in the rtl.
+ * The metadata contains:
+ * - A reference to first instruction part of the RTL expanded into an
+ *   acutal jump
+ * - The number of entries in the table of offsets
+ * - A reference to each possible jump target
+ *
+ * Separate each entry with a null quad word.
+ */
+static unsigned int arm64_switchtbl_rtl_execute(void)
+{
+	rtx_insn *insn;
+	rtx_insn *labelp = NULL;
+	rtx_jump_table_data *tablep = NULL;
+	section *swt_sec;
+	section *curr_sec = current_function_section();
+
+	swt_sec = get_section(".discard.switch_table_info",
+			      SECTION_DEBUG | SECTION_EXCLUDE, NULL);
+
+	for (insn = get_insns(); insn; insn = NEXT_INSN(insn)) {
+		/*
+		 * Find a tablejump_p INSN (using a dispatch table)
+		 */
+		if (!tablejump_p(insn, &labelp, &tablep))
+			continue;
+
+		if (labelp && tablep) {
+			rtx_code_label *label_to_jump;
+			rtvec jump_labels = tablep->get_labels();
+			int nr_labels = GET_NUM_ELEM(jump_labels);
+			int i;
+
+			label_to_jump = gen_label_rtx();
+			SET_LABEL_KIND(label_to_jump, LABEL_NORMAL);
+			emit_label_before(label_to_jump, insn);
+			LABEL_PRESERVE_P(label_to_jump) = 1;
+
+			switch_to_section(swt_sec);
+			GEN_QUAD(GEN_INT(0)); // mark separation between rela tables
+			GEN_QUAD(gen_rtx_LABEL_REF(Pmode, label_to_jump));
+			GEN_QUAD(GEN_INT(nr_labels));
+			for (i = 0; i < nr_labels; i++)
+				GEN_QUAD(gen_rtx_LABEL_REF(Pmode,
+							   label_ref_label(RTVEC_ELT(jump_labels, i))));
+			switch_to_section(curr_sec);
+			delete_insn(label_to_jump);
+		}
+	}
+	return 0;
+}
+
+#define PASS_NAME arm64_switchtbl_rtl
+
+#define NO_GATE
+#include "gcc-generate-rtl-pass.h"
+
+__visible int plugin_init(struct plugin_name_args *plugin_info,
+			  struct plugin_gcc_version *version)
+{
+	const char * const plugin_name = plugin_info->base_name;
+
+	if (!plugin_default_version_check(version, &gcc_version)) {
+		error(G_("incompatible gcc/plugin versions"));
+		return 1;
+	}
+
+	PASS_INFO(arm64_switchtbl_rtl, "final", 1,
+		  PASS_POS_INSERT_BEFORE);
+
+	register_callback(plugin_info->base_name, PLUGIN_PASS_MANAGER_SETUP,
+			  NULL, &arm64_switchtbl_rtl_pass_info);
+
+	return 0;
+}