mbox series

[RFC,00/11] Kernel FineIBT Support

Message ID 20220420004241.2093-1-joao@overdrivepizza.com (mailing list archive)
Headers show
Series Kernel FineIBT Support | expand

Message

Joao Moreira April 20, 2022, 12:42 a.m. UTC
From: Joao Moreira <joao@overdrivepizza.com>

Disclaimer: This is all in a very early/poc stage and is mostly research work
-- be advised to proceed with care and to bring a towel with you.

This patch series enables FineIBT in the kernel. FineIBT is a compiler-enhanced
forward-edge CFI scheme built on top of Intel's CET-IBT that works by setting a
hash on the caller side which is then checked at the callee side. Because IBT
requires indirect branches to land on ENDBR instructions, these hash checks
shouldn't be bypassable on the occasion of function pointer corruption. More
details on the general FineIBT design are here [1, 2].

When compared to IBT itself, FineIBT imposes a more restrictive policy that
should be more robust against control-flow hijacking attacks. When compared to
schemes like KCFI [3], it has the benefit of not depending on memory reads
(which not only might be more efficient in terms of performance and power but
also makes it compatible with XOM [4]) and brings in the benefits of IBT
regarding speculative execution hardening.

A debatable point is the fact that on FineIBT the checks are made on the callee
side. On a quick look, this seems to be cool because it allows strict
reachability refinement of more security-critical functions (like hardware
feature disabling ones) while still allowing other less critical functions to be
relaxed/coarse-grained; under caller-side checks, if one single function is
required to be relaxed, this leads into an indirect call instruction being
relaxed, then becoming a branch capable of reaching all the functions in the
executable address space, including those considered security-critical. Inputs
and opinions on this are very welcome, as there are other perspectives about
this I might be missing.

This series relies heavily on the recently upstreamed IBT support and also
respins some sorcery proposed by Peter Zijlstra in his IBT v2 series [5]. A huge
part of these is a repurpose of work originally cast by Peter.

The FineIBT enablement uses a modified clang version to emit code with the
FineIBT hash set/check operations. The modified clang is available here [6]. The
.config used for building and testing is available here [7] along with more or
less general instructions on how to build it. A tree with this series on top is
available here [8].

Key insights:
- On IBT v2, Peter proposed an optimization that uses objtool to add an offset
  to operands of direct branches targeting ENDBR instructions, skipping the need
to fetch/decode these. With FineIBT, skipping ENDBRs+hash checks is not only
desirable but needed, as a way to prevent direct calls from being considered a
violation whenever they reach a function without priorly setting the respective
hash. This series respins the approach and uses objtool to fix direct branches
targeting FineIBT hash check operations. Fixing this in objtool instead of using
the compiler is convenient because then it becomes easy to mix
FineIBT-instrumented code with binaries only augmented with regular ENDBRs (such
as currently-existing assembly).
- The same approach (identifying FineIBT hash check sequences and fixing direct
  branch targets) is also used dynamically to support module loading (fix the
relocations), text-patching (support static calls), and on BPF (support jitted
calls to core functions).
- When a direct branch into a FineIBT hash check cannot be fixed (because it is
  a short jump targeting an instruction which, once incremented with the needed
offset, becomes unreachable) the respective functionality patches the FineIBT
sequence with nops, making it a valid target that is still constrained by IBT.
- This series also fixes unmatching prototypes of certain indirect calls that
  will trigger violations on any prototype-based CFI scheme.
- Also adds test modules for both IBT and FineIBT.
- Also adds coarsecf_check attributes to specific functions, making the compiler
  emit them with regular ENDBRs instead of the FineIBT hash check. This is
useful because certain functions are called from assembly and we currently don't
have a sane scheme to set hashes in all of these (although we do it in one more
relevant spot).
- In the occasion of violations, the hash check invokes a __fineibt_handler,
  which is a function that makes it easier to debug unmatching prototypes and
such. It can be easily switched to an ud2 instruction or anything like that.
- In my tests, the above-mentioned .config runs this series smoothly, without
  any false-positive violations.

Some obvious possible improvements:
- The support should identify FineIBT sequences based on annotations, not on
  parsing the actual instructions. This would make things less hacky and more
reliable.
- Assembly coverage must be improved eventually.
- The FineIBT hash check operation can have its length reduced by replacing the
  inlined check with a call to a checker.

@PeterZ @JoshP

I'm a bit unaware of the details on why the objtool approach to bypass ENDBRs
was removed from the IBT series. Is this approach now sensible considering that
it is a requirement for a new/enhanced feature? If not, how extending the Linker
to emit already fixed offsets sounds like?

@Kees

I'm considering detaching the prototype fixes from this series and reworking
them to submit actual fixes (patches 10 and 11). Any specific suggestions for
these specific patches? Maybe you want to take a look and help in co-authorship
as we did with the void*-in-x86-crypto patches in the past? I guess these are
useful for whatever CFI scheme is in place.

@all

Any other major concerns, ideas, or suggestions? :)

Refs:

[1] - FineIBT:
https://www.openwall.com/lists/kernel-hardening/2021/02/11/1

[2] - FineIBT on Linux Security Summit:
https://static.sched.com/hosted_files/lssna2021/8f/LSS_FINEIBT_JOAOMOREIRA.pdf

[3] - KCFI Clang Patches:
https://reviews.llvm.org/D119296/new/

[4] - eXecute-Only Memory:
https://lpc.events/event/4/contributions/283/attachments/357/588/Touch_but_dont_look__Running_the_kernel_in_execute_only_memory-presented.pdf

[5] - IBT Patches v2:
https://lore.kernel.org/lkml/20220224145138.952963315@infradead.org/

[6] - FineIBT-capable Clang:
https://github.com/lvwr/llvm-project/tree/fineibt/kernel

[7] - Kernel .config and dummy stuff:
https://github.com/lvwr/kfineibt_testing

[8] - Linux + FineIBT:
https://github.com/lvwr/linux/tree/x86/fineibt

Joao Moreira (11):
  x86: kernel FineIBT
  kbuild: Support FineIBT build
  objtool: Support FineIBT offset fixes
  x86/module: Support FineIBT in modules
  x86/text-patching: Support FineIBT text-patching
  x86/bpf: Support FineIBT
  x86/lib: Prevent UACCESS call warning from objtool
  x86/ibt: Add CET_TEST module for IBT testing
  x86/FineIBT: Add FINEIBT_TEST module
  linux/interrupt: Fix prototype matching property
  driver/int3400_thermal: Fix prototype matching

 arch/x86/Kconfig                              |  10 +
 arch/x86/Kconfig.debug                        |  10 +
 arch/x86/Makefile                             |   3 +
 arch/x86/entry/entry_64.S                     |   1 +
 arch/x86/entry/vdso/Makefile                  |   5 +
 arch/x86/include/asm/ibt.h                    |  16 ++
 arch/x86/include/asm/setup.h                  |  12 +-
 arch/x86/include/asm/special_insns.h          |   4 +-
 arch/x86/include/asm/text-patching.h          |  92 ++++++++-
 arch/x86/kernel/Makefile                      |   3 +
 arch/x86/kernel/cet_test.c                    |  30 +++
 arch/x86/kernel/cpu/common.c                  |   2 +-
 arch/x86/kernel/fineibt.c                     | 123 ++++++++++++
 arch/x86/kernel/fineibt_test.c                |  39 ++++
 arch/x86/kernel/head64.c                      |  12 +-
 arch/x86/kernel/module.c                      |  45 ++++-
 arch/x86/kernel/smpboot.c                     |   2 +-
 arch/x86/lib/copy_mc.c                        |   2 +-
 arch/x86/net/bpf_jit_comp.c                   |  31 +++
 arch/x86/purgatory/Makefile                   |   4 +
 .../intel/int340x_thermal/int3400_thermal.c   |  10 +-
 include/linux/interrupt.h                     |   6 +-
 scripts/Makefile.build                        |   1 +
 scripts/link-vmlinux.sh                       |   8 +
 tools/objtool/arch/x86/decode.c               | 175 +++++++++++++----
 tools/objtool/arch/x86/include/arch/special.h |   2 +
 tools/objtool/builtin-check.c                 |   3 +-
 tools/objtool/check.c                         | 183 +++++++++++++++++-
 tools/objtool/include/objtool/arch.h          |   3 +
 tools/objtool/include/objtool/builtin.h       |   2 +-
 30 files changed, 767 insertions(+), 72 deletions(-)
 create mode 100644 arch/x86/kernel/cet_test.c
 create mode 100644 arch/x86/kernel/fineibt.c
 create mode 100644 arch/x86/kernel/fineibt_test.c

Comments

Kees Cook April 20, 2022, 2:42 a.m. UTC | #1
Hi!

On Tue, Apr 19, 2022 at 05:42:30PM -0700, joao@overdrivepizza.com wrote:
> I'm considering detaching the prototype fixes from this series and reworking
> them to submit actual fixes (patches 10 and 11). Any specific suggestions for
> these specific patches? Maybe you want to take a look and help in co-authorship
> as we did with the void*-in-x86-crypto patches in the past? I guess these are
> useful for whatever CFI scheme is in place.

Yeah, if 10 and 11 are general prototype-based fixes, let's get them in.
I would expect regular CFI and kCFI to trip over those too. I'll comment
on those patches directly.

> Any other major concerns, ideas, or suggestions? :)

I think it'd be good to get kCFI landed in Clang first (since it is
effectively architecture agnostic), and then get FineIBT landed. But
that doesn't mean we can't be working on the kernel side of things at
the same time.

And just thinking generally, for other architecture-specific stuff,
I do wonder what an arm64 PAC-based CFI might look like. I prefer things
be hard-coded as kCFI is doing, but it'd be nice to be able to directly
measure performance and size overheads comparing the various methods.
Peter Zijlstra April 20, 2022, 7:40 a.m. UTC | #2
On Tue, Apr 19, 2022 at 05:42:30PM -0700, joao@overdrivepizza.com wrote:
> @PeterZ @JoshP
> 
> I'm a bit unaware of the details on why the objtool approach to bypass ENDBRs
> was removed from the IBT series. Is this approach now sensible considering that
> it is a requirement for a new/enhanced feature? If not, how extending the Linker
> to emit already fixed offsets sounds like?

Josh hates objtool modifying actualy code. He much prefers objtool only
emits out of band data.

Now, I did sneak in that jump_label nop'ing, and necessity (broken
compilers) had us do the KCOV nop'ing in noinstr, but if you look at the
recent objtool series here:

  https://lkml.kernel.org/r/cover.1650300597.git.jpoimboe@redhat.com

you'll see his thoughs on that :-)

Now, I obviously don't mind, it's easy enough to figure out what objtool
actually does with something like:

  $ OBJTOOL_ARGS="--backup" make O=ibt-build/ -j$lots vmlinux
  $ objdiff.sh ibt-build/vmlinux.o

Where objdiff.sh is the below crummy script.

Now, one compromise that I did get out of Josh was that he objected less
to rewriting relocations than to rewriting the immediates. From my
testing the relocations got us the vast majority of direct call sites,
very few are immediates.

Josh, any way you might reconsider all that? :-)

---
#!/bin/bash

name=$1
pre=${name}.orig
post=${name}

function to_text {
	obj=$1
	( objdump -wdr $obj;
	  readelf -W --relocs --symbols $obj |
		  awk '/^Relocation section/ { $6=0 } { print $0 }'
	) > ${obj}.tmp
}

to_text $pre
to_text $post

diff -u ${pre}.tmp ${post}.tmp

rm ${pre}.tmp ${post}.tmp
Josh Poimboeuf April 20, 2022, 3:17 p.m. UTC | #3
On Wed, Apr 20, 2022 at 09:40:44AM +0200, Peter Zijlstra wrote:
> On Tue, Apr 19, 2022 at 05:42:30PM -0700, joao@overdrivepizza.com wrote:
> > @PeterZ @JoshP
> > 
> > I'm a bit unaware of the details on why the objtool approach to bypass ENDBRs
> > was removed from the IBT series. Is this approach now sensible considering that
> > it is a requirement for a new/enhanced feature? If not, how extending the Linker
> > to emit already fixed offsets sounds like?
> 
> Josh hates objtool modifying actualy code. He much prefers objtool only
> emits out of band data.
> 
> Now, I did sneak in that jump_label nop'ing, and necessity (broken
> compilers) had us do the KCOV nop'ing in noinstr, but if you look at the
> recent objtool series here:
> 
>   https://lkml.kernel.org/r/cover.1650300597.git.jpoimboe@redhat.com
> 
> you'll see his thoughs on that :-)
> 
> Now, I obviously don't mind, it's easy enough to figure out what objtool
> actually does with something like:
> 
>   $ OBJTOOL_ARGS="--backup" make O=ibt-build/ -j$lots vmlinux
>   $ objdiff.sh ibt-build/vmlinux.o
> 
> Where objdiff.sh is the below crummy script.
> 
> Now, one compromise that I did get out of Josh was that he objected less
> to rewriting relocations than to rewriting the immediates. From my
> testing the relocations got us the vast majority of direct call sites,
> very few are immediates.
> 
> Josh, any way you might reconsider all that? :-)

If I remember correctly, the goal of --ibt-fix-direct was to avoid
hitting unnecessary ENDBRs, which basically decode to NOPs, so the
ghastly hack wasn't worth it.

If FineIBT needs it, I could reconsider.  But I think there's a strong
case to be made that the linker should be doing that instead.
Nick Desaulniers April 20, 2022, 5:12 p.m. UTC | #4
On Wed, Apr 20, 2022 at 8:17 AM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>
> On Wed, Apr 20, 2022 at 09:40:44AM +0200, Peter Zijlstra wrote:
> > On Tue, Apr 19, 2022 at 05:42:30PM -0700, joao@overdrivepizza.com wrote:
> > > @PeterZ @JoshP
> > >
> > > I'm a bit unaware of the details on why the objtool approach to bypass ENDBRs
> > > was removed from the IBT series. Is this approach now sensible considering that
> > > it is a requirement for a new/enhanced feature? If not, how extending the Linker
> > > to emit already fixed offsets sounds like?
> >
> > Josh hates objtool modifying actualy code. He much prefers objtool only
> > emits out of band data.
> >
> > Now, I did sneak in that jump_label nop'ing, and necessity (broken
> > compilers) had us do the KCOV nop'ing in noinstr, but if you look at the
> > recent objtool series here:
> >
> >   https://lkml.kernel.org/r/cover.1650300597.git.jpoimboe@redhat.com
> >
> > you'll see his thoughs on that :-)
> >
> > Now, I obviously don't mind, it's easy enough to figure out what objtool
> > actually does with something like:
> >
> >   $ OBJTOOL_ARGS="--backup" make O=ibt-build/ -j$lots vmlinux
> >   $ objdiff.sh ibt-build/vmlinux.o
> >
> > Where objdiff.sh is the below crummy script.
> >
> > Now, one compromise that I did get out of Josh was that he objected less
> > to rewriting relocations than to rewriting the immediates. From my
> > testing the relocations got us the vast majority of direct call sites,
> > very few are immediates.
> >
> > Josh, any way you might reconsider all that? :-)
>
> If I remember correctly, the goal of --ibt-fix-direct was to avoid
> hitting unnecessary ENDBRs, which basically decode to NOPs, so the
> ghastly hack wasn't worth it.
>
> If FineIBT needs it, I could reconsider.  But I think there's a strong
> case to be made that the linker should be doing that instead.

That sounds reasonable to me (and reminds me of linker relaxation).
Joao, can you please work with Fangrui (LLD) and HJ (GNU binutils) to
determine how feasible this would be? I assume code outside the kernel
might enjoy such an optimization, too.  When that's the case, then it
probably makes more sense to "upstream" such "optimizations" from the
kernel-specific objtool into the toolchains.
Joao Moreira April 20, 2022, 10:40 p.m. UTC | #5
>> 
>> If FineIBT needs it, I could reconsider.  But I think there's a strong
>> case to be made that the linker should be doing that instead.
> 
> That sounds reasonable to me (and reminds me of linker relaxation).
> Joao, can you please work with Fangrui (LLD) and HJ (GNU binutils) to
> determine how feasible this would be? I assume code outside the kernel
> might enjoy such an optimization, too.  When that's the case, then it
> probably makes more sense to "upstream" such "optimizations" from the
> kernel-specific objtool into the toolchains.

Alright, these are the greenlights I was hoping for.

I went quickly into this with HJ and he mentioned that it should be 
doable in the linker, and that he has a patch for it in gcc (for local 
function, from what I could see): 
https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590832.html

If @Fangrui is fine with it, I would like to try implementing this 
myself in lld (I'm still learning a lot about lld and having an actual 
problem to solve is the kind of fuel I need). Should take me a while, 
but I think this is not urgent, right? I can also go ahead and replicate 
HJ's gcc patch into clang, so we can also handle the local functions 
within the compiler (I think this makes a lot of sense).

Once we have these in, I'll revisit FineIBT and extend the features to 
handle the FineIBT instrumentation. Hopefully we'll be released from 
needing objtool (famous last words?!).

This sounds like a plan, but I'm ofc open to suggestions or different 
ideas/plans.

Tks,
Joao
Joao Moreira April 20, 2022, 10:50 p.m. UTC | #6
> I think it'd be good to get kCFI landed in Clang first (since it is
> effectively architecture agnostic), and then get FineIBT landed. But
> that doesn't mean we can't be working on the kernel side of things at
> the same time.

FWIIW, I'm effectively taking some time away from work for the next 3 
months. I'll be around to answer this and that, help reviewing KCFI and 
maybe send small fixes around, but I'm not planning to land FineIBT in 
clang anytime before that (specially now that I have a direction to look 
into the linker approach as per the other thread e-mails). This should 
give KCFI the time it needs to squeeze in.

> 
> And just thinking generally, for other architecture-specific stuff,
> I do wonder what an arm64 PAC-based CFI might look like. I prefer 
> things
> be hard-coded as kCFI is doing, but it'd be nice to be able to directly
> measure performance and size overheads comparing the various methods.

There are other important bullets to this list, I think, like power 
consumption, robustness and collateral gains (like IBT's side-channel 
hardening). But yeah, this is probably a good list to keep in mind for 
us to discuss during plumbers :)
Edgecombe, Rick P April 20, 2022, 11:34 p.m. UTC | #7
On Tue, 2022-04-19 at 17:42 -0700, joao@overdrivepizza.com wrote:
> A debatable point is the fact that on FineIBT the checks are made on
> the callee
> side. On a quick look, this seems to be cool because it allows strict
> reachability refinement of more security-critical functions (like
> hardware
> feature disabling ones) while still allowing other less critical
> functions to be
> relaxed/coarse-grained; under caller-side checks, if one single
> function is
> required to be relaxed, this leads into an indirect call instruction
> being
> relaxed, then becoming a branch capable of reaching all the functions
> in the
> executable address space, including those considered security-
> critical. Inputs
> and opinions on this are very welcome, as there are other
> perspectives about
> this I might be missing.

One minor point: In the course IBT implementation there are places in
the kernel where IBT is toggled because of missing endbranches (calling
into BIOS). So for caller checked solutions, these calls would probably
need to be annotated or something such that caller checks were not
generated.

I haven't been following kCFI, so apologies if this is already handled
somehow.
Peter Zijlstra April 21, 2022, 7:49 a.m. UTC | #8
On Wed, Apr 20, 2022 at 03:40:41PM -0700, Joao Moreira wrote:
> > > 
> > > If FineIBT needs it, I could reconsider.  But I think there's a strong
> > > case to be made that the linker should be doing that instead.
> > 
> > That sounds reasonable to me (and reminds me of linker relaxation).
> > Joao, can you please work with Fangrui (LLD) and HJ (GNU binutils) to
> > determine how feasible this would be? I assume code outside the kernel
> > might enjoy such an optimization, too.  When that's the case, then it
> > probably makes more sense to "upstream" such "optimizations" from the
> > kernel-specific objtool into the toolchains.
> 
> Alright, these are the greenlights I was hoping for.
> 
> I went quickly into this with HJ and he mentioned that it should be doable
> in the linker, and that he has a patch for it in gcc (for local function,
> from what I could see):
> https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590832.html
> 
> If @Fangrui is fine with it, I would like to try implementing this myself in
> lld (I'm still learning a lot about lld and having an actual problem to
> solve is the kind of fuel I need). Should take me a while, but I think this
> is not urgent, right? I can also go ahead and replicate HJ's gcc patch into
> clang, so we can also handle the local functions within the compiler (I
> think this makes a lot of sense).
> 
> Once we have these in, I'll revisit FineIBT and extend the features to
> handle the FineIBT instrumentation. Hopefully we'll be released from needing
> objtool (famous last words?!).
> 
> This sounds like a plan, but I'm ofc open to suggestions or different
> ideas/plans.

So trivially the plan sounds like: compiler fixes STB_LOCAL because it
has the scope, and the linker fixes everything else. However, that seems
to assume that !STB_LOCAL will have ENDBR.

This latter isn't true; for one there's __attribute__((nocf_check)) that
can be used to suppress ENDBR generation on a function.

Alternatively the linker will need to 'read' the function to determine
if it has ENDBR, or we need to augment the ELF format such that we can
tell from that.

So what exactly is the plan?
Joao Moreira April 21, 2022, 3:23 p.m. UTC | #9
On 2022-04-21 00:49, Peter Zijlstra wrote:
> On Wed, Apr 20, 2022 at 03:40:41PM -0700, Joao Moreira wrote:
>> > >
>> > > If FineIBT needs it, I could reconsider.  But I think there's a strong
>> > > case to be made that the linker should be doing that instead.
>> >
>> > That sounds reasonable to me (and reminds me of linker relaxation).
>> > Joao, can you please work with Fangrui (LLD) and HJ (GNU binutils) to
>> > determine how feasible this would be? I assume code outside the kernel
>> > might enjoy such an optimization, too.  When that's the case, then it
>> > probably makes more sense to "upstream" such "optimizations" from the
>> > kernel-specific objtool into the toolchains.
>> 
>> Alright, these are the greenlights I was hoping for.
>> 
>> I went quickly into this with HJ and he mentioned that it should be 
>> doable
>> in the linker, and that he has a patch for it in gcc (for local 
>> function,
>> from what I could see):
>> https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590832.html
>> 
>> If @Fangrui is fine with it, I would like to try implementing this 
>> myself in
>> lld (I'm still learning a lot about lld and having an actual problem 
>> to
>> solve is the kind of fuel I need). Should take me a while, but I think 
>> this
>> is not urgent, right? I can also go ahead and replicate HJ's gcc patch 
>> into
>> clang, so we can also handle the local functions within the compiler 
>> (I
>> think this makes a lot of sense).
>> 
>> Once we have these in, I'll revisit FineIBT and extend the features to
>> handle the FineIBT instrumentation. Hopefully we'll be released from 
>> needing
>> objtool (famous last words?!).
>> 
>> This sounds like a plan, but I'm ofc open to suggestions or different
>> ideas/plans.
> 
> So trivially the plan sounds like: compiler fixes STB_LOCAL because it
> has the scope, and the linker fixes everything else. However, that 
> seems
> to assume that !STB_LOCAL will have ENDBR.
> 
> This latter isn't true; for one there's __attribute__((nocf_check)) 
> that
> can be used to suppress ENDBR generation on a function.
> 
> Alternatively the linker will need to 'read' the function to determine
> if it has ENDBR, or we need to augment the ELF format such that we can
> tell from that.
> 
> So what exactly is the plan?

I ran into too many broken dreams by trying to infer the presence of 
ENDBRs just by the symbol locality/linkage... not only because of the 
attribute, but also because of ancient assembly.

So, my first thought was to use something similar to the 
__patchable_function_entries section 
(https://man7.org/linux/man-pages/man1/gcc.1.html), where we would have 
a section to mark all the placed ENDBR. But then it occurred to me that 
if we follow that road we'll miss the ENDBR placed in assembly unless we 
mark it manually, so I started thinking that reading the target 
instructions from the ELF object could be a more simplified approach, 
although a little more treacherous.

I didn't decide yet what to try first -- any thoughts?

@Fangrui's and @HJ's thoughts about this could be gold.
H.J. Lu April 21, 2022, 3:35 p.m. UTC | #10
On Thu, Apr 21, 2022 at 8:23 AM Joao Moreira <joao@overdrivepizza.com> wrote:
>
> On 2022-04-21 00:49, Peter Zijlstra wrote:
> > On Wed, Apr 20, 2022 at 03:40:41PM -0700, Joao Moreira wrote:
> >> > >
> >> > > If FineIBT needs it, I could reconsider.  But I think there's a strong
> >> > > case to be made that the linker should be doing that instead.
> >> >
> >> > That sounds reasonable to me (and reminds me of linker relaxation).
> >> > Joao, can you please work with Fangrui (LLD) and HJ (GNU binutils) to
> >> > determine how feasible this would be? I assume code outside the kernel
> >> > might enjoy such an optimization, too.  When that's the case, then it
> >> > probably makes more sense to "upstream" such "optimizations" from the
> >> > kernel-specific objtool into the toolchains.
> >>
> >> Alright, these are the greenlights I was hoping for.
> >>
> >> I went quickly into this with HJ and he mentioned that it should be
> >> doable
> >> in the linker, and that he has a patch for it in gcc (for local
> >> function,
> >> from what I could see):
> >> https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590832.html
> >>
> >> If @Fangrui is fine with it, I would like to try implementing this
> >> myself in
> >> lld (I'm still learning a lot about lld and having an actual problem
> >> to
> >> solve is the kind of fuel I need). Should take me a while, but I think
> >> this
> >> is not urgent, right? I can also go ahead and replicate HJ's gcc patch
> >> into
> >> clang, so we can also handle the local functions within the compiler
> >> (I
> >> think this makes a lot of sense).
> >>
> >> Once we have these in, I'll revisit FineIBT and extend the features to
> >> handle the FineIBT instrumentation. Hopefully we'll be released from
> >> needing
> >> objtool (famous last words?!).
> >>
> >> This sounds like a plan, but I'm ofc open to suggestions or different
> >> ideas/plans.
> >
> > So trivially the plan sounds like: compiler fixes STB_LOCAL because it
> > has the scope, and the linker fixes everything else. However, that
> > seems
> > to assume that !STB_LOCAL will have ENDBR.
> >
> > This latter isn't true; for one there's __attribute__((nocf_check))
> > that
> > can be used to suppress ENDBR generation on a function.
> >
> > Alternatively the linker will need to 'read' the function to determine
> > if it has ENDBR, or we need to augment the ELF format such that we can
> > tell from that.
> >
> > So what exactly is the plan?
>
> I ran into too many broken dreams by trying to infer the presence of
> ENDBRs just by the symbol locality/linkage... not only because of the
> attribute, but also because of ancient assembly.
>
> So, my first thought was to use something similar to the
> __patchable_function_entries section
> (https://man7.org/linux/man-pages/man1/gcc.1.html), where we would have
> a section to mark all the placed ENDBR. But then it occurred to me that
> if we follow that road we'll miss the ENDBR placed in assembly unless we
> mark it manually, so I started thinking that reading the target
> instructions from the ELF object could be a more simplified approach,
> although a little more treacherous.
>
> I didn't decide yet what to try first -- any thoughts?
>
> @Fangrui's and @HJ's thoughts about this could be gold.

You can't assume ENDBR existence just by symbol visibility.
Compiler knows if there is an ENDBR at function entry since
it is generated by compiler.   Otherwise, you need to check
the first 4 bytes at function entry,
Fangrui Song April 21, 2022, 10:11 p.m. UTC | #11
On 2022-04-21, H.J. Lu wrote:
>On Thu, Apr 21, 2022 at 8:23 AM Joao Moreira <joao@overdrivepizza.com> wrote:
>>
>> On 2022-04-21 00:49, Peter Zijlstra wrote:
>> > On Wed, Apr 20, 2022 at 03:40:41PM -0700, Joao Moreira wrote:
>> >> > >
>> >> > > If FineIBT needs it, I could reconsider.  But I think there's a strong
>> >> > > case to be made that the linker should be doing that instead.
>> >> >
>> >> > That sounds reasonable to me (and reminds me of linker relaxation).
>> >> > Joao, can you please work with Fangrui (LLD) and HJ (GNU binutils) to
>> >> > determine how feasible this would be? I assume code outside the kernel
>> >> > might enjoy such an optimization, too.  When that's the case, then it
>> >> > probably makes more sense to "upstream" such "optimizations" from the
>> >> > kernel-specific objtool into the toolchains.
>> >>
>> >> Alright, these are the greenlights I was hoping for.
>> >>
>> >> I went quickly into this with HJ and he mentioned that it should be
>> >> doable
>> >> in the linker, and that he has a patch for it in gcc (for local
>> >> function,
>> >> from what I could see):
>> >> https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590832.html
>> >>
>> >> If @Fangrui is fine with it, I would like to try implementing this
>> >> myself in
>> >> lld (I'm still learning a lot about lld and having an actual problem
>> >> to
>> >> solve is the kind of fuel I need). Should take me a while, but I think
>> >> this
>> >> is not urgent, right? I can also go ahead and replicate HJ's gcc patch
>> >> into
>> >> clang, so we can also handle the local functions within the compiler
>> >> (I
>> >> think this makes a lot of sense).
>> >>
>> >> Once we have these in, I'll revisit FineIBT and extend the features to
>> >> handle the FineIBT instrumentation. Hopefully we'll be released from
>> >> needing
>> >> objtool (famous last words?!).
>> >>
>> >> This sounds like a plan, but I'm ofc open to suggestions or different
>> >> ideas/plans.

Thanks for looping me in! (I mean: this is discussed beforehand, instead
of GCC/GNU ld taking some steps and pushing LLVM toolchain to follow
suite..)
Though I think I may not have the full context here...

I can see that you have a request to skip the endbr instruction
(and potentially more instructions for FineIBT).

The GCC patch https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590832.html
achieves it. A linker implementation will handle more cases.
This is somewhat similar to PowerPC64's global entry vs local entry.
The linker can redirect a branch instruction to the local entry in some
conditions. My concern is that inspecting the section content goes too far
and breaks the spirit of ELF relocation resolving. The proper way is to use
a symbol attribute (e.g. st_other).

st_other bits are scarce, so any use needs to be prudent.

>> > So trivially the plan sounds like: compiler fixes STB_LOCAL because it
>> > has the scope, and the linker fixes everything else. However, that
>> > seems
>> > to assume that !STB_LOCAL will have ENDBR.
>> >
>> > This latter isn't true; for one there's __attribute__((nocf_check))
>> > that
>> > can be used to suppress ENDBR generation on a function.
>> >
>> > Alternatively the linker will need to 'read' the function to determine
>> > if it has ENDBR, or we need to augment the ELF format such that we can
>> > tell from that.
>> >
>> > So what exactly is the plan?
>>
>> I ran into too many broken dreams by trying to infer the presence of
>> ENDBRs just by the symbol locality/linkage... not only because of the
>> attribute, but also because of ancient assembly.
>>
>> So, my first thought was to use something similar to the
>> __patchable_function_entries section
>> (https://man7.org/linux/man-pages/man1/gcc.1.html), where we would have
>> a section to mark all the placed ENDBR. But then it occurred to me that
>> if we follow that road we'll miss the ENDBR placed in assembly unless we
>> mark it manually, so I started thinking that reading the target
>> instructions from the ELF object could be a more simplified approach,
>> although a little more treacherous.
>>
>> I didn't decide yet what to try first -- any thoughts?
>>
>> @Fangrui's and @HJ's thoughts about this could be gold.
>
>You can't assume ENDBR existence just by symbol visibility.
>Compiler knows if there is an ENDBR at function entry since
>it is generated by compiler.   Otherwise, you need to check
>the first 4 bytes at function entry,
>
>-- 
>H.J.
H.J. Lu April 21, 2022, 10:26 p.m. UTC | #12
On Thu, Apr 21, 2022 at 3:11 PM Fangrui Song <maskray@google.com> wrote:
>
> On 2022-04-21, H.J. Lu wrote:
> >On Thu, Apr 21, 2022 at 8:23 AM Joao Moreira <joao@overdrivepizza.com> wrote:
> >>
> >> On 2022-04-21 00:49, Peter Zijlstra wrote:
> >> > On Wed, Apr 20, 2022 at 03:40:41PM -0700, Joao Moreira wrote:
> >> >> > >
> >> >> > > If FineIBT needs it, I could reconsider.  But I think there's a strong
> >> >> > > case to be made that the linker should be doing that instead.
> >> >> >
> >> >> > That sounds reasonable to me (and reminds me of linker relaxation).
> >> >> > Joao, can you please work with Fangrui (LLD) and HJ (GNU binutils) to
> >> >> > determine how feasible this would be? I assume code outside the kernel
> >> >> > might enjoy such an optimization, too.  When that's the case, then it
> >> >> > probably makes more sense to "upstream" such "optimizations" from the
> >> >> > kernel-specific objtool into the toolchains.
> >> >>
> >> >> Alright, these are the greenlights I was hoping for.
> >> >>
> >> >> I went quickly into this with HJ and he mentioned that it should be
> >> >> doable
> >> >> in the linker, and that he has a patch for it in gcc (for local
> >> >> function,
> >> >> from what I could see):
> >> >> https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590832.html
> >> >>
> >> >> If @Fangrui is fine with it, I would like to try implementing this
> >> >> myself in
> >> >> lld (I'm still learning a lot about lld and having an actual problem
> >> >> to
> >> >> solve is the kind of fuel I need). Should take me a while, but I think
> >> >> this
> >> >> is not urgent, right? I can also go ahead and replicate HJ's gcc patch
> >> >> into
> >> >> clang, so we can also handle the local functions within the compiler
> >> >> (I
> >> >> think this makes a lot of sense).
> >> >>
> >> >> Once we have these in, I'll revisit FineIBT and extend the features to
> >> >> handle the FineIBT instrumentation. Hopefully we'll be released from
> >> >> needing
> >> >> objtool (famous last words?!).
> >> >>
> >> >> This sounds like a plan, but I'm ofc open to suggestions or different
> >> >> ideas/plans.
>
> Thanks for looping me in! (I mean: this is discussed beforehand, instead
> of GCC/GNU ld taking some steps and pushing LLVM toolchain to follow
> suite..)
> Though I think I may not have the full context here...
>
> I can see that you have a request to skip the endbr instruction
> (and potentially more instructions for FineIBT).
>
> The GCC patch https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590832.html
> achieves it. A linker implementation will handle more cases.
> This is somewhat similar to PowerPC64's global entry vs local entry.
> The linker can redirect a branch instruction to the local entry in some
> conditions. My concern is that inspecting the section content goes too far
> and breaks the spirit of ELF relocation resolving. The proper way is to use
> a symbol attribute (e.g. st_other).
>
> st_other bits are scarce, so any use needs to be prudent.

On the other hand, linker inspection doesn't require changes in object files.
It is much more user friendly.   X86-64 psABI already allows linker optimization
based on contents at relocation sites.   This goes one step further to contents
at relocation targets.

> >> > So trivially the plan sounds like: compiler fixes STB_LOCAL because it
> >> > has the scope, and the linker fixes everything else. However, that
> >> > seems
> >> > to assume that !STB_LOCAL will have ENDBR.
> >> >
> >> > This latter isn't true; for one there's __attribute__((nocf_check))
> >> > that
> >> > can be used to suppress ENDBR generation on a function.
> >> >
> >> > Alternatively the linker will need to 'read' the function to determine
> >> > if it has ENDBR, or we need to augment the ELF format such that we can
> >> > tell from that.
> >> >
> >> > So what exactly is the plan?
> >>
> >> I ran into too many broken dreams by trying to infer the presence of
> >> ENDBRs just by the symbol locality/linkage... not only because of the
> >> attribute, but also because of ancient assembly.
> >>
> >> So, my first thought was to use something similar to the
> >> __patchable_function_entries section
> >> (https://man7.org/linux/man-pages/man1/gcc.1.html), where we would have
> >> a section to mark all the placed ENDBR. But then it occurred to me that
> >> if we follow that road we'll miss the ENDBR placed in assembly unless we
> >> mark it manually, so I started thinking that reading the target
> >> instructions from the ELF object could be a more simplified approach,
> >> although a little more treacherous.
> >>
> >> I didn't decide yet what to try first -- any thoughts?
> >>
> >> @Fangrui's and @HJ's thoughts about this could be gold.
> >
> >You can't assume ENDBR existence just by symbol visibility.
> >Compiler knows if there is an ENDBR at function entry since
> >it is generated by compiler.   Otherwise, you need to check
> >the first 4 bytes at function entry,
> >
> >--
> >H.J.