mbox series

[RFC,v1,0/9] arm64: livepatch: Use DWARF Call Frame Information for frame pointer validation

Message ID 20220407202518.19780-1-madvenka@linux.microsoft.com (mailing list archive)
Headers show
Series arm64: livepatch: Use DWARF Call Frame Information for frame pointer validation | expand

Message

Madhavan T. Venkataraman April 7, 2022, 8:25 p.m. UTC
From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

Introduction
============

The livepatch feature requires an unwinder that can provide a reliable stack
trace. General requirements for a reliable unwinder are described in this
document from Mark Rutland:

	Documentation/livepatch/reliable-stacktrace.rst

The requirements have two parts:

1. The unwinder must be enhanced with certain features. E.g.,

	- Identifying successful termination of stack trace
	- Identifying unwindable and non-unwindable code
	- Identifying interrupts and exceptions occurring in the frame pointer
	  prolog and epilog
	- Identifying features such as kretprobe and ftrace graph tracing
	  that can modify the return address stored on the stack
	- Identifying corrupted/unreliable stack contents
	- Architecture-specific items that can render a stack trace unreliable
	  at certain points in code

	Some of these features are already in the arm64 unwinder. I am pursuing
	the rest in another patch series. This is work in progress. The latest
	submission as of this writing is here:

https://lore.kernel.org/linux-arm-kernel/20220117145608.6781-1-madvenka@linux.microsoft.com/T/#t

2. Validation of the frame pointer

	This assumes that the unwinder is based on the frame pointer (FP).
	The actual frame pointer that the unwinder uses cannot just be
	assumed to be correct. It needs to be validated somehow.

	This patch series is to address this requirement.

Validation of the FP (aka STACK_VALIDATION)
====================

The current approach in Linux is to use objtool, a build time tool, for this
purpose. When configured, objtool is invoked on every relocatable object file
during kernel build. It performs static analysis of the code in each file. It
walks the instructions in every function and notes the changes to the stack
pointer (SP) and the frame pointer (FP). It makes sure that the changes are in
accordance with the ABI rules. Once objtool completes successfully, the kernel
can then be used for livepatch purposes.

Objtool can have uses other than just FP validation. For instance, it can check
control flow integrity during its analysis.

Problem
=======

Objtool is complex and highly architecture-dependent. It presents a great
challenge to support livepatch on an architecture. We need an alternative
solution for livepatch, preferably one that is largely architecture
independent.

A different approach
====================

I would like to propose a different approach for FP validation - one that is
simpler as well as architecture-independent for the most part. This initial
work is for arm64. But it can easily be extended to other architectures.

In this approach, objtool is used to generate data for the unwinder. The
unwinder uses the data during a stack trace to validate the FP in each
frame. In other words, the FP is validated dynamically and not statically
at build time.

Background for the solution
===========================

DWARF is a debugging file format used by many compilers and debuggers to
support source level debugging. One of the components of DWARF is the
DWARF Call Frame Information (CFI). A special section called .debug_frame
is generated by the compiler to contain CFI. CFI supplies all the rules
required to compute the contents of every register at every instruction
address. A complete unwinder can be built from CFI.

However, DWARF is complex and building a complete unwinder from DWARF CFI
is a ship that has already sailed. That is not the purpose of this patch
series.

The solution
============

The goal here is to use the absolute minimum CFI needed to compute the FP at
every instruction address. The unwinder can compute the FP in each frame,
compare the actual FP with the computed one and validate the actual FP.

Objtool is enhanced to parse the CFI, extract just the rules required,
encode them in compact data structures and create special sections for
the rules. The unwinder uses the special sections to find the rules for
a given instruction address and compute the FP.

Objtool can be invoked as follows:

	objtool dwarf generate <object-file>

The version of the DWARF standard supported in this work is version 4. The
official documentation for this version is here:

	https://dwarfstd.org/doc/DWARF4.pdf

Section 6.4 contains the description of the CFI.

Register rules in CFI
=====================

CFI defines the Canonical Frame Address (CFA) as the value of the stack
pointer (SP) when a call instruction is executed. For the called function,
register values are expressed relative to the CFA.

DWARF CFI defines the following rules to obtain the value of a register
in the previous frame, given a current frame:

1. Same_Value:

	The current and previous values of the register are the same.

2. Val_Offset(N):

	The previous value is (CFA + N) where N is a signed offset.

3. Offset(N):

	The previous value is saved at (CFA + N).

4. register(R):

	The previous value is saved in register R.

5. Val_Expression(E):

	The previous value is the value produced by evaluating a given
	DWARF expression. DWARF expressions are evaluated on a stack. That
	is, operands are pushed and popped on a stack, DWARF operators are
	applied on them and the result is obtained.

6. Expression(E):

	The previous value is stored at the address computed from a DWARF
	expression.

7. Architectural:

	The previous value is obtained in an architecture-specific way via
	an architecture-specific "augmentor". Augmentors are vendor specific
	and are not part of the DWARF standard.

The minimum CFI needed for this work
====================================

Fortunately, gcc and clang only generate rules (1), (2) and (3) for the
SP, FP and return address (RA). So, this implementation only supports these
3 rules. These are very simple rules. At the time of this writing, these
rules are found to be sufficient for ARM, ARM64 and RISCV.

As an exercise, I also ran my CFI parser on X64. For a very small percentage
of the functions, DWARF expressions are indeed used. Of course, X64 already
has a complete objtool-based static stack validation scheme. So, X64 does
not need this.

I have not checked other architectures so far.

Compact encoding of CFI
=======================

The CFI is defined in a very generic format to allow all of the above rules
to be defined. Since this work uses only a minimal subset of the rules, the
supported CFI rules can be encoded in a more compact format. Also, this
subset of the rules can be statically evaluated at build time by objtool.
The kernel does not have to do any CFI parsing.

Unsupported rules
=================

There are three main reasons why I chose not to support rules (4) thru (7).

	- the compiler does not generate these for the SP, FP and RA for
	  arm64. So, arm64 does not need them.

	- They have complexity.

	- Objtool may not be able to do all the work for rules (4) thru (7).
	  The kernel may be required to evaluate expressions that involve
	  dereferencing an address, getting the value stored in a register,
	  etc.

That does not mean that they cannot be supported. But supporting them would
increase the complexity. I strongly suggest that this work be used only for
architectures where all of the parsing and record generation can be done in
objtool at build time. The kernel part of the implementation should be kept
simple.

How to deal with unsupported rules, if they are present?
========================================================

objtool does not generate any rule data for the code locations at which
unsupported rules exist. When the unwinder tries to find a rule for any of
these locations, it will not find any. Then, it will simply consider the
code locations unreliable from an unwind perspective. The requirement for
the unwinder is really that it must be able to identify reliable and
unreliable code. It can still do this.

So, livepatch can be supported even on architectures where unsupported rules
are generated by the compiler. It only means that the code ranges that contain
those rules will be considered unreliable by the unwinder. If they occur in
frequently used functions, then it is definitely a problem. If not, they may
result in some retries during the livepatch process. But livepatch can still
be done.

FP prolog, epilog and leaf functions
====================================

DWARF CFI rules allow objtool to recognize these cases. Objtool does not
generate any rule data for a function unless the frame is completely setup. If
an interrupt or an exception happens in code where the frame is not set up
or not set up completely, the unwinder will not find the rules for such code.
Automatically, the stack trace is considered unreliable as it should be.

Assembly functions
==================

DWARF CFI is generated by the compiler only for C functions. This means that
the unwinder will not find any rules for assembly code. So, assembly functions
are automatically considered unreliable from an unwind perspective.

For assembly functions, DWARF annotations are defined that can be placed in
assembly code. In that case, DWARF CFI can be generated for assembly functions
as well. However, DWARF annotations are a PITA to maintain. So, this is not
a good path to go down.

Now, there are certain points in assembly code that we would like to unwind
through reliably. Like interrupt and exception handlers. This is mainly for
getting reliable stack traces in these cases and reducing the number of
retries during the livepatch process. For these, unwind hints can be placed
at strategic points in assembly code. Only a small number of these hints
should be needed.

In this work, I have defined the following unwind hints so stack traces that
contain these can be done reliably:

	- Exception handlers
	- Interrupt handlers
	- FTrace tracer functions
	- FTrace graph return prepare code
	- FTrace callsites
	- Kretprobe Trampoline

Unwind hints are collected in a special section. Objtool converts unwind hints
to rule data just like the CFI based ones. The kernel does not need special
code to process unwind hints.

Generated code
==============

Generated code will not have any DWARF rules. Such code will be considered
unreliable by the kernel.

Size of the memory consumed within the kernel for this feature
==============================================================

This depends on the amount of code in the kernel which, in turn, depends on
the number of configs turned on. E.g., on the kernel on my arm64 system, the
.debug_frame section generated by the compiler in vmlinux is about 3.42 MB.
But the rule data generated by objtool for vmlinux is only about 1.06 MB.

Architecture-dependent part
===========================

The following architecture-dependent items must be supplied to support
an architecture:

	- Mapping from DWARF register numbers to actual registers. This is
	  required only for the SP and FP (and RA, if the architecture
	  defines an RA register).

	- Relocation information for the special section created by objtool.
	  Relocation types are processor-specific.

	- Architecture-specific rule checking. For instance, the return
	  address and the frame pointer are saved on adjacent locations
	  on the stack for arm64. This is checked by an arm64-specific
	  rule checker during CFI parsing.

The architecture dependent portion is very small.

Items like endianness and address size are already handled in generic code.

GitHub repository
=================

I have created a github repo to share my work. For each version I will create
a branch. For version 1, it is here:

https://github.com/madvenka786/linux/tree/dwarftool_v1

Please feel free to clone and check it out. And, please let me know if you
find any issues.

Testing
=======

I have run all of the livepatch selftests successfully. I have written a
couple of extra selftests myself which I will be posting separately.

There is an open source utility called dwarfdump. It parses the CFI and
produces ASCII output of the same. I have written a tool to extract that
information and compare it with what my parser generates. The comparison
is successful. So, the parser has been well tested.

I have extracted the same instruction addresses from vmlinux and fed
them to the lookup function in the kernel that the unwinder uses. I have
verified that the correct CFI rules are looked up for every single
input address. So, the lookup function has been well tested.

TBD
===

- Objtool generates a table of instruction addresses or PCs for the kernel.
  These need to be sorted for doing an efficient binary search. Currently,
  the sorting is done in the kernel during boot. I will add support to the
  sorttable script so that the sorting can be done at build time.

- I need to perform more rigorous testing with different scenarios. This
  is work in progress. Any ideas or suggestions are welcome.

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>

Madhavan T. Venkataraman (9):
  objtool: Parse DWARF Call Frame Information in object files
  objtool: Generate DWARF rules and place them in a special section
  dwarf: Build the kernel with DWARF information
  dwarf: Implement DWARF rule processing in the kernel
  dwarf: Implement DWARF support for modules
  arm64: unwinder: Add a reliability check in the unwinder based on
    DWARF CFI
  arm64: dwarf: Implement unwind hints
  dwarf: Enable livepatch for ARM64

Suraj Jitindar Singh (1):
  dwarf: Miscellaneous changes required for enabling livepatch

 arch/Kconfig                                  |   4 +-
 arch/arm64/Kconfig                            |   7 +
 arch/arm64/Kconfig.debug                      |   5 +
 arch/arm64/configs/defconfig                  |   1 +
 arch/arm64/include/asm/livepatch.h            |  42 ++
 arch/arm64/include/asm/sections.h             |   4 +
 arch/arm64/include/asm/stacktrace.h           |   9 +
 arch/arm64/include/asm/thread_info.h          |   4 +-
 arch/arm64/include/asm/unwind_hints.h         |  28 +
 arch/arm64/kernel/entry-ftrace.S              |  23 +
 arch/arm64/kernel/entry.S                     |   3 +
 arch/arm64/kernel/ftrace.c                    |  16 +
 arch/arm64/kernel/probes/kprobes_trampoline.S |   2 +
 arch/arm64/kernel/signal.c                    |   4 +
 arch/arm64/kernel/stacktrace.c                | 131 ++++
 arch/arm64/kernel/vmlinux.lds.S               |  22 +
 include/linux/dwarf.h                         |  90 +++
 include/linux/ftrace.h                        |   4 +
 include/linux/module.h                        |   3 +
 kernel/Makefile                               |   1 +
 kernel/dwarf_fp.c                             | 305 ++++++++++
 kernel/module.c                               |  31 +
 scripts/Makefile.build                        |   4 +
 scripts/link-vmlinux.sh                       |   6 +
 tools/include/linux/dwarf.h                   |  90 +++
 tools/objtool/Build                           |   5 +
 tools/objtool/Makefile                        |  10 +-
 tools/objtool/arch/arm64/Build                |   2 +
 tools/objtool/arch/arm64/dwarf_arch.c         | 114 ++++
 tools/objtool/arch/arm64/dwarf_clang.c        |  53 ++
 .../arch/arm64/include/arch/dwarf_reg.h       |  17 +
 tools/objtool/builtin-dwarf.c                 |  75 +++
 tools/objtool/dwarf_op.c                      | 560 ++++++++++++++++++
 tools/objtool/dwarf_parse.c                   | 351 +++++++++++
 tools/objtool/dwarf_rules.c                   | 265 +++++++++
 tools/objtool/dwarf_util.c                    | 280 +++++++++
 tools/objtool/elf.c                           |   2 +-
 tools/objtool/include/objtool/builtin.h       |   1 +
 tools/objtool/include/objtool/dwarf_def.h     | 460 ++++++++++++++
 tools/objtool/include/objtool/elf.h           |   1 +
 tools/objtool/include/objtool/objtool.h       |   3 +
 tools/objtool/objtool.c                       |   1 +
 tools/objtool/sync-check.sh                   |   6 +
 tools/objtool/weak.c                          |  38 ++
 44 files changed, 3079 insertions(+), 4 deletions(-)
 create mode 100644 arch/arm64/include/asm/livepatch.h
 create mode 100644 arch/arm64/include/asm/unwind_hints.h
 create mode 100644 include/linux/dwarf.h
 create mode 100644 kernel/dwarf_fp.c
 create mode 100644 tools/include/linux/dwarf.h
 create mode 100644 tools/objtool/arch/arm64/Build
 create mode 100644 tools/objtool/arch/arm64/dwarf_arch.c
 create mode 100644 tools/objtool/arch/arm64/dwarf_clang.c
 create mode 100644 tools/objtool/arch/arm64/include/arch/dwarf_reg.h
 create mode 100644 tools/objtool/builtin-dwarf.c
 create mode 100644 tools/objtool/dwarf_op.c
 create mode 100644 tools/objtool/dwarf_parse.c
 create mode 100644 tools/objtool/dwarf_rules.c
 create mode 100644 tools/objtool/dwarf_util.c
 create mode 100644 tools/objtool/include/objtool/dwarf_def.h


base-commit: fc74e0a40e4f9fd0468e34045b0c45bba11dcbb2

Comments

Josh Poimboeuf April 8, 2022, 12:21 a.m. UTC | #1
On Thu, Apr 07, 2022 at 03:25:09PM -0500, madvenka@linux.microsoft.com wrote:
> The solution
> ============
> 
> The goal here is to use the absolute minimum CFI needed to compute the FP at
> every instruction address. The unwinder can compute the FP in each frame,
> compare the actual FP with the computed one and validate the actual FP.
> 
> Objtool is enhanced to parse the CFI, extract just the rules required,
> encode them in compact data structures and create special sections for
> the rules. The unwinder uses the special sections to find the rules for
> a given instruction address and compute the FP.
> 
> Objtool can be invoked as follows:
> 
> 	objtool dwarf generate <object-file>

Hi Madhaven,

This is quite interesting.  And it's exactly the kind of crazy idea I
can appreciate ;-)

Some initial thoughts:


1)

I have some concerns about DWARF's reliability, especially considering
a) inline asm, b) regular asm, and c) the kernel's tendency to push
compilers to their limits.

BUT, supplementing the frame pointer unwinding with DWARF, rather than
just relying on DWARF alone, does help a LOT.

I guess the hope is that cross-checking two "mostly reliable" things
against each other (frame pointers and DWARF) will give a reliable
result ;-)

In a general sense, I've never looked at DWARF's reliability, even for
just normal C code.  It would be good to have some way of knowing that
DWARF looks mostly sane for both GCC and Clang.  For example, maybe
somehow cross-checking it with objtool's knowledge.  And then of course
we'd have to hope that it stays bug-free in future compilers.

I'd also be somewhat concerned about assembly.  Since there's nothing
ensuring the unwind hints are valid, and will stay valid over time, I
wonder how likely it would be for that to break, and what the
implications would be.  Most likely I guess it would break silently, but
then get caught by the frame pointer cross-checking.  So a broken hint
might not get noticed for a long time, but at least it (hopefully)
wouldn't break reliable unwinding.

Also, inline asm can sometimes do stack hacks like
"push;do_something;pop" which isn't visible to the toolchain.  But
again, hopefully the frame pointer checking would fail and mark it
unreliable.

So I do have some worries about DWARF, but the fact that it's getting
"fact checked" by frame pointers might be sufficient.


2)

If I understand correctly, objtool is converting parts of DWARF to a new
format which can then be read by the kernel.  In that case, please don't
call it DWARF as that will cause a lot of confusion.

There are actually several similarities between your new format and ORC,
which is also an objtool-created DWARF alternative.  It would be
interesting to see if they could be combined somehow.


3)

Objtool has become an integral part of x86-64, due to security and
performance features and toolchain workarounds.

Not *all* of its features require the full "branch validation" which
follows all code paths -- and was the hardest part to get right -- but
several features *do* need that: stack validation, ORC, uaccess
validation, noinstr validation.

Objtool has been picking up a lot of steam (and features) lately, with
more features currently in active development.  And lately there have
been renewed patches for porting it to powerpc and arm64 (and rumors of
s390).

If arm64 ever wants one of those features -- particularly a "branch
validation" based feature -- I think it would make more sense to just do
the stack validation in objtool, rather than the DWARF supplementation
approach.

Just to give an idea of what objtool already supports and how useful it
has become for x86, here's an excerpt from some documentation I've been
working on, since I'm in the middle of rewriting the interface to make
it more modular.  This is a list of all its current features:


Features
--------

Objtool has the following features:


- Stack unwinding metadata validation -- useful for helping to ensure
  stack traces are reliable for live patching

- ORC unwinder metadata generation -- a faster and more precise
  alternative to frame pointer based unwinding

- Retpoline validation -- ensures that all indirect calls go through
  retpoline thunks, for Spectre v2 mitigations

- Retpoline call site annotation -- annotates all retpoline thunk call
  sites, enabling the kernel to patch them inline, to prevent "thunk
  funneling" for both security and performance reasons

- Non-instrumentation validation -- validates non-instrumentable
  ("noinstr") code rules, preventing unexpected instrumentation in
  low-level C entry code

- Static call annotation -- annotates static call sites, enabling the
  kernel to implement inline static calls, a faster alternative to some
  indirect branches

- Uaccess validation -- validates uaccess rules for a proper safe
  implementation of Supervisor Mode Access Protection (SMAP)

- Straight Line Speculation validation -- validates certain SLS
  mitigations

- Indirect Branch Tracking validation -- validates Intel CET IBT rules
  to ensure that all functions referenced by function pointers have
  corresponding ENDBR instructions

- Indirect Branch Tracking annotation -- annotates unused ENDBR
  instruction sites, enabling the kernel to "seal" them (replace them
  with NOPs) to further harden IBT

- Function entry annotation -- annotates function entries, enabling
  kernel function tracing

- Other toolchain hacks which will go unmentioned at this time...
Peter Zijlstra April 8, 2022, 10:55 a.m. UTC | #2
On Thu, Apr 07, 2022 at 03:25:09PM -0500, madvenka@linux.microsoft.com wrote:

> [-- application/octet-stream is unsupported (use 'v' to view this part) --]

Your emails are unreadable :-(
Peter Zijlstra April 8, 2022, 11:41 a.m. UTC | #3
Right; so not having seen the patches due to Madhaven's email being
broken, I can perhaps less appreciated the crazy involved.

On Thu, Apr 07, 2022 at 05:21:51PM -0700, Josh Poimboeuf wrote:
> 2)
> 
> If I understand correctly, objtool is converting parts of DWARF to a new
> format which can then be read by the kernel.  In that case, please don't
> call it DWARF as that will cause a lot of confusion.
> 
> There are actually several similarities between your new format and ORC,
> which is also an objtool-created DWARF alternative.  It would be
> interesting to see if they could be combined somehow.

What Josh said; please use/extend ORC.

I really don't understand where all this crazy is coming from; why does
objtool need to do something radically weird for ARM64?

There are existing ARM64 patches for objtool; in fact they have recently
been re-posted:

 https://lkml.kernel.org/r/20220407120141.43801-1-chenzhongjin@huawei.com

The only tricky bit seems to be the whole jump-table issue. Using DWARF
as input to deal with jump-tables should be possible -- exceedingly
overkill, but possible I suppose. Mandating DWARF sucks though, compile
times are so much worse with DWARVES on :/

Once objtool can properly follow/validate ARM64 code, it should be
fairly straight forward to have it generate ORC data just like it does
on x86_64.
Peter Zijlstra April 8, 2022, 11:54 a.m. UTC | #4
On Fri, Apr 08, 2022 at 12:55:11PM +0200, Peter Zijlstra wrote:
> On Thu, Apr 07, 2022 at 03:25:09PM -0500, madvenka@linux.microsoft.com wrote:
> 
> > [-- application/octet-stream is unsupported (use 'v' to view this part) --]
> 
> Your emails are unreadable :-(

List copy is OK, so perhaps it's due to how Josh bounced them..
Peter Zijlstra April 8, 2022, 12:06 p.m. UTC | #5
On Thu, Apr 07, 2022 at 03:25:09PM -0500, madvenka@linux.microsoft.com wrote:
> The solution
> ============
> 
> The goal here is to use the absolute minimum CFI needed to compute the FP at
> every instruction address. The unwinder can compute the FP in each frame,
> compare the actual FP with the computed one and validate the actual FP.
> 
> Objtool is enhanced to parse the CFI, extract just the rules required,
> encode them in compact data structures and create special sections for
> the rules. The unwinder uses the special sections to find the rules for
> a given instruction address and compute the FP.
> 
> Objtool can be invoked as follows:
> 
> 	objtool dwarf generate <object-file>
> 
> The version of the DWARF standard supported in this work is version 4. The
> official documentation for this version is here:
> 
> 	https://dwarfstd.org/doc/DWARF4.pdf
> 
> Section 6.4 contains the description of the CFI.

The problem is of course that DWARF is only available for compiler
generated code and doesn't cover assembly code, of which is there is
always lots.

I suppose you can go add DWARF annotations to all the assembly, but IIRC
those are pretty terrible. We were *REALLY* happy to delete all that
nasty from the x86 code.

On top of that, AFAIK compilers don't generally consider DWARF
generation to be a correctness issue. For them it's debug info and
having it be correct is nice but not required. So using it as input for
something that's required to be correct, seems unfortunate.
Josh Poimboeuf April 8, 2022, 2:34 p.m. UTC | #6
On Fri, Apr 08, 2022 at 01:54:28PM +0200, Peter Zijlstra wrote:
> On Fri, Apr 08, 2022 at 12:55:11PM +0200, Peter Zijlstra wrote:
> > On Thu, Apr 07, 2022 at 03:25:09PM -0500, madvenka@linux.microsoft.com wrote:
> > 
> > > [-- application/octet-stream is unsupported (use 'v' to view this part) --]
> > 
> > Your emails are unreadable :-(
> 
> List copy is OK, so perhaps it's due to how Josh bounced them..

Corporate email Mimecast fail when I bounced them, sorry :-/
Madhavan T. Venkataraman April 10, 2022, 5:47 p.m. UTC | #7
On 4/8/22 05:55, Peter Zijlstra wrote:
> On Thu, Apr 07, 2022 at 03:25:09PM -0500, madvenka@linux.microsoft.com wrote:
> 
>> [-- application/octet-stream is unsupported (use 'v' to view this part) --]
> 
> Your emails are unreadable :-(

I am not sure why the emails are unreadable. Any suggestions? Should I resend? Please let me know.
Sorry about this.

Madhavan
Josh Poimboeuf April 11, 2022, 4:34 p.m. UTC | #8
On Sun, Apr 10, 2022 at 12:47:46PM -0500, Madhavan T. Venkataraman wrote:
> 
> 
> On 4/8/22 05:55, Peter Zijlstra wrote:
> > On Thu, Apr 07, 2022 at 03:25:09PM -0500, madvenka@linux.microsoft.com wrote:
> > 
> >> [-- application/octet-stream is unsupported (use 'v' to view this part) --]
> > 
> > Your emails are unreadable :-(
> 
> I am not sure why the emails are unreadable. Any suggestions? Should I resend? Please let me know.
> Sorry about this.

That was actually my (company's) fault when I bounced the patches to Peter.
Madhavan T. Venkataraman April 11, 2022, 5:18 p.m. UTC | #9
Hi Josh,

On 4/7/22 19:21, Josh Poimboeuf wrote:
> On Thu, Apr 07, 2022 at 03:25:09PM -0500, madvenka@linux.microsoft.com wrote:
>> The solution
>> ============
>>
>> The goal here is to use the absolute minimum CFI needed to compute the FP at
>> every instruction address. The unwinder can compute the FP in each frame,
>> compare the actual FP with the computed one and validate the actual FP.
>>
>> Objtool is enhanced to parse the CFI, extract just the rules required,
>> encode them in compact data structures and create special sections for
>> the rules. The unwinder uses the special sections to find the rules for
>> a given instruction address and compute the FP.
>>
>> Objtool can be invoked as follows:
>>
>> 	objtool dwarf generate <object-file>
> 
> Hi Madhaven,
> 
> This is quite interesting.  And it's exactly the kind of crazy idea I
> can appreciate ;-)
> 

A little crazy is a good thing sometimes.

> Some initial thoughts:
> 
> 
> 1)
> 
> I have some concerns about DWARF's reliability, especially considering
> a) inline asm, b) regular asm, and c) the kernel's tendency to push
> compilers to their limits.
> 

I am thinking of implementing a DWARF verifier to make sure that the DWARF information
in .debug_frame is correct. I am still in the process of designing this. I will keep
you posted on that. This should address (a) and (c).

As for (b), the compiler does not generate any DWARF rules for ASM code. DWARF
annotations are a PITA to maintain. So, in my current design regular ASM functions are 
considered unreliable from an unwind perspective except the places that have unwind
hints. Unwind hints are only needed for places that tend to occur frequently in stack
traces. So, it would be just a handful of unwind hints which can be maintained.

As you know, ASM functions come in two flavors - SYM_CODE() functions and SYM_FUNC()
functions. SYM_CODE functions are, by definition, unreliable from an unwind perspective
because they don't follow ABI rules and they don't set up any frame pointer. In my
reliable stack trace patch series, I have a patch based on the opinion of the reviewers
to mark these functions so the unwinder can recognize them and declare the stack trace
unreliable. So, the only ASM functions that matter in (b) are the SYM_FUNC functions.
For now, I have considered them to be unreliable. But I will analyze those functions to
see if any of them can occur frequently in stack traces. If some of these functions
can occur frequently in stack traces, I need to address them. I will see if unwind hints
are a good fit. I will get back to you on this.

> BUT, supplementing the frame pointer unwinding with DWARF, rather than
> just relying on DWARF alone, does help a LOT.
> 

Yes.

> I guess the hope is that cross-checking two "mostly reliable" things
> against each other (frame pointers and DWARF) will give a reliable
> result ;-)
> 

Yes!

> In a general sense, I've never looked at DWARF's reliability, even for
> just normal C code.  It would be good to have some way of knowing that
> DWARF looks mostly sane for both GCC and Clang.  For example, maybe
> somehow cross-checking it with objtool's knowledge.  And then of course
> we'd have to hope that it stays bug-free in future compilers.
> 

This is a valid point. So far, I find that gcc generates reliable DWARF information.
But there are two bugs in what Clang generates. I have added workarounds in my
parser to compensate.

So, I think a DWARF verifier is an option that architectures can use. At this point,
I don't want to mandate a verifier on every architecture. But that is a discussion
that we can have once I have a verifier ready.

> I'd also be somewhat concerned about assembly.  Since there's nothing
> ensuring the unwind hints are valid, and will stay valid over time, I
> wonder how likely it would be for that to break, and what the
> implications would be.  Most likely I guess it would break silently, but
> then get caught by the frame pointer cross-checking.  So a broken hint
> might not get noticed for a long time, but at least it (hopefully)
> wouldn't break reliable unwinding.
> 

Yes. That is my thinking as well. When the unwinder checks the actual FP with the
computed FP, any mismatch will be treated as unreliable code for unwind. So,
apart from some retries during the livepatch process, this is most probably not
a problem.

Now,  I set a flag for an unwind hint so that the unwinder knows that it is
processing an unwind hint. I could generate a warning if an unwind hint does not
result in a reliable unwind of the frame. This would bring the broken hint
to people's attention.


> Also, inline asm can sometimes do stack hacks like
> "push;do_something;pop" which isn't visible to the toolchain.  But
> again, hopefully the frame pointer checking would fail and mark it
> unreliable.
> 
> So I do have some worries about DWARF, but the fact that it's getting
> "fact checked" by frame pointers might be sufficient.
> 

Exactly.

> 
> 2)
> 
> If I understand correctly, objtool is converting parts of DWARF to a new
> format which can then be read by the kernel.  In that case, please don't
> call it DWARF as that will cause a lot of confusion.
> 

OK. I will rename it.

> There are actually several similarities between your new format and ORC,
> which is also an objtool-created DWARF alternative.  It would be
> interesting to see if they could be combined somehow.
> 

I will certainly look into it. So, if I decide to merge the two, I might want
to make a minor change to the ORC structure. Would that be OK with you?

> 
> 3)
> 
> Objtool has become an integral part of x86-64, due to security and
> performance features and toolchain workarounds.
> 
> Not *all* of its features require the full "branch validation" which
> follows all code paths -- and was the hardest part to get right -- but
> several features *do* need that: stack validation, ORC, uaccess
> validation, noinstr validation.
> 
> Objtool has been picking up a lot of steam (and features) lately, with
> more features currently in active development.  And lately there have
> been renewed patches for porting it to powerpc and arm64 (and rumors of
> s390).
> 
> If arm64 ever wants one of those features -- particularly a "branch
> validation" based feature -- I think it would make more sense to just do
> the stack validation in objtool, rather than the DWARF supplementation
> approach.
> 

First off, I think that objtool does a great job for X64. I only want to implement
frame pointer validation in a different way. All the other features of objtool
(listed below) are great. I have admired the amount of work you guys have put into
the X64 part.

These are the reasons why I tried the DWARF based method:

- My implementation is largely architecture independent. There are a couple of
  minor pieces that are architecture-specific, but they are minor in nature.
  So, if an architecture wanted to support the livepatch feature but did not
  want to do a heavy weight objtool implementation, then it has an option.
  There has been some debate about whether static analysis should be mandated
  for livepatch. My patch series is an attempt to provide an option.

- To get an objtool static analysis implementation working for an architecture
  as reliably as X64 and getting it reviewed and upstreamed can take years. It took
  years for X64, am I right? I mean, it has been quite a while since the original
  patch series for arm64 was posted. There have been only one or two minor comments
  so far. I am sure arm64 linux users would very much want to have livepatch available
  ASAP to be able to install security fixes without downtime. This is an immediate need.

- No software is bug free. So, even if static analysis is implemented for an architecture,
  it would be good to have another method of verifying the unwind rules generated from
  the static analysis. DWARF can provide that additional verification.

> Just to give an idea of what objtool already supports and how useful it
> has become for x86, here's an excerpt from some documentation I've been
> working on, since I'm in the middle of rewriting the interface to make
> it more modular.  This is a list of all its current features:
> 
> 
> Features
> --------
> 
> Objtool has the following features:
> 
> 
> - Stack unwinding metadata validation -- useful for helping to ensure
>   stack traces are reliable for live patching
> 
> - ORC unwinder metadata generation -- a faster and more precise
>   alternative to frame pointer based unwinding
> 
> - Retpoline validation -- ensures that all indirect calls go through
>   retpoline thunks, for Spectre v2 mitigations
> 
> - Retpoline call site annotation -- annotates all retpoline thunk call
>   sites, enabling the kernel to patch them inline, to prevent "thunk
>   funneling" for both security and performance reasons
> 
> - Non-instrumentation validation -- validates non-instrumentable
>   ("noinstr") code rules, preventing unexpected instrumentation in
>   low-level C entry code
> 
> - Static call annotation -- annotates static call sites, enabling the
>   kernel to implement inline static calls, a faster alternative to some
>   indirect branches
> 
> - Uaccess validation -- validates uaccess rules for a proper safe
>   implementation of Supervisor Mode Access Protection (SMAP)
> 
> - Straight Line Speculation validation -- validates certain SLS
>   mitigations
> 
> - Indirect Branch Tracking validation -- validates Intel CET IBT rules
>   to ensure that all functions referenced by function pointers have
>   corresponding ENDBR instructions
> 
> - Indirect Branch Tracking annotation -- annotates unused ENDBR
>   instruction sites, enabling the kernel to "seal" them (replace them
>   with NOPs) to further harden IBT
> 
> - Function entry annotation -- annotates function entries, enabling
>   kernel function tracing
> 
> - Other toolchain hacks which will go unmentioned at this time...
> 

I completely agree.

So, it is just frame pointer validation for livepatch I am trying to look at.

Thanks!

Madhavan
Madhavan T. Venkataraman April 11, 2022, 5:26 p.m. UTC | #10
On 4/8/22 06:41, Peter Zijlstra wrote:
> 
> Right; so not having seen the patches due to Madhaven's email being
> broken, I can perhaps less appreciated the crazy involved.
> 

Crazy like a fox.

> On Thu, Apr 07, 2022 at 05:21:51PM -0700, Josh Poimboeuf wrote:
>> 2)
>>
>> If I understand correctly, objtool is converting parts of DWARF to a new
>> format which can then be read by the kernel.  In that case, please don't
>> call it DWARF as that will cause a lot of confusion.
>>
>> There are actually several similarities between your new format and ORC,
>> which is also an objtool-created DWARF alternative.  It would be
>> interesting to see if they could be combined somehow.
> 
> What Josh said; please use/extend ORC.
> 

Yes. I am looking into it.

> I really don't understand where all this crazy is coming from; why does
> objtool need to do something radically weird for ARM64?
> 
> There are existing ARM64 patches for objtool; in fact they have recently
> been re-posted:
> 
>  https://lkml.kernel.org/r/20220407120141.43801-1-chenzhongjin@huawei.com
> 
> The only tricky bit seems to be the whole jump-table issue. Using DWARF
> as input to deal with jump-tables should be possible -- exceedingly
> overkill, but possible I suppose. Mandating DWARF sucks though, compile
> times are so much worse with DWARVES on :/
> 
> Once objtool can properly follow/validate ARM64 code, it should be
> fairly straight forward to have it generate ORC data just like it does
> on x86_64.
> 

My reasons for attempting the DWARF based implementation:

- My implementation is largely architecture independent. There are a couple of
  minor pieces that are architecture-specific, but they are minor in nature.
  So, if an architecture wanted to support the livepatch feature but did not
  want to do a heavy weight objtool implementation, then it has an option.
  There has been some debate about whether static analysis should be mandated
  for livepatch. My patch series is an attempt to provide an option.

- To get an objtool static analysis implementation working for an architecture
  as reliably as X64 and getting it reviewed and upstreamed can take years. It took
  years for X64, am I right? I mean, it has been quite a while since the original
  patch series for arm64 was posted. There have been only one or two minor comments
  so far. I am sure arm64 linux users would very much want to have livepatch available
  ASAP to be able to install security fixes without downtime. This is an immediate need.

- No software is bug free. So, even if static analysis is implemented for an architecture,
  it would be good to have another method of verifying the unwind rules generated from
  the static analysis. DWARF can provide that additional verification.

Madhavan
Madhavan T. Venkataraman April 11, 2022, 5:35 p.m. UTC | #11
On 4/8/22 07:06, Peter Zijlstra wrote:
> On Thu, Apr 07, 2022 at 03:25:09PM -0500, madvenka@linux.microsoft.com wrote:
>> The solution
>> ============
>>
>> The goal here is to use the absolute minimum CFI needed to compute the FP at
>> every instruction address. The unwinder can compute the FP in each frame,
>> compare the actual FP with the computed one and validate the actual FP.
>>
>> Objtool is enhanced to parse the CFI, extract just the rules required,
>> encode them in compact data structures and create special sections for
>> the rules. The unwinder uses the special sections to find the rules for
>> a given instruction address and compute the FP.
>>
>> Objtool can be invoked as follows:
>>
>> 	objtool dwarf generate <object-file>
>>
>> The version of the DWARF standard supported in this work is version 4. The
>> official documentation for this version is here:
>>
>> 	https://dwarfstd.org/doc/DWARF4.pdf
>>
>> Section 6.4 contains the description of the CFI.
> 
> The problem is of course that DWARF is only available for compiler
> generated code and doesn't cover assembly code, of which is there is
> always lots.
> 

Yes. But assembly functions are of two types:

SYM_CODE_*() functions
SYM_FUNC_*() functions

SYM_CODE functions are, by definition, special functions that don't follow any ABI rules.
They don't set up a frame. Based on the opinion of ARM64 experts, these need to be
recognized by the unwinder and, if they are present in a stack trace, the stack trace
must be considered unreliable. I have, in fact, submitted a patch to implement that.

So, only SYM_FUNC*() functions are relevant for this part. I will look into these for arm64
and check if any of them can occur frequently in stack traces. If any of them is likely
to occur frequently in stack traces, I must address them. If there are only a few such
functions, unwind hints may be sufficient. I will get back to you on this.

> I suppose you can go add DWARF annotations to all the assembly, but IIRC
> those are pretty terrible. We were *REALLY* happy to delete all that
> nasty from the x86 code.
> 

DWARF annotations are a PITA to maintain. I will never recommend that!

> On top of that, AFAIK compilers don't generally consider DWARF
> generation to be a correctness issue. For them it's debug info and
> having it be correct is nice but not required. So using it as input for
> something that's required to be correct, seems unfortunate.

It is only debug info. But if that info can be verified, then it is usable for livepatch
purposes. I am thinking of implementing a verifier since DWARF reliability is a valid
concern. I will keep you posted.

Thanks!

Madhavan
Chen Zhongjin April 12, 2022, 8:32 a.m. UTC | #12
Hi Madhaven,

Sorry I sent the last email as HTML. This is a plain text resend.

On 2022/4/12 1:18, Madhavan T. Venkataraman wrote:

>> In a general sense, I've never looked at DWARF's reliability, even for
>> just normal C code.  It would be good to have some way of knowing that
>> DWARF looks mostly sane for both GCC and Clang.  For example, maybe
>> somehow cross-checking it with objtool's knowledge.  And then of course
>> we'd have to hope that it stays bug-free in future compilers.
>>
> 
> This is a valid point. So far, I find that gcc generates reliable DWARF information.
> But there are two bugs in what Clang generates. I have added workarounds in my
> parser to compensate.
> 
> So, I think a DWARF verifier is an option that architectures can use. At this point,
> I don't want to mandate a verifier on every architecture. But that is a discussion
> that we can have once I have a verifier ready.
>
I'm concerning that depending on compilers to generate correct 
information can become a trouble because we linux kernel side can rarely 
fix what compilers make. That's also why the gcc plugin idea was 
objected in the objtool migration.

If your parser can solve this it sounds more doable.

>> I'd also be somewhat concerned about assembly.  Since there's nothing
>> ensuring the unwind hints are valid, and will stay valid over time, I
>> wonder how likely it would be for that to break, and what the
>> implications would be.  Most likely I guess it would break silently, but
>> then get caught by the frame pointer cross-checking.  So a broken hint
>> might not get noticed for a long time, but at least it (hopefully)
>> wouldn't break reliable unwinding.
>>
> 
> Yes. That is my thinking as well. When the unwinder checks the actual FP with the
> computed FP, any mismatch will be treated as unreliable code for unwind. So,
> apart from some retries during the livepatch process, this is most probably not
> a problem.
> 
> Now,  I set a flag for an unwind hint so that the unwinder knows that it is
> processing an unwind hint. I could generate a warning if an unwind hint does not
> result in a reliable unwind of the frame. This would bring the broken hint
> to people's attention.
> 
> 
>> Also, inline asm can sometimes do stack hacks like
>> "push;do_something;pop" which isn't visible to the toolchain.  But
>> again, hopefully the frame pointer checking would fail and mark it
>> unreliable.
>>
>> So I do have some worries about DWARF, but the fact that it's getting
>> "fact checked" by frame pointers might be sufficient.
>>
> 
> Exactly.
> 
I'm wondering how much functions will give a unreliable result because 
any unreliable function shows in stack trace will cause livepatch 
fail/retry. IIUC all unmarked assembly functions will considered 
unreliable and cause problem. It can be a burden to mark all of them.

> - No software is bug free. So, even if static analysis is implemented for an architecture,
>    it would be good to have another method of verifying the unwind rules generated from
>    the static analysis. DWARF can provide that additional verification.
> 
I'm wondering how much functions will give a unreliable result because 
any unreliable function shows in stack trace will cause livepatch 
fail/retry. IIUC all unmarked assembly functions will considered 
unreliable and cause problem. It can be a burden to mark all of them.

> 
> So, it is just frame pointer validation for livepatch I am trying to look at.
> 
My support reason for FP with validation is that it provides a guarantee 
for FP unwinder. FP and ORC use absolute and relative for stack unwind 
to unwind stack respectively, however FP has been considered unreliable. 
Is there any feature depends on FP? If so it can be more persuasive.


Also this patch is much more completed than migration for objtool. It 
would be nice if this could be put into use quickly. The objtool-arm64 
is less than half done, but I'm going to relies as much as possible on 
current objtool components, so no more feasibility validation is required.

By the way, I was thinking about a corner case, because arm64 CALL 
instruction won't push LR onto stack atomically as x86. Before push LR, 
FP to save frame there still can be some instructions such as bti, 
paciasp. If an irq happens here, the stack frame is not constructed so 
the FP unwinder will omit this function and provides a wrong stack trace 
to livepatch.

It's just a guess and I have not built the test case. But I think it's a 
defect on arm64 that FP unwinder can't work properly on prologue and 
epilogue. Do you have any idea about this?

Thanks for your time,
Chen
Madhavan T. Venkataraman April 12, 2022, 5:27 p.m. UTC | #13
On 4/12/22 02:30, Chen Zhongjin wrote:
> Hi Madhaven,
> 
> On 2022/4/12 1:18, Madhavan T. Venkataraman wrote:
>>
>> This is a valid point. So far, I find that gcc generates reliable DWARF information.
>> But there are two bugs in what Clang generates. I have added workarounds in my
>> parser to compensate.
>>
>> So, I think a DWARF verifier is an option that architectures can use. At this point,
>> I don't want to mandate a verifier on every architecture. But that is a discussion
>> that we can have once I have a verifier ready.
> I'm concerning that depending on compilers to generate correct information can become
> a trouble because we linux kernel side can rarely fix what compilers make. That's also why
> the gcc plugin idea was objected in the objtool migration.
> 
> If your parser can solve this it sounds more doable.
> 

So far, I find that gcc generates reliable DWARF information. Clang did have two bugs for
which the parser compensates.

So, what is needed is a DWARF verifier which can find buggy DWARF information. I am working on
it.

Having said that,  I think the DWARF information for the stack and frame pointer is a *lot*
simpler than the other debug stuff. So, there may be a couple of existing bugs that need to be
discovered and fixed. But I think the likelihood of bugs appearing in this area in the future is
low but it can happen. I agree that there needs to be a way to discover and flag the bugs
when they do appear.

But compiler bugs can also affect objtool and cause problems in it. If I am not mistaken,
they ran into this multiple times on the X86 side and had to get fixes done. Josh knows better.
Compiler bugs and optimizations have always been problematic from this perspective and they
will potentially continue to be.

>>
>> Yes. That is my thinking as well. When the unwinder checks the actual FP with the
>> computed FP, any mismatch will be treated as unreliable code for unwind. So,
>> apart from some retries during the livepatch process, this is most probably not
>> a problem.
>>
>> Now,  I set a flag for an unwind hint so that the unwinder knows that it is
>> processing an unwind hint. I could generate a warning if an unwind hint does not
>> result in a reliable unwind of the frame. This would bring the broken hint
>> to people's attention.
>>
>>
>> Also, inline asm can sometimes do stack hacks like
>> "push;do_something;pop" which isn't visible to the toolchain.  But
>> again, hopefully the frame pointer checking would fail and mark it
>> unreliable.
>>
> I'm wondering how much functions will give a unreliable result because any unreliable
> function shows in stack trace will cause livepatch fail/retry. IIUC all unmarked assembly
> functions will considered unreliable and cause problem. It can be a burden to mark all
> of them.

It is not a burden to mark all of them. For instance, I have submitted a patch where I mark
all the SYM_CODE*() functions by overriding the SYM_CODE_START()/END() macros in arm64. So,
the changes are very small and self-contained.

A good part of the assembly functions are defined with SYM_CODE_*(). These, by definition,
are low-level functions that do not follow ABI rules. IIRC, Objtool does not perform static
analysis on these today. These need to be recognized by the unwinder in the kernel and handled.
Josh, please correct me if I am wrong. So, this is a problem even if we had static analysis
in Objtool for arm64.

As for functions defined with SYM_FUNC_*(), they are supposed to have proper frame setup and
teardown. But most of them do not appear to have a proper frame pointer prolog and epilog today
in arm64. Some of these will probably never have an FP prolog or epilog because they are high
performance functions or specialized functions and the extra overhead may be unacceptable. Some of
the SYM_FUNC*() functions are leaf functions that don't need an FP prolog or epilog.

With static analysis, Objtool will flag all such functions. Either a proper FP prolog and epilog
have to be introduced in the code or the functions need to be flagged as unreliable from an unwind
perspective. If any of these functions occurs frequently in stack traces, then, either a proper
FP prolog and epilog have to be introduced in the function code. Or, unwind hints have to be placed
at strategic points. In either case, there is a maintenance burden although developers may prefer
one over the other on a case-by-case basis.

The DWARF situation is the same. For the frequently occurring assembly functions, unwind hints
need to be defined. Currently, I have undertaken to study the SYM_FUNC*() functions in arm64
to see if I can determine which ones belong to this category. Also, I am going to be doing targeted
livepatch testing to see if any of these functions will cause many retries during the livepatch
process. If they don't, then this is not a problem.


>> - No software is bug free. So, even if static analysis is implemented for an architecture,
>>   it would be good to have another method of verifying the unwind rules generated from
>>   the static analysis. DWARF can provide that additional verification.
> To me verifying ORC with DWARF a little odd, cuz they are running with different unwind
> mechanism. For normal scenario which calling convention is obeyed, ORC can give a
> promised reliable stack trace, while when it easily involve bug in assembly codes,
> DWARF also can't work.
> 

I am not sure I follow. With both DWARF and ORC, stack pointer and frame pointer offsets are recorded
for every instruction address. These offsets have to be the same regardless of which one you use.

The only difference is that for an assembly function that has a proper FP prolog and epilog, Objtool
static analysis is able to generate those offsets. With DWARF, the compiler does not generate any
offsets for assembly functions. I have to rely on unwind hints. BTW, I only need to do this for functions
that occur frequently in stack traces.


> My support for FP with validation is that it provides a guarantee for FP unwinder. FP and ORC
> use absolute and relative for stack unwind to unwind stack respectively, however FP has been
> considered unreliable. Is there any feature depends on FP? If so it can be more persuasive.
> 

Yes. Static analysis makes sure that functions are following ABI rules. So, it provides a static
guarantee. And that is great.

However, with DWARF, even if some functions don't follow ABI rules, a reliable unwinder can still
be provided as long as the DWARF information generated by the compiler is correct. For instance,
let us say that the compiler generates code for a function with a call to another function before
the FP has been setup properly. If the DWARF information is correctly generated, the unwinder can
see that a stack trace involving the called function is unreliable.

Also, hypothetically, if a buggy kernel function corrupts the frame pointer or the stack
pointer, dynamic validation can catch it.

> 
> Also this patch is much more completed than migration for objtool. It would be
> nice if this could be put into use quickly. The objtool-arm64 is less than half done, but I'm going
> to relies as much as possible on current objtool components, so no more feasibility validation
> is required.
> 

The approach in your patch series is certainly feasible. I don't deny that at all. And, believe me,
I would like the community to take interest in it and review it. If I get a chance, I will also
participate in that review.

As I mentioned in a previous email, my attempt is to come up with a largely architecture independent
solution to the FP validation problem with a quicker time to market. That is all.

> By the way, I was thinking about a corner case, because arm64 CALL instruction won't push LR
> onto stack atomically as x86. Before push LR, FP to save frame there still can be some instructions
> such as bti, paciasp. If an irq happens here, the stack frame is not constructed
> so the FP unwinder will omit this function and provides a wrong stack trace to livepatch.
> 
> 

With DWARF, the unwinder will see that there are no DWARF rules associated with those PCs that occur
before the FP is completely setup. It will mark the stack trace as unreliable. So, these cases are
already handled as I have explained in my cover letter.

> It's just a guess and I have not built the test case. But I think it's a defect on arm64 that FP
> unwinder can't work properly on prologue and epilogue. Do you have any idea about this?
> 

There is no defect. The frame pointer prolog can have multiple instructions before the frame is set up.
Any interrupt or exception happening on those instructions will have an unreliable stack trace by
definition. A reliable unwinder must be able to recognize that case and mark the stack trace as unreliable.
That is all.

Thanks for your comments.

Madhavan
Madhavan T. Venkataraman April 14, 2022, 2:11 p.m. UTC | #14
Hi Josh, Peter,

I have decided to accept your comment that using the compiler generated DWARF info
is probably not reliable. I can write a DWARF verifier. But I decided that if I can
write a DWARF verifier, I can use that same code to generate the same information as
DWARF CFI.

So, I am going to implement this in the traditional way as you wanted. Fortunately,
I only need to decode a small subset of the instruction formats (just the ones that
affect the SP and FP) and branch and return instructions to be able to compute the
SP and FP offsets at every instruction address. I only need a small part of the
static analysis code.

In other words, I am removing the crazy from my patch series. I am still retaining the
idea of dynamic FP validation rather than static validation. IMO, that will be able to
handle even non-ABI compliant functions and FP corruptions from buggy kernel functions.

Huawei has resubmitted the objtool patch set. When the full blown static analysis
is implemented by them and accepted by the community, my changes can be easily merged
with theirs. So, my solution is not a competing solution. In fact, it will provide an
additional level of robustness.

I will make these changes, test and send out version 2.

Thanks for your comments.

Madhavan

On 4/7/22 19:21, Josh Poimboeuf wrote:
> On Thu, Apr 07, 2022 at 03:25:09PM -0500, madvenka@linux.microsoft.com wrote:
>> The solution
>> ============
>>
>> The goal here is to use the absolute minimum CFI needed to compute the FP at
>> every instruction address. The unwinder can compute the FP in each frame,
>> compare the actual FP with the computed one and validate the actual FP.
>>
>> Objtool is enhanced to parse the CFI, extract just the rules required,
>> encode them in compact data structures and create special sections for
>> the rules. The unwinder uses the special sections to find the rules for
>> a given instruction address and compute the FP.
>>
>> Objtool can be invoked as follows:
>>
>> 	objtool dwarf generate <object-file>
> 
> Hi Madhaven,
> 
> This is quite interesting.  And it's exactly the kind of crazy idea I
> can appreciate ;-)
> 
> Some initial thoughts:
> 
> 
> 1)
> 
> I have some concerns about DWARF's reliability, especially considering
> a) inline asm, b) regular asm, and c) the kernel's tendency to push
> compilers to their limits.
> 
> BUT, supplementing the frame pointer unwinding with DWARF, rather than
> just relying on DWARF alone, does help a LOT.
> 
> I guess the hope is that cross-checking two "mostly reliable" things
> against each other (frame pointers and DWARF) will give a reliable
> result ;-)
> 
> In a general sense, I've never looked at DWARF's reliability, even for
> just normal C code.  It would be good to have some way of knowing that
> DWARF looks mostly sane for both GCC and Clang.  For example, maybe
> somehow cross-checking it with objtool's knowledge.  And then of course
> we'd have to hope that it stays bug-free in future compilers.
> 
> I'd also be somewhat concerned about assembly.  Since there's nothing
> ensuring the unwind hints are valid, and will stay valid over time, I
> wonder how likely it would be for that to break, and what the
> implications would be.  Most likely I guess it would break silently, but
> then get caught by the frame pointer cross-checking.  So a broken hint
> might not get noticed for a long time, but at least it (hopefully)
> wouldn't break reliable unwinding.
> 
> Also, inline asm can sometimes do stack hacks like
> "push;do_something;pop" which isn't visible to the toolchain.  But
> again, hopefully the frame pointer checking would fail and mark it
> unreliable.
> 
> So I do have some worries about DWARF, but the fact that it's getting
> "fact checked" by frame pointers might be sufficient.
> 
> 
> 2)
> 
> If I understand correctly, objtool is converting parts of DWARF to a new
> format which can then be read by the kernel.  In that case, please don't
> call it DWARF as that will cause a lot of confusion.
> 
> There are actually several similarities between your new format and ORC,
> which is also an objtool-created DWARF alternative.  It would be
> interesting to see if they could be combined somehow.
> 
> 
> 3)
> 
> Objtool has become an integral part of x86-64, due to security and
> performance features and toolchain workarounds.
> 
> Not *all* of its features require the full "branch validation" which
> follows all code paths -- and was the hardest part to get right -- but
> several features *do* need that: stack validation, ORC, uaccess
> validation, noinstr validation.
> 
> Objtool has been picking up a lot of steam (and features) lately, with
> more features currently in active development.  And lately there have
> been renewed patches for porting it to powerpc and arm64 (and rumors of
> s390).
> 
> If arm64 ever wants one of those features -- particularly a "branch
> validation" based feature -- I think it would make more sense to just do
> the stack validation in objtool, rather than the DWARF supplementation
> approach.
> 
> Just to give an idea of what objtool already supports and how useful it
> has become for x86, here's an excerpt from some documentation I've been
> working on, since I'm in the middle of rewriting the interface to make
> it more modular.  This is a list of all its current features:
> 
> 
> Features
> --------
> 
> Objtool has the following features:
> 
> 
> - Stack unwinding metadata validation -- useful for helping to ensure
>   stack traces are reliable for live patching
> 
> - ORC unwinder metadata generation -- a faster and more precise
>   alternative to frame pointer based unwinding
> 
> - Retpoline validation -- ensures that all indirect calls go through
>   retpoline thunks, for Spectre v2 mitigations
> 
> - Retpoline call site annotation -- annotates all retpoline thunk call
>   sites, enabling the kernel to patch them inline, to prevent "thunk
>   funneling" for both security and performance reasons
> 
> - Non-instrumentation validation -- validates non-instrumentable
>   ("noinstr") code rules, preventing unexpected instrumentation in
>   low-level C entry code
> 
> - Static call annotation -- annotates static call sites, enabling the
>   kernel to implement inline static calls, a faster alternative to some
>   indirect branches
> 
> - Uaccess validation -- validates uaccess rules for a proper safe
>   implementation of Supervisor Mode Access Protection (SMAP)
> 
> - Straight Line Speculation validation -- validates certain SLS
>   mitigations
> 
> - Indirect Branch Tracking validation -- validates Intel CET IBT rules
>   to ensure that all functions referenced by function pointers have
>   corresponding ENDBR instructions
> 
> - Indirect Branch Tracking annotation -- annotates unused ENDBR
>   instruction sites, enabling the kernel to "seal" them (replace them
>   with NOPs) to further harden IBT
> 
> - Function entry annotation -- annotates function entries, enabling
>   kernel function tracing
> 
> - Other toolchain hacks which will go unmentioned at this time...
>
Josh Poimboeuf April 16, 2022, 12:56 a.m. UTC | #15
On Tue, Apr 12, 2022 at 04:32:22PM +0800, Chen Zhongjin wrote:
> By the way, I was thinking about a corner case, because arm64 CALL
> instruction won't push LR onto stack atomically as x86. Before push LR, FP
> to save frame there still can be some instructions such as bti, paciasp. If
> an irq happens here, the stack frame is not constructed so the FP unwinder
> will omit this function and provides a wrong stack trace to livepatch.
> 
> It's just a guess and I have not built the test case. But I think it's a
> defect on arm64 that FP unwinder can't work properly on prologue and
> epilogue. Do you have any idea about this?

x86 has similar issues with frame pointers, if for example preemption or
page fault exception occurs in a leaf function, or in a function
prologue or epilogue, before or after the frame pointer setup.

This issue is solved by the "reliable" unwinder which detects
irqs/exceptions on the stack and reports the stack as unreliable.
Josh Poimboeuf April 16, 2022, 1:07 a.m. UTC | #16
On Mon, Apr 11, 2022 at 12:18:13PM -0500, Madhavan T. Venkataraman wrote:
> > There are actually several similarities between your new format and ORC,
> > which is also an objtool-created DWARF alternative.  It would be
> > interesting to see if they could be combined somehow.
> > 
> 
> I will certainly look into it. So, if I decide to merge the two, I might want
> to make a minor change to the ORC structure. Would that be OK with you?

Yes, in fact I would expect it, since ORC is quite x86-specific at the
moment.  So it would need some abstractions to make it more multi-arch
friendly.
Chen Zhongjin April 18, 2022, 12:28 p.m. UTC | #17
Hi Josh,

IIUC, ORC on x86 can make reliable stack unwind for this scenario
because objtool validates BP state.

I'm thinking that on arm64 there's no guarantee that LR will be pushed
onto stack. When we meet similar scenario on arm64, we should recover
(LR, FP) on pt_regs and continue to unwind the stack. And this is
reliable only after we validate (LR, FP).

So should we track LR on arm64 additionally as track BP on x86? Or can
we just treat (LR, FP) as a pair? because as I know they are always set
up together.

On 2022/4/16 8:56, Josh Poimboeuf wrote:
> On Tue, Apr 12, 2022 at 04:32:22PM +0800, Chen Zhongjin wrote:
>> By the way, I was thinking about a corner case, because arm64 CALL
>> instruction won't push LR onto stack atomically as x86. Before push LR, FP
>> to save frame there still can be some instructions such as bti, paciasp. If
>> an irq happens here, the stack frame is not constructed so the FP unwinder
>> will omit this function and provides a wrong stack trace to livepatch.
>>
>> It's just a guess and I have not built the test case. But I think it's a
>> defect on arm64 that FP unwinder can't work properly on prologue and
>> epilogue. Do you have any idea about this?
> 
> x86 has similar issues with frame pointers, if for example preemption or
> page fault exception occurs in a leaf function, or in a function
> prologue or epilogue, before or after the frame pointer setup.
> 
> This issue is solved by the "reliable" unwinder which detects
> irqs/exceptions on the stack and reports the stack as unreliable.
>
Josh Poimboeuf April 18, 2022, 4:11 p.m. UTC | #18
On Mon, Apr 18, 2022 at 08:28:33PM +0800, Chen Zhongjin wrote:
> Hi Josh,
> 
> IIUC, ORC on x86 can make reliable stack unwind for this scenario
> because objtool validates BP state.
> 
> I'm thinking that on arm64 there's no guarantee that LR will be pushed
> onto stack. When we meet similar scenario on arm64, we should recover
> (LR, FP) on pt_regs and continue to unwind the stack. And this is
> reliable only after we validate (LR, FP).
> 
> So should we track LR on arm64 additionally as track BP on x86? Or can
> we just treat (LR, FP) as a pair? because as I know they are always set
> up together.

Does the arm64 unwinder have a way to detect kernel pt_regs on the
stack?  If so, the simplest solution is to mark all stacks with kernel
regs as unreliable.  That's what the x86 FP unwinder does.
Madhavan T. Venkataraman April 18, 2022, 6:38 p.m. UTC | #19
On 4/18/22 11:11, Josh Poimboeuf wrote:
> On Mon, Apr 18, 2022 at 08:28:33PM +0800, Chen Zhongjin wrote:
>> Hi Josh,
>>
>> IIUC, ORC on x86 can make reliable stack unwind for this scenario
>> because objtool validates BP state.
>>
>> I'm thinking that on arm64 there's no guarantee that LR will be pushed
>> onto stack. When we meet similar scenario on arm64, we should recover
>> (LR, FP) on pt_regs and continue to unwind the stack. And this is
>> reliable only after we validate (LR, FP).
>>
>> So should we track LR on arm64 additionally as track BP on x86? Or can
>> we just treat (LR, FP) as a pair? because as I know they are always set
>> up together.
> 
> Does the arm64 unwinder have a way to detect kernel pt_regs on the
> stack?  If so, the simplest solution is to mark all stacks with kernel
> regs as unreliable.  That's what the x86 FP unwinder does.
> 

AFAICT, only the task pt_regs can be detected. For detecting the other pt_regs,
we would have to set a bit in the FP. IIRC, I had a proposal where I set the LSB in
the FP stored on the stack. The arm64 folks did not like that approach as it
would be indistinguishable from a corrupted FP, however unlikely the corruption
may be.

Unwind hints can be used for these cases to unwind reliably through them. That is
probably the current thinking. Mark Rutland can confirm.

Madhavan