diff mbox series

arm64: Apply dynamic shadow call stack patching in two passes

Message ID 20221213142849.1629026-1-ardb@kernel.org (mailing list archive)
State New, archived
Headers show
Series arm64: Apply dynamic shadow call stack patching in two passes | expand

Commit Message

Ard Biesheuvel Dec. 13, 2022, 2:28 p.m. UTC
Code patching for the dynamically enabled shadow call stack comes down
to finding PACIASP and AUTIASP instructions -which behave as NOPs on
cores that do not implement pointer authentication- and converting them
into shadow call stack pushes and pops, respectively.

Due to past bad experiences with the highly complex and overengineered
DWARF standard that describes the unwind metadata that we are using to
locate these instructions, let's make this patching logic a little bit
more robust so that any issues with the unwind metadata detected at boot
time can de dealt with gracefully.

The DWARF annotations that are used for this are emitted at function
granularity, and due to the fact that the instructions we are patching
will simply behave as NOPs if left unpatched, we can abort on errors as
long as we don't leave any functions in a half-patched state.

So do a dry run of each FDE frame (covering a single function) before
performing the actual patching, and give up if the DWARF metadata cannot
be understood.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/kernel/patch-scs.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

Comments

Sami Tolvanen Dec. 15, 2022, 5:57 p.m. UTC | #1
On Tue, Dec 13, 2022 at 6:29 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> Code patching for the dynamically enabled shadow call stack comes down
> to finding PACIASP and AUTIASP instructions -which behave as NOPs on
> cores that do not implement pointer authentication- and converting them
> into shadow call stack pushes and pops, respectively.
>
> Due to past bad experiences with the highly complex and overengineered
> DWARF standard that describes the unwind metadata that we are using to
> locate these instructions, let's make this patching logic a little bit
> more robust so that any issues with the unwind metadata detected at boot
> time can de dealt with gracefully.
>
> The DWARF annotations that are used for this are emitted at function
> granularity, and due to the fact that the instructions we are patching
> will simply behave as NOPs if left unpatched, we can abort on errors as
> long as we don't leave any functions in a half-patched state.
>
> So do a dry run of each FDE frame (covering a single function) before
> performing the actual patching, and give up if the DWARF metadata cannot
> be understood.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  arch/arm64/kernel/patch-scs.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)

Reviewed-by: Sami Tolvanen <samitolvanen@google.com>

Sami
Will Deacon Jan. 26, 2023, 4:19 p.m. UTC | #2
[+Catalin]

On Tue, Dec 13, 2022 at 03:28:49PM +0100, Ard Biesheuvel wrote:
> Code patching for the dynamically enabled shadow call stack comes down
> to finding PACIASP and AUTIASP instructions -which behave as NOPs on
> cores that do not implement pointer authentication- and converting them
> into shadow call stack pushes and pops, respectively.
> 
> Due to past bad experiences with the highly complex and overengineered
> DWARF standard that describes the unwind metadata that we are using to
> locate these instructions, let's make this patching logic a little bit
> more robust so that any issues with the unwind metadata detected at boot
> time can de dealt with gracefully.
> 
> The DWARF annotations that are used for this are emitted at function
> granularity, and due to the fact that the instructions we are patching
> will simply behave as NOPs if left unpatched, we can abort on errors as
> long as we don't leave any functions in a half-patched state.
> 
> So do a dry run of each FDE frame (covering a single function) before
> performing the actual patching, and give up if the DWARF metadata cannot
> be understood.
> 
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  arch/arm64/kernel/patch-scs.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)

Acked-by: Will Deacon <will@kernel.org>

Will
Catalin Marinas Jan. 26, 2023, 6:49 p.m. UTC | #3
On Tue, 13 Dec 2022 15:28:49 +0100, Ard Biesheuvel wrote:
> Code patching for the dynamically enabled shadow call stack comes down
> to finding PACIASP and AUTIASP instructions -which behave as NOPs on
> cores that do not implement pointer authentication- and converting them
> into shadow call stack pushes and pops, respectively.
> 
> Due to past bad experiences with the highly complex and overengineered
> DWARF standard that describes the unwind metadata that we are using to
> locate these instructions, let's make this patching logic a little bit
> more robust so that any issues with the unwind metadata detected at boot
> time can de dealt with gracefully.
> 
> [...]

Applied to arm64 (for-next/misc), thanks!

[1/1] arm64: Apply dynamic shadow call stack patching in two passes
      https://git.kernel.org/arm64/c/54c968bec344
Linus Torvalds Jan. 26, 2023, 7:07 p.m. UTC | #4
On Tue, Dec 13, 2022 at 6:29 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> Due to past bad experiences with the highly complex and overengineered
> DWARF standard that describes the unwind metadata that we are using to
> locate these instructions [..]

Just a note on why I distrust DWARF data so much - it's not so much
because it's complex and overengineered (although I agree it is),
because that's not anything new.

It's because it's almost entirely *untested*.

Compiler code generation bugs are a real issue, and happen
semi-regularly. Are they _common_? No. But they are an issue, and we
were chasing one just a couple of weeks ago.

But code generation bugs are things that get very fundamentally
tested. When the compiler generates bad code, every single user of
that compiler will effectively be testing it.

Yes, we still hit them, often because the kernel does something
unusual (ie the last one was apparently only triggered with a
combination of sanitizer and coverage flags), so "test coverage" isn't
any kind of guarantee, but it's there.

But DWARF debug info? It can be *completely* wrong, and in 99.9% of
all cases nobody will ever notice in any testing. Most of the time
it's not used at all, and even when it is used (whether exception
handling or for actually doing debuggers) it's used only for a tiny
tiny percentage of the whole thing.

So it's not just that we've had bad experiences with it in the past. I
feel that the problem goes deeper than that - the lack of testing
means that it's fundamentally not trustworthy.

Am I exaggerating a bit? Sure. Compilers have (extensive) test-suites
for debug info too. But I do think that coverage tends to be much less
than "everybody relies on it being right" like for normal code
generation.

End result: I would love for us to have some additional security nets
in this area.

Doing the checks as a dry-run phase is good so that any possible
issues hopefully get caught before the code actively rewrites things,
but I'd still be even happier if this was a build-time thing and part
of objtool or something.

That way the dwarf info would also be validated even when it's not
actively used - which is a large point about my "this has seldom been
tested" issue with it.

Because I *think* this dry-run thing is only run of the (few) arm64
cores that actually have PACIASP/AUTIASP. No?

Without test coverage, bugs happen.

                   Linus
Ard Biesheuvel Jan. 26, 2023, 7:35 p.m. UTC | #5
On Thu, 26 Jan 2023 at 20:08, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Tue, Dec 13, 2022 at 6:29 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > Due to past bad experiences with the highly complex and overengineered
> > DWARF standard that describes the unwind metadata that we are using to
> > locate these instructions [..]
>
> Just a note on why I distrust DWARF data so much - it's not so much
> because it's complex and overengineered (although I agree it is),
> because that's not anything new.
>
> It's because it's almost entirely *untested*.
>
> Compiler code generation bugs are a real issue, and happen
> semi-regularly. Are they _common_? No. But they are an issue, and we
> were chasing one just a couple of weeks ago.
>
> But code generation bugs are things that get very fundamentally
> tested. When the compiler generates bad code, every single user of
> that compiler will effectively be testing it.
>
> Yes, we still hit them, often because the kernel does something
> unusual (ie the last one was apparently only triggered with a
> combination of sanitizer and coverage flags), so "test coverage" isn't
> any kind of guarantee, but it's there.
>
> But DWARF debug info? It can be *completely* wrong, and in 99.9% of
> all cases nobody will ever notice in any testing. Most of the time
> it's not used at all, and even when it is used (whether exception
> handling or for actually doing debuggers) it's used only for a tiny
> tiny percentage of the whole thing.
>
> So it's not just that we've had bad experiences with it in the past. I
> feel that the problem goes deeper than that - the lack of testing
> means that it's fundamentally not trustworthy.
>
> Am I exaggerating a bit? Sure. Compilers have (extensive) test-suites
> for debug info too. But I do think that coverage tends to be much less
> than "everybody relies on it being right" like for normal code
> generation.
>
> End result: I would love for us to have some additional security nets
> in this area.
>
> Doing the checks as a dry-run phase is good so that any possible
> issues hopefully get caught before the code actively rewrites things,
> but I'd still be even happier if this was a build-time thing and part
> of objtool or something.
>
> That way the dwarf info would also be validated even when it's not
> actively used - which is a large point about my "this has seldom been
> tested" issue with it.
>
> Because I *think* this dry-run thing is only run of the (few) arm64
> cores that actually have PACIASP/AUTIASP. No?
>

No, the other way around. On cores where PACIASP/AUTIASP execute as
NOPs, they are replaced with shadow call stack pushes and pops. This
is preferred over having both shadow call stack and PAC in the same
[generic] kernel image, as the performance hit of the shadow call
stack is not justified on cores that implement PAC.
Linus Torvalds Jan. 26, 2023, 8:08 p.m. UTC | #6
On Thu, Jan 26, 2023 at 11:35 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > Because I *think* this dry-run thing is only run of the (few) arm64
> > cores that actually have PACIASP/AUTIASP. No?
>
> No, the other way around. On cores where PACIASP/AUTIASP execute as
> NOPs, they are replaced with shadow call stack pushes and pops.

Ahh, good, so at least this gets test coverage.

            Linus
diff mbox series

Patch

diff --git a/arch/arm64/kernel/patch-scs.c b/arch/arm64/kernel/patch-scs.c
index 1b3da02d5b741bc3..a1fe4b4ff5917670 100644
--- a/arch/arm64/kernel/patch-scs.c
+++ b/arch/arm64/kernel/patch-scs.c
@@ -130,7 +130,8 @@  struct eh_frame {
 
 static int noinstr scs_handle_fde_frame(const struct eh_frame *frame,
 					bool fde_has_augmentation_data,
-					int code_alignment_factor)
+					int code_alignment_factor,
+					bool dry_run)
 {
 	int size = frame->size - offsetof(struct eh_frame, opcodes) + 4;
 	u64 loc = (u64)offset_to_ptr(&frame->initial_loc);
@@ -184,7 +185,8 @@  static int noinstr scs_handle_fde_frame(const struct eh_frame *frame,
 			break;
 
 		case DW_CFA_negate_ra_state:
-			scs_patch_loc(loc - 4);
+			if (!dry_run)
+				scs_patch_loc(loc - 4);
 			break;
 
 		case 0x40 ... 0x7f:
@@ -235,9 +237,12 @@  int noinstr scs_patch(const u8 eh_frame[], int size)
 		} else {
 			ret = scs_handle_fde_frame(frame,
 						   fde_has_augmentation_data,
-						   code_alignment_factor);
+						   code_alignment_factor,
+						   true);
 			if (ret)
 				return ret;
+			scs_handle_fde_frame(frame, fde_has_augmentation_data,
+					     code_alignment_factor, false);
 		}
 
 		p += sizeof(frame->size) + frame->size;