Message ID | 20221213142849.1629026-1-ardb@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | arm64: Apply dynamic shadow call stack patching in two passes | expand |
On Tue, Dec 13, 2022 at 6:29 AM Ard Biesheuvel <ardb@kernel.org> wrote: > > Code patching for the dynamically enabled shadow call stack comes down > to finding PACIASP and AUTIASP instructions -which behave as NOPs on > cores that do not implement pointer authentication- and converting them > into shadow call stack pushes and pops, respectively. > > Due to past bad experiences with the highly complex and overengineered > DWARF standard that describes the unwind metadata that we are using to > locate these instructions, let's make this patching logic a little bit > more robust so that any issues with the unwind metadata detected at boot > time can de dealt with gracefully. > > The DWARF annotations that are used for this are emitted at function > granularity, and due to the fact that the instructions we are patching > will simply behave as NOPs if left unpatched, we can abort on errors as > long as we don't leave any functions in a half-patched state. > > So do a dry run of each FDE frame (covering a single function) before > performing the actual patching, and give up if the DWARF metadata cannot > be understood. > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org> > --- > arch/arm64/kernel/patch-scs.c | 11 ++++++++--- > 1 file changed, 8 insertions(+), 3 deletions(-) Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Sami
[+Catalin] On Tue, Dec 13, 2022 at 03:28:49PM +0100, Ard Biesheuvel wrote: > Code patching for the dynamically enabled shadow call stack comes down > to finding PACIASP and AUTIASP instructions -which behave as NOPs on > cores that do not implement pointer authentication- and converting them > into shadow call stack pushes and pops, respectively. > > Due to past bad experiences with the highly complex and overengineered > DWARF standard that describes the unwind metadata that we are using to > locate these instructions, let's make this patching logic a little bit > more robust so that any issues with the unwind metadata detected at boot > time can de dealt with gracefully. > > The DWARF annotations that are used for this are emitted at function > granularity, and due to the fact that the instructions we are patching > will simply behave as NOPs if left unpatched, we can abort on errors as > long as we don't leave any functions in a half-patched state. > > So do a dry run of each FDE frame (covering a single function) before > performing the actual patching, and give up if the DWARF metadata cannot > be understood. > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org> > --- > arch/arm64/kernel/patch-scs.c | 11 ++++++++--- > 1 file changed, 8 insertions(+), 3 deletions(-) Acked-by: Will Deacon <will@kernel.org> Will
On Tue, 13 Dec 2022 15:28:49 +0100, Ard Biesheuvel wrote: > Code patching for the dynamically enabled shadow call stack comes down > to finding PACIASP and AUTIASP instructions -which behave as NOPs on > cores that do not implement pointer authentication- and converting them > into shadow call stack pushes and pops, respectively. > > Due to past bad experiences with the highly complex and overengineered > DWARF standard that describes the unwind metadata that we are using to > locate these instructions, let's make this patching logic a little bit > more robust so that any issues with the unwind metadata detected at boot > time can de dealt with gracefully. > > [...] Applied to arm64 (for-next/misc), thanks! [1/1] arm64: Apply dynamic shadow call stack patching in two passes https://git.kernel.org/arm64/c/54c968bec344
On Tue, Dec 13, 2022 at 6:29 AM Ard Biesheuvel <ardb@kernel.org> wrote: > > Due to past bad experiences with the highly complex and overengineered > DWARF standard that describes the unwind metadata that we are using to > locate these instructions [..] Just a note on why I distrust DWARF data so much - it's not so much because it's complex and overengineered (although I agree it is), because that's not anything new. It's because it's almost entirely *untested*. Compiler code generation bugs are a real issue, and happen semi-regularly. Are they _common_? No. But they are an issue, and we were chasing one just a couple of weeks ago. But code generation bugs are things that get very fundamentally tested. When the compiler generates bad code, every single user of that compiler will effectively be testing it. Yes, we still hit them, often because the kernel does something unusual (ie the last one was apparently only triggered with a combination of sanitizer and coverage flags), so "test coverage" isn't any kind of guarantee, but it's there. But DWARF debug info? It can be *completely* wrong, and in 99.9% of all cases nobody will ever notice in any testing. Most of the time it's not used at all, and even when it is used (whether exception handling or for actually doing debuggers) it's used only for a tiny tiny percentage of the whole thing. So it's not just that we've had bad experiences with it in the past. I feel that the problem goes deeper than that - the lack of testing means that it's fundamentally not trustworthy. Am I exaggerating a bit? Sure. Compilers have (extensive) test-suites for debug info too. But I do think that coverage tends to be much less than "everybody relies on it being right" like for normal code generation. End result: I would love for us to have some additional security nets in this area. Doing the checks as a dry-run phase is good so that any possible issues hopefully get caught before the code actively rewrites things, but I'd still be even happier if this was a build-time thing and part of objtool or something. That way the dwarf info would also be validated even when it's not actively used - which is a large point about my "this has seldom been tested" issue with it. Because I *think* this dry-run thing is only run of the (few) arm64 cores that actually have PACIASP/AUTIASP. No? Without test coverage, bugs happen. Linus
On Thu, 26 Jan 2023 at 20:08, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > On Tue, Dec 13, 2022 at 6:29 AM Ard Biesheuvel <ardb@kernel.org> wrote: > > > > Due to past bad experiences with the highly complex and overengineered > > DWARF standard that describes the unwind metadata that we are using to > > locate these instructions [..] > > Just a note on why I distrust DWARF data so much - it's not so much > because it's complex and overengineered (although I agree it is), > because that's not anything new. > > It's because it's almost entirely *untested*. > > Compiler code generation bugs are a real issue, and happen > semi-regularly. Are they _common_? No. But they are an issue, and we > were chasing one just a couple of weeks ago. > > But code generation bugs are things that get very fundamentally > tested. When the compiler generates bad code, every single user of > that compiler will effectively be testing it. > > Yes, we still hit them, often because the kernel does something > unusual (ie the last one was apparently only triggered with a > combination of sanitizer and coverage flags), so "test coverage" isn't > any kind of guarantee, but it's there. > > But DWARF debug info? It can be *completely* wrong, and in 99.9% of > all cases nobody will ever notice in any testing. Most of the time > it's not used at all, and even when it is used (whether exception > handling or for actually doing debuggers) it's used only for a tiny > tiny percentage of the whole thing. > > So it's not just that we've had bad experiences with it in the past. I > feel that the problem goes deeper than that - the lack of testing > means that it's fundamentally not trustworthy. > > Am I exaggerating a bit? Sure. Compilers have (extensive) test-suites > for debug info too. But I do think that coverage tends to be much less > than "everybody relies on it being right" like for normal code > generation. > > End result: I would love for us to have some additional security nets > in this area. > > Doing the checks as a dry-run phase is good so that any possible > issues hopefully get caught before the code actively rewrites things, > but I'd still be even happier if this was a build-time thing and part > of objtool or something. > > That way the dwarf info would also be validated even when it's not > actively used - which is a large point about my "this has seldom been > tested" issue with it. > > Because I *think* this dry-run thing is only run of the (few) arm64 > cores that actually have PACIASP/AUTIASP. No? > No, the other way around. On cores where PACIASP/AUTIASP execute as NOPs, they are replaced with shadow call stack pushes and pops. This is preferred over having both shadow call stack and PAC in the same [generic] kernel image, as the performance hit of the shadow call stack is not justified on cores that implement PAC.
On Thu, Jan 26, 2023 at 11:35 AM Ard Biesheuvel <ardb@kernel.org> wrote: > > > > Because I *think* this dry-run thing is only run of the (few) arm64 > > cores that actually have PACIASP/AUTIASP. No? > > No, the other way around. On cores where PACIASP/AUTIASP execute as > NOPs, they are replaced with shadow call stack pushes and pops. Ahh, good, so at least this gets test coverage. Linus
diff --git a/arch/arm64/kernel/patch-scs.c b/arch/arm64/kernel/patch-scs.c index 1b3da02d5b741bc3..a1fe4b4ff5917670 100644 --- a/arch/arm64/kernel/patch-scs.c +++ b/arch/arm64/kernel/patch-scs.c @@ -130,7 +130,8 @@ struct eh_frame { static int noinstr scs_handle_fde_frame(const struct eh_frame *frame, bool fde_has_augmentation_data, - int code_alignment_factor) + int code_alignment_factor, + bool dry_run) { int size = frame->size - offsetof(struct eh_frame, opcodes) + 4; u64 loc = (u64)offset_to_ptr(&frame->initial_loc); @@ -184,7 +185,8 @@ static int noinstr scs_handle_fde_frame(const struct eh_frame *frame, break; case DW_CFA_negate_ra_state: - scs_patch_loc(loc - 4); + if (!dry_run) + scs_patch_loc(loc - 4); break; case 0x40 ... 0x7f: @@ -235,9 +237,12 @@ int noinstr scs_patch(const u8 eh_frame[], int size) } else { ret = scs_handle_fde_frame(frame, fde_has_augmentation_data, - code_alignment_factor); + code_alignment_factor, + true); if (ret) return ret; + scs_handle_fde_frame(frame, fde_has_augmentation_data, + code_alignment_factor, false); } p += sizeof(frame->size) + frame->size;
Code patching for the dynamically enabled shadow call stack comes down to finding PACIASP and AUTIASP instructions -which behave as NOPs on cores that do not implement pointer authentication- and converting them into shadow call stack pushes and pops, respectively. Due to past bad experiences with the highly complex and overengineered DWARF standard that describes the unwind metadata that we are using to locate these instructions, let's make this patching logic a little bit more robust so that any issues with the unwind metadata detected at boot time can de dealt with gracefully. The DWARF annotations that are used for this are emitted at function granularity, and due to the fact that the instructions we are patching will simply behave as NOPs if left unpatched, we can abort on errors as long as we don't leave any functions in a half-patched state. So do a dry run of each FDE frame (covering a single function) before performing the actual patching, and give up if the DWARF metadata cannot be understood. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> --- arch/arm64/kernel/patch-scs.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-)