mbox series

[v7,00/23] arm64: Early CPU feature override, and applications to VHE, BTI and PAuth

Message ID 20210208095732.3267263-1-maz@kernel.org (mailing list archive)
Headers show
Series arm64: Early CPU feature override, and applications to VHE, BTI and PAuth | expand

Message

Marc Zyngier Feb. 8, 2021, 9:57 a.m. UTC
It recently came to light that there is a need to be able to override
some CPU features very early on, before the kernel is fully up and
running. The reasons for this range from specific feature support
(such as using Protected KVM on VHE HW, which is the main motivation
for this work) to errata workaround (a feature is broken on a CPU and
needs to be turned off, or rather not enabled).

This series tries to offer a limited framework for this kind of
problems, by allowing a set of options to be passed on the
command-line and altering the feature set that the cpufeature
subsystem exposes to the rest of the kernel. Note that this doesn't
change anything for code that directly uses the CPU ID registers.

The series completely changes the way a VHE-capable system boots, by
*always* booting non-VHE first, and then upgrading to VHE when deemed
capable. Although it sounds scary, this is actually simple to
implement (and I wish I had done that five years ago). The "upgrade to
VHE" path is then conditioned on the VHE feature not being disabled
from the command-line.

Said command-line parsing borrows a lot from the kaslr code, and
subsequently allows the "nokaslr" option to be moved to the new
infrastructure (though it all looks a bit... odd).

Further patches now add support for disabling BTI and PAuth, the
latter being based on an initial series by Srinivas Ramana[0]. There
is some ongoing discussions about being able to disable MTE, but no
clear resolution on that subject yet

WARNING: this series breaks Apple M1 badly, as it is stuck in VHE
mode. The last patch in this series papers over the problem, but it
*isn't* a candidate for merging yet.

This has been tested on multiple VHE and non-VHE systems.

Branch available at [7].

* From v6 [6]:
  - Greatly simplify SPE setup with VHE
  - Simplify option parsing by reusing some of the helpers user by
    parse_args(). The whole function cannot be used though, as it
    does things that can't be done at the point where we parse the
    overrides.
  - Add a patch allowing M1 CPUs to boot. This patch shouldn't be
    merged until we decide to support this non-architectural behaviour.

* From v5 [5]:
  - Turn most __initdata into __initconst
  - Ensure that all strings are part of the __initconst section.
    This is a bit ugly, but saves memory once up and running
  - Make overrides __ro_after_init
  - Change the command-line parsing so that the same feature can
    be overridden multiple times, with the expected left-to-right
    parsing order being respected
  - Handle all space-like characters as option delimiters
  - Collected Acks, RBs and TBs

* From v4 [4]:
  - Documentation fixes
  - Moved the val/mask pair into a arm64_ftr_override structure,
    leading to simpler code
  - All arm64_ftr_reg now have a default override, which simplifies
    the code a bit further
  - Dropped some of the "const" attributes
  - Renamed init_shadow_regs() to init_feature_override()
  - Renamed struct reg_desc to struct ftr_set_desc
  - Refactored command-line parsing
  - Simplified handling of VHE being disabled on the cmdline
  - Turn EL1 S1 MMU off on switch to VHE
  - HVC_VHE_RESTART now returns an error code on failure
  - Added missing asmlinkage and dummy prototypes
  - Collected Acks and RBs from David, Catalin and Suzuki

* From v3 [3]:
  - Fixed the VHE_RESTART stub (duh!)
  - Switched to using arm64_ftr_safe_value() instead of the user
    provided value
  - Per-feature override warning

* From v2 [2]:
  - Simplify the VHE_RESTART stub
  - Fixed a number of spelling mistakes, and hopefully introduced a
    few more
  - Override features in __read_sysreg_by_encoding()
  - Allow both BTI and PAuth to be overridden on the command line
  - Rebased on -rc3

* From v1 [1]:
  - Fix SPE init on VHE when EL2 doesn't own SPE
  - Fix re-init when KASLR is used
  - Handle the resume path
  - Rebased to 5.11-rc2

[0] https://lore.kernel.org/r/1610152163-16554-1-git-send-email-sramana@codeaurora.org
[1] https://lore.kernel.org/r/20201228104958.1848833-1-maz@kernel.org
[2] https://lore.kernel.org/r/20210104135011.2063104-1-maz@kernel.org
[3] https://lore.kernel.org/r/20210111132811.2455113-1-maz@kernel.org
[4] https://lore.kernel.org/r/20210118094533.2874082-1-maz@kernel.org
[5] https://lore.kernel.org/r/20210125105019.2946057-1-maz@kernel.org
[6] https://lore.kernel.org/r/20210201115637.3123740-1-maz@kernel.org
[7] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=hack/arm64-early-cpufeature

Marc Zyngier (22):
  arm64: Fix labels in el2_setup macros
  arm64: Fix outdated TCR setup comment
  arm64: Turn the MMU-on sequence into a macro
  arm64: Provide an 'upgrade to VHE' stub hypercall
  arm64: Initialise as nVHE before switching to VHE
  arm64: Drop early setting of MDSCR_EL2.TPMS
  arm64: Move VHE-specific SPE setup to mutate_to_vhe()
  arm64: Simplify init_el2_state to be non-VHE only
  arm64: Move SCTLR_EL1 initialisation to EL-agnostic code
  arm64: cpufeature: Add global feature override facility
  arm64: cpufeature: Use IDreg override in __read_sysreg_by_encoding()
  arm64: Extract early FDT mapping from kaslr_early_init()
  arm64: cpufeature: Add an early command-line cpufeature override
    facility
  arm64: Allow ID_AA64MMFR1_EL1.VH to be overridden from the command
    line
  arm64: Honor VHE being disabled from the command-line
  arm64: Add an aliasing facility for the idreg override
  arm64: Make kvm-arm.mode={nvhe, protected} an alias of
    id_aa64mmfr1.vh=0
  KVM: arm64: Document HVC_VHE_RESTART stub hypercall
  arm64: Move "nokaslr" over to the early cpufeature infrastructure
  arm64: cpufeatures: Allow disabling of BTI from the command-line
  arm64: cpufeatures: Allow disabling of Pointer Auth from the
    command-line
  [DO NOT MERGE] arm64: Cope with CPUs stuck in VHE mode

Srinivas Ramana (1):
  arm64: Defer enabling pointer authentication on boot core

 .../admin-guide/kernel-parameters.txt         |   9 +
 Documentation/virt/kvm/arm/hyp-abi.rst        |   9 +
 arch/arm64/include/asm/assembler.h            |  17 ++
 arch/arm64/include/asm/cpufeature.h           |  11 +
 arch/arm64/include/asm/el2_setup.h            |  60 ++---
 arch/arm64/include/asm/pointer_auth.h         |  10 +
 arch/arm64/include/asm/setup.h                |  11 +
 arch/arm64/include/asm/stackprotector.h       |   1 +
 arch/arm64/include/asm/virt.h                 |   7 +-
 arch/arm64/kernel/Makefile                    |   2 +-
 arch/arm64/kernel/asm-offsets.c               |   3 +
 arch/arm64/kernel/cpufeature.c                |  73 +++++-
 arch/arm64/kernel/head.S                      |  94 +++-----
 arch/arm64/kernel/hyp-stub.S                  | 141 +++++++++++-
 arch/arm64/kernel/idreg-override.c            | 216 ++++++++++++++++++
 arch/arm64/kernel/kaslr.c                     |  43 +---
 arch/arm64/kernel/setup.c                     |  15 ++
 arch/arm64/kernel/sleep.S                     |   1 +
 arch/arm64/kvm/arm.c                          |   3 +
 arch/arm64/kvm/hyp/nvhe/hyp-init.S            |   2 +-
 arch/arm64/mm/mmu.c                           |   2 +-
 arch/arm64/mm/proc.S                          |  16 +-
 22 files changed, 576 insertions(+), 170 deletions(-)
 create mode 100644 arch/arm64/include/asm/setup.h
 create mode 100644 arch/arm64/kernel/idreg-override.c

Comments

Will Deacon Feb. 8, 2021, 2:32 p.m. UTC | #1
Hi Marc,

On Mon, Feb 08, 2021 at 09:57:09AM +0000, Marc Zyngier wrote:
> It recently came to light that there is a need to be able to override
> some CPU features very early on, before the kernel is fully up and
> running. The reasons for this range from specific feature support
> (such as using Protected KVM on VHE HW, which is the main motivation
> for this work) to errata workaround (a feature is broken on a CPU and
> needs to be turned off, or rather not enabled).
> 
> This series tries to offer a limited framework for this kind of
> problems, by allowing a set of options to be passed on the
> command-line and altering the feature set that the cpufeature
> subsystem exposes to the rest of the kernel. Note that this doesn't
> change anything for code that directly uses the CPU ID registers.

I applied this locally, but I'm seeing consistent boot failure under QEMU when
KASAN is enabled. I tried sprinkling some __no_sanitize_address annotations
around (see below) but it didn't help. The culprit appears to be
early_fdt_map(), but looking a bit more closely, I'm really nervous about the
way we call into C functions from __primary_switched. Remember -- this code
runs _twice_ when KASLR is active: before and after the randomization. This
also means that any memory writes the first time around can be lost due to
the D-cache invalidation when (re-)creating the kernel page-tables.

Will

--->8

diff --git a/arch/arm64/kernel/idreg-override.c b/arch/arm64/kernel/idreg-override.c
index dffb16682330..751ed55261b5 100644
--- a/arch/arm64/kernel/idreg-override.c
+++ b/arch/arm64/kernel/idreg-override.c
@@ -195,7 +195,7 @@ static __init void parse_cmdline(void)
 /* Keep checkers quiet */
 void init_feature_override(void);
 
-asmlinkage void __init init_feature_override(void)
+asmlinkage void __init __no_sanitize_address init_feature_override(void)
 {
        int i;
 
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 61845c0821d9..33581de05d2e 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -170,12 +170,12 @@ static void __init smp_build_mpidr_hash(void)
 
 static void *early_fdt_ptr __initdata;
 
-void __init *get_early_fdt_ptr(void)
+void __init __no_sanitize_address *get_early_fdt_ptr(void)
 {
        return early_fdt_ptr;
 }
 
-asmlinkage void __init early_fdt_map(u64 dt_phys)
+asmlinkage void __init __no_sanitize_address early_fdt_map(u64 dt_phys)
 {
        int fdt_size;
Ard Biesheuvel Feb. 8, 2021, 2:40 p.m. UTC | #2
On Mon, 8 Feb 2021 at 15:32, Will Deacon <will@kernel.org> wrote:
>
> Hi Marc,
>
> On Mon, Feb 08, 2021 at 09:57:09AM +0000, Marc Zyngier wrote:
> > It recently came to light that there is a need to be able to override
> > some CPU features very early on, before the kernel is fully up and
> > running. The reasons for this range from specific feature support
> > (such as using Protected KVM on VHE HW, which is the main motivation
> > for this work) to errata workaround (a feature is broken on a CPU and
> > needs to be turned off, or rather not enabled).
> >
> > This series tries to offer a limited framework for this kind of
> > problems, by allowing a set of options to be passed on the
> > command-line and altering the feature set that the cpufeature
> > subsystem exposes to the rest of the kernel. Note that this doesn't
> > change anything for code that directly uses the CPU ID registers.
>
> I applied this locally, but I'm seeing consistent boot failure under QEMU when
> KASAN is enabled. I tried sprinkling some __no_sanitize_address annotations
> around (see below) but it didn't help. The culprit appears to be
> early_fdt_map(), but looking a bit more closely, I'm really nervous about the
> way we call into C functions from __primary_switched. Remember -- this code
> runs _twice_ when KASLR is active: before and after the randomization. This
> also means that any memory writes the first time around can be lost due to
> the D-cache invalidation when (re-)creating the kernel page-tables.
>

Not just cache invalidation - BSS gets wiped again as well.
Marc Zyngier Feb. 8, 2021, 3:02 p.m. UTC | #3
Hi Will,

On 2021-02-08 14:32, Will Deacon wrote:
> Hi Marc,
> 
> On Mon, Feb 08, 2021 at 09:57:09AM +0000, Marc Zyngier wrote:
>> It recently came to light that there is a need to be able to override
>> some CPU features very early on, before the kernel is fully up and
>> running. The reasons for this range from specific feature support
>> (such as using Protected KVM on VHE HW, which is the main motivation
>> for this work) to errata workaround (a feature is broken on a CPU and
>> needs to be turned off, or rather not enabled).
>> 
>> This series tries to offer a limited framework for this kind of
>> problems, by allowing a set of options to be passed on the
>> command-line and altering the feature set that the cpufeature
>> subsystem exposes to the rest of the kernel. Note that this doesn't
>> change anything for code that directly uses the CPU ID registers.
> 
> I applied this locally, but I'm seeing consistent boot failure under 
> QEMU when
> KASAN is enabled. I tried sprinkling some __no_sanitize_address 
> annotations
> around (see below) but it didn't help. The culprit appears to be
> early_fdt_map(), but looking a bit more closely, I'm really nervous 
> about the
> way we call into C functions from __primary_switched. Remember -- this 
> code
> runs _twice_ when KASLR is active: before and after the randomization. 
> This
> also means that any memory writes the first time around can be lost due 
> to
> the D-cache invalidation when (re-)creating the kernel page-tables.

Well, we already call into C functions with KASLR, and nothing explodes
with that, so I must be doing something else wrong.

I do have cache maintenance for the writes to the shadow registers, so 
that
part should be fine. But I think I'm missing some cache maintenance 
around
the FDT base itself, and I wonder what happens when we go around the 
loop.

I'll chase this down now.

Thanks for the heads up.

         M.
Marc Zyngier Feb. 8, 2021, 4:30 p.m. UTC | #4
On 2021-02-08 14:32, Will Deacon wrote:
> Hi Marc,
> 
> On Mon, Feb 08, 2021 at 09:57:09AM +0000, Marc Zyngier wrote:
>> It recently came to light that there is a need to be able to override
>> some CPU features very early on, before the kernel is fully up and
>> running. The reasons for this range from specific feature support
>> (such as using Protected KVM on VHE HW, which is the main motivation
>> for this work) to errata workaround (a feature is broken on a CPU and
>> needs to be turned off, or rather not enabled).
>> 
>> This series tries to offer a limited framework for this kind of
>> problems, by allowing a set of options to be passed on the
>> command-line and altering the feature set that the cpufeature
>> subsystem exposes to the rest of the kernel. Note that this doesn't
>> change anything for code that directly uses the CPU ID registers.
> 
> I applied this locally, but I'm seeing consistent boot failure under 
> QEMU when
> KASAN is enabled. I tried sprinkling some __no_sanitize_address 
> annotations
> around (see below) but it didn't help. The culprit appears to be
> early_fdt_map(), but looking a bit more closely, I'm really nervous 
> about the
> way we call into C functions from __primary_switched. Remember -- this 
> code
> runs _twice_ when KASLR is active: before and after the randomization. 
> This
> also means that any memory writes the first time around can be lost due 
> to
> the D-cache invalidation when (re-)creating the kernel page-tables.

Nailed it. Of course, before anything starts writing from C code, we 
need
to have initialised KASAN. kasan_init.c itself is compiled without any
address sanitising, but we can't repaint all the stuff that is called
from early_fdt_map() (quite a lot).

So the natural thing to do is to keep kasan_early_init() as the first
thing we do in C code, and everything falls from that.

Any chance you could try that on top and see if that cures your problem?
If that works for you, I'll push an updates series.

Thanks,

         M.

diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index bce66d6bda74..09a5b603c950 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -429,13 +429,13 @@ SYM_FUNC_START_LOCAL(__primary_switched)
  	bl	__pi_memset
  	dsb	ishst				// Make zero page visible to PTW

+#if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
+	bl	kasan_early_init
+#endif
  	mov	x0, x21				// pass FDT address in x0
  	bl	early_fdt_map			// Try mapping the FDT early
  	bl	init_feature_override
  	bl	switch_to_vhe
-#if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
-	bl	kasan_early_init
-#endif
  #ifdef CONFIG_RANDOMIZE_BASE
  	tst	x23, ~(MIN_KIMG_ALIGN - 1)	// already running randomized?
  	b.ne	0f