diff mbox series

arm64 spectre-bhb backports break boot on stable kernels <= v5.4

Message ID 20220408120041.GB27685@willie-the-truck (mailing list archive)
State New, archived
Headers show
Series arm64 spectre-bhb backports break boot on stable kernels <= v5.4 | expand

Commit Message

Will Deacon April 8, 2022, noon UTC
Hi James,

Booting stable kernels <= v5.4 on arm64 with CONFIG_HARDEN_BRANCH_PREDICTOR=n
results in a NULL pointer dereference during boot due to kvm_get_hyp_vector()
dereferencing a NULL pointer from arm64_get_bp_hardening_data():

[    2.384444] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[    2.384461] pstate: 20400085 (nzCv daIf +PAN -UAO)
[    2.384472] pc : cpu_hyp_reinit+0x114/0x30c
[    2.384476] lr : cpu_hyp_reinit+0x80/0x30c
[    2.384477] sp : ffff800010013e20
[    2.384479] x29: ffff800010013e20 x28: ffff0000df033f00
[    2.384484] x27: ffff800011caa000 x26: ffff0000de58e000
[    2.384487] x25: ffff800011e57000 x24: ffff800011caad90
[    2.384490] x23: ffff800011c778e0 x22: ffff800011a56000
[    2.384494] x21: ffff800011a56000 x20: 0000b471121a6000
[    2.384497] x19: 00000000de588000 x18: 0000000000000000
[    2.384500] x17: 0000000000000000 x16: 0000000000000400
[    2.384503] x15: 0000000000001000 x14: 00000000000007f2
[    2.384507] x13: 0000000000000004 x12: 0000000000000002
[    2.384510] x11: 0000000000000000 x10: 00000000121a6000
[    2.384513] x9 : ffff800011c77fd8 x8 : 0000000000000010
[    2.384516] x7 : 4f422effff306b64 x6 : 0000008080000000
[    2.384519] x5 : 0000008080000000 x4 : ffff800011fabc94
[    2.384522] x3 : ffff800011f5bde0 x2 : 0000000000000001
[    2.384526] x1 : 00000000121a0000 x0 : ffff8000118f962d
[    2.384529] Call trace:
[    2.384533]  cpu_hyp_reinit+0x114/0x30c
[    2.384537]  _kvm_arch_hardware_enable+0x30/0x54
[    2.384541]  flush_smp_call_function_queue+0xe4/0x154
[    2.384544]  generic_smp_call_function_single_interrupt+0x10/0x18
[    2.384549]  ipi_handler+0x170/0x2b0
[    2.384555]  handle_percpu_devid_fasteoi_ipi+0x120/0x1cc
[    2.384560]  __handle_domain_irq+0x9c/0xf4
[    2.384563]  gic_handle_irq+0x6c/0xe4
[    2.384566]  el1_irq+0xf0/0x1c0
[    2.384570]  arch_cpu_idle+0x28/0x44
[    2.384574]  do_idle+0x100/0x2a8
[    2.384577]  cpu_startup_entry+0x20/0x24
[    2.384581]  secondary_start_kernel+0x1b0/0x1cc
[    2.384589] Code: b9469d08 7100011f 540003ad 52800208 (f9400108)
[    2.384600] ---[ end trace 266d08dbf96ff143 ]---
[    2.385171] Kernel panic - not syncing: Fatal exception in interrupt

The faulting load is the access to 'data->template_start', for which the
compiler emits the hilarious:

	mov	w8, #0x10
	ldr	x8, [x8]

I can bodge this as below (untested), but it's pretty grotty. Please can you
take a look?

Cheers,

Will

--->8

Comments

James Morse April 8, 2022, 4:08 p.m. UTC | #1
Hi Will,

On 08/04/2022 13:00, Will Deacon wrote:
> Booting stable kernels <= v5.4 on arm64 with CONFIG_HARDEN_BRANCH_PREDICTOR=n
> results in a NULL pointer dereference during boot due to kvm_get_hyp_vector()
> dereferencing a NULL pointer from arm64_get_bp_hardening_data():
> 
> [    2.384444] Internal error: Oops: 96000004 [#1] PREEMPT SMP
> [    2.384461] pstate: 20400085 (nzCv daIf +PAN -UAO)
> [    2.384472] pc : cpu_hyp_reinit+0x114/0x30c
> [    2.384476] lr : cpu_hyp_reinit+0x80/0x30c
> [    2.384477] sp : ffff800010013e20
> [    2.384479] x29: ffff800010013e20 x28: ffff0000df033f00
> [    2.384484] x27: ffff800011caa000 x26: ffff0000de58e000
> [    2.384487] x25: ffff800011e57000 x24: ffff800011caad90
> [    2.384490] x23: ffff800011c778e0 x22: ffff800011a56000
> [    2.384494] x21: ffff800011a56000 x20: 0000b471121a6000
> [    2.384497] x19: 00000000de588000 x18: 0000000000000000
> [    2.384500] x17: 0000000000000000 x16: 0000000000000400
> [    2.384503] x15: 0000000000001000 x14: 00000000000007f2
> [    2.384507] x13: 0000000000000004 x12: 0000000000000002
> [    2.384510] x11: 0000000000000000 x10: 00000000121a6000
> [    2.384513] x9 : ffff800011c77fd8 x8 : 0000000000000010
> [    2.384516] x7 : 4f422effff306b64 x6 : 0000008080000000
> [    2.384519] x5 : 0000008080000000 x4 : ffff800011fabc94
> [    2.384522] x3 : ffff800011f5bde0 x2 : 0000000000000001
> [    2.384526] x1 : 00000000121a0000 x0 : ffff8000118f962d
> [    2.384529] Call trace:
> [    2.384533]  cpu_hyp_reinit+0x114/0x30c
> [    2.384537]  _kvm_arch_hardware_enable+0x30/0x54
> [    2.384541]  flush_smp_call_function_queue+0xe4/0x154
> [    2.384544]  generic_smp_call_function_single_interrupt+0x10/0x18
> [    2.384549]  ipi_handler+0x170/0x2b0
> [    2.384555]  handle_percpu_devid_fasteoi_ipi+0x120/0x1cc
> [    2.384560]  __handle_domain_irq+0x9c/0xf4
> [    2.384563]  gic_handle_irq+0x6c/0xe4
> [    2.384566]  el1_irq+0xf0/0x1c0
> [    2.384570]  arch_cpu_idle+0x28/0x44
> [    2.384574]  do_idle+0x100/0x2a8
> [    2.384577]  cpu_startup_entry+0x20/0x24
> [    2.384581]  secondary_start_kernel+0x1b0/0x1cc
> [    2.384589] Code: b9469d08 7100011f 540003ad 52800208 (f9400108)
> [    2.384600] ---[ end trace 266d08dbf96ff143 ]---
> [    2.385171] Kernel panic - not syncing: Fatal exception in interrupt

Yikes!

Interesting to know that stuff behind CONFIG_EXPERT has someone who cares about it.
(I was going to propose dropping the Kconfig option after a while).


> The faulting load is the access to 'data->template_start', for which the
> compiler emits the hilarious:
> 
> 	mov	w8, #0x10
> 	ldr	x8, [x8]

This is nasal-daemon territory, so I guess the compiler can do whatever it likes.


> I can bodge this as below (untested), but it's pretty grotty.

I wanted to keep the detection code even if the feature is disabled so the sysfs reporting
is always correct.


> Please can you take a look?

Ugh, arm64_get_bp_hardening_data() returns NULL with that Kconfig setup.


For v5.4, this fixes it for me:
--------------------%<--------------------
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 78d110667c0c..ffe0aad96b17 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -479,7 +479,8 @@ static inline void *kvm_get_hyp_vector(void)
        int slot = -1;

        if ((cpus_have_const_cap(ARM64_HARDEN_BRANCH_PREDICTOR) ||
-            cpus_have_const_cap(ARM64_SPECTRE_BHB)) && data->template_start) {
+            cpus_have_const_cap(ARM64_SPECTRE_BHB)) &&
+           data && data->template_start) {
                vect = kern_hyp_va(kvm_ksym_ref(__bp_harden_hyp_vecs_start));
                slot = data->hyp_vectors_slot;
        }

--------------------%<--------------------

I'll check the other versions and post patches to the stable list. Earlier stable
backports grew a dependency between these features as it wasn't possible to unpick the
dependencies.


Thanks,

James
Will Deacon April 8, 2022, 4:21 p.m. UTC | #2
Hi James,

On Fri, Apr 08, 2022 at 05:08:00PM +0100, James Morse wrote:
> On 08/04/2022 13:00, Will Deacon wrote:
> > Booting stable kernels <= v5.4 on arm64 with CONFIG_HARDEN_BRANCH_PREDICTOR=n
> > results in a NULL pointer dereference during boot due to kvm_get_hyp_vector()
> > dereferencing a NULL pointer from arm64_get_bp_hardening_data():
> > 
> > [    2.384444] Internal error: Oops: 96000004 [#1] PREEMPT SMP
> > [    2.384461] pstate: 20400085 (nzCv daIf +PAN -UAO)
> > [    2.384472] pc : cpu_hyp_reinit+0x114/0x30c
> > [    2.384476] lr : cpu_hyp_reinit+0x80/0x30c
> > [    2.384477] sp : ffff800010013e20
> > [    2.384479] x29: ffff800010013e20 x28: ffff0000df033f00
> > [    2.384484] x27: ffff800011caa000 x26: ffff0000de58e000
> > [    2.384487] x25: ffff800011e57000 x24: ffff800011caad90
> > [    2.384490] x23: ffff800011c778e0 x22: ffff800011a56000
> > [    2.384494] x21: ffff800011a56000 x20: 0000b471121a6000
> > [    2.384497] x19: 00000000de588000 x18: 0000000000000000
> > [    2.384500] x17: 0000000000000000 x16: 0000000000000400
> > [    2.384503] x15: 0000000000001000 x14: 00000000000007f2
> > [    2.384507] x13: 0000000000000004 x12: 0000000000000002
> > [    2.384510] x11: 0000000000000000 x10: 00000000121a6000
> > [    2.384513] x9 : ffff800011c77fd8 x8 : 0000000000000010
> > [    2.384516] x7 : 4f422effff306b64 x6 : 0000008080000000
> > [    2.384519] x5 : 0000008080000000 x4 : ffff800011fabc94
> > [    2.384522] x3 : ffff800011f5bde0 x2 : 0000000000000001
> > [    2.384526] x1 : 00000000121a0000 x0 : ffff8000118f962d
> > [    2.384529] Call trace:
> > [    2.384533]  cpu_hyp_reinit+0x114/0x30c
> > [    2.384537]  _kvm_arch_hardware_enable+0x30/0x54
> > [    2.384541]  flush_smp_call_function_queue+0xe4/0x154
> > [    2.384544]  generic_smp_call_function_single_interrupt+0x10/0x18
> > [    2.384549]  ipi_handler+0x170/0x2b0
> > [    2.384555]  handle_percpu_devid_fasteoi_ipi+0x120/0x1cc
> > [    2.384560]  __handle_domain_irq+0x9c/0xf4
> > [    2.384563]  gic_handle_irq+0x6c/0xe4
> > [    2.384566]  el1_irq+0xf0/0x1c0
> > [    2.384570]  arch_cpu_idle+0x28/0x44
> > [    2.384574]  do_idle+0x100/0x2a8
> > [    2.384577]  cpu_startup_entry+0x20/0x24
> > [    2.384581]  secondary_start_kernel+0x1b0/0x1cc
> > [    2.384589] Code: b9469d08 7100011f 540003ad 52800208 (f9400108)
> > [    2.384600] ---[ end trace 266d08dbf96ff143 ]---
> > [    2.385171] Kernel panic - not syncing: Fatal exception in interrupt
> 
> Yikes!
> 
> Interesting to know that stuff behind CONFIG_EXPERT has someone who cares about it.
> (I was going to propose dropping the Kconfig option after a while).

Yup! FWIW, the hardening options are enabled in Android (GKI), but this was
reported to us externally by somebody using a custom config.

> > I can bodge this as below (untested), but it's pretty grotty.
> 
> I wanted to keep the detection code even if the feature is disabled so the sysfs reporting
> is always correct.

Makes sense. Another option is to check for ARM64_HARDEN_BRANCH_PREDICTOR in
spectre_bhb_enable_mitigation(), but then I think the KVM code would need
to query the mitigation state rather than just the cap.

> > Please can you take a look?
> 
> Ugh, arm64_get_bp_hardening_data() returns NULL with that Kconfig setup.
> 
> 
> For v5.4, this fixes it for me:
> --------------------%<--------------------
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 78d110667c0c..ffe0aad96b17 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -479,7 +479,8 @@ static inline void *kvm_get_hyp_vector(void)
>         int slot = -1;
> 
>         if ((cpus_have_const_cap(ARM64_HARDEN_BRANCH_PREDICTOR) ||
> -            cpus_have_const_cap(ARM64_SPECTRE_BHB)) && data->template_start) {
> +            cpus_have_const_cap(ARM64_SPECTRE_BHB)) &&
> +           data && data->template_start) {
>                 vect = kern_hyp_va(kvm_ksym_ref(__bp_harden_hyp_vecs_start));
>                 slot = data->hyp_vectors_slot;
>         }

That'll work, but will sysfs report that BHB is mitigated even if
!ARM64_HARDEN_BRANCH_PREDICTOR?

> --------------------%<--------------------
> 
> I'll check the other versions and post patches to the stable list. Earlier stable
> backports grew a dependency between these features as it wasn't possible to unpick the
> dependencies.

Cheers. I know 4.19 is busted too, but I didn't check 4.14.

Will
James Morse April 8, 2022, 4:38 p.m. UTC | #3
Hi Will,

On 08/04/2022 17:21, Will Deacon wrote:
> On Fri, Apr 08, 2022 at 05:08:00PM +0100, James Morse wrote:
>> On 08/04/2022 13:00, Will Deacon wrote:
>>> Booting stable kernels <= v5.4 on arm64 with CONFIG_HARDEN_BRANCH_PREDICTOR=n
>>> results in a NULL pointer dereference during boot due to kvm_get_hyp_vector()
>>> dereferencing a NULL pointer from arm64_get_bp_hardening_data():
>>>
>>> [    2.384444] Internal error: Oops: 96000004 [#1] PREEMPT SMP
>>> [    2.384461] pstate: 20400085 (nzCv daIf +PAN -UAO)
>>> [    2.384472] pc : cpu_hyp_reinit+0x114/0x30c
>>> [    2.384476] lr : cpu_hyp_reinit+0x80/0x30c

>>> [    2.385171] Kernel panic - not syncing: Fatal exception in interrupt
>>
>> Yikes!
>>
>> Interesting to know that stuff behind CONFIG_EXPERT has someone who cares about it.
>> (I was going to propose dropping the Kconfig option after a while).

> Yup! FWIW, the hardening options are enabled in Android (GKI), but this was
> reported to us externally by somebody using a custom config.

>>> I can bodge this as below (untested), but it's pretty grotty.
>>
>> I wanted to keep the detection code even if the feature is disabled so the sysfs reporting
>> is always correct.
> 
> Makes sense. Another option is to check for ARM64_HARDEN_BRANCH_PREDICTOR in
> spectre_bhb_enable_mitigation(), but then I think the KVM code would need
> to query the mitigation state rather than just the cap.

It already does, but as you say KVM is only using the cap here.


>>> Please can you take a look?
>>
>> Ugh, arm64_get_bp_hardening_data() returns NULL with that Kconfig setup.
>>
>>
>> For v5.4, this fixes it for me:
>> --------------------%<--------------------
>> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
>> index 78d110667c0c..ffe0aad96b17 100644
>> --- a/arch/arm64/include/asm/kvm_mmu.h
>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>> @@ -479,7 +479,8 @@ static inline void *kvm_get_hyp_vector(void)
>>         int slot = -1;
>>
>>         if ((cpus_have_const_cap(ARM64_HARDEN_BRANCH_PREDICTOR) ||
>> -            cpus_have_const_cap(ARM64_SPECTRE_BHB)) && data->template_start) {
>> +            cpus_have_const_cap(ARM64_SPECTRE_BHB)) &&
>> +           data && data->template_start) {
>>                 vect = kern_hyp_va(kvm_ksym_ref(__bp_harden_hyp_vecs_start));
>>                 slot = data->hyp_vectors_slot;
>>         }
> 
> That'll work, but will sysfs report that BHB is mitigated even if
> !ARM64_HARDEN_BRANCH_PREDICTOR?

The (!IS_ENABLED(CONFIG_HARDEN_BRANCH_PREDICTOR) in check_branch_predictor() will set
__hardenbp_enab to false, which get_spectre_v2_workaround_state() picks up and causes
spectre_bhb_enable_mitigation() to skip all the mitigations, leaving state =
SPECTRE_VULNERABLE.

(The interactions with the other Spectre mitigations across the stable kernels
were/continue-to-be a pain in the neck)


>> --------------------%<--------------------
>>
>> I'll check the other versions and post patches to the stable list. Earlier stable
>> backports grew a dependency between these features as it wasn't possible to unpick the
>> dependencies.
> 
> Cheers. I know 4.19 is busted too, but I didn't check 4.14.

Yup, I've just reproduced that one. I suspect v4.14 is where I added the Kconfig dependency.


Thanks,

James
diff mbox series

Patch

diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 33b33416fea4..3bc1380f5524 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -939,6 +939,7 @@  const struct arm64_cpu_capabilities arm64_errata[] = {
                .cpu_enable = cpu_enable_ssbd_mitigation,
                .midr_range_list = arm64_ssb_cpus,
        },
+#ifdef CONFIG_HARDEN_BRANCH_PREDICTOR
        {
                .desc = "Spectre-BHB",
                .capability = ARM64_SPECTRE_BHB,
@@ -946,6 +947,7 @@  const struct arm64_cpu_capabilities arm64_errata[] = {
                .matches = is_spectre_bhb_affected,
                .cpu_enable = spectre_bhb_enable_mitigation,
        },
+#endif
 #ifdef CONFIG_ARM64_ERRATUM_1418040
        {
                .desc = "ARM erratum 1418040",