KVM: arm64: pkvm: Fixup boot mode to reflect that the kernel resumes from EL1

Message ID 20221011165400.1241729-1-maz@kernel.org (mailing list archive)
State New, archived
Series KVM: arm64: pkvm: Fixup boot mode to reflect that the kernel resumes from EL1

Commit Message

Marc Zyngier Oct. 11, 2022, 4:54 p.m. UTC
The kernel has an awfully complicated boot sequence in order to cope
with the various EL2 configurations, including those that "enhanced"
the architecture. We go from EL2 to EL1, then back to EL2, staying
at EL2 if VHE capable and otherwise go back to EL1.

Here's a paracetamol tablet for you.

The cpu_resume path follows the same logic, because coming up with
two versions of a square wheel is hard.

However, things aren't this straightforward with pKVM, as the host
resume path is always proxied by the hypervisor, which means that
the kernel is always entered at EL1. Which contradicts what the
__boot_cpu_mode[] array contains (it obviously says EL2).

This thus triggers a HVC call from EL1 to EL2 in a vain attempt
to upgrade from EL1 to EL2 VHE, which we are, funnily enough,
reluctant to grant to the host kernel. This is also completely
unexpected, and puzzles your average EL2 hacker.

Address it by fixing up the boot mode at the point the host gets
deprivileged. is_hyp_mode_available() and co already have a static
branch to deal with this, making it pretty safe.

Reported-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/arm.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

Comments

Oliver Upton Oct. 11, 2022, 6:48 p.m. UTC | #1
On Tue, Oct 11, 2022 at 05:54:00PM +0100, Marc Zyngier wrote:
> The kernel has an awfully complicated boot sequence in order to cope
> with the various EL2 configurations, including those that "enhanced"
> the architecture. We go from EL2 to EL1, then back to EL2, staying
> at EL2 if VHE capable and otherwise go back to EL1.
> 
> Here's a paracetamol tablet for you.

Heh, still have a bit of a headache from this :)

I'm having a hard time following where we skip the EL2 promotion based
on __boot_cpu_mode.

On the cpu_resume() path it looks like we take the return of
init_kernel_el() and pass that along to finalise_el2(). As we are in EL1
at this point, it seems like we'd go init_kernel_el() -> init_el1().

What am I missing?

--
Thanks,
Oliver

> The cpu_resume path follows the same logic, because coming up with
> two versions of a square wheel is hard.
> 
> However, things aren't this straightforward with pKVM, as the host
> resume path is always proxied by the hypervisor, which means that
> the kernel is always entered at EL1. Which contradicts what the
> __boot_cpu_mode[] array contains (it obviously says EL2).
> 
> This thus triggers a HVC call from EL1 to EL2 in a vain attempt
> to upgrade from EL1 to EL2 VHE, which we are, funnily enough,
> reluctant to grant to the host kernel. This is also completely
> unexpected, and puzzles your average EL2 hacker.
> 
> Address it by fixing up the boot mode at the point the host gets
> deprivileged. is_hyp_mode_available() and co already have a static
> branch to deal with this, making it pretty safe.
> 
> Reported-by: Vincent Donnefort <vdonnefort@google.com>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/kvm/arm.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index b6c9bfa8492f..cf075c9b9ab1 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -2107,6 +2107,17 @@ static int pkvm_drop_host_privileges(void)
>  	 * once the host stage 2 is installed.
>  	 */
>  	static_branch_enable(&kvm_protected_mode_initialized);
> +
> +	/*
> +	 * Fixup the boot mode so that we don't take spurious round
> +	 * trips via EL2 on cpu_resume. Flush to the PoC for a good
> +	 * measure, so that it can be observed by a CPU coming out of
> +	 * suspend with the MMU off.
> +	 */
> +	__boot_cpu_mode[0] = __boot_cpu_mode[1] = BOOT_CPU_MODE_EL1;
> +	dcache_clean_poc((unsigned long)__boot_cpu_mode,
> +			 (unsigned long)(__boot_cpu_mode + 2));
> +
>  	on_each_cpu(_kvm_host_prot_finalize, &ret, 1);
>  	return ret;
>  }
> -- 
> 2.34.1
>

Marc Zyngier Oct. 11, 2022, 8:58 p.m. UTC | #2
On Tue, 11 Oct 2022 19:48:39 +0100,
Oliver Upton <oliver.upton@linux.dev> wrote:
> 
> On Tue, Oct 11, 2022 at 05:54:00PM +0100, Marc Zyngier wrote:
> > The kernel has an awfully complicated boot sequence in order to cope
> > with the various EL2 configurations, including those that "enhanced"
> > the architecture. We go from EL2 to EL1, then back to EL2, staying
> > at EL2 if VHE capable and otherwise go back to EL1.
> > 
> > Here's a paracetamol tablet for you.
> 
> Heh, still have a bit of a headache from this :)
> 
> I'm having a hard time following where we skip the EL2 promotion based
> on __boot_cpu_mode.
> 
> On the cpu_resume() path it looks like we take the return of
> init_kernel_el() and pass that along to finalise_el2(). As we are in EL1
> at this point, it seems like we'd go init_kernel_el() -> init_el1().
> 
> What am I missing?

That I'm an idiot.

This is only necessary on pre-6.0, before 005e12676af0 ("arm64: head:
record CPU boot mode after enabling the MMU"), as this code-path
*used* to reload the boot mode from memory. Now, this is directly
passed as a parameter, making this patch useless.

The joys of looking at too many code bases at the same time... I'll
see how we can add it to 5.19.

Thanks,

	M.
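
The difference described in the reply above can be illustrated with a minimal userspace sketch. This is not kernel code: the function names and the recorded_boot_mode variable are hypothetical, and the BOOT_CPU_MODE_* values are assumed to match arch/arm64/include/asm/virt.h. It only models the decision "does the resume path attempt the EL2 upgrade?" under the two designs, for a pKVM host that recorded EL2 at cold boot but is re-entered at EL1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match arch/arm64/include/asm/virt.h. */
#define BOOT_CPU_MODE_EL1 0xe11u
#define BOOT_CPU_MODE_EL2 0xe12u

/* Hypothetical stand-in for the value recorded in memory at cold boot. */
static uint32_t recorded_boot_mode = BOOT_CPU_MODE_EL2;

/*
 * Pre-6.0 model: the resume path reloads the recorded mode from memory,
 * so a stale EL2 value leads to an upgrade attempt (the spurious HVC).
 */
static bool resume_tries_el2_from_memory(void)
{
	return recorded_boot_mode == BOOT_CPU_MODE_EL2;
}

/*
 * 6.0+ model: the mode the CPU was actually entered at is passed in as
 * a parameter, so the recorded value is not consulted on this path.
 */
static bool resume_tries_el2_from_param(uint32_t entry_mode)
{
	assert(entry_mode == BOOT_CPU_MODE_EL1 ||
	       entry_mode == BOOT_CPU_MODE_EL2);
	return entry_mode == BOOT_CPU_MODE_EL2;
}
```

In this model, a host entered at EL1 gets an upgrade attempt from resume_tries_el2_from_memory() but not from resume_tries_el2_from_param(BOOT_CPU_MODE_EL1), which is why the fixup only matters for the older code.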

Vincent Donnefort Oct. 13, 2022, 1:33 p.m. UTC | #3
On Tue, Oct 11, 2022 at 09:58:22PM +0100, Marc Zyngier wrote:
> On Tue, 11 Oct 2022 19:48:39 +0100,
> Oliver Upton <oliver.upton@linux.dev> wrote:
> > 
> > On Tue, Oct 11, 2022 at 05:54:00PM +0100, Marc Zyngier wrote:
> > > The kernel has an awfully complicated boot sequence in order to cope
> > > with the various EL2 configurations, including those that "enhanced"
> > > the architecture. We go from EL2 to EL1, then back to EL2, staying
> > > at EL2 if VHE capable and otherwise go back to EL1.
> > > 
> > > Here's a paracetamol tablet for you.
> > 
> > Heh, still have a bit of a headache from this :)
> > 
> > I'm having a hard time following where we skip the EL2 promotion based
> > on __boot_cpu_mode.
> > 
> > On the cpu_resume() path it looks like we take the return of
> > init_kernel_el() and pass that along to finalise_el2(). As we are in EL1
> > at this point, it seems like we'd go init_kernel_el() -> init_el1().
> > 
> > What am I missing?
> 
> That I'm an idiot.
> 
> This is only necessary on pre-6.0, before 005e12676af0 ("arm64: head:
> record CPU boot mode after enabling the MMU"), as this code-path
> *used* to reload the boot mode from memory. Now, this is directly
> passed as a parameter, making this patch useless.

On 5.10 though, the spurious HVCs are gone and I have not observed any
regression.

Thanks!

For a stable fix:

Tested-by: Vincent Donnefort <vdonnefort@google.com>

> 
> The joys of looking at too many code bases at the same time... I'll
> see how we can add it to 5.19.
> 
> Thanks,
> 
> 	M.
> 
> -- 
> Without deviation from the norm, progress is not possible.

Patch

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index b6c9bfa8492f..cf075c9b9ab1 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -2107,6 +2107,17 @@ static int pkvm_drop_host_privileges(void)
 	 * once the host stage 2 is installed.
 	 */
 	static_branch_enable(&kvm_protected_mode_initialized);
+
+	/*
+	 * Fixup the boot mode so that we don't take spurious round
+	 * trips via EL2 on cpu_resume. Flush to the PoC for a good
+	 * measure, so that it can be observed by a CPU coming out of
+	 * suspend with the MMU off.
+	 */
+	__boot_cpu_mode[0] = __boot_cpu_mode[1] = BOOT_CPU_MODE_EL1;
+	dcache_clean_poc((unsigned long)__boot_cpu_mode,
+			 (unsigned long)(__boot_cpu_mode + 2));
+
 	on_each_cpu(_kvm_host_prot_finalize, &ret, 1);
 	return ret;
 }
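
The commit message argues that the fixup is safe because is_hyp_mode_available() and friends already have a static branch that bypasses the boot mode array. The following is a minimal userspace sketch of that argument, not the kernel implementation: the model_* names, the plain bool standing in for the kvm_protected_mode_initialized static key, and the boot_cpu_mode array are all hypothetical, and the BOOT_CPU_MODE_* values are assumed to match arch/arm64/include/asm/virt.h. The cache clean to the PoC has no userspace equivalent and is only noted in a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match arch/arm64/include/asm/virt.h. */
#define BOOT_CPU_MODE_EL1 0xe11u
#define BOOT_CPU_MODE_EL2 0xe12u

/* Hypothetical stand-in for __boot_cpu_mode[]: all CPUs booted at EL2. */
static uint32_t boot_cpu_mode[2] = { BOOT_CPU_MODE_EL2, BOOT_CPU_MODE_EL2 };

/* Hypothetical stand-in for the kvm_protected_mode_initialized static key. */
static bool protected_mode_initialized;

/*
 * Modelled on the shape of is_hyp_mode_available(): once protected mode
 * is initialized, the answer no longer depends on the boot mode array.
 */
static bool model_hyp_mode_available(void)
{
	if (protected_mode_initialized)
		return true;

	return boot_cpu_mode[0] == BOOT_CPU_MODE_EL2 &&
	       boot_cpu_mode[1] == BOOT_CPU_MODE_EL2;
}

/* Modelled on the ordering in the patch: set the flag, then fix up the array. */
static void model_drop_host_privileges(void)
{
	assert(!protected_mode_initialized); /* deprivileging happens once */

	protected_mode_initialized = true;
	boot_cpu_mode[0] = boot_cpu_mode[1] = BOOT_CPU_MODE_EL1;
	/* The real patch also cleans the array to the PoC at this point. */
}
```

In this model, model_hyp_mode_available() returns true both before and after model_drop_host_privileges(), even though the array now reads EL1. That is the property the commit message relies on.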