diff mbox series

KVM: arm64: VHE: Initialize PMSCR_EL1

Message ID 20241106122654.38234-1-alexandru.elisei@arm.com (mailing list archive)
State New
Headers show
Series KVM: arm64: VHE: Initialize PMSCR_EL1 | expand

Commit Message

Alexandru Elisei Nov. 6, 2024, 12:26 p.m. UTC
According to the pseudocode for StatisticalProfilingEnabled() from Arm
DDI0487K.a, PMSCR_EL1 controls profiling at EL1 and EL0:

- PMSCR_EL1.E1SPE controls profiling at EL1.
- PMSCR_EL1.E0SPE controls profiling at EL0 if HCR_EL2.TGE=0. KVM always
  clears HCR_EL2.TGE when running a VM.

When profiling is enabled in the host, and the host is running in nVHE mode
(HCR_EL2.E2H=0), KVM clears PMSCR_EL1.{E1SPE,E0SPE} before jumping into the
guest.

When profiling is enabled in the host, and the host is running at EL2
(HCR_EL2.E2H=1), KVM will not touch PMSCR_EL1.{E1SPE,E0SPE} before jumping
into the guest. PMSCR_EL1.{E1SPE,E0SPE} reset to an architecturally UNKNOWN
value, which means it might be possible that KVM unintentionally profiles
the guest when is running in VHE mode.

Clear PMSCR_EL1.{E1SPE,E0SPE} when setting up VHE mode to keep the
behaviour consistent and predictable.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---

Tested on the model, by setting the PMSCR_EL1.E1SPE and E0SPE bits in
__init_el2_debug to simulate a system where they reset to 1. Without the
patch, when the host is running at EL2, and the user is profiling the
kvmtool process, I can see records taken at EL1:

# perf record -e arm_spe// -- ./lkvm-static run -c2 -m512 -k Image -d disk -p earlycon

With this patch, those records disappear; and the size of perf.data has
been more than halved.

 arch/arm64/kernel/hyp-stub.S | 2 ++
 1 file changed, 2 insertions(+)

Comments

Marc Zyngier Nov. 6, 2024, 1:51 p.m. UTC | #1
On Wed, 06 Nov 2024 12:26:54 +0000,
Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> 
> According to the pseudocode for StatisticalProfilingEnabled() from Arm
> DDI0487K.a, PMSCR_EL1 controls profiling at EL1 and EL0:
> 
> - PMSCR_EL1.E1SPE controls profiling at EL1.
> - PMSCR_EL1.E0SPE controls profiling at EL0 if HCR_EL2.TGE=0. KVM always
>   clears HCR_EL2.TGE when running a VM.
> 
> When profiling is enabled in the host, and the host is running in nVHE mode
> (HCR_EL2.E2H=0), KVM clears PMSCR_EL1.{E1SPE,E0SPE} before jumping into the
> guest.
> 
> When profiling is enabled in the host, and the host is running at EL2
> (HCR_EL2.E2H=1), KVM will not touch PMSCR_EL1.{E1SPE,E0SPE} before jumping
> into the guest. PMSCR_EL1.{E1SPE,E0SPE} reset to an architecturally UNKNOWN
> value, which means it might be possible that KVM unintentionally profiles
> the guest when is running in VHE mode.
> 
> Clear PMSCR_EL1.{E1SPE,E0SPE} when setting up VHE mode to keep the
> behaviour consistent and predictable.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
> 
> Tested on the model, by setting the PMSCR_EL1.E1SPE and E0SPE bits in
> __init_el2_debug to simulate a system where they reset to 1. Without the
> patch, when the host is running at EL2, and the user is profiling the
> kvmtool process, I can see records taken at EL1:
> 
> # perf record -e arm_spe// -- ./lkvm-static run -c2 -m512 -k Image -d disk -p earlycon
> 
> With this patch, those records disappear; and the size of perf.data has
> been more than halved.
> 
>  arch/arm64/kernel/hyp-stub.S | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/arm64/kernel/hyp-stub.S b/arch/arm64/kernel/hyp-stub.S
> index 65f76064c86b..df63f329d400 100644
> --- a/arch/arm64/kernel/hyp-stub.S
> +++ b/arch/arm64/kernel/hyp-stub.S
> @@ -117,6 +117,8 @@ SYM_CODE_START_LOCAL(__finalise_el2)
>  	bic	x0, x0, #(MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT)
>  	bic	x0, x0, #(MDCR_EL2_E2TB_MASK << MDCR_EL2_E2TB_SHIFT)
>  	msr	mdcr_el2, x0
> +	// Disable profiling when running a virtual machine
> +	msr_s	SYS_PMSCR_EL12, xzr

... resulting in an early crash on anything that doesn't have SPE.
That's indeed "consistent and predictable" :-).

>  
>  	// Transfer the MM state from EL1 to EL2
>  	mrs_s	x0, SYS_TCR_EL12

I find it pretty odd to hide something that is squarely guest state in
the hyp stubs, and I'd rather see something like this (untested):

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 48cafb65d6acf..806f25a8753ed 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -2139,8 +2139,12 @@ static void cpu_hyp_init_features(void)
 	cpu_set_hyp_vector();
 	kvm_arm_init_debug();
 
-	if (is_kernel_in_hyp_mode())
+	if (is_kernel_in_hyp_mode()) {
+		if (SYS_FIELD_GET(ID_AA64DFR0_EL1, PMSVer,
+				  read_sysreg(id_aa64dfr0_el1)))
+			write_sysreg_el1(0, SYS_PMSCR);
 		kvm_timer_init_vhe();
+	}
 
 	if (vgic_present)
 		kvm_vgic_init_cpu_hardware();

Thanks,

	M.
diff mbox series

Patch

diff --git a/arch/arm64/kernel/hyp-stub.S b/arch/arm64/kernel/hyp-stub.S
index 65f76064c86b..df63f329d400 100644
--- a/arch/arm64/kernel/hyp-stub.S
+++ b/arch/arm64/kernel/hyp-stub.S
@@ -117,6 +117,8 @@  SYM_CODE_START_LOCAL(__finalise_el2)
 	bic	x0, x0, #(MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT)
 	bic	x0, x0, #(MDCR_EL2_E2TB_MASK << MDCR_EL2_E2TB_SHIFT)
 	msr	mdcr_el2, x0
+	// Disable profiling when running a virtual machine
+	msr_s	SYS_PMSCR_EL12, xzr
 
 	// Transfer the MM state from EL1 to EL2
 	mrs_s	x0, SYS_TCR_EL12