Message ID | 20241106122654.38234-1-alexandru.elisei@arm.com (mailing list archive) |
---|---|
State | New |
Series | KVM: arm64: VHE: Initialize PMSCR_EL1 |
On Wed, 06 Nov 2024 12:26:54 +0000,
Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>
> According to the pseudocode for StatisticalProfilingEnabled() from Arm
> DDI0487K.a, PMSCR_EL1 controls profiling at EL1 and EL0:
>
> - PMSCR_EL1.E1SPE controls profiling at EL1.
> - PMSCR_EL1.E0SPE controls profiling at EL0 if HCR_EL2.TGE=0. KVM always
>   clears HCR_EL2.TGE when running a VM.
>
> When profiling is enabled in the host, and the host is running in nVHE mode
> (HCR_EL2.E2H=0), KVM clears PMSCR_EL1.{E1SPE,E0SPE} before jumping into the
> guest.
>
> When profiling is enabled in the host, and the host is running at EL2
> (HCR_EL2.E2H=1), KVM will not touch PMSCR_EL1.{E1SPE,E0SPE} before jumping
> into the guest. PMSCR_EL1.{E1SPE,E0SPE} reset to an architecturally UNKNOWN
> value, which means it might be possible that KVM unintentionally profiles
> the guest when it is running in VHE mode.
>
> Clear PMSCR_EL1.{E1SPE,E0SPE} when setting up VHE mode to keep the
> behaviour consistent and predictable.
>
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>
> Tested on the model, by setting the PMSCR_EL1.E1SPE and E0SPE bits in
> __init_el2_debug to simulate a system where they reset to 1. Without the
> patch, when the host is running at EL2, and the user is profiling the
> kvmtool process, I can see records taken at EL1:
>
> # perf record -e arm_spe// -- ./lkvm-static run -c2 -m512 -k Image -d disk -p earlycon
>
> With this patch, those records disappear; and the size of perf.data has
> been more than halved.
>
>  arch/arm64/kernel/hyp-stub.S | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/arm64/kernel/hyp-stub.S b/arch/arm64/kernel/hyp-stub.S
> index 65f76064c86b..df63f329d400 100644
> --- a/arch/arm64/kernel/hyp-stub.S
> +++ b/arch/arm64/kernel/hyp-stub.S
> @@ -117,6 +117,8 @@ SYM_CODE_START_LOCAL(__finalise_el2)
>          bic     x0, x0, #(MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT)
>          bic     x0, x0, #(MDCR_EL2_E2TB_MASK << MDCR_EL2_E2TB_SHIFT)
>          msr     mdcr_el2, x0
> +        // Disable profiling when running a virtual machine
> +        msr_s   SYS_PMSCR_EL12, xzr

... resulting in an early crash on anything that doesn't have SPE.
That's indeed "consistent and predictable" :-).

>
>          // Transfer the MM state from EL1 to EL2
>          mrs_s   x0, SYS_TCR_EL12

I find it pretty odd to hide something that is squarely guest state in
the hyp stubs, and I'd rather see something like this (untested):

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 48cafb65d6acf..806f25a8753ed 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -2139,8 +2139,12 @@ static void cpu_hyp_init_features(void)
         cpu_set_hyp_vector();
         kvm_arm_init_debug();

-        if (is_kernel_in_hyp_mode())
+        if (is_kernel_in_hyp_mode()) {
+                if (SYS_FIELD_GET(ID_AA64DFR0_EL1, PMSVer,
+                                  read_sysreg(id_aa64dfr0_el1)))
+                        write_sysreg_el1(0, SYS_PMSCR);
                 kvm_timer_init_vhe();
+        }

         if (vgic_present)
                 kvm_vgic_init_cpu_hardware();

Thanks,

        M.
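
The crash described in the reply comes from writing PMSCR_EL1 on a CPU that does not implement SPE, which ID_AA64DFR0_EL1.PMSVer (bits [35:32]) reports as zero. The difference between the unconditional clear and the guarded one can be sketched in user-space C. This is only an illustration: `struct model_cpu` and every `model_*` function are hypothetical stand-ins for the real system registers, not kernel APIs, and the UNDEFINED exception is reduced to a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ID_AA64DFR0_EL1.PMSVer lives in bits [35:32]; zero means SPE is absent. */
#define PMSVER_SHIFT 32
#define PMSVER_MASK  UINT64_C(0xf)

/* Hypothetical model of one CPU; not a kernel structure. */
struct model_cpu {
    uint64_t id_aa64dfr0;  /* value of ID_AA64DFR0_EL1 */
    uint64_t pmscr_el1;    /* guest view of PMSCR_EL1 */
    bool     undef_taken;  /* set when an unimplemented register is written */
};

/* Extract the PMSVer field from the modelled ID register. */
static inline unsigned int model_pmsver(const struct model_cpu *cpu)
{
    assert(cpu);
    return (unsigned int)((cpu->id_aa64dfr0 >> PMSVER_SHIFT) & PMSVER_MASK);
}

/* Writing PMSCR_EL1 without SPE is modelled as an UNDEFINED exception. */
static inline void model_write_pmscr(struct model_cpu *cpu, uint64_t val)
{
    if (model_pmsver(cpu) == 0) {
        cpu->undef_taken = true;
        return;
    }
    cpu->pmscr_el1 = val;
}

/* Unconditional clear, mirroring the hyp-stub version of the patch. */
static inline void model_init_unguarded(struct model_cpu *cpu)
{
    model_write_pmscr(cpu, 0);
}

/* Clear only when PMSVer is non-zero, mirroring the suggested arm.c change. */
static inline void model_init_guarded(struct model_cpu *cpu)
{
    if (model_pmsver(cpu))
        model_write_pmscr(cpu, 0);
}
```

On a modelled CPU with PMSVer == 0, `model_init_unguarded()` sets `undef_taken`, while `model_init_guarded()` leaves the register alone; with PMSVer != 0 both clear `pmscr_el1`.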
diff --git a/arch/arm64/kernel/hyp-stub.S b/arch/arm64/kernel/hyp-stub.S
index 65f76064c86b..df63f329d400 100644
--- a/arch/arm64/kernel/hyp-stub.S
+++ b/arch/arm64/kernel/hyp-stub.S
@@ -117,6 +117,8 @@ SYM_CODE_START_LOCAL(__finalise_el2)
         bic     x0, x0, #(MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT)
         bic     x0, x0, #(MDCR_EL2_E2TB_MASK << MDCR_EL2_E2TB_SHIFT)
         msr     mdcr_el2, x0
+        // Disable profiling when running a virtual machine
+        msr_s   SYS_PMSCR_EL12, xzr

         // Transfer the MM state from EL1 to EL2
         mrs_s   x0, SYS_TCR_EL12
According to the pseudocode for StatisticalProfilingEnabled() from Arm
DDI0487K.a, PMSCR_EL1 controls profiling at EL1 and EL0:

- PMSCR_EL1.E1SPE controls profiling at EL1.
- PMSCR_EL1.E0SPE controls profiling at EL0 if HCR_EL2.TGE=0. KVM always
  clears HCR_EL2.TGE when running a VM.

When profiling is enabled in the host, and the host is running in nVHE mode
(HCR_EL2.E2H=0), KVM clears PMSCR_EL1.{E1SPE,E0SPE} before jumping into the
guest.

When profiling is enabled in the host, and the host is running at EL2
(HCR_EL2.E2H=1), KVM will not touch PMSCR_EL1.{E1SPE,E0SPE} before jumping
into the guest. PMSCR_EL1.{E1SPE,E0SPE} reset to an architecturally UNKNOWN
value, which means it might be possible that KVM unintentionally profiles
the guest when it is running in VHE mode.

Clear PMSCR_EL1.{E1SPE,E0SPE} when setting up VHE mode to keep the
behaviour consistent and predictable.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---

Tested on the model, by setting the PMSCR_EL1.E1SPE and E0SPE bits in
__init_el2_debug to simulate a system where they reset to 1. Without the
patch, when the host is running at EL2, and the user is profiling the
kvmtool process, I can see records taken at EL1:

# perf record -e arm_spe// -- ./lkvm-static run -c2 -m512 -k Image -d disk -p earlycon

With this patch, those records disappear; and the size of perf.data has
been more than halved.

 arch/arm64/kernel/hyp-stub.S | 2 ++
 1 file changed, 2 insertions(+)
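
The two rules quoted from StatisticalProfilingEnabled() can be modelled in a few lines of user-space C to show why an UNKNOWN reset value matters. This is a deliberately reduced sketch: it assumes HCR_EL2.TGE=0 (as KVM sets for a guest) and ignores every other condition in the architectural pseudocode, such as the profiling buffer enable and the MDCR_EL2 controls. The function names are hypothetical. The bit positions (E0SPE is bit 0, E1SPE is bit 1) follow the PMSCR_EL1 register layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PMSCR_EL1_E0SPE (UINT64_C(1) << 0)  /* profiling enable for EL0 */
#define PMSCR_EL1_E1SPE (UINT64_C(1) << 1)  /* profiling enable for EL1 */

/*
 * Hypothetical, simplified check: is a guest running at exception level
 * 'el' (0 or 1) profiled, given PMSCR_EL1 and HCR_EL2.TGE=0?
 */
static inline bool model_guest_profiled(uint64_t pmscr_el1, unsigned int el)
{
    assert(el <= 1);
    if (el == 1)
        return (pmscr_el1 & PMSCR_EL1_E1SPE) != 0;
    return (pmscr_el1 & PMSCR_EL1_E0SPE) != 0;
}

/*
 * Hypothetical guest-entry model: when 'clear' is true the enable bits are
 * cleared before entering the guest; otherwise the register keeps whatever
 * value it had, which after reset is UNKNOWN.
 */
static inline uint64_t model_pmscr_at_guest_entry(uint64_t pmscr_el1, bool clear)
{
    if (clear)
        return pmscr_el1 & ~(PMSCR_EL1_E1SPE | PMSCR_EL1_E0SPE);
    return pmscr_el1;
}
```

With both bits set at reset, as in the test setup described above, the unmodified value leaves the guest profiled at EL1 and EL0, while the cleared value does not.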