Message ID | 20250120151514.1769597-1-lokeshvutla@google.com
---|---
State | New
Series | KVM: arm64: Flush hyp bss section after initialization of variables in bss
On Mon, 20 Jan 2025 15:15:14 +0000,
Lokesh Vutla <lokeshvutla@google.com> wrote:
>
> To determine CPU features during initialization, the NVHE hypervisor

s/NVHE/nVHE/

> utilizes sanitized values of the host's CPU features registers. These
> values, stored in u64 idaa64*_el1_sys_val variables are updated by the
> kvm_hyp_init_symbols() function at EL1. To ensure EL2 visibility, the

visibility *with the MMU off*

> data cache needs to be flushed after these updates. However,
> individually flushing each variable using kvm_flush_dcache_to_poc() is
> inefficient.
>
> These cpu feature variables would be part of the bss section of
> the hypervisor. Hence, flush the entire bss section of hypervisor
> once the initialization is complete.
>
> Motivation for this change:
> * Since the existing variables are not flushed from EL1, the
>   id_aa64pfr0_el1_sys_val is seen as 0 from EL2.
> * based on this value check_override macro in hypervisor skips
>   updating the sve (cpacr_el1) at finalise_el2_state.
> * The default value for cpacr_el1 enables the sve traps to EL2.
> * With sve enabled, during the context switch from EL0 -> EL1 (which is
>   much later in the boot process), the sve registers are saved/restored.
> * Since sve traps are enabled, accessing sve registers at EL1 caused a
>   trap to EL2.
> * However, hypervisor is not ready to handle sve traps at this stage
>   causing the below kernel crash during the boot:

Drop this section, it doesn't bring much to the discussion.

>
> [    0.320850][    T1] Run /init as init process
> [    0.321392][    T1] kvm [1]: nVHE hyp BUG at: [<ffffffc08112ee8c>] __kvm_nvhe_$x.24+0x254/0x254!
> [    0.321522][    T1] kvm [1]: Cannot dump pKVM nVHE stacktrace: !CONFIG_PROTECTED_NVHE_STACKTRACE
> [    0.321635][    T1] kvm [1]: Hyp Offset: 0xffffff6e60000000
> [    0.321710][    T1] Kernel panic - not syncing: HYP panic:
> [    0.321710][    T1] PS:634023c9 PC:000000522112ee8c ESR:00000000f2000800
> [    0.321710][    T1] FAR:0000000000000000 CPACR:0000000000310000 PAR:0000000000000800
> [    0.321710][    T1] VCPU:0000000000000000
> [...]
> [    0.322251][    T1] Call trace:
> [    0.322292][    T1]  dump_backtrace+0x100/0x180
> [    0.322355][    T1]  show_stack+0x20/0x30
> [    0.322410][    T1]  dump_stack_lvl+0x40/0x88
> [    0.322471][    T1]  dump_stack+0x18/0x24
> [    0.322523][    T1]  panic+0x13c/0x364
> [    0.322578][    T1]  nvhe_hyp_panic_handler+0x148/0x1cc
> [    0.322646][    T1]  do_sve_acc+0xec/0x260
> [    0.322706][    T1]  el0_sve_acc+0x34/0x68

This is essentially content-free, given that there is no
backtrace. Please drop this.

>
> Fixes: 6c30bfb18d0b ("KVM: arm64: Add handlers for protected VM System Registers")
> Suggested-by: Fuad Tabba <tabba@google.com>
> Signed-off-by: Lokesh Vutla <lokeshvutla@google.com>
> ---
>  arch/arm64/kvm/arm.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index a102c3aebdbc..5d3b2069a2d5 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -2661,6 +2661,12 @@ static int __init init_hyp_mode(void)
>  		}
>  	}
>
> +	/*
> +	 * Flush entire BSS since part of its data is read while the MMU is off.
> +	 */
> +	kvm_flush_dcache_to_poc(kvm_ksym_ref(__hyp_bss_start),
> +				kvm_ksym_ref(__hyp_bss_end) - kvm_ksym_ref(__hyp_bss_start));
> +
>  	return 0;
>
>  out_err:

I don't understand how this fixes anything. At this stage, the
hypervisor has already been initialised, and I expect it will have
evaluated the wrong values.

Even worse, I strongly suspect that by the time you perform this, S2
is enabled on the host, and that the BSS is off-limit. Which means it
could fault and send you to lalaland.

Have you actually tested this with upstream?

I would have expected the clean operations to be called from
kvm_hyp_init_symbols(), which runs before EL2 gets initialised in
protected mode.

	M.
Hi Marc,

On Mon, Jan 20, 2025 at 9:44 PM Marc Zyngier <maz@kernel.org> wrote:
>
> On Mon, 20 Jan 2025 15:15:14 +0000,
> Lokesh Vutla <lokeshvutla@google.com> wrote:
> >
> > To determine CPU features during initialization, the NVHE hypervisor
>
> s/NVHE/nVHE/

ok

> > utilizes sanitized values of the host's CPU features registers. These
> > values, stored in u64 idaa64*_el1_sys_val variables are updated by the
> > kvm_hyp_init_symbols() function at EL1. To ensure EL2 visibility, the
>
> visibility *with the MMU off*

ok

> > data cache needs to be flushed after these updates. However,
> > individually flushing each variable using kvm_flush_dcache_to_poc() is
> > inefficient.
> >
> > These cpu feature variables would be part of the bss section of
> > the hypervisor. Hence, flush the entire bss section of hypervisor
> > once the initialization is complete.
> >
> > Motivation for this change:
> > * Since the existing variables are not flushed from EL1, the
> >   id_aa64pfr0_el1_sys_val is seen as 0 from EL2.
> > * based on this value check_override macro in hypervisor skips
> >   updating the sve (cpacr_el1) at finalise_el2_state.
> > * The default value for cpacr_el1 enables the sve traps to EL2.
> > * With sve enabled, during the context switch from EL0 -> EL1 (which is
> >   much later in the boot process), the sve registers are saved/restored.
> > * Since sve traps are enabled, accessing sve registers at EL1 caused a
> >   trap to EL2.
> > * However, hypervisor is not ready to handle sve traps at this stage
> >   causing the below kernel crash during the boot:
>
> Drop this section, it doesn't bring much to the discussion.
>
> >
> > [    0.320850][    T1] Run /init as init process
> > [    0.321392][    T1] kvm [1]: nVHE hyp BUG at: [<ffffffc08112ee8c>] __kvm_nvhe_$x.24+0x254/0x254!
> > [    0.321522][    T1] kvm [1]: Cannot dump pKVM nVHE stacktrace: !CONFIG_PROTECTED_NVHE_STACKTRACE
> > [    0.321635][    T1] kvm [1]: Hyp Offset: 0xffffff6e60000000
> > [    0.321710][    T1] Kernel panic - not syncing: HYP panic:
> > [    0.321710][    T1] PS:634023c9 PC:000000522112ee8c ESR:00000000f2000800
> > [    0.321710][    T1] FAR:0000000000000000 CPACR:0000000000310000 PAR:0000000000000800
> > [    0.321710][    T1] VCPU:0000000000000000
> > [...]
> > [    0.322251][    T1] Call trace:
> > [    0.322292][    T1]  dump_backtrace+0x100/0x180
> > [    0.322355][    T1]  show_stack+0x20/0x30
> > [    0.322410][    T1]  dump_stack_lvl+0x40/0x88
> > [    0.322471][    T1]  dump_stack+0x18/0x24
> > [    0.322523][    T1]  panic+0x13c/0x364
> > [    0.322578][    T1]  nvhe_hyp_panic_handler+0x148/0x1cc
> > [    0.322646][    T1]  do_sve_acc+0xec/0x260
> > [    0.322706][    T1]  el0_sve_acc+0x34/0x68
>
> This is essentially content-free, given that there is no
> backtrace. Please drop this.

Sure.. will drop the above 2 sections in v2.

> >
> > Fixes: 6c30bfb18d0b ("KVM: arm64: Add handlers for protected VM System Registers")
> > Suggested-by: Fuad Tabba <tabba@google.com>
> > Signed-off-by: Lokesh Vutla <lokeshvutla@google.com>
> > ---
> >  arch/arm64/kvm/arm.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index a102c3aebdbc..5d3b2069a2d5 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -2661,6 +2661,12 @@ static int __init init_hyp_mode(void)
> >  		}
> >  	}
> >
> > +	/*
> > +	 * Flush entire BSS since part of its data is read while the MMU is off.
> > +	 */
> > +	kvm_flush_dcache_to_poc(kvm_ksym_ref(__hyp_bss_start),
> > +				kvm_ksym_ref(__hyp_bss_end) - kvm_ksym_ref(__hyp_bss_start));
> > +
> >  	return 0;
> >
> >  out_err:
>
> I don't understand how this fixes anything. At this stage, the
> hypervisor has already been initialised, and I expect it will have
> evaluated the wrong values.
>
> Even worse, I strongly suspect that by the time you perform this, S2
> is enabled on the host, and that the BSS is off-limit. Which means it
> could fault and send you to lalaland.
>
> Have you actually tested this with upstream?
>
> I would have expected the clean operations to be called from
> kvm_hyp_init_symbols(), which runs before EL2 gets initialised in
> protected mode.

Sorry my bad.. Actually we started flushing each variable and tested
the solution. With this we observed that the hypervisor is able to
read the data properly with MMU off. Then kept optimizing the
solution by flushing the entire bss and moving around the place of
flushing.. I guess it got lucky in the last run with this wrong
placement.

Will add the flush in kvm_hyp_init_symbols() and post v2.

Thanks and Regards,
Lokesh

>
>         M.
>
> --
> Without deviation from the norm, progress is not possible.
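For readers following the thread, here is a minimal sketch of the direction agreed above: performing the clean from kvm_hyp_init_symbols(), which runs before EL2 is initialised in protected mode. The flush call and the __hyp_bss_start/__hyp_bss_end bounds are taken from the v1 diff; the abbreviated function body and the exact placement are assumptions about what a v2 might look like, not code from the series.

```c
/* Sketch only: suggested v2 placement, not the posted patch. */
static void kvm_hyp_init_symbols(void)
{
	/* Sanitised copies of the host ID registers, written at EL1... */
	kvm_nvhe_sym(id_aa64pfr0_el1_sys_val) =
		read_sanitised_ftr_reg(SYS_ID_AA64PFR0_EL1);
	/* ...remaining id_aa64*_el1_sys_val assignments elided... */

	/*
	 * ...then cleaned to the PoC in one go, so that the nVHE
	 * hypervisor reads consistent values while running with
	 * the MMU (and hence the data cache) off. Doing it here
	 * happens before EL2 is initialised in protected mode and
	 * before stage-2 makes the hyp BSS off-limits to the host.
	 */
	kvm_flush_dcache_to_poc(kvm_ksym_ref(__hyp_bss_start),
				kvm_ksym_ref(__hyp_bss_end) -
				kvm_ksym_ref(__hyp_bss_start));
}
```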
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index a102c3aebdbc..5d3b2069a2d5 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -2661,6 +2661,12 @@ static int __init init_hyp_mode(void)
 		}
 	}
 
+	/*
+	 * Flush entire BSS since part of its data is read while the MMU is off.
+	 */
+	kvm_flush_dcache_to_poc(kvm_ksym_ref(__hyp_bss_start),
+				kvm_ksym_ref(__hyp_bss_end) - kvm_ksym_ref(__hyp_bss_start));
+
 	return 0;
 
 out_err:
To determine CPU features during initialization, the NVHE hypervisor
utilizes sanitized values of the host's CPU features registers. These
values, stored in u64 idaa64*_el1_sys_val variables are updated by the
kvm_hyp_init_symbols() function at EL1. To ensure EL2 visibility, the
data cache needs to be flushed after these updates. However,
individually flushing each variable using kvm_flush_dcache_to_poc() is
inefficient.

These cpu feature variables would be part of the bss section of
the hypervisor. Hence, flush the entire bss section of hypervisor
once the initialization is complete.

Motivation for this change:
* Since the existing variables are not flushed from EL1, the
  id_aa64pfr0_el1_sys_val is seen as 0 from EL2.
* based on this value check_override macro in hypervisor skips
  updating the sve (cpacr_el1) at finalise_el2_state.
* The default value for cpacr_el1 enables the sve traps to EL2.
* With sve enabled, during the context switch from EL0 -> EL1 (which is
  much later in the boot process), the sve registers are saved/restored.
* Since sve traps are enabled, accessing sve registers at EL1 caused a
  trap to EL2.
* However, hypervisor is not ready to handle sve traps at this stage
  causing the below kernel crash during the boot:

[    0.320850][    T1] Run /init as init process
[    0.321392][    T1] kvm [1]: nVHE hyp BUG at: [<ffffffc08112ee8c>] __kvm_nvhe_$x.24+0x254/0x254!
[    0.321522][    T1] kvm [1]: Cannot dump pKVM nVHE stacktrace: !CONFIG_PROTECTED_NVHE_STACKTRACE
[    0.321635][    T1] kvm [1]: Hyp Offset: 0xffffff6e60000000
[    0.321710][    T1] Kernel panic - not syncing: HYP panic:
[    0.321710][    T1] PS:634023c9 PC:000000522112ee8c ESR:00000000f2000800
[    0.321710][    T1] FAR:0000000000000000 CPACR:0000000000310000 PAR:0000000000000800
[    0.321710][    T1] VCPU:0000000000000000
[...]
[    0.322251][    T1] Call trace:
[    0.322292][    T1]  dump_backtrace+0x100/0x180
[    0.322355][    T1]  show_stack+0x20/0x30
[    0.322410][    T1]  dump_stack_lvl+0x40/0x88
[    0.322471][    T1]  dump_stack+0x18/0x24
[    0.322523][    T1]  panic+0x13c/0x364
[    0.322578][    T1]  nvhe_hyp_panic_handler+0x148/0x1cc
[    0.322646][    T1]  do_sve_acc+0xec/0x260
[    0.322706][    T1]  el0_sve_acc+0x34/0x68

Fixes: 6c30bfb18d0b ("KVM: arm64: Add handlers for protected VM System Registers")
Suggested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Lokesh Vutla <lokeshvutla@google.com>
---
 arch/arm64/kvm/arm.c | 6 ++++++
 1 file changed, 6 insertions(+)
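For comparison, the per-variable flushing that the commit message above calls inefficient would look roughly like the following. This is a hypothetical illustration, not code from any version of the series:

```c
/*
 * Hypothetical per-variable alternative: one cache clean per
 * sanitised copy. Functionally sufficient, but it scales with
 * the number of id_aa64*_el1_sys_val symbols, whereas the patch
 * cleans the whole hyp BSS range with a single call.
 */
kvm_flush_dcache_to_poc(&kvm_nvhe_sym(id_aa64pfr0_el1_sys_val),
			sizeof(kvm_nvhe_sym(id_aa64pfr0_el1_sys_val)));
kvm_flush_dcache_to_poc(&kvm_nvhe_sym(id_aa64pfr1_el1_sys_val),
			sizeof(kvm_nvhe_sym(id_aa64pfr1_el1_sys_val)));
/* ...and so on for each exported feature-register copy... */
```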