Message ID | 20230515173103.1017669-1-maz@kernel.org (mailing list archive) |
---|---|
Headers | show |
Series | KVM: arm64: ARMv8.3/8.4 Nested Virtualization support | expand |
Hi Marc, On 5/15/23 19:30, Marc Zyngier wrote: > This is the 4th drop of NV support on arm64 for this year. > > For the previous episodes, see [1]. > > What's changed: > > - New framework to track system register traps that are reinjected in > guest EL2. It is expected to replace the discrete handling we have > enjoyed so far, which didn't scale at all. This has already fixed a > number of bugs that were hidden (a bunch of traps were never > forwarded...). Still a work in progress, but this is going in the > right direction. > > - Allow the L1 hypervisor to have a S2 that has an input larger than > the L0 IPA space. This fixes a number of subtle issues, depending on > how the initial guest was created. > > - Consequently, the patch series has gone longer again. Boo. But > hopefully some of it is easier to review... > > [1] https://lore.kernel.org/r/20230405154008.3552854-1-maz@kernel.org I have started testing this and when booting my fedora guest I get [ 151.796544] kvm [7617]: Unsupported guest sys_reg access at: 23f425fd0 [80000209] [ 151.796544] { Op0( 3), Op1( 3), CRn(14), CRm( 3), Op2( 1), func_write }, as soon as the host has kvm-arm.mode=nested This seems to be triggered very early by EDK2 (ArmPkg/Drivers/TimerDxe/TimerDxe.c). If I am not wrong this CNTV_CTL_EL0. Do you have any idea? By the way I got this already with your v9 (git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git kvm-arm64/nv-6.4-WIP) and with your v10 (I cherry-picked your maz/kvm-arm64/nv-6.5-WIP branch). Thanks Eric > > Andre Przywara (1): > KVM: arm64: nv: vgic: Allow userland to set VGIC maintenance IRQ > > Christoffer Dall (5): > KVM: arm64: nv: Trap EL1 VM register accesses in virtual EL2 > KVM: arm64: nv: Implement nested Stage-2 page table walk logic > KVM: arm64: nv: Unmap/flush shadow stage 2 page tables > KVM: arm64: nv: vgic: Emulate the HW bit in software > KVM: arm64: nv: Sync nested timer state with FEAT_NV2 > > Jintack Lim (7): > KVM: arm64: nv: Trap CPACR_EL1 access in virtual EL2 > KVM: arm64: nv: Respect virtual HCR_EL2.TWX setting > KVM: arm64: nv: Respect virtual CPTR_EL2.{TFP,FPEN} settings > KVM: arm64: nv: Respect virtual HCR_EL2.{NV,TSC) settings > KVM: arm64: nv: Configure HCR_EL2 for nested virtualization > KVM: arm64: nv: Trap and emulate TLBI instructions from virtual EL2 > KVM: arm64: nv: Nested GICv3 Support > > Marc Zyngier (46): > KVM: arm64: Move VTCR_EL2 into struct s2_mmu > arm64: Add missing Set/Way CMO encodings > arm64: Add missing VA CMO encodings > arm64: Add missing ERXMISCx_EL1 encodings > arm64: Add missing DC ZVA/GVA/GZVA encodings > arm64: Add TLBI operation encodings > arm64: Add AT operation encodings > KVM: arm64: Add missing HCR_EL2 trap bits > KVM: arm64: nv: Add trap forwarding infrastructure > KVM: arm64: nv: Add trap forwarding for HCR_EL2 > KVM: arm64: nv: Expose FEAT_EVT to nested guests > KVM: arm64: nv: Add trap forwarding for MDCR_EL2 > KVM: arm64: nv: Add trap forwarding for CNTHCTL_EL2 > KVM: arm64: nv: Add non-VHE-EL2->EL1 translation helpers > KVM: arm64: nv: Handle virtual EL2 registers in > vcpu_read/write_sys_reg() > KVM: arm64: nv: Handle SPSR_EL2 specially > KVM: arm64: nv: Handle HCR_EL2.E2H specially > KVM: arm64: nv: Save/Restore vEL2 sysregs > KVM: arm64: nv: Support multiple nested Stage-2 mmu structures > KVM: arm64: nv: Handle shadow stage 2 page faults > KVM: arm64: nv: Restrict S2 RD/WR permissions to match the guest's > KVM: arm64: nv: Set a handler for the system instruction traps > KVM: arm64: nv: Trap and emulate AT instructions from virtual EL2 > KVM: arm64: nv: Fold guest's HCR_EL2 configuration into the host's > KVM: arm64: nv: Hide RAS from nested guests > KVM: arm64: nv: Add handling of EL2-specific timer registers > KVM: arm64: nv: Load timer before the GIC > KVM: arm64: nv: Don't load the GICv4 context on entering a nested > guest > KVM: arm64: nv: Implement maintenance interrupt forwarding > KVM: arm64: nv: Deal with broken VGIC on maintenance interrupt > delivery > KVM: arm64: nv: Allow userspace to request KVM_ARM_VCPU_NESTED_VIRT > KVM: arm64: nv: Add handling of FEAT_TTL TLB invalidation > KVM: arm64: nv: Invalidate TLBs based on shadow S2 TTL-like > information > KVM: arm64: nv: Tag shadow S2 entries with nested level > KVM: arm64: nv: Add include containing the VNCR_EL2 offsets > KVM: arm64: nv: Map VNCR-capable registers to a separate page > KVM: arm64: nv: Move nested vgic state into the sysreg file > KVM: arm64: Add FEAT_NV2 cpu feature > KVM: arm64: nv: Fold GICv3 host trapping requirements into guest setup > KVM: arm64: nv: Publish emulated timer interrupt state in the > in-memory state > KVM: arm64: nv: Allocate VNCR page when required > KVM: arm64: nv: Enable ARMv8.4-NV support > KVM: arm64: nv: Fast-track 'InHost' exception returns > KVM: arm64: nv: Fast-track EL1 TLBIs for VHE guests > KVM: arm64: nv: Use FEAT_ECV to trap access to EL0 timers > KVM: arm64: nv: Accelerate EL0 timer read accesses when FEAT_ECV is on > > .../virt/kvm/devices/arm-vgic-v3.rst | 12 +- > arch/arm64/include/asm/esr.h | 1 + > arch/arm64/include/asm/kvm_arm.h | 14 + > arch/arm64/include/asm/kvm_asm.h | 4 + > arch/arm64/include/asm/kvm_emulate.h | 93 +- > arch/arm64/include/asm/kvm_host.h | 181 +++- > arch/arm64/include/asm/kvm_hyp.h | 2 + > arch/arm64/include/asm/kvm_mmu.h | 20 +- > arch/arm64/include/asm/kvm_nested.h | 133 +++ > arch/arm64/include/asm/stage2_pgtable.h | 4 +- > arch/arm64/include/asm/sysreg.h | 196 ++++ > arch/arm64/include/asm/vncr_mapping.h | 74 ++ > arch/arm64/include/uapi/asm/kvm.h | 1 + > arch/arm64/kernel/cpufeature.c | 11 + > arch/arm64/kvm/Makefile | 4 +- > arch/arm64/kvm/arch_timer.c | 98 +- > arch/arm64/kvm/arm.c | 33 +- > arch/arm64/kvm/at.c | 219 ++++ > arch/arm64/kvm/emulate-nested.c | 934 ++++++++++++++++- > arch/arm64/kvm/handle_exit.c | 29 +- > arch/arm64/kvm/hyp/include/hyp/switch.h | 8 +- > arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 5 +- > arch/arm64/kvm/hyp/nvhe/mem_protect.c | 8 +- > arch/arm64/kvm/hyp/nvhe/pkvm.c | 4 +- > arch/arm64/kvm/hyp/nvhe/switch.c | 2 +- > arch/arm64/kvm/hyp/nvhe/sysreg-sr.c | 2 +- > arch/arm64/kvm/hyp/pgtable.c | 2 +- > arch/arm64/kvm/hyp/vgic-v3-sr.c | 6 +- > arch/arm64/kvm/hyp/vhe/switch.c | 206 +++- > arch/arm64/kvm/hyp/vhe/sysreg-sr.c | 124 ++- > arch/arm64/kvm/hyp/vhe/tlb.c | 83 ++ > arch/arm64/kvm/mmu.c | 255 ++++- > arch/arm64/kvm/nested.c | 799 ++++++++++++++- > arch/arm64/kvm/pkvm.c | 2 +- > arch/arm64/kvm/reset.c | 7 + > arch/arm64/kvm/sys_regs.c | 958 +++++++++++++++++- > arch/arm64/kvm/trace_arm.h | 19 + > arch/arm64/kvm/vgic/vgic-init.c | 33 + > arch/arm64/kvm/vgic/vgic-kvm-device.c | 32 +- > arch/arm64/kvm/vgic/vgic-v3-nested.c | 248 +++++ > arch/arm64/kvm/vgic/vgic-v3.c | 43 +- > arch/arm64/kvm/vgic/vgic.c | 29 + > arch/arm64/kvm/vgic/vgic.h | 10 + > arch/arm64/tools/cpucaps | 1 + > include/clocksource/arm_arch_timer.h | 4 + > include/kvm/arm_arch_timer.h | 1 + > include/kvm/arm_vgic.h | 17 + > include/uapi/linux/kvm.h | 1 + > tools/arch/arm/include/uapi/asm/kvm.h | 1 + > 49 files changed, 4790 insertions(+), 183 deletions(-) > create mode 100644 arch/arm64/include/asm/vncr_mapping.h > create mode 100644 arch/arm64/kvm/at.c > create mode 100644 arch/arm64/kvm/vgic/vgic-v3-nested.c >
Hey Eric, On Tue, 16 May 2023 17:53:14 +0100, Eric Auger <eauger@redhat.com> wrote: > > Hi Marc, > > On 5/15/23 19:30, Marc Zyngier wrote: > > This is the 4th drop of NV support on arm64 for this year. > > > > For the previous episodes, see [1]. > > > > What's changed: > > > > - New framework to track system register traps that are reinjected in > > guest EL2. It is expected to replace the discrete handling we have > > enjoyed so far, which didn't scale at all. This has already fixed a > > number of bugs that were hidden (a bunch of traps were never > > forwarded...). Still a work in progress, but this is going in the > > right direction. > > > > - Allow the L1 hypervisor to have a S2 that has an input larger than > > the L0 IPA space. This fixes a number of subtle issues, depending on > > how the initial guest was created. > > > > - Consequently, the patch series has gone longer again. Boo. But > > hopefully some of it is easier to review... > > > > [1] https://lore.kernel.org/r/20230405154008.3552854-1-maz@kernel.org > > I have started testing this and when booting my fedora guest I get > > [ 151.796544] kvm [7617]: Unsupported guest sys_reg access at: > 23f425fd0 [80000209] > [ 151.796544] { Op0( 3), Op1( 3), CRn(14), CRm( 3), Op2( 1), func_write }, > > as soon as the host has kvm-arm.mode=nested Very odd. A write to CNTV_CTL_EL0 that fails, meaning that we have ECV traps for the virtual timer set, and so this is probably done from (virtual) EL2. Which of course we're not properly handling. Duh. > This seems to be triggered very early by EDK2 > (ArmPkg/Drivers/TimerDxe/TimerDxe.c). > > If I am not wrong this CNTV_CTL_EL0. Do you have any idea? I have a good idea, but I could use some more information: - Is this a nested guest? Or a non nested guest? - Can you stash your EDK2 build somewhere so that I can try to reproduce it? - What do you use for VMM on the host? I'm running EDK2 on both L1 and L2 guests (some Debian builds), and I don't see any issue. But when running EDK2 on L1, I run it non-nested. > By the way I got this already with your v9 > (git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git > kvm-arm64/nv-6.4-WIP) and with your v10 (I cherry-picked your > maz/kvm-arm64/nv-6.5-WIP branch). That's an interesting data point. At least, it hasn't regressed further... I'll try and cook a test tomorrow and run it on my own rig. Which may or may not behave like yours, you never know... Thanks a lot for the report! M.
On Tue, 16 May 2023 17:53:14 +0100, Eric Auger <eauger@redhat.com> wrote: > > Hi Marc, > > On 5/15/23 19:30, Marc Zyngier wrote: > > This is the 4th drop of NV support on arm64 for this year. > > > > For the previous episodes, see [1]. > > > > What's changed: > > > > - New framework to track system register traps that are reinjected in > > guest EL2. It is expected to replace the discrete handling we have > > enjoyed so far, which didn't scale at all. This has already fixed a > > number of bugs that were hidden (a bunch of traps were never > > forwarded...). Still a work in progress, but this is going in the > > right direction. > > > > - Allow the L1 hypervisor to have a S2 that has an input larger than > > the L0 IPA space. This fixes a number of subtle issues, depending on > > how the initial guest was created. > > > > - Consequently, the patch series has gone longer again. Boo. But > > hopefully some of it is easier to review... > > > > [1] https://lore.kernel.org/r/20230405154008.3552854-1-maz@kernel.org > > I have started testing this and when booting my fedora guest I get > > [ 151.796544] kvm [7617]: Unsupported guest sys_reg access at: > 23f425fd0 [80000209] > [ 151.796544] { Op0( 3), Op1( 3), CRn(14), CRm( 3), Op2( 1), func_write }, > > as soon as the host has kvm-arm.mode=nested > > This seems to be triggered very early by EDK2 > (ArmPkg/Drivers/TimerDxe/TimerDxe.c). > > If I am not wrong this CNTV_CTL_EL0. Do you have any idea? So here's my current analysis: I assume you are running EDK2 as the L1 guest in a nested configuration. I also assume that you are not running on an Apple CPU. If these assumptions are correct, then EDK2 runs at vEL2, and is in nVHE mode. Finally, I'm going to assume that your implementation has FEAT_ECV and FEAT_NV2, because I can't see how it could fail otherwise. In these precise conditions, KVM sets the CNTHCTL_EL2.EL1TVT bit so that we can trap the EL0 virtual timer and faithfully emulate it (it is otherwise written to memory, which isn't very helpful). As it turns out, we don't handle these traps. I didn't spot it because my test machines are all Apple boxes that don't have a nVHE mode, so nothing on the nVHE path is getting *ANY* coverage. Hint: having access to such a machine would help (shipping address on request!). Otherwise, I'll eventually kill the nVHE support altogether. I have written the following patch, which compiles, but that I cannot test with my current setup. Could you please give it a go? Thanks again, M. From feb03b57de0bcb83254a2d6a3ce320f5e39434b6 Mon Sep 17 00:00:00 2001 From: Marc Zyngier <maz@kernel.org> Date: Tue, 16 May 2023 21:06:20 +0100 Subject: [PATCH] KVM: arm64: Handle virtual timer traps when CNTHCTL_EL2.EL1TVT is set Signed-off-by: Marc Zyngier <maz@kernel.org> --- arch/arm64/include/asm/sysreg.h | 1 + arch/arm64/kvm/sys_regs.c | 28 ++++++++++++++++++++++++++++ 2 files changed, 29 insertions(+) diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h index 72ff6df5d75b..77a61179ea37 100644 --- a/arch/arm64/include/asm/sysreg.h +++ b/arch/arm64/include/asm/sysreg.h @@ -436,6 +436,7 @@ #define SYS_CNTP_CTL_EL0 sys_reg(3, 3, 14, 2, 1) #define SYS_CNTP_CVAL_EL0 sys_reg(3, 3, 14, 2, 2) +#define SYS_CNTV_TVAL_EL0 sys_reg(3, 3, 14, 3, 0) #define SYS_CNTV_CTL_EL0 sys_reg(3, 3, 14, 3, 1) #define SYS_CNTV_CVAL_EL0 sys_reg(3, 3, 14, 3, 2) diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c index 27a29dcbfcd2..9aa9c4e4b4d6 100644 --- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -1328,6 +1328,14 @@ static bool access_arch_timer(struct kvm_vcpu *vcpu, treg = TIMER_REG_TVAL; break; + case SYS_CNTV_TVAL_EL0: + if (is_hyp_ctxt(vcpu) && vcpu_el2_e2h_is_set(vcpu)) + tmr = TIMER_HVTIMER; + else + tmr = TIMER_VTIMER; + treg = TIMER_REG_TVAL; + break; + case SYS_AARCH32_CNTP_TVAL: case SYS_CNTP_TVAL_EL02: tmr = TIMER_PTIMER; @@ -1357,6 +1365,14 @@ static bool access_arch_timer(struct kvm_vcpu *vcpu, treg = TIMER_REG_CTL; break; + case SYS_CNTV_CTL_EL0: + if (is_hyp_ctxt(vcpu) && vcpu_el2_e2h_is_set(vcpu)) + tmr = TIMER_HVTIMER; + else + tmr = TIMER_VTIMER; + treg = TIMER_REG_CTL; + break; + case SYS_AARCH32_CNTP_CTL: case SYS_CNTP_CTL_EL02: tmr = TIMER_PTIMER; @@ -1386,6 +1402,14 @@ static bool access_arch_timer(struct kvm_vcpu *vcpu, treg = TIMER_REG_CVAL; break; + case SYS_CNTV_CVAL_EL0: + if (is_hyp_ctxt(vcpu) && vcpu_el2_e2h_is_set(vcpu)) + tmr = TIMER_HVTIMER; + else + tmr = TIMER_VTIMER; + treg = TIMER_REG_CVAL; + break; + case SYS_AARCH32_CNTP_CVAL: case SYS_CNTP_CVAL_EL02: tmr = TIMER_PTIMER; @@ -2510,6 +2534,10 @@ static const struct sys_reg_desc sys_reg_descs[] = { { SYS_DESC(SYS_CNTP_CTL_EL0), access_arch_timer }, { SYS_DESC(SYS_CNTP_CVAL_EL0), access_arch_timer }, + { SYS_DESC(SYS_CNTV_TVAL_EL0), access_arch_timer }, + { SYS_DESC(SYS_CNTV_CTL_EL0), access_arch_timer }, + { SYS_DESC(SYS_CNTV_CVAL_EL0), access_arch_timer }, + /* PMEVCNTRn_EL0 */ PMU_PMEVCNTR_EL0(0), PMU_PMEVCNTR_EL0(1),
Hi Marc, On 5/16/23 22:28, Marc Zyngier wrote: > On Tue, 16 May 2023 17:53:14 +0100, > Eric Auger <eauger@redhat.com> wrote: >> >> Hi Marc, >> >> On 5/15/23 19:30, Marc Zyngier wrote: >>> This is the 4th drop of NV support on arm64 for this year. >>> >>> For the previous episodes, see [1]. >>> >>> What's changed: >>> >>> - New framework to track system register traps that are reinjected in >>> guest EL2. It is expected to replace the discrete handling we have >>> enjoyed so far, which didn't scale at all. This has already fixed a >>> number of bugs that were hidden (a bunch of traps were never >>> forwarded...). Still a work in progress, but this is going in the >>> right direction. >>> >>> - Allow the L1 hypervisor to have a S2 that has an input larger than >>> the L0 IPA space. This fixes a number of subtle issues, depending on >>> how the initial guest was created. >>> >>> - Consequently, the patch series has gone longer again. Boo. But >>> hopefully some of it is easier to review... >>> >>> [1] https://lore.kernel.org/r/20230405154008.3552854-1-maz@kernel.org >> >> I have started testing this and when booting my fedora guest I get >> >> [ 151.796544] kvm [7617]: Unsupported guest sys_reg access at: >> 23f425fd0 [80000209] >> [ 151.796544] { Op0( 3), Op1( 3), CRn(14), CRm( 3), Op2( 1), func_write }, >> >> as soon as the host has kvm-arm.mode=nested >> >> This seems to be triggered very early by EDK2 >> (ArmPkg/Drivers/TimerDxe/TimerDxe.c). >> >> If I am not wrong this CNTV_CTL_EL0. Do you have any idea? > > So here's my current analysis: > > I assume you are running EDK2 as the L1 guest in a nested > configuration. I also assume that you are not running on an Apple > CPU. If these assumptions are correct, then EDK2 runs at vEL2, and is > in nVHE mode. > > Finally, I'm going to assume that your implementation has FEAT_ECV and > FEAT_NV2, because I can't see how it could fail otherwise. all the above is correct. > > In these precise conditions, KVM sets the CNTHCTL_EL2.EL1TVT bit so > that we can trap the EL0 virtual timer and faithfully emulate it (it > is otherwise written to memory, which isn't very helpful). indeed > > As it turns out, we don't handle these traps. I didn't spot it because > my test machines are all Apple boxes that don't have a nVHE mode, so > nothing on the nVHE path is getting *ANY* coverage. Hint: having > access to such a machine would help (shipping address on request!). > Otherwise, I'll eventually kill the nVHE support altogether. > > I have written the following patch, which compiles, but that I cannot > test with my current setup. Could you please give it a go? with the patch below, my guest boots nicely. You did it great on the 1st shot!!! So this fixes my issue. I will continue testing the v10. Thank you again! Eric > > Thanks again, > > M. > > From feb03b57de0bcb83254a2d6a3ce320f5e39434b6 Mon Sep 17 00:00:00 2001 > From: Marc Zyngier <maz@kernel.org> > Date: Tue, 16 May 2023 21:06:20 +0100 > Subject: [PATCH] KVM: arm64: Handle virtual timer traps when > CNTHCTL_EL2.EL1TVT is set > > Signed-off-by: Marc Zyngier <maz@kernel.org> > --- > arch/arm64/include/asm/sysreg.h | 1 + > arch/arm64/kvm/sys_regs.c | 28 ++++++++++++++++++++++++++++ > 2 files changed, 29 insertions(+) > > diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h > index 72ff6df5d75b..77a61179ea37 100644 > --- a/arch/arm64/include/asm/sysreg.h > +++ b/arch/arm64/include/asm/sysreg.h > @@ -436,6 +436,7 @@ > #define SYS_CNTP_CTL_EL0 sys_reg(3, 3, 14, 2, 1) > #define SYS_CNTP_CVAL_EL0 sys_reg(3, 3, 14, 2, 2) > > +#define SYS_CNTV_TVAL_EL0 sys_reg(3, 3, 14, 3, 0) > #define SYS_CNTV_CTL_EL0 sys_reg(3, 3, 14, 3, 1) > #define SYS_CNTV_CVAL_EL0 sys_reg(3, 3, 14, 3, 2) > > diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c > index 27a29dcbfcd2..9aa9c4e4b4d6 100644 > --- a/arch/arm64/kvm/sys_regs.c > +++ b/arch/arm64/kvm/sys_regs.c > @@ -1328,6 +1328,14 @@ static bool access_arch_timer(struct kvm_vcpu *vcpu, > treg = TIMER_REG_TVAL; > break; > > + case SYS_CNTV_TVAL_EL0: > + if (is_hyp_ctxt(vcpu) && vcpu_el2_e2h_is_set(vcpu)) > + tmr = TIMER_HVTIMER; > + else > + tmr = TIMER_VTIMER; > + treg = TIMER_REG_TVAL; > + break; > + > case SYS_AARCH32_CNTP_TVAL: > case SYS_CNTP_TVAL_EL02: > tmr = TIMER_PTIMER; > @@ -1357,6 +1365,14 @@ static bool access_arch_timer(struct kvm_vcpu *vcpu, > treg = TIMER_REG_CTL; > break; > > + case SYS_CNTV_CTL_EL0: > + if (is_hyp_ctxt(vcpu) && vcpu_el2_e2h_is_set(vcpu)) > + tmr = TIMER_HVTIMER; > + else > + tmr = TIMER_VTIMER; > + treg = TIMER_REG_CTL; > + break; > + > case SYS_AARCH32_CNTP_CTL: > case SYS_CNTP_CTL_EL02: > tmr = TIMER_PTIMER; > @@ -1386,6 +1402,14 @@ static bool access_arch_timer(struct kvm_vcpu *vcpu, > treg = TIMER_REG_CVAL; > break; > > + case SYS_CNTV_CVAL_EL0: > + if (is_hyp_ctxt(vcpu) && vcpu_el2_e2h_is_set(vcpu)) > + tmr = TIMER_HVTIMER; > + else > + tmr = TIMER_VTIMER; > + treg = TIMER_REG_CVAL; > + break; > + > case SYS_AARCH32_CNTP_CVAL: > case SYS_CNTP_CVAL_EL02: > tmr = TIMER_PTIMER; > @@ -2510,6 +2534,10 @@ static const struct sys_reg_desc sys_reg_descs[] = { > { SYS_DESC(SYS_CNTP_CTL_EL0), access_arch_timer }, > { SYS_DESC(SYS_CNTP_CVAL_EL0), access_arch_timer }, > > + { SYS_DESC(SYS_CNTV_TVAL_EL0), access_arch_timer }, > + { SYS_DESC(SYS_CNTV_CTL_EL0), access_arch_timer }, > + { SYS_DESC(SYS_CNTV_CVAL_EL0), access_arch_timer }, > + > /* PMEVCNTRn_EL0 */ > PMU_PMEVCNTR_EL0(0), > PMU_PMEVCNTR_EL0(1),
On Wed, 17 May 2023 09:59:45 +0100, Eric Auger <eauger@redhat.com> wrote: > > Hi Marc, > > On 5/16/23 22:28, Marc Zyngier wrote: > > On Tue, 16 May 2023 17:53:14 +0100, > > Eric Auger <eauger@redhat.com> wrote: > >> > >> Hi Marc, > >> > >> On 5/15/23 19:30, Marc Zyngier wrote: > >>> This is the 4th drop of NV support on arm64 for this year. > >>> > >>> For the previous episodes, see [1]. > >>> > >>> What's changed: > >>> > >>> - New framework to track system register traps that are reinjected in > >>> guest EL2. It is expected to replace the discrete handling we have > >>> enjoyed so far, which didn't scale at all. This has already fixed a > >>> number of bugs that were hidden (a bunch of traps were never > >>> forwarded...). Still a work in progress, but this is going in the > >>> right direction. > >>> > >>> - Allow the L1 hypervisor to have a S2 that has an input larger than > >>> the L0 IPA space. This fixes a number of subtle issues, depending on > >>> how the initial guest was created. > >>> > >>> - Consequently, the patch series has gone longer again. Boo. But > >>> hopefully some of it is easier to review... > >>> > >>> [1] https://lore.kernel.org/r/20230405154008.3552854-1-maz@kernel.org > >> > >> I have started testing this and when booting my fedora guest I get > >> > >> [ 151.796544] kvm [7617]: Unsupported guest sys_reg access at: > >> 23f425fd0 [80000209] > >> [ 151.796544] { Op0( 3), Op1( 3), CRn(14), CRm( 3), Op2( 1), func_write }, > >> > >> as soon as the host has kvm-arm.mode=nested > >> > >> This seems to be triggered very early by EDK2 > >> (ArmPkg/Drivers/TimerDxe/TimerDxe.c). > >> > >> If I am not wrong this CNTV_CTL_EL0. Do you have any idea? > > > > So here's my current analysis: > > > > I assume you are running EDK2 as the L1 guest in a nested > > configuration. I also assume that you are not running on an Apple > > CPU. If these assumptions are correct, then EDK2 runs at vEL2, and is > > in nVHE mode. > > > > Finally, I'm going to assume that your implementation has FEAT_ECV and > > FEAT_NV2, because I can't see how it could fail otherwise. > all the above is correct. > > > > In these precise conditions, KVM sets the CNTHCTL_EL2.EL1TVT bit so > > that we can trap the EL0 virtual timer and faithfully emulate it (it > > is otherwise written to memory, which isn't very helpful). > > indeed > > > > As it turns out, we don't handle these traps. I didn't spot it because > > my test machines are all Apple boxes that don't have a nVHE mode, so > > nothing on the nVHE path is getting *ANY* coverage. Hint: having > > access to such a machine would help (shipping address on request!). > > Otherwise, I'll eventually kill the nVHE support altogether. > > > > I have written the following patch, which compiles, but that I cannot > > test with my current setup. Could you please give it a go? > > with the patch below, my guest boots nicely. You did it great on the 1st > shot!!! So this fixes my issue. I will continue testing the v10. Thanks a lot for reporting the issue and testing my hacks. I'll eventually fold it into the rest of the series. By the way, what are you using as your VMM? I'd really like to reproduce your setup. Cheers, M.
Hi Marc, On 5/15/23 19:30, Marc Zyngier wrote: > This is the 4th drop of NV support on arm64 for this year. > > For the previous episodes, see [1]. > > What's changed: > > - New framework to track system register traps that are reinjected in > guest EL2. It is expected to replace the discrete handling we have > enjoyed so far, which didn't scale at all. This has already fixed a > number of bugs that were hidden (a bunch of traps were never > forwarded...). Still a work in progress, but this is going in the > right direction. > > - Allow the L1 hypervisor to have a S2 that has an input larger than > the L0 IPA space. This fixes a number of subtle issues, depending on > how the initial guest was created. > > - Consequently, the patch series has gone longer again. Boo. But > hopefully some of it is easier to review... > > [1] https://lore.kernel.org/r/20230405154008.3552854-1-maz@kernel.org > > Andre Przywara (1): > KVM: arm64: nv: vgic: Allow userland to set VGIC maintenance IRQ I guess you have executed kselftests on L1 guests. Have all the tests passed there? On my end it stalls in the KVM_RUN. for instance tools/testing/selftests/kvm/aarch64/aarch32_id_regs.c fails in test_guest_raz(vcpu) on the KVM_RUN. Even with a basic static void guest_main(void) { GUEST_DONE(); } I get aarch32_id_regs-768 [002] ..... 410.544665: kvm_exit: IRQ: HSR_EC: 0x0000 (UNKNOWN), PC: 0x0000000000401ec4 aarch32_id_regs-768 [002] d.... 410.544666: kvm_entry: PC: 0x0000000000401ec4 aarch32_id_regs-768 [002] ..... 410.544675: kvm_exit: IRQ: HSR_EC: 0x0000 (UNKNOWN), PC: 0x0000000000401ec4 aarch32_id_regs-768 [002] d.... 410.544676: kvm_entry: PC: 0x0000000000401ec4 aarch32_id_regs-768 [002] ..... 410.544685: kvm_exit: IRQ: HSR_EC: 0x0000 (UNKNOWN), PC: 0x0000000000401ec4 looping forever instead of aarch32_id_regs-1085576 [079] d..1. 1401295.068739: kvm_entry: PC: 0x0000000000401ec4 aarch32_id_regs-1085576 [079] ...1. 1401295.068745: kvm_exit: TRAP: HSR_EC: 0x0020 (IABT_LOW), PC: 0x0000000000401ec4 aarch32_id_regs-1085576 [079] d..1. 1401295.068790: kvm_entry: PC: 0x0000000000401ec4 aarch32_id_regs-1085576 [079] ...1. 1401295.068792: kvm_exit: TRAP: HSR_EC: 0x0020 (IABT_LOW), PC: 0x0000000000401ec4 aarch32_id_regs-1085576 [079] d..1. 1401295.068794: kvm_entry: PC: 0x0000000000401ec4 aarch32_id_regs-1085576 [079] ...1. 1401295.068795: kvm_exit: TRAP: HSR_EC: 0x0020 (IABT_LOW), PC: 0x0000000000401ec4 aarch32_id_regs-1085576 [079] d..1. 1401295.068797: kvm_entry: PC: 0x0000000000401ec4 ../.. Any idea or any known restriction wrt kselftests? Thanks Eric > > Christoffer Dall (5): > KVM: arm64: nv: Trap EL1 VM register accesses in virtual EL2 > KVM: arm64: nv: Implement nested Stage-2 page table walk logic > KVM: arm64: nv: Unmap/flush shadow stage 2 page tables > KVM: arm64: nv: vgic: Emulate the HW bit in software > KVM: arm64: nv: Sync nested timer state with FEAT_NV2 > > Jintack Lim (7): > KVM: arm64: nv: Trap CPACR_EL1 access in virtual EL2 > KVM: arm64: nv: Respect virtual HCR_EL2.TWX setting > KVM: arm64: nv: Respect virtual CPTR_EL2.{TFP,FPEN} settings > KVM: arm64: nv: Respect virtual HCR_EL2.{NV,TSC) settings > KVM: arm64: nv: Configure HCR_EL2 for nested virtualization > KVM: arm64: nv: Trap and emulate TLBI instructions from virtual EL2 > KVM: arm64: nv: Nested GICv3 Support > > Marc Zyngier (46): > KVM: arm64: Move VTCR_EL2 into struct s2_mmu > arm64: Add missing Set/Way CMO encodings > arm64: Add missing VA CMO encodings > arm64: Add missing ERXMISCx_EL1 encodings > arm64: Add missing DC ZVA/GVA/GZVA encodings > arm64: Add TLBI operation encodings > arm64: Add AT operation encodings > KVM: arm64: Add missing HCR_EL2 trap bits > KVM: arm64: nv: Add trap forwarding infrastructure > KVM: arm64: nv: Add trap forwarding for HCR_EL2 > KVM: arm64: nv: Expose FEAT_EVT to nested guests > KVM: arm64: nv: Add trap forwarding for MDCR_EL2 > KVM: arm64: nv: Add trap forwarding for CNTHCTL_EL2 > KVM: arm64: nv: Add non-VHE-EL2->EL1 translation helpers > KVM: arm64: nv: Handle virtual EL2 registers in > vcpu_read/write_sys_reg() > KVM: arm64: nv: Handle SPSR_EL2 specially > KVM: arm64: nv: Handle HCR_EL2.E2H specially > KVM: arm64: nv: Save/Restore vEL2 sysregs > KVM: arm64: nv: Support multiple nested Stage-2 mmu structures > KVM: arm64: nv: Handle shadow stage 2 page faults > KVM: arm64: nv: Restrict S2 RD/WR permissions to match the guest's > KVM: arm64: nv: Set a handler for the system instruction traps > KVM: arm64: nv: Trap and emulate AT instructions from virtual EL2 > KVM: arm64: nv: Fold guest's HCR_EL2 configuration into the host's > KVM: arm64: nv: Hide RAS from nested guests > KVM: arm64: nv: Add handling of EL2-specific timer registers > KVM: arm64: nv: Load timer before the GIC > KVM: arm64: nv: Don't load the GICv4 context on entering a nested > guest > KVM: arm64: nv: Implement maintenance interrupt forwarding > KVM: arm64: nv: Deal with broken VGIC on maintenance interrupt > delivery > KVM: arm64: nv: Allow userspace to request KVM_ARM_VCPU_NESTED_VIRT > KVM: arm64: nv: Add handling of FEAT_TTL TLB invalidation > KVM: arm64: nv: Invalidate TLBs based on shadow S2 TTL-like > information > KVM: arm64: nv: Tag shadow S2 entries with nested level > KVM: arm64: nv: Add include containing the VNCR_EL2 offsets > KVM: arm64: nv: Map VNCR-capable registers to a separate page > KVM: arm64: nv: Move nested vgic state into the sysreg file > KVM: arm64: Add FEAT_NV2 cpu feature > KVM: arm64: nv: Fold GICv3 host trapping requirements into guest setup > KVM: arm64: nv: Publish emulated timer interrupt state in the > in-memory state > KVM: arm64: nv: Allocate VNCR page when required > KVM: arm64: nv: Enable ARMv8.4-NV support > KVM: arm64: nv: Fast-track 'InHost' exception returns > KVM: arm64: nv: Fast-track EL1 TLBIs for VHE guests > KVM: arm64: nv: Use FEAT_ECV to trap access to EL0 timers > KVM: arm64: nv: Accelerate EL0 timer read accesses when FEAT_ECV is on > > .../virt/kvm/devices/arm-vgic-v3.rst | 12 +- > arch/arm64/include/asm/esr.h | 1 + > arch/arm64/include/asm/kvm_arm.h | 14 + > arch/arm64/include/asm/kvm_asm.h | 4 + > arch/arm64/include/asm/kvm_emulate.h | 93 +- > arch/arm64/include/asm/kvm_host.h | 181 +++- > arch/arm64/include/asm/kvm_hyp.h | 2 + > arch/arm64/include/asm/kvm_mmu.h | 20 +- > arch/arm64/include/asm/kvm_nested.h | 133 +++ > arch/arm64/include/asm/stage2_pgtable.h | 4 +- > arch/arm64/include/asm/sysreg.h | 196 ++++ > arch/arm64/include/asm/vncr_mapping.h | 74 ++ > arch/arm64/include/uapi/asm/kvm.h | 1 + > arch/arm64/kernel/cpufeature.c | 11 + > arch/arm64/kvm/Makefile | 4 +- > arch/arm64/kvm/arch_timer.c | 98 +- > arch/arm64/kvm/arm.c | 33 +- > arch/arm64/kvm/at.c | 219 ++++ > arch/arm64/kvm/emulate-nested.c | 934 ++++++++++++++++- > arch/arm64/kvm/handle_exit.c | 29 +- > arch/arm64/kvm/hyp/include/hyp/switch.h | 8 +- > arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 5 +- > arch/arm64/kvm/hyp/nvhe/mem_protect.c | 8 +- > arch/arm64/kvm/hyp/nvhe/pkvm.c | 4 +- > arch/arm64/kvm/hyp/nvhe/switch.c | 2 +- > arch/arm64/kvm/hyp/nvhe/sysreg-sr.c | 2 +- > arch/arm64/kvm/hyp/pgtable.c | 2 +- > arch/arm64/kvm/hyp/vgic-v3-sr.c | 6 +- > arch/arm64/kvm/hyp/vhe/switch.c | 206 +++- > arch/arm64/kvm/hyp/vhe/sysreg-sr.c | 124 ++- > arch/arm64/kvm/hyp/vhe/tlb.c | 83 ++ > arch/arm64/kvm/mmu.c | 255 ++++- > arch/arm64/kvm/nested.c | 799 ++++++++++++++- > arch/arm64/kvm/pkvm.c | 2 +- > arch/arm64/kvm/reset.c | 7 + > arch/arm64/kvm/sys_regs.c | 958 +++++++++++++++++- > arch/arm64/kvm/trace_arm.h | 19 + > arch/arm64/kvm/vgic/vgic-init.c | 33 + > arch/arm64/kvm/vgic/vgic-kvm-device.c | 32 +- > arch/arm64/kvm/vgic/vgic-v3-nested.c | 248 +++++ > arch/arm64/kvm/vgic/vgic-v3.c | 43 +- > arch/arm64/kvm/vgic/vgic.c | 29 + > arch/arm64/kvm/vgic/vgic.h | 10 + > arch/arm64/tools/cpucaps | 1 + > include/clocksource/arm_arch_timer.h | 4 + > include/kvm/arm_arch_timer.h | 1 + > include/kvm/arm_vgic.h | 17 + > include/uapi/linux/kvm.h | 1 + > tools/arch/arm/include/uapi/asm/kvm.h | 1 + > 49 files changed, 4790 insertions(+), 183 deletions(-) > create mode 100644 arch/arm64/include/asm/vncr_mapping.h > create mode 100644 arch/arm64/kvm/at.c > create mode 100644 arch/arm64/kvm/vgic/vgic-v3-nested.c >
Hey Eric, On Mon, 05 Jun 2023 12:28:12 +0100, Eric Auger <eauger@redhat.com> wrote: > > Hi Marc, > > On 5/15/23 19:30, Marc Zyngier wrote: > > This is the 4th drop of NV support on arm64 for this year. > > > > For the previous episodes, see [1]. > > > > What's changed: > > > > - New framework to track system register traps that are reinjected in > > guest EL2. It is expected to replace the discrete handling we have > > enjoyed so far, which didn't scale at all. This has already fixed a > > number of bugs that were hidden (a bunch of traps were never > > forwarded...). Still a work in progress, but this is going in the > > right direction. > > > > - Allow the L1 hypervisor to have a S2 that has an input larger than > > the L0 IPA space. This fixes a number of subtle issues, depending on > > how the initial guest was created. > > > > - Consequently, the patch series has gone longer again. Boo. But > > hopefully some of it is easier to review... > > > > [1] https://lore.kernel.org/r/20230405154008.3552854-1-maz@kernel.org > > > > Andre Przywara (1): > > KVM: arm64: nv: vgic: Allow userland to set VGIC maintenance IRQ > > I guess you have executed kselftests on L1 guests. Have all the tests > passed there? On my end it stalls in the KVM_RUN. No, I hardly run any kselftest, because they are just not designed to run at EL2 at all. There's some work to be done there, but I just don't have the bandwidth for that (hint, wink...) > > for instance > tools/testing/selftests/kvm/aarch64/aarch32_id_regs.c fails in > test_guest_raz(vcpu) on the KVM_RUN. Even with a basic > > static void guest_main(void) > { > GUEST_DONE(); > } My guess is that the test harness expects things to run at EL1. Depending on the value you get for HCR_EL2, you could get all sort of odd behaviours. Also, the harness configures EL1 only, which is unlikely to work at EL2. My conclusion is that "processor.c" needs to be taught about EL2, at the very least. > > I get > aarch32_id_regs-768 [002] ..... 410.544665: kvm_exit: IRQ: > HSR_EC: 0x0000 (UNKNOWN), PC: 0x0000000000401ec4 > aarch32_id_regs-768 [002] d.... 410.544666: kvm_entry: PC: > 0x0000000000401ec4 > aarch32_id_regs-768 [002] ..... 410.544675: kvm_exit: IRQ: > HSR_EC: 0x0000 (UNKNOWN), PC: 0x0000000000401ec4 > aarch32_id_regs-768 [002] d.... 410.544676: kvm_entry: PC: > 0x0000000000401ec4 > aarch32_id_regs-768 [002] ..... 410.544685: kvm_exit: IRQ: > HSR_EC: 0x0000 (UNKNOWN), PC: 0x0000000000401ec4 > > looping forever instead of > > aarch32_id_regs-1085576 [079] d..1. 1401295.068739: kvm_entry: PC: > 0x0000000000401ec4 > aarch32_id_regs-1085576 [079] ...1. 1401295.068745: kvm_exit: TRAP: > HSR_EC: 0x0020 (IABT_LOW), PC: 0x0000000000401ec4 > aarch32_id_regs-1085576 [079] d..1. 1401295.068790: kvm_entry: PC: > 0x0000000000401ec4 > aarch32_id_regs-1085576 [079] ...1. 1401295.068792: kvm_exit: TRAP: > HSR_EC: 0x0020 (IABT_LOW), PC: 0x0000000000401ec4 > aarch32_id_regs-1085576 [079] d..1. 1401295.068794: kvm_entry: PC: > 0x0000000000401ec4 > aarch32_id_regs-1085576 [079] ...1. 1401295.068795: kvm_exit: TRAP: > HSR_EC: 0x0020 (IABT_LOW), PC: 0x0000000000401ec4 > aarch32_id_regs-1085576 [079] d..1. 1401295.068797: kvm_entry: PC: > 0x0000000000401ec4 > ../.. > > Any idea or any known restriction wrt kselftests? See above. I'd love someone to actually start looking into it and devise a testing harness that would run both at EL{0,1,2} *at the same time* so that we can start exercising some of the trap behaviours that the architecture mandates. Also, Alexandru had a some KUT tests a few years ago, but I don't know what happened of them. Thanks, M.
Hi Marc, On 6/6/23 09:30, Marc Zyngier wrote: > Hey Eric, > > On Mon, 05 Jun 2023 12:28:12 +0100, > Eric Auger <eauger@redhat.com> wrote: >> >> Hi Marc, >> >> On 5/15/23 19:30, Marc Zyngier wrote: >>> This is the 4th drop of NV support on arm64 for this year. >>> >>> For the previous episodes, see [1]. >>> >>> What's changed: >>> >>> - New framework to track system register traps that are reinjected in >>> guest EL2. It is expected to replace the discrete handling we have >>> enjoyed so far, which didn't scale at all. This has already fixed a >>> number of bugs that were hidden (a bunch of traps were never >>> forwarded...). Still a work in progress, but this is going in the >>> right direction. >>> >>> - Allow the L1 hypervisor to have a S2 that has an input larger than >>> the L0 IPA space. This fixes a number of subtle issues, depending on >>> how the initial guest was created. >>> >>> - Consequently, the patch series has gone longer again. Boo. But >>> hopefully some of it is easier to review... >>> >>> [1] https://lore.kernel.org/r/20230405154008.3552854-1-maz@kernel.org >>> >>> Andre Przywara (1): >>> KVM: arm64: nv: vgic: Allow userland to set VGIC maintenance IRQ >> >> I guess you have executed kselftests on L1 guests. Have all the tests >> passed there? On my end it stalls in the KVM_RUN. > > No, I hardly run any kselftest, because they are just not designed to > run at EL2 at all. There's some work to be done there, but I just > don't have the bandwidth for that (hint, wink...) oh OK, I missed that point. If nobody is working on this I can start looking at it. Would be interesting to run them on nested guest too. > >> >> for instance >> tools/testing/selftests/kvm/aarch64/aarch32_id_regs.c fails in >> test_guest_raz(vcpu) on the KVM_RUN. Even with a basic >> >> static void guest_main(void) >> { >> GUEST_DONE(); >> } > > My guess is that the test harness expects things to run at EL1. > Depending on the value you get for HCR_EL2, you could get all sort of > odd behaviours. Also, the harness configures EL1 only, which is > unlikely to work at EL2. My conclusion is that "processor.c" needs to > be taught about EL2, at the very least. > >> >> I get >> aarch32_id_regs-768 [002] ..... 410.544665: kvm_exit: IRQ: >> HSR_EC: 0x0000 (UNKNOWN), PC: 0x0000000000401ec4 >> aarch32_id_regs-768 [002] d.... 410.544666: kvm_entry: PC: >> 0x0000000000401ec4 >> aarch32_id_regs-768 [002] ..... 410.544675: kvm_exit: IRQ: >> HSR_EC: 0x0000 (UNKNOWN), PC: 0x0000000000401ec4 >> aarch32_id_regs-768 [002] d.... 410.544676: kvm_entry: PC: >> 0x0000000000401ec4 >> aarch32_id_regs-768 [002] ..... 410.544685: kvm_exit: IRQ: >> HSR_EC: 0x0000 (UNKNOWN), PC: 0x0000000000401ec4 >> >> looping forever instead of >> >> aarch32_id_regs-1085576 [079] d..1. 1401295.068739: kvm_entry: PC: >> 0x0000000000401ec4 >> aarch32_id_regs-1085576 [079] ...1. 1401295.068745: kvm_exit: TRAP: >> HSR_EC: 0x0020 (IABT_LOW), PC: 0x0000000000401ec4 >> aarch32_id_regs-1085576 [079] d..1. 1401295.068790: kvm_entry: PC: >> 0x0000000000401ec4 >> aarch32_id_regs-1085576 [079] ...1. 1401295.068792: kvm_exit: TRAP: >> HSR_EC: 0x0020 (IABT_LOW), PC: 0x0000000000401ec4 >> aarch32_id_regs-1085576 [079] d..1. 1401295.068794: kvm_entry: PC: >> 0x0000000000401ec4 >> aarch32_id_regs-1085576 [079] ...1. 1401295.068795: kvm_exit: TRAP: >> HSR_EC: 0x0020 (IABT_LOW), PC: 0x0000000000401ec4 >> aarch32_id_regs-1085576 [079] d..1. 1401295.068797: kvm_entry: PC: >> 0x0000000000401ec4 >> ../.. >> >> Any idea or any known restriction wrt kselftests? > > See above. I'd love someone to actually start looking into it and > devise a testing harness that would run both at EL{0,1,2} *at the same > time* so that we can start exercising some of the trap behaviours that > the architecture mandates. > > Also, Alexandru had a some KUT tests a few years ago, but I don't know > what happened of them. Yeah I remember that one too. Alexandru, do you have plans to revive it? That would be also interesting to run kuts on nested, giving a chance to have incremental and more unitary testing. Thanks! Eric > > Thanks, > > M. >
Hi Marc, On 5/17/23 16:12, Marc Zyngier wrote: > On Wed, 17 May 2023 09:59:45 +0100, > Eric Auger <eauger@redhat.com> wrote: >> >> Hi Marc, >>Hi Marc, >> On 5/16/23 22:28, Marc Zyngier wrote: >>> On Tue, 16 May 2023 17:53:14 +0100, >>> Eric Auger <eauger@redhat.com> wrote: >>>> >>>> Hi Marc, >>>> >>>> On 5/15/23 19:30, Marc Zyngier wrote: >>>>> This is the 4th drop of NV support on arm64 for this year. >>>>> >>>>> For the previous episodes, see [1]. >>>>> >>>>> What's changed: >>>>> >>>>> - New framework to track system register traps that are reinjected in >>>>> guest EL2. It is expected to replace the discrete handling we have >>>>> enjoyed so far, which didn't scale at all. This has already fixed a >>>>> number of bugs that were hidden (a bunch of traps were never >>>>> forwarded...). Still a work in progress, but this is going in the >>>>> right direction. >>>>> >>>>> - Allow the L1 hypervisor to have a S2 that has an input larger than >>>>> the L0 IPA space. This fixes a number of subtle issues, depending on >>>>> how the initial guest was created. >>>>> >>>>> - Consequently, the patch series has gone longer again. Boo. But >>>>> hopefully some of it is easier to review... >>>>> >>>>> [1] https://lore.kernel.org/r/20230405154008.3552854-1-maz@kernel.org >>>> >>>> I have started testing this and when booting my fedora guest I get >>>> >>>> [ 151.796544] kvm [7617]: Unsupported guest sys_reg access at: >>>> 23f425fd0 [80000209] >>>> [ 151.796544] { Op0( 3), Op1( 3), CRn(14), CRm( 3), Op2( 1), func_write }, >>>> >>>> as soon as the host has kvm-arm.mode=nested >>>> >>>> This seems to be triggered very early by EDK2 >>>> (ArmPkg/Drivers/TimerDxe/TimerDxe.c). >>>> >>>> If I am not wrong this CNTV_CTL_EL0. Do you have any idea? >>> >>> So here's my current analysis: >>> >>> I assume you are running EDK2 as the L1 guest in a nested >>> configuration. I also assume that you are not running on an Apple >>> CPU. If these assumptions are correct, then EDK2 runs at vEL2, and is >>> in nVHE mode. >>> >>> Finally, I'm going to assume that your implementation has FEAT_ECV and >>> FEAT_NV2, because I can't see how it could fail otherwise. >> all the above is correct. >>> >>> In these precise conditions, KVM sets the CNTHCTL_EL2.EL1TVT bit so >>> that we can trap the EL0 virtual timer and faithfully emulate it (it >>> is otherwise written to memory, which isn't very helpful). >> >> indeed >>> >>> As it turns out, we don't handle these traps. I didn't spot it because >>> my test machines are all Apple boxes that don't have a nVHE mode, so >>> nothing on the nVHE path is getting *ANY* coverage. Hint: having >>> access to such a machine would help (shipping address on request!). >>> Otherwise, I'll eventually kill the nVHE support altogether. >>> >>> I have written the following patch, which compiles, but that I cannot >>> test with my current setup. Could you please give it a go? >> >> with the patch below, my guest boots nicely. You did it great on the 1st >> shot!!! So this fixes my issue. I will continue testing the v10. > > Thanks a lot for reporting the issue and testing my hacks. I'll > eventually fold it into the rest of the series. > > By the way, what are you using as your VMM? I'd really like to > reproduce your setup. Sorry I missed your reply. I am using libvirt + qemu (feat Miguel's RFC) and fedora L1 guest. Thanks to your fix, this boots fine. But at the moment it does not reboot and hangs in edk2 I think. Unfortunately this time I have no trace on host :-( While looking at your series I will add some traces. Eric > > Cheers, > > M. >
On Tue, 06 Jun 2023 10:29:36 +0100, Eric Auger <eauger@redhat.com> wrote: > > Hi Marc, > On 6/6/23 09:30, Marc Zyngier wrote: > > Hey Eric, > > > > On Mon, 05 Jun 2023 12:28:12 +0100, > > Eric Auger <eauger@redhat.com> wrote: > >> > >> Hi Marc, > >> > >> On 5/15/23 19:30, Marc Zyngier wrote: > >>> This is the 4th drop of NV support on arm64 for this year. > >>> > >>> For the previous episodes, see [1]. > >>> > >>> What's changed: > >>> > >>> - New framework to track system register traps that are reinjected in > >>> guest EL2. It is expected to replace the discrete handling we have > >>> enjoyed so far, which didn't scale at all. This has already fixed a > >>> number of bugs that were hidden (a bunch of traps were never > >>> forwarded...). Still a work in progress, but this is going in the > >>> right direction. > >>> > >>> - Allow the L1 hypervisor to have a S2 that has an input larger than > >>> the L0 IPA space. This fixes a number of subtle issues, depending on > >>> how the initial guest was created. > >>> > >>> - Consequently, the patch series has gone longer again. Boo. But > >>> hopefully some of it is easier to review... > >>> > >>> [1] https://lore.kernel.org/r/20230405154008.3552854-1-maz@kernel.org > >>> > >>> Andre Przywara (1): > >>> KVM: arm64: nv: vgic: Allow userland to set VGIC maintenance IRQ > >> > >> I guess you have executed kselftests on L1 guests. Have all the tests > >> passed there? On my end it stalls in the KVM_RUN. > > > > No, I hardly run any kselftest, because they are just not designed to > > run at EL2 at all. There's some work to be done there, but I just > > don't have the bandwidth for that (hint, wink...) > > oh OK, I missed that point. If nobody is working on this I can start > looking at it. Would be interesting to run them on nested guest too. If you want to pick this up, it would be extremely helpful. And no, nobody is really looking into it at the moment, so it's all yours! Thanks, M.
On Tue, 06 Jun 2023 10:33:27 +0100, Eric Auger <eauger@redhat.com> wrote: > > > By the way, what are you using as your VMM? I'd really like to > > reproduce your setup. > Sorry I missed your reply. I am using libvirt + qemu (feat Miguel's RFC) > and fedora L1 guest. OK, so that's *very* different for what I'm using, which is good! Do you have a QEMU branch somewhere? > Thanks to your fix, this boots fine. But at the moment it does not > reboot and hangs in edk2 I think. Unfortunately this time I have no > trace on host :-( While looking at your series I will add some traces. I've been able to run EDK2 compiled for kvmtool with only a couple of change (such as using SMCs instead of HVCs, and disabling the broken DT-to-ACPI convertion). However, reboot isn't something I've played with, as kvmtool doesn't even try to reboot the guest (reboot is handled as power-off). I'm pretty sure this is related to tearing down the shadow S2 MMU contexts when QEMU reinit the vcpus, and we may fail to clean things up. I'll try and have a look once I'm back from holiday (and we can have a look at KVM Forum). Thanks, M.
Hello Eric, Marc, > On 6 Jun 2023, at 09:33, Eric Auger <eauger@redhat.com> wrote: > > Hi Marc, > > On 5/17/23 16:12, Marc Zyngier wrote: >> On Wed, 17 May 2023 09:59:45 +0100, >> Eric Auger <eauger@redhat.com> wrote: >>> >>> Hi Marc, >>> Hi Marc, >>> On 5/16/23 22:28, Marc Zyngier wrote: >>>> On Tue, 16 May 2023 17:53:14 +0100, >>>> Eric Auger <eauger@redhat.com> wrote: >>>>> >>>>> Hi Marc, >>>>> >>>>> On 5/15/23 19:30, Marc Zyngier wrote: >>>>>> This is the 4th drop of NV support on arm64 for this year. >>>>>> >>>>>> For the previous episodes, see [1]. >>>>>> >>>>>> What's changed: >>>>>> >>>>>> - New framework to track system register traps that are reinjected in >>>>>> guest EL2. It is expected to replace the discrete handling we have >>>>>> enjoyed so far, which didn't scale at all. This has already fixed a >>>>>> number of bugs that were hidden (a bunch of traps were never >>>>>> forwarded...). Still a work in progress, but this is going in the >>>>>> right direction. >>>>>> >>>>>> - Allow the L1 hypervisor to have a S2 that has an input larger than >>>>>> the L0 IPA space. This fixes a number of subtle issues, depending on >>>>>> how the initial guest was created. >>>>>> >>>>>> - Consequently, the patch series has gone longer again. Boo. But >>>>>> hopefully some of it is easier to review... >>>>>> >>>>>> [1] https://lore.kernel.org/r/20230405154008.3552854-1-maz@kernel.org >>>>> >>>>> I have started testing this and when booting my fedora guest I get >>>>> >>>>> [ 151.796544] kvm [7617]: Unsupported guest sys_reg access at: >>>>> 23f425fd0 [80000209] >>>>> [ 151.796544] { Op0( 3), Op1( 3), CRn(14), CRm( 3), Op2( 1), func_write }, >>>>> >>>>> as soon as the host has kvm-arm.mode=nested >>>>> >>>>> This seems to be triggered very early by EDK2 >>>>> (ArmPkg/Drivers/TimerDxe/TimerDxe.c). >>>>> >>>>> If I am not wrong this CNTV_CTL_EL0. Do you have any idea? >>>> >>>> So here's my current analysis: >>>> >>>> I assume you are running EDK2 as the L1 guest in a nested >>>> configuration. I also assume that you are not running on an Apple >>>> CPU. If these assumptions are correct, then EDK2 runs at vEL2, and is >>>> in nVHE mode. >>>> >>>> Finally, I'm going to assume that your implementation has FEAT_ECV and >>>> FEAT_NV2, because I can't see how it could fail otherwise. >>> all the above is correct. >>>> >>>> In these precise conditions, KVM sets the CNTHCTL_EL2.EL1TVT bit so >>>> that we can trap the EL0 virtual timer and faithfully emulate it (it >>>> is otherwise written to memory, which isn't very helpful). >>> >>> indeed >>>> >>>> As it turns out, we don't handle these traps. I didn't spot it because >>>> my test machines are all Apple boxes that don't have a nVHE mode, so >>>> nothing on the nVHE path is getting *ANY* coverage. Hint: having >>>> access to such a machine would help (shipping address on request!). >>>> Otherwise, I'll eventually kill the nVHE support altogether. >>>> >>>> I have written the following patch, which compiles, but that I cannot >>>> test with my current setup. Could you please give it a go? >>> >>> with the patch below, my guest boots nicely. You did it great on the 1st >>> shot!!! So this fixes my issue. I will continue testing the v10. >> >> Thanks a lot for reporting the issue and testing my hacks. I'll >> eventually fold it into the rest of the series. >> >> By the way, what are you using as your VMM? I'd really like to >> reproduce your setup. > Sorry I missed your reply. I am using libvirt + qemu (feat Miguel's RFC) > and fedora L1 guest. > Following this subject, I’ve forward ported Alexandru’s KUT patches ( and I encourage others to do it also =) ) which expose an EL2 test that does three checks: - whether VHE is supported and enabled - disable VHE - re-enable VHE I’m running qemu with virtualization=on as well to run this test and it is passing although problems seem to happen when running with virtualization=off, which I’m still looking into it. Thanks Miguel > Thanks to your fix, this boots fine. But at the moment it does not > reboot and hangs in edk2 I think. Unfortunately this time I have no > trace on host :-( While looking at your series I will add some traces. > > Eric >> >> Cheers, >> >> M.
Hi Marc, On 6/6/23 18:22, Marc Zyngier wrote: > On Tue, 06 Jun 2023 10:29:36 +0100, > Eric Auger <eauger@redhat.com> wrote: >> >> Hi Marc, >> On 6/6/23 09:30, Marc Zyngier wrote: >>> Hey Eric, >>> >>> On Mon, 05 Jun 2023 12:28:12 +0100, >>> Eric Auger <eauger@redhat.com> wrote: >>>> >>>> Hi Marc, >>>> >>>> On 5/15/23 19:30, Marc Zyngier wrote: >>>>> This is the 4th drop of NV support on arm64 for this year. >>>>> >>>>> For the previous episodes, see [1]. >>>>> >>>>> What's changed: >>>>> >>>>> - New framework to track system register traps that are reinjected in >>>>> guest EL2. It is expected to replace the discrete handling we have >>>>> enjoyed so far, which didn't scale at all. This has already fixed a >>>>> number of bugs that were hidden (a bunch of traps were never >>>>> forwarded...). Still a work in progress, but this is going in the >>>>> right direction. >>>>> >>>>> - Allow the L1 hypervisor to have a S2 that has an input larger than >>>>> the L0 IPA space. This fixes a number of subtle issues, depending on >>>>> how the initial guest was created. >>>>> >>>>> - Consequently, the patch series has gone longer again. Boo. But >>>>> hopefully some of it is easier to review... >>>>> >>>>> [1] https://lore.kernel.org/r/20230405154008.3552854-1-maz@kernel.org >>>>> >>>>> Andre Przywara (1): >>>>> KVM: arm64: nv: vgic: Allow userland to set VGIC maintenance IRQ >>>> >>>> I guess you have executed kselftests on L1 guests. Have all the tests >>>> passed there? On my end it stalls in the KVM_RUN. >>> >>> No, I hardly run any kselftest, because they are just not designed to >>> run at EL2 at all. There's some work to be done there, but I just >>> don't have the bandwidth for that (hint, wink...) >> >> oh OK, I missed that point. If nobody is working on this I can start >> looking at it. Would be interesting to run them on nested guest too. > > If you want to pick this up, it would be extremely helpful. And no, > nobody is really looking into it at the moment, so it's all yours! OK I will study that then :-) Eric > > Thanks, > > M. >
Hi Marc, On 6/6/23 18:30, Marc Zyngier wrote: > On Tue, 06 Jun 2023 10:33:27 +0100, > Eric Auger <eauger@redhat.com> wrote: >> >>> By the way, what are you using as your VMM? I'd really like to >>> reproduce your setup. >> Sorry I missed your reply. I am using libvirt + qemu (feat Miguel's RFC) >> and fedora L1 guest. > > OK, so that's *very* different for what I'm using, which is good! Do > you have a QEMU branch somewhere? here it is: https://github.com/eauger/qemu/tree/nv_rfc_rebase > >> Thanks to your fix, this boots fine. But at the moment it does not >> reboot and hangs in edk2 I think. Unfortunately this time I have no >> trace on host :-( While looking at your series I will add some traces. > > I've been able to run EDK2 compiled for kvmtool with only a couple of > change (such as using SMCs instead of HVCs, and disabling the broken > DT-to-ACPI convertion). However, reboot isn't something I've played > with, as kvmtool doesn't even try to reboot the guest (reboot is > handled as power-off). > > I'm pretty sure this is related to tearing down the shadow S2 MMU > contexts when QEMU reinit the vcpus, and we may fail to clean things > up. > > I'll try and have a look once I'm back from holiday (and we can have a > look at KVM Forum). I won't be there this year. But enjoy the event and enjoy your vacation! Thanks Eric > > Thanks, > > M. >
Hi Miguel, On 6/6/23 19:52, Miguel Luis wrote: > Hello Eric, Marc, > >> On 6 Jun 2023, at 09:33, Eric Auger <eauger@redhat.com> wrote: >> >> Hi Marc, >> >> On 5/17/23 16:12, Marc Zyngier wrote: >>> On Wed, 17 May 2023 09:59:45 +0100, >>> Eric Auger <eauger@redhat.com> wrote: >>>> >>>> Hi Marc, >>>> Hi Marc, >>>> On 5/16/23 22:28, Marc Zyngier wrote: >>>>> On Tue, 16 May 2023 17:53:14 +0100, >>>>> Eric Auger <eauger@redhat.com> wrote: >>>>>> >>>>>> Hi Marc, >>>>>> >>>>>> On 5/15/23 19:30, Marc Zyngier wrote: >>>>>>> This is the 4th drop of NV support on arm64 for this year. >>>>>>> >>>>>>> For the previous episodes, see [1]. >>>>>>> >>>>>>> What's changed: >>>>>>> >>>>>>> - New framework to track system register traps that are reinjected in >>>>>>> guest EL2. It is expected to replace the discrete handling we have >>>>>>> enjoyed so far, which didn't scale at all. This has already fixed a >>>>>>> number of bugs that were hidden (a bunch of traps were never >>>>>>> forwarded...). Still a work in progress, but this is going in the >>>>>>> right direction. >>>>>>> >>>>>>> - Allow the L1 hypervisor to have a S2 that has an input larger than >>>>>>> the L0 IPA space. This fixes a number of subtle issues, depending on >>>>>>> how the initial guest was created. >>>>>>> >>>>>>> - Consequently, the patch series has gone longer again. Boo. But >>>>>>> hopefully some of it is easier to review... >>>>>>> >>>>>>> [1] https://lore.kernel.org/r/20230405154008.3552854-1-maz@kernel.org >>>>>> >>>>>> I have started testing this and when booting my fedora guest I get >>>>>> >>>>>> [ 151.796544] kvm [7617]: Unsupported guest sys_reg access at: >>>>>> 23f425fd0 [80000209] >>>>>> [ 151.796544] { Op0( 3), Op1( 3), CRn(14), CRm( 3), Op2( 1), func_write }, >>>>>> >>>>>> as soon as the host has kvm-arm.mode=nested >>>>>> >>>>>> This seems to be triggered very early by EDK2 >>>>>> (ArmPkg/Drivers/TimerDxe/TimerDxe.c). >>>>>> >>>>>> If I am not wrong this CNTV_CTL_EL0. Do you have any idea? >>>>> >>>>> So here's my current analysis: >>>>> >>>>> I assume you are running EDK2 as the L1 guest in a nested >>>>> configuration. I also assume that you are not running on an Apple >>>>> CPU. If these assumptions are correct, then EDK2 runs at vEL2, and is >>>>> in nVHE mode. >>>>> >>>>> Finally, I'm going to assume that your implementation has FEAT_ECV and >>>>> FEAT_NV2, because I can't see how it could fail otherwise. >>>> all the above is correct. >>>>> >>>>> In these precise conditions, KVM sets the CNTHCTL_EL2.EL1TVT bit so >>>>> that we can trap the EL0 virtual timer and faithfully emulate it (it >>>>> is otherwise written to memory, which isn't very helpful). >>>> >>>> indeed >>>>> >>>>> As it turns out, we don't handle these traps. I didn't spot it because >>>>> my test machines are all Apple boxes that don't have a nVHE mode, so >>>>> nothing on the nVHE path is getting *ANY* coverage. Hint: having >>>>> access to such a machine would help (shipping address on request!). >>>>> Otherwise, I'll eventually kill the nVHE support altogether. >>>>> >>>>> I have written the following patch, which compiles, but that I cannot >>>>> test with my current setup. Could you please give it a go? >>>> >>>> with the patch below, my guest boots nicely. You did it great on the 1st >>>> shot!!! So this fixes my issue. I will continue testing the v10. >>> >>> Thanks a lot for reporting the issue and testing my hacks. I'll >>> eventually fold it into the rest of the series. >>> >>> By the way, what are you using as your VMM? I'd really like to >>> reproduce your setup. >> Sorry I missed your reply. I am using libvirt + qemu (feat Miguel's RFC) >> and fedora L1 guest. >> > > Following this subject, I’ve forward ported Alexandru’s KUT patches > ( and I encourage others to do it also =) ) which expose an EL2 test that Do you have a branch available with Alexandru's rebased kut series? Thanks Eric > does three checks: > > - whether VHE is supported and enabled > - disable VHE > - re-enable VHE > > I’m running qemu with virtualization=on as well to run this test and it is passing although > problems seem to happen when running with virtualization=off, which I’m still looking into it. > > Thanks > Miguel > >> Thanks to your fix, this boots fine. But at the moment it does not >> reboot and hangs in edk2 I think. Unfortunately this time I have no >> trace on host :-( While looking at your series I will add some traces. >> >> Eric >>> >>> Cheers, >>> >>> M. > >
Hi Eric, > On 7 Jun 2023, at 16:40, Eric Auger <eauger@redhat.com> wrote: > > Hi Miguel, > > On 6/6/23 19:52, Miguel Luis wrote: >> Hello Eric, Marc, >> >>> On 6 Jun 2023, at 09:33, Eric Auger <eauger@redhat.com> wrote: >>> >>> Hi Marc, >>> >>> On 5/17/23 16:12, Marc Zyngier wrote: >>>> On Wed, 17 May 2023 09:59:45 +0100, >>>> Eric Auger <eauger@redhat.com> wrote: >>>>> >>>>> Hi Marc, >>>>> Hi Marc, >>>>> On 5/16/23 22:28, Marc Zyngier wrote: >>>>>> On Tue, 16 May 2023 17:53:14 +0100, >>>>>> Eric Auger <eauger@redhat.com> wrote: >>>>>>> >>>>>>> Hi Marc, >>>>>>> >>>>>>> On 5/15/23 19:30, Marc Zyngier wrote: >>>>>>>> This is the 4th drop of NV support on arm64 for this year. >>>>>>>> >>>>>>>> For the previous episodes, see [1]. >>>>>>>> >>>>>>>> What's changed: >>>>>>>> >>>>>>>> - New framework to track system register traps that are reinjected in >>>>>>>> guest EL2. It is expected to replace the discrete handling we have >>>>>>>> enjoyed so far, which didn't scale at all. This has already fixed a >>>>>>>> number of bugs that were hidden (a bunch of traps were never >>>>>>>> forwarded...). Still a work in progress, but this is going in the >>>>>>>> right direction. >>>>>>>> >>>>>>>> - Allow the L1 hypervisor to have a S2 that has an input larger than >>>>>>>> the L0 IPA space. This fixes a number of subtle issues, depending on >>>>>>>> how the initial guest was created. >>>>>>>> >>>>>>>> - Consequently, the patch series has gone longer again. Boo. But >>>>>>>> hopefully some of it is easier to review... >>>>>>>> >>>>>>>> [1] https://lore.kernel.org/r/20230405154008.3552854-1-maz@kernel.org >>>>>>> >>>>>>> I have started testing this and when booting my fedora guest I get >>>>>>> >>>>>>> [ 151.796544] kvm [7617]: Unsupported guest sys_reg access at: >>>>>>> 23f425fd0 [80000209] >>>>>>> [ 151.796544] { Op0( 3), Op1( 3), CRn(14), CRm( 3), Op2( 1), func_write }, >>>>>>> >>>>>>> as soon as the host has kvm-arm.mode=nested >>>>>>> >>>>>>> This seems to be triggered very early by EDK2 >>>>>>> (ArmPkg/Drivers/TimerDxe/TimerDxe.c). >>>>>>> >>>>>>> If I am not wrong this CNTV_CTL_EL0. Do you have any idea? >>>>>> >>>>>> So here's my current analysis: >>>>>> >>>>>> I assume you are running EDK2 as the L1 guest in a nested >>>>>> configuration. I also assume that you are not running on an Apple >>>>>> CPU. If these assumptions are correct, then EDK2 runs at vEL2, and is >>>>>> in nVHE mode. >>>>>> >>>>>> Finally, I'm going to assume that your implementation has FEAT_ECV and >>>>>> FEAT_NV2, because I can't see how it could fail otherwise. >>>>> all the above is correct. >>>>>> >>>>>> In these precise conditions, KVM sets the CNTHCTL_EL2.EL1TVT bit so >>>>>> that we can trap the EL0 virtual timer and faithfully emulate it (it >>>>>> is otherwise written to memory, which isn't very helpful). >>>>> >>>>> indeed >>>>>> >>>>>> As it turns out, we don't handle these traps. I didn't spot it because >>>>>> my test machines are all Apple boxes that don't have a nVHE mode, so >>>>>> nothing on the nVHE path is getting *ANY* coverage. Hint: having >>>>>> access to such a machine would help (shipping address on request!). >>>>>> Otherwise, I'll eventually kill the nVHE support altogether. >>>>>> >>>>>> I have written the following patch, which compiles, but that I cannot >>>>>> test with my current setup. Could you please give it a go? >>>>> >>>>> with the patch below, my guest boots nicely. You did it great on the 1st >>>>> shot!!! So this fixes my issue. I will continue testing the v10. >>>> >>>> Thanks a lot for reporting the issue and testing my hacks. I'll >>>> eventually fold it into the rest of the series. >>>> >>>> By the way, what are you using as your VMM? I'd really like to >>>> reproduce your setup. >>> Sorry I missed your reply. I am using libvirt + qemu (feat Miguel's RFC) >>> and fedora L1 guest. >>> >> >> Following this subject, I’ve forward ported Alexandru’s KUT patches >> ( and I encourage others to do it also =) ) which expose an EL2 test that > > Do you have a branch available with Alexandru's rebased kut series? Now I do :) https://github.com/mluis/kvm-unit-tests/tree/nv-WIP Thanks, Miguel > > Thanks > > Eric >> does three checks: >> >> - whether VHE is supported and enabled >> - disable VHE >> - re-enable VHE >> >> I’m running qemu with virtualization=on as well to run this test and it is passing although >> problems seem to happen when running with virtualization=off, which I’m still looking into it. >> >> Thanks >> Miguel >> >>> Thanks to your fix, this boots fine. But at the moment it does not >>> reboot and hangs in edk2 I think. Unfortunately this time I have no >>> trace on host :-( While looking at your series I will add some traces. >>> >>> Eric >>>> >>>> Cheers, >>>> >>>> M.
Hi Miguel, On 6/10/23 10:25, Miguel Luis wrote: > Hi Eric, > >> On 7 Jun 2023, at 16:40, Eric Auger <eauger@redhat.com> wrote: >> >> Hi Miguel, >> >> On 6/6/23 19:52, Miguel Luis wrote: >>> Hello Eric, Marc, >>> >>>> On 6 Jun 2023, at 09:33, Eric Auger <eauger@redhat.com> wrote: >>>> >>>> Hi Marc, >>>> >>>> On 5/17/23 16:12, Marc Zyngier wrote: >>>>> On Wed, 17 May 2023 09:59:45 +0100, >>>>> Eric Auger <eauger@redhat.com> wrote: >>>>>> >>>>>> Hi Marc, >>>>>> Hi Marc, >>>>>> On 5/16/23 22:28, Marc Zyngier wrote: >>>>>>> On Tue, 16 May 2023 17:53:14 +0100, >>>>>>> Eric Auger <eauger@redhat.com> wrote: >>>>>>>> >>>>>>>> Hi Marc, >>>>>>>> >>>>>>>> On 5/15/23 19:30, Marc Zyngier wrote: >>>>>>>>> This is the 4th drop of NV support on arm64 for this year. >>>>>>>>> >>>>>>>>> For the previous episodes, see [1]. >>>>>>>>> >>>>>>>>> What's changed: >>>>>>>>> >>>>>>>>> - New framework to track system register traps that are reinjected in >>>>>>>>> guest EL2. It is expected to replace the discrete handling we have >>>>>>>>> enjoyed so far, which didn't scale at all. This has already fixed a >>>>>>>>> number of bugs that were hidden (a bunch of traps were never >>>>>>>>> forwarded...). Still a work in progress, but this is going in the >>>>>>>>> right direction. >>>>>>>>> >>>>>>>>> - Allow the L1 hypervisor to have a S2 that has an input larger than >>>>>>>>> the L0 IPA space. This fixes a number of subtle issues, depending on >>>>>>>>> how the initial guest was created. >>>>>>>>> >>>>>>>>> - Consequently, the patch series has gone longer again. Boo. But >>>>>>>>> hopefully some of it is easier to review... >>>>>>>>> >>>>>>>>> [1] https://lore.kernel.org/r/20230405154008.3552854-1-maz@kernel.org >>>>>>>> >>>>>>>> I have started testing this and when booting my fedora guest I get >>>>>>>> >>>>>>>> [ 151.796544] kvm [7617]: Unsupported guest sys_reg access at: >>>>>>>> 23f425fd0 [80000209] >>>>>>>> [ 151.796544] { Op0( 3), Op1( 3), CRn(14), CRm( 3), Op2( 1), func_write }, >>>>>>>> >>>>>>>> as soon as the host has kvm-arm.mode=nested >>>>>>>> >>>>>>>> This seems to be triggered very early by EDK2 >>>>>>>> (ArmPkg/Drivers/TimerDxe/TimerDxe.c). >>>>>>>> >>>>>>>> If I am not wrong this CNTV_CTL_EL0. Do you have any idea? >>>>>>> >>>>>>> So here's my current analysis: >>>>>>> >>>>>>> I assume you are running EDK2 as the L1 guest in a nested >>>>>>> configuration. I also assume that you are not running on an Apple >>>>>>> CPU. If these assumptions are correct, then EDK2 runs at vEL2, and is >>>>>>> in nVHE mode. >>>>>>> >>>>>>> Finally, I'm going to assume that your implementation has FEAT_ECV and >>>>>>> FEAT_NV2, because I can't see how it could fail otherwise. >>>>>> all the above is correct. >>>>>>> >>>>>>> In these precise conditions, KVM sets the CNTHCTL_EL2.EL1TVT bit so >>>>>>> that we can trap the EL0 virtual timer and faithfully emulate it (it >>>>>>> is otherwise written to memory, which isn't very helpful). >>>>>> >>>>>> indeed >>>>>>> >>>>>>> As it turns out, we don't handle these traps. I didn't spot it because >>>>>>> my test machines are all Apple boxes that don't have a nVHE mode, so >>>>>>> nothing on the nVHE path is getting *ANY* coverage. Hint: having >>>>>>> access to such a machine would help (shipping address on request!). >>>>>>> Otherwise, I'll eventually kill the nVHE support altogether. >>>>>>> >>>>>>> I have written the following patch, which compiles, but that I cannot >>>>>>> test with my current setup. Could you please give it a go? >>>>>> >>>>>> with the patch below, my guest boots nicely. You did it great on the 1st >>>>>> shot!!! So this fixes my issue. I will continue testing the v10. >>>>> >>>>> Thanks a lot for reporting the issue and testing my hacks. I'll >>>>> eventually fold it into the rest of the series. >>>>> >>>>> By the way, what are you using as your VMM? I'd really like to >>>>> reproduce your setup. >>>> Sorry I missed your reply. I am using libvirt + qemu (feat Miguel's RFC) >>>> and fedora L1 guest. >>>> >>> >>> Following this subject, I’ve forward ported Alexandru’s KUT patches >>> ( and I encourage others to do it also =) ) which expose an EL2 test that >> >> Do you have a branch available with Alexandru's rebased kut series? > > Now I do :) > > https://github.com/mluis/kvm-unit-tests/tree/nv-WIP Cool, thanks! Eric > > Thanks, > Miguel > >> >> Thanks >> >> Eric >>> does three checks: >>> >>> - whether VHE is supported and enabled >>> - disable VHE >>> - re-enable VHE >>> >>> I’m running qemu with virtualization=on as well to run this test and it is passing although >>> problems seem to happen when running with virtualization=off, which I’m still looking into it. >>> >>> Thanks >>> Miguel >>> >>>> Thanks to your fix, this boots fine. But at the moment it does not >>>> reboot and hangs in edk2 I think. Unfortunately this time I have no >>>> trace on host :-( While looking at your series I will add some traces. >>>> >>>> Eric >>>>> >>>>> Cheers, >>>>> >>>>> M. > >
Hi Marc, On 15-05-2023 11:00 pm, Marc Zyngier wrote: > This is the 4th drop of NV support on arm64 for this year. > > For the previous episodes, see [1]. > > What's changed: > > - New framework to track system register traps that are reinjected in > guest EL2. It is expected to replace the discrete handling we have > enjoyed so far, which didn't scale at all. This has already fixed a > number of bugs that were hidden (a bunch of traps were never > forwarded...). Still a work in progress, but this is going in the > right direction. > > - Allow the L1 hypervisor to have a S2 that has an input larger than > the L0 IPA space. This fixes a number of subtle issues, depending on > how the initial guest was created. > > - Consequently, the patch series has gone longer again. Boo. But > hopefully some of it is easier to review... > I am facing issue in booting NestedVM with V9 as well with 10 patchset. I have tried V9/V10 on Ampere platform using kvmtool and I could boot Guest-Hypervisor and then NestedVM without any issue. However when I try to boot using QEMU(not using EDK2/EFI), Guest-Hypervisor is booted with Fedora 37 using virtio disk. From Guest-Hypervisor console(or ssh shell), If I try to boot NestedVM, boot hangs very early stage of the boot. I did some debug using ftrace and it seems the Guest-Hypervisor is getting very high rate of arch-timer interrupts, due to that all CPU time is going on in serving the Guest-Hypervisor and it is never going back to NestedVM. I am using QEMU vanilla version v7.2.0 with top-up patches for NV [1] [1] https://lore.kernel.org/all/20230227163718.62003-1-miguel.luis@oracle.com/ > [1] https://lore.kernel.org/r/20230405154008.3552854-1-maz@kernel.org > > Andre Przywara (1): > KVM: arm64: nv: vgic: Allow userland to set VGIC maintenance IRQ > > Christoffer Dall (5): > KVM: arm64: nv: Trap EL1 VM register accesses in virtual EL2 > KVM: arm64: nv: Implement nested Stage-2 page table walk logic > KVM: arm64: nv: Unmap/flush shadow stage 2 page tables > KVM: arm64: nv: vgic: Emulate the HW bit in software > KVM: arm64: nv: Sync nested timer state with FEAT_NV2 > > Jintack Lim (7): > KVM: arm64: nv: Trap CPACR_EL1 access in virtual EL2 > KVM: arm64: nv: Respect virtual HCR_EL2.TWX setting > KVM: arm64: nv: Respect virtual CPTR_EL2.{TFP,FPEN} settings > KVM: arm64: nv: Respect virtual HCR_EL2.{NV,TSC) settings > KVM: arm64: nv: Configure HCR_EL2 for nested virtualization > KVM: arm64: nv: Trap and emulate TLBI instructions from virtual EL2 > KVM: arm64: nv: Nested GICv3 Support > > Marc Zyngier (46): > KVM: arm64: Move VTCR_EL2 into struct s2_mmu > arm64: Add missing Set/Way CMO encodings > arm64: Add missing VA CMO encodings > arm64: Add missing ERXMISCx_EL1 encodings > arm64: Add missing DC ZVA/GVA/GZVA encodings > arm64: Add TLBI operation encodings > arm64: Add AT operation encodings > KVM: arm64: Add missing HCR_EL2 trap bits > KVM: arm64: nv: Add trap forwarding infrastructure > KVM: arm64: nv: Add trap forwarding for HCR_EL2 > KVM: arm64: nv: Expose FEAT_EVT to nested guests > KVM: arm64: nv: Add trap forwarding for MDCR_EL2 > KVM: arm64: nv: Add trap forwarding for CNTHCTL_EL2 > KVM: arm64: nv: Add non-VHE-EL2->EL1 translation helpers > KVM: arm64: nv: Handle virtual EL2 registers in > vcpu_read/write_sys_reg() > KVM: arm64: nv: Handle SPSR_EL2 specially > KVM: arm64: nv: Handle HCR_EL2.E2H specially > KVM: arm64: nv: Save/Restore vEL2 sysregs > KVM: arm64: nv: Support multiple nested Stage-2 mmu structures > KVM: arm64: nv: Handle shadow stage 2 page faults > KVM: arm64: nv: Restrict S2 RD/WR permissions to match the guest's > KVM: arm64: nv: Set a handler for the system instruction traps > KVM: arm64: nv: Trap and emulate AT instructions from virtual EL2 > KVM: arm64: nv: Fold guest's HCR_EL2 configuration into the host's > KVM: arm64: nv: Hide RAS from nested guests > KVM: arm64: nv: Add handling of EL2-specific timer registers > KVM: arm64: nv: Load timer before the GIC > KVM: arm64: nv: Don't load the GICv4 context on entering a nested > guest > KVM: arm64: nv: Implement maintenance interrupt forwarding > KVM: arm64: nv: Deal with broken VGIC on maintenance interrupt > delivery > KVM: arm64: nv: Allow userspace to request KVM_ARM_VCPU_NESTED_VIRT > KVM: arm64: nv: Add handling of FEAT_TTL TLB invalidation > KVM: arm64: nv: Invalidate TLBs based on shadow S2 TTL-like > information > KVM: arm64: nv: Tag shadow S2 entries with nested level > KVM: arm64: nv: Add include containing the VNCR_EL2 offsets > KVM: arm64: nv: Map VNCR-capable registers to a separate page > KVM: arm64: nv: Move nested vgic state into the sysreg file > KVM: arm64: Add FEAT_NV2 cpu feature > KVM: arm64: nv: Fold GICv3 host trapping requirements into guest setup > KVM: arm64: nv: Publish emulated timer interrupt state in the > in-memory state > KVM: arm64: nv: Allocate VNCR page when required > KVM: arm64: nv: Enable ARMv8.4-NV support > KVM: arm64: nv: Fast-track 'InHost' exception returns > KVM: arm64: nv: Fast-track EL1 TLBIs for VHE guests > KVM: arm64: nv: Use FEAT_ECV to trap access to EL0 timers > KVM: arm64: nv: Accelerate EL0 timer read accesses when FEAT_ECV is on > > .../virt/kvm/devices/arm-vgic-v3.rst | 12 +- > arch/arm64/include/asm/esr.h | 1 + > arch/arm64/include/asm/kvm_arm.h | 14 + > arch/arm64/include/asm/kvm_asm.h | 4 + > arch/arm64/include/asm/kvm_emulate.h | 93 +- > arch/arm64/include/asm/kvm_host.h | 181 +++- > arch/arm64/include/asm/kvm_hyp.h | 2 + > arch/arm64/include/asm/kvm_mmu.h | 20 +- > arch/arm64/include/asm/kvm_nested.h | 133 +++ > arch/arm64/include/asm/stage2_pgtable.h | 4 +- > arch/arm64/include/asm/sysreg.h | 196 ++++ > arch/arm64/include/asm/vncr_mapping.h | 74 ++ > arch/arm64/include/uapi/asm/kvm.h | 1 + > arch/arm64/kernel/cpufeature.c | 11 + > arch/arm64/kvm/Makefile | 4 +- > arch/arm64/kvm/arch_timer.c | 98 +- > arch/arm64/kvm/arm.c | 33 +- > arch/arm64/kvm/at.c | 219 ++++ > arch/arm64/kvm/emulate-nested.c | 934 ++++++++++++++++- > arch/arm64/kvm/handle_exit.c | 29 +- > arch/arm64/kvm/hyp/include/hyp/switch.h | 8 +- > arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 5 +- > arch/arm64/kvm/hyp/nvhe/mem_protect.c | 8 +- > arch/arm64/kvm/hyp/nvhe/pkvm.c | 4 +- > arch/arm64/kvm/hyp/nvhe/switch.c | 2 +- > arch/arm64/kvm/hyp/nvhe/sysreg-sr.c | 2 +- > arch/arm64/kvm/hyp/pgtable.c | 2 +- > arch/arm64/kvm/hyp/vgic-v3-sr.c | 6 +- > arch/arm64/kvm/hyp/vhe/switch.c | 206 +++- > arch/arm64/kvm/hyp/vhe/sysreg-sr.c | 124 ++- > arch/arm64/kvm/hyp/vhe/tlb.c | 83 ++ > arch/arm64/kvm/mmu.c | 255 ++++- > arch/arm64/kvm/nested.c | 799 ++++++++++++++- > arch/arm64/kvm/pkvm.c | 2 +- > arch/arm64/kvm/reset.c | 7 + > arch/arm64/kvm/sys_regs.c | 958 +++++++++++++++++- > arch/arm64/kvm/trace_arm.h | 19 + > arch/arm64/kvm/vgic/vgic-init.c | 33 + > arch/arm64/kvm/vgic/vgic-kvm-device.c | 32 +- > arch/arm64/kvm/vgic/vgic-v3-nested.c | 248 +++++ > arch/arm64/kvm/vgic/vgic-v3.c | 43 +- > arch/arm64/kvm/vgic/vgic.c | 29 + > arch/arm64/kvm/vgic/vgic.h | 10 + > arch/arm64/tools/cpucaps | 1 + > include/clocksource/arm_arch_timer.h | 4 + > include/kvm/arm_arch_timer.h | 1 + > include/kvm/arm_vgic.h | 17 + > include/uapi/linux/kvm.h | 1 + > tools/arch/arm/include/uapi/asm/kvm.h | 1 + > 49 files changed, 4790 insertions(+), 183 deletions(-) > create mode 100644 arch/arm64/include/asm/vncr_mapping.h > create mode 100644 arch/arm64/kvm/at.c > create mode 100644 arch/arm64/kvm/vgic/vgic-v3-nested.c > Thanks, Ganapat
Hi Ganapatrao, On Wed, 28 Jun 2023 07:45:55 +0100, Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote: > > > Hi Marc, > > > On 15-05-2023 11:00 pm, Marc Zyngier wrote: > > This is the 4th drop of NV support on arm64 for this year. > > > > For the previous episodes, see [1]. > > > > What's changed: > > > > - New framework to track system register traps that are reinjected in > > guest EL2. It is expected to replace the discrete handling we have > > enjoyed so far, which didn't scale at all. This has already fixed a > > number of bugs that were hidden (a bunch of traps were never > > forwarded...). Still a work in progress, but this is going in the > > right direction. > > > > - Allow the L1 hypervisor to have a S2 that has an input larger than > > the L0 IPA space. This fixes a number of subtle issues, depending on > > how the initial guest was created. > > > > - Consequently, the patch series has gone longer again. Boo. But > > hopefully some of it is easier to review... > > > > I am facing issue in booting NestedVM with V9 as well with 10 patchset. > > I have tried V9/V10 on Ampere platform using kvmtool and I could boot > Guest-Hypervisor and then NestedVM without any issue. > However when I try to boot using QEMU(not using EDK2/EFI), > Guest-Hypervisor is booted with Fedora 37 using virtio disk. From > Guest-Hypervisor console(or ssh shell), If I try to boot NestedVM, > boot hangs very early stage of the boot. > > I did some debug using ftrace and it seems the Guest-Hypervisor is > getting very high rate of arch-timer interrupts, > due to that all CPU time is going on in serving the Guest-Hypervisor > and it is never going back to NestedVM. > > I am using QEMU vanilla version v7.2.0 with top-up patches for NV [1] So I went ahead and gave QEMU a go. On my systems, *nothing* works (I cannot even boot a L1 with 'virtualization=on" (the guest is stuck at the point where virtio gets probed and waits for its first interrupt). Worse, booting a hVHE guest results in QEMU generating an assert as it tries to inject an interrupt using the QEMU GICv3 model, something that should *NEVER* be in use with KVM. With help from Eric, I got to a point where the hVHE guest could boot as long as I kept injecting console interrupts, which is again a symptom of the vGIC not being used. So something is *majorly* wrong with the QEMU patches. I don't know what makes it possible for you to even boot the L1 - if the GIC is external, injecting an interrupt in the L2 is simply impossible. Miguel, can you please investigate this? In the meantime, I'll add some code to the kernel side to refuse the external interrupt controller configuration with NV. Hopefully that will lead to some clues about what is going on. Thanks, M.
Hi Marc, On 29-06-2023 12:33 pm, Marc Zyngier wrote: > Hi Ganapatrao, > > On Wed, 28 Jun 2023 07:45:55 +0100, > Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote: >> >> >> Hi Marc, >> >> >> On 15-05-2023 11:00 pm, Marc Zyngier wrote: >>> This is the 4th drop of NV support on arm64 for this year. >>> >>> For the previous episodes, see [1]. >>> >>> What's changed: >>> >>> - New framework to track system register traps that are reinjected in >>> guest EL2. It is expected to replace the discrete handling we have >>> enjoyed so far, which didn't scale at all. This has already fixed a >>> number of bugs that were hidden (a bunch of traps were never >>> forwarded...). Still a work in progress, but this is going in the >>> right direction. >>> >>> - Allow the L1 hypervisor to have a S2 that has an input larger than >>> the L0 IPA space. This fixes a number of subtle issues, depending on >>> how the initial guest was created. >>> >>> - Consequently, the patch series has gone longer again. Boo. But >>> hopefully some of it is easier to review... >>> >> >> I am facing issue in booting NestedVM with V9 as well with 10 patchset. >> >> I have tried V9/V10 on Ampere platform using kvmtool and I could boot >> Guest-Hypervisor and then NestedVM without any issue. >> However when I try to boot using QEMU(not using EDK2/EFI), >> Guest-Hypervisor is booted with Fedora 37 using virtio disk. From >> Guest-Hypervisor console(or ssh shell), If I try to boot NestedVM, >> boot hangs very early stage of the boot. >> >> I did some debug using ftrace and it seems the Guest-Hypervisor is >> getting very high rate of arch-timer interrupts, >> due to that all CPU time is going on in serving the Guest-Hypervisor >> and it is never going back to NestedVM. >> >> I am using QEMU vanilla version v7.2.0 with top-up patches for NV [1] > > So I went ahead and gave QEMU a go. On my systems, *nothing* works (I > cannot even boot a L1 with 'virtualization=on" (the guest is stuck at > the point where virtio gets probed and waits for its first interrupt). > > Worse, booting a hVHE guest results in QEMU generating an assert as it > tries to inject an interrupt using the QEMU GICv3 model, something > that should *NEVER* be in use with KVM. > > With help from Eric, I got to a point where the hVHE guest could boot > as long as I kept injecting console interrupts, which is again a > symptom of the vGIC not being used. > > So something is *majorly* wrong with the QEMU patches. I don't know > what makes it possible for you to even boot the L1 - if the GIC is > external, injecting an interrupt in the L2 is simply impossible. > > Miguel, can you please investigate this? > > In the meantime, I'll add some code to the kernel side to refuse the > external interrupt controller configuration with NV. Hopefully that > will lead to some clues about what is going on. Continued debugging of the issue and it seems the endless ptimer interrupts on Ampere platform is due to some mess up of CVAL of ptimer, resulting in interrupt triggered always when it is enabled. I see function "timer_set_offset" called from kvm_arm_timer_set_reg in QEMU case but there is no such calls in kvmtool boot. If I comment the timer_set_offset calls in kvm_arm_timer_set_reg function, then I could boot the Guest-Hypervisor then NestedVM from GH/L1. I also observed in QEMU case, kvm_arm_timer_set_reg is called to set CNT, CVAL and CTL of both vtimer and ptimer. Not sure why QEMU is setting these registers explicitly? need to dig. > > Thanks, > > M. > Thanks, Ganapat
On 04-07-2023 06:01 pm, Ganapatrao Kulkarni wrote: > > Hi Marc, > > On 29-06-2023 12:33 pm, Marc Zyngier wrote: >> Hi Ganapatrao, >> >> On Wed, 28 Jun 2023 07:45:55 +0100, >> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote: >>> >>> >>> Hi Marc, >>> >>> >>> On 15-05-2023 11:00 pm, Marc Zyngier wrote: >>>> This is the 4th drop of NV support on arm64 for this year. >>>> >>>> For the previous episodes, see [1]. >>>> >>>> What's changed: >>>> >>>> - New framework to track system register traps that are reinjected in >>>> guest EL2. It is expected to replace the discrete handling we have >>>> enjoyed so far, which didn't scale at all. This has already fixed a >>>> number of bugs that were hidden (a bunch of traps were never >>>> forwarded...). Still a work in progress, but this is going in the >>>> right direction. >>>> >>>> - Allow the L1 hypervisor to have a S2 that has an input larger than >>>> the L0 IPA space. This fixes a number of subtle issues, >>>> depending on >>>> how the initial guest was created. >>>> >>>> - Consequently, the patch series has gone longer again. Boo. But >>>> hopefully some of it is easier to review... >>>> >>> >>> I am facing issue in booting NestedVM with V9 as well with 10 patchset. >>> >>> I have tried V9/V10 on Ampere platform using kvmtool and I could boot >>> Guest-Hypervisor and then NestedVM without any issue. >>> However when I try to boot using QEMU(not using EDK2/EFI), >>> Guest-Hypervisor is booted with Fedora 37 using virtio disk. From >>> Guest-Hypervisor console(or ssh shell), If I try to boot NestedVM, >>> boot hangs very early stage of the boot. >>> >>> I did some debug using ftrace and it seems the Guest-Hypervisor is >>> getting very high rate of arch-timer interrupts, >>> due to that all CPU time is going on in serving the Guest-Hypervisor >>> and it is never going back to NestedVM. >>> >>> I am using QEMU vanilla version v7.2.0 with top-up patches for NV [1] >> >> So I went ahead and gave QEMU a go. On my systems, *nothing* works (I >> cannot even boot a L1 with 'virtualization=on" (the guest is stuck at >> the point where virtio gets probed and waits for its first interrupt). >> >> Worse, booting a hVHE guest results in QEMU generating an assert as it >> tries to inject an interrupt using the QEMU GICv3 model, something >> that should *NEVER* be in use with KVM. >> >> With help from Eric, I got to a point where the hVHE guest could boot >> as long as I kept injecting console interrupts, which is again a >> symptom of the vGIC not being used. >> >> So something is *majorly* wrong with the QEMU patches. I don't know >> what makes it possible for you to even boot the L1 - if the GIC is >> external, injecting an interrupt in the L2 is simply impossible. >> >> Miguel, can you please investigate this? >> >> In the meantime, I'll add some code to the kernel side to refuse the >> external interrupt controller configuration with NV. Hopefully that >> will lead to some clues about what is going on. > > Continued debugging of the issue and it seems the endless ptimer > interrupts on Ampere platform is due to some mess up of CVAL of ptimer, > resulting in interrupt triggered always when it is enabled. > > I see function "timer_set_offset" called from kvm_arm_timer_set_reg in > QEMU case but there is no such calls in kvmtool boot. > > If I comment the timer_set_offset calls in kvm_arm_timer_set_reg > function, then I could boot the Guest-Hypervisor then NestedVM from GH/L1. > > I also observed in QEMU case, kvm_arm_timer_set_reg is called to set > CNT, CVAL and CTL of both vtimer and ptimer. > Not sure why QEMU is setting these registers explicitly? need to dig. > I don't see any direct ioctl calls to change any timer registers. Looks like it is happening from the emulation code(target/arm/helper.c)? >> >> Thanks, >> >> M. >> > > Thanks, > Ganapat > Thanks, Ganapat
Hi Marc, > On 29 Jun 2023, at 07:03, Marc Zyngier <maz@kernel.org> wrote: > > Hi Ganapatrao, > > On Wed, 28 Jun 2023 07:45:55 +0100, > Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote: >> >> >> Hi Marc, >> >> >> On 15-05-2023 11:00 pm, Marc Zyngier wrote: >>> This is the 4th drop of NV support on arm64 for this year. >>> >>> For the previous episodes, see [1]. >>> >>> What's changed: >>> >>> - New framework to track system register traps that are reinjected in >>> guest EL2. It is expected to replace the discrete handling we have >>> enjoyed so far, which didn't scale at all. This has already fixed a >>> number of bugs that were hidden (a bunch of traps were never >>> forwarded...). Still a work in progress, but this is going in the >>> right direction. >>> >>> - Allow the L1 hypervisor to have a S2 that has an input larger than >>> the L0 IPA space. This fixes a number of subtle issues, depending on >>> how the initial guest was created. >>> >>> - Consequently, the patch series has gone longer again. Boo. But >>> hopefully some of it is easier to review... >>> >> >> I am facing issue in booting NestedVM with V9 as well with 10 patchset. >> >> I have tried V9/V10 on Ampere platform using kvmtool and I could boot >> Guest-Hypervisor and then NestedVM without any issue. >> However when I try to boot using QEMU(not using EDK2/EFI), >> Guest-Hypervisor is booted with Fedora 37 using virtio disk. From >> Guest-Hypervisor console(or ssh shell), If I try to boot NestedVM, >> boot hangs very early stage of the boot. >> >> I did some debug using ftrace and it seems the Guest-Hypervisor is >> getting very high rate of arch-timer interrupts, >> due to that all CPU time is going on in serving the Guest-Hypervisor >> and it is never going back to NestedVM. >> >> I am using QEMU vanilla version v7.2.0 with top-up patches for NV [1] > > So I went ahead and gave QEMU a go. On my systems, *nothing* works (I > cannot even boot a L1 with 'virtualization=on" (the guest is stuck at > the point where virtio gets probed and waits for its first interrupt). > > Worse, booting a hVHE guest results in QEMU generating an assert as it > tries to inject an interrupt using the QEMU GICv3 model, something > that should *NEVER* be in use with KVM. > > With help from Eric, I got to a point where the hVHE guest could boot > as long as I kept injecting console interrupts, which is again a > symptom of the vGIC not being used. > > So something is *majorly* wrong with the QEMU patches. I don't know > what makes it possible for you to even boot the L1 - if the GIC is > external, injecting an interrupt in the L2 is simply impossible. > > Miguel, can you please investigate this? Yes, I will investigate it. Sorry for the delay in replying as I took a break short after KVM forum and I’ve just started to sync up. Thanks, Miguel > > In the meantime, I'll add some code to the kernel side to refuse the > external interrupt controller configuration with NV. Hopefully that > will lead to some clues about what is going on. > > Thanks, > > M. > > -- > Without deviation from the norm, progress is not possible.
On 07-07-2023 03:16 pm, Ganapatrao Kulkarni wrote: > > > On 04-07-2023 06:01 pm, Ganapatrao Kulkarni wrote: >> >> Hi Marc, >> >> On 29-06-2023 12:33 pm, Marc Zyngier wrote: >>> Hi Ganapatrao, >>> >>> On Wed, 28 Jun 2023 07:45:55 +0100, >>> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote: >>>> >>>> >>>> Hi Marc, >>>> >>>> >>>> On 15-05-2023 11:00 pm, Marc Zyngier wrote: >>>>> This is the 4th drop of NV support on arm64 for this year. >>>>> >>>>> For the previous episodes, see [1]. >>>>> >>>>> What's changed: >>>>> >>>>> - New framework to track system register traps that are reinjected in >>>>> guest EL2. It is expected to replace the discrete handling we have >>>>> enjoyed so far, which didn't scale at all. This has already >>>>> fixed a >>>>> number of bugs that were hidden (a bunch of traps were never >>>>> forwarded...). Still a work in progress, but this is going in the >>>>> right direction. >>>>> >>>>> - Allow the L1 hypervisor to have a S2 that has an input larger than >>>>> the L0 IPA space. This fixes a number of subtle issues, >>>>> depending on >>>>> how the initial guest was created. >>>>> >>>>> - Consequently, the patch series has gone longer again. Boo. But >>>>> hopefully some of it is easier to review... >>>>> >>>> >>>> I am facing issue in booting NestedVM with V9 as well with 10 patchset. >>>> >>>> I have tried V9/V10 on Ampere platform using kvmtool and I could boot >>>> Guest-Hypervisor and then NestedVM without any issue. >>>> However when I try to boot using QEMU(not using EDK2/EFI), >>>> Guest-Hypervisor is booted with Fedora 37 using virtio disk. From >>>> Guest-Hypervisor console(or ssh shell), If I try to boot NestedVM, >>>> boot hangs very early stage of the boot. >>>> >>>> I did some debug using ftrace and it seems the Guest-Hypervisor is >>>> getting very high rate of arch-timer interrupts, >>>> due to that all CPU time is going on in serving the Guest-Hypervisor >>>> and it is never going back to NestedVM. >>>> >>>> I am using QEMU vanilla version v7.2.0 with top-up patches for NV [1] >>> >>> So I went ahead and gave QEMU a go. On my systems, *nothing* works (I >>> cannot even boot a L1 with 'virtualization=on" (the guest is stuck at >>> the point where virtio gets probed and waits for its first interrupt). >>> >>> Worse, booting a hVHE guest results in QEMU generating an assert as it >>> tries to inject an interrupt using the QEMU GICv3 model, something >>> that should *NEVER* be in use with KVM. >>> >>> With help from Eric, I got to a point where the hVHE guest could boot >>> as long as I kept injecting console interrupts, which is again a >>> symptom of the vGIC not being used. >>> >>> So something is *majorly* wrong with the QEMU patches. I don't know >>> what makes it possible for you to even boot the L1 - if the GIC is >>> external, injecting an interrupt in the L2 is simply impossible. >>> >>> Miguel, can you please investigate this? >>> >>> In the meantime, I'll add some code to the kernel side to refuse the >>> external interrupt controller configuration with NV. Hopefully that >>> will lead to some clues about what is going on. >> >> Continued debugging of the issue and it seems the endless ptimer >> interrupts on Ampere platform is due to some mess up of CVAL of >> ptimer, resulting in interrupt triggered always when it is enabled. >> >> I see function "timer_set_offset" called from kvm_arm_timer_set_reg in >> QEMU case but there is no such calls in kvmtool boot. >> >> If I comment the timer_set_offset calls in kvm_arm_timer_set_reg >> function, then I could boot the Guest-Hypervisor then NestedVM from >> GH/L1. >> >> I also observed in QEMU case, kvm_arm_timer_set_reg is called to set >> CNT, CVAL and CTL of both vtimer and ptimer. >> Not sure why QEMU is setting these registers explicitly? need to dig. >> > > I don't see any direct ioctl calls to change any timer registers. Looks > like it is happening from the emulation code(target/arm/helper.c)? function "write_list_to_kvmstate(target/arm/kvm.c)" is issuing ioctl to write timer registers. PTIMER_CNT and TIMER_CNT writes from this function is resulting in offsets change. > >>> >>> Thanks, >>> >>> M. >>> >> >> Thanks, >> Ganapat >> > > Thanks, > Ganapat Thanks, Ganapat
On Tue, 11 Jul 2023 12:56:48 +0100, Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote: > > > > On 07-07-2023 03:16 pm, Ganapatrao Kulkarni wrote: > > > > > > On 04-07-2023 06:01 pm, Ganapatrao Kulkarni wrote: > >> > >> Hi Marc, > >> > >> On 29-06-2023 12:33 pm, Marc Zyngier wrote: > >>> Hi Ganapatrao, > >>> > >>> On Wed, 28 Jun 2023 07:45:55 +0100, > >>> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote: > >>>> > >>>> > >>>> Hi Marc, > >>>> > >>>> > >>>> On 15-05-2023 11:00 pm, Marc Zyngier wrote: > >>>>> This is the 4th drop of NV support on arm64 for this year. > >>>>> > >>>>> For the previous episodes, see [1]. > >>>>> > >>>>> What's changed: > >>>>> > >>>>> - New framework to track system register traps that are reinjected in > >>>>> guest EL2. It is expected to replace the discrete handling we have > >>>>> enjoyed so far, which didn't scale at all. This has already > >>>>> fixed a > >>>>> number of bugs that were hidden (a bunch of traps were never > >>>>> forwarded...). Still a work in progress, but this is going in the > >>>>> right direction. > >>>>> > >>>>> - Allow the L1 hypervisor to have a S2 that has an input larger than > >>>>> the L0 IPA space. This fixes a number of subtle issues, > >>>>> depending on > >>>>> how the initial guest was created. > >>>>> > >>>>> - Consequently, the patch series has gone longer again. Boo. But > >>>>> hopefully some of it is easier to review... > >>>>> > >>>> > >>>> I am facing issue in booting NestedVM with V9 as well with 10 patchset. > >>>> > >>>> I have tried V9/V10 on Ampere platform using kvmtool and I could boot > >>>> Guest-Hypervisor and then NestedVM without any issue. > >>>> However when I try to boot using QEMU(not using EDK2/EFI), > >>>> Guest-Hypervisor is booted with Fedora 37 using virtio disk. From > >>>> Guest-Hypervisor console(or ssh shell), If I try to boot NestedVM, > >>>> boot hangs very early stage of the boot. > >>>> > >>>> I did some debug using ftrace and it seems the Guest-Hypervisor is > >>>> getting very high rate of arch-timer interrupts, > >>>> due to that all CPU time is going on in serving the Guest-Hypervisor > >>>> and it is never going back to NestedVM. > >>>> > >>>> I am using QEMU vanilla version v7.2.0 with top-up patches for NV [1] > >>> > >>> So I went ahead and gave QEMU a go. On my systems, *nothing* works (I > >>> cannot even boot a L1 with 'virtualization=on" (the guest is stuck at > >>> the point where virtio gets probed and waits for its first interrupt). > >>> > >>> Worse, booting a hVHE guest results in QEMU generating an assert as it > >>> tries to inject an interrupt using the QEMU GICv3 model, something > >>> that should *NEVER* be in use with KVM. > >>> > >>> With help from Eric, I got to a point where the hVHE guest could boot > >>> as long as I kept injecting console interrupts, which is again a > >>> symptom of the vGIC not being used. > >>> > >>> So something is *majorly* wrong with the QEMU patches. I don't know > >>> what makes it possible for you to even boot the L1 - if the GIC is > >>> external, injecting an interrupt in the L2 is simply impossible. > >>> > >>> Miguel, can you please investigate this? > >>> > >>> In the meantime, I'll add some code to the kernel side to refuse the > >>> external interrupt controller configuration with NV. Hopefully that > >>> will lead to some clues about what is going on. > >> > >> Continued debugging of the issue and it seems the endless ptimer > >> interrupts on Ampere platform is due to some mess up of CVAL of > >> ptimer, resulting in interrupt triggered always when it is enabled. > >> > >> I see function "timer_set_offset" called from kvm_arm_timer_set_reg > >> in QEMU case but there is no such calls in kvmtool boot. > >> > >> If I comment the timer_set_offset calls in kvm_arm_timer_set_reg > >> function, then I could boot the Guest-Hypervisor then NestedVM from > >> GH/L1. > >> > >> I also observed in QEMU case, kvm_arm_timer_set_reg is called to > >> set CNT, CVAL and CTL of both vtimer and ptimer. > >> Not sure why QEMU is setting these registers explicitly? need to dig. > >> > > > > I don't see any direct ioctl calls to change any timer > > registers. Looks like it is happening from the emulation > > code(target/arm/helper.c)? > > function "write_list_to_kvmstate(target/arm/kvm.c)" is issuing ioctl > to write timer registers. PTIMER_CNT and TIMER_CNT writes from this > function is resulting in offsets change. Madness. Why is QEMU doing this? It has no business writing to the timer at any stage, at least with KVM. This confirms my suspicions that QEMU is confused about what mode it runs in, and does all sort of funky things it was never expected to do. M.
Hi Marc, > On 10 Jul 2023, at 12:56, Miguel Luis <miguel.luis@oracle.com> wrote: > > Hi Marc, > >> On 29 Jun 2023, at 07:03, Marc Zyngier <maz@kernel.org> wrote: >> >> Hi Ganapatrao, >> >> On Wed, 28 Jun 2023 07:45:55 +0100, >> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote: >>> >>> >>> Hi Marc, >>> >>> >>> On 15-05-2023 11:00 pm, Marc Zyngier wrote: >>>> This is the 4th drop of NV support on arm64 for this year. >>>> >>>> For the previous episodes, see [1]. >>>> >>>> What's changed: >>>> >>>> - New framework to track system register traps that are reinjected in >>>> guest EL2. It is expected to replace the discrete handling we have >>>> enjoyed so far, which didn't scale at all. This has already fixed a >>>> number of bugs that were hidden (a bunch of traps were never >>>> forwarded...). Still a work in progress, but this is going in the >>>> right direction. >>>> >>>> - Allow the L1 hypervisor to have a S2 that has an input larger than >>>> the L0 IPA space. This fixes a number of subtle issues, depending on >>>> how the initial guest was created. >>>> >>>> - Consequently, the patch series has gone longer again. Boo. But >>>> hopefully some of it is easier to review... >>>> >>> >>> I am facing issue in booting NestedVM with V9 as well with 10 patchset. >>> >>> I have tried V9/V10 on Ampere platform using kvmtool and I could boot >>> Guest-Hypervisor and then NestedVM without any issue. >>> However when I try to boot using QEMU(not using EDK2/EFI), >>> Guest-Hypervisor is booted with Fedora 37 using virtio disk. From >>> Guest-Hypervisor console(or ssh shell), If I try to boot NestedVM, >>> boot hangs very early stage of the boot. >>> >>> I did some debug using ftrace and it seems the Guest-Hypervisor is >>> getting very high rate of arch-timer interrupts, >>> due to that all CPU time is going on in serving the Guest-Hypervisor >>> and it is never going back to NestedVM. >>> >>> I am using QEMU vanilla version v7.2.0 with top-up patches for NV [1] >> >> So I went ahead and gave QEMU a go. On my systems, *nothing* works (I >> cannot even boot a L1 with 'virtualization=on" (the guest is stuck at >> the point where virtio gets probed and waits for its first interrupt). In order to use the previous patches you need to update the linux headers of QEMU according to the target kernel you’re testing. So you would want to run ./scripts/update-linux-headers.sh <kernel src dir> <qemu src dir> in the place of patch 1 then you should be able to boot the L1 guest with virtualization=on. Regarding the L2 guest, it does not boot and I’m in the process of understanding why. The previous patches had some improvements to make but I couldn’t relate them to not booting the L2 guest, yet. Eric stated an issue with NV and SVE enablement which I’m still looking at. ( which is similar to the commit 5b578f088ada3c4319f7274c0221b5d92143fe6a in your kvmtool branch arm64/nv-5.16 ) As a test I’ve disabled SVE on the kernel side with 'arm64.nosve’ and the output matches the one from your kvmtool: [ 0.000000] CPU features: SYS_ID_AA64PFR0_EL1[35:32]: already set to 0 [ 0.000000] CPU features: SYS_ID_AA64ZFR0_EL1[59:56]: already set to 0 [ 0.000000] CPU features: SYS_ID_AA64ZFR0_EL1[55:52]: already set to 0 [ 0.000000] CPU features: SYS_ID_AA64ZFR0_EL1[47:44]: already set to 0 [ 0.000000] CPU features: SYS_ID_AA64ZFR0_EL1[43:40]: already set to 0 [ 0.000000] CPU features: SYS_ID_AA64ZFR0_EL1[35:32]: already set to 0 [ 0.000000] CPU features: SYS_ID_AA64ZFR0_EL1[23:20]: already set to 0 [ 0.000000] CPU features: SYS_ID_AA64ZFR0_EL1[19:16]: already set to 0 [ 0.000000] CPU features: SYS_ID_AA64ZFR0_EL1[7:4]: already set to 0 [ 0.000000] CPU features: SYS_ID_AA64ZFR0_EL1[3:0]: already set to 0 That didn’t enabled the L2 guest to boot on QEMU so the issue feels still in a grey area. As a baseline I tested your kvmtool branch for 5.16 after updating the includes, and as I expected both L1 and L2 guests boot. Miguel >> >> Worse, booting a hVHE guest results in QEMU generating an assert as it >> tries to inject an interrupt using the QEMU GICv3 model, something >> that should *NEVER* be in use with KVM. >> >> With help from Eric, I got to a point where the hVHE guest could boot >> as long as I kept injecting console interrupts, which is again a >> symptom of the vGIC not being used. >> >> So something is *majorly* wrong with the QEMU patches. I don't know >> what makes it possible for you to even boot the L1 - if the GIC is >> external, injecting an interrupt in the L2 is simply impossible. >> >> Miguel, can you please investigate this? > > Yes, I will investigate it. Sorry for the delay in replying as I took a break > short after KVM forum and I’ve just started to sync up. > > Thanks, > Miguel > >> >> In the meantime, I'll add some code to the kernel side to refuse the >> external interrupt controller configuration with NV. Hopefully that >> will lead to some clues about what is going on. >> >> Thanks, >> >> M. >> >> -- >> Without deviation from the norm, progress is not possible.