Message ID | 20231120131027.854038-1-maz@kernel.org (mailing list archive) |
---|---|
Headers | show |
Series | KVM: arm64: Nested Virtualization support (FEAT_NV2 only) | expand |
Hi Marc, On 20-11-2023 06:39 pm, Marc Zyngier wrote: > This is the 5th drop of NV support on arm64 for this year, and most > probably the last one for this side of Christmas. > > For the previous episodes, see [1]. > > What's changed: > > - Drop support for the original FEAT_NV. No existing hardware supports > it without FEAT_NV2, and the architecture is deprecating the former > entirely. This results in fewer patches, and a slightly simpler > model overall. > > - Reorganise the series to make it a bit more logical now that FEAT_NV > is gone. > > - Apply the NV idreg restrictions on VM first run rather than on each > access. > > - Make the nested vgic shadow CPU interface a per-CPU structure rather > than per-vcpu. > > - Fix the EL0 timer fastpath > > - Work around the architecture deficiencies when trapping WFI from a > L2 guest. > > - Fix sampling of nested vgic state (MISR, ELRSR, EISR) > > - Drop the patches that have already been merged (NV trap forwarding, > per-MMU VTCR) > > - Rebased on top of 6.7-rc2 + the FEAT_E2H0 support [2]. > > The branch containing these patches (and more) is at [3]. As for the > previous rounds, my intention is to take a prefix of this series into > 6.8, provided that it gets enough reviewing. > > [1] https://lore.kernel.org/r/20230515173103.1017669-1-maz@kernel.org > [2] https://lore.kernel.org/r/20231120123721.851738-1-maz@kernel.org > [3] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/nv-6.8-nv2-only > V11 series is not booting on Ampere platform (I am yet to debug). With lkvm, it is stuck at the very early stage itself and no early boot prints/logs. Are there any changes needed in kvmtool for V11? > Andre Przywara (1): > KVM: arm64: nv: vgic: Allow userland to set VGIC maintenance IRQ > > Christoffer Dall (2): > KVM: arm64: nv: Implement nested Stage-2 page table walk logic > KVM: arm64: nv: Unmap/flush shadow stage 2 page tables > > Jintack Lim (3): > KVM: arm64: nv: Respect virtual HCR_EL2.TWX setting > KVM: arm64: nv: Respect virtual CPTR_EL2.{TFP,FPEN} settings > KVM: arm64: nv: Trap and emulate TLBI instructions from virtual EL2 > > Marc Zyngier (37): > arm64: cpufeatures: Restrict NV support to FEAT_NV2 > KVM: arm64: nv: Hoist vcpu_has_nv() into is_hyp_ctxt() > KVM: arm64: nv: Compute NV view of idregs as a one-off > KVM: arm64: nv: Drop EL12 register traps that are redirected to VNCR > KVM: arm64: nv: Add non-VHE-EL2->EL1 translation helpers > KVM: arm64: nv: Add include containing the VNCR_EL2 offsets > KVM: arm64: Introduce a bad_trap() primitive for unexpected trap > handling > KVM: arm64: nv: Add EL2_REG_VNCR()/EL2_REG_REDIR() sysreg helpers > KVM: arm64: nv: Map VNCR-capable registers to a separate page > KVM: arm64: nv: Handle virtual EL2 registers in > vcpu_read/write_sys_reg() > KVM: arm64: nv: Handle HCR_EL2.E2H specially > KVM: arm64: nv: Handle CNTHCTL_EL2 specially > KVM: arm64: nv: Save/Restore vEL2 sysregs > KVM: arm64: nv: Configure HCR_EL2 for FEAT_NV2 > KVM: arm64: nv: Support multiple nested Stage-2 mmu structures > KVM: arm64: nv: Handle shadow stage 2 page faults > KVM: arm64: nv: Restrict S2 RD/WR permissions to match the guest's > KVM: arm64: nv: Set a handler for the system instruction traps > KVM: arm64: nv: Trap and emulate AT instructions from virtual EL2 > KVM: arm64: nv: Hide RAS from nested guests > KVM: arm64: nv: Add handling of EL2-specific timer registers > KVM: arm64: nv: Sync nested timer state with FEAT_NV2 > KVM: arm64: nv: Publish emulated timer interrupt state in the > in-memory state > KVM: arm64: nv: Load timer before the GIC > KVM: arm64: nv: Nested GICv3 Support > KVM: arm64: nv: Don't block in WFI from nested state > KVM: arm64: nv: Fold GICv3 host trapping requirements into guest setup > KVM: arm64: nv: Deal with broken VGIC on maintenance interrupt > delivery > KVM: arm64: nv: Add handling of FEAT_TTL TLB invalidation > KVM: arm64: nv: Invalidate TLBs based on shadow S2 TTL-like > information > KVM: arm64: nv: Tag shadow S2 entries with nested level > KVM: arm64: nv: Allocate VNCR page when required > KVM: arm64: nv: Fast-track 'InHost' exception returns > KVM: arm64: nv: Fast-track EL1 TLBIs for VHE guests > KVM: arm64: nv: Use FEAT_ECV to trap access to EL0 timers > KVM: arm64: nv: Accelerate EL0 timer read accesses when FEAT_ECV is on > KVM: arm64: nv: Allow userspace to request KVM_ARM_VCPU_NESTED_VIRT > > .../virt/kvm/devices/arm-vgic-v3.rst | 12 +- > arch/arm64/include/asm/esr.h | 1 + > arch/arm64/include/asm/kvm_arm.h | 3 + > arch/arm64/include/asm/kvm_asm.h | 4 + > arch/arm64/include/asm/kvm_emulate.h | 53 +- > arch/arm64/include/asm/kvm_host.h | 223 +++- > arch/arm64/include/asm/kvm_hyp.h | 2 + > arch/arm64/include/asm/kvm_mmu.h | 12 + > arch/arm64/include/asm/kvm_nested.h | 130 ++- > arch/arm64/include/asm/sysreg.h | 7 + > arch/arm64/include/asm/vncr_mapping.h | 102 ++ > arch/arm64/include/uapi/asm/kvm.h | 1 + > arch/arm64/kernel/cpufeature.c | 22 +- > arch/arm64/kvm/Makefile | 4 +- > arch/arm64/kvm/arch_timer.c | 115 +- > arch/arm64/kvm/arm.c | 46 +- > arch/arm64/kvm/at.c | 219 ++++ > arch/arm64/kvm/emulate-nested.c | 48 +- > arch/arm64/kvm/handle_exit.c | 29 +- > arch/arm64/kvm/hyp/include/hyp/switch.h | 8 +- > arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 5 +- > arch/arm64/kvm/hyp/nvhe/switch.c | 2 +- > arch/arm64/kvm/hyp/nvhe/sysreg-sr.c | 2 +- > arch/arm64/kvm/hyp/vgic-v3-sr.c | 6 +- > arch/arm64/kvm/hyp/vhe/switch.c | 211 +++- > arch/arm64/kvm/hyp/vhe/sysreg-sr.c | 133 ++- > arch/arm64/kvm/hyp/vhe/tlb.c | 83 ++ > arch/arm64/kvm/mmu.c | 248 ++++- > arch/arm64/kvm/nested.c | 813 ++++++++++++++- > arch/arm64/kvm/reset.c | 7 + > arch/arm64/kvm/sys_regs.c | 978 ++++++++++++++++-- > arch/arm64/kvm/vgic/vgic-init.c | 35 + > arch/arm64/kvm/vgic/vgic-kvm-device.c | 29 +- > arch/arm64/kvm/vgic/vgic-v3-nested.c | 270 +++++ > arch/arm64/kvm/vgic/vgic-v3.c | 35 +- > arch/arm64/kvm/vgic/vgic.c | 29 + > arch/arm64/kvm/vgic/vgic.h | 10 + > arch/arm64/tools/cpucaps | 2 + > include/clocksource/arm_arch_timer.h | 4 + > include/kvm/arm_arch_timer.h | 19 + > include/kvm/arm_vgic.h | 16 + > include/uapi/linux/kvm.h | 1 + > tools/arch/arm/include/uapi/asm/kvm.h | 1 + > 43 files changed, 3725 insertions(+), 255 deletions(-) > create mode 100644 arch/arm64/include/asm/vncr_mapping.h > create mode 100644 arch/arm64/kvm/at.c > create mode 100644 arch/arm64/kvm/vgic/vgic-v3-nested.c > Thanks, Ganapat
On Tue, 21 Nov 2023 08:51:35 +0000, Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote: > > > Hi Marc, > > On 20-11-2023 06:39 pm, Marc Zyngier wrote: > > This is the 5th drop of NV support on arm64 for this year, and most > > probably the last one for this side of Christmas. > > > > For the previous episodes, see [1]. > > > > What's changed: > > > > - Drop support for the original FEAT_NV. No existing hardware supports > > it without FEAT_NV2, and the architecture is deprecating the former > > entirely. This results in fewer patches, and a slightly simpler > > model overall. > > > > - Reorganise the series to make it a bit more logical now that FEAT_NV > > is gone. > > > > - Apply the NV idreg restrictions on VM first run rather than on each > > access. > > > > - Make the nested vgic shadow CPU interface a per-CPU structure rather > > than per-vcpu. > > > > - Fix the EL0 timer fastpath > > > > - Work around the architecture deficiencies when trapping WFI from a > > L2 guest. > > > > - Fix sampling of nested vgic state (MISR, ELRSR, EISR) > > > > - Drop the patches that have already been merged (NV trap forwarding, > > per-MMU VTCR) > > > > - Rebased on top of 6.7-rc2 + the FEAT_E2H0 support [2]. > > > > The branch containing these patches (and more) is at [3]. As for the > > previous rounds, my intention is to take a prefix of this series into > > 6.8, provided that it gets enough reviewing. > > > > [1] https://lore.kernel.org/r/20230515173103.1017669-1-maz@kernel.org > > [2] https://lore.kernel.org/r/20231120123721.851738-1-maz@kernel.org > > [3] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/nv-6.8-nv2-only > > > > V11 series is not booting on Ampere platform (I am yet to debug). > With lkvm, it is stuck at the very early stage itself and no early > boot prints/logs. > > Are there any changes needed in kvmtool for V11? Not really, I'm still using the version I had built for 6.5. Is the problem with L1 or L2? However, this looks like a problem I've been chasing, and which I though was only a M2 issue. In some situations, I'm getting interrupt storms when L1 gets a level interrupt while in L2. Can you cherry-pick [1] from my tree, and let me know if this helps? This isn't a proper fix, but if L2 starts booting with this, I would know this is a common issue. Now, if your problem is with L1, I really have no idea. Thanks, M. [1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/commit/?h=kvm-arm64/nv-6.8-nv2-only&id=759d2e18f8954f4c76eb1772f38301df6ed8fa5d
On 21-11-2023 02:38 pm, Marc Zyngier wrote: > On Tue, 21 Nov 2023 08:51:35 +0000, > Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote: >> >> >> Hi Marc, >> >> On 20-11-2023 06:39 pm, Marc Zyngier wrote: >>> This is the 5th drop of NV support on arm64 for this year, and most >>> probably the last one for this side of Christmas. >>> >>> For the previous episodes, see [1]. >>> >>> What's changed: >>> >>> - Drop support for the original FEAT_NV. No existing hardware supports >>> it without FEAT_NV2, and the architecture is deprecating the former >>> entirely. This results in fewer patches, and a slightly simpler >>> model overall. >>> >>> - Reorganise the series to make it a bit more logical now that FEAT_NV >>> is gone. >>> >>> - Apply the NV idreg restrictions on VM first run rather than on each >>> access. >>> >>> - Make the nested vgic shadow CPU interface a per-CPU structure rather >>> than per-vcpu. >>> >>> - Fix the EL0 timer fastpath >>> >>> - Work around the architecture deficiencies when trapping WFI from a >>> L2 guest. >>> >>> - Fix sampling of nested vgic state (MISR, ELRSR, EISR) >>> >>> - Drop the patches that have already been merged (NV trap forwarding, >>> per-MMU VTCR) >>> >>> - Rebased on top of 6.7-rc2 + the FEAT_E2H0 support [2]. >>> >>> The branch containing these patches (and more) is at [3]. As for the >>> previous rounds, my intention is to take a prefix of this series into >>> 6.8, provided that it gets enough reviewing. >>> >>> [1] https://lore.kernel.org/r/20230515173103.1017669-1-maz@kernel.org >>> [2] https://lore.kernel.org/r/20231120123721.851738-1-maz@kernel.org >>> [3] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/nv-6.8-nv2-only >>> >> >> V11 series is not booting on Ampere platform (I am yet to debug). >> With lkvm, it is stuck at the very early stage itself and no early >> boot prints/logs. >> >> Are there any changes needed in kvmtool for V11? > > Not really, I'm still using the version I had built for 6.5. Is the > problem with L1 or L2? Stuck in the L1 itself. I am using kvmtool from https://git.kernel.org/pub/scm/linux/kernel/git/maz/kvmtool.git/log/?h=arm64/nv-5.16 > > However, this looks like a problem I've been chasing, and which I > though was only a M2 issue. In some situations, I'm getting interrupt > storms when L1 gets a level interrupt while in L2. > > Can you cherry-pick [1] from my tree, and let me know if this helps? > This isn't a proper fix, but if L2 starts booting with this, I would > know this is a common issue. > > Now, if your problem is with L1, I really have no idea. > > Thanks, > > M. > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/commit/?h=kvm-arm64/nv-6.8-nv2-only&id=759d2e18f8954f4c76eb1772f38301df6ed8fa5d > Thanks, Ganapat
On Tue, 21 Nov 2023 09:26:22 +0000, Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote: > > > > On 21-11-2023 02:38 pm, Marc Zyngier wrote: > > On Tue, 21 Nov 2023 08:51:35 +0000, > > Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote: > >> > >> > >> Hi Marc, > >> > >> On 20-11-2023 06:39 pm, Marc Zyngier wrote: > >>> This is the 5th drop of NV support on arm64 for this year, and most > >>> probably the last one for this side of Christmas. > >>> > >>> For the previous episodes, see [1]. > >>> > >>> What's changed: > >>> > >>> - Drop support for the original FEAT_NV. No existing hardware supports > >>> it without FEAT_NV2, and the architecture is deprecating the former > >>> entirely. This results in fewer patches, and a slightly simpler > >>> model overall. > >>> > >>> - Reorganise the series to make it a bit more logical now that FEAT_NV > >>> is gone. > >>> > >>> - Apply the NV idreg restrictions on VM first run rather than on each > >>> access. > >>> > >>> - Make the nested vgic shadow CPU interface a per-CPU structure rather > >>> than per-vcpu. > >>> > >>> - Fix the EL0 timer fastpath > >>> > >>> - Work around the architecture deficiencies when trapping WFI from a > >>> L2 guest. > >>> > >>> - Fix sampling of nested vgic state (MISR, ELRSR, EISR) > >>> > >>> - Drop the patches that have already been merged (NV trap forwarding, > >>> per-MMU VTCR) > >>> > >>> - Rebased on top of 6.7-rc2 + the FEAT_E2H0 support [2]. > >>> > >>> The branch containing these patches (and more) is at [3]. As for the > >>> previous rounds, my intention is to take a prefix of this series into > >>> 6.8, provided that it gets enough reviewing. > >>> > >>> [1] https://lore.kernel.org/r/20230515173103.1017669-1-maz@kernel.org > >>> [2] https://lore.kernel.org/r/20231120123721.851738-1-maz@kernel.org > >>> [3] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/nv-6.8-nv2-only > >>> > >> > >> V11 series is not booting on Ampere platform (I am yet to debug). > >> With lkvm, it is stuck at the very early stage itself and no early > >> boot prints/logs. > >> > >> Are there any changes needed in kvmtool for V11? > > > > Not really, I'm still using the version I had built for 6.5. Is the > > problem with L1 or L2? > > Stuck in the L1 itself. > > I am using kvmtool from > https://git.kernel.org/pub/scm/linux/kernel/git/maz/kvmtool.git/log/?h=arm64/nv-5.16 Huh. That's positively ancient. Yet, you shouldn't get into a situation where the L1 guest locks up. I have pushed out my kvmtool branch[1]. Please give it a go. Thanks, M. [1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/kvmtool.git/log/?h=arm64/nv-6.5
Hi Marc, > On 20 Nov 2023, at 12:09, Marc Zyngier <maz@kernel.org> wrote: > > This is the 5th drop of NV support on arm64 for this year, and most > probably the last one for this side of Christmas. > > For the previous episodes, see [1]. > > What's changed: > > - Drop support for the original FEAT_NV. No existing hardware supports > it without FEAT_NV2, and the architecture is deprecating the former > entirely. This results in fewer patches, and a slightly simpler > model overall. > > - Reorganise the series to make it a bit more logical now that FEAT_NV > is gone. > > - Apply the NV idreg restrictions on VM first run rather than on each > access. > > - Make the nested vgic shadow CPU interface a per-CPU structure rather > than per-vcpu. > > - Fix the EL0 timer fastpath > > - Work around the architecture deficiencies when trapping WFI from a > L2 guest. > > - Fix sampling of nested vgic state (MISR, ELRSR, EISR) > > - Drop the patches that have already been merged (NV trap forwarding, > per-MMU VTCR) > > - Rebased on top of 6.7-rc2 + the FEAT_E2H0 support [2]. > > The branch containing these patches (and more) is at [3]. As for the > previous rounds, my intention is to take a prefix of this series into > 6.8, provided that it gets enough reviewing. > > [1] https://lore.kernel.org/r/20230515173103.1017669-1-maz@kernel.org > [2] https://lore.kernel.org/r/20231120123721.851738-1-maz@kernel.org > [3] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/nv-6.8-nv2-only > While I was testing this with kvmtool for 5.16 I noted the following on dmesg: [ 803.014258] kvm [19040]: Unsupported guest sys_reg access at: 8129fa50 [600003c9] { Op0( 3), Op1( 5), CRn( 1), CRm( 0), Op2( 2), func_read }, This is CPACR_EL12. Still need yet to debug. As for QEMU, it is having issues enabling _EL2 feature although EL2 is supported by checking KVM_CAP_ARM_EL2; need yet to debug this. Thanks Miguel > Andre Przywara (1): > KVM: arm64: nv: vgic: Allow userland to set VGIC maintenance IRQ > > Christoffer Dall (2): > KVM: arm64: nv: Implement nested Stage-2 page table walk logic > KVM: arm64: nv: Unmap/flush shadow stage 2 page tables > > Jintack Lim (3): > KVM: arm64: nv: Respect virtual HCR_EL2.TWX setting > KVM: arm64: nv: Respect virtual CPTR_EL2.{TFP,FPEN} settings > KVM: arm64: nv: Trap and emulate TLBI instructions from virtual EL2 > > Marc Zyngier (37): > arm64: cpufeatures: Restrict NV support to FEAT_NV2 > KVM: arm64: nv: Hoist vcpu_has_nv() into is_hyp_ctxt() > KVM: arm64: nv: Compute NV view of idregs as a one-off > KVM: arm64: nv: Drop EL12 register traps that are redirected to VNCR > KVM: arm64: nv: Add non-VHE-EL2->EL1 translation helpers > KVM: arm64: nv: Add include containing the VNCR_EL2 offsets > KVM: arm64: Introduce a bad_trap() primitive for unexpected trap > handling > KVM: arm64: nv: Add EL2_REG_VNCR()/EL2_REG_REDIR() sysreg helpers > KVM: arm64: nv: Map VNCR-capable registers to a separate page > KVM: arm64: nv: Handle virtual EL2 registers in > vcpu_read/write_sys_reg() > KVM: arm64: nv: Handle HCR_EL2.E2H specially > KVM: arm64: nv: Handle CNTHCTL_EL2 specially > KVM: arm64: nv: Save/Restore vEL2 sysregs > KVM: arm64: nv: Configure HCR_EL2 for FEAT_NV2 > KVM: arm64: nv: Support multiple nested Stage-2 mmu structures > KVM: arm64: nv: Handle shadow stage 2 page faults > KVM: arm64: nv: Restrict S2 RD/WR permissions to match the guest's > KVM: arm64: nv: Set a handler for the system instruction traps > KVM: arm64: nv: Trap and emulate AT instructions from virtual EL2 > KVM: arm64: nv: Hide RAS from nested guests > KVM: arm64: nv: Add handling of EL2-specific timer registers > KVM: arm64: nv: Sync nested timer state with FEAT_NV2 > KVM: arm64: nv: Publish emulated timer interrupt state in the > in-memory state > KVM: arm64: nv: Load timer before the GIC > KVM: arm64: nv: Nested GICv3 Support > KVM: arm64: nv: Don't block in WFI from nested state > KVM: arm64: nv: Fold GICv3 host trapping requirements into guest setup > KVM: arm64: nv: Deal with broken VGIC on maintenance interrupt > delivery > KVM: arm64: nv: Add handling of FEAT_TTL TLB invalidation > KVM: arm64: nv: Invalidate TLBs based on shadow S2 TTL-like > information > KVM: arm64: nv: Tag shadow S2 entries with nested level > KVM: arm64: nv: Allocate VNCR page when required > KVM: arm64: nv: Fast-track 'InHost' exception returns > KVM: arm64: nv: Fast-track EL1 TLBIs for VHE guests > KVM: arm64: nv: Use FEAT_ECV to trap access to EL0 timers > KVM: arm64: nv: Accelerate EL0 timer read accesses when FEAT_ECV is on > KVM: arm64: nv: Allow userspace to request KVM_ARM_VCPU_NESTED_VIRT > > .../virt/kvm/devices/arm-vgic-v3.rst | 12 +- > arch/arm64/include/asm/esr.h | 1 + > arch/arm64/include/asm/kvm_arm.h | 3 + > arch/arm64/include/asm/kvm_asm.h | 4 + > arch/arm64/include/asm/kvm_emulate.h | 53 +- > arch/arm64/include/asm/kvm_host.h | 223 +++- > arch/arm64/include/asm/kvm_hyp.h | 2 + > arch/arm64/include/asm/kvm_mmu.h | 12 + > arch/arm64/include/asm/kvm_nested.h | 130 ++- > arch/arm64/include/asm/sysreg.h | 7 + > arch/arm64/include/asm/vncr_mapping.h | 102 ++ > arch/arm64/include/uapi/asm/kvm.h | 1 + > arch/arm64/kernel/cpufeature.c | 22 +- > arch/arm64/kvm/Makefile | 4 +- > arch/arm64/kvm/arch_timer.c | 115 +- > arch/arm64/kvm/arm.c | 46 +- > arch/arm64/kvm/at.c | 219 ++++ > arch/arm64/kvm/emulate-nested.c | 48 +- > arch/arm64/kvm/handle_exit.c | 29 +- > arch/arm64/kvm/hyp/include/hyp/switch.h | 8 +- > arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 5 +- > arch/arm64/kvm/hyp/nvhe/switch.c | 2 +- > arch/arm64/kvm/hyp/nvhe/sysreg-sr.c | 2 +- > arch/arm64/kvm/hyp/vgic-v3-sr.c | 6 +- > arch/arm64/kvm/hyp/vhe/switch.c | 211 +++- > arch/arm64/kvm/hyp/vhe/sysreg-sr.c | 133 ++- > arch/arm64/kvm/hyp/vhe/tlb.c | 83 ++ > arch/arm64/kvm/mmu.c | 248 ++++- > arch/arm64/kvm/nested.c | 813 ++++++++++++++- > arch/arm64/kvm/reset.c | 7 + > arch/arm64/kvm/sys_regs.c | 978 ++++++++++++++++-- > arch/arm64/kvm/vgic/vgic-init.c | 35 + > arch/arm64/kvm/vgic/vgic-kvm-device.c | 29 +- > arch/arm64/kvm/vgic/vgic-v3-nested.c | 270 +++++ > arch/arm64/kvm/vgic/vgic-v3.c | 35 +- > arch/arm64/kvm/vgic/vgic.c | 29 + > arch/arm64/kvm/vgic/vgic.h | 10 + > arch/arm64/tools/cpucaps | 2 + > include/clocksource/arm_arch_timer.h | 4 + > include/kvm/arm_arch_timer.h | 19 + > include/kvm/arm_vgic.h | 16 + > include/uapi/linux/kvm.h | 1 + > tools/arch/arm/include/uapi/asm/kvm.h | 1 + > 43 files changed, 3725 insertions(+), 255 deletions(-) > create mode 100644 arch/arm64/include/asm/vncr_mapping.h > create mode 100644 arch/arm64/kvm/at.c > create mode 100644 arch/arm64/kvm/vgic/vgic-v3-nested.c > > -- > 2.39.2 >
On Tue, 21 Nov 2023 16:49:52 +0000, Miguel Luis <miguel.luis@oracle.com> wrote: > > Hi Marc, > > > On 20 Nov 2023, at 12:09, Marc Zyngier <maz@kernel.org> wrote: > > > > This is the 5th drop of NV support on arm64 for this year, and most > > probably the last one for this side of Christmas. > > > > For the previous episodes, see [1]. > > > > What's changed: > > > > - Drop support for the original FEAT_NV. No existing hardware supports > > it without FEAT_NV2, and the architecture is deprecating the former > > entirely. This results in fewer patches, and a slightly simpler > > model overall. > > > > - Reorganise the series to make it a bit more logical now that FEAT_NV > > is gone. > > > > - Apply the NV idreg restrictions on VM first run rather than on each > > access. > > > > - Make the nested vgic shadow CPU interface a per-CPU structure rather > > than per-vcpu. > > > > - Fix the EL0 timer fastpath > > > > - Work around the architecture deficiencies when trapping WFI from a > > L2 guest. > > > > - Fix sampling of nested vgic state (MISR, ELRSR, EISR) > > > > - Drop the patches that have already been merged (NV trap forwarding, > > per-MMU VTCR) > > > > - Rebased on top of 6.7-rc2 + the FEAT_E2H0 support [2]. > > > > The branch containing these patches (and more) is at [3]. As for the > > previous rounds, my intention is to take a prefix of this series into > > 6.8, provided that it gets enough reviewing. > > > > [1] https://lore.kernel.org/r/20230515173103.1017669-1-maz@kernel.org > > [2] https://lore.kernel.org/r/20231120123721.851738-1-maz@kernel.org > > [3] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/nv-6.8-nv2-only > > > > While I was testing this with kvmtool for 5.16 I noted the following on dmesg: > > [ 803.014258] kvm [19040]: Unsupported guest sys_reg access at: 8129fa50 [600003c9] > { Op0( 3), Op1( 5), CRn( 1), CRm( 0), Op2( 2), func_read }, > > This is CPACR_EL12. CPACR_EL12 is redirected to VNCR[0x100]. It really shouldn't trap... > Still need yet to debug. Can you disassemble the guest around the offending PC? > As for QEMU, it is having issues enabling _EL2 feature although EL2 > is supported by checking KVM_CAP_ARM_EL2; need yet to debug this. The capability number changes at each release. Make sure you resync your includes. M.
On 21-11-2023 03:11 pm, Marc Zyngier wrote: > On Tue, 21 Nov 2023 09:26:22 +0000, > Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote: >> >> >> >> On 21-11-2023 02:38 pm, Marc Zyngier wrote: >>> On Tue, 21 Nov 2023 08:51:35 +0000, >>> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote: >>>> >>>> >>>> Hi Marc, >>>> >>>> On 20-11-2023 06:39 pm, Marc Zyngier wrote: >>>>> This is the 5th drop of NV support on arm64 for this year, and most >>>>> probably the last one for this side of Christmas. >>>>> >>>>> For the previous episodes, see [1]. >>>>> >>>>> What's changed: >>>>> >>>>> - Drop support for the original FEAT_NV. No existing hardware supports >>>>> it without FEAT_NV2, and the architecture is deprecating the former >>>>> entirely. This results in fewer patches, and a slightly simpler >>>>> model overall. >>>>> >>>>> - Reorganise the series to make it a bit more logical now that FEAT_NV >>>>> is gone. >>>>> >>>>> - Apply the NV idreg restrictions on VM first run rather than on each >>>>> access. >>>>> >>>>> - Make the nested vgic shadow CPU interface a per-CPU structure rather >>>>> than per-vcpu. >>>>> >>>>> - Fix the EL0 timer fastpath >>>>> >>>>> - Work around the architecture deficiencies when trapping WFI from a >>>>> L2 guest. >>>>> >>>>> - Fix sampling of nested vgic state (MISR, ELRSR, EISR) >>>>> >>>>> - Drop the patches that have already been merged (NV trap forwarding, >>>>> per-MMU VTCR) >>>>> >>>>> - Rebased on top of 6.7-rc2 + the FEAT_E2H0 support [2]. >>>>> >>>>> The branch containing these patches (and more) is at [3]. As for the >>>>> previous rounds, my intention is to take a prefix of this series into >>>>> 6.8, provided that it gets enough reviewing. >>>>> >>>>> [1] https://lore.kernel.org/r/20230515173103.1017669-1-maz@kernel.org >>>>> [2] https://lore.kernel.org/r/20231120123721.851738-1-maz@kernel.org >>>>> [3] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/nv-6.8-nv2-only >>>>> >>>> >>>> V11 series is not booting on Ampere platform (I am yet to debug). >>>> With lkvm, it is stuck at the very early stage itself and no early >>>> boot prints/logs. >>>> >>>> Are there any changes needed in kvmtool for V11? >>> >>> Not really, I'm still using the version I had built for 6.5. Is the >>> problem with L1 or L2? >> >> Stuck in the L1 itself. >> >> I am using kvmtool from >> https://git.kernel.org/pub/scm/linux/kernel/git/maz/kvmtool.git/log/?h=arm64/nv-5.16 > > Huh. That's positively ancient. Yet, you shouldn't get into a > situation where the L1 guest locks up. > > I have pushed out my kvmtool branch[1]. Please give it a go. > No change, still L1 hangs. Captured ftrace and the L1 is keep looping/faulting around same address across kvm_entry and kvm_exits. It is a weird behavior, L1 is faulting and looping around MDCR and AA64MMFR3_EL1 access in function __finalise_el2. asm: ffffffc080528a58: d53dc000 mrs x0, vbar_el12 ffffffc080528a5c: d518c000 msr vbar_el1, x0 ffffffc080528a60: d53c1120 mrs x0, mdcr_el2 ffffffc080528a64: 9272f400 and x0, x0, #0xffffffffffffcfff ffffffc080528a68: 9266f400 and x0, x0, #0xfffffffffcffffff ffffffc080528a6c: d51c1120 msr mdcr_el2, x0 ffffffc080528a70: d53d2040 mrs x0, tcr_el12 ffffffc080528a74: d5182040 msr tcr_el1, x0 ffffffc080528a78: d53d2000 mrs x0, ttbr0_el12 ffffffc080528a7c: d5182000 msr ttbr0_el1, x0 ffffffc080528a80: d53d2020 mrs x0, ttbr1_el12 ffffffc080528a84: d5182020 msr ttbr1_el1, x0 ffffffc080528a88: d53da200 mrs x0, mair_el12 ffffffc080528a8c: d518a200 msr mair_el1, x0 ffffffc080528a90: d5380761 mrs x1, s3_0_c0_c7_3 ffffffc080528a94: d3400c21 ubfx x1, x1, #0, #4 ffffffc080528a98: b4000141 cbz x1, ffffffc080528ac0 <__finalise_el2+0x270> ffffffc080528a9c: d53d2060 mrs x0, s3_5_c2_c0_3 ftrace: kvm-vcpu-0-88776 [001] ...1. 6076.581774: kvm_exit: TRAP: HSR_EC: 0x0018 (SYS64), PC: 0x0000000080528a6c kvm-vcpu-0-88776 [001] d..1. 6076.581774: kvm_entry: PC: 0x0000000080528a6c kvm-vcpu-0-88776 [001] ...1. 6076.581775: kvm_exit: TRAP: HSR_EC: 0x0018 (SYS64), PC: 0x0000000080528a90 kvm-vcpu-0-88776 [001] d..1. 6076.581776: kvm_entry: PC: 0x0000000080528a90 kvm-vcpu-0-88776 [001] ...1. 6076.581778: kvm_exit: TRAP: HSR_EC: 0x0018 (SYS64), PC: 0x0000000080528a60 kvm-vcpu-0-88776 [001] d..1. 6076.581778: kvm_entry: PC: 0x0000000080528a60 kvm-vcpu-0-88776 [001] ...1. 6076.581779: kvm_exit: TRAP: HSR_EC: 0x0018 (SYS64), PC: 0x0000000080528a6c kvm-vcpu-0-88776 [001] d..1. 6076.581779: kvm_entry: PC: 0x0000000080528a6c kvm-vcpu-0-88776 [001] ...1. 6076.581780: kvm_exit: TRAP: HSR_EC: 0x0018 (SYS64), PC: 0x0000000080528a90 kvm-vcpu-0-88776 [001] d..1. 6076.581781: kvm_entry: PC: 0x0000000080528a90 kvm-vcpu-0-88776 [001] ...1. 6076.581783: kvm_exit: TRAP: HSR_EC: 0x0018 (SYS64), PC: 0x0000000080528a60 kvm-vcpu-0-88776 [001] d..1. 6076.581783: kvm_entry: PC: 0x0000000080528a60 kvm-vcpu-0-88776 [001] ...1. 6076.581784: kvm_exit: TRAP: HSR_EC: 0x0018 (SYS64), PC: 0x0000000080528a6c kvm-vcpu-0-88776 [001] d..1. 6076.581784: kvm_entry: PC: 0x0000000080528a6c kvm-vcpu-0-88776 [001] ...1. 6076.581785: kvm_exit: TRAP: HSR_EC: 0x0018 (SYS64), PC: 0x0000000080528a90 kvm-vcpu-0-88776 [001] d..1. 6076.581786: kvm_entry: PC: 0x0000000080528a90 kvm-vcpu-0-88776 [001] ...1. 6076.581788: kvm_exit: TRAP: HSR_EC: 0x0018 (SYS64), PC: 0x0000000080528a60 kvm-vcpu-0-88776 [001] d..1. 6076.581788: kvm_entry: PC: 0x0000000080528a60 kvm-vcpu-0-88776 [001] ...1. 6076.581789: kvm_exit: TRAP: HSR_EC: 0x0018 (SYS64), PC: 0x0000000080528a6c kvm-vcpu-0-88776 [001] d..1. 6076.581789: kvm_entry: PC: 0x0000000080528a6c kvm-vcpu-0-88776 [001] ...1. 6076.581790: kvm_exit: TRAP: HSR_EC: 0x0018 (SYS64), PC: 0x0000000080528a90 Thanks, Ganapat
On Wed, 22 Nov 2023 11:10:10 +0000, Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote: > > > No change, still L1 hangs. Captured ftrace and the L1 is keep > looping/faulting around same address across kvm_entry and kvm_exits. > > It is a weird behavior, L1 is faulting and looping around MDCR and > AA64MMFR3_EL1 access in function __finalise_el2. I really can't see how this happens. There are no backward branches, and we don't seem to reach the ERET either. So something must affect the state after the trap of ID_AA64MMFR3_EL1. M.
Hi Marc, On 21/11/2023 18:02, Marc Zyngier wrote: > On Tue, 21 Nov 2023 16:49:52 +0000, > Miguel Luis <miguel.luis@oracle.com> wrote: >> Hi Marc, >> >>> On 20 Nov 2023, at 12:09, Marc Zyngier <maz@kernel.org> wrote: >>> >>> This is the 5th drop of NV support on arm64 for this year, and most >>> probably the last one for this side of Christmas. >>> >>> For the previous episodes, see [1]. >>> >>> What's changed: >>> >>> - Drop support for the original FEAT_NV. No existing hardware supports >>> it without FEAT_NV2, and the architecture is deprecating the former >>> entirely. This results in fewer patches, and a slightly simpler >>> model overall. >>> >>> - Reorganise the series to make it a bit more logical now that FEAT_NV >>> is gone. >>> >>> - Apply the NV idreg restrictions on VM first run rather than on each >>> access. >>> >>> - Make the nested vgic shadow CPU interface a per-CPU structure rather >>> than per-vcpu. >>> >>> - Fix the EL0 timer fastpath >>> >>> - Work around the architecture deficiencies when trapping WFI from a >>> L2 guest. >>> >>> - Fix sampling of nested vgic state (MISR, ELRSR, EISR) >>> >>> - Drop the patches that have already been merged (NV trap forwarding, >>> per-MMU VTCR) >>> >>> - Rebased on top of 6.7-rc2 + the FEAT_E2H0 support [2]. >>> >>> The branch containing these patches (and more) is at [3]. As for the >>> previous rounds, my intention is to take a prefix of this series into >>> 6.8, provided that it gets enough reviewing. >>> >>> [1] https://lore.kernel.org/r/20230515173103.1017669-1-maz@kernel.org >>> [2] https://lore.kernel.org/r/20231120123721.851738-1-maz@kernel.org >>> [3] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/nv-6.8-nv2-only >>> >> While I was testing this with kvmtool for 5.16 I noted the following on dmesg: >> >> [ 803.014258] kvm [19040]: Unsupported guest sys_reg access at: 8129fa50 [600003c9] >> { Op0( 3), Op1( 5), CRn( 1), CRm( 0), Op2( 2), func_read }, >> >> This is CPACR_EL12. > CPACR_EL12 is redirected to VNCR[0x100]. It really shouldn't trap... > >> Still need yet to debug. > Can you disassemble the guest around the offending PC? [ 1248.686350] kvm [7013]: Unsupported guest sys_reg access at: 812baa50 [600003c9] { Op0( 3), Op1( 5), CRn( 1), CRm( 0), Op2( 2), func_read }, 12baa00: 14000008 b 0x12baa20 12baa04: d000d501 adrp x1, 0x2d5c000 12baa08: 91154021 add x1, x1, #0x550 12baa0c: f9400022 ldr x2, [x1] 12baa10: f9400421 ldr x1, [x1, #8] 12baa14: 8a010042 and x2, x2, x1 12baa18: d3441c42 ubfx x2, x2, #4, #4 12baa1c: b4000082 cbz x2, 0x12baa2c 12baa20: d2a175a0 mov x0, #0xbad0000 // #195887104 12baa24: f2994220 movk x0, #0xca11 12baa28: d69f03e0 eret 12baa2c: d2c00080 mov x0, #0x400000000 // #17179869184 12baa30: f2b10000 movk x0, #0x8800, lsl #16 12baa34: f2800000 movk x0, #0x0 12baa38: d51c1100 msr hcr_el2, x0 12baa3c: d5033fdf isb 12baa40: d53c4100 mrs x0, sp_el1 12baa44: 9100001f mov sp, x0 12baa48: d538d080 mrs x0, tpidr_el1 12baa4c: d51cd040 msr tpidr_el2, x0 12baa50: d53d1040 mrs x0, cpacr_el12 12baa54: d5181040 msr cpacr_el1, x0 12baa58: d53dc000 mrs x0, vbar_el12 12baa5c: d518c000 msr vbar_el1, x0 12baa60: d53c1120 mrs x0, mdcr_el2 12baa64: 9272f400 and x0, x0, #0xffffffffffffcfff 12baa68: 9266f400 and x0, x0, #0xfffffffffcffffff 12baa6c: d51c1120 msr mdcr_el2, x0 12baa70: d53d2040 mrs x0, tcr_el12 12baa74: d5182040 msr tcr_el1, x0 12baa78: d53d2000 mrs x0, ttbr0_el12 12baa7c: d5182000 msr ttbr0_el1, x0 12baa80: d53d2020 mrs x0, ttbr1_el12 12baa84: d5182020 msr ttbr1_el1, x0 12baa88: d53da200 mrs x0, mair_el12 12baa8c: d518a200 msr mair_el1, x0 12baa90: d5380761 mrs x1, s3_0_c0_c7_3 12baa94: d3400c21 ubfx x1, x1, #0, #4 12baa98: b4000141 cbz x1, 0x12baac0 12baa9c: d53d2060 mrs x0, s3_5_c2_c0_3 >> As for QEMU, it is having issues enabling _EL2 feature although EL2 >> is supported by checking KVM_CAP_ARM_EL2; need yet to debug this. > The capability number changes at each release. Make sure you resync > your includes. Been there but it seems a different problem this time. Thank you Miguel > M. >
On Thu, 23 Nov 2023 16:21:48 +0000, Miguel Luis <miguel.luis@oracle.com> wrote: > > Hi Marc, > > On 21/11/2023 18:02, Marc Zyngier wrote: > > On Tue, 21 Nov 2023 16:49:52 +0000, > > Miguel Luis <miguel.luis@oracle.com> wrote: > >> Hi Marc, > >> > >>> On 20 Nov 2023, at 12:09, Marc Zyngier <maz@kernel.org> wrote: > >>> > >>> This is the 5th drop of NV support on arm64 for this year, and most > >>> probably the last one for this side of Christmas. > >>> > >>> For the previous episodes, see [1]. > >>> > >>> What's changed: > >>> > >>> - Drop support for the original FEAT_NV. No existing hardware supports > >>> it without FEAT_NV2, and the architecture is deprecating the former > >>> entirely. This results in fewer patches, and a slightly simpler > >>> model overall. > >>> > >>> - Reorganise the series to make it a bit more logical now that FEAT_NV > >>> is gone. > >>> > >>> - Apply the NV idreg restrictions on VM first run rather than on each > >>> access. > >>> > >>> - Make the nested vgic shadow CPU interface a per-CPU structure rather > >>> than per-vcpu. > >>> > >>> - Fix the EL0 timer fastpath > >>> > >>> - Work around the architecture deficiencies when trapping WFI from a > >>> L2 guest. > >>> > >>> - Fix sampling of nested vgic state (MISR, ELRSR, EISR) > >>> > >>> - Drop the patches that have already been merged (NV trap forwarding, > >>> per-MMU VTCR) > >>> > >>> - Rebased on top of 6.7-rc2 + the FEAT_E2H0 support [2]. > >>> > >>> The branch containing these patches (and more) is at [3]. As for the > >>> previous rounds, my intention is to take a prefix of this series into > >>> 6.8, provided that it gets enough reviewing. > >>> > >>> [1] https://lore.kernel.org/r/20230515173103.1017669-1-maz@kernel.org > >>> [2] https://lore.kernel.org/r/20231120123721.851738-1-maz@kernel.org > >>> [3] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/nv-6.8-nv2-only > >>> > >> While I was testing this with kvmtool for 5.16 I noted the following on dmesg: > >> > >> [ 803.014258] kvm [19040]: Unsupported guest sys_reg access at: 8129fa50 [600003c9] > >> { Op0( 3), Op1( 5), CRn( 1), CRm( 0), Op2( 2), func_read }, > >> > >> This is CPACR_EL12. > > CPACR_EL12 is redirected to VNCR[0x100]. It really shouldn't trap... > > > >> Still need yet to debug. > > Can you disassemble the guest around the offending PC? > > [ 1248.686350] kvm [7013]: Unsupported guest sys_reg access at: 812baa50 [600003c9] > { Op0( 3), Op1( 5), CRn( 1), CRm( 0), Op2( 2), func_read }, > > 12baa00: 14000008 b 0x12baa20 > 12baa04: d000d501 adrp x1, 0x2d5c000 > 12baa08: 91154021 add x1, x1, #0x550 > 12baa0c: f9400022 ldr x2, [x1] > 12baa10: f9400421 ldr x1, [x1, #8] > 12baa14: 8a010042 and x2, x2, x1 > 12baa18: d3441c42 ubfx x2, x2, #4, #4 > 12baa1c: b4000082 cbz x2, 0x12baa2c > 12baa20: d2a175a0 mov x0, #0xbad0000 // #195887104 > 12baa24: f2994220 movk x0, #0xca11 > 12baa28: d69f03e0 eret > 12baa2c: d2c00080 mov x0, #0x400000000 // #17179869184 > 12baa30: f2b10000 movk x0, #0x8800, lsl #16 > 12baa34: f2800000 movk x0, #0x0 > 12baa38: d51c1100 msr hcr_el2, x0 > 12baa3c: d5033fdf isb > 12baa40: d53c4100 mrs x0, sp_el1 > 12baa44: 9100001f mov sp, x0 > 12baa48: d538d080 mrs x0, tpidr_el1 > 12baa4c: d51cd040 msr tpidr_el2, x0 > 12baa50: d53d1040 mrs x0, cpacr_el12 > 12baa54: d5181040 msr cpacr_el1, x0 > 12baa58: d53dc000 mrs x0, vbar_el12 > 12baa5c: d518c000 msr vbar_el1, x0 > 12baa60: d53c1120 mrs x0, mdcr_el2 > 12baa64: 9272f400 and x0, x0, #0xffffffffffffcfff > 12baa68: 9266f400 and x0, x0, #0xfffffffffcffffff > 12baa6c: d51c1120 msr mdcr_el2, x0 > 12baa70: d53d2040 mrs x0, tcr_el12 > 12baa74: d5182040 msr tcr_el1, x0 > 12baa78: d53d2000 mrs x0, ttbr0_el12 > 12baa7c: d5182000 msr ttbr0_el1, x0 > 12baa80: d53d2020 mrs x0, ttbr1_el12 > 12baa84: d5182020 msr ttbr1_el1, x0 > 12baa88: d53da200 mrs x0, mair_el12 > 12baa8c: d518a200 msr mair_el1, x0 > 12baa90: d5380761 mrs x1, s3_0_c0_c7_3 > 12baa94: d3400c21 ubfx x1, x1, #0, #4 > 12baa98: b4000141 cbz x1, 0x12baac0 > 12baa9c: d53d2060 mrs x0, s3_5_c2_c0_3 OK, this is suspiciously close to the location Ganapatrao was having issues with. Are you running on the same hardware? In any case, we should never take a trap for this access. Can you dump HCR_EL2 at the point where the guest traps (in switch.c)? > >> As for QEMU, it is having issues enabling _EL2 feature although EL2 > >> is supported by checking KVM_CAP_ARM_EL2; need yet to debug this. > > The capability number changes at each release. Make sure you resync > > your includes. > > Been there but it seems a different problem this time. Creating the VM with SVE? NV doesn't support it yet (and it has been the case for a long while). M.
On 23-11-2023 10:14 pm, Marc Zyngier wrote: > On Thu, 23 Nov 2023 16:21:48 +0000, > Miguel Luis <miguel.luis@oracle.com> wrote: >> >> Hi Marc, >> >> On 21/11/2023 18:02, Marc Zyngier wrote: >>> On Tue, 21 Nov 2023 16:49:52 +0000, >>> Miguel Luis <miguel.luis@oracle.com> wrote: >>>> Hi Marc, >>>> >>>>> On 20 Nov 2023, at 12:09, Marc Zyngier <maz@kernel.org> wrote: >>>>> >>>>> This is the 5th drop of NV support on arm64 for this year, and most >>>>> probably the last one for this side of Christmas. >>>>> >>>>> For the previous episodes, see [1]. >>>>> >>>>> What's changed: >>>>> >>>>> - Drop support for the original FEAT_NV. No existing hardware supports >>>>> it without FEAT_NV2, and the architecture is deprecating the former >>>>> entirely. This results in fewer patches, and a slightly simpler >>>>> model overall. >>>>> >>>>> - Reorganise the series to make it a bit more logical now that FEAT_NV >>>>> is gone. >>>>> >>>>> - Apply the NV idreg restrictions on VM first run rather than on each >>>>> access. >>>>> >>>>> - Make the nested vgic shadow CPU interface a per-CPU structure rather >>>>> than per-vcpu. >>>>> >>>>> - Fix the EL0 timer fastpath >>>>> >>>>> - Work around the architecture deficiencies when trapping WFI from a >>>>> L2 guest. >>>>> >>>>> - Fix sampling of nested vgic state (MISR, ELRSR, EISR) >>>>> >>>>> - Drop the patches that have already been merged (NV trap forwarding, >>>>> per-MMU VTCR) >>>>> >>>>> - Rebased on top of 6.7-rc2 + the FEAT_E2H0 support [2]. >>>>> >>>>> The branch containing these patches (and more) is at [3]. As for the >>>>> previous rounds, my intention is to take a prefix of this series into >>>>> 6.8, provided that it gets enough reviewing. >>>>> >>>>> [1] https://lore.kernel.org/r/20230515173103.1017669-1-maz@kernel.org >>>>> [2] https://lore.kernel.org/r/20231120123721.851738-1-maz@kernel.org >>>>> [3] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/nv-6.8-nv2-only >>>>> >>>> While I was testing this with kvmtool for 5.16 I noted the following on dmesg: >>>> >>>> [ 803.014258] kvm [19040]: Unsupported guest sys_reg access at: 8129fa50 [600003c9] >>>> { Op0( 3), Op1( 5), CRn( 1), CRm( 0), Op2( 2), func_read }, >>>> >>>> This is CPACR_EL12. >>> CPACR_EL12 is redirected to VNCR[0x100]. It really shouldn't trap... >>> >>>> Still need yet to debug. >>> Can you disassemble the guest around the offending PC? >> >> [ 1248.686350] kvm [7013]: Unsupported guest sys_reg access at: 812baa50 [600003c9] >> { Op0( 3), Op1( 5), CRn( 1), CRm( 0), Op2( 2), func_read }, >> >> 12baa00: 14000008 b 0x12baa20 >> 12baa04: d000d501 adrp x1, 0x2d5c000 >> 12baa08: 91154021 add x1, x1, #0x550 >> 12baa0c: f9400022 ldr x2, [x1] >> 12baa10: f9400421 ldr x1, [x1, #8] >> 12baa14: 8a010042 and x2, x2, x1 >> 12baa18: d3441c42 ubfx x2, x2, #4, #4 >> 12baa1c: b4000082 cbz x2, 0x12baa2c >> 12baa20: d2a175a0 mov x0, #0xbad0000 // #195887104 >> 12baa24: f2994220 movk x0, #0xca11 >> 12baa28: d69f03e0 eret >> 12baa2c: d2c00080 mov x0, #0x400000000 // #17179869184 >> 12baa30: f2b10000 movk x0, #0x8800, lsl #16 >> 12baa34: f2800000 movk x0, #0x0 >> 12baa38: d51c1100 msr hcr_el2, x0 >> 12baa3c: d5033fdf isb >> 12baa40: d53c4100 mrs x0, sp_el1 >> 12baa44: 9100001f mov sp, x0 >> 12baa48: d538d080 mrs x0, tpidr_el1 >> 12baa4c: d51cd040 msr tpidr_el2, x0 >> 12baa50: d53d1040 mrs x0, cpacr_el12 >> 12baa54: d5181040 msr cpacr_el1, x0 >> 12baa58: d53dc000 mrs x0, vbar_el12 >> 12baa5c: d518c000 msr vbar_el1, x0 >> 12baa60: d53c1120 mrs x0, mdcr_el2 >> 12baa64: 9272f400 and x0, x0, #0xffffffffffffcfff >> 12baa68: 9266f400 and x0, x0, #0xfffffffffcffffff >> 12baa6c: d51c1120 msr mdcr_el2, x0 >> 12baa70: d53d2040 mrs x0, tcr_el12 >> 12baa74: d5182040 msr tcr_el1, x0 >> 12baa78: d53d2000 mrs x0, ttbr0_el12 >> 12baa7c: d5182000 msr ttbr0_el1, x0 >> 12baa80: d53d2020 mrs x0, ttbr1_el12 >> 12baa84: d5182020 msr ttbr1_el1, x0 >> 12baa88: d53da200 mrs x0, mair_el12 >> 12baa8c: d518a200 msr mair_el1, x0 >> 12baa90: d5380761 mrs x1, s3_0_c0_c7_3 >> 12baa94: d3400c21 ubfx x1, x1, #0, #4 >> 12baa98: b4000141 cbz x1, 0x12baac0 >> 12baa9c: d53d2060 mrs x0, s3_5_c2_c0_3 > > OK, this is suspiciously close to the location Ganapatrao was having > issues with. Are you running on the same hardware? > > In any case, we should never take a trap for this access. Can you dump > HCR_EL2 at the point where the guest traps (in switch.c)? > I have dumped HCR_EL2 before entry to L1 in both V11 and V10. on V10 HCR_EL2=0x2743c827c263f on V11 HCR_EL2=0x27c3c827c263f on V11 the function vcpu_el2_e2h_is_set(vcpu) is returning false resulting in NV1 bit set along with NV and NV2. AFAIK, For L1 to be in VHE, NV1 bit should be zero and NV=NV2=1. I could boot L1 then L2, if I hack vcpu_el2_e2h_is_set to return true. There could be a bug in V11 or E2H0 patchset resulting in vcpu_el2_e2h_is_set() returning false? >>>> As for QEMU, it is having issues enabling _EL2 feature although EL2 >>>> is supported by checking KVM_CAP_ARM_EL2; need yet to debug this. >>> The capability number changes at each release. Make sure you resync >>> your includes. >> >> Been there but it seems a different problem this time. > > Creating the VM with SVE? NV doesn't support it yet (and it has been > the case for a long while). > > M. > Thanks, Ganapat
On Fri, 24 Nov 2023 09:50:33 +0000, Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote: > > > > On 23-11-2023 10:14 pm, Marc Zyngier wrote: > > On Thu, 23 Nov 2023 16:21:48 +0000, > > Miguel Luis <miguel.luis@oracle.com> wrote: > >> > >> Hi Marc, > >> > >> On 21/11/2023 18:02, Marc Zyngier wrote: > >>> On Tue, 21 Nov 2023 16:49:52 +0000, > >>> Miguel Luis <miguel.luis@oracle.com> wrote: > >>>> Hi Marc, > >>>> > >>>>> On 20 Nov 2023, at 12:09, Marc Zyngier <maz@kernel.org> wrote: > >>>>> > >>>>> This is the 5th drop of NV support on arm64 for this year, and most > >>>>> probably the last one for this side of Christmas. > >>>>> > >>>>> For the previous episodes, see [1]. > >>>>> > >>>>> What's changed: > >>>>> > >>>>> - Drop support for the original FEAT_NV. No existing hardware supports > >>>>> it without FEAT_NV2, and the architecture is deprecating the former > >>>>> entirely. This results in fewer patches, and a slightly simpler > >>>>> model overall. > >>>>> > >>>>> - Reorganise the series to make it a bit more logical now that FEAT_NV > >>>>> is gone. > >>>>> > >>>>> - Apply the NV idreg restrictions on VM first run rather than on each > >>>>> access. > >>>>> > >>>>> - Make the nested vgic shadow CPU interface a per-CPU structure rather > >>>>> than per-vcpu. > >>>>> > >>>>> - Fix the EL0 timer fastpath > >>>>> > >>>>> - Work around the architecture deficiencies when trapping WFI from a > >>>>> L2 guest. > >>>>> > >>>>> - Fix sampling of nested vgic state (MISR, ELRSR, EISR) > >>>>> > >>>>> - Drop the patches that have already been merged (NV trap forwarding, > >>>>> per-MMU VTCR) > >>>>> > >>>>> - Rebased on top of 6.7-rc2 + the FEAT_E2H0 support [2]. > >>>>> > >>>>> The branch containing these patches (and more) is at [3]. As for the > >>>>> previous rounds, my intention is to take a prefix of this series into > >>>>> 6.8, provided that it gets enough reviewing. > >>>>> > >>>>> [1] https://lore.kernel.org/r/20230515173103.1017669-1-maz@kernel.org > >>>>> [2] https://lore.kernel.org/r/20231120123721.851738-1-maz@kernel.org > >>>>> [3] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/nv-6.8-nv2-only > >>>>> > >>>> While I was testing this with kvmtool for 5.16 I noted the following on dmesg: > >>>> > >>>> [ 803.014258] kvm [19040]: Unsupported guest sys_reg access at: 8129fa50 [600003c9] > >>>> { Op0( 3), Op1( 5), CRn( 1), CRm( 0), Op2( 2), func_read }, > >>>> > >>>> This is CPACR_EL12. > >>> CPACR_EL12 is redirected to VNCR[0x100]. It really shouldn't trap... > >>> > >>>> Still need yet to debug. > >>> Can you disassemble the guest around the offending PC? > >> > >> [ 1248.686350] kvm [7013]: Unsupported guest sys_reg access at: 812baa50 [600003c9] > >> { Op0( 3), Op1( 5), CRn( 1), CRm( 0), Op2( 2), func_read }, > >> > >> 12baa00: 14000008 b 0x12baa20 > >> 12baa04: d000d501 adrp x1, 0x2d5c000 > >> 12baa08: 91154021 add x1, x1, #0x550 > >> 12baa0c: f9400022 ldr x2, [x1] > >> 12baa10: f9400421 ldr x1, [x1, #8] > >> 12baa14: 8a010042 and x2, x2, x1 > >> 12baa18: d3441c42 ubfx x2, x2, #4, #4 > >> 12baa1c: b4000082 cbz x2, 0x12baa2c > >> 12baa20: d2a175a0 mov x0, #0xbad0000 // #195887104 > >> 12baa24: f2994220 movk x0, #0xca11 > >> 12baa28: d69f03e0 eret > >> 12baa2c: d2c00080 mov x0, #0x400000000 // #17179869184 > >> 12baa30: f2b10000 movk x0, #0x8800, lsl #16 > >> 12baa34: f2800000 movk x0, #0x0 > >> 12baa38: d51c1100 msr hcr_el2, x0 > >> 12baa3c: d5033fdf isb > >> 12baa40: d53c4100 mrs x0, sp_el1 > >> 12baa44: 9100001f mov sp, x0 > >> 12baa48: d538d080 mrs x0, tpidr_el1 > >> 12baa4c: d51cd040 msr tpidr_el2, x0 > >> 12baa50: d53d1040 mrs x0, cpacr_el12 > >> 12baa54: d5181040 msr cpacr_el1, x0 > >> 12baa58: d53dc000 mrs x0, vbar_el12 > >> 12baa5c: d518c000 msr vbar_el1, x0 > >> 12baa60: d53c1120 mrs x0, mdcr_el2 > >> 12baa64: 9272f400 and x0, x0, #0xffffffffffffcfff > >> 12baa68: 9266f400 and x0, x0, #0xfffffffffcffffff > >> 12baa6c: d51c1120 msr mdcr_el2, x0 > >> 12baa70: d53d2040 mrs x0, tcr_el12 > >> 12baa74: d5182040 msr tcr_el1, x0 > >> 12baa78: d53d2000 mrs x0, ttbr0_el12 > >> 12baa7c: d5182000 msr ttbr0_el1, x0 > >> 12baa80: d53d2020 mrs x0, ttbr1_el12 > >> 12baa84: d5182020 msr ttbr1_el1, x0 > >> 12baa88: d53da200 mrs x0, mair_el12 > >> 12baa8c: d518a200 msr mair_el1, x0 > >> 12baa90: d5380761 mrs x1, s3_0_c0_c7_3 > >> 12baa94: d3400c21 ubfx x1, x1, #0, #4 > >> 12baa98: b4000141 cbz x1, 0x12baac0 > >> 12baa9c: d53d2060 mrs x0, s3_5_c2_c0_3 > > > > OK, this is suspiciously close to the location Ganapatrao was having > > issues with. Are you running on the same hardware? > > > > In any case, we should never take a trap for this access. Can you dump > > HCR_EL2 at the point where the guest traps (in switch.c)? > > > > I have dumped HCR_EL2 before entry to L1 in both V11 and V10. > on V10 HCR_EL2=0x2743c827c263f > on V11 HCR_EL2=0x27c3c827c263f > > on V11 the function vcpu_el2_e2h_is_set(vcpu) is returning false > resulting in NV1 bit set along with NV and NV2. > AFAIK, For L1 to be in VHE, NV1 bit should be zero and NV=NV2=1. > > I could boot L1 then L2, if I hack vcpu_el2_e2h_is_set to return true. > There could be a bug in V11 or E2H0 patchset resulting in > vcpu_el2_e2h_is_set() returning false? The E2H0 series should only force vcpu_el2_e2h_is_set() to return true, but not set it to false. Can you dump the *guest's* version of HCR_EL2 at this point? M.
On 24-11-2023 03:49 pm, Marc Zyngier wrote: > On Fri, 24 Nov 2023 09:50:33 +0000, > Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote: >> >> >> >> On 23-11-2023 10:14 pm, Marc Zyngier wrote: >>> On Thu, 23 Nov 2023 16:21:48 +0000, >>> Miguel Luis <miguel.luis@oracle.com> wrote: >>>> >>>> Hi Marc, >>>> >>>> On 21/11/2023 18:02, Marc Zyngier wrote: >>>>> On Tue, 21 Nov 2023 16:49:52 +0000, >>>>> Miguel Luis <miguel.luis@oracle.com> wrote: >>>>>> Hi Marc, >>>>>> >>>>>>> On 20 Nov 2023, at 12:09, Marc Zyngier <maz@kernel.org> wrote: >>>>>>> >>>>>>> This is the 5th drop of NV support on arm64 for this year, and most >>>>>>> probably the last one for this side of Christmas. >>>>>>> >>>>>>> For the previous episodes, see [1]. >>>>>>> >>>>>>> What's changed: >>>>>>> >>>>>>> - Drop support for the original FEAT_NV. No existing hardware supports >>>>>>> it without FEAT_NV2, and the architecture is deprecating the former >>>>>>> entirely. This results in fewer patches, and a slightly simpler >>>>>>> model overall. >>>>>>> >>>>>>> - Reorganise the series to make it a bit more logical now that FEAT_NV >>>>>>> is gone. >>>>>>> >>>>>>> - Apply the NV idreg restrictions on VM first run rather than on each >>>>>>> access. >>>>>>> >>>>>>> - Make the nested vgic shadow CPU interface a per-CPU structure rather >>>>>>> than per-vcpu. >>>>>>> >>>>>>> - Fix the EL0 timer fastpath >>>>>>> >>>>>>> - Work around the architecture deficiencies when trapping WFI from a >>>>>>> L2 guest. >>>>>>> >>>>>>> - Fix sampling of nested vgic state (MISR, ELRSR, EISR) >>>>>>> >>>>>>> - Drop the patches that have already been merged (NV trap forwarding, >>>>>>> per-MMU VTCR) >>>>>>> >>>>>>> - Rebased on top of 6.7-rc2 + the FEAT_E2H0 support [2]. >>>>>>> >>>>>>> The branch containing these patches (and more) is at [3]. As for the >>>>>>> previous rounds, my intention is to take a prefix of this series into >>>>>>> 6.8, provided that it gets enough reviewing. >>>>>>> >>>>>>> [1] https://lore.kernel.org/r/20230515173103.1017669-1-maz@kernel.org >>>>>>> [2] https://lore.kernel.org/r/20231120123721.851738-1-maz@kernel.org >>>>>>> [3] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/nv-6.8-nv2-only >>>>>>> >>>>>> While I was testing this with kvmtool for 5.16 I noted the following on dmesg: >>>>>> >>>>>> [ 803.014258] kvm [19040]: Unsupported guest sys_reg access at: 8129fa50 [600003c9] >>>>>> { Op0( 3), Op1( 5), CRn( 1), CRm( 0), Op2( 2), func_read }, >>>>>> >>>>>> This is CPACR_EL12. >>>>> CPACR_EL12 is redirected to VNCR[0x100]. It really shouldn't trap... >>>>> >>>>>> Still need yet to debug. >>>>> Can you disassemble the guest around the offending PC? >>>> >>>> [ 1248.686350] kvm [7013]: Unsupported guest sys_reg access at: 812baa50 [600003c9] >>>> { Op0( 3), Op1( 5), CRn( 1), CRm( 0), Op2( 2), func_read }, >>>> >>>> 12baa00: 14000008 b 0x12baa20 >>>> 12baa04: d000d501 adrp x1, 0x2d5c000 >>>> 12baa08: 91154021 add x1, x1, #0x550 >>>> 12baa0c: f9400022 ldr x2, [x1] >>>> 12baa10: f9400421 ldr x1, [x1, #8] >>>> 12baa14: 8a010042 and x2, x2, x1 >>>> 12baa18: d3441c42 ubfx x2, x2, #4, #4 >>>> 12baa1c: b4000082 cbz x2, 0x12baa2c >>>> 12baa20: d2a175a0 mov x0, #0xbad0000 // #195887104 >>>> 12baa24: f2994220 movk x0, #0xca11 >>>> 12baa28: d69f03e0 eret >>>> 12baa2c: d2c00080 mov x0, #0x400000000 // #17179869184 >>>> 12baa30: f2b10000 movk x0, #0x8800, lsl #16 >>>> 12baa34: f2800000 movk x0, #0x0 >>>> 12baa38: d51c1100 msr hcr_el2, x0 >>>> 12baa3c: d5033fdf isb >>>> 12baa40: d53c4100 mrs x0, sp_el1 >>>> 12baa44: 9100001f mov sp, x0 >>>> 12baa48: d538d080 mrs x0, tpidr_el1 >>>> 12baa4c: d51cd040 msr tpidr_el2, x0 >>>> 12baa50: d53d1040 mrs x0, cpacr_el12 >>>> 12baa54: d5181040 msr cpacr_el1, x0 >>>> 12baa58: d53dc000 mrs x0, vbar_el12 >>>> 12baa5c: d518c000 msr vbar_el1, x0 >>>> 12baa60: d53c1120 mrs x0, mdcr_el2 >>>> 12baa64: 9272f400 and x0, x0, #0xffffffffffffcfff >>>> 12baa68: 9266f400 and x0, x0, #0xfffffffffcffffff >>>> 12baa6c: d51c1120 msr mdcr_el2, x0 >>>> 12baa70: d53d2040 mrs x0, tcr_el12 >>>> 12baa74: d5182040 msr tcr_el1, x0 >>>> 12baa78: d53d2000 mrs x0, ttbr0_el12 >>>> 12baa7c: d5182000 msr ttbr0_el1, x0 >>>> 12baa80: d53d2020 mrs x0, ttbr1_el12 >>>> 12baa84: d5182020 msr ttbr1_el1, x0 >>>> 12baa88: d53da200 mrs x0, mair_el12 >>>> 12baa8c: d518a200 msr mair_el1, x0 >>>> 12baa90: d5380761 mrs x1, s3_0_c0_c7_3 >>>> 12baa94: d3400c21 ubfx x1, x1, #0, #4 >>>> 12baa98: b4000141 cbz x1, 0x12baac0 >>>> 12baa9c: d53d2060 mrs x0, s3_5_c2_c0_3 >>> >>> OK, this is suspiciously close to the location Ganapatrao was having >>> issues with. Are you running on the same hardware? >>> >>> In any case, we should never take a trap for this access. Can you dump >>> HCR_EL2 at the point where the guest traps (in switch.c)? >>> >> >> I have dumped HCR_EL2 before entry to L1 in both V11 and V10. >> on V10 HCR_EL2=0x2743c827c263f >> on V11 HCR_EL2=0x27c3c827c263f >> >> on V11 the function vcpu_el2_e2h_is_set(vcpu) is returning false >> resulting in NV1 bit set along with NV and NV2. >> AFAIK, For L1 to be in VHE, NV1 bit should be zero and NV=NV2=1. >> >> I could boot L1 then L2, if I hack vcpu_el2_e2h_is_set to return true. >> There could be a bug in V11 or E2H0 patchset resulting in >> vcpu_el2_e2h_is_set() returning false? > > The E2H0 series should only force vcpu_el2_e2h_is_set() to return > true, but not set it to false. Can you dump the *guest's* version of > HCR_EL2 at this point? > with V11: vhcr_el2=0x100030080000000 mask=0x100af00ffffffff with V10: vhcr_el2=0x488000000 with hack+V11: vhcr_el2=0x488000000 mask=0x100af00ffffffff Thanks, Ganapat
On Fri, 24 Nov 2023 12:34:41 +0000, Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote: > > > > On 24-11-2023 03:49 pm, Marc Zyngier wrote: > > On Fri, 24 Nov 2023 09:50:33 +0000, > > Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote: > >> > >> > >> > >> On 23-11-2023 10:14 pm, Marc Zyngier wrote: > >>> On Thu, 23 Nov 2023 16:21:48 +0000, > >>> Miguel Luis <miguel.luis@oracle.com> wrote: > >>>> > >>>> Hi Marc, > >>>> > >>>> On 21/11/2023 18:02, Marc Zyngier wrote: > >>>>> On Tue, 21 Nov 2023 16:49:52 +0000, > >>>>> Miguel Luis <miguel.luis@oracle.com> wrote: > >>>>>> Hi Marc, > >>>>>> > >>>>>>> On 20 Nov 2023, at 12:09, Marc Zyngier <maz@kernel.org> wrote: > >>>>>>> > >>>>>>> This is the 5th drop of NV support on arm64 for this year, and most > >>>>>>> probably the last one for this side of Christmas. > >>>>>>> > >>>>>>> For the previous episodes, see [1]. > >>>>>>> > >>>>>>> What's changed: > >>>>>>> > >>>>>>> - Drop support for the original FEAT_NV. No existing hardware supports > >>>>>>> it without FEAT_NV2, and the architecture is deprecating the former > >>>>>>> entirely. This results in fewer patches, and a slightly simpler > >>>>>>> model overall. > >>>>>>> > >>>>>>> - Reorganise the series to make it a bit more logical now that FEAT_NV > >>>>>>> is gone. > >>>>>>> > >>>>>>> - Apply the NV idreg restrictions on VM first run rather than on each > >>>>>>> access. > >>>>>>> > >>>>>>> - Make the nested vgic shadow CPU interface a per-CPU structure rather > >>>>>>> than per-vcpu. > >>>>>>> > >>>>>>> - Fix the EL0 timer fastpath > >>>>>>> > >>>>>>> - Work around the architecture deficiencies when trapping WFI from a > >>>>>>> L2 guest. > >>>>>>> > >>>>>>> - Fix sampling of nested vgic state (MISR, ELRSR, EISR) > >>>>>>> > >>>>>>> - Drop the patches that have already been merged (NV trap forwarding, > >>>>>>> per-MMU VTCR) > >>>>>>> > >>>>>>> - Rebased on top of 6.7-rc2 + the FEAT_E2H0 support [2]. > >>>>>>> > >>>>>>> The branch containing these patches (and more) is at [3]. As for the > >>>>>>> previous rounds, my intention is to take a prefix of this series into > >>>>>>> 6.8, provided that it gets enough reviewing. > >>>>>>> > >>>>>>> [1] https://lore.kernel.org/r/20230515173103.1017669-1-maz@kernel.org > >>>>>>> [2] https://lore.kernel.org/r/20231120123721.851738-1-maz@kernel.org > >>>>>>> [3] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/nv-6.8-nv2-only > >>>>>>> > >>>>>> While I was testing this with kvmtool for 5.16 I noted the following on dmesg: > >>>>>> > >>>>>> [ 803.014258] kvm [19040]: Unsupported guest sys_reg access at: 8129fa50 [600003c9] > >>>>>> { Op0( 3), Op1( 5), CRn( 1), CRm( 0), Op2( 2), func_read }, > >>>>>> > >>>>>> This is CPACR_EL12. > >>>>> CPACR_EL12 is redirected to VNCR[0x100]. It really shouldn't trap... > >>>>> > >>>>>> Still need yet to debug. > >>>>> Can you disassemble the guest around the offending PC? > >>>> > >>>> [ 1248.686350] kvm [7013]: Unsupported guest sys_reg access at: 812baa50 [600003c9] > >>>> { Op0( 3), Op1( 5), CRn( 1), CRm( 0), Op2( 2), func_read }, > >>>> > >>>> 12baa00: 14000008 b 0x12baa20 > >>>> 12baa04: d000d501 adrp x1, 0x2d5c000 > >>>> 12baa08: 91154021 add x1, x1, #0x550 > >>>> 12baa0c: f9400022 ldr x2, [x1] > >>>> 12baa10: f9400421 ldr x1, [x1, #8] > >>>> 12baa14: 8a010042 and x2, x2, x1 > >>>> 12baa18: d3441c42 ubfx x2, x2, #4, #4 > >>>> 12baa1c: b4000082 cbz x2, 0x12baa2c > >>>> 12baa20: d2a175a0 mov x0, #0xbad0000 // #195887104 > >>>> 12baa24: f2994220 movk x0, #0xca11 > >>>> 12baa28: d69f03e0 eret > >>>> 12baa2c: d2c00080 mov x0, #0x400000000 // #17179869184 > >>>> 12baa30: f2b10000 movk x0, #0x8800, lsl #16 > >>>> 12baa34: f2800000 movk x0, #0x0 > >>>> 12baa38: d51c1100 msr hcr_el2, x0 > >>>> 12baa3c: d5033fdf isb ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This. > >>>> 12baa40: d53c4100 mrs x0, sp_el1 > >>>> 12baa44: 9100001f mov sp, x0 > >>>> 12baa48: d538d080 mrs x0, tpidr_el1 > >>>> 12baa4c: d51cd040 msr tpidr_el2, x0 > >>>> 12baa50: d53d1040 mrs x0, cpacr_el12 > >>>> 12baa54: d5181040 msr cpacr_el1, x0 > >>>> 12baa58: d53dc000 mrs x0, vbar_el12 > >>>> 12baa5c: d518c000 msr vbar_el1, x0 > >>>> 12baa60: d53c1120 mrs x0, mdcr_el2 > >>>> 12baa64: 9272f400 and x0, x0, #0xffffffffffffcfff > >>>> 12baa68: 9266f400 and x0, x0, #0xfffffffffcffffff > >>>> 12baa6c: d51c1120 msr mdcr_el2, x0 > >>>> 12baa70: d53d2040 mrs x0, tcr_el12 > >>>> 12baa74: d5182040 msr tcr_el1, x0 > >>>> 12baa78: d53d2000 mrs x0, ttbr0_el12 > >>>> 12baa7c: d5182000 msr ttbr0_el1, x0 > >>>> 12baa80: d53d2020 mrs x0, ttbr1_el12 > >>>> 12baa84: d5182020 msr ttbr1_el1, x0 > >>>> 12baa88: d53da200 mrs x0, mair_el12 > >>>> 12baa8c: d518a200 msr mair_el1, x0 > >>>> 12baa90: d5380761 mrs x1, s3_0_c0_c7_3 > >>>> 12baa94: d3400c21 ubfx x1, x1, #0, #4 > >>>> 12baa98: b4000141 cbz x1, 0x12baac0 > >>>> 12baa9c: d53d2060 mrs x0, s3_5_c2_c0_3 > >>> > >>> OK, this is suspiciously close to the location Ganapatrao was having > >>> issues with. Are you running on the same hardware? > >>> > >>> In any case, we should never take a trap for this access. Can you dump > >>> HCR_EL2 at the point where the guest traps (in switch.c)? > >>> > >> > >> I have dumped HCR_EL2 before entry to L1 in both V11 and V10. > >> on V10 HCR_EL2=0x2743c827c263f > >> on V11 HCR_EL2=0x27c3c827c263f > >> > >> on V11 the function vcpu_el2_e2h_is_set(vcpu) is returning false > >> resulting in NV1 bit set along with NV and NV2. > >> AFAIK, For L1 to be in VHE, NV1 bit should be zero and NV=NV2=1. > >> > >> I could boot L1 then L2, if I hack vcpu_el2_e2h_is_set to return true. > >> There could be a bug in V11 or E2H0 patchset resulting in > >> vcpu_el2_e2h_is_set() returning false? > > > > The E2H0 series should only force vcpu_el2_e2h_is_set() to return > > true, but not set it to false. Can you dump the *guest's* version of > > HCR_EL2 at this point? > > > > with V11: vhcr_el2=0x100030080000000 mask=0x100af00ffffffff How is this value possible if the write to HCR_EL2 has taken place? When do you sample this? > with V10: vhcr_el2=0x488000000 > with hack+V11: vhcr_el2=0x488000000 mask=0x100af00ffffffff Well, of course, if you constrain the value of HCR_EL2... M.
On 24-11-2023 06:21 pm, Marc Zyngier wrote: > On Fri, 24 Nov 2023 12:34:41 +0000, > Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote: >> >> >> >> On 24-11-2023 03:49 pm, Marc Zyngier wrote: >>> On Fri, 24 Nov 2023 09:50:33 +0000, >>> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote: >>>> >>>> >>>> >>>> On 23-11-2023 10:14 pm, Marc Zyngier wrote: >>>>> On Thu, 23 Nov 2023 16:21:48 +0000, >>>>> Miguel Luis <miguel.luis@oracle.com> wrote: >>>>>> >>>>>> Hi Marc, >>>>>> >>>>>> On 21/11/2023 18:02, Marc Zyngier wrote: >>>>>>> On Tue, 21 Nov 2023 16:49:52 +0000, >>>>>>> Miguel Luis <miguel.luis@oracle.com> wrote: >>>>>>>> Hi Marc, >>>>>>>> >>>>>>>>> On 20 Nov 2023, at 12:09, Marc Zyngier <maz@kernel.org> wrote: >>>>>>>>> >>>>>>>>> This is the 5th drop of NV support on arm64 for this year, and most >>>>>>>>> probably the last one for this side of Christmas. >>>>>>>>> >>>>>>>>> For the previous episodes, see [1]. >>>>>>>>> >>>>>>>>> What's changed: >>>>>>>>> >>>>>>>>> - Drop support for the original FEAT_NV. No existing hardware supports >>>>>>>>> it without FEAT_NV2, and the architecture is deprecating the former >>>>>>>>> entirely. This results in fewer patches, and a slightly simpler >>>>>>>>> model overall. >>>>>>>>> >>>>>>>>> - Reorganise the series to make it a bit more logical now that FEAT_NV >>>>>>>>> is gone. >>>>>>>>> >>>>>>>>> - Apply the NV idreg restrictions on VM first run rather than on each >>>>>>>>> access. >>>>>>>>> >>>>>>>>> - Make the nested vgic shadow CPU interface a per-CPU structure rather >>>>>>>>> than per-vcpu. >>>>>>>>> >>>>>>>>> - Fix the EL0 timer fastpath >>>>>>>>> >>>>>>>>> - Work around the architecture deficiencies when trapping WFI from a >>>>>>>>> L2 guest. >>>>>>>>> >>>>>>>>> - Fix sampling of nested vgic state (MISR, ELRSR, EISR) >>>>>>>>> >>>>>>>>> - Drop the patches that have already been merged (NV trap forwarding, >>>>>>>>> per-MMU VTCR) >>>>>>>>> >>>>>>>>> - Rebased on top of 6.7-rc2 + the FEAT_E2H0 support [2]. >>>>>>>>> >>>>>>>>> The branch containing these patches (and more) is at [3]. As for the >>>>>>>>> previous rounds, my intention is to take a prefix of this series into >>>>>>>>> 6.8, provided that it gets enough reviewing. >>>>>>>>> >>>>>>>>> [1] https://lore.kernel.org/r/20230515173103.1017669-1-maz@kernel.org >>>>>>>>> [2] https://lore.kernel.org/r/20231120123721.851738-1-maz@kernel.org >>>>>>>>> [3] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/nv-6.8-nv2-only >>>>>>>>> >>>>>>>> While I was testing this with kvmtool for 5.16 I noted the following on dmesg: >>>>>>>> >>>>>>>> [ 803.014258] kvm [19040]: Unsupported guest sys_reg access at: 8129fa50 [600003c9] >>>>>>>> { Op0( 3), Op1( 5), CRn( 1), CRm( 0), Op2( 2), func_read }, >>>>>>>> >>>>>>>> This is CPACR_EL12. >>>>>>> CPACR_EL12 is redirected to VNCR[0x100]. It really shouldn't trap... >>>>>>> >>>>>>>> Still need yet to debug. >>>>>>> Can you disassemble the guest around the offending PC? >>>>>> >>>>>> [ 1248.686350] kvm [7013]: Unsupported guest sys_reg access at: 812baa50 [600003c9] >>>>>> { Op0( 3), Op1( 5), CRn( 1), CRm( 0), Op2( 2), func_read }, >>>>>> >>>>>> 12baa00: 14000008 b 0x12baa20 >>>>>> 12baa04: d000d501 adrp x1, 0x2d5c000 >>>>>> 12baa08: 91154021 add x1, x1, #0x550 >>>>>> 12baa0c: f9400022 ldr x2, [x1] >>>>>> 12baa10: f9400421 ldr x1, [x1, #8] >>>>>> 12baa14: 8a010042 and x2, x2, x1 >>>>>> 12baa18: d3441c42 ubfx x2, x2, #4, #4 >>>>>> 12baa1c: b4000082 cbz x2, 0x12baa2c >>>>>> 12baa20: d2a175a0 mov x0, #0xbad0000 // #195887104 >>>>>> 12baa24: f2994220 movk x0, #0xca11 >>>>>> 12baa28: d69f03e0 eret >>>>>> 12baa2c: d2c00080 mov x0, #0x400000000 // #17179869184 >>>>>> 12baa30: f2b10000 movk x0, #0x8800, lsl #16 >>>>>> 12baa34: f2800000 movk x0, #0x0 >>>>>> 12baa38: d51c1100 msr hcr_el2, x0 >>>>>> 12baa3c: d5033fdf isb > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > This. > >>>>>> 12baa40: d53c4100 mrs x0, sp_el1 >>>>>> 12baa44: 9100001f mov sp, x0 >>>>>> 12baa48: d538d080 mrs x0, tpidr_el1 >>>>>> 12baa4c: d51cd040 msr tpidr_el2, x0 >>>>>> 12baa50: d53d1040 mrs x0, cpacr_el12 >>>>>> 12baa54: d5181040 msr cpacr_el1, x0 >>>>>> 12baa58: d53dc000 mrs x0, vbar_el12 >>>>>> 12baa5c: d518c000 msr vbar_el1, x0 >>>>>> 12baa60: d53c1120 mrs x0, mdcr_el2 >>>>>> 12baa64: 9272f400 and x0, x0, #0xffffffffffffcfff >>>>>> 12baa68: 9266f400 and x0, x0, #0xfffffffffcffffff >>>>>> 12baa6c: d51c1120 msr mdcr_el2, x0 >>>>>> 12baa70: d53d2040 mrs x0, tcr_el12 >>>>>> 12baa74: d5182040 msr tcr_el1, x0 >>>>>> 12baa78: d53d2000 mrs x0, ttbr0_el12 >>>>>> 12baa7c: d5182000 msr ttbr0_el1, x0 >>>>>> 12baa80: d53d2020 mrs x0, ttbr1_el12 >>>>>> 12baa84: d5182020 msr ttbr1_el1, x0 >>>>>> 12baa88: d53da200 mrs x0, mair_el12 >>>>>> 12baa8c: d518a200 msr mair_el1, x0 >>>>>> 12baa90: d5380761 mrs x1, s3_0_c0_c7_3 >>>>>> 12baa94: d3400c21 ubfx x1, x1, #0, #4 >>>>>> 12baa98: b4000141 cbz x1, 0x12baac0 >>>>>> 12baa9c: d53d2060 mrs x0, s3_5_c2_c0_3 >>>>> >>>>> OK, this is suspiciously close to the location Ganapatrao was having >>>>> issues with. Are you running on the same hardware? >>>>> >>>>> In any case, we should never take a trap for this access. Can you dump >>>>> HCR_EL2 at the point where the guest traps (in switch.c)? >>>>> >>>> >>>> I have dumped HCR_EL2 before entry to L1 in both V11 and V10. >>>> on V10 HCR_EL2=0x2743c827c263f >>>> on V11 HCR_EL2=0x27c3c827c263f >>>> >>>> on V11 the function vcpu_el2_e2h_is_set(vcpu) is returning false >>>> resulting in NV1 bit set along with NV and NV2. >>>> AFAIK, For L1 to be in VHE, NV1 bit should be zero and NV=NV2=1. >>>> >>>> I could boot L1 then L2, if I hack vcpu_el2_e2h_is_set to return true. >>>> There could be a bug in V11 or E2H0 patchset resulting in >>>> vcpu_el2_e2h_is_set() returning false? >>> >>> The E2H0 series should only force vcpu_el2_e2h_is_set() to return >>> true, but not set it to false. Can you dump the *guest's* version of >>> HCR_EL2 at this point? >>> >> >> with V11: vhcr_el2=0x100030080000000 mask=0x100af00ffffffff > > How is this value possible if the write to HCR_EL2 has taken place? > When do you sample this? I am not sure how and where it got set. I think, whatever it is set, it is due to false return of vcpu_el2_e2h_is_set(). Need to understand/debug. The vhcr_el2 value I have shared is traced along with hcr in function __activate_traps/__compute_hcr. > >> with V10: vhcr_el2=0x488000000 >> with hack+V11: vhcr_el2=0x488000000 mask=0x100af00ffffffff > > Well, of course, if you constrain the value of HCR_EL2... > > M. > Thanks, Ganapat
On Fri, 24 Nov 2023 13:22:22 +0000, Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote: > > > How is this value possible if the write to HCR_EL2 has taken place? > > When do you sample this? > > I am not sure how and where it got set. I think, whatever it is set, > it is due to false return of vcpu_el2_e2h_is_set(). Need to > understand/debug. > The vhcr_el2 value I have shared is traced along with hcr in function > __activate_traps/__compute_hcr. Here's my hunch: The guest boots with E2H=0, because we don't advertise anything else on your HW. So we run with NV1=1 until we try to *upgrade* to VHE. NV2 means that HCR_EL2 is writable (to memory) without a trap. But we're still running with NV1=1. Subsequently, we access a sysreg that should never trap for a VHE guest, but we're with the wrong config. Bad things happen. Unfortunately, NV2 is pretty much incompatible with E2H being updated, because it cannot perform the changes that this would result into at the point where they should happen. We can try and do a best effort handling, but you can always trick it. Anyway, can you see if the hack below helps? I'm not keen on it at all, but this would be a good data point. M. From c4b856221661393b884cbf673d100faaa8dc018a Mon Sep 17 00:00:00 2001 From: Marc Zyngier <maz@kernel.org> Date: Fri, 26 May 2023 12:16:05 +0100 Subject: [PATCH] KVM: arm64: Opportunistically track HCR_EL2.E2H being flipped Signed-off-by: Marc Zyngier <maz@kernel.org> --- arch/arm64/include/asm/kvm_host.h | 9 +++++++-- arch/arm64/kvm/hyp/include/hyp/switch.h | 13 +++++++++++++ arch/arm64/kvm/hyp/vhe/switch.c | 10 ++++++++-- 3 files changed, 28 insertions(+), 4 deletions(-) diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index c91f607e989d..d45ef41de5fb 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -655,6 +655,9 @@ struct kvm_vcpu_arch { /* State flags for kernel bookkeeping, unused by the hypervisor code */ u8 sflags; + /* Bookkeeping flags for NV */ + u8 nvflags; + /* * Don't run the guest (internal implementation need). * @@ -858,8 +861,6 @@ struct kvm_vcpu_arch { #define DEBUG_STATE_SAVE_SPE __vcpu_single_flag(iflags, BIT(5)) /* Save TRBE context if active */ #define DEBUG_STATE_SAVE_TRBE __vcpu_single_flag(iflags, BIT(6)) -/* vcpu running in HYP context */ -#define VCPU_HYP_CONTEXT __vcpu_single_flag(iflags, BIT(7)) /* SVE enabled for host EL0 */ #define HOST_SVE_ENABLED __vcpu_single_flag(sflags, BIT(0)) @@ -878,6 +879,10 @@ struct kvm_vcpu_arch { /* WFI instruction trapped */ #define IN_WFI __vcpu_single_flag(sflags, BIT(7)) +/* vcpu running in HYP context */ +#define VCPU_HYP_CONTEXT __vcpu_single_flag(nvflags, BIT(0)) +/* vcpu entered with HCR_EL2.E2H set */ +#define VCPU_HCR_E2H __vcpu_single_flag(nvflags, BIT(1)) /* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */ #define vcpu_sve_pffr(vcpu) (kern_hyp_va((vcpu)->arch.sve_state) + \ diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h index aed2ea35082c..9c1346116d61 100644 --- a/arch/arm64/kvm/hyp/include/hyp/switch.h +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h @@ -669,6 +669,19 @@ static inline bool fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code) */ synchronize_vcpu_pstate(vcpu, exit_code); + if (vcpu_has_nv(vcpu) && + (!!vcpu_get_flag(vcpu, VCPU_HCR_E2H) ^ vcpu_el2_e2h_is_set(vcpu))) { + if (vcpu_el2_e2h_is_set(vcpu)) { + sysreg_clear_set(hcr_el2, HCR_NV1, 0); + vcpu_set_flag(vcpu, VCPU_HCR_E2H); + } else { + sysreg_clear_set(hcr_el2, 0, HCR_NV1); + vcpu_clear_flag(vcpu, VCPU_HCR_E2H); + } + + return true; + } + /* * Check whether we want to repaint the state one way or * another. diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c index 8d1e9d1adabe..395aaa06f358 100644 --- a/arch/arm64/kvm/hyp/vhe/switch.c +++ b/arch/arm64/kvm/hyp/vhe/switch.c @@ -447,10 +447,16 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu) sysreg_restore_guest_state_vhe(guest_ctxt); __debug_switch_to_guest(vcpu); - if (is_hyp_ctxt(vcpu)) + if (is_hyp_ctxt(vcpu)) { + if (vcpu_el2_e2h_is_set(vcpu)) + vcpu_set_flag(vcpu, VCPU_HCR_E2H); + else + vcpu_clear_flag(vcpu, VCPU_HCR_E2H); + vcpu_set_flag(vcpu, VCPU_HYP_CONTEXT); - else + } else { vcpu_clear_flag(vcpu, VCPU_HYP_CONTEXT); + } do { /* Jump in the fire! */
On 24-11-2023 08:02 pm, Marc Zyngier wrote: > On Fri, 24 Nov 2023 13:22:22 +0000, > Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote: >> >>> How is this value possible if the write to HCR_EL2 has taken place? >>> When do you sample this? >> >> I am not sure how and where it got set. I think, whatever it is set, >> it is due to false return of vcpu_el2_e2h_is_set(). Need to >> understand/debug. >> The vhcr_el2 value I have shared is traced along with hcr in function >> __activate_traps/__compute_hcr. > > Here's my hunch: > > The guest boots with E2H=0, because we don't advertise anything else > on your HW. So we run with NV1=1 until we try to *upgrade* to VHE. NV2 > means that HCR_EL2 is writable (to memory) without a trap. But we're > still running with NV1=1. > > Subsequently, we access a sysreg that should never trap for a VHE > guest, but we're with the wrong config. Bad things happen. > > Unfortunately, NV2 is pretty much incompatible with E2H being updated, > because it cannot perform the changes that this would result into at > the point where they should happen. We can try and do a best effort > handling, but you can always trick it. > > Anyway, can you see if the hack below helps? I'm not keen on it at > all, but this would be a good data point. Thanks Marc, this diff fixes the issue. Just wondering what is changed w.r.t to L1 handling from V10 to V11 that it requires this trick? Also why this was not seen on your platform, is it E2H0 enabled? > > M. > > From c4b856221661393b884cbf673d100faaa8dc018a Mon Sep 17 00:00:00 2001 > From: Marc Zyngier <maz@kernel.org> > Date: Fri, 26 May 2023 12:16:05 +0100 > Subject: [PATCH] KVM: arm64: Opportunistically track HCR_EL2.E2H being flipped > > Signed-off-by: Marc Zyngier <maz@kernel.org> > --- > arch/arm64/include/asm/kvm_host.h | 9 +++++++-- > arch/arm64/kvm/hyp/include/hyp/switch.h | 13 +++++++++++++ > arch/arm64/kvm/hyp/vhe/switch.c | 10 ++++++++-- > 3 files changed, 28 insertions(+), 4 deletions(-) > > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h > index c91f607e989d..d45ef41de5fb 100644 > --- a/arch/arm64/include/asm/kvm_host.h > +++ b/arch/arm64/include/asm/kvm_host.h > @@ -655,6 +655,9 @@ struct kvm_vcpu_arch { > /* State flags for kernel bookkeeping, unused by the hypervisor code */ > u8 sflags; > > + /* Bookkeeping flags for NV */ > + u8 nvflags; > + > /* > * Don't run the guest (internal implementation need). > * > @@ -858,8 +861,6 @@ struct kvm_vcpu_arch { > #define DEBUG_STATE_SAVE_SPE __vcpu_single_flag(iflags, BIT(5)) > /* Save TRBE context if active */ > #define DEBUG_STATE_SAVE_TRBE __vcpu_single_flag(iflags, BIT(6)) > -/* vcpu running in HYP context */ > -#define VCPU_HYP_CONTEXT __vcpu_single_flag(iflags, BIT(7)) > > /* SVE enabled for host EL0 */ > #define HOST_SVE_ENABLED __vcpu_single_flag(sflags, BIT(0)) > @@ -878,6 +879,10 @@ struct kvm_vcpu_arch { > /* WFI instruction trapped */ > #define IN_WFI __vcpu_single_flag(sflags, BIT(7)) > > +/* vcpu running in HYP context */ > +#define VCPU_HYP_CONTEXT __vcpu_single_flag(nvflags, BIT(0)) > +/* vcpu entered with HCR_EL2.E2H set */ > +#define VCPU_HCR_E2H __vcpu_single_flag(nvflags, BIT(1)) > > /* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */ > #define vcpu_sve_pffr(vcpu) (kern_hyp_va((vcpu)->arch.sve_state) + \ > diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h > index aed2ea35082c..9c1346116d61 100644 > --- a/arch/arm64/kvm/hyp/include/hyp/switch.h > +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h > @@ -669,6 +669,19 @@ static inline bool fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code) > */ > synchronize_vcpu_pstate(vcpu, exit_code); > > + if (vcpu_has_nv(vcpu) && > + (!!vcpu_get_flag(vcpu, VCPU_HCR_E2H) ^ vcpu_el2_e2h_is_set(vcpu))) { > + if (vcpu_el2_e2h_is_set(vcpu)) { > + sysreg_clear_set(hcr_el2, HCR_NV1, 0); > + vcpu_set_flag(vcpu, VCPU_HCR_E2H); > + } else { > + sysreg_clear_set(hcr_el2, 0, HCR_NV1); > + vcpu_clear_flag(vcpu, VCPU_HCR_E2H); > + } > + > + return true; > + } > + > /* > * Check whether we want to repaint the state one way or > * another. > diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c > index 8d1e9d1adabe..395aaa06f358 100644 > --- a/arch/arm64/kvm/hyp/vhe/switch.c > +++ b/arch/arm64/kvm/hyp/vhe/switch.c > @@ -447,10 +447,16 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu) > sysreg_restore_guest_state_vhe(guest_ctxt); > __debug_switch_to_guest(vcpu); > > - if (is_hyp_ctxt(vcpu)) > + if (is_hyp_ctxt(vcpu)) { > + if (vcpu_el2_e2h_is_set(vcpu)) > + vcpu_set_flag(vcpu, VCPU_HCR_E2H); > + else > + vcpu_clear_flag(vcpu, VCPU_HCR_E2H); > + > vcpu_set_flag(vcpu, VCPU_HYP_CONTEXT); > - else > + } else { > vcpu_clear_flag(vcpu, VCPU_HYP_CONTEXT); > + } > > do { > /* Jump in the fire! */ Thanks, Ganapat
On Mon, 27 Nov 2023 07:26:58 +0000, Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote: > > > > On 24-11-2023 08:02 pm, Marc Zyngier wrote: > > On Fri, 24 Nov 2023 13:22:22 +0000, > > Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote: > >> > >>> How is this value possible if the write to HCR_EL2 has taken place? > >>> When do you sample this? > >> > >> I am not sure how and where it got set. I think, whatever it is set, > >> it is due to false return of vcpu_el2_e2h_is_set(). Need to > >> understand/debug. > >> The vhcr_el2 value I have shared is traced along with hcr in function > >> __activate_traps/__compute_hcr. > > > > Here's my hunch: > > > > The guest boots with E2H=0, because we don't advertise anything else > > on your HW. So we run with NV1=1 until we try to *upgrade* to VHE. NV2 > > means that HCR_EL2 is writable (to memory) without a trap. But we're > > still running with NV1=1. > > > > Subsequently, we access a sysreg that should never trap for a VHE > > guest, but we're with the wrong config. Bad things happen. > > > > Unfortunately, NV2 is pretty much incompatible with E2H being updated, > > because it cannot perform the changes that this would result into at > > the point where they should happen. We can try and do a best effort > > handling, but you can always trick it. > > > > Anyway, can you see if the hack below helps? I'm not keen on it at > > all, but this would be a good data point. > > Thanks Marc, this diff fixes the issue. > Just wondering what is changed w.r.t to L1 handling from V10 to V11 > that it requires this trick? Not completely sure. Before v11, anything that would trap would be silently handled by the FEAT_NV code. Now, a trap for something that is supposed to be redirected to VNCR results in an UNDEF exception. I suspect that the exception is handled again as a call to __finalise_el2(), probably because the write to VBAR_EL1 didn't do what it was supposed to do. > Also why this was not seen on your platform, is it E2H0 enabled? It doesn't have FEAT_E2H0, and that's the whole point. No E2H0, no problems, as the guest cannot trick the host into losing track of the state (which I'm pretty sure can happen even with this ugly hack). I will probably completely disable NV1 support in the next drop, and make NV support only VHE guests. Which is the only mode that makes any sense anyway. Thanks, M.
On 27-11-2023 02:52 pm, Marc Zyngier wrote: > On Mon, 27 Nov 2023 07:26:58 +0000, > Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote: >> >> >> >> On 24-11-2023 08:02 pm, Marc Zyngier wrote: >>> On Fri, 24 Nov 2023 13:22:22 +0000, >>> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote: >>>> >>>>> How is this value possible if the write to HCR_EL2 has taken place? >>>>> When do you sample this? >>>> >>>> I am not sure how and where it got set. I think, whatever it is set, >>>> it is due to false return of vcpu_el2_e2h_is_set(). Need to >>>> understand/debug. >>>> The vhcr_el2 value I have shared is traced along with hcr in function >>>> __activate_traps/__compute_hcr. >>> >>> Here's my hunch: >>> >>> The guest boots with E2H=0, because we don't advertise anything else >>> on your HW. So we run with NV1=1 until we try to *upgrade* to VHE. NV2 >>> means that HCR_EL2 is writable (to memory) without a trap. But we're >>> still running with NV1=1. >>> >>> Subsequently, we access a sysreg that should never trap for a VHE >>> guest, but we're with the wrong config. Bad things happen. >>> >>> Unfortunately, NV2 is pretty much incompatible with E2H being updated, >>> because it cannot perform the changes that this would result into at >>> the point where they should happen. We can try and do a best effort >>> handling, but you can always trick it. >>> >>> Anyway, can you see if the hack below helps? I'm not keen on it at >>> all, but this would be a good data point. >> >> Thanks Marc, this diff fixes the issue. >> Just wondering what is changed w.r.t to L1 handling from V10 to V11 >> that it requires this trick? > > Not completely sure. Before v11, anything that would trap would be > silently handled by the FEAT_NV code. Now, a trap for something that > is supposed to be redirected to VNCR results in an UNDEF exception. > > I suspect that the exception is handled again as a call to > __finalise_el2(), probably because the write to VBAR_EL1 didn't do > what it was supposed to do. > >> Also why this was not seen on your platform, is it E2H0 enabled? > > It doesn't have FEAT_E2H0, and that's the whole point. No E2H0, no > problems, as the guest cannot trick the host into losing track of the > state (which I'm pretty sure can happen even with this ugly hack). > > I will probably completely disable NV1 support in the next drop, and > make NV support only VHE guests. Which is the only mode that makes any > sense anyway. > Thanks, absolutely makes sense to have *VHE-only* L1, looking forward to a next drop. Thanks, Ganapat
On Mon, 27 Nov 2023 10:59:36 +0000, Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote: > > > > On 27-11-2023 02:52 pm, Marc Zyngier wrote: > > On Mon, 27 Nov 2023 07:26:58 +0000, > > Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote: > >> > >> > >> > >> On 24-11-2023 08:02 pm, Marc Zyngier wrote: > >>> On Fri, 24 Nov 2023 13:22:22 +0000, > >>> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote: > >>>> > >>>>> How is this value possible if the write to HCR_EL2 has taken place? > >>>>> When do you sample this? > >>>> > >>>> I am not sure how and where it got set. I think, whatever it is set, > >>>> it is due to false return of vcpu_el2_e2h_is_set(). Need to > >>>> understand/debug. > >>>> The vhcr_el2 value I have shared is traced along with hcr in function > >>>> __activate_traps/__compute_hcr. > >>> > >>> Here's my hunch: > >>> > >>> The guest boots with E2H=0, because we don't advertise anything else > >>> on your HW. So we run with NV1=1 until we try to *upgrade* to VHE. NV2 > >>> means that HCR_EL2 is writable (to memory) without a trap. But we're > >>> still running with NV1=1. > >>> > >>> Subsequently, we access a sysreg that should never trap for a VHE > >>> guest, but we're with the wrong config. Bad things happen. > >>> > >>> Unfortunately, NV2 is pretty much incompatible with E2H being updated, > >>> because it cannot perform the changes that this would result into at > >>> the point where they should happen. We can try and do a best effort > >>> handling, but you can always trick it. > >>> > >>> Anyway, can you see if the hack below helps? I'm not keen on it at > >>> all, but this would be a good data point. > >> > >> Thanks Marc, this diff fixes the issue. > >> Just wondering what is changed w.r.t to L1 handling from V10 to V11 > >> that it requires this trick? > > > > Not completely sure. Before v11, anything that would trap would be > > silently handled by the FEAT_NV code. Now, a trap for something that > > is supposed to be redirected to VNCR results in an UNDEF exception. > > > > I suspect that the exception is handled again as a call to > > __finalise_el2(), probably because the write to VBAR_EL1 didn't do > > what it was supposed to do. > > > >> Also why this was not seen on your platform, is it E2H0 enabled? > > > > It doesn't have FEAT_E2H0, and that's the whole point. No E2H0, no > > problems, as the guest cannot trick the host into losing track of the > > state (which I'm pretty sure can happen even with this ugly hack). > > > > I will probably completely disable NV1 support in the next drop, and > > make NV support only VHE guests. Which is the only mode that makes any > > sense anyway. > > > > Thanks, absolutely makes sense to have *VHE-only* L1, looking forward > to a next drop. Note that this won't be restricted to L1, but will affect *everything. No non-VHE guest will be supported at any level whatsoever, and NV will always expose ID_AA64MMFR4_EL1.E2H0=0b1110, indicating that HCR_EL2.NV1 is RES0, on top of ID_AA64MMFR4_EL1.NV_frac=1 (NV2 only). M.
On 27-11-2023 05:15 pm, Marc Zyngier wrote: > On Mon, 27 Nov 2023 10:59:36 +0000, > Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote: >> >> >> >> On 27-11-2023 02:52 pm, Marc Zyngier wrote: >>> On Mon, 27 Nov 2023 07:26:58 +0000, >>> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote: >>>> >>>> >>>> >>>> On 24-11-2023 08:02 pm, Marc Zyngier wrote: >>>>> On Fri, 24 Nov 2023 13:22:22 +0000, >>>>> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote: >>>>>> >>>>>>> How is this value possible if the write to HCR_EL2 has taken place? >>>>>>> When do you sample this? >>>>>> >>>>>> I am not sure how and where it got set. I think, whatever it is set, >>>>>> it is due to false return of vcpu_el2_e2h_is_set(). Need to >>>>>> understand/debug. >>>>>> The vhcr_el2 value I have shared is traced along with hcr in function >>>>>> __activate_traps/__compute_hcr. >>>>> >>>>> Here's my hunch: >>>>> >>>>> The guest boots with E2H=0, because we don't advertise anything else >>>>> on your HW. So we run with NV1=1 until we try to *upgrade* to VHE. NV2 >>>>> means that HCR_EL2 is writable (to memory) without a trap. But we're >>>>> still running with NV1=1. >>>>> >>>>> Subsequently, we access a sysreg that should never trap for a VHE >>>>> guest, but we're with the wrong config. Bad things happen. >>>>> >>>>> Unfortunately, NV2 is pretty much incompatible with E2H being updated, >>>>> because it cannot perform the changes that this would result into at >>>>> the point where they should happen. We can try and do a best effort >>>>> handling, but you can always trick it. >>>>> >>>>> Anyway, can you see if the hack below helps? I'm not keen on it at >>>>> all, but this would be a good data point. >>>> >>>> Thanks Marc, this diff fixes the issue. >>>> Just wondering what is changed w.r.t to L1 handling from V10 to V11 >>>> that it requires this trick? >>> >>> Not completely sure. Before v11, anything that would trap would be >>> silently handled by the FEAT_NV code. Now, a trap for something that >>> is supposed to be redirected to VNCR results in an UNDEF exception. >>> >>> I suspect that the exception is handled again as a call to >>> __finalise_el2(), probably because the write to VBAR_EL1 didn't do >>> what it was supposed to do. >>> >>>> Also why this was not seen on your platform, is it E2H0 enabled? >>> >>> It doesn't have FEAT_E2H0, and that's the whole point. No E2H0, no >>> problems, as the guest cannot trick the host into losing track of the >>> state (which I'm pretty sure can happen even with this ugly hack). >>> >>> I will probably completely disable NV1 support in the next drop, and >>> make NV support only VHE guests. Which is the only mode that makes any >>> sense anyway. >>> >> >> Thanks, absolutely makes sense to have *VHE-only* L1, looking forward >> to a next drop. > > Note that this won't be restricted to L1, but will affect *everything. > Ok. > No non-VHE guest will be supported at any level whatsoever, and NV > will always expose ID_AA64MMFR4_EL1.E2H0=0b1110, indicating that > HCR_EL2.NV1 is RES0, on top of ID_AA64MMFR4_EL1.NV_frac=1 (NV2 only). OK, Even I was thinking the same instead of the work-around/trick, then I felt it is still needed since the L1 may be any distro kernel and it may not have code to interpret/decode ID_AA64MMFR4_EL1.E2H0. Thanks, Ganapat > > M. >
On Mon, 27 Nov 2023 12:18:45 +0000, Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote: > > >>> I will probably completely disable NV1 support in the next drop, and > >>> make NV support only VHE guests. Which is the only mode that makes any > >>> sense anyway. > >>> > >> > >> Thanks, absolutely makes sense to have *VHE-only* L1, looking forward > >> to a next drop. > > > > Note that this won't be restricted to L1, but will affect *everything. > > > Ok. > > > No non-VHE guest will be supported at any level whatsoever, and NV > > will always expose ID_AA64MMFR4_EL1.E2H0=0b1110, indicating that > > HCR_EL2.NV1 is RES0, on top of ID_AA64MMFR4_EL1.NV_frac=1 (NV2 only). > > OK, Even I was thinking the same instead of the work-around/trick, > then I felt it is still needed since the L1 may be any distro kernel > and it may not have code to interpret/decode ID_AA64MMFR4_EL1.E2H0. The problem is that we can't make that reliable. Current Linux kernels will work with E2H RES1, as you experienced it by hacking KVM. Old crap won't work, but I can't say I care. My point is: KVM NV support will be compliant with the architecture. SW that needs to run with NV will have to understand the version of the architecture that KVM exposes. If you need distros to be upgraded, then you can work with them to update their stuff. M.
On Mon, 20 Nov 2023 13:09:44 +0000, Marc Zyngier <maz@kernel.org> wrote: > > This is the 5th drop of NV support on arm64 for this year, and most > probably the last one for this side of Christmas. Unless someone objects, I'm planning to take the first 10 patches of this series into 6.8 (with the dependency on ID_AA64MMFR4_EL1.NV_frac in patch #1 removed). M.
On Mon, Dec 18, 2023 at 12:39:26PM +0000, Marc Zyngier wrote: > On Mon, 20 Nov 2023 13:09:44 +0000, > Marc Zyngier <maz@kernel.org> wrote: > > > > This is the 5th drop of NV support on arm64 for this year, and most > > probably the last one for this side of Christmas. > > Unless someone objects, I'm planning to take the first 10 patches of > this series into 6.8 (with the dependency on ID_AA64MMFR4_EL1.NV_frac > in patch #1 removed). For the first 10 patches: Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
On Mon, 20 Nov 2023 13:09:44 +0000, Marc Zyngier wrote: > This is the 5th drop of NV support on arm64 for this year, and most > probably the last one for this side of Christmas. > > For the previous episodes, see [1]. > > What's changed: > > [...] Applied to next, thanks! [01/43] arm64: cpufeatures: Restrict NV support to FEAT_NV2 commit: 2bfc654b89c4dd1c372bb2cbba6b5a0eb578d214 [02/43] KVM: arm64: nv: Hoist vcpu_has_nv() into is_hyp_ctxt() commit: 111903d1f5b9334d1100e1c6ee08e740fa374d91 [03/43] KVM: arm64: nv: Compute NV view of idregs as a one-off commit: 3ed0b5123cd5a2a4f1fe4e594e7bf319e9eaf1da [04/43] KVM: arm64: nv: Drop EL12 register traps that are redirected to VNCR commit: 4d4f52052ba8357f1591cb9bc3086541070711af [05/43] KVM: arm64: nv: Add non-VHE-EL2->EL1 translation helpers commit: 3606e0b2e462164bced151dbb54ccfe42ac6c35b [06/43] KVM: arm64: nv: Add include containing the VNCR_EL2 offsets commit: 60ce16cc122aad999129d23061fa35f63d5b1e9b [07/43] KVM: arm64: Introduce a bad_trap() primitive for unexpected trap handling commit: 2733dd10701abc6ab23d65a732f58fbeb80bd203 [08/43] KVM: arm64: nv: Add EL2_REG_VNCR()/EL2_REG_REDIR() sysreg helpers commit: 9b9cce60be85e6807bdb0eaa2f520e78dbab0659 [09/43] KVM: arm64: nv: Map VNCR-capable registers to a separate page commit: d8bd48e3f0ee9e1fdba2a2e453155a5354e48a8d [10/43] KVM: arm64: nv: Handle virtual EL2 registers in vcpu_read/write_sys_reg() commit: fedc612314acfebf506e071bf3a941076aa56d10 Cheers, M.