mbox series

[v11,00/43] KVM: arm64: Nested Virtualization support (FEAT_NV2 only)

Message ID 20231120131027.854038-1-maz@kernel.org (mailing list archive)
Headers show
Series KVM: arm64: Nested Virtualization support (FEAT_NV2 only) | expand

Message

Marc Zyngier Nov. 20, 2023, 1:09 p.m. UTC
This is the 5th drop of NV support on arm64 for this year, and most
probably the last one for this side of Christmas.

For the previous episodes, see [1].

What's changed:

- Drop support for the original FEAT_NV. No existing hardware supports
  it without FEAT_NV2, and the architecture is deprecating the former
  entirely. This results in fewer patches, and a slightly simpler
  model overall.

- Reorganise the series to make it a bit more logical now that FEAT_NV
  is gone.

- Apply the NV idreg restrictions on VM first run rather than on each
  access.

- Make the nested vgic shadow CPU interface a per-CPU structure rather
  than per-vcpu.

- Fix the EL0 timer fastpath

- Work around the architecture deficiencies when trapping WFI from a
  L2 guest.

- Fix sampling of nested vgic state (MISR, ELRSR, EISR)

- Drop the patches that have already been merged (NV trap forwarding,
  per-MMU VTCR)

- Rebased on top of 6.7-rc2 + the FEAT_E2H0 support [2].

The branch containing these patches (and more) is at [3]. As for the
previous rounds, my intention is to take a prefix of this series into
6.8, provided that it gets enough reviewing.

[1] https://lore.kernel.org/r/20230515173103.1017669-1-maz@kernel.org
[2] https://lore.kernel.org/r/20231120123721.851738-1-maz@kernel.org
[3] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/nv-6.8-nv2-only

Andre Przywara (1):
  KVM: arm64: nv: vgic: Allow userland to set VGIC maintenance IRQ

Christoffer Dall (2):
  KVM: arm64: nv: Implement nested Stage-2 page table walk logic
  KVM: arm64: nv: Unmap/flush shadow stage 2 page tables

Jintack Lim (3):
  KVM: arm64: nv: Respect virtual HCR_EL2.TWX setting
  KVM: arm64: nv: Respect virtual CPTR_EL2.{TFP,FPEN} settings
  KVM: arm64: nv: Trap and emulate TLBI instructions from virtual EL2

Marc Zyngier (37):
  arm64: cpufeatures: Restrict NV support to FEAT_NV2
  KVM: arm64: nv: Hoist vcpu_has_nv() into is_hyp_ctxt()
  KVM: arm64: nv: Compute NV view of idregs as a one-off
  KVM: arm64: nv: Drop EL12 register traps that are redirected to VNCR
  KVM: arm64: nv: Add non-VHE-EL2->EL1 translation helpers
  KVM: arm64: nv: Add include containing the VNCR_EL2 offsets
  KVM: arm64: Introduce a bad_trap() primitive for unexpected trap
    handling
  KVM: arm64: nv: Add EL2_REG_VNCR()/EL2_REG_REDIR() sysreg helpers
  KVM: arm64: nv: Map VNCR-capable registers to a separate page
  KVM: arm64: nv: Handle virtual EL2 registers in
    vcpu_read/write_sys_reg()
  KVM: arm64: nv: Handle HCR_EL2.E2H specially
  KVM: arm64: nv: Handle CNTHCTL_EL2 specially
  KVM: arm64: nv: Save/Restore vEL2 sysregs
  KVM: arm64: nv: Configure HCR_EL2 for FEAT_NV2
  KVM: arm64: nv: Support multiple nested Stage-2 mmu structures
  KVM: arm64: nv: Handle shadow stage 2 page faults
  KVM: arm64: nv: Restrict S2 RD/WR permissions to match the guest's
  KVM: arm64: nv: Set a handler for the system instruction traps
  KVM: arm64: nv: Trap and emulate AT instructions from virtual EL2
  KVM: arm64: nv: Hide RAS from nested guests
  KVM: arm64: nv: Add handling of EL2-specific timer registers
  KVM: arm64: nv: Sync nested timer state with FEAT_NV2
  KVM: arm64: nv: Publish emulated timer interrupt state in the
    in-memory state
  KVM: arm64: nv: Load timer before the GIC
  KVM: arm64: nv: Nested GICv3 Support
  KVM: arm64: nv: Don't block in WFI from nested state
  KVM: arm64: nv: Fold GICv3 host trapping requirements into guest setup
  KVM: arm64: nv: Deal with broken VGIC on maintenance interrupt
    delivery
  KVM: arm64: nv: Add handling of FEAT_TTL TLB invalidation
  KVM: arm64: nv: Invalidate TLBs based on shadow S2 TTL-like
    information
  KVM: arm64: nv: Tag shadow S2 entries with nested level
  KVM: arm64: nv: Allocate VNCR page when required
  KVM: arm64: nv: Fast-track 'InHost' exception returns
  KVM: arm64: nv: Fast-track EL1 TLBIs for VHE guests
  KVM: arm64: nv: Use FEAT_ECV to trap access to EL0 timers
  KVM: arm64: nv: Accelerate EL0 timer read accesses when FEAT_ECV is on
  KVM: arm64: nv: Allow userspace to request KVM_ARM_VCPU_NESTED_VIRT

 .../virt/kvm/devices/arm-vgic-v3.rst          |  12 +-
 arch/arm64/include/asm/esr.h                  |   1 +
 arch/arm64/include/asm/kvm_arm.h              |   3 +
 arch/arm64/include/asm/kvm_asm.h              |   4 +
 arch/arm64/include/asm/kvm_emulate.h          |  53 +-
 arch/arm64/include/asm/kvm_host.h             | 223 +++-
 arch/arm64/include/asm/kvm_hyp.h              |   2 +
 arch/arm64/include/asm/kvm_mmu.h              |  12 +
 arch/arm64/include/asm/kvm_nested.h           | 130 ++-
 arch/arm64/include/asm/sysreg.h               |   7 +
 arch/arm64/include/asm/vncr_mapping.h         | 102 ++
 arch/arm64/include/uapi/asm/kvm.h             |   1 +
 arch/arm64/kernel/cpufeature.c                |  22 +-
 arch/arm64/kvm/Makefile                       |   4 +-
 arch/arm64/kvm/arch_timer.c                   | 115 +-
 arch/arm64/kvm/arm.c                          |  46 +-
 arch/arm64/kvm/at.c                           | 219 ++++
 arch/arm64/kvm/emulate-nested.c               |  48 +-
 arch/arm64/kvm/handle_exit.c                  |  29 +-
 arch/arm64/kvm/hyp/include/hyp/switch.h       |   8 +-
 arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h    |   5 +-
 arch/arm64/kvm/hyp/nvhe/switch.c              |   2 +-
 arch/arm64/kvm/hyp/nvhe/sysreg-sr.c           |   2 +-
 arch/arm64/kvm/hyp/vgic-v3-sr.c               |   6 +-
 arch/arm64/kvm/hyp/vhe/switch.c               | 211 +++-
 arch/arm64/kvm/hyp/vhe/sysreg-sr.c            | 133 ++-
 arch/arm64/kvm/hyp/vhe/tlb.c                  |  83 ++
 arch/arm64/kvm/mmu.c                          | 248 ++++-
 arch/arm64/kvm/nested.c                       | 813 ++++++++++++++-
 arch/arm64/kvm/reset.c                        |   7 +
 arch/arm64/kvm/sys_regs.c                     | 978 ++++++++++++++++--
 arch/arm64/kvm/vgic/vgic-init.c               |  35 +
 arch/arm64/kvm/vgic/vgic-kvm-device.c         |  29 +-
 arch/arm64/kvm/vgic/vgic-v3-nested.c          | 270 +++++
 arch/arm64/kvm/vgic/vgic-v3.c                 |  35 +-
 arch/arm64/kvm/vgic/vgic.c                    |  29 +
 arch/arm64/kvm/vgic/vgic.h                    |  10 +
 arch/arm64/tools/cpucaps                      |   2 +
 include/clocksource/arm_arch_timer.h          |   4 +
 include/kvm/arm_arch_timer.h                  |  19 +
 include/kvm/arm_vgic.h                        |  16 +
 include/uapi/linux/kvm.h                      |   1 +
 tools/arch/arm/include/uapi/asm/kvm.h         |   1 +
 43 files changed, 3725 insertions(+), 255 deletions(-)
 create mode 100644 arch/arm64/include/asm/vncr_mapping.h
 create mode 100644 arch/arm64/kvm/at.c
 create mode 100644 arch/arm64/kvm/vgic/vgic-v3-nested.c

Comments

Ganapatrao Kulkarni Nov. 21, 2023, 8:51 a.m. UTC | #1
Hi Marc,

On 20-11-2023 06:39 pm, Marc Zyngier wrote:
> This is the 5th drop of NV support on arm64 for this year, and most
> probably the last one for this side of Christmas.
> 
> For the previous episodes, see [1].
> 
> What's changed:
> 
> - Drop support for the original FEAT_NV. No existing hardware supports
>    it without FEAT_NV2, and the architecture is deprecating the former
>    entirely. This results in fewer patches, and a slightly simpler
>    model overall.
> 
> - Reorganise the series to make it a bit more logical now that FEAT_NV
>    is gone.
> 
> - Apply the NV idreg restrictions on VM first run rather than on each
>    access.
> 
> - Make the nested vgic shadow CPU interface a per-CPU structure rather
>    than per-vcpu.
> 
> - Fix the EL0 timer fastpath
> 
> - Work around the architecture deficiencies when trapping WFI from a
>    L2 guest.
> 
> - Fix sampling of nested vgic state (MISR, ELRSR, EISR)
> 
> - Drop the patches that have already been merged (NV trap forwarding,
>    per-MMU VTCR)
> 
> - Rebased on top of 6.7-rc2 + the FEAT_E2H0 support [2].
> 
> The branch containing these patches (and more) is at [3]. As for the
> previous rounds, my intention is to take a prefix of this series into
> 6.8, provided that it gets enough reviewing.
> 
> [1] https://lore.kernel.org/r/20230515173103.1017669-1-maz@kernel.org
> [2] https://lore.kernel.org/r/20231120123721.851738-1-maz@kernel.org
> [3] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/nv-6.8-nv2-only
>

V11 series is not booting on Ampere platform (I am yet to debug).
With lkvm, it is stuck at the very early stage itself and no early boot 
prints/logs.

Are there any changes needed in kvmtool for V11?

> Andre Przywara (1):
>    KVM: arm64: nv: vgic: Allow userland to set VGIC maintenance IRQ
> 
> Christoffer Dall (2):
>    KVM: arm64: nv: Implement nested Stage-2 page table walk logic
>    KVM: arm64: nv: Unmap/flush shadow stage 2 page tables
> 
> Jintack Lim (3):
>    KVM: arm64: nv: Respect virtual HCR_EL2.TWX setting
>    KVM: arm64: nv: Respect virtual CPTR_EL2.{TFP,FPEN} settings
>    KVM: arm64: nv: Trap and emulate TLBI instructions from virtual EL2
> 
> Marc Zyngier (37):
>    arm64: cpufeatures: Restrict NV support to FEAT_NV2
>    KVM: arm64: nv: Hoist vcpu_has_nv() into is_hyp_ctxt()
>    KVM: arm64: nv: Compute NV view of idregs as a one-off
>    KVM: arm64: nv: Drop EL12 register traps that are redirected to VNCR
>    KVM: arm64: nv: Add non-VHE-EL2->EL1 translation helpers
>    KVM: arm64: nv: Add include containing the VNCR_EL2 offsets
>    KVM: arm64: Introduce a bad_trap() primitive for unexpected trap
>      handling
>    KVM: arm64: nv: Add EL2_REG_VNCR()/EL2_REG_REDIR() sysreg helpers
>    KVM: arm64: nv: Map VNCR-capable registers to a separate page
>    KVM: arm64: nv: Handle virtual EL2 registers in
>      vcpu_read/write_sys_reg()
>    KVM: arm64: nv: Handle HCR_EL2.E2H specially
>    KVM: arm64: nv: Handle CNTHCTL_EL2 specially
>    KVM: arm64: nv: Save/Restore vEL2 sysregs
>    KVM: arm64: nv: Configure HCR_EL2 for FEAT_NV2
>    KVM: arm64: nv: Support multiple nested Stage-2 mmu structures
>    KVM: arm64: nv: Handle shadow stage 2 page faults
>    KVM: arm64: nv: Restrict S2 RD/WR permissions to match the guest's
>    KVM: arm64: nv: Set a handler for the system instruction traps
>    KVM: arm64: nv: Trap and emulate AT instructions from virtual EL2
>    KVM: arm64: nv: Hide RAS from nested guests
>    KVM: arm64: nv: Add handling of EL2-specific timer registers
>    KVM: arm64: nv: Sync nested timer state with FEAT_NV2
>    KVM: arm64: nv: Publish emulated timer interrupt state in the
>      in-memory state
>    KVM: arm64: nv: Load timer before the GIC
>    KVM: arm64: nv: Nested GICv3 Support
>    KVM: arm64: nv: Don't block in WFI from nested state
>    KVM: arm64: nv: Fold GICv3 host trapping requirements into guest setup
>    KVM: arm64: nv: Deal with broken VGIC on maintenance interrupt
>      delivery
>    KVM: arm64: nv: Add handling of FEAT_TTL TLB invalidation
>    KVM: arm64: nv: Invalidate TLBs based on shadow S2 TTL-like
>      information
>    KVM: arm64: nv: Tag shadow S2 entries with nested level
>    KVM: arm64: nv: Allocate VNCR page when required
>    KVM: arm64: nv: Fast-track 'InHost' exception returns
>    KVM: arm64: nv: Fast-track EL1 TLBIs for VHE guests
>    KVM: arm64: nv: Use FEAT_ECV to trap access to EL0 timers
>    KVM: arm64: nv: Accelerate EL0 timer read accesses when FEAT_ECV is on
>    KVM: arm64: nv: Allow userspace to request KVM_ARM_VCPU_NESTED_VIRT
> 
>   .../virt/kvm/devices/arm-vgic-v3.rst          |  12 +-
>   arch/arm64/include/asm/esr.h                  |   1 +
>   arch/arm64/include/asm/kvm_arm.h              |   3 +
>   arch/arm64/include/asm/kvm_asm.h              |   4 +
>   arch/arm64/include/asm/kvm_emulate.h          |  53 +-
>   arch/arm64/include/asm/kvm_host.h             | 223 +++-
>   arch/arm64/include/asm/kvm_hyp.h              |   2 +
>   arch/arm64/include/asm/kvm_mmu.h              |  12 +
>   arch/arm64/include/asm/kvm_nested.h           | 130 ++-
>   arch/arm64/include/asm/sysreg.h               |   7 +
>   arch/arm64/include/asm/vncr_mapping.h         | 102 ++
>   arch/arm64/include/uapi/asm/kvm.h             |   1 +
>   arch/arm64/kernel/cpufeature.c                |  22 +-
>   arch/arm64/kvm/Makefile                       |   4 +-
>   arch/arm64/kvm/arch_timer.c                   | 115 +-
>   arch/arm64/kvm/arm.c                          |  46 +-
>   arch/arm64/kvm/at.c                           | 219 ++++
>   arch/arm64/kvm/emulate-nested.c               |  48 +-
>   arch/arm64/kvm/handle_exit.c                  |  29 +-
>   arch/arm64/kvm/hyp/include/hyp/switch.h       |   8 +-
>   arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h    |   5 +-
>   arch/arm64/kvm/hyp/nvhe/switch.c              |   2 +-
>   arch/arm64/kvm/hyp/nvhe/sysreg-sr.c           |   2 +-
>   arch/arm64/kvm/hyp/vgic-v3-sr.c               |   6 +-
>   arch/arm64/kvm/hyp/vhe/switch.c               | 211 +++-
>   arch/arm64/kvm/hyp/vhe/sysreg-sr.c            | 133 ++-
>   arch/arm64/kvm/hyp/vhe/tlb.c                  |  83 ++
>   arch/arm64/kvm/mmu.c                          | 248 ++++-
>   arch/arm64/kvm/nested.c                       | 813 ++++++++++++++-
>   arch/arm64/kvm/reset.c                        |   7 +
>   arch/arm64/kvm/sys_regs.c                     | 978 ++++++++++++++++--
>   arch/arm64/kvm/vgic/vgic-init.c               |  35 +
>   arch/arm64/kvm/vgic/vgic-kvm-device.c         |  29 +-
>   arch/arm64/kvm/vgic/vgic-v3-nested.c          | 270 +++++
>   arch/arm64/kvm/vgic/vgic-v3.c                 |  35 +-
>   arch/arm64/kvm/vgic/vgic.c                    |  29 +
>   arch/arm64/kvm/vgic/vgic.h                    |  10 +
>   arch/arm64/tools/cpucaps                      |   2 +
>   include/clocksource/arm_arch_timer.h          |   4 +
>   include/kvm/arm_arch_timer.h                  |  19 +
>   include/kvm/arm_vgic.h                        |  16 +
>   include/uapi/linux/kvm.h                      |   1 +
>   tools/arch/arm/include/uapi/asm/kvm.h         |   1 +
>   43 files changed, 3725 insertions(+), 255 deletions(-)
>   create mode 100644 arch/arm64/include/asm/vncr_mapping.h
>   create mode 100644 arch/arm64/kvm/at.c
>   create mode 100644 arch/arm64/kvm/vgic/vgic-v3-nested.c
> 

Thanks,
Ganapat
Marc Zyngier Nov. 21, 2023, 9:08 a.m. UTC | #2
On Tue, 21 Nov 2023 08:51:35 +0000,
Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
> 
> 
> Hi Marc,
> 
> On 20-11-2023 06:39 pm, Marc Zyngier wrote:
> > This is the 5th drop of NV support on arm64 for this year, and most
> > probably the last one for this side of Christmas.
> > 
> > For the previous episodes, see [1].
> > 
> > What's changed:
> > 
> > - Drop support for the original FEAT_NV. No existing hardware supports
> >    it without FEAT_NV2, and the architecture is deprecating the former
> >    entirely. This results in fewer patches, and a slightly simpler
> >    model overall.
> > 
> > - Reorganise the series to make it a bit more logical now that FEAT_NV
> >    is gone.
> > 
> > - Apply the NV idreg restrictions on VM first run rather than on each
> >    access.
> > 
> > - Make the nested vgic shadow CPU interface a per-CPU structure rather
> >    than per-vcpu.
> > 
> > - Fix the EL0 timer fastpath
> > 
> > - Work around the architecture deficiencies when trapping WFI from a
> >    L2 guest.
> > 
> > - Fix sampling of nested vgic state (MISR, ELRSR, EISR)
> > 
> > - Drop the patches that have already been merged (NV trap forwarding,
> >    per-MMU VTCR)
> > 
> > - Rebased on top of 6.7-rc2 + the FEAT_E2H0 support [2].
> > 
> > The branch containing these patches (and more) is at [3]. As for the
> > previous rounds, my intention is to take a prefix of this series into
> > 6.8, provided that it gets enough reviewing.
> > 
> > [1] https://lore.kernel.org/r/20230515173103.1017669-1-maz@kernel.org
> > [2] https://lore.kernel.org/r/20231120123721.851738-1-maz@kernel.org
> > [3] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/nv-6.8-nv2-only
> > 
> 
> V11 series is not booting on Ampere platform (I am yet to debug).
> With lkvm, it is stuck at the very early stage itself and no early
> boot prints/logs.
> 
> Are there any changes needed in kvmtool for V11?

Not really, I'm still using the version I had built for 6.5. Is the
problem with L1 or L2?

However, this looks like a problem I've been chasing, and which I
though was only a M2 issue. In some situations, I'm getting interrupt
storms when L1 gets a level interrupt while in L2.

Can you cherry-pick [1] from my tree, and let me know if this helps?
This isn't a proper fix, but if L2 starts booting with this, I would
know this is a common issue.

Now, if your problem is with L1, I really have no idea.

Thanks,

	M.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/commit/?h=kvm-arm64/nv-6.8-nv2-only&id=759d2e18f8954f4c76eb1772f38301df6ed8fa5d
Ganapatrao Kulkarni Nov. 21, 2023, 9:26 a.m. UTC | #3
On 21-11-2023 02:38 pm, Marc Zyngier wrote:
> On Tue, 21 Nov 2023 08:51:35 +0000,
> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
>>
>>
>> Hi Marc,
>>
>> On 20-11-2023 06:39 pm, Marc Zyngier wrote:
>>> This is the 5th drop of NV support on arm64 for this year, and most
>>> probably the last one for this side of Christmas.
>>>
>>> For the previous episodes, see [1].
>>>
>>> What's changed:
>>>
>>> - Drop support for the original FEAT_NV. No existing hardware supports
>>>     it without FEAT_NV2, and the architecture is deprecating the former
>>>     entirely. This results in fewer patches, and a slightly simpler
>>>     model overall.
>>>
>>> - Reorganise the series to make it a bit more logical now that FEAT_NV
>>>     is gone.
>>>
>>> - Apply the NV idreg restrictions on VM first run rather than on each
>>>     access.
>>>
>>> - Make the nested vgic shadow CPU interface a per-CPU structure rather
>>>     than per-vcpu.
>>>
>>> - Fix the EL0 timer fastpath
>>>
>>> - Work around the architecture deficiencies when trapping WFI from a
>>>     L2 guest.
>>>
>>> - Fix sampling of nested vgic state (MISR, ELRSR, EISR)
>>>
>>> - Drop the patches that have already been merged (NV trap forwarding,
>>>     per-MMU VTCR)
>>>
>>> - Rebased on top of 6.7-rc2 + the FEAT_E2H0 support [2].
>>>
>>> The branch containing these patches (and more) is at [3]. As for the
>>> previous rounds, my intention is to take a prefix of this series into
>>> 6.8, provided that it gets enough reviewing.
>>>
>>> [1] https://lore.kernel.org/r/20230515173103.1017669-1-maz@kernel.org
>>> [2] https://lore.kernel.org/r/20231120123721.851738-1-maz@kernel.org
>>> [3] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/nv-6.8-nv2-only
>>>
>>
>> V11 series is not booting on Ampere platform (I am yet to debug).
>> With lkvm, it is stuck at the very early stage itself and no early
>> boot prints/logs.
>>
>> Are there any changes needed in kvmtool for V11?
> 
> Not really, I'm still using the version I had built for 6.5. Is the
> problem with L1 or L2?

Stuck in the L1 itself.

I am using kvmtool from 
https://git.kernel.org/pub/scm/linux/kernel/git/maz/kvmtool.git/log/?h=arm64/nv-5.16

> 
> However, this looks like a problem I've been chasing, and which I
> though was only a M2 issue. In some situations, I'm getting interrupt
> storms when L1 gets a level interrupt while in L2.
> 
> Can you cherry-pick [1] from my tree, and let me know if this helps?
> This isn't a proper fix, but if L2 starts booting with this, I would
> know this is a common issue.
> 
> Now, if your problem is with L1, I really have no idea.
> 
> Thanks,
> 
> 	M.
> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/commit/?h=kvm-arm64/nv-6.8-nv2-only&id=759d2e18f8954f4c76eb1772f38301df6ed8fa5d
> 

Thanks,
Ganapat
Marc Zyngier Nov. 21, 2023, 9:41 a.m. UTC | #4
On Tue, 21 Nov 2023 09:26:22 +0000,
Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
> 
> 
> 
> On 21-11-2023 02:38 pm, Marc Zyngier wrote:
> > On Tue, 21 Nov 2023 08:51:35 +0000,
> > Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
> >> 
> >> 
> >> Hi Marc,
> >> 
> >> On 20-11-2023 06:39 pm, Marc Zyngier wrote:
> >>> This is the 5th drop of NV support on arm64 for this year, and most
> >>> probably the last one for this side of Christmas.
> >>> 
> >>> For the previous episodes, see [1].
> >>> 
> >>> What's changed:
> >>> 
> >>> - Drop support for the original FEAT_NV. No existing hardware supports
> >>>     it without FEAT_NV2, and the architecture is deprecating the former
> >>>     entirely. This results in fewer patches, and a slightly simpler
> >>>     model overall.
> >>> 
> >>> - Reorganise the series to make it a bit more logical now that FEAT_NV
> >>>     is gone.
> >>> 
> >>> - Apply the NV idreg restrictions on VM first run rather than on each
> >>>     access.
> >>> 
> >>> - Make the nested vgic shadow CPU interface a per-CPU structure rather
> >>>     than per-vcpu.
> >>> 
> >>> - Fix the EL0 timer fastpath
> >>> 
> >>> - Work around the architecture deficiencies when trapping WFI from a
> >>>     L2 guest.
> >>> 
> >>> - Fix sampling of nested vgic state (MISR, ELRSR, EISR)
> >>> 
> >>> - Drop the patches that have already been merged (NV trap forwarding,
> >>>     per-MMU VTCR)
> >>> 
> >>> - Rebased on top of 6.7-rc2 + the FEAT_E2H0 support [2].
> >>> 
> >>> The branch containing these patches (and more) is at [3]. As for the
> >>> previous rounds, my intention is to take a prefix of this series into
> >>> 6.8, provided that it gets enough reviewing.
> >>> 
> >>> [1] https://lore.kernel.org/r/20230515173103.1017669-1-maz@kernel.org
> >>> [2] https://lore.kernel.org/r/20231120123721.851738-1-maz@kernel.org
> >>> [3] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/nv-6.8-nv2-only
> >>> 
> >> 
> >> V11 series is not booting on Ampere platform (I am yet to debug).
> >> With lkvm, it is stuck at the very early stage itself and no early
> >> boot prints/logs.
> >> 
> >> Are there any changes needed in kvmtool for V11?
> > 
> > Not really, I'm still using the version I had built for 6.5. Is the
> > problem with L1 or L2?
> 
> Stuck in the L1 itself.
> 
> I am using kvmtool from
> https://git.kernel.org/pub/scm/linux/kernel/git/maz/kvmtool.git/log/?h=arm64/nv-5.16

Huh. That's positively ancient. Yet, you shouldn't get into a
situation where the L1 guest locks up.

I have pushed out my kvmtool branch[1]. Please give it a go.

Thanks,

	M.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/kvmtool.git/log/?h=arm64/nv-6.5
Miguel Luis Nov. 21, 2023, 4:49 p.m. UTC | #5
Hi Marc,

> On 20 Nov 2023, at 12:09, Marc Zyngier <maz@kernel.org> wrote:
> 
> This is the 5th drop of NV support on arm64 for this year, and most
> probably the last one for this side of Christmas.
> 
> For the previous episodes, see [1].
> 
> What's changed:
> 
> - Drop support for the original FEAT_NV. No existing hardware supports
>  it without FEAT_NV2, and the architecture is deprecating the former
>  entirely. This results in fewer patches, and a slightly simpler
>  model overall.
> 
> - Reorganise the series to make it a bit more logical now that FEAT_NV
>  is gone.
> 
> - Apply the NV idreg restrictions on VM first run rather than on each
>  access.
> 
> - Make the nested vgic shadow CPU interface a per-CPU structure rather
>  than per-vcpu.
> 
> - Fix the EL0 timer fastpath
> 
> - Work around the architecture deficiencies when trapping WFI from a
>  L2 guest.
> 
> - Fix sampling of nested vgic state (MISR, ELRSR, EISR)
> 
> - Drop the patches that have already been merged (NV trap forwarding,
>  per-MMU VTCR)
> 
> - Rebased on top of 6.7-rc2 + the FEAT_E2H0 support [2].
> 
> The branch containing these patches (and more) is at [3]. As for the
> previous rounds, my intention is to take a prefix of this series into
> 6.8, provided that it gets enough reviewing.
> 
> [1] https://lore.kernel.org/r/20230515173103.1017669-1-maz@kernel.org
> [2] https://lore.kernel.org/r/20231120123721.851738-1-maz@kernel.org
> [3] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/nv-6.8-nv2-only
> 

While I was testing this with kvmtool for 5.16 I noted the following on dmesg:

[  803.014258] kvm [19040]: Unsupported guest sys_reg access at: 8129fa50 [600003c9]
                { Op0( 3), Op1( 5), CRn( 1), CRm( 0), Op2( 2), func_read },

This is CPACR_EL12.

Still need yet to debug.

As for QEMU, it is having issues enabling _EL2 feature although EL2 is supported by
checking KVM_CAP_ARM_EL2; need yet to debug this.

Thanks

Miguel

> Andre Przywara (1):
>  KVM: arm64: nv: vgic: Allow userland to set VGIC maintenance IRQ
> 
> Christoffer Dall (2):
>  KVM: arm64: nv: Implement nested Stage-2 page table walk logic
>  KVM: arm64: nv: Unmap/flush shadow stage 2 page tables
> 
> Jintack Lim (3):
>  KVM: arm64: nv: Respect virtual HCR_EL2.TWX setting
>  KVM: arm64: nv: Respect virtual CPTR_EL2.{TFP,FPEN} settings
>  KVM: arm64: nv: Trap and emulate TLBI instructions from virtual EL2
> 
> Marc Zyngier (37):
>  arm64: cpufeatures: Restrict NV support to FEAT_NV2
>  KVM: arm64: nv: Hoist vcpu_has_nv() into is_hyp_ctxt()
>  KVM: arm64: nv: Compute NV view of idregs as a one-off
>  KVM: arm64: nv: Drop EL12 register traps that are redirected to VNCR
>  KVM: arm64: nv: Add non-VHE-EL2->EL1 translation helpers
>  KVM: arm64: nv: Add include containing the VNCR_EL2 offsets
>  KVM: arm64: Introduce a bad_trap() primitive for unexpected trap
>    handling
>  KVM: arm64: nv: Add EL2_REG_VNCR()/EL2_REG_REDIR() sysreg helpers
>  KVM: arm64: nv: Map VNCR-capable registers to a separate page
>  KVM: arm64: nv: Handle virtual EL2 registers in
>    vcpu_read/write_sys_reg()
>  KVM: arm64: nv: Handle HCR_EL2.E2H specially
>  KVM: arm64: nv: Handle CNTHCTL_EL2 specially
>  KVM: arm64: nv: Save/Restore vEL2 sysregs
>  KVM: arm64: nv: Configure HCR_EL2 for FEAT_NV2
>  KVM: arm64: nv: Support multiple nested Stage-2 mmu structures
>  KVM: arm64: nv: Handle shadow stage 2 page faults
>  KVM: arm64: nv: Restrict S2 RD/WR permissions to match the guest's
>  KVM: arm64: nv: Set a handler for the system instruction traps
>  KVM: arm64: nv: Trap and emulate AT instructions from virtual EL2
>  KVM: arm64: nv: Hide RAS from nested guests
>  KVM: arm64: nv: Add handling of EL2-specific timer registers
>  KVM: arm64: nv: Sync nested timer state with FEAT_NV2
>  KVM: arm64: nv: Publish emulated timer interrupt state in the
>    in-memory state
>  KVM: arm64: nv: Load timer before the GIC
>  KVM: arm64: nv: Nested GICv3 Support
>  KVM: arm64: nv: Don't block in WFI from nested state
>  KVM: arm64: nv: Fold GICv3 host trapping requirements into guest setup
>  KVM: arm64: nv: Deal with broken VGIC on maintenance interrupt
>    delivery
>  KVM: arm64: nv: Add handling of FEAT_TTL TLB invalidation
>  KVM: arm64: nv: Invalidate TLBs based on shadow S2 TTL-like
>    information
>  KVM: arm64: nv: Tag shadow S2 entries with nested level
>  KVM: arm64: nv: Allocate VNCR page when required
>  KVM: arm64: nv: Fast-track 'InHost' exception returns
>  KVM: arm64: nv: Fast-track EL1 TLBIs for VHE guests
>  KVM: arm64: nv: Use FEAT_ECV to trap access to EL0 timers
>  KVM: arm64: nv: Accelerate EL0 timer read accesses when FEAT_ECV is on
>  KVM: arm64: nv: Allow userspace to request KVM_ARM_VCPU_NESTED_VIRT
> 
> .../virt/kvm/devices/arm-vgic-v3.rst          |  12 +-
> arch/arm64/include/asm/esr.h                  |   1 +
> arch/arm64/include/asm/kvm_arm.h              |   3 +
> arch/arm64/include/asm/kvm_asm.h              |   4 +
> arch/arm64/include/asm/kvm_emulate.h          |  53 +-
> arch/arm64/include/asm/kvm_host.h             | 223 +++-
> arch/arm64/include/asm/kvm_hyp.h              |   2 +
> arch/arm64/include/asm/kvm_mmu.h              |  12 +
> arch/arm64/include/asm/kvm_nested.h           | 130 ++-
> arch/arm64/include/asm/sysreg.h               |   7 +
> arch/arm64/include/asm/vncr_mapping.h         | 102 ++
> arch/arm64/include/uapi/asm/kvm.h             |   1 +
> arch/arm64/kernel/cpufeature.c                |  22 +-
> arch/arm64/kvm/Makefile                       |   4 +-
> arch/arm64/kvm/arch_timer.c                   | 115 +-
> arch/arm64/kvm/arm.c                          |  46 +-
> arch/arm64/kvm/at.c                           | 219 ++++
> arch/arm64/kvm/emulate-nested.c               |  48 +-
> arch/arm64/kvm/handle_exit.c                  |  29 +-
> arch/arm64/kvm/hyp/include/hyp/switch.h       |   8 +-
> arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h    |   5 +-
> arch/arm64/kvm/hyp/nvhe/switch.c              |   2 +-
> arch/arm64/kvm/hyp/nvhe/sysreg-sr.c           |   2 +-
> arch/arm64/kvm/hyp/vgic-v3-sr.c               |   6 +-
> arch/arm64/kvm/hyp/vhe/switch.c               | 211 +++-
> arch/arm64/kvm/hyp/vhe/sysreg-sr.c            | 133 ++-
> arch/arm64/kvm/hyp/vhe/tlb.c                  |  83 ++
> arch/arm64/kvm/mmu.c                          | 248 ++++-
> arch/arm64/kvm/nested.c                       | 813 ++++++++++++++-
> arch/arm64/kvm/reset.c                        |   7 +
> arch/arm64/kvm/sys_regs.c                     | 978 ++++++++++++++++--
> arch/arm64/kvm/vgic/vgic-init.c               |  35 +
> arch/arm64/kvm/vgic/vgic-kvm-device.c         |  29 +-
> arch/arm64/kvm/vgic/vgic-v3-nested.c          | 270 +++++
> arch/arm64/kvm/vgic/vgic-v3.c                 |  35 +-
> arch/arm64/kvm/vgic/vgic.c                    |  29 +
> arch/arm64/kvm/vgic/vgic.h                    |  10 +
> arch/arm64/tools/cpucaps                      |   2 +
> include/clocksource/arm_arch_timer.h          |   4 +
> include/kvm/arm_arch_timer.h                  |  19 +
> include/kvm/arm_vgic.h                        |  16 +
> include/uapi/linux/kvm.h                      |   1 +
> tools/arch/arm/include/uapi/asm/kvm.h         |   1 +
> 43 files changed, 3725 insertions(+), 255 deletions(-)
> create mode 100644 arch/arm64/include/asm/vncr_mapping.h
> create mode 100644 arch/arm64/kvm/at.c
> create mode 100644 arch/arm64/kvm/vgic/vgic-v3-nested.c
> 
> -- 
> 2.39.2
>
Marc Zyngier Nov. 21, 2023, 7:02 p.m. UTC | #6
On Tue, 21 Nov 2023 16:49:52 +0000,
Miguel Luis <miguel.luis@oracle.com> wrote:
> 
> Hi Marc,
> 
> > On 20 Nov 2023, at 12:09, Marc Zyngier <maz@kernel.org> wrote:
> > 
> > This is the 5th drop of NV support on arm64 for this year, and most
> > probably the last one for this side of Christmas.
> > 
> > For the previous episodes, see [1].
> > 
> > What's changed:
> > 
> > - Drop support for the original FEAT_NV. No existing hardware supports
> >  it without FEAT_NV2, and the architecture is deprecating the former
> >  entirely. This results in fewer patches, and a slightly simpler
> >  model overall.
> > 
> > - Reorganise the series to make it a bit more logical now that FEAT_NV
> >  is gone.
> > 
> > - Apply the NV idreg restrictions on VM first run rather than on each
> >  access.
> > 
> > - Make the nested vgic shadow CPU interface a per-CPU structure rather
> >  than per-vcpu.
> > 
> > - Fix the EL0 timer fastpath
> > 
> > - Work around the architecture deficiencies when trapping WFI from a
> >  L2 guest.
> > 
> > - Fix sampling of nested vgic state (MISR, ELRSR, EISR)
> > 
> > - Drop the patches that have already been merged (NV trap forwarding,
> >  per-MMU VTCR)
> > 
> > - Rebased on top of 6.7-rc2 + the FEAT_E2H0 support [2].
> > 
> > The branch containing these patches (and more) is at [3]. As for the
> > previous rounds, my intention is to take a prefix of this series into
> > 6.8, provided that it gets enough reviewing.
> > 
> > [1] https://lore.kernel.org/r/20230515173103.1017669-1-maz@kernel.org
> > [2] https://lore.kernel.org/r/20231120123721.851738-1-maz@kernel.org
> > [3] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/nv-6.8-nv2-only
> > 
> 
> While I was testing this with kvmtool for 5.16 I noted the following on dmesg:
> 
> [  803.014258] kvm [19040]: Unsupported guest sys_reg access at: 8129fa50 [600003c9]
>                 { Op0( 3), Op1( 5), CRn( 1), CRm( 0), Op2( 2), func_read },
> 
> This is CPACR_EL12.

CPACR_EL12 is redirected to VNCR[0x100]. It really shouldn't trap...

> Still need yet to debug.

Can you disassemble the guest around the offending PC?

> As for QEMU, it is having issues enabling _EL2 feature although EL2
> is supported by checking KVM_CAP_ARM_EL2; need yet to debug this.

The capability number changes at each release. Make sure you resync
your includes.

	M.
Ganapatrao Kulkarni Nov. 22, 2023, 11:10 a.m. UTC | #7
On 21-11-2023 03:11 pm, Marc Zyngier wrote:
> On Tue, 21 Nov 2023 09:26:22 +0000,
> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
>>
>>
>>
>> On 21-11-2023 02:38 pm, Marc Zyngier wrote:
>>> On Tue, 21 Nov 2023 08:51:35 +0000,
>>> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
>>>>
>>>>
>>>> Hi Marc,
>>>>
>>>> On 20-11-2023 06:39 pm, Marc Zyngier wrote:
>>>>> This is the 5th drop of NV support on arm64 for this year, and most
>>>>> probably the last one for this side of Christmas.
>>>>>
>>>>> For the previous episodes, see [1].
>>>>>
>>>>> What's changed:
>>>>>
>>>>> - Drop support for the original FEAT_NV. No existing hardware supports
>>>>>      it without FEAT_NV2, and the architecture is deprecating the former
>>>>>      entirely. This results in fewer patches, and a slightly simpler
>>>>>      model overall.
>>>>>
>>>>> - Reorganise the series to make it a bit more logical now that FEAT_NV
>>>>>      is gone.
>>>>>
>>>>> - Apply the NV idreg restrictions on VM first run rather than on each
>>>>>      access.
>>>>>
>>>>> - Make the nested vgic shadow CPU interface a per-CPU structure rather
>>>>>      than per-vcpu.
>>>>>
>>>>> - Fix the EL0 timer fastpath
>>>>>
>>>>> - Work around the architecture deficiencies when trapping WFI from a
>>>>>      L2 guest.
>>>>>
>>>>> - Fix sampling of nested vgic state (MISR, ELRSR, EISR)
>>>>>
>>>>> - Drop the patches that have already been merged (NV trap forwarding,
>>>>>      per-MMU VTCR)
>>>>>
>>>>> - Rebased on top of 6.7-rc2 + the FEAT_E2H0 support [2].
>>>>>
>>>>> The branch containing these patches (and more) is at [3]. As for the
>>>>> previous rounds, my intention is to take a prefix of this series into
>>>>> 6.8, provided that it gets enough reviewing.
>>>>>
>>>>> [1] https://lore.kernel.org/r/20230515173103.1017669-1-maz@kernel.org
>>>>> [2] https://lore.kernel.org/r/20231120123721.851738-1-maz@kernel.org
>>>>> [3] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/nv-6.8-nv2-only
>>>>>
>>>>
>>>> V11 series is not booting on Ampere platform (I am yet to debug).
>>>> With lkvm, it is stuck at the very early stage itself and no early
>>>> boot prints/logs.
>>>>
>>>> Are there any changes needed in kvmtool for V11?
>>>
>>> Not really, I'm still using the version I had built for 6.5. Is the
>>> problem with L1 or L2?
>>
>> Stuck in the L1 itself.
>>
>> I am using kvmtool from
>> https://git.kernel.org/pub/scm/linux/kernel/git/maz/kvmtool.git/log/?h=arm64/nv-5.16
> 
> Huh. That's positively ancient. Yet, you shouldn't get into a
> situation where the L1 guest locks up.
> 
> I have pushed out my kvmtool branch[1]. Please give it a go.
> 
No change, still L1 hangs. Captured ftrace and the L1 is keep 
looping/faulting around same address across kvm_entry and kvm_exits.

It is a weird behavior, L1 is faulting and looping around MDCR and 
AA64MMFR3_EL1 access in function __finalise_el2.

asm:
ffffffc080528a58:       d53dc000        mrs     x0, vbar_el12
ffffffc080528a5c:       d518c000        msr     vbar_el1, x0
ffffffc080528a60:       d53c1120        mrs     x0, mdcr_el2
ffffffc080528a64:       9272f400        and     x0, x0, #0xffffffffffffcfff
ffffffc080528a68:       9266f400        and     x0, x0, #0xfffffffffcffffff
ffffffc080528a6c:       d51c1120        msr     mdcr_el2, x0
ffffffc080528a70:       d53d2040        mrs     x0, tcr_el12
ffffffc080528a74:       d5182040        msr     tcr_el1, x0
ffffffc080528a78:       d53d2000        mrs     x0, ttbr0_el12
ffffffc080528a7c:       d5182000        msr     ttbr0_el1, x0
ffffffc080528a80:       d53d2020        mrs     x0, ttbr1_el12
ffffffc080528a84:       d5182020        msr     ttbr1_el1, x0
ffffffc080528a88:       d53da200        mrs     x0, mair_el12
ffffffc080528a8c:       d518a200        msr     mair_el1, x0
ffffffc080528a90:       d5380761        mrs     x1, s3_0_c0_c7_3
ffffffc080528a94:       d3400c21        ubfx    x1, x1, #0, #4
ffffffc080528a98:       b4000141        cbz     x1, ffffffc080528ac0 
<__finalise_el2+0x270>
ffffffc080528a9c:       d53d2060        mrs     x0, s3_5_c2_c0_3

ftrace:
       kvm-vcpu-0-88776   [001] ...1.  6076.581774: kvm_exit: TRAP: 
HSR_EC: 0x0018 (SYS64), PC: 0x0000000080528a6c
       kvm-vcpu-0-88776   [001] d..1.  6076.581774: kvm_entry: PC: 
0x0000000080528a6c
       kvm-vcpu-0-88776   [001] ...1.  6076.581775: kvm_exit: TRAP: 
HSR_EC: 0x0018 (SYS64), PC: 0x0000000080528a90
       kvm-vcpu-0-88776   [001] d..1.  6076.581776: kvm_entry: PC: 
0x0000000080528a90
       kvm-vcpu-0-88776   [001] ...1.  6076.581778: kvm_exit: TRAP: 
HSR_EC: 0x0018 (SYS64), PC: 0x0000000080528a60
       kvm-vcpu-0-88776   [001] d..1.  6076.581778: kvm_entry: PC: 
0x0000000080528a60
       kvm-vcpu-0-88776   [001] ...1.  6076.581779: kvm_exit: TRAP: 
HSR_EC: 0x0018 (SYS64), PC: 0x0000000080528a6c
       kvm-vcpu-0-88776   [001] d..1.  6076.581779: kvm_entry: PC: 
0x0000000080528a6c
       kvm-vcpu-0-88776   [001] ...1.  6076.581780: kvm_exit: TRAP: 
HSR_EC: 0x0018 (SYS64), PC: 0x0000000080528a90
       kvm-vcpu-0-88776   [001] d..1.  6076.581781: kvm_entry: PC: 
0x0000000080528a90
       kvm-vcpu-0-88776   [001] ...1.  6076.581783: kvm_exit: TRAP: 
HSR_EC: 0x0018 (SYS64), PC: 0x0000000080528a60
       kvm-vcpu-0-88776   [001] d..1.  6076.581783: kvm_entry: PC: 
0x0000000080528a60
       kvm-vcpu-0-88776   [001] ...1.  6076.581784: kvm_exit: TRAP: 
HSR_EC: 0x0018 (SYS64), PC: 0x0000000080528a6c
       kvm-vcpu-0-88776   [001] d..1.  6076.581784: kvm_entry: PC: 
0x0000000080528a6c
       kvm-vcpu-0-88776   [001] ...1.  6076.581785: kvm_exit: TRAP: 
HSR_EC: 0x0018 (SYS64), PC: 0x0000000080528a90
       kvm-vcpu-0-88776   [001] d..1.  6076.581786: kvm_entry: PC: 
0x0000000080528a90
       kvm-vcpu-0-88776   [001] ...1.  6076.581788: kvm_exit: TRAP: 
HSR_EC: 0x0018 (SYS64), PC: 0x0000000080528a60
       kvm-vcpu-0-88776   [001] d..1.  6076.581788: kvm_entry: PC: 
0x0000000080528a60
       kvm-vcpu-0-88776   [001] ...1.  6076.581789: kvm_exit: TRAP: 
HSR_EC: 0x0018 (SYS64), PC: 0x0000000080528a6c
       kvm-vcpu-0-88776   [001] d..1.  6076.581789: kvm_entry: PC: 
0x0000000080528a6c
       kvm-vcpu-0-88776   [001] ...1.  6076.581790: kvm_exit: TRAP: 
HSR_EC: 0x0018 (SYS64), PC: 0x0000000080528a90



Thanks,
Ganapat
Marc Zyngier Nov. 22, 2023, 11:39 a.m. UTC | #8
On Wed, 22 Nov 2023 11:10:10 +0000,
Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
> 
> 
> No change, still L1 hangs. Captured ftrace and the L1 is keep
> looping/faulting around same address across kvm_entry and kvm_exits.
> 
> It is a weird behavior, L1 is faulting and looping around MDCR and
> AA64MMFR3_EL1 access in function __finalise_el2.

I really can't see how this happens. There are no backward branches,
and we don't seem to reach the ERET either. So something must affect
the state after the trap of ID_AA64MMFR3_EL1.

	M.
Miguel Luis Nov. 23, 2023, 4:21 p.m. UTC | #9
Hi Marc,

On 21/11/2023 18:02, Marc Zyngier wrote:
> On Tue, 21 Nov 2023 16:49:52 +0000,
> Miguel Luis <miguel.luis@oracle.com> wrote:
>> Hi Marc,
>>
>>> On 20 Nov 2023, at 12:09, Marc Zyngier <maz@kernel.org> wrote:
>>>
>>> This is the 5th drop of NV support on arm64 for this year, and most
>>> probably the last one for this side of Christmas.
>>>
>>> For the previous episodes, see [1].
>>>
>>> What's changed:
>>>
>>> - Drop support for the original FEAT_NV. No existing hardware supports
>>>  it without FEAT_NV2, and the architecture is deprecating the former
>>>  entirely. This results in fewer patches, and a slightly simpler
>>>  model overall.
>>>
>>> - Reorganise the series to make it a bit more logical now that FEAT_NV
>>>  is gone.
>>>
>>> - Apply the NV idreg restrictions on VM first run rather than on each
>>>  access.
>>>
>>> - Make the nested vgic shadow CPU interface a per-CPU structure rather
>>>  than per-vcpu.
>>>
>>> - Fix the EL0 timer fastpath
>>>
>>> - Work around the architecture deficiencies when trapping WFI from a
>>>  L2 guest.
>>>
>>> - Fix sampling of nested vgic state (MISR, ELRSR, EISR)
>>>
>>> - Drop the patches that have already been merged (NV trap forwarding,
>>>  per-MMU VTCR)
>>>
>>> - Rebased on top of 6.7-rc2 + the FEAT_E2H0 support [2].
>>>
>>> The branch containing these patches (and more) is at [3]. As for the
>>> previous rounds, my intention is to take a prefix of this series into
>>> 6.8, provided that it gets enough reviewing.
>>>
>>> [1] https://lore.kernel.org/r/20230515173103.1017669-1-maz@kernel.org
>>> [2] https://lore.kernel.org/r/20231120123721.851738-1-maz@kernel.org
>>> [3] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/nv-6.8-nv2-only
>>>
>> While I was testing this with kvmtool for 5.16 I noted the following on dmesg:
>>
>> [  803.014258] kvm [19040]: Unsupported guest sys_reg access at: 8129fa50 [600003c9]
>>                 { Op0( 3), Op1( 5), CRn( 1), CRm( 0), Op2( 2), func_read },
>>
>> This is CPACR_EL12.
> CPACR_EL12 is redirected to VNCR[0x100]. It really shouldn't trap...
>
>> Still need yet to debug.
> Can you disassemble the guest around the offending PC?

[ 1248.686350] kvm [7013]: Unsupported guest sys_reg access at: 812baa50 [600003c9]
                { Op0( 3), Op1( 5), CRn( 1), CRm( 0), Op2( 2), func_read },

 12baa00:    14000008     b    0x12baa20
 12baa04:    d000d501     adrp    x1, 0x2d5c000
 12baa08:    91154021     add    x1, x1, #0x550
 12baa0c:    f9400022     ldr    x2, [x1]
 12baa10:    f9400421     ldr    x1, [x1, #8]
 12baa14:    8a010042     and    x2, x2, x1
 12baa18:    d3441c42     ubfx    x2, x2, #4, #4
 12baa1c:    b4000082     cbz    x2, 0x12baa2c
 12baa20:    d2a175a0     mov    x0, #0xbad0000                 // #195887104
 12baa24:    f2994220     movk    x0, #0xca11
 12baa28:    d69f03e0     eret
 12baa2c:    d2c00080     mov    x0, #0x400000000               // #17179869184
 12baa30:    f2b10000     movk    x0, #0x8800, lsl #16
 12baa34:    f2800000     movk    x0, #0x0
 12baa38:    d51c1100     msr    hcr_el2, x0
 12baa3c:    d5033fdf     isb
 12baa40:    d53c4100     mrs    x0, sp_el1
 12baa44:    9100001f     mov    sp, x0
 12baa48:    d538d080     mrs    x0, tpidr_el1
 12baa4c:    d51cd040     msr    tpidr_el2, x0
 12baa50:    d53d1040     mrs    x0, cpacr_el12
 12baa54:    d5181040     msr    cpacr_el1, x0
 12baa58:    d53dc000     mrs    x0, vbar_el12
 12baa5c:    d518c000     msr    vbar_el1, x0
 12baa60:    d53c1120     mrs    x0, mdcr_el2
 12baa64:    9272f400     and    x0, x0, #0xffffffffffffcfff
 12baa68:    9266f400     and    x0, x0, #0xfffffffffcffffff
 12baa6c:    d51c1120     msr    mdcr_el2, x0
 12baa70:    d53d2040     mrs    x0, tcr_el12
 12baa74:    d5182040     msr    tcr_el1, x0
 12baa78:    d53d2000     mrs    x0, ttbr0_el12
 12baa7c:    d5182000     msr    ttbr0_el1, x0
 12baa80:    d53d2020     mrs    x0, ttbr1_el12
 12baa84:    d5182020     msr    ttbr1_el1, x0
 12baa88:    d53da200     mrs    x0, mair_el12
 12baa8c:    d518a200     msr    mair_el1, x0
 12baa90:    d5380761     mrs    x1, s3_0_c0_c7_3
 12baa94:    d3400c21     ubfx    x1, x1, #0, #4
 12baa98:    b4000141     cbz    x1, 0x12baac0
 12baa9c:    d53d2060     mrs    x0, s3_5_c2_c0_3


>> As for QEMU, it is having issues enabling _EL2 feature although EL2
>> is supported by checking KVM_CAP_ARM_EL2; need yet to debug this.
> The capability number changes at each release. Make sure you resync
> your includes.

Been there but it seems a different problem this time.

Thank you

Miguel

> 	M.
>
Marc Zyngier Nov. 23, 2023, 4:44 p.m. UTC | #10
On Thu, 23 Nov 2023 16:21:48 +0000,
Miguel Luis <miguel.luis@oracle.com> wrote:
> 
> Hi Marc,
> 
> On 21/11/2023 18:02, Marc Zyngier wrote:
> > On Tue, 21 Nov 2023 16:49:52 +0000,
> > Miguel Luis <miguel.luis@oracle.com> wrote:
> >> Hi Marc,
> >>
> >>> On 20 Nov 2023, at 12:09, Marc Zyngier <maz@kernel.org> wrote:
> >>>
> >>> This is the 5th drop of NV support on arm64 for this year, and most
> >>> probably the last one for this side of Christmas.
> >>>
> >>> For the previous episodes, see [1].
> >>>
> >>> What's changed:
> >>>
> >>> - Drop support for the original FEAT_NV. No existing hardware supports
> >>>  it without FEAT_NV2, and the architecture is deprecating the former
> >>>  entirely. This results in fewer patches, and a slightly simpler
> >>>  model overall.
> >>>
> >>> - Reorganise the series to make it a bit more logical now that FEAT_NV
> >>>  is gone.
> >>>
> >>> - Apply the NV idreg restrictions on VM first run rather than on each
> >>>  access.
> >>>
> >>> - Make the nested vgic shadow CPU interface a per-CPU structure rather
> >>>  than per-vcpu.
> >>>
> >>> - Fix the EL0 timer fastpath
> >>>
> >>> - Work around the architecture deficiencies when trapping WFI from a
> >>>  L2 guest.
> >>>
> >>> - Fix sampling of nested vgic state (MISR, ELRSR, EISR)
> >>>
> >>> - Drop the patches that have already been merged (NV trap forwarding,
> >>>  per-MMU VTCR)
> >>>
> >>> - Rebased on top of 6.7-rc2 + the FEAT_E2H0 support [2].
> >>>
> >>> The branch containing these patches (and more) is at [3]. As for the
> >>> previous rounds, my intention is to take a prefix of this series into
> >>> 6.8, provided that it gets enough reviewing.
> >>>
> >>> [1] https://lore.kernel.org/r/20230515173103.1017669-1-maz@kernel.org
> >>> [2] https://lore.kernel.org/r/20231120123721.851738-1-maz@kernel.org
> >>> [3] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/nv-6.8-nv2-only
> >>>
> >> While I was testing this with kvmtool for 5.16 I noted the following on dmesg:
> >>
> >> [  803.014258] kvm [19040]: Unsupported guest sys_reg access at: 8129fa50 [600003c9]
> >>                 { Op0( 3), Op1( 5), CRn( 1), CRm( 0), Op2( 2), func_read },
> >>
> >> This is CPACR_EL12.
> > CPACR_EL12 is redirected to VNCR[0x100]. It really shouldn't trap...
> >
> >> Still need yet to debug.
> > Can you disassemble the guest around the offending PC?
> 
> [ 1248.686350] kvm [7013]: Unsupported guest sys_reg access at: 812baa50 [600003c9]
>                 { Op0( 3), Op1( 5), CRn( 1), CRm( 0), Op2( 2), func_read },
> 
>  12baa00:    14000008     b    0x12baa20
>  12baa04:    d000d501     adrp    x1, 0x2d5c000
>  12baa08:    91154021     add    x1, x1, #0x550
>  12baa0c:    f9400022     ldr    x2, [x1]
>  12baa10:    f9400421     ldr    x1, [x1, #8]
>  12baa14:    8a010042     and    x2, x2, x1
>  12baa18:    d3441c42     ubfx    x2, x2, #4, #4
>  12baa1c:    b4000082     cbz    x2, 0x12baa2c
>  12baa20:    d2a175a0     mov    x0, #0xbad0000                 // #195887104
>  12baa24:    f2994220     movk    x0, #0xca11
>  12baa28:    d69f03e0     eret
>  12baa2c:    d2c00080     mov    x0, #0x400000000               // #17179869184
>  12baa30:    f2b10000     movk    x0, #0x8800, lsl #16
>  12baa34:    f2800000     movk    x0, #0x0
>  12baa38:    d51c1100     msr    hcr_el2, x0
>  12baa3c:    d5033fdf     isb
>  12baa40:    d53c4100     mrs    x0, sp_el1
>  12baa44:    9100001f     mov    sp, x0
>  12baa48:    d538d080     mrs    x0, tpidr_el1
>  12baa4c:    d51cd040     msr    tpidr_el2, x0
>  12baa50:    d53d1040     mrs    x0, cpacr_el12
>  12baa54:    d5181040     msr    cpacr_el1, x0
>  12baa58:    d53dc000     mrs    x0, vbar_el12
>  12baa5c:    d518c000     msr    vbar_el1, x0
>  12baa60:    d53c1120     mrs    x0, mdcr_el2
>  12baa64:    9272f400     and    x0, x0, #0xffffffffffffcfff
>  12baa68:    9266f400     and    x0, x0, #0xfffffffffcffffff
>  12baa6c:    d51c1120     msr    mdcr_el2, x0
>  12baa70:    d53d2040     mrs    x0, tcr_el12
>  12baa74:    d5182040     msr    tcr_el1, x0
>  12baa78:    d53d2000     mrs    x0, ttbr0_el12
>  12baa7c:    d5182000     msr    ttbr0_el1, x0
>  12baa80:    d53d2020     mrs    x0, ttbr1_el12
>  12baa84:    d5182020     msr    ttbr1_el1, x0
>  12baa88:    d53da200     mrs    x0, mair_el12
>  12baa8c:    d518a200     msr    mair_el1, x0
>  12baa90:    d5380761     mrs    x1, s3_0_c0_c7_3
>  12baa94:    d3400c21     ubfx    x1, x1, #0, #4
>  12baa98:    b4000141     cbz    x1, 0x12baac0
>  12baa9c:    d53d2060     mrs    x0, s3_5_c2_c0_3

OK, this is suspiciously close to the location Ganapatrao was having
issues with. Are you running on the same hardware?

In any case, we should never take a trap for this access. Can you dump
HCR_EL2 at the point where the guest traps (in switch.c)?

> >> As for QEMU, it is having issues enabling _EL2 feature although EL2
> >> is supported by checking KVM_CAP_ARM_EL2; need yet to debug this.
> > The capability number changes at each release. Make sure you resync
> > your includes.
> 
> Been there but it seems a different problem this time.

Creating the VM with SVE? NV doesn't support it yet (and it has been
the case for a long while).

	M.
Ganapatrao Kulkarni Nov. 24, 2023, 9:50 a.m. UTC | #11
On 23-11-2023 10:14 pm, Marc Zyngier wrote:
> On Thu, 23 Nov 2023 16:21:48 +0000,
> Miguel Luis <miguel.luis@oracle.com> wrote:
>>
>> Hi Marc,
>>
>> On 21/11/2023 18:02, Marc Zyngier wrote:
>>> On Tue, 21 Nov 2023 16:49:52 +0000,
>>> Miguel Luis <miguel.luis@oracle.com> wrote:
>>>> Hi Marc,
>>>>
>>>>> On 20 Nov 2023, at 12:09, Marc Zyngier <maz@kernel.org> wrote:
>>>>>
>>>>> This is the 5th drop of NV support on arm64 for this year, and most
>>>>> probably the last one for this side of Christmas.
>>>>>
>>>>> For the previous episodes, see [1].
>>>>>
>>>>> What's changed:
>>>>>
>>>>> - Drop support for the original FEAT_NV. No existing hardware supports
>>>>>   it without FEAT_NV2, and the architecture is deprecating the former
>>>>>   entirely. This results in fewer patches, and a slightly simpler
>>>>>   model overall.
>>>>>
>>>>> - Reorganise the series to make it a bit more logical now that FEAT_NV
>>>>>   is gone.
>>>>>
>>>>> - Apply the NV idreg restrictions on VM first run rather than on each
>>>>>   access.
>>>>>
>>>>> - Make the nested vgic shadow CPU interface a per-CPU structure rather
>>>>>   than per-vcpu.
>>>>>
>>>>> - Fix the EL0 timer fastpath
>>>>>
>>>>> - Work around the architecture deficiencies when trapping WFI from a
>>>>>   L2 guest.
>>>>>
>>>>> - Fix sampling of nested vgic state (MISR, ELRSR, EISR)
>>>>>
>>>>> - Drop the patches that have already been merged (NV trap forwarding,
>>>>>   per-MMU VTCR)
>>>>>
>>>>> - Rebased on top of 6.7-rc2 + the FEAT_E2H0 support [2].
>>>>>
>>>>> The branch containing these patches (and more) is at [3]. As for the
>>>>> previous rounds, my intention is to take a prefix of this series into
>>>>> 6.8, provided that it gets enough reviewing.
>>>>>
>>>>> [1] https://lore.kernel.org/r/20230515173103.1017669-1-maz@kernel.org
>>>>> [2] https://lore.kernel.org/r/20231120123721.851738-1-maz@kernel.org
>>>>> [3] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/nv-6.8-nv2-only
>>>>>
>>>> While I was testing this with kvmtool for 5.16 I noted the following on dmesg:
>>>>
>>>> [  803.014258] kvm [19040]: Unsupported guest sys_reg access at: 8129fa50 [600003c9]
>>>>                  { Op0( 3), Op1( 5), CRn( 1), CRm( 0), Op2( 2), func_read },
>>>>
>>>> This is CPACR_EL12.
>>> CPACR_EL12 is redirected to VNCR[0x100]. It really shouldn't trap...
>>>
>>>> Still need yet to debug.
>>> Can you disassemble the guest around the offending PC?
>>
>> [ 1248.686350] kvm [7013]: Unsupported guest sys_reg access at: 812baa50 [600003c9]
>>                  { Op0( 3), Op1( 5), CRn( 1), CRm( 0), Op2( 2), func_read },
>>
>>   12baa00:    14000008     b    0x12baa20
>>   12baa04:    d000d501     adrp    x1, 0x2d5c000
>>   12baa08:    91154021     add    x1, x1, #0x550
>>   12baa0c:    f9400022     ldr    x2, [x1]
>>   12baa10:    f9400421     ldr    x1, [x1, #8]
>>   12baa14:    8a010042     and    x2, x2, x1
>>   12baa18:    d3441c42     ubfx    x2, x2, #4, #4
>>   12baa1c:    b4000082     cbz    x2, 0x12baa2c
>>   12baa20:    d2a175a0     mov    x0, #0xbad0000                 // #195887104
>>   12baa24:    f2994220     movk    x0, #0xca11
>>   12baa28:    d69f03e0     eret
>>   12baa2c:    d2c00080     mov    x0, #0x400000000               // #17179869184
>>   12baa30:    f2b10000     movk    x0, #0x8800, lsl #16
>>   12baa34:    f2800000     movk    x0, #0x0
>>   12baa38:    d51c1100     msr    hcr_el2, x0
>>   12baa3c:    d5033fdf     isb
>>   12baa40:    d53c4100     mrs    x0, sp_el1
>>   12baa44:    9100001f     mov    sp, x0
>>   12baa48:    d538d080     mrs    x0, tpidr_el1
>>   12baa4c:    d51cd040     msr    tpidr_el2, x0
>>   12baa50:    d53d1040     mrs    x0, cpacr_el12
>>   12baa54:    d5181040     msr    cpacr_el1, x0
>>   12baa58:    d53dc000     mrs    x0, vbar_el12
>>   12baa5c:    d518c000     msr    vbar_el1, x0
>>   12baa60:    d53c1120     mrs    x0, mdcr_el2
>>   12baa64:    9272f400     and    x0, x0, #0xffffffffffffcfff
>>   12baa68:    9266f400     and    x0, x0, #0xfffffffffcffffff
>>   12baa6c:    d51c1120     msr    mdcr_el2, x0
>>   12baa70:    d53d2040     mrs    x0, tcr_el12
>>   12baa74:    d5182040     msr    tcr_el1, x0
>>   12baa78:    d53d2000     mrs    x0, ttbr0_el12
>>   12baa7c:    d5182000     msr    ttbr0_el1, x0
>>   12baa80:    d53d2020     mrs    x0, ttbr1_el12
>>   12baa84:    d5182020     msr    ttbr1_el1, x0
>>   12baa88:    d53da200     mrs    x0, mair_el12
>>   12baa8c:    d518a200     msr    mair_el1, x0
>>   12baa90:    d5380761     mrs    x1, s3_0_c0_c7_3
>>   12baa94:    d3400c21     ubfx    x1, x1, #0, #4
>>   12baa98:    b4000141     cbz    x1, 0x12baac0
>>   12baa9c:    d53d2060     mrs    x0, s3_5_c2_c0_3
> 
> OK, this is suspiciously close to the location Ganapatrao was having
> issues with. Are you running on the same hardware?
> 
> In any case, we should never take a trap for this access. Can you dump
> HCR_EL2 at the point where the guest traps (in switch.c)?
> 

I have dumped HCR_EL2 before entry to L1 in both V11 and V10.
on V10 HCR_EL2=0x2743c827c263f
on V11 HCR_EL2=0x27c3c827c263f

on V11 the function vcpu_el2_e2h_is_set(vcpu) is returning false 
resulting in NV1 bit set along with NV and NV2.
AFAIK, For L1 to be in VHE, NV1 bit should be zero and NV=NV2=1.

I could boot L1 then L2, if I hack vcpu_el2_e2h_is_set to return true.
There could be a bug in V11 or E2H0 patchset resulting in 
vcpu_el2_e2h_is_set() returning false?

>>>> As for QEMU, it is having issues enabling _EL2 feature although EL2
>>>> is supported by checking KVM_CAP_ARM_EL2; need yet to debug this.
>>> The capability number changes at each release. Make sure you resync
>>> your includes.
>>
>> Been there but it seems a different problem this time.
> 
> Creating the VM with SVE? NV doesn't support it yet (and it has been
> the case for a long while).
> 
> 	M.
> 

Thanks,
Ganapat
Marc Zyngier Nov. 24, 2023, 10:19 a.m. UTC | #12
On Fri, 24 Nov 2023 09:50:33 +0000,
Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
> 
> 
> 
> On 23-11-2023 10:14 pm, Marc Zyngier wrote:
> > On Thu, 23 Nov 2023 16:21:48 +0000,
> > Miguel Luis <miguel.luis@oracle.com> wrote:
> >> 
> >> Hi Marc,
> >> 
> >> On 21/11/2023 18:02, Marc Zyngier wrote:
> >>> On Tue, 21 Nov 2023 16:49:52 +0000,
> >>> Miguel Luis <miguel.luis@oracle.com> wrote:
> >>>> Hi Marc,
> >>>> 
> >>>>> On 20 Nov 2023, at 12:09, Marc Zyngier <maz@kernel.org> wrote:
> >>>>> 
> >>>>> This is the 5th drop of NV support on arm64 for this year, and most
> >>>>> probably the last one for this side of Christmas.
> >>>>> 
> >>>>> For the previous episodes, see [1].
> >>>>> 
> >>>>> What's changed:
> >>>>> 
> >>>>> - Drop support for the original FEAT_NV. No existing hardware supports
> >>>>>   it without FEAT_NV2, and the architecture is deprecating the former
> >>>>>   entirely. This results in fewer patches, and a slightly simpler
> >>>>>   model overall.
> >>>>> 
> >>>>> - Reorganise the series to make it a bit more logical now that FEAT_NV
> >>>>>   is gone.
> >>>>> 
> >>>>> - Apply the NV idreg restrictions on VM first run rather than on each
> >>>>>   access.
> >>>>> 
> >>>>> - Make the nested vgic shadow CPU interface a per-CPU structure rather
> >>>>>   than per-vcpu.
> >>>>> 
> >>>>> - Fix the EL0 timer fastpath
> >>>>> 
> >>>>> - Work around the architecture deficiencies when trapping WFI from a
> >>>>>   L2 guest.
> >>>>> 
> >>>>> - Fix sampling of nested vgic state (MISR, ELRSR, EISR)
> >>>>> 
> >>>>> - Drop the patches that have already been merged (NV trap forwarding,
> >>>>>   per-MMU VTCR)
> >>>>> 
> >>>>> - Rebased on top of 6.7-rc2 + the FEAT_E2H0 support [2].
> >>>>> 
> >>>>> The branch containing these patches (and more) is at [3]. As for the
> >>>>> previous rounds, my intention is to take a prefix of this series into
> >>>>> 6.8, provided that it gets enough reviewing.
> >>>>> 
> >>>>> [1] https://lore.kernel.org/r/20230515173103.1017669-1-maz@kernel.org
> >>>>> [2] https://lore.kernel.org/r/20231120123721.851738-1-maz@kernel.org
> >>>>> [3] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/nv-6.8-nv2-only
> >>>>> 
> >>>> While I was testing this with kvmtool for 5.16 I noted the following on dmesg:
> >>>> 
> >>>> [  803.014258] kvm [19040]: Unsupported guest sys_reg access at: 8129fa50 [600003c9]
> >>>>                  { Op0( 3), Op1( 5), CRn( 1), CRm( 0), Op2( 2), func_read },
> >>>> 
> >>>> This is CPACR_EL12.
> >>> CPACR_EL12 is redirected to VNCR[0x100]. It really shouldn't trap...
> >>> 
> >>>> Still need yet to debug.
> >>> Can you disassemble the guest around the offending PC?
> >> 
> >> [ 1248.686350] kvm [7013]: Unsupported guest sys_reg access at: 812baa50 [600003c9]
> >>                  { Op0( 3), Op1( 5), CRn( 1), CRm( 0), Op2( 2), func_read },
> >> 
> >>   12baa00:    14000008     b    0x12baa20
> >>   12baa04:    d000d501     adrp    x1, 0x2d5c000
> >>   12baa08:    91154021     add    x1, x1, #0x550
> >>   12baa0c:    f9400022     ldr    x2, [x1]
> >>   12baa10:    f9400421     ldr    x1, [x1, #8]
> >>   12baa14:    8a010042     and    x2, x2, x1
> >>   12baa18:    d3441c42     ubfx    x2, x2, #4, #4
> >>   12baa1c:    b4000082     cbz    x2, 0x12baa2c
> >>   12baa20:    d2a175a0     mov    x0, #0xbad0000                 // #195887104
> >>   12baa24:    f2994220     movk    x0, #0xca11
> >>   12baa28:    d69f03e0     eret
> >>   12baa2c:    d2c00080     mov    x0, #0x400000000               // #17179869184
> >>   12baa30:    f2b10000     movk    x0, #0x8800, lsl #16
> >>   12baa34:    f2800000     movk    x0, #0x0
> >>   12baa38:    d51c1100     msr    hcr_el2, x0
> >>   12baa3c:    d5033fdf     isb
> >>   12baa40:    d53c4100     mrs    x0, sp_el1
> >>   12baa44:    9100001f     mov    sp, x0
> >>   12baa48:    d538d080     mrs    x0, tpidr_el1
> >>   12baa4c:    d51cd040     msr    tpidr_el2, x0
> >>   12baa50:    d53d1040     mrs    x0, cpacr_el12
> >>   12baa54:    d5181040     msr    cpacr_el1, x0
> >>   12baa58:    d53dc000     mrs    x0, vbar_el12
> >>   12baa5c:    d518c000     msr    vbar_el1, x0
> >>   12baa60:    d53c1120     mrs    x0, mdcr_el2
> >>   12baa64:    9272f400     and    x0, x0, #0xffffffffffffcfff
> >>   12baa68:    9266f400     and    x0, x0, #0xfffffffffcffffff
> >>   12baa6c:    d51c1120     msr    mdcr_el2, x0
> >>   12baa70:    d53d2040     mrs    x0, tcr_el12
> >>   12baa74:    d5182040     msr    tcr_el1, x0
> >>   12baa78:    d53d2000     mrs    x0, ttbr0_el12
> >>   12baa7c:    d5182000     msr    ttbr0_el1, x0
> >>   12baa80:    d53d2020     mrs    x0, ttbr1_el12
> >>   12baa84:    d5182020     msr    ttbr1_el1, x0
> >>   12baa88:    d53da200     mrs    x0, mair_el12
> >>   12baa8c:    d518a200     msr    mair_el1, x0
> >>   12baa90:    d5380761     mrs    x1, s3_0_c0_c7_3
> >>   12baa94:    d3400c21     ubfx    x1, x1, #0, #4
> >>   12baa98:    b4000141     cbz    x1, 0x12baac0
> >>   12baa9c:    d53d2060     mrs    x0, s3_5_c2_c0_3
> > 
> > OK, this is suspiciously close to the location Ganapatrao was having
> > issues with. Are you running on the same hardware?
> > 
> > In any case, we should never take a trap for this access. Can you dump
> > HCR_EL2 at the point where the guest traps (in switch.c)?
> > 
> 
> I have dumped HCR_EL2 before entry to L1 in both V11 and V10.
> on V10 HCR_EL2=0x2743c827c263f
> on V11 HCR_EL2=0x27c3c827c263f
> 
> on V11 the function vcpu_el2_e2h_is_set(vcpu) is returning false
> resulting in NV1 bit set along with NV and NV2.
> AFAIK, For L1 to be in VHE, NV1 bit should be zero and NV=NV2=1.
> 
> I could boot L1 then L2, if I hack vcpu_el2_e2h_is_set to return true.
> There could be a bug in V11 or E2H0 patchset resulting in
> vcpu_el2_e2h_is_set() returning false?

The E2H0 series should only force vcpu_el2_e2h_is_set() to return
true, but not set it to false. Can you dump the *guest's* version of
HCR_EL2 at this point?

	M.
Ganapatrao Kulkarni Nov. 24, 2023, 12:34 p.m. UTC | #13
On 24-11-2023 03:49 pm, Marc Zyngier wrote:
> On Fri, 24 Nov 2023 09:50:33 +0000,
> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
>>
>>
>>
>> On 23-11-2023 10:14 pm, Marc Zyngier wrote:
>>> On Thu, 23 Nov 2023 16:21:48 +0000,
>>> Miguel Luis <miguel.luis@oracle.com> wrote:
>>>>
>>>> Hi Marc,
>>>>
>>>> On 21/11/2023 18:02, Marc Zyngier wrote:
>>>>> On Tue, 21 Nov 2023 16:49:52 +0000,
>>>>> Miguel Luis <miguel.luis@oracle.com> wrote:
>>>>>> Hi Marc,
>>>>>>
>>>>>>> On 20 Nov 2023, at 12:09, Marc Zyngier <maz@kernel.org> wrote:
>>>>>>>
>>>>>>> This is the 5th drop of NV support on arm64 for this year, and most
>>>>>>> probably the last one for this side of Christmas.
>>>>>>>
>>>>>>> For the previous episodes, see [1].
>>>>>>>
>>>>>>> What's changed:
>>>>>>>
>>>>>>> - Drop support for the original FEAT_NV. No existing hardware supports
>>>>>>>    it without FEAT_NV2, and the architecture is deprecating the former
>>>>>>>    entirely. This results in fewer patches, and a slightly simpler
>>>>>>>    model overall.
>>>>>>>
>>>>>>> - Reorganise the series to make it a bit more logical now that FEAT_NV
>>>>>>>    is gone.
>>>>>>>
>>>>>>> - Apply the NV idreg restrictions on VM first run rather than on each
>>>>>>>    access.
>>>>>>>
>>>>>>> - Make the nested vgic shadow CPU interface a per-CPU structure rather
>>>>>>>    than per-vcpu.
>>>>>>>
>>>>>>> - Fix the EL0 timer fastpath
>>>>>>>
>>>>>>> - Work around the architecture deficiencies when trapping WFI from a
>>>>>>>    L2 guest.
>>>>>>>
>>>>>>> - Fix sampling of nested vgic state (MISR, ELRSR, EISR)
>>>>>>>
>>>>>>> - Drop the patches that have already been merged (NV trap forwarding,
>>>>>>>    per-MMU VTCR)
>>>>>>>
>>>>>>> - Rebased on top of 6.7-rc2 + the FEAT_E2H0 support [2].
>>>>>>>
>>>>>>> The branch containing these patches (and more) is at [3]. As for the
>>>>>>> previous rounds, my intention is to take a prefix of this series into
>>>>>>> 6.8, provided that it gets enough reviewing.
>>>>>>>
>>>>>>> [1] https://lore.kernel.org/r/20230515173103.1017669-1-maz@kernel.org
>>>>>>> [2] https://lore.kernel.org/r/20231120123721.851738-1-maz@kernel.org
>>>>>>> [3] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/nv-6.8-nv2-only
>>>>>>>
>>>>>> While I was testing this with kvmtool for 5.16 I noted the following on dmesg:
>>>>>>
>>>>>> [  803.014258] kvm [19040]: Unsupported guest sys_reg access at: 8129fa50 [600003c9]
>>>>>>                   { Op0( 3), Op1( 5), CRn( 1), CRm( 0), Op2( 2), func_read },
>>>>>>
>>>>>> This is CPACR_EL12.
>>>>> CPACR_EL12 is redirected to VNCR[0x100]. It really shouldn't trap...
>>>>>
>>>>>> Still need yet to debug.
>>>>> Can you disassemble the guest around the offending PC?
>>>>
>>>> [ 1248.686350] kvm [7013]: Unsupported guest sys_reg access at: 812baa50 [600003c9]
>>>>                   { Op0( 3), Op1( 5), CRn( 1), CRm( 0), Op2( 2), func_read },
>>>>
>>>>    12baa00:    14000008     b    0x12baa20
>>>>    12baa04:    d000d501     adrp    x1, 0x2d5c000
>>>>    12baa08:    91154021     add    x1, x1, #0x550
>>>>    12baa0c:    f9400022     ldr    x2, [x1]
>>>>    12baa10:    f9400421     ldr    x1, [x1, #8]
>>>>    12baa14:    8a010042     and    x2, x2, x1
>>>>    12baa18:    d3441c42     ubfx    x2, x2, #4, #4
>>>>    12baa1c:    b4000082     cbz    x2, 0x12baa2c
>>>>    12baa20:    d2a175a0     mov    x0, #0xbad0000                 // #195887104
>>>>    12baa24:    f2994220     movk    x0, #0xca11
>>>>    12baa28:    d69f03e0     eret
>>>>    12baa2c:    d2c00080     mov    x0, #0x400000000               // #17179869184
>>>>    12baa30:    f2b10000     movk    x0, #0x8800, lsl #16
>>>>    12baa34:    f2800000     movk    x0, #0x0
>>>>    12baa38:    d51c1100     msr    hcr_el2, x0
>>>>    12baa3c:    d5033fdf     isb
>>>>    12baa40:    d53c4100     mrs    x0, sp_el1
>>>>    12baa44:    9100001f     mov    sp, x0
>>>>    12baa48:    d538d080     mrs    x0, tpidr_el1
>>>>    12baa4c:    d51cd040     msr    tpidr_el2, x0
>>>>    12baa50:    d53d1040     mrs    x0, cpacr_el12
>>>>    12baa54:    d5181040     msr    cpacr_el1, x0
>>>>    12baa58:    d53dc000     mrs    x0, vbar_el12
>>>>    12baa5c:    d518c000     msr    vbar_el1, x0
>>>>    12baa60:    d53c1120     mrs    x0, mdcr_el2
>>>>    12baa64:    9272f400     and    x0, x0, #0xffffffffffffcfff
>>>>    12baa68:    9266f400     and    x0, x0, #0xfffffffffcffffff
>>>>    12baa6c:    d51c1120     msr    mdcr_el2, x0
>>>>    12baa70:    d53d2040     mrs    x0, tcr_el12
>>>>    12baa74:    d5182040     msr    tcr_el1, x0
>>>>    12baa78:    d53d2000     mrs    x0, ttbr0_el12
>>>>    12baa7c:    d5182000     msr    ttbr0_el1, x0
>>>>    12baa80:    d53d2020     mrs    x0, ttbr1_el12
>>>>    12baa84:    d5182020     msr    ttbr1_el1, x0
>>>>    12baa88:    d53da200     mrs    x0, mair_el12
>>>>    12baa8c:    d518a200     msr    mair_el1, x0
>>>>    12baa90:    d5380761     mrs    x1, s3_0_c0_c7_3
>>>>    12baa94:    d3400c21     ubfx    x1, x1, #0, #4
>>>>    12baa98:    b4000141     cbz    x1, 0x12baac0
>>>>    12baa9c:    d53d2060     mrs    x0, s3_5_c2_c0_3
>>>
>>> OK, this is suspiciously close to the location Ganapatrao was having
>>> issues with. Are you running on the same hardware?
>>>
>>> In any case, we should never take a trap for this access. Can you dump
>>> HCR_EL2 at the point where the guest traps (in switch.c)?
>>>
>>
>> I have dumped HCR_EL2 before entry to L1 in both V11 and V10.
>> on V10 HCR_EL2=0x2743c827c263f
>> on V11 HCR_EL2=0x27c3c827c263f
>>
>> on V11 the function vcpu_el2_e2h_is_set(vcpu) is returning false
>> resulting in NV1 bit set along with NV and NV2.
>> AFAIK, For L1 to be in VHE, NV1 bit should be zero and NV=NV2=1.
>>
>> I could boot L1 then L2, if I hack vcpu_el2_e2h_is_set to return true.
>> There could be a bug in V11 or E2H0 patchset resulting in
>> vcpu_el2_e2h_is_set() returning false?
> 
> The E2H0 series should only force vcpu_el2_e2h_is_set() to return
> true, but not set it to false. Can you dump the *guest's* version of
> HCR_EL2 at this point?
> 

with V11: vhcr_el2=0x100030080000000 mask=0x100af00ffffffff
with V10: vhcr_el2=0x488000000
with hack+V11: vhcr_el2=0x488000000 mask=0x100af00ffffffff

Thanks,
Ganapat
Marc Zyngier Nov. 24, 2023, 12:51 p.m. UTC | #14
On Fri, 24 Nov 2023 12:34:41 +0000,
Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
> 
> 
> 
> On 24-11-2023 03:49 pm, Marc Zyngier wrote:
> > On Fri, 24 Nov 2023 09:50:33 +0000,
> > Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
> >> 
> >> 
> >> 
> >> On 23-11-2023 10:14 pm, Marc Zyngier wrote:
> >>> On Thu, 23 Nov 2023 16:21:48 +0000,
> >>> Miguel Luis <miguel.luis@oracle.com> wrote:
> >>>> 
> >>>> Hi Marc,
> >>>> 
> >>>> On 21/11/2023 18:02, Marc Zyngier wrote:
> >>>>> On Tue, 21 Nov 2023 16:49:52 +0000,
> >>>>> Miguel Luis <miguel.luis@oracle.com> wrote:
> >>>>>> Hi Marc,
> >>>>>> 
> >>>>>>> On 20 Nov 2023, at 12:09, Marc Zyngier <maz@kernel.org> wrote:
> >>>>>>> 
> >>>>>>> This is the 5th drop of NV support on arm64 for this year, and most
> >>>>>>> probably the last one for this side of Christmas.
> >>>>>>> 
> >>>>>>> For the previous episodes, see [1].
> >>>>>>> 
> >>>>>>> What's changed:
> >>>>>>> 
> >>>>>>> - Drop support for the original FEAT_NV. No existing hardware supports
> >>>>>>>    it without FEAT_NV2, and the architecture is deprecating the former
> >>>>>>>    entirely. This results in fewer patches, and a slightly simpler
> >>>>>>>    model overall.
> >>>>>>> 
> >>>>>>> - Reorganise the series to make it a bit more logical now that FEAT_NV
> >>>>>>>    is gone.
> >>>>>>> 
> >>>>>>> - Apply the NV idreg restrictions on VM first run rather than on each
> >>>>>>>    access.
> >>>>>>> 
> >>>>>>> - Make the nested vgic shadow CPU interface a per-CPU structure rather
> >>>>>>>    than per-vcpu.
> >>>>>>> 
> >>>>>>> - Fix the EL0 timer fastpath
> >>>>>>> 
> >>>>>>> - Work around the architecture deficiencies when trapping WFI from a
> >>>>>>>    L2 guest.
> >>>>>>> 
> >>>>>>> - Fix sampling of nested vgic state (MISR, ELRSR, EISR)
> >>>>>>> 
> >>>>>>> - Drop the patches that have already been merged (NV trap forwarding,
> >>>>>>>    per-MMU VTCR)
> >>>>>>> 
> >>>>>>> - Rebased on top of 6.7-rc2 + the FEAT_E2H0 support [2].
> >>>>>>> 
> >>>>>>> The branch containing these patches (and more) is at [3]. As for the
> >>>>>>> previous rounds, my intention is to take a prefix of this series into
> >>>>>>> 6.8, provided that it gets enough reviewing.
> >>>>>>> 
> >>>>>>> [1] https://lore.kernel.org/r/20230515173103.1017669-1-maz@kernel.org
> >>>>>>> [2] https://lore.kernel.org/r/20231120123721.851738-1-maz@kernel.org
> >>>>>>> [3] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/nv-6.8-nv2-only
> >>>>>>> 
> >>>>>> While I was testing this with kvmtool for 5.16 I noted the following on dmesg:
> >>>>>> 
> >>>>>> [  803.014258] kvm [19040]: Unsupported guest sys_reg access at: 8129fa50 [600003c9]
> >>>>>>                   { Op0( 3), Op1( 5), CRn( 1), CRm( 0), Op2( 2), func_read },
> >>>>>> 
> >>>>>> This is CPACR_EL12.
> >>>>> CPACR_EL12 is redirected to VNCR[0x100]. It really shouldn't trap...
> >>>>> 
> >>>>>> Still need yet to debug.
> >>>>> Can you disassemble the guest around the offending PC?
> >>>> 
> >>>> [ 1248.686350] kvm [7013]: Unsupported guest sys_reg access at: 812baa50 [600003c9]
> >>>>                   { Op0( 3), Op1( 5), CRn( 1), CRm( 0), Op2( 2), func_read },
> >>>> 
> >>>>    12baa00:    14000008     b    0x12baa20
> >>>>    12baa04:    d000d501     adrp    x1, 0x2d5c000
> >>>>    12baa08:    91154021     add    x1, x1, #0x550
> >>>>    12baa0c:    f9400022     ldr    x2, [x1]
> >>>>    12baa10:    f9400421     ldr    x1, [x1, #8]
> >>>>    12baa14:    8a010042     and    x2, x2, x1
> >>>>    12baa18:    d3441c42     ubfx    x2, x2, #4, #4
> >>>>    12baa1c:    b4000082     cbz    x2, 0x12baa2c
> >>>>    12baa20:    d2a175a0     mov    x0, #0xbad0000                 // #195887104
> >>>>    12baa24:    f2994220     movk    x0, #0xca11
> >>>>    12baa28:    d69f03e0     eret
> >>>>    12baa2c:    d2c00080     mov    x0, #0x400000000               // #17179869184
> >>>>    12baa30:    f2b10000     movk    x0, #0x8800, lsl #16
> >>>>    12baa34:    f2800000     movk    x0, #0x0
> >>>>    12baa38:    d51c1100     msr    hcr_el2, x0
> >>>>    12baa3c:    d5033fdf     isb

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This.

> >>>>    12baa40:    d53c4100     mrs    x0, sp_el1
> >>>>    12baa44:    9100001f     mov    sp, x0
> >>>>    12baa48:    d538d080     mrs    x0, tpidr_el1
> >>>>    12baa4c:    d51cd040     msr    tpidr_el2, x0
> >>>>    12baa50:    d53d1040     mrs    x0, cpacr_el12
> >>>>    12baa54:    d5181040     msr    cpacr_el1, x0
> >>>>    12baa58:    d53dc000     mrs    x0, vbar_el12
> >>>>    12baa5c:    d518c000     msr    vbar_el1, x0
> >>>>    12baa60:    d53c1120     mrs    x0, mdcr_el2
> >>>>    12baa64:    9272f400     and    x0, x0, #0xffffffffffffcfff
> >>>>    12baa68:    9266f400     and    x0, x0, #0xfffffffffcffffff
> >>>>    12baa6c:    d51c1120     msr    mdcr_el2, x0
> >>>>    12baa70:    d53d2040     mrs    x0, tcr_el12
> >>>>    12baa74:    d5182040     msr    tcr_el1, x0
> >>>>    12baa78:    d53d2000     mrs    x0, ttbr0_el12
> >>>>    12baa7c:    d5182000     msr    ttbr0_el1, x0
> >>>>    12baa80:    d53d2020     mrs    x0, ttbr1_el12
> >>>>    12baa84:    d5182020     msr    ttbr1_el1, x0
> >>>>    12baa88:    d53da200     mrs    x0, mair_el12
> >>>>    12baa8c:    d518a200     msr    mair_el1, x0
> >>>>    12baa90:    d5380761     mrs    x1, s3_0_c0_c7_3
> >>>>    12baa94:    d3400c21     ubfx    x1, x1, #0, #4
> >>>>    12baa98:    b4000141     cbz    x1, 0x12baac0
> >>>>    12baa9c:    d53d2060     mrs    x0, s3_5_c2_c0_3
> >>> 
> >>> OK, this is suspiciously close to the location Ganapatrao was having
> >>> issues with. Are you running on the same hardware?
> >>> 
> >>> In any case, we should never take a trap for this access. Can you dump
> >>> HCR_EL2 at the point where the guest traps (in switch.c)?
> >>> 
> >> 
> >> I have dumped HCR_EL2 before entry to L1 in both V11 and V10.
> >> on V10 HCR_EL2=0x2743c827c263f
> >> on V11 HCR_EL2=0x27c3c827c263f
> >> 
> >> on V11 the function vcpu_el2_e2h_is_set(vcpu) is returning false
> >> resulting in NV1 bit set along with NV and NV2.
> >> AFAIK, For L1 to be in VHE, NV1 bit should be zero and NV=NV2=1.
> >> 
> >> I could boot L1 then L2, if I hack vcpu_el2_e2h_is_set to return true.
> >> There could be a bug in V11 or E2H0 patchset resulting in
> >> vcpu_el2_e2h_is_set() returning false?
> > 
> > The E2H0 series should only force vcpu_el2_e2h_is_set() to return
> > true, but not set it to false. Can you dump the *guest's* version of
> > HCR_EL2 at this point?
> > 
> 
> with V11: vhcr_el2=0x100030080000000 mask=0x100af00ffffffff

How is this value possible if the write to HCR_EL2 has taken place?
When do you sample this?

> with V10: vhcr_el2=0x488000000
> with hack+V11: vhcr_el2=0x488000000 mask=0x100af00ffffffff

Well, of course, if you constrain the value of HCR_EL2...

	M.
Ganapatrao Kulkarni Nov. 24, 2023, 1:22 p.m. UTC | #15
On 24-11-2023 06:21 pm, Marc Zyngier wrote:
> On Fri, 24 Nov 2023 12:34:41 +0000,
> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
>>
>>
>>
>> On 24-11-2023 03:49 pm, Marc Zyngier wrote:
>>> On Fri, 24 Nov 2023 09:50:33 +0000,
>>> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
>>>>
>>>>
>>>>
>>>> On 23-11-2023 10:14 pm, Marc Zyngier wrote:
>>>>> On Thu, 23 Nov 2023 16:21:48 +0000,
>>>>> Miguel Luis <miguel.luis@oracle.com> wrote:
>>>>>>
>>>>>> Hi Marc,
>>>>>>
>>>>>> On 21/11/2023 18:02, Marc Zyngier wrote:
>>>>>>> On Tue, 21 Nov 2023 16:49:52 +0000,
>>>>>>> Miguel Luis <miguel.luis@oracle.com> wrote:
>>>>>>>> Hi Marc,
>>>>>>>>
>>>>>>>>> On 20 Nov 2023, at 12:09, Marc Zyngier <maz@kernel.org> wrote:
>>>>>>>>>
>>>>>>>>> This is the 5th drop of NV support on arm64 for this year, and most
>>>>>>>>> probably the last one for this side of Christmas.
>>>>>>>>>
>>>>>>>>> For the previous episodes, see [1].
>>>>>>>>>
>>>>>>>>> What's changed:
>>>>>>>>>
>>>>>>>>> - Drop support for the original FEAT_NV. No existing hardware supports
>>>>>>>>>     it without FEAT_NV2, and the architecture is deprecating the former
>>>>>>>>>     entirely. This results in fewer patches, and a slightly simpler
>>>>>>>>>     model overall.
>>>>>>>>>
>>>>>>>>> - Reorganise the series to make it a bit more logical now that FEAT_NV
>>>>>>>>>     is gone.
>>>>>>>>>
>>>>>>>>> - Apply the NV idreg restrictions on VM first run rather than on each
>>>>>>>>>     access.
>>>>>>>>>
>>>>>>>>> - Make the nested vgic shadow CPU interface a per-CPU structure rather
>>>>>>>>>     than per-vcpu.
>>>>>>>>>
>>>>>>>>> - Fix the EL0 timer fastpath
>>>>>>>>>
>>>>>>>>> - Work around the architecture deficiencies when trapping WFI from a
>>>>>>>>>     L2 guest.
>>>>>>>>>
>>>>>>>>> - Fix sampling of nested vgic state (MISR, ELRSR, EISR)
>>>>>>>>>
>>>>>>>>> - Drop the patches that have already been merged (NV trap forwarding,
>>>>>>>>>     per-MMU VTCR)
>>>>>>>>>
>>>>>>>>> - Rebased on top of 6.7-rc2 + the FEAT_E2H0 support [2].
>>>>>>>>>
>>>>>>>>> The branch containing these patches (and more) is at [3]. As for the
>>>>>>>>> previous rounds, my intention is to take a prefix of this series into
>>>>>>>>> 6.8, provided that it gets enough reviewing.
>>>>>>>>>
>>>>>>>>> [1] https://lore.kernel.org/r/20230515173103.1017669-1-maz@kernel.org
>>>>>>>>> [2] https://lore.kernel.org/r/20231120123721.851738-1-maz@kernel.org
>>>>>>>>> [3] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/nv-6.8-nv2-only
>>>>>>>>>
>>>>>>>> While I was testing this with kvmtool for 5.16 I noted the following on dmesg:
>>>>>>>>
>>>>>>>> [  803.014258] kvm [19040]: Unsupported guest sys_reg access at: 8129fa50 [600003c9]
>>>>>>>>                    { Op0( 3), Op1( 5), CRn( 1), CRm( 0), Op2( 2), func_read },
>>>>>>>>
>>>>>>>> This is CPACR_EL12.
>>>>>>> CPACR_EL12 is redirected to VNCR[0x100]. It really shouldn't trap...
>>>>>>>
>>>>>>>> Still need yet to debug.
>>>>>>> Can you disassemble the guest around the offending PC?
>>>>>>
>>>>>> [ 1248.686350] kvm [7013]: Unsupported guest sys_reg access at: 812baa50 [600003c9]
>>>>>>                    { Op0( 3), Op1( 5), CRn( 1), CRm( 0), Op2( 2), func_read },
>>>>>>
>>>>>>     12baa00:    14000008     b    0x12baa20
>>>>>>     12baa04:    d000d501     adrp    x1, 0x2d5c000
>>>>>>     12baa08:    91154021     add    x1, x1, #0x550
>>>>>>     12baa0c:    f9400022     ldr    x2, [x1]
>>>>>>     12baa10:    f9400421     ldr    x1, [x1, #8]
>>>>>>     12baa14:    8a010042     and    x2, x2, x1
>>>>>>     12baa18:    d3441c42     ubfx    x2, x2, #4, #4
>>>>>>     12baa1c:    b4000082     cbz    x2, 0x12baa2c
>>>>>>     12baa20:    d2a175a0     mov    x0, #0xbad0000                 // #195887104
>>>>>>     12baa24:    f2994220     movk    x0, #0xca11
>>>>>>     12baa28:    d69f03e0     eret
>>>>>>     12baa2c:    d2c00080     mov    x0, #0x400000000               // #17179869184
>>>>>>     12baa30:    f2b10000     movk    x0, #0x8800, lsl #16
>>>>>>     12baa34:    f2800000     movk    x0, #0x0
>>>>>>     12baa38:    d51c1100     msr    hcr_el2, x0
>>>>>>     12baa3c:    d5033fdf     isb
> 
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> This.
> 
>>>>>>     12baa40:    d53c4100     mrs    x0, sp_el1
>>>>>>     12baa44:    9100001f     mov    sp, x0
>>>>>>     12baa48:    d538d080     mrs    x0, tpidr_el1
>>>>>>     12baa4c:    d51cd040     msr    tpidr_el2, x0
>>>>>>     12baa50:    d53d1040     mrs    x0, cpacr_el12
>>>>>>     12baa54:    d5181040     msr    cpacr_el1, x0
>>>>>>     12baa58:    d53dc000     mrs    x0, vbar_el12
>>>>>>     12baa5c:    d518c000     msr    vbar_el1, x0
>>>>>>     12baa60:    d53c1120     mrs    x0, mdcr_el2
>>>>>>     12baa64:    9272f400     and    x0, x0, #0xffffffffffffcfff
>>>>>>     12baa68:    9266f400     and    x0, x0, #0xfffffffffcffffff
>>>>>>     12baa6c:    d51c1120     msr    mdcr_el2, x0
>>>>>>     12baa70:    d53d2040     mrs    x0, tcr_el12
>>>>>>     12baa74:    d5182040     msr    tcr_el1, x0
>>>>>>     12baa78:    d53d2000     mrs    x0, ttbr0_el12
>>>>>>     12baa7c:    d5182000     msr    ttbr0_el1, x0
>>>>>>     12baa80:    d53d2020     mrs    x0, ttbr1_el12
>>>>>>     12baa84:    d5182020     msr    ttbr1_el1, x0
>>>>>>     12baa88:    d53da200     mrs    x0, mair_el12
>>>>>>     12baa8c:    d518a200     msr    mair_el1, x0
>>>>>>     12baa90:    d5380761     mrs    x1, s3_0_c0_c7_3
>>>>>>     12baa94:    d3400c21     ubfx    x1, x1, #0, #4
>>>>>>     12baa98:    b4000141     cbz    x1, 0x12baac0
>>>>>>     12baa9c:    d53d2060     mrs    x0, s3_5_c2_c0_3
>>>>>
>>>>> OK, this is suspiciously close to the location Ganapatrao was having
>>>>> issues with. Are you running on the same hardware?
>>>>>
>>>>> In any case, we should never take a trap for this access. Can you dump
>>>>> HCR_EL2 at the point where the guest traps (in switch.c)?
>>>>>
>>>>
>>>> I have dumped HCR_EL2 before entry to L1 in both V11 and V10.
>>>> on V10 HCR_EL2=0x2743c827c263f
>>>> on V11 HCR_EL2=0x27c3c827c263f
>>>>
>>>> on V11 the function vcpu_el2_e2h_is_set(vcpu) is returning false
>>>> resulting in NV1 bit set along with NV and NV2.
>>>> AFAIK, For L1 to be in VHE, NV1 bit should be zero and NV=NV2=1.
>>>>
>>>> I could boot L1 then L2, if I hack vcpu_el2_e2h_is_set to return true.
>>>> There could be a bug in V11 or E2H0 patchset resulting in
>>>> vcpu_el2_e2h_is_set() returning false?
>>>
>>> The E2H0 series should only force vcpu_el2_e2h_is_set() to return
>>> true, but not set it to false. Can you dump the *guest's* version of
>>> HCR_EL2 at this point?
>>>
>>
>> with V11: vhcr_el2=0x100030080000000 mask=0x100af00ffffffff
> 
> How is this value possible if the write to HCR_EL2 has taken place?
> When do you sample this?

I am not sure how and where it got set. I think, whatever it is set, it 
is due to false return of vcpu_el2_e2h_is_set(). Need to understand/debug.
The vhcr_el2 value I have shared is traced along with hcr in function 
__activate_traps/__compute_hcr.

> 
>> with V10: vhcr_el2=0x488000000
>> with hack+V11: vhcr_el2=0x488000000 mask=0x100af00ffffffff
> 
> Well, of course, if you constrain the value of HCR_EL2...
> 
> 	M.
> 

Thanks,
Ganapat
Marc Zyngier Nov. 24, 2023, 2:32 p.m. UTC | #16
On Fri, 24 Nov 2023 13:22:22 +0000,
Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
> 
> > How is this value possible if the write to HCR_EL2 has taken place?
> > When do you sample this?
> 
> I am not sure how and where it got set. I think, whatever it is set,
> it is due to false return of vcpu_el2_e2h_is_set(). Need to
> understand/debug.
> The vhcr_el2 value I have shared is traced along with hcr in function
> __activate_traps/__compute_hcr.

Here's my hunch:

The guest boots with E2H=0, because we don't advertise anything else
on your HW. So we run with NV1=1 until we try to *upgrade* to VHE. NV2
means that HCR_EL2 is writable (to memory) without a trap. But we're
still running with NV1=1.

Subsequently, we access a sysreg that should never trap for a VHE
guest, but we're with the wrong config. Bad things happen.

Unfortunately, NV2 is pretty much incompatible with E2H being updated,
because it cannot perform the changes that this would result into at
the point where they should happen. We can try and do a best effort
handling, but you can always trick it.

Anyway, can you see if the hack below helps? I'm not keen on it at
all, but this would be a good data point.

	M.

From c4b856221661393b884cbf673d100faaa8dc018a Mon Sep 17 00:00:00 2001
From: Marc Zyngier <maz@kernel.org>
Date: Fri, 26 May 2023 12:16:05 +0100
Subject: [PATCH] KVM: arm64: Opportunistically track HCR_EL2.E2H being flipped

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h       |  9 +++++++--
 arch/arm64/kvm/hyp/include/hyp/switch.h | 13 +++++++++++++
 arch/arm64/kvm/hyp/vhe/switch.c         | 10 ++++++++--
 3 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index c91f607e989d..d45ef41de5fb 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -655,6 +655,9 @@ struct kvm_vcpu_arch {
 	/* State flags for kernel bookkeeping, unused by the hypervisor code */
 	u8 sflags;
 
+	/* Bookkeeping flags for NV */
+	u8 nvflags;
+
 	/*
 	 * Don't run the guest (internal implementation need).
 	 *
@@ -858,8 +861,6 @@ struct kvm_vcpu_arch {
 #define DEBUG_STATE_SAVE_SPE	__vcpu_single_flag(iflags, BIT(5))
 /* Save TRBE context if active  */
 #define DEBUG_STATE_SAVE_TRBE	__vcpu_single_flag(iflags, BIT(6))
-/* vcpu running in HYP context */
-#define VCPU_HYP_CONTEXT	__vcpu_single_flag(iflags, BIT(7))
 
 /* SVE enabled for host EL0 */
 #define HOST_SVE_ENABLED	__vcpu_single_flag(sflags, BIT(0))
@@ -878,6 +879,10 @@ struct kvm_vcpu_arch {
 /* WFI instruction trapped */
 #define IN_WFI			__vcpu_single_flag(sflags, BIT(7))
 
+/* vcpu running in HYP context */
+#define VCPU_HYP_CONTEXT	__vcpu_single_flag(nvflags, BIT(0))
+/* vcpu entered with HCR_EL2.E2H set */
+#define VCPU_HCR_E2H		__vcpu_single_flag(nvflags, BIT(1))
 
 /* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */
 #define vcpu_sve_pffr(vcpu) (kern_hyp_va((vcpu)->arch.sve_state) +	\
diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
index aed2ea35082c..9c1346116d61 100644
--- a/arch/arm64/kvm/hyp/include/hyp/switch.h
+++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
@@ -669,6 +669,19 @@ static inline bool fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
 	 */
 	synchronize_vcpu_pstate(vcpu, exit_code);
 
+	if (vcpu_has_nv(vcpu) &&
+	    (!!vcpu_get_flag(vcpu, VCPU_HCR_E2H) ^ vcpu_el2_e2h_is_set(vcpu))) {
+		if (vcpu_el2_e2h_is_set(vcpu)) {
+			sysreg_clear_set(hcr_el2, HCR_NV1, 0);
+			vcpu_set_flag(vcpu, VCPU_HCR_E2H);
+		} else {
+			sysreg_clear_set(hcr_el2, 0, HCR_NV1);
+			vcpu_clear_flag(vcpu, VCPU_HCR_E2H);
+		}
+
+		return true;
+	}
+
 	/*
 	 * Check whether we want to repaint the state one way or
 	 * another.
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index 8d1e9d1adabe..395aaa06f358 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -447,10 +447,16 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 	sysreg_restore_guest_state_vhe(guest_ctxt);
 	__debug_switch_to_guest(vcpu);
 
-	if (is_hyp_ctxt(vcpu))
+	if (is_hyp_ctxt(vcpu)) {
+		if (vcpu_el2_e2h_is_set(vcpu))
+			vcpu_set_flag(vcpu, VCPU_HCR_E2H);
+		else
+			vcpu_clear_flag(vcpu, VCPU_HCR_E2H);
+
 		vcpu_set_flag(vcpu, VCPU_HYP_CONTEXT);
-	else
+	} else {
 		vcpu_clear_flag(vcpu, VCPU_HYP_CONTEXT);
+	}
 
 	do {
 		/* Jump in the fire! */
Ganapatrao Kulkarni Nov. 27, 2023, 7:26 a.m. UTC | #17
On 24-11-2023 08:02 pm, Marc Zyngier wrote:
> On Fri, 24 Nov 2023 13:22:22 +0000,
> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
>>
>>> How is this value possible if the write to HCR_EL2 has taken place?
>>> When do you sample this?
>>
>> I am not sure how and where it got set. I think, whatever it is set,
>> it is due to false return of vcpu_el2_e2h_is_set(). Need to
>> understand/debug.
>> The vhcr_el2 value I have shared is traced along with hcr in function
>> __activate_traps/__compute_hcr.
> 
> Here's my hunch:
> 
> The guest boots with E2H=0, because we don't advertise anything else
> on your HW. So we run with NV1=1 until we try to *upgrade* to VHE. NV2
> means that HCR_EL2 is writable (to memory) without a trap. But we're
> still running with NV1=1.
> 
> Subsequently, we access a sysreg that should never trap for a VHE
> guest, but we're with the wrong config. Bad things happen.
> 
> Unfortunately, NV2 is pretty much incompatible with E2H being updated,
> because it cannot perform the changes that this would result into at
> the point where they should happen. We can try and do a best effort
> handling, but you can always trick it.
> 
> Anyway, can you see if the hack below helps? I'm not keen on it at
> all, but this would be a good data point.

Thanks Marc, this diff fixes the issue.
Just wondering what is changed w.r.t to L1 handling from V10 to V11 that 
it requires this trick?
Also why this was not seen on your platform, is it E2H0 enabled?

> 
> 	M.
> 
>  From c4b856221661393b884cbf673d100faaa8dc018a Mon Sep 17 00:00:00 2001
> From: Marc Zyngier <maz@kernel.org>
> Date: Fri, 26 May 2023 12:16:05 +0100
> Subject: [PATCH] KVM: arm64: Opportunistically track HCR_EL2.E2H being flipped
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>   arch/arm64/include/asm/kvm_host.h       |  9 +++++++--
>   arch/arm64/kvm/hyp/include/hyp/switch.h | 13 +++++++++++++
>   arch/arm64/kvm/hyp/vhe/switch.c         | 10 ++++++++--
>   3 files changed, 28 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index c91f607e989d..d45ef41de5fb 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -655,6 +655,9 @@ struct kvm_vcpu_arch {
>   	/* State flags for kernel bookkeeping, unused by the hypervisor code */
>   	u8 sflags;
>   
> +	/* Bookkeeping flags for NV */
> +	u8 nvflags;
> +
>   	/*
>   	 * Don't run the guest (internal implementation need).
>   	 *
> @@ -858,8 +861,6 @@ struct kvm_vcpu_arch {
>   #define DEBUG_STATE_SAVE_SPE	__vcpu_single_flag(iflags, BIT(5))
>   /* Save TRBE context if active  */
>   #define DEBUG_STATE_SAVE_TRBE	__vcpu_single_flag(iflags, BIT(6))
> -/* vcpu running in HYP context */
> -#define VCPU_HYP_CONTEXT	__vcpu_single_flag(iflags, BIT(7))
>   
>   /* SVE enabled for host EL0 */
>   #define HOST_SVE_ENABLED	__vcpu_single_flag(sflags, BIT(0))
> @@ -878,6 +879,10 @@ struct kvm_vcpu_arch {
>   /* WFI instruction trapped */
>   #define IN_WFI			__vcpu_single_flag(sflags, BIT(7))
>   
> +/* vcpu running in HYP context */
> +#define VCPU_HYP_CONTEXT	__vcpu_single_flag(nvflags, BIT(0))
> +/* vcpu entered with HCR_EL2.E2H set */
> +#define VCPU_HCR_E2H		__vcpu_single_flag(nvflags, BIT(1))
>   
>   /* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */
>   #define vcpu_sve_pffr(vcpu) (kern_hyp_va((vcpu)->arch.sve_state) +	\
> diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
> index aed2ea35082c..9c1346116d61 100644
> --- a/arch/arm64/kvm/hyp/include/hyp/switch.h
> +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
> @@ -669,6 +669,19 @@ static inline bool fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
>   	 */
>   	synchronize_vcpu_pstate(vcpu, exit_code);
>   
> +	if (vcpu_has_nv(vcpu) &&
> +	    (!!vcpu_get_flag(vcpu, VCPU_HCR_E2H) ^ vcpu_el2_e2h_is_set(vcpu))) {
> +		if (vcpu_el2_e2h_is_set(vcpu)) {
> +			sysreg_clear_set(hcr_el2, HCR_NV1, 0);
> +			vcpu_set_flag(vcpu, VCPU_HCR_E2H);
> +		} else {
> +			sysreg_clear_set(hcr_el2, 0, HCR_NV1);
> +			vcpu_clear_flag(vcpu, VCPU_HCR_E2H);
> +		}
> +
> +		return true;
> +	}
> +
>   	/*
>   	 * Check whether we want to repaint the state one way or
>   	 * another.
> diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
> index 8d1e9d1adabe..395aaa06f358 100644
> --- a/arch/arm64/kvm/hyp/vhe/switch.c
> +++ b/arch/arm64/kvm/hyp/vhe/switch.c
> @@ -447,10 +447,16 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
>   	sysreg_restore_guest_state_vhe(guest_ctxt);
>   	__debug_switch_to_guest(vcpu);
>   
> -	if (is_hyp_ctxt(vcpu))
> +	if (is_hyp_ctxt(vcpu)) {
> +		if (vcpu_el2_e2h_is_set(vcpu))
> +			vcpu_set_flag(vcpu, VCPU_HCR_E2H);
> +		else
> +			vcpu_clear_flag(vcpu, VCPU_HCR_E2H);
> +
>   		vcpu_set_flag(vcpu, VCPU_HYP_CONTEXT);
> -	else
> +	} else {
>   		vcpu_clear_flag(vcpu, VCPU_HYP_CONTEXT);
> +	}
>   
>   	do {
>   		/* Jump in the fire! */

Thanks,
Ganapat
Marc Zyngier Nov. 27, 2023, 9:22 a.m. UTC | #18
On Mon, 27 Nov 2023 07:26:58 +0000,
Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
> 
> 
> 
> On 24-11-2023 08:02 pm, Marc Zyngier wrote:
> > On Fri, 24 Nov 2023 13:22:22 +0000,
> > Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
> >> 
> >>> How is this value possible if the write to HCR_EL2 has taken place?
> >>> When do you sample this?
> >> 
> >> I am not sure how and where it got set. I think, whatever it is set,
> >> it is due to false return of vcpu_el2_e2h_is_set(). Need to
> >> understand/debug.
> >> The vhcr_el2 value I have shared is traced along with hcr in function
> >> __activate_traps/__compute_hcr.
> > 
> > Here's my hunch:
> > 
> > The guest boots with E2H=0, because we don't advertise anything else
> > on your HW. So we run with NV1=1 until we try to *upgrade* to VHE. NV2
> > means that HCR_EL2 is writable (to memory) without a trap. But we're
> > still running with NV1=1.
> > 
> > Subsequently, we access a sysreg that should never trap for a VHE
> > guest, but we're with the wrong config. Bad things happen.
> > 
> > Unfortunately, NV2 is pretty much incompatible with E2H being updated,
> > because it cannot perform the changes that this would result into at
> > the point where they should happen. We can try and do a best effort
> > handling, but you can always trick it.
> > 
> > Anyway, can you see if the hack below helps? I'm not keen on it at
> > all, but this would be a good data point.
> 
> Thanks Marc, this diff fixes the issue.
> Just wondering what is changed w.r.t to L1 handling from V10 to V11
> that it requires this trick?

Not completely sure. Before v11, anything that would trap would be
silently handled by the FEAT_NV code. Now, a trap for something that
is supposed to be redirected to VNCR results in an UNDEF exception.

I suspect that the exception is handled again as a call to
__finalise_el2(), probably because the write to VBAR_EL1 didn't do
what it was supposed to do.

> Also why this was not seen on your platform, is it E2H0 enabled?

It doesn't have FEAT_E2H0, and that's the whole point. No E2H0, no
problems, as the guest cannot trick the host into losing track of the
state (which I'm pretty sure can happen even with this ugly hack).

I will probably completely disable NV1 support in the next drop, and
make NV support only VHE guests. Which is the only mode that makes any
sense anyway.

Thanks,

	M.
Ganapatrao Kulkarni Nov. 27, 2023, 10:59 a.m. UTC | #19
On 27-11-2023 02:52 pm, Marc Zyngier wrote:
> On Mon, 27 Nov 2023 07:26:58 +0000,
> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
>>
>>
>>
>> On 24-11-2023 08:02 pm, Marc Zyngier wrote:
>>> On Fri, 24 Nov 2023 13:22:22 +0000,
>>> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
>>>>
>>>>> How is this value possible if the write to HCR_EL2 has taken place?
>>>>> When do you sample this?
>>>>
>>>> I am not sure how and where it got set. I think, whatever it is set,
>>>> it is due to false return of vcpu_el2_e2h_is_set(). Need to
>>>> understand/debug.
>>>> The vhcr_el2 value I have shared is traced along with hcr in function
>>>> __activate_traps/__compute_hcr.
>>>
>>> Here's my hunch:
>>>
>>> The guest boots with E2H=0, because we don't advertise anything else
>>> on your HW. So we run with NV1=1 until we try to *upgrade* to VHE. NV2
>>> means that HCR_EL2 is writable (to memory) without a trap. But we're
>>> still running with NV1=1.
>>>
>>> Subsequently, we access a sysreg that should never trap for a VHE
>>> guest, but we're with the wrong config. Bad things happen.
>>>
>>> Unfortunately, NV2 is pretty much incompatible with E2H being updated,
>>> because it cannot perform the changes that this would result into at
>>> the point where they should happen. We can try and do a best effort
>>> handling, but you can always trick it.
>>>
>>> Anyway, can you see if the hack below helps? I'm not keen on it at
>>> all, but this would be a good data point.
>>
>> Thanks Marc, this diff fixes the issue.
>> Just wondering what is changed w.r.t to L1 handling from V10 to V11
>> that it requires this trick?
> 
> Not completely sure. Before v11, anything that would trap would be
> silently handled by the FEAT_NV code. Now, a trap for something that
> is supposed to be redirected to VNCR results in an UNDEF exception.
> 
> I suspect that the exception is handled again as a call to
> __finalise_el2(), probably because the write to VBAR_EL1 didn't do
> what it was supposed to do.
> 
>> Also why this was not seen on your platform, is it E2H0 enabled?
> 
> It doesn't have FEAT_E2H0, and that's the whole point. No E2H0, no
> problems, as the guest cannot trick the host into losing track of the
> state (which I'm pretty sure can happen even with this ugly hack).
> 
> I will probably completely disable NV1 support in the next drop, and
> make NV support only VHE guests. Which is the only mode that makes any
> sense anyway.
> 

Thanks, absolutely makes sense to have *VHE-only* L1, looking forward to 
a next drop.

Thanks,
Ganapat
Marc Zyngier Nov. 27, 2023, 11:45 a.m. UTC | #20
On Mon, 27 Nov 2023 10:59:36 +0000,
Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
> 
> 
> 
> On 27-11-2023 02:52 pm, Marc Zyngier wrote:
> > On Mon, 27 Nov 2023 07:26:58 +0000,
> > Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
> >> 
> >> 
> >> 
> >> On 24-11-2023 08:02 pm, Marc Zyngier wrote:
> >>> On Fri, 24 Nov 2023 13:22:22 +0000,
> >>> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
> >>>> 
> >>>>> How is this value possible if the write to HCR_EL2 has taken place?
> >>>>> When do you sample this?
> >>>> 
> >>>> I am not sure how and where it got set. I think, whatever it is set,
> >>>> it is due to false return of vcpu_el2_e2h_is_set(). Need to
> >>>> understand/debug.
> >>>> The vhcr_el2 value I have shared is traced along with hcr in function
> >>>> __activate_traps/__compute_hcr.
> >>> 
> >>> Here's my hunch:
> >>> 
> >>> The guest boots with E2H=0, because we don't advertise anything else
> >>> on your HW. So we run with NV1=1 until we try to *upgrade* to VHE. NV2
> >>> means that HCR_EL2 is writable (to memory) without a trap. But we're
> >>> still running with NV1=1.
> >>> 
> >>> Subsequently, we access a sysreg that should never trap for a VHE
> >>> guest, but we're with the wrong config. Bad things happen.
> >>> 
> >>> Unfortunately, NV2 is pretty much incompatible with E2H being updated,
> >>> because it cannot perform the changes that this would result into at
> >>> the point where they should happen. We can try and do a best effort
> >>> handling, but you can always trick it.
> >>> 
> >>> Anyway, can you see if the hack below helps? I'm not keen on it at
> >>> all, but this would be a good data point.
> >> 
> >> Thanks Marc, this diff fixes the issue.
> >> Just wondering what is changed w.r.t to L1 handling from V10 to V11
> >> that it requires this trick?
> > 
> > Not completely sure. Before v11, anything that would trap would be
> > silently handled by the FEAT_NV code. Now, a trap for something that
> > is supposed to be redirected to VNCR results in an UNDEF exception.
> > 
> > I suspect that the exception is handled again as a call to
> > __finalise_el2(), probably because the write to VBAR_EL1 didn't do
> > what it was supposed to do.
> > 
> >> Also why this was not seen on your platform, is it E2H0 enabled?
> > 
> > It doesn't have FEAT_E2H0, and that's the whole point. No E2H0, no
> > problems, as the guest cannot trick the host into losing track of the
> > state (which I'm pretty sure can happen even with this ugly hack).
> > 
> > I will probably completely disable NV1 support in the next drop, and
> > make NV support only VHE guests. Which is the only mode that makes any
> > sense anyway.
> > 
> 
> Thanks, absolutely makes sense to have *VHE-only* L1, looking forward
> to a next drop.

Note that this won't be restricted to L1, but will affect *everything.

No non-VHE guest will be supported at any level whatsoever, and NV
will always expose ID_AA64MMFR4_EL1.E2H0=0b1110, indicating that
HCR_EL2.NV1 is RES0, on top of ID_AA64MMFR4_EL1.NV_frac=1 (NV2 only).

	M.
Ganapatrao Kulkarni Nov. 27, 2023, 12:18 p.m. UTC | #21
On 27-11-2023 05:15 pm, Marc Zyngier wrote:
> On Mon, 27 Nov 2023 10:59:36 +0000,
> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
>>
>>
>>
>> On 27-11-2023 02:52 pm, Marc Zyngier wrote:
>>> On Mon, 27 Nov 2023 07:26:58 +0000,
>>> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
>>>>
>>>>
>>>>
>>>> On 24-11-2023 08:02 pm, Marc Zyngier wrote:
>>>>> On Fri, 24 Nov 2023 13:22:22 +0000,
>>>>> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
>>>>>>
>>>>>>> How is this value possible if the write to HCR_EL2 has taken place?
>>>>>>> When do you sample this?
>>>>>>
>>>>>> I am not sure how and where it got set. I think, whatever it is set,
>>>>>> it is due to false return of vcpu_el2_e2h_is_set(). Need to
>>>>>> understand/debug.
>>>>>> The vhcr_el2 value I have shared is traced along with hcr in function
>>>>>> __activate_traps/__compute_hcr.
>>>>>
>>>>> Here's my hunch:
>>>>>
>>>>> The guest boots with E2H=0, because we don't advertise anything else
>>>>> on your HW. So we run with NV1=1 until we try to *upgrade* to VHE. NV2
>>>>> means that HCR_EL2 is writable (to memory) without a trap. But we're
>>>>> still running with NV1=1.
>>>>>
>>>>> Subsequently, we access a sysreg that should never trap for a VHE
>>>>> guest, but we're with the wrong config. Bad things happen.
>>>>>
>>>>> Unfortunately, NV2 is pretty much incompatible with E2H being updated,
>>>>> because it cannot perform the changes that this would result into at
>>>>> the point where they should happen. We can try and do a best effort
>>>>> handling, but you can always trick it.
>>>>>
>>>>> Anyway, can you see if the hack below helps? I'm not keen on it at
>>>>> all, but this would be a good data point.
>>>>
>>>> Thanks Marc, this diff fixes the issue.
>>>> Just wondering what is changed w.r.t to L1 handling from V10 to V11
>>>> that it requires this trick?
>>>
>>> Not completely sure. Before v11, anything that would trap would be
>>> silently handled by the FEAT_NV code. Now, a trap for something that
>>> is supposed to be redirected to VNCR results in an UNDEF exception.
>>>
>>> I suspect that the exception is handled again as a call to
>>> __finalise_el2(), probably because the write to VBAR_EL1 didn't do
>>> what it was supposed to do.
>>>
>>>> Also why this was not seen on your platform, is it E2H0 enabled?
>>>
>>> It doesn't have FEAT_E2H0, and that's the whole point. No E2H0, no
>>> problems, as the guest cannot trick the host into losing track of the
>>> state (which I'm pretty sure can happen even with this ugly hack).
>>>
>>> I will probably completely disable NV1 support in the next drop, and
>>> make NV support only VHE guests. Which is the only mode that makes any
>>> sense anyway.
>>>
>>
>> Thanks, absolutely makes sense to have *VHE-only* L1, looking forward
>> to a next drop.
> 
> Note that this won't be restricted to L1, but will affect *everything.
> 
Ok.

> No non-VHE guest will be supported at any level whatsoever, and NV
> will always expose ID_AA64MMFR4_EL1.E2H0=0b1110, indicating that
> HCR_EL2.NV1 is RES0, on top of ID_AA64MMFR4_EL1.NV_frac=1 (NV2 only).

OK, Even I was thinking the same instead of the work-around/trick, then 
I felt it is still needed since the L1 may be any distro kernel and it 
may not have code to interpret/decode ID_AA64MMFR4_EL1.E2H0.


Thanks,
Ganapat
> 
> 	M.
>
Marc Zyngier Nov. 27, 2023, 1:57 p.m. UTC | #22
On Mon, 27 Nov 2023 12:18:45 +0000,
Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
> 
> >>> I will probably completely disable NV1 support in the next drop, and
> >>> make NV support only VHE guests. Which is the only mode that makes any
> >>> sense anyway.
> >>> 
> >> 
> >> Thanks, absolutely makes sense to have *VHE-only* L1, looking forward
> >> to a next drop.
> > 
> > Note that this won't be restricted to L1, but will affect *everything.
> > 
> Ok.
> 
> > No non-VHE guest will be supported at any level whatsoever, and NV
> > will always expose ID_AA64MMFR4_EL1.E2H0=0b1110, indicating that
> > HCR_EL2.NV1 is RES0, on top of ID_AA64MMFR4_EL1.NV_frac=1 (NV2 only).
> 
> OK, Even I was thinking the same instead of the work-around/trick,
> then I felt it is still needed since the L1 may be any distro kernel
> and it may not have code to interpret/decode ID_AA64MMFR4_EL1.E2H0.

The problem is that we can't make that reliable. Current Linux kernels
will work with E2H RES1, as you experienced it by hacking KVM. Old
crap won't work, but I can't say I care.

My point is: KVM NV support will be compliant with the architecture.
SW that needs to run with NV will have to understand the version of
the architecture that KVM exposes. If you need distros to be upgraded,
then you can work with them to update their stuff.

	M.
Marc Zyngier Dec. 18, 2023, 12:39 p.m. UTC | #23
On Mon, 20 Nov 2023 13:09:44 +0000,
Marc Zyngier <maz@kernel.org> wrote:
> 
> This is the 5th drop of NV support on arm64 for this year, and most
> probably the last one for this side of Christmas.

Unless someone objects, I'm planning to take the first 10 patches of
this series into 6.8 (with the dependency on ID_AA64MMFR4_EL1.NV_frac
in patch #1 removed).

	M.
Oliver Upton Dec. 18, 2023, 7:51 p.m. UTC | #24
On Mon, Dec 18, 2023 at 12:39:26PM +0000, Marc Zyngier wrote:
> On Mon, 20 Nov 2023 13:09:44 +0000,
> Marc Zyngier <maz@kernel.org> wrote:
> > 
> > This is the 5th drop of NV support on arm64 for this year, and most
> > probably the last one for this side of Christmas.
> 
> Unless someone objects, I'm planning to take the first 10 patches of
> this series into 6.8 (with the dependency on ID_AA64MMFR4_EL1.NV_frac
> in patch #1 removed).

For the first 10 patches:

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Marc Zyngier Dec. 19, 2023, 10:32 a.m. UTC | #25
On Mon, 20 Nov 2023 13:09:44 +0000, Marc Zyngier wrote:
> This is the 5th drop of NV support on arm64 for this year, and most
> probably the last one for this side of Christmas.
> 
> For the previous episodes, see [1].
> 
> What's changed:
> 
> [...]

Applied to next, thanks!

[01/43] arm64: cpufeatures: Restrict NV support to FEAT_NV2
        commit: 2bfc654b89c4dd1c372bb2cbba6b5a0eb578d214
[02/43] KVM: arm64: nv: Hoist vcpu_has_nv() into is_hyp_ctxt()
        commit: 111903d1f5b9334d1100e1c6ee08e740fa374d91
[03/43] KVM: arm64: nv: Compute NV view of idregs as a one-off
        commit: 3ed0b5123cd5a2a4f1fe4e594e7bf319e9eaf1da
[04/43] KVM: arm64: nv: Drop EL12 register traps that are redirected to VNCR
        commit: 4d4f52052ba8357f1591cb9bc3086541070711af
[05/43] KVM: arm64: nv: Add non-VHE-EL2->EL1 translation helpers
        commit: 3606e0b2e462164bced151dbb54ccfe42ac6c35b
[06/43] KVM: arm64: nv: Add include containing the VNCR_EL2 offsets
        commit: 60ce16cc122aad999129d23061fa35f63d5b1e9b
[07/43] KVM: arm64: Introduce a bad_trap() primitive for unexpected trap handling
        commit: 2733dd10701abc6ab23d65a732f58fbeb80bd203
[08/43] KVM: arm64: nv: Add EL2_REG_VNCR()/EL2_REG_REDIR() sysreg helpers
        commit: 9b9cce60be85e6807bdb0eaa2f520e78dbab0659
[09/43] KVM: arm64: nv: Map VNCR-capable registers to a separate page
        commit: d8bd48e3f0ee9e1fdba2a2e453155a5354e48a8d
[10/43] KVM: arm64: nv: Handle virtual EL2 registers in vcpu_read/write_sys_reg()
        commit: fedc612314acfebf506e071bf3a941076aa56d10

Cheers,

	M.