Message ID | 20240322170945.3292593-6-maz@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | KVM: arm64: Move host-specific data out of kvm_vcpu_arch | expand |
On Fri, Mar 22, 2024 at 05:09:45PM +0000, Marc Zyngier wrote: > In retrospect, it is fairly obvious that the FP state ownership > is only meaningful for a given CPU, and that locating this > information in the vcpu was just a mistake. > > Move the ownership tracking into the host data structure, and > rename it from fp_state to fp_owner, which is a better description > (name suggested by Mark Brown). There's still the thing with the interaction with SME support - to summarise what I think you're asking for the userspace ABI there: - Create a requirement for userspace to set SVCR prior to setting any vector impacted register to ensure the correct format and that data isn't zeroed when SVCR is set. - Use the value of SVCR.SM and the guest maximum SVE and SME VLs to select the currently visible vector length for the Z, P and FFR registers, and if FFR can be accessed if not available in streaming mode. - Changes to SVCR.SM zero register data in the same way writes to the physical register do. - This also implies discarding or failing all writes to ZA and ZT0 unless SVCR.ZA is set for consistency. - Add support for the V registers in the sysreg interface when SVE is enabled. then the implementation can do what it likes to achieve that, the most obvious thing being to store in native format for the current hardware mode based on SVCR.{SM,ZA}. Does that sound about right?
On Fri, 22 Mar 2024 17:52:45 +0000, Mark Brown <broonie@kernel.org> wrote: > > On Fri, Mar 22, 2024 at 05:09:45PM +0000, Marc Zyngier wrote: > > In retrospect, it is fairly obvious that the FP state ownership > > is only meaningful for a given CPU, and that locating this > > information in the vcpu was just a mistake. > > > > Move the ownership tracking into the host data structure, and > > rename it from fp_state to fp_owner, which is a better description > > (name suggested by Mark Brown). > > There's still the thing with the interaction with SME support - to > summarise what I think you're asking for the userspace ABI there: Well, the SME support is still pretty prospective, and this patch has no impact on an existing ABI. > > - Create a requirement for userspace to set SVCR prior to setting any > vector impacted register to ensure the correct format and that data > isn't zeroed when SVCR is set. > - Use the value of SVCR.SM and the guest maximum SVE and SME VLs to > select the currently visible vector length for the Z, P and FFR > registers, and if FFR can be accessed if not available in streaming > mode. > - Changes to SVCR.SM zero register data in the same way writes to the > physical register do. > - This also implies discarding or failing all writes to ZA and ZT0 > unless SVCR.ZA is set for consistency. All of that seems reasonable, as long as it is comes as a consequence of enabling SME. It should be run by the QEMU people though, as they are the ones that will make use of it. Please Cc them when you post the patches or even better, reach out to them beforehand. > - Add support for the V registers in the sysreg interface when SVE is > enabled. We already support the V registers with KVM_REG_ARM_CORE_REG(). Why would you add any new interface for this? The kernel should be perfectly capable of dealing with the placement of the data in the internal structures, and there is no need to tie the userspace ABI to how we deal with that placement (kvm_regs is already purely userspace). > then the implementation can do what it likes to achieve that, the most > obvious thing being to store in native format for the current hardware > mode based on SVCR.{SM,ZA}. Does that sound about right? Apart from the statement about the V registers, this seems OK. But again, I want to see this agreed with the QEMU folks. M.
On Sat, Mar 23, 2024 at 07:06:24PM +0000, Marc Zyngier wrote: > Mark Brown <broonie@kernel.org> wrote: > > There's still the thing with the interaction with SME support - to > > summarise what I think you're asking for the userspace ABI there: > Well, the SME support is still pretty prospective, and this patch has > no impact on an existing ABI. Sure, hopefully I should have a new verison out this release - it was mostly held up waiting for the ID register parsing framework you got merged last release. Though holidays do make things a bit tight timing wise. > > - Add support for the V registers in the sysreg interface when SVE is > > enabled. > We already support the V registers with KVM_REG_ARM_CORE_REG(). Why > would you add any new interface for this? The kernel should be > perfectly capable of dealing with the placement of the data in the > internal structures, and there is no need to tie the userspace ABI to > how we deal with that placement (kvm_regs is already purely > userspace). This was referring to the fact that currently when SVE is enabled access to the V registers as V registers via _CORE_REG() is blocked and they can only be accessed as a subset of the Z registers (see the check at the end of core_reg_size_from_offset() in guest.c). > > then the implementation can do what it likes to achieve that, the most > > obvious thing being to store in native format for the current hardware > > mode based on SVCR.{SM,ZA}. Does that sound about right? > Apart from the statement about the V registers, this seems OK. But > again, I want to see this agreed with the QEMU folks. Great, thanks.
On Fri, Mar 22, 2024 at 05:09:45PM +0000, Marc Zyngier wrote: > In retrospect, it is fairly obvious that the FP state ownership > is only meaningful for a given CPU, and that locating this > information in the vcpu was just a mistake. > > Move the ownership tracking into the host data structure, and > rename it from fp_state to fp_owner, which is a better description > (name suggested by Mark Brown). Reviewed-by: Mark Brown <broonie@kernel.org>
On Mon, 25 Mar 2024 00:27:43 +0000, Mark Brown <broonie@kernel.org> wrote: > > > > - Add support for the V registers in the sysreg interface when SVE is > > > enabled. > > > We already support the V registers with KVM_REG_ARM_CORE_REG(). Why > > would you add any new interface for this? The kernel should be > > perfectly capable of dealing with the placement of the data in the > > internal structures, and there is no need to tie the userspace ABI to > > how we deal with that placement (kvm_regs is already purely > > userspace). > > This was referring to the fact that currently when SVE is enabled access > to the V registers as V registers via _CORE_REG() is blocked and they > can only be accessed as a subset of the Z registers (see the check at > the end of core_reg_size_from_offset() in guest.c). But what behaviour do you expect from allowing such a write? Insert in place? Or zero the upper bits of the vector, as per R_WKYLB? One is wrong, and the other wrecks havoc on unsuspecting userspace. My take on this is that when a VM is S*E aware, only the writes to the largest *enabled* registers should take place. This is similar to what we do for FP/SIMD: we only allow writes to the V registers, and not to Q, D, S, H or B, although that happens by construction. For S*E, dropping the write on the floor (or return some error that userspace will understand as benign) is the least bad option. M.
On Mon, Mar 25, 2024 at 09:23:18AM +0000, Marc Zyngier wrote: > Mark Brown <broonie@kernel.org> wrote: > > This was referring to the fact that currently when SVE is enabled access > > to the V registers as V registers via _CORE_REG() is blocked and they > > can only be accessed as a subset of the Z registers (see the check at > > the end of core_reg_size_from_offset() in guest.c). > But what behaviour do you expect from allowing such a write? Insert in > place? Or zero the upper bits of the vector, as per R_WKYLB? One is > wrong, and the other wrecks havoc on unsuspecting userspace. It would have to be the former due to the ABI issue I think. > My take on this is that when a VM is S*E aware, only the writes to the > largest *enabled* registers should take place. This is similar to what > we do for FP/SIMD: we only allow writes to the V registers, and not to > Q, D, S, H or B, although that happens by construction. For S*E, > dropping the write on the floor (or return some error that userspace > will understand as benign) is the least bad option. OK, this does mean that in the case of a SME only guest we'll end up with registers not just changing size but appearing and disappearing depending on SVCR.SM. It wasn't clear to me that this was a good idea from an ABI point of view, it's a level up beyond the size changing thing and there's a tradeoff with the "model what the architecture does" model.
On Mon, 25 Mar 2024 14:57:27 +0000, Mark Brown <broonie@kernel.org> wrote: > > [1 <text/plain; us-ascii (quoted-printable)>] > On Mon, Mar 25, 2024 at 09:23:18AM +0000, Marc Zyngier wrote: > > Mark Brown <broonie@kernel.org> wrote: > > > > This was referring to the fact that currently when SVE is enabled access > > > to the V registers as V registers via _CORE_REG() is blocked and they > > > can only be accessed as a subset of the Z registers (see the check at > > > the end of core_reg_size_from_offset() in guest.c). > > > But what behaviour do you expect from allowing such a write? Insert in > > place? Or zero the upper bits of the vector, as per R_WKYLB? One is > > wrong, and the other wrecks havoc on unsuspecting userspace. > > It would have to be the former due to the ABI issue I think. No, that's an architecture violation. > > My take on this is that when a VM is S*E aware, only the writes to the > > largest *enabled* registers should take place. This is similar to what > > we do for FP/SIMD: we only allow writes to the V registers, and not to > > Q, D, S, H or B, although that happens by construction. For S*E, > > dropping the write on the floor (or return some error that userspace > > will understand as benign) is the least bad option. > > OK, this does mean that in the case of a SME only guest we'll end up > with registers not just changing size but appearing and disappearing > depending on SVCR.SM. It wasn't clear to me that this was a good idea > from an ABI point of view, it's a level up beyond the size changing > thing and there's a tradeoff with the "model what the architecture does" > model. The registers don't have to disappear, they just have to become WI, just like it is the case today with SVE. Yes, the ABI becomes contextual, but we're past the point where we can treat the various register files as a bag of bits that can be save/restored without any ordering. We already have similar requirements for complex parts of KVM where ordering is required, such as GICv3 and the ITS, and we follow what the architecture requires. The same thing applies for the CPU. M.
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h index debc3753d2ef..b17f2269fb81 100644 --- a/arch/arm64/include/asm/kvm_emulate.h +++ b/arch/arm64/include/asm/kvm_emulate.h @@ -594,7 +594,7 @@ static __always_inline u64 kvm_get_reset_cptr_el2(struct kvm_vcpu *vcpu) val = (CPACR_EL1_FPEN_EL0EN | CPACR_EL1_FPEN_EL1EN); if (!vcpu_has_sve(vcpu) || - (vcpu->arch.fp_state != FP_STATE_GUEST_OWNED)) + (*host_data_ptr(fp_owner) != FP_STATE_GUEST_OWNED)) val |= CPACR_EL1_ZEN_EL1EN | CPACR_EL1_ZEN_EL0EN; if (cpus_have_final_cap(ARM64_SME)) val |= CPACR_EL1_SMEN_EL1EN | CPACR_EL1_SMEN_EL0EN; @@ -602,7 +602,7 @@ static __always_inline u64 kvm_get_reset_cptr_el2(struct kvm_vcpu *vcpu) val = CPTR_NVHE_EL2_RES1; if (vcpu_has_sve(vcpu) && - (vcpu->arch.fp_state == FP_STATE_GUEST_OWNED)) + (*host_data_ptr(fp_owner) == FP_STATE_GUEST_OWNED)) val |= CPTR_EL2_TZ; if (cpus_have_final_cap(ARM64_SME)) val &= ~CPTR_EL2_TSM; diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 838cdee2ecf7..951303e976de 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -545,6 +545,13 @@ struct kvm_host_data { struct kvm_cpu_context host_ctxt; struct user_fpsimd_state *fpsimd_state; /* hyp VA */ + /* Ownership of the FP regs */ + enum { + FP_STATE_FREE, + FP_STATE_HOST_OWNED, + FP_STATE_GUEST_OWNED, + } fp_owner; + /* * host_debug_state contains the host registers which are * saved and restored during world switches. @@ -621,13 +628,6 @@ struct kvm_vcpu_arch { /* Exception Information */ struct kvm_vcpu_fault_info fault; - /* Ownership of the FP regs */ - enum { - FP_STATE_FREE, - FP_STATE_HOST_OWNED, - FP_STATE_GUEST_OWNED, - } fp_state; - /* Configuration flags, set once and for all before the vcpu can run */ u8 cflags; diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index a24287c3ba99..66d8112da268 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -378,12 +378,6 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu) vcpu->arch.mmu_page_cache.gfp_zero = __GFP_ZERO; - /* - * Default value for the FP state, will be overloaded at load - * time if we support FP (pretty likely) - */ - vcpu->arch.fp_state = FP_STATE_FREE; - /* Set up the timer */ kvm_timer_vcpu_init(vcpu); diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c index e6bd99358615..98f8b9272e24 100644 --- a/arch/arm64/kvm/fpsimd.c +++ b/arch/arm64/kvm/fpsimd.c @@ -84,7 +84,7 @@ void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu) * guest in kvm_arch_vcpu_ctxflush_fp() and override this to * FP_STATE_FREE if the flag set. */ - vcpu->arch.fp_state = FP_STATE_HOST_OWNED; + *host_data_ptr(fp_owner) = FP_STATE_HOST_OWNED; *host_data_ptr(fpsimd_state) = kern_hyp_va(¤t->thread.uw.fpsimd_state); vcpu_clear_flag(vcpu, HOST_SVE_ENABLED); @@ -109,7 +109,7 @@ void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu) * been saved, this is very unlikely to happen. */ if (read_sysreg_s(SYS_SVCR) & (SVCR_SM_MASK | SVCR_ZA_MASK)) { - vcpu->arch.fp_state = FP_STATE_FREE; + *host_data_ptr(fp_owner) = FP_STATE_FREE; fpsimd_save_and_flush_cpu_state(); } } @@ -125,7 +125,7 @@ void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu) void kvm_arch_vcpu_ctxflush_fp(struct kvm_vcpu *vcpu) { if (test_thread_flag(TIF_FOREIGN_FPSTATE)) - vcpu->arch.fp_state = FP_STATE_FREE; + *host_data_ptr(fp_owner) = FP_STATE_FREE; } /* @@ -141,7 +141,7 @@ void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu) WARN_ON_ONCE(!irqs_disabled()); - if (vcpu->arch.fp_state == FP_STATE_GUEST_OWNED) { + if (*host_data_ptr(fp_owner) == FP_STATE_GUEST_OWNED) { /* * Currently we do not support SME guests so SVCR is @@ -194,7 +194,7 @@ void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu) isb(); } - if (vcpu->arch.fp_state == FP_STATE_GUEST_OWNED) { + if (*host_data_ptr(fp_owner) == FP_STATE_GUEST_OWNED) { if (vcpu_has_sve(vcpu)) { __vcpu_sys_reg(vcpu, ZCR_EL1) = read_sysreg_el1(SYS_ZCR); diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h index 6def6ad8dd48..2629420d0659 100644 --- a/arch/arm64/kvm/hyp/include/hyp/switch.h +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h @@ -42,7 +42,7 @@ extern struct kvm_exception_table_entry __stop___kvm_ex_table; /* Check whether the FP regs are owned by the guest */ static inline bool guest_owns_fp_regs(struct kvm_vcpu *vcpu) { - return vcpu->arch.fp_state == FP_STATE_GUEST_OWNED; + return *host_data_ptr(fp_owner) == FP_STATE_GUEST_OWNED; } /* Save the 32-bit only FPSIMD system register state */ @@ -376,7 +376,7 @@ static bool kvm_hyp_handle_fpsimd(struct kvm_vcpu *vcpu, u64 *exit_code) isb(); /* Write out the host state if it's in the registers */ - if (vcpu->arch.fp_state == FP_STATE_HOST_OWNED) + if (*host_data_ptr(fp_owner) == FP_STATE_HOST_OWNED) __fpsimd_save_state(*host_data_ptr(fpsimd_state)); /* Restore the guest state */ @@ -389,7 +389,7 @@ static bool kvm_hyp_handle_fpsimd(struct kvm_vcpu *vcpu, u64 *exit_code) if (!(read_sysreg(hcr_el2) & HCR_RW)) write_sysreg(__vcpu_sys_reg(vcpu, FPEXC32_EL2), fpexc32_el2); - vcpu->arch.fp_state = FP_STATE_GUEST_OWNED; + *host_data_ptr(fp_owner) = FP_STATE_GUEST_OWNED; return true; } diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c index c5f625dc1f07..26561c562f7a 100644 --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c @@ -39,7 +39,6 @@ static void flush_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu) hyp_vcpu->vcpu.arch.cptr_el2 = host_vcpu->arch.cptr_el2; hyp_vcpu->vcpu.arch.iflags = host_vcpu->arch.iflags; - hyp_vcpu->vcpu.arch.fp_state = host_vcpu->arch.fp_state; hyp_vcpu->vcpu.arch.debug_ptr = kern_hyp_va(host_vcpu->arch.debug_ptr); @@ -63,7 +62,6 @@ static void sync_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu) host_vcpu->arch.fault = hyp_vcpu->vcpu.arch.fault; host_vcpu->arch.iflags = hyp_vcpu->vcpu.arch.iflags; - host_vcpu->arch.fp_state = hyp_vcpu->vcpu.arch.fp_state; host_cpu_if->vgic_hcr = hyp_cpu_if->vgic_hcr; for (i = 0; i < hyp_cpu_if->used_lrs; ++i) diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c index 544a419b9a39..1f82d531a494 100644 --- a/arch/arm64/kvm/hyp/nvhe/switch.c +++ b/arch/arm64/kvm/hyp/nvhe/switch.c @@ -337,7 +337,7 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu) __sysreg_restore_state_nvhe(host_ctxt); - if (vcpu->arch.fp_state == FP_STATE_GUEST_OWNED) + if (*host_data_ptr(fp_owner) == FP_STATE_GUEST_OWNED) __fpsimd_save_fpexc32(vcpu); __debug_switch_to_host(vcpu); diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c index 14b7a6bc5909..b92f9fe2d50e 100644 --- a/arch/arm64/kvm/hyp/vhe/switch.c +++ b/arch/arm64/kvm/hyp/vhe/switch.c @@ -258,7 +258,7 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu) sysreg_restore_host_state_vhe(host_ctxt); - if (vcpu->arch.fp_state == FP_STATE_GUEST_OWNED) + if (*host_data_ptr(fp_owner) == FP_STATE_GUEST_OWNED) __fpsimd_save_fpexc32(vcpu); __debug_switch_to_host(vcpu);
In retrospect, it is fairly obvious that the FP state ownership is only meaningful for a given CPU, and that locating this information in the vcpu was just a mistake. Move the ownership tracking into the host data structure, and rename it from fp_state to fp_owner, which is a better description (name suggested by Mark Brown). Signed-off-by: Marc Zyngier <maz@kernel.org> --- arch/arm64/include/asm/kvm_emulate.h | 4 ++-- arch/arm64/include/asm/kvm_host.h | 14 +++++++------- arch/arm64/kvm/arm.c | 6 ------ arch/arm64/kvm/fpsimd.c | 10 +++++----- arch/arm64/kvm/hyp/include/hyp/switch.h | 6 +++--- arch/arm64/kvm/hyp/nvhe/hyp-main.c | 2 -- arch/arm64/kvm/hyp/nvhe/switch.c | 2 +- arch/arm64/kvm/hyp/vhe/switch.c | 2 +- 8 files changed, 19 insertions(+), 27 deletions(-)