diff mbox series

[v2,5/5] KVM: arm64: Exclude FP ownership from kvm_vcpu_arch

Message ID 20240322170945.3292593-6-maz@kernel.org (mailing list archive)
State New, archived
Headers show
Series KVM: arm64: Move host-specific data out of kvm_vcpu_arch | expand

Commit Message

Marc Zyngier March 22, 2024, 5:09 p.m. UTC
In retrospect, it is fairly obvious that the FP state ownership
is only meaningful for a given CPU, and that locating this
information in the vcpu was just a mistake.

Move the ownership tracking into the host data structure, and
rename it from fp_state to fp_owner, which is a better description
(name suggested by Mark Brown).

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_emulate.h    |  4 ++--
 arch/arm64/include/asm/kvm_host.h       | 14 +++++++-------
 arch/arm64/kvm/arm.c                    |  6 ------
 arch/arm64/kvm/fpsimd.c                 | 10 +++++-----
 arch/arm64/kvm/hyp/include/hyp/switch.h |  6 +++---
 arch/arm64/kvm/hyp/nvhe/hyp-main.c      |  2 --
 arch/arm64/kvm/hyp/nvhe/switch.c        |  2 +-
 arch/arm64/kvm/hyp/vhe/switch.c         |  2 +-
 8 files changed, 19 insertions(+), 27 deletions(-)
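The effect of the move can be sketched as a tiny standalone C model (illustrative only; it mirrors the names used in the patch but is not the kernel code, and models a single physical CPU):

```c
#include <assert.h>

/* Sketch of the patch's data layout: FP register ownership lives in
 * the per-physical-CPU host data, not in each vcpu. */
enum fp_owner_state {
	FP_STATE_FREE,
	FP_STATE_HOST_OWNED,
	FP_STATE_GUEST_OWNED,
};

struct kvm_host_data {
	enum fp_owner_state fp_owner;	/* was kvm_vcpu_arch.fp_state */
};

/* One instance per physical CPU; a single CPU is modelled here. */
static struct kvm_host_data host_data;

/* Stand-in for the kernel's host_data_ptr(fp_owner) accessor. */
static enum fp_owner_state *fp_owner_ptr(void)
{
	return &host_data.fp_owner;
}
```

The load/put and world-switch paths then read and write this per-CPU field at exactly the points where they previously touched vcpu->arch.fp_state, as the diff below shows.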

Comments

Mark Brown March 22, 2024, 5:52 p.m. UTC | #1
On Fri, Mar 22, 2024 at 05:09:45PM +0000, Marc Zyngier wrote:
> In retrospect, it is fairly obvious that the FP state ownership
> is only meaningful for a given CPU, and that locating this
> information in the vcpu was just a mistake.
> 
> Move the ownership tracking into the host data structure, and
> rename it from fp_state to fp_owner, which is a better description
> (name suggested by Mark Brown).

There's still the thing with the interaction with SME support - to
summarise what I think you're asking for the userspace ABI there:

 - Create a requirement for userspace to set SVCR prior to setting any
   vector impacted register to ensure the correct format and that data
   isn't zeroed when SVCR is set.
 - Use the value of SVCR.SM and the guest maximum SVE and SME VLs to
   select the currently visible vector length for the Z, P and FFR
   registers, and whether FFR can be accessed, given that it is not
   available in streaming mode.
 - Changes to SVCR.SM zero register data in the same way writes to the
   physical register do.
 - This also implies discarding or failing all writes to ZA and ZT0
   unless SVCR.ZA is set for consistency.
 - Add support for the V registers in the sysreg interface when SVE is
   enabled.

then the implementation can do what it likes to achieve that, the most
obvious thing being to store in native format for the current hardware
mode based on SVCR.{SM,ZA}.  Does that sound about right?
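The VL-selection rule in the second bullet can be sketched as follows (the names are illustrative and FEAT_SME_FA64 is ignored; this is not the proposed kernel implementation):

```c
#include <assert.h>
#include <stdint.h>

/* SVCR bit positions as in the architecture: SM is bit 0, ZA bit 1. */
#define SVCR_SM	(UINT64_C(1) << 0)
#define SVCR_ZA	(UINT64_C(1) << 1)

struct guest_vls {
	unsigned int max_sve_vl;	/* guest max non-streaming VL, bytes */
	unsigned int max_sme_vl;	/* guest max streaming VL, bytes */
};

/* Currently visible vector length for the Z, P and FFR registers. */
static unsigned int visible_vl(const struct guest_vls *vls, uint64_t svcr)
{
	return (svcr & SVCR_SM) ? vls->max_sme_vl : vls->max_sve_vl;
}

/* FFR is not architecturally available in streaming mode (ignoring
 * FEAT_SME_FA64 in this sketch). */
static int ffr_accessible(uint64_t svcr)
{
	return !(svcr & SVCR_SM);
}
```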
Marc Zyngier March 23, 2024, 7:06 p.m. UTC | #2
On Fri, 22 Mar 2024 17:52:45 +0000,
Mark Brown <broonie@kernel.org> wrote:
> 
> On Fri, Mar 22, 2024 at 05:09:45PM +0000, Marc Zyngier wrote:
> > In retrospect, it is fairly obvious that the FP state ownership
> > is only meaningful for a given CPU, and that locating this
> > information in the vcpu was just a mistake.
> > 
> > Move the ownership tracking into the host data structure, and
> > rename it from fp_state to fp_owner, which is a better description
> > (name suggested by Mark Brown).
> 
> There's still the thing with the interaction with SME support - to
> summarise what I think you're asking for the userspace ABI there:

Well, the SME support is still pretty prospective, and this patch has
no impact on an existing ABI.

> 
>  - Create a requirement for userspace to set SVCR prior to setting any
>    vector impacted register to ensure the correct format and that data
>    isn't zeroed when SVCR is set.
>  - Use the value of SVCR.SM and the guest maximum SVE and SME VLs to
>    select the currently visible vector length for the Z, P and FFR
>    registers, and whether FFR can be accessed, given that it is not
>    available in streaming mode.
>  - Changes to SVCR.SM zero register data in the same way writes to the
>    physical register do.
>  - This also implies discarding or failing all writes to ZA and ZT0
>    unless SVCR.ZA is set for consistency.

All of that seems reasonable, as long as it comes as a consequence
of enabling SME. It should be run by the QEMU people though, as they
are the ones that will make use of it. Please Cc them when you post
the patches or even better, reach out to them beforehand.

>  - Add support for the V registers in the sysreg interface when SVE is
>    enabled.

We already support the V registers with KVM_REG_ARM_CORE_REG(). Why
would you add any new interface for this?  The kernel should be
perfectly capable of dealing with the placement of the data in the
internal structures, and there is no need to tie the userspace ABI to
how we deal with that placement (kvm_regs is already purely
userspace).
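For reference, KVM_REG_ARM_CORE_REG() encodes a core register as its offset into struct kvm_regs in 32-bit units. The sketch below models only that indexing with a stand-in struct; the real layout lives in the arm64 UAPI headers and differs from this one:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct kvm_regs - NOT the real UAPI layout, just
 * enough structure to show how the register index is derived. */
struct fake_user_fpsimd_state {
	unsigned char vregs[32][16];	/* 32 x 128-bit V registers */
};

struct fake_kvm_regs {
	unsigned long long regs[31];
	unsigned long long sp, pc, pstate;
	struct fake_user_fpsimd_state fp_regs;
};

/* Mirror of the UAPI macro's idea: offset in 32-bit words. */
#define FAKE_CORE_REG(name) \
	(offsetof(struct fake_kvm_regs, name) / sizeof(unsigned int))
```

With this encoding, consecutive V registers are four 32-bit words apart, so the kernel can map an incoming index to its internal storage however it likes.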

> then the implementation can do what it likes to achieve that, the most
> obvious thing being to store in native format for the current hardware
> mode based on SVCR.{SM,ZA}.  Does that sound about right?

Apart from the statement about the V registers, this seems OK. But
again, I want to see this agreed with the QEMU folks.

	M.
Mark Brown March 25, 2024, 12:27 a.m. UTC | #3
On Sat, Mar 23, 2024 at 07:06:24PM +0000, Marc Zyngier wrote:
> Mark Brown <broonie@kernel.org> wrote:

> > There's still the thing with the interaction with SME support - to
> > summarise what I think you're asking for the userspace ABI there:

> Well, the SME support is still pretty prospective, and this patch has
> no impact on an existing ABI.

Sure, hopefully I should have a new version out this release - it was
mostly held up waiting for the ID register parsing framework you got
merged last release.  Though holidays do make things a bit tight
timing-wise.

> >  - Add support for the V registers in the sysreg interface when SVE is
> >    enabled.

> We already support the V registers with KVM_REG_ARM_CORE_REG(). Why
> would you add any new interface for this?  The kernel should be
> perfectly capable of dealing with the placement of the data in the
> internal structures, and there is no need to tie the userspace ABI to
> how we deal with that placement (kvm_regs is already purely
> userspace).

This was referring to the fact that currently when SVE is enabled access
to the V registers as V registers via _CORE_REG() is blocked and they
can only be accessed as a subset of the Z registers (see the check at
the end of core_reg_size_from_offset() in guest.c).

> > then the implementation can do what it likes to achieve that, the most
> > obvious thing being to store in native format for the current hardware
> > mode based on SVCR.{SM,ZA}.  Does that sound about right?

> Apart from the statement about the V registers, this seems OK. But
> again, I want to see this agreed with the QEMU folks.

Great, thanks.
Mark Brown March 25, 2024, 12:28 a.m. UTC | #4
On Fri, Mar 22, 2024 at 05:09:45PM +0000, Marc Zyngier wrote:
> In retrospect, it is fairly obvious that the FP state ownership
> is only meaningful for a given CPU, and that locating this
> information in the vcpu was just a mistake.
> 
> Move the ownership tracking into the host data structure, and
> rename it from fp_state to fp_owner, which is a better description
> (name suggested by Mark Brown).

Reviewed-by: Mark Brown <broonie@kernel.org>
Marc Zyngier March 25, 2024, 9:23 a.m. UTC | #5
On Mon, 25 Mar 2024 00:27:43 +0000,
Mark Brown <broonie@kernel.org> wrote:
> 
> > >  - Add support for the V registers in the sysreg interface when SVE is
> > >    enabled.
> 
> > We already support the V registers with KVM_REG_ARM_CORE_REG(). Why
> > would you add any new interface for this?  The kernel should be
> > perfectly capable of dealing with the placement of the data in the
> > internal structures, and there is no need to tie the userspace ABI to
> > how we deal with that placement (kvm_regs is already purely
> > userspace).
> 
> This was referring to the fact that currently when SVE is enabled access
> to the V registers as V registers via _CORE_REG() is blocked and they
> can only be accessed as a subset of the Z registers (see the check at
> the end of core_reg_size_from_offset() in guest.c).

But what behaviour do you expect from allowing such a write? Insert in
place? Or zero the upper bits of the vector, as per R_WKYLB? One is
wrong, and the other wreaks havoc on unsuspecting userspace.

My take on this is that when a VM is S*E aware, only the writes to the
largest *enabled* registers should take place. This is similar to what
we do for FP/SIMD: we only allow writes to the V registers, and not to
Q, D, S, H or B, although that happens by construction. For S*E,
dropping the write on the floor (or returning some error that userspace
will understand as benign) is the least bad option.
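The candidate write behaviours under discussion can be sketched on a byte-buffer model of a Z register (purely illustrative; a 256-bit vector length is assumed):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ZREG_BYTES 32	/* example: 256-bit vector length */
#define VREG_BYTES 16	/* V registers are 128 bits */

/* Option 1: insert in place, preserving the upper vector bits
 * (not what the architecture does). */
static void write_v_insert(uint8_t *zreg, const uint8_t *v)
{
	memcpy(zreg, v, VREG_BYTES);
}

/* Option 2: architectural behaviour per R_WKYLB - a 128-bit write
 * zeroes the bits above bit 127. */
static void write_v_zero_top(uint8_t *zreg, const uint8_t *v)
{
	memcpy(zreg, v, VREG_BYTES);
	memset(zreg + VREG_BYTES, 0, ZREG_BYTES - VREG_BYTES);
}

/* Option 3: write-ignore - drop the write on the floor. */
static void write_v_wi(uint8_t *zreg, const uint8_t *v)
{
	(void)zreg;
	(void)v;
}
```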

	M.
Mark Brown March 25, 2024, 2:57 p.m. UTC | #6
On Mon, Mar 25, 2024 at 09:23:18AM +0000, Marc Zyngier wrote:
> Mark Brown <broonie@kernel.org> wrote:

> > This was referring to the fact that currently when SVE is enabled access
> > to the V registers as V registers via _CORE_REG() is blocked and they
> > can only be accessed as a subset of the Z registers (see the check at
> > the end of core_reg_size_from_offset() in guest.c).

> But what behaviour do you expect from allowing such a write? Insert in
> place? Or zero the upper bits of the vector, as per R_WKYLB? One is
> > wrong, and the other wreaks havoc on unsuspecting userspace.

It would have to be the former due to the ABI issue I think.

> My take on this is that when a VM is S*E aware, only the writes to the
> largest *enabled* registers should take place. This is similar to what
> we do for FP/SIMD: we only allow writes to the V registers, and not to
> Q, D, S, H or B, although that happens by construction. For S*E,
> dropping the write on the floor (or returning some error that userspace
> will understand as benign) is the least bad option.

OK, this does mean that in the case of an SME-only guest we'll end up
with registers not just changing size but appearing and disappearing
depending on SVCR.SM.  It wasn't clear to me that this was a good idea
from an ABI point of view; it's a level up from the size-changing
behaviour, and there's a tradeoff with the "model what the architecture
does" approach.
Marc Zyngier March 27, 2024, 9:04 a.m. UTC | #7
On Mon, 25 Mar 2024 14:57:27 +0000,
Mark Brown <broonie@kernel.org> wrote:
> 
> On Mon, Mar 25, 2024 at 09:23:18AM +0000, Marc Zyngier wrote:
> > Mark Brown <broonie@kernel.org> wrote:
> 
> > > This was referring to the fact that currently when SVE is enabled access
> > > to the V registers as V registers via _CORE_REG() is blocked and they
> > > can only be accessed as a subset of the Z registers (see the check at
> > > the end of core_reg_size_from_offset() in guest.c).
> 
> > But what behaviour do you expect from allowing such a write? Insert in
> > place? Or zero the upper bits of the vector, as per R_WKYLB? One is
> > wrong, and the other wreaks havoc on unsuspecting userspace.
> 
> It would have to be the former due to the ABI issue I think.

No, that's an architecture violation.

> > My take on this is that when a VM is S*E aware, only the writes to the
> > largest *enabled* registers should take place. This is similar to what
> > we do for FP/SIMD: we only allow writes to the V registers, and not to
> > Q, D, S, H or B, although that happens by construction. For S*E,
> > dropping the write on the floor (or returning some error that userspace
> > will understand as benign) is the least bad option.
> 
> OK, this does mean that in the case of an SME-only guest we'll end up
> with registers not just changing size but appearing and disappearing
> depending on SVCR.SM.  It wasn't clear to me that this was a good idea
> from an ABI point of view; it's a level up from the size-changing
> behaviour, and there's a tradeoff with the "model what the architecture
> does" approach.

The registers don't have to disappear; they just have to become WI
(write-ignore), just as is the case today with SVE. Yes, the ABI becomes
contextual, but we're past the point where we can treat the various
register files as a bag of bits that can be saved/restored without any
ordering.

We already have similar requirements for complex parts of KVM where
ordering is required, such as GICv3 and the ITS, and we follow what
the architecture requires. The same thing applies to the CPU.

	M.

Patch

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index debc3753d2ef..b17f2269fb81 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -594,7 +594,7 @@  static __always_inline u64 kvm_get_reset_cptr_el2(struct kvm_vcpu *vcpu)
 		val = (CPACR_EL1_FPEN_EL0EN | CPACR_EL1_FPEN_EL1EN);
 
 		if (!vcpu_has_sve(vcpu) ||
-		    (vcpu->arch.fp_state != FP_STATE_GUEST_OWNED))
+		    (*host_data_ptr(fp_owner) != FP_STATE_GUEST_OWNED))
 			val |= CPACR_EL1_ZEN_EL1EN | CPACR_EL1_ZEN_EL0EN;
 		if (cpus_have_final_cap(ARM64_SME))
 			val |= CPACR_EL1_SMEN_EL1EN | CPACR_EL1_SMEN_EL0EN;
@@ -602,7 +602,7 @@  static __always_inline u64 kvm_get_reset_cptr_el2(struct kvm_vcpu *vcpu)
 		val = CPTR_NVHE_EL2_RES1;
 
 		if (vcpu_has_sve(vcpu) &&
-		    (vcpu->arch.fp_state == FP_STATE_GUEST_OWNED))
+		    (*host_data_ptr(fp_owner) == FP_STATE_GUEST_OWNED))
 			val |= CPTR_EL2_TZ;
 		if (cpus_have_final_cap(ARM64_SME))
 			val &= ~CPTR_EL2_TSM;
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 838cdee2ecf7..951303e976de 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -545,6 +545,13 @@  struct kvm_host_data {
 	struct kvm_cpu_context host_ctxt;
 	struct user_fpsimd_state *fpsimd_state;	/* hyp VA */
 
+	/* Ownership of the FP regs */
+	enum {
+		FP_STATE_FREE,
+		FP_STATE_HOST_OWNED,
+		FP_STATE_GUEST_OWNED,
+	} fp_owner;
+
 	/*
 	 * host_debug_state contains the host registers which are
 	 * saved and restored during world switches.
@@ -621,13 +628,6 @@  struct kvm_vcpu_arch {
 	/* Exception Information */
 	struct kvm_vcpu_fault_info fault;
 
-	/* Ownership of the FP regs */
-	enum {
-		FP_STATE_FREE,
-		FP_STATE_HOST_OWNED,
-		FP_STATE_GUEST_OWNED,
-	} fp_state;
-
 	/* Configuration flags, set once and for all before the vcpu can run */
 	u8 cflags;
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index a24287c3ba99..66d8112da268 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -378,12 +378,6 @@  int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 
 	vcpu->arch.mmu_page_cache.gfp_zero = __GFP_ZERO;
 
-	/*
-	 * Default value for the FP state, will be overloaded at load
-	 * time if we support FP (pretty likely)
-	 */
-	vcpu->arch.fp_state = FP_STATE_FREE;
-
 	/* Set up the timer */
 	kvm_timer_vcpu_init(vcpu);
 
diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
index e6bd99358615..98f8b9272e24 100644
--- a/arch/arm64/kvm/fpsimd.c
+++ b/arch/arm64/kvm/fpsimd.c
@@ -84,7 +84,7 @@  void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu)
 	 * guest in kvm_arch_vcpu_ctxflush_fp() and override this to
 	 * FP_STATE_FREE if the flag set.
 	 */
-	vcpu->arch.fp_state = FP_STATE_HOST_OWNED;
+	*host_data_ptr(fp_owner) = FP_STATE_HOST_OWNED;
 	*host_data_ptr(fpsimd_state) = kern_hyp_va(&current->thread.uw.fpsimd_state);
 
 	vcpu_clear_flag(vcpu, HOST_SVE_ENABLED);
@@ -109,7 +109,7 @@  void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu)
 		 * been saved, this is very unlikely to happen.
 		 */
 		if (read_sysreg_s(SYS_SVCR) & (SVCR_SM_MASK | SVCR_ZA_MASK)) {
-			vcpu->arch.fp_state = FP_STATE_FREE;
+			*host_data_ptr(fp_owner) = FP_STATE_FREE;
 			fpsimd_save_and_flush_cpu_state();
 		}
 	}
@@ -125,7 +125,7 @@  void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu)
 void kvm_arch_vcpu_ctxflush_fp(struct kvm_vcpu *vcpu)
 {
 	if (test_thread_flag(TIF_FOREIGN_FPSTATE))
-		vcpu->arch.fp_state = FP_STATE_FREE;
+		*host_data_ptr(fp_owner) = FP_STATE_FREE;
 }
 
 /*
@@ -141,7 +141,7 @@  void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu)
 
 	WARN_ON_ONCE(!irqs_disabled());
 
-	if (vcpu->arch.fp_state == FP_STATE_GUEST_OWNED) {
+	if (*host_data_ptr(fp_owner) == FP_STATE_GUEST_OWNED) {
 
 		/*
 		 * Currently we do not support SME guests so SVCR is
@@ -194,7 +194,7 @@  void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu)
 		isb();
 	}
 
-	if (vcpu->arch.fp_state == FP_STATE_GUEST_OWNED) {
+	if (*host_data_ptr(fp_owner) == FP_STATE_GUEST_OWNED) {
 		if (vcpu_has_sve(vcpu)) {
 			__vcpu_sys_reg(vcpu, ZCR_EL1) = read_sysreg_el1(SYS_ZCR);
 
diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
index 6def6ad8dd48..2629420d0659 100644
--- a/arch/arm64/kvm/hyp/include/hyp/switch.h
+++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
@@ -42,7 +42,7 @@  extern struct kvm_exception_table_entry __stop___kvm_ex_table;
 /* Check whether the FP regs are owned by the guest */
 static inline bool guest_owns_fp_regs(struct kvm_vcpu *vcpu)
 {
-	return vcpu->arch.fp_state == FP_STATE_GUEST_OWNED;
+	return *host_data_ptr(fp_owner) == FP_STATE_GUEST_OWNED;
 }
 
 /* Save the 32-bit only FPSIMD system register state */
@@ -376,7 +376,7 @@  static bool kvm_hyp_handle_fpsimd(struct kvm_vcpu *vcpu, u64 *exit_code)
 	isb();
 
 	/* Write out the host state if it's in the registers */
-	if (vcpu->arch.fp_state == FP_STATE_HOST_OWNED)
+	if (*host_data_ptr(fp_owner) == FP_STATE_HOST_OWNED)
 		__fpsimd_save_state(*host_data_ptr(fpsimd_state));
 
 	/* Restore the guest state */
@@ -389,7 +389,7 @@  static bool kvm_hyp_handle_fpsimd(struct kvm_vcpu *vcpu, u64 *exit_code)
 	if (!(read_sysreg(hcr_el2) & HCR_RW))
 		write_sysreg(__vcpu_sys_reg(vcpu, FPEXC32_EL2), fpexc32_el2);
 
-	vcpu->arch.fp_state = FP_STATE_GUEST_OWNED;
+	*host_data_ptr(fp_owner) = FP_STATE_GUEST_OWNED;
 
 	return true;
 }
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index c5f625dc1f07..26561c562f7a 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -39,7 +39,6 @@  static void flush_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
 	hyp_vcpu->vcpu.arch.cptr_el2	= host_vcpu->arch.cptr_el2;
 
 	hyp_vcpu->vcpu.arch.iflags	= host_vcpu->arch.iflags;
-	hyp_vcpu->vcpu.arch.fp_state	= host_vcpu->arch.fp_state;
 
 	hyp_vcpu->vcpu.arch.debug_ptr	= kern_hyp_va(host_vcpu->arch.debug_ptr);
 
@@ -63,7 +62,6 @@  static void sync_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
 	host_vcpu->arch.fault		= hyp_vcpu->vcpu.arch.fault;
 
 	host_vcpu->arch.iflags		= hyp_vcpu->vcpu.arch.iflags;
-	host_vcpu->arch.fp_state	= hyp_vcpu->vcpu.arch.fp_state;
 
 	host_cpu_if->vgic_hcr		= hyp_cpu_if->vgic_hcr;
 	for (i = 0; i < hyp_cpu_if->used_lrs; ++i)
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index 544a419b9a39..1f82d531a494 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -337,7 +337,7 @@  int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 
 	__sysreg_restore_state_nvhe(host_ctxt);
 
-	if (vcpu->arch.fp_state == FP_STATE_GUEST_OWNED)
+	if (*host_data_ptr(fp_owner) == FP_STATE_GUEST_OWNED)
 		__fpsimd_save_fpexc32(vcpu);
 
 	__debug_switch_to_host(vcpu);
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index 14b7a6bc5909..b92f9fe2d50e 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -258,7 +258,7 @@  static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 
 	sysreg_restore_host_state_vhe(host_ctxt);
 
-	if (vcpu->arch.fp_state == FP_STATE_GUEST_OWNED)
+	if (*host_data_ptr(fp_owner) == FP_STATE_GUEST_OWNED)
 		__fpsimd_save_fpexc32(vcpu);
 
 	__debug_switch_to_host(vcpu);