[v2,2/5] KVM: arm64: Get rid of host SVE tracking/saving

Message ID 20211028111640.3663631-3-maz@kernel.org (mailing list archive)
State New, archived
Series KVM: arm64: Rework FPSIMD/SVE tracking

Commit Message

Marc Zyngier Oct. 28, 2021, 11:16 a.m. UTC
The SVE host tracking in KVM is pretty involved. It relies on a
set of flags tracking the ownership of the SVE registers, as well
as that of the EL0 access.

It is also pretty scary: __hyp_sve_save_host() computes
a thread_struct pointer and obtains a sve_state which gets directly
accessed without further ado, even on nVHE. How can this even work?

The answer to that is that it doesn't, and that this is mostly dead
code. Closer examination shows that on executing a syscall, userspace
loses its SVE state entirely. This is part of the ABI. Another
thing to notice is that although the kernel provides helpers such as
kernel_neon_begin()/end(), they only deal with the FP/NEON state,
and not SVE.

Given that you can only execute a guest as the result of a syscall,
and that the kernel cannot use SVE by itself, it becomes pretty
obvious that there is never any host SVE state to save, and that
this code is only there to increase confusion.

Get rid of the TIF_SVE tracking and host save infrastructure altogether.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h       |  1 -
 arch/arm64/kvm/fpsimd.c                 | 17 +++++-----------
 arch/arm64/kvm/hyp/include/hyp/switch.h | 27 +++----------------------
 3 files changed, 8 insertions(+), 37 deletions(-)
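
The syscall ABI point is easy to see in code: the arm64 syscall entry
path unconditionally discards the task's SVE state. Below is a
simplified sketch, modelled on sve_user_discard() in
arch/arm64/kernel/syscall.c as it looked around this series (details
may differ from the exact base tree):

static inline void sve_user_discard(void)
{
	if (!system_supports_sve())
		return;

	/* The task no longer owns any live SVE state... */
	clear_thread_flag(TIF_SVE);

	/*
	 * ...and EL0 SVE access is disabled again, so the next SVE
	 * instruction traps and the non-shared state is reinitialised
	 * to zero at that point.
	 */
	sve_user_disable();
}

Since KVM_RUN is itself a syscall, this has always run by the time
kvm_arch_vcpu_load_fp() is reached, which is why the BUG_ON() added
below should never fire.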

Comments

Mark Brown Oct. 28, 2021, 1:02 p.m. UTC | #1
On Thu, Oct 28, 2021 at 12:16:37PM +0100, Marc Zyngier wrote:
> The SVE host tracking in KVM is pretty involved. It relies on a
> set of flags tracking the ownership of the SVE register, as well
> as that of the EL0 access.

> It is also pretty scary: __hyp_sve_save_host() computes
> a thread_struct pointer and obtains a sve_state which gets directly
> accessed without further ado, even on nVHE. How can this even work?

> The answer to that is that it doesn't, and that this is mostly dead
> code. Closer examination shows that on executing a syscall, userspace
> loses its SVE state entirely. This is part of the ABI. Another
> thing to notice is that although the kernel provides helpers such as
> kernel_neon_begin()/end(), they only deal with the FP/NEON state,
> and not SVE.

> Given that you can only execute a guest as the result of a syscall,
> and that the kernel cannot use SVE by itself, it becomes pretty
> obvious that there is never any host SVE state to save, and that
> this code is only there to increase confusion.

Ah, this explains a lot and does in fact make life a lot easier, though
we're going to get some of the fun back for SME, since the ABI does not
invalidate ZA on syscall.  That said, there we have a register we can
check to see if the state is live, rather than having to track what's
going on with TIF.  A minimal sketch of such a check (illustrative
only, assuming the SYS_SVCR/SVCR_ZA_MASK definitions from the in-flight
SME series rather than anything in this one):
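
/*
 * Illustrative sketch: with SME, ZA liveness is architecturally
 * visible in SVCR, so no TIF-style bookkeeping is needed.
 * system_supports_sme(), SYS_SVCR and SVCR_ZA_MASK are assumed
 * from the SME series, not taken from this one.
 */
static bool za_state_is_live(void)
{
	if (!system_supports_sme())
		return false;

	return read_sysreg_s(SYS_SVCR) & SVCR_ZA_MASK;
}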
I also currently have changes in the SME patch set which mean that we
won't clear TIF_SVE on syscall entry while SME is active; however, I
can rework that to fit in with this change easily enough, and given the
simplifications introduced that seems clearly the right thing to do,
so:

Reviewed-by: Mark Brown <broonie@kernel.org>
Zenghui Yu Nov. 10, 2021, 1:19 p.m. UTC | #2
Hi Marc,

On 2021/10/28 19:16, Marc Zyngier wrote:
> The SVE host tracking in KVM is pretty involved. It relies on a
> set of flags tracking the ownership of the SVE register, as well
> as that of the EL0 access.
> 
> It is also pretty scary: __hyp_sve_save_host() computes
> a thread_struct pointer and obtains a sve_state which gets directly
> accessed without further ado, even on nVHE. How can this even work?
> 
> The answer to that is that it doesn't, and that this is mostly dead
> code. Closer examination shows that on executing a syscall, userspace
> loses its SVE state entirely. This is part of the ABI. Another
> thing to notice is that although the kernel provides helpers such as
> kernel_neon_begin()/end(), they only deal with the FP/NEON state,
> and not SVE.
> 
> Given that you can only execute a guest as the result of a syscall,
> and that the kernel cannot use SVE by itself, it becomes pretty
> obvious that there is never any host SVE state to save, and that
> this code is only there to increase confusion.
> 
> Get rid of the TIF_SVE tracking and host save infrastructure altogether.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
> index 5621020b28de..38ca332c10fe 100644
> --- a/arch/arm64/kvm/fpsimd.c
> +++ b/arch/arm64/kvm/fpsimd.c
> @@ -73,15 +73,11 @@ int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu)
>  void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu)
>  {
>  	BUG_ON(!current->mm);
> +	BUG_ON(test_thread_flag(TIF_SVE));
>  
> -	vcpu->arch.flags &= ~(KVM_ARM64_FP_ENABLED |
> -			      KVM_ARM64_HOST_SVE_IN_USE |
> -			      KVM_ARM64_HOST_SVE_ENABLED);
> +	vcpu->arch.flags &= ~KVM_ARM64_FP_ENABLED;
>  	vcpu->arch.flags |= KVM_ARM64_FP_HOST;
>  
> -	if (test_thread_flag(TIF_SVE))
> -		vcpu->arch.flags |= KVM_ARM64_HOST_SVE_IN_USE;

The comment about TIF_SVE on top of kvm_arch_vcpu_load_fp() becomes
obsolete now. Maybe worth removing it?

| *
| * TIF_SVE is backed up here, since it may get clobbered with guest state.
| * This flag is restored by kvm_arch_vcpu_put_fp(vcpu).

> diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
> index a0e78a6027be..722dfde7f1aa 100644
> --- a/arch/arm64/kvm/hyp/include/hyp/switch.h
> +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
> @@ -207,16 +207,6 @@ static inline bool __populate_fault_info(struct kvm_vcpu *vcpu)
>  	return __get_fault_info(esr, &vcpu->arch.fault);
>  }
>  
> -static inline void __hyp_sve_save_host(struct kvm_vcpu *vcpu)
> -{
> -	struct thread_struct *thread;
> -
> -	thread = container_of(vcpu->arch.host_fpsimd_state, struct thread_struct,
> -			      uw.fpsimd_state);
> -
> -	__sve_save_state(sve_pffr(thread), &vcpu->arch.host_fpsimd_state->fpsr);
> -}

Nit: This removes the only user of the __sve_save_state() helper.
Should we still keep it in fpsimd.S?

Thanks,
Zenghui
Marc Zyngier Nov. 22, 2021, 3:57 p.m. UTC | #3
Hi Zenghui,

On Wed, 10 Nov 2021 13:19:23 +0000,
Zenghui Yu <yuzenghui@huawei.com> wrote:
> 
> Hi Marc,
> 
> On 2021/10/28 19:16, Marc Zyngier wrote:
> > The SVE host tracking in KVM is pretty involved. It relies on a
> > set of flags tracking the ownership of the SVE register, as well
> > as that of the EL0 access.
> > 
> > It is also pretty scary: __hyp_sve_save_host() computes
> > a thread_struct pointer and obtains a sve_state which gets directly
> > accessed without further ado, even on nVHE. How can this even work?
> > 
> > The answer to that is that it doesn't, and that this is mostly dead
> > code. Closer examination shows that on executing a syscall, userspace
> > loses its SVE state entirely. This is part of the ABI. Another
> > thing to notice is that although the kernel provides helpers such as
> > kernel_neon_begin()/end(), they only deal with the FP/NEON state,
> > and not SVE.
> > 
> > Given that you can only execute a guest as the result of a syscall,
> > and that the kernel cannot use SVE by itself, it becomes pretty
> > obvious that there is never any host SVE state to save, and that
> > this code is only there to increase confusion.
> > 
> > Get rid of the TIF_SVE tracking and host save infrastructure altogether.
> > 
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
> > index 5621020b28de..38ca332c10fe 100644
> > --- a/arch/arm64/kvm/fpsimd.c
> > +++ b/arch/arm64/kvm/fpsimd.c
> > @@ -73,15 +73,11 @@ int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu)
> >  void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu)
> >  {
> >  	BUG_ON(!current->mm);
> > +	BUG_ON(test_thread_flag(TIF_SVE));
> >  
> > -	vcpu->arch.flags &= ~(KVM_ARM64_FP_ENABLED |
> > -			      KVM_ARM64_HOST_SVE_IN_USE |
> > -			      KVM_ARM64_HOST_SVE_ENABLED);
> > +	vcpu->arch.flags &= ~KVM_ARM64_FP_ENABLED;
> >  	vcpu->arch.flags |= KVM_ARM64_FP_HOST;
> >  
> > -	if (test_thread_flag(TIF_SVE))
> > -		vcpu->arch.flags |= KVM_ARM64_HOST_SVE_IN_USE;
> 
> The comment about TIF_SVE on top of kvm_arch_vcpu_load_fp() becomes
> obsolete now. Maybe worth removing it?
> 
> | *
> | * TIF_SVE is backed up here, since it may get clobbered with guest state.
> | * This flag is restored by kvm_arch_vcpu_put_fp(vcpu).

Indeed. Now gone.

> 
> > diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
> > index a0e78a6027be..722dfde7f1aa 100644
> > --- a/arch/arm64/kvm/hyp/include/hyp/switch.h
> > +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
> > @@ -207,16 +207,6 @@ static inline bool __populate_fault_info(struct kvm_vcpu *vcpu)
> >  	return __get_fault_info(esr, &vcpu->arch.fault);
> >  }
> >  
> > -static inline void __hyp_sve_save_host(struct kvm_vcpu *vcpu)
> > -{
> > -	struct thread_struct *thread;
> > -
> > -	thread = container_of(vcpu->arch.host_fpsimd_state, struct thread_struct,
> > -			      uw.fpsimd_state);
> > -
> > -	__sve_save_state(sve_pffr(thread), &vcpu->arch.host_fpsimd_state->fpsr);
> > -}
> 
> Nit: This removes the only user of __sve_save_state() helper. Should we
> still keep it in fpsimd.S?

I was in two minds about that, as I'd like to eventually be able to
use SVE for protected guests, where the hypervisor itself has to be in
charge of the FP/SVE save-restore.

But that's probably several months away, and I can always revert a
deletion patch if I need to, so let's get rid of it now.

Thanks for the suggestions.

	M.
Mark Brown Nov. 22, 2021, 5:58 p.m. UTC | #4
On Mon, Nov 22, 2021 at 03:57:32PM +0000, Marc Zyngier wrote:
> Zenghui Yu <yuzenghui@huawei.com> wrote:

> > Nit: This removes the only user of __sve_save_state() helper. Should we
> > still keep it in fpsimd.S?

> I was in two minds about that, as I'd like to eventually be able to
> use SVE for protected guests, where the hypervisor itself has to be in
> charge of the FP/SVE save-restore.

> But that's probably several months away, and I can always revert a
> deletion patch if I need to, so let's get rid of it now.

While we're on the subject of potential future work: we might in
future want to not disable SVE on every syscall, if (as seems likely)
that turns out to be more performant for small vector lengths.  That
would mean some minor reshuffling here, along the lines of converting
the saved state to FPSIMD and dropping TIF_SVE in _vcpu_load_fp(),
roughly as in the sketch below (hypothetical, not part of this series;
sve_to_fpsimd() is a static helper in arch/arm64/kernel/fpsimd.c and
would need exposing):
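
/*
 * Hypothetical sketch only: demote live host SVE state to its
 * FPSIMD view on vcpu load, so KVM still never saves SVE state
 * itself.  Names and placement are illustrative.
 */
static void kvm_demote_host_sve_state(void)
{
	if (!test_thread_flag(TIF_SVE))
		return;

	/* Make sure the live registers land in current's saved state */
	fpsimd_save_and_flush_cpu_state();

	/* Keep only the shared FPSIMD (Vn) view of the host state */
	sve_to_fpsimd(current);
	clear_thread_flag(TIF_SVE);
}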
As with using SVE in protected guests, that can just be done when
needed though.
Marc Zyngier Nov. 22, 2021, 6:10 p.m. UTC | #5
On Mon, 22 Nov 2021 17:58:00 +0000,
Mark Brown <broonie@kernel.org> wrote:
> 
> On Mon, Nov 22, 2021 at 03:57:32PM +0000, Marc Zyngier wrote:
> > Zenghui Yu <yuzenghui@huawei.com> wrote:
> 
> > > Nit: This removes the only user of __sve_save_state() helper. Should we
> > > still keep it in fpsimd.S?
> 
> > I was in two minds about that, as I'd like to eventually be able to
> > use SVE for protected guests, where the hypervisor itself has to be in
> > charge of the FP/SVE save-restore.
> 
> > But that's probably several months away, and I can always revert a
> > deletion patch if I need to, so let's get rid of it now.
> 
> While we're on the subject of potential future work we might in future
> want to not disable SVE on every syscall if (as seems likely) it turns
> out that that's more performant for small vector lengths

[...]

How are you going to retrofit that into userspace? This would be an
ABI change, and I'm not sure how you'd want to deal with that
transition...

	M.
Mark Brown Nov. 22, 2021, 6:30 p.m. UTC | #6
On Mon, Nov 22, 2021 at 06:10:25PM +0000, Marc Zyngier wrote:
> Mark Brown <broonie@kernel.org> wrote:

> > While we're on the subject of potential future work we might in future
> > want to not disable SVE on every syscall if (as seems likely) it turns
> > out that that's more performant for small vector lengths

> How are you going to retrofit that into userspace? This would be an
> ABI change, and I'm not sure how you'd want to deal with that
> transition...

We don't need to change the ABI; the ABI just says we zero the
registers that aren't shared with FPSIMD.  Instead of doing that on
taking an SVE access trap to reenable SVE after having disabled
TIF_SVE, we could do that during the syscall.  Userspace can't tell
the difference, other than via the different formats we use to report
the SVE register set via ptrace if it single-steps over a syscall, and
even then I'm struggling to think of a scenario where userspace would
be relying on that.

You could also implement a similar optimisation by forcing on TIF_SVE
whenever we return to userspace, but that would create a cost for
userspace tasks that don't use SVE on SVE-capable hardware, so it
doesn't seem as good.  In any case it's not an issue for now, since
anything here will need benchmarking on a reasonable range of
hardware.
Marc Zyngier Nov. 23, 2021, 10:11 a.m. UTC | #7
On Mon, 22 Nov 2021 18:30:16 +0000,
Mark Brown <broonie@kernel.org> wrote:
> 
> On Mon, Nov 22, 2021 at 06:10:25PM +0000, Marc Zyngier wrote:
> > Mark Brown <broonie@kernel.org> wrote:
> 
> > > While we're on the subject of potential future work we might in future
> > > want to not disable SVE on every syscall if (as seems likely) it turns
> > > out that that's more performant for small vector lengths
> 
> > How are you going to retrofit that into userspace? This would be an
> > ABI change, and I'm not sure how you'd want to deal with that
> > transition...
> 
> We don't need to change the ABI, the ABI just says we zero the registers
> that aren't shared with FPSIMD.  Instead of doing that on taking a SVE
> access trap to reenable SVE after having disabled TIF_SVE we could do
> that during the syscall, userspace can't tell the difference other than
> via the different formats we use to report the SVE register set via
> ptrace if it single steps over a syscall.  Even then I'm struggling to
> think of a scenario where userspace would be relying on that.

That's not the point I'm trying to make.

Userspace expects to have lost SVE information over a syscall (even if
the VL is 128, it expects to have lost P0..P15 and FFR). How do you
plan to tell userspace that this behaviour has changed?

	M.
Mark Brown Nov. 23, 2021, 12:33 p.m. UTC | #8
On Tue, Nov 23, 2021 at 10:11:33AM +0000, Marc Zyngier wrote:
> Mark Brown <broonie@kernel.org> wrote:

> > We don't need to change the ABI, the ABI just says we zero the registers
> > that aren't shared with FPSIMD.  Instead of doing that on taking a SVE
> > access trap to reenable SVE after having disabled TIF_SVE we could do

> That's not the point I'm trying to make.

> Userspace expects to have lost SVE information over a syscall (even if
> the VL is 128, it expects to have lost P0..P15 and FFR). How do you
> plan to tell userspace that this behaviour has changed?

My point is that this doesn't need to change.  Userspace can't tell if
we zeroed the non-shared state on syscall or on some later access trap.
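
For the record, the state in question is everything the SVE view adds
on top of FPSIMD: the bits of each Zn above 127, plus P0-P15 and FFR.
Conceptually the zeroing amounts to the following (illustrative only;
the in-tree code operates on the task's saved sve_state buffer rather
than a structure like this):

#include <string.h>

/*
 * Illustrative sketch: "zeroing the non-shared state" keeps
 * Vn = Zn[127:0] and clears the predicates and FFR.  Whether this
 * runs at syscall entry or at the next SVE access trap is invisible
 * to userspace.
 */
struct sve_regs {
	unsigned char z[32][256];	/* Zn: up to 2048 bits each */
	unsigned char p[16][32];	/* Pn: up to 256 bits each */
	unsigned char ffr[32];
	unsigned int vl;		/* vector length in bytes */
};

static void zero_non_fpsimd_state(struct sve_regs *s)
{
	int i;

	for (i = 0; i < 32; i++)
		memset(&s->z[i][16], 0, s->vl - 16);	/* keep Zn[127:0] */

	memset(s->p, 0, sizeof(s->p));
	memset(s->ffr, 0, sizeof(s->ffr));
}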

Patch

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 1363c1ff66fb..e24d960244d9 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -410,7 +410,6 @@  struct kvm_vcpu_arch {
 #define KVM_ARM64_DEBUG_DIRTY		(1 << 0)
 #define KVM_ARM64_FP_ENABLED		(1 << 1) /* guest FP regs loaded */
 #define KVM_ARM64_FP_HOST		(1 << 2) /* host FP regs loaded */
-#define KVM_ARM64_HOST_SVE_IN_USE	(1 << 3) /* backup for host TIF_SVE */
 #define KVM_ARM64_HOST_SVE_ENABLED	(1 << 4) /* SVE enabled for EL0 */
 #define KVM_ARM64_GUEST_HAS_SVE		(1 << 5) /* SVE exposed to guest */
 #define KVM_ARM64_VCPU_SVE_FINALIZED	(1 << 6) /* SVE config completed */
diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
index 5621020b28de..38ca332c10fe 100644
--- a/arch/arm64/kvm/fpsimd.c
+++ b/arch/arm64/kvm/fpsimd.c
@@ -73,15 +73,11 @@  int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu)
 void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu)
 {
 	BUG_ON(!current->mm);
+	BUG_ON(test_thread_flag(TIF_SVE));
 
-	vcpu->arch.flags &= ~(KVM_ARM64_FP_ENABLED |
-			      KVM_ARM64_HOST_SVE_IN_USE |
-			      KVM_ARM64_HOST_SVE_ENABLED);
+	vcpu->arch.flags &= ~KVM_ARM64_FP_ENABLED;
 	vcpu->arch.flags |= KVM_ARM64_FP_HOST;
 
-	if (test_thread_flag(TIF_SVE))
-		vcpu->arch.flags |= KVM_ARM64_HOST_SVE_IN_USE;
-
 	if (read_sysreg(cpacr_el1) & CPACR_EL1_ZEN_EL0EN)
 		vcpu->arch.flags |= KVM_ARM64_HOST_SVE_ENABLED;
 }
@@ -115,13 +111,11 @@  void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu)
 void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu)
 {
 	unsigned long flags;
-	bool host_has_sve = system_supports_sve();
-	bool guest_has_sve = vcpu_has_sve(vcpu);
 
 	local_irq_save(flags);
 
 	if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED) {
-		if (guest_has_sve) {
+		if (vcpu_has_sve(vcpu)) {
 			__vcpu_sys_reg(vcpu, ZCR_EL1) = read_sysreg_el1(SYS_ZCR);
 
 			/* Restore the VL that was saved when bound to the CPU */
@@ -131,7 +125,7 @@  void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu)
 		}
 
 		fpsimd_save_and_flush_cpu_state();
-	} else if (has_vhe() && host_has_sve) {
+	} else if (has_vhe() && system_supports_sve()) {
 		/*
 		 * The FPSIMD/SVE state in the CPU has not been touched, and we
 		 * have SVE (and VHE): CPACR_EL1 (alias CPTR_EL2) has been
@@ -145,8 +139,7 @@  void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu)
 			sysreg_clear_set(CPACR_EL1, CPACR_EL1_ZEN_EL0EN, 0);
 	}
 
-	update_thread_flag(TIF_SVE,
-			   vcpu->arch.flags & KVM_ARM64_HOST_SVE_IN_USE);
+	update_thread_flag(TIF_SVE, 0);
 
 	local_irq_restore(flags);
 }
diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
index a0e78a6027be..722dfde7f1aa 100644
--- a/arch/arm64/kvm/hyp/include/hyp/switch.h
+++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
@@ -207,16 +207,6 @@  static inline bool __populate_fault_info(struct kvm_vcpu *vcpu)
 	return __get_fault_info(esr, &vcpu->arch.fault);
 }
 
-static inline void __hyp_sve_save_host(struct kvm_vcpu *vcpu)
-{
-	struct thread_struct *thread;
-
-	thread = container_of(vcpu->arch.host_fpsimd_state, struct thread_struct,
-			      uw.fpsimd_state);
-
-	__sve_save_state(sve_pffr(thread), &vcpu->arch.host_fpsimd_state->fpsr);
-}
-
 static inline void __hyp_sve_restore_guest(struct kvm_vcpu *vcpu)
 {
 	sve_cond_update_zcr_vq(vcpu_sve_max_vq(vcpu) - 1, SYS_ZCR_EL2);
@@ -228,21 +218,14 @@  static inline void __hyp_sve_restore_guest(struct kvm_vcpu *vcpu)
 /* Check for an FPSIMD/SVE trap and handle as appropriate */
 static inline bool __hyp_handle_fpsimd(struct kvm_vcpu *vcpu)
 {
-	bool sve_guest, sve_host;
+	bool sve_guest;
 	u8 esr_ec;
 	u64 reg;
 
 	if (!system_supports_fpsimd())
 		return false;
 
-	if (system_supports_sve()) {
-		sve_guest = vcpu_has_sve(vcpu);
-		sve_host = vcpu->arch.flags & KVM_ARM64_HOST_SVE_IN_USE;
-	} else {
-		sve_guest = false;
-		sve_host = false;
-	}
-
+	sve_guest = vcpu_has_sve(vcpu);
 	esr_ec = kvm_vcpu_trap_get_class(vcpu);
 	if (esr_ec != ESR_ELx_EC_FP_ASIMD &&
 	    esr_ec != ESR_ELx_EC_SVE)
@@ -269,11 +252,7 @@  static inline bool __hyp_handle_fpsimd(struct kvm_vcpu *vcpu)
 	isb();
 
 	if (vcpu->arch.flags & KVM_ARM64_FP_HOST) {
-		if (sve_host)
-			__hyp_sve_save_host(vcpu);
-		else
-			__fpsimd_save_state(vcpu->arch.host_fpsimd_state);
-
+		__fpsimd_save_state(vcpu->arch.host_fpsimd_state);
 		vcpu->arch.flags &= ~KVM_ARM64_FP_HOST;
 	}