diff mbox series

[v2,1/8] KVM: arm64: Unconditionally save+flush host FPSIMD/SVE/SME state

Message ID 20250206141102.954688-2-mark.rutland@arm.com (mailing list archive)
State New
Headers show
Series KVM: arm64: FPSIMD/SVE/SME fixes | expand

Commit Message

Mark Rutland Feb. 6, 2025, 2:10 p.m. UTC
There are several problems with the way hyp code lazily saves the host's
FPSIMD/SVE state, including:

* Host SVE being discarded unexpectedly due to inconsistent
  configuration of TIF_SVE and CPACR_ELx.ZEN. This has been seen to
  result in QEMU crashes where SVE is used by memmove(), as reported by
  Eric Auger:

  https://issues.redhat.com/browse/RHEL-68997

* Host SVE state is discarded *after* modification by ptrace, which was an
  unintentional ptrace ABI change introduced with lazy discarding of SVE state.

* The host FPMR value can be discarded when running a non-protected VM,
  where FPMR support is not exposed to a VM, and that VM uses
  FPSIMD/SVE. In these cases the hyp code does not save the host's FPMR
  before unbinding the host's FPSIMD/SVE/SME state, leaving a stale
  value in memory.

Avoid these by eagerly saving and "flushing" the host's FPSIMD/SVE/SME
state when loading a vCPU such that KVM does not need to save any of the
host's FPSIMD/SVE/SME state. For clarity, fpsimd_kvm_prepare() is
removed and the necessary call to fpsimd_save_and_flush_cpu_state() is
placed in kvm_arch_vcpu_load_fp(). As 'fpsimd_state' and 'fpmr_ptr'
should not be used, they are set to NULL; all uses of these will be
removed in subsequent patches.

Historical problems go back at least as far as v5.17, e.g. erroneous
assumptions about TIF_SVE being clear in commit:

  8383741ab2e773a9 ("KVM: arm64: Get rid of host SVE tracking/saving")

... and so this eager save+flush probably needs to be backported to ALL
stable trees.

Fixes: 93ae6b01bafee8fa ("KVM: arm64: Discard any SVE state when entering KVM guests")
Fixes: 8c845e2731041f0f ("arm64/sve: Leave SVE enabled on syscall if we don't context switch")
Fixes: ef3be86021c3bdf3 ("KVM: arm64: Add save/restore support for FPMR")
Reported-by: Eric Auger <eauger@redhat.com>
Reported-by: Wilco Dijkstra <wilco.dijkstra@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Tested-by: Eric Auger <eric.auger@redhat.com>
Cc: stable@vger.kernel.org
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
---
 arch/arm64/kernel/fpsimd.c | 25 -------------------------
 arch/arm64/kvm/fpsimd.c    | 35 ++++++++++-------------------------
 2 files changed, 10 insertions(+), 50 deletions(-)

Comments

Will Deacon Feb. 7, 2025, 12:27 p.m. UTC | #1
On Thu, Feb 06, 2025 at 02:10:55PM +0000, Mark Rutland wrote:
> There are several problems with the way hyp code lazily saves the host's
> FPSIMD/SVE state, including:
> 
> * Host SVE being discarded unexpectedly due to inconsistent
>   configuration of TIF_SVE and CPACR_ELx.ZEN. This has been seen to
>   result in QEMU crashes where SVE is used by memmove(), as reported by
>   Eric Auger:
> 
>   https://issues.redhat.com/browse/RHEL-68997
> 
> * Host SVE state is discarded *after* modification by ptrace, which was an
>   unintentional ptrace ABI change introduced with lazy discarding of SVE state.
> 
> * The host FPMR value can be discarded when running a non-protected VM,
>   where FPMR support is not exposed to a VM, and that VM uses
>   FPSIMD/SVE. In these cases the hyp code does not save the host's FPMR
>   before unbinding the host's FPSIMD/SVE/SME state, leaving a stale
>   value in memory.

How hard would it be to write tests for these three scenarios? If we
had something to exercise the relevant paths then...

> ... and so this eager save+flush probably needs to be backported to ALL
> stable trees.

... this backporting might be a little easier to be sure about?

Will
Mark Rutland Feb. 7, 2025, 1:21 p.m. UTC | #2
On Fri, Feb 07, 2025 at 12:27:51PM +0000, Will Deacon wrote:
> On Thu, Feb 06, 2025 at 02:10:55PM +0000, Mark Rutland wrote:
> > There are several problems with the way hyp code lazily saves the host's
> > FPSIMD/SVE state, including:
> > 
> > * Host SVE being discarded unexpectedly due to inconsistent
> >   configuration of TIF_SVE and CPACR_ELx.ZEN. This has been seen to
> >   result in QEMU crashes where SVE is used by memmove(), as reported by
> >   Eric Auger:
> > 
> >   https://issues.redhat.com/browse/RHEL-68997
> > 
> > * Host SVE state is discarded *after* modification by ptrace, which was an
> >   unintentional ptrace ABI change introduced with lazy discarding of SVE state.
> > 
> > * The host FPMR value can be discarded when running a non-protected VM,
> >   where FPMR support is not exposed to a VM, and that VM uses
> >   FPSIMD/SVE. In these cases the hyp code does not save the host's FPMR
> >   before unbinding the host's FPSIMD/SVE/SME state, leaving a stale
> >   value in memory.
> 
> How hard would it be to write tests for these three scenarios? If we
> had something to exercise the relevant paths then...
>
> > ... and so this eager save+flush probably needs to be backported to ALL
> > stable trees.
> 
> ... this backporting might be a little easier to be sure about?

For the first case I have a quick and dirty test, which I've pushed to
my arm64/kvm/fpsimd-tests branch in my kernel.org repo:

  https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/
  git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git

For the last case it should be possible to do something similar, but I
hadn't had the time to dig in to the KVM selftests infrastructure and
figure out how to confiugre the guest appropriately.

For the ptrace case, the same symptoms can be provoked outside of KVM
(and I'm currently working to fix that). From my PoV the important thing
is that this fix happens to remove KVM from the set of cases the other
fixes need to care about.

FWIW I was assuming that I'd be handling the upstream backports, and I'd
be testing with the test above and some additional assertions hacked
into the kernel for testing.

Mark.
Marc Zyngier Feb. 10, 2025, 10:53 a.m. UTC | #3
On Fri, 07 Feb 2025 13:21:44 +0000,
Mark Rutland <mark.rutland@arm.com> wrote:
> 
> On Fri, Feb 07, 2025 at 12:27:51PM +0000, Will Deacon wrote:
> > On Thu, Feb 06, 2025 at 02:10:55PM +0000, Mark Rutland wrote:
> > > There are several problems with the way hyp code lazily saves the host's
> > > FPSIMD/SVE state, including:
> > > 
> > > * Host SVE being discarded unexpectedly due to inconsistent
> > >   configuration of TIF_SVE and CPACR_ELx.ZEN. This has been seen to
> > >   result in QEMU crashes where SVE is used by memmove(), as reported by
> > >   Eric Auger:
> > > 
> > >   https://issues.redhat.com/browse/RHEL-68997
> > > 
> > > * Host SVE state is discarded *after* modification by ptrace, which was an
> > >   unintentional ptrace ABI change introduced with lazy discarding of SVE state.
> > > 
> > > * The host FPMR value can be discarded when running a non-protected VM,
> > >   where FPMR support is not exposed to a VM, and that VM uses
> > >   FPSIMD/SVE. In these cases the hyp code does not save the host's FPMR
> > >   before unbinding the host's FPSIMD/SVE/SME state, leaving a stale
> > >   value in memory.
> > 
> > How hard would it be to write tests for these three scenarios? If we
> > had something to exercise the relevant paths then...
> >
> > > ... and so this eager save+flush probably needs to be backported to ALL
> > > stable trees.
> > 
> > ... this backporting might be a little easier to be sure about?
> 
> For the first case I have a quick and dirty test, which I've pushed to
> my arm64/kvm/fpsimd-tests branch in my kernel.org repo:
> 
>   https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/
>   git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git
> 
> For the last case it should be possible to do something similar, but I
> hadn't had the time to dig in to the KVM selftests infrastructure and
> figure out how to confiugre the guest appropriately.
> 
> For the ptrace case, the same symptoms can be provoked outside of KVM
> (and I'm currently working to fix that). From my PoV the important thing
> is that this fix happens to remove KVM from the set of cases the other
> fixes need to care about.
> 
> FWIW I was assuming that I'd be handling the upstream backports, and I'd
> be testing with the test above and some additional assertions hacked
> into the kernel for testing.

I agree that having the tests around would be great, if only to catch
potential repressions.

However, I really don't want to gate the fixes on these tests. So
unless someone shouts, I intend to take this series in very shortly.
We can always merge the tests as a subsequent improvement.

Thanks,

	M.
Will Deacon Feb. 10, 2025, 3:05 p.m. UTC | #4
On Fri, Feb 07, 2025 at 01:21:44PM +0000, Mark Rutland wrote:
> On Fri, Feb 07, 2025 at 12:27:51PM +0000, Will Deacon wrote:
> > On Thu, Feb 06, 2025 at 02:10:55PM +0000, Mark Rutland wrote:
> > > There are several problems with the way hyp code lazily saves the host's
> > > FPSIMD/SVE state, including:
> > > 
> > > * Host SVE being discarded unexpectedly due to inconsistent
> > >   configuration of TIF_SVE and CPACR_ELx.ZEN. This has been seen to
> > >   result in QEMU crashes where SVE is used by memmove(), as reported by
> > >   Eric Auger:
> > > 
> > >   https://issues.redhat.com/browse/RHEL-68997
> > > 
> > > * Host SVE state is discarded *after* modification by ptrace, which was an
> > >   unintentional ptrace ABI change introduced with lazy discarding of SVE state.
> > > 
> > > * The host FPMR value can be discarded when running a non-protected VM,
> > >   where FPMR support is not exposed to a VM, and that VM uses
> > >   FPSIMD/SVE. In these cases the hyp code does not save the host's FPMR
> > >   before unbinding the host's FPSIMD/SVE/SME state, leaving a stale
> > >   value in memory.
> > 
> > How hard would it be to write tests for these three scenarios? If we
> > had something to exercise the relevant paths then...
> >
> > > ... and so this eager save+flush probably needs to be backported to ALL
> > > stable trees.
> > 
> > ... this backporting might be a little easier to be sure about?
> 
> For the first case I have a quick and dirty test, which I've pushed to
> my arm64/kvm/fpsimd-tests branch in my kernel.org repo:
> 
>   https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/
>   git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git
> 
> For the last case it should be possible to do something similar, but I
> hadn't had the time to dig in to the KVM selftests infrastructure and
> figure out how to confiugre the guest appropriately.
> 
> For the ptrace case, the same symptoms can be provoked outside of KVM
> (and I'm currently working to fix that). From my PoV the important thing
> is that this fix happens to remove KVM from the set of cases the other
> fixes need to care about.
> 
> FWIW I was assuming that I'd be handling the upstream backports, and I'd
> be testing with the test above and some additional assertions hacked
> into the kernel for testing.

The backporting for Android will probably diverge slightly from upstream,
so if I'm able to run your tests as well then it would really handy.
Thanks for sharing the stuff you currently have.

The patch looks fine:

Acked-by: Will Deacon <will@kernel.org>

Will
diff mbox series

Patch

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 2b601d88762d4..8370d55f03533 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -1694,31 +1694,6 @@  void fpsimd_signal_preserve_current_state(void)
 		sve_to_fpsimd(current);
 }
 
-/*
- * Called by KVM when entering the guest.
- */
-void fpsimd_kvm_prepare(void)
-{
-	if (!system_supports_sve())
-		return;
-
-	/*
-	 * KVM does not save host SVE state since we can only enter
-	 * the guest from a syscall so the ABI means that only the
-	 * non-saved SVE state needs to be saved.  If we have left
-	 * SVE enabled for performance reasons then update the task
-	 * state to be FPSIMD only.
-	 */
-	get_cpu_fpsimd_context();
-
-	if (test_and_clear_thread_flag(TIF_SVE)) {
-		sve_to_fpsimd(current);
-		current->thread.fp_type = FP_STATE_FPSIMD;
-	}
-
-	put_cpu_fpsimd_context();
-}
-
 /*
  * Associate current's FPSIMD context with this cpu
  * The caller must have ownership of the cpu FPSIMD context before calling
diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
index 4d3d1a2eb1570..ceeb0a4893aa7 100644
--- a/arch/arm64/kvm/fpsimd.c
+++ b/arch/arm64/kvm/fpsimd.c
@@ -54,16 +54,18 @@  void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu)
 	if (!system_supports_fpsimd())
 		return;
 
-	fpsimd_kvm_prepare();
-
 	/*
-	 * We will check TIF_FOREIGN_FPSTATE just before entering the
-	 * guest in kvm_arch_vcpu_ctxflush_fp() and override this to
-	 * FP_STATE_FREE if the flag set.
+	 * Ensure that any host FPSIMD/SVE/SME state is saved and unbound such
+	 * that the host kernel is responsible for restoring this state upon
+	 * return to userspace, and the hyp code doesn't need to save anything.
+	 *
+	 * When the host may use SME, fpsimd_save_and_flush_cpu_state() ensures
+	 * that PSTATE.{SM,ZA} == {0,0}.
 	 */
-	*host_data_ptr(fp_owner) = FP_STATE_HOST_OWNED;
-	*host_data_ptr(fpsimd_state) = kern_hyp_va(&current->thread.uw.fpsimd_state);
-	*host_data_ptr(fpmr_ptr) = kern_hyp_va(&current->thread.uw.fpmr);
+	fpsimd_save_and_flush_cpu_state();
+	*host_data_ptr(fp_owner) = FP_STATE_FREE;
+	*host_data_ptr(fpsimd_state) = NULL;
+	*host_data_ptr(fpmr_ptr) = NULL;
 
 	host_data_clear_flag(HOST_SVE_ENABLED);
 	if (read_sysreg(cpacr_el1) & CPACR_EL1_ZEN_EL0EN)
@@ -73,23 +75,6 @@  void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu)
 		host_data_clear_flag(HOST_SME_ENABLED);
 		if (read_sysreg(cpacr_el1) & CPACR_EL1_SMEN_EL0EN)
 			host_data_set_flag(HOST_SME_ENABLED);
-
-		/*
-		 * If PSTATE.SM is enabled then save any pending FP
-		 * state and disable PSTATE.SM. If we leave PSTATE.SM
-		 * enabled and the guest does not enable SME via
-		 * CPACR_EL1.SMEN then operations that should be valid
-		 * may generate SME traps from EL1 to EL1 which we
-		 * can't intercept and which would confuse the guest.
-		 *
-		 * Do the same for PSTATE.ZA in the case where there
-		 * is state in the registers which has not already
-		 * been saved, this is very unlikely to happen.
-		 */
-		if (read_sysreg_s(SYS_SVCR) & (SVCR_SM_MASK | SVCR_ZA_MASK)) {
-			*host_data_ptr(fp_owner) = FP_STATE_FREE;
-			fpsimd_save_and_flush_cpu_state();
-		}
 	}
 
 	/*