diff mbox series

KVM: arm64/sve: Ensure SVE is trapped after guest exit

Message ID 20250121100026.3974971-1-mark.rutland@arm.com (mailing list archive)
State New
Headers show
Series KVM: arm64/sve: Ensure SVE is trapped after guest exit | expand

Commit Message

Mark Rutland Jan. 21, 2025, 10 a.m. UTC
There is a period of time after returning from a KVM_RUN ioctl where
userspace may use SVE without trapping, but the kernel can unexpectedly
discard the live SVE state. Eric Auger has observed this causing QEMU
crashes where SVE is used by memmove():

  https://issues.redhat.com/browse/RHEL-68997

The only state discarded is the user SVE state of the task which issued
the KVM_RUN ioctl. Other tasks are unaffected, plain FPSIMD state is
unaffected, and kernel state is unaffected.

This happens because fpsimd_kvm_prepare() incorrectly manipulates the
FPSIMD/SVE state. When the vCPU is loaded, fpsimd_kvm_prepare()
unconditionally clears TIF_SVE but does not reconfigure CPACR_EL1.ZEN to
trap userspace SVE usage. If the vCPU does not use FPSIMD/SVE and hyp
does not save the host's FPSIMD/SVE state, the kernel may return to
userspace with TIF_SVE clear while SVE is still enabled in
CPACR_EL1.ZEN. Subsequent userspace usage of SVE will not be trapped,
and the next save of userspace FPSIMD/SVE state will only store the
FPSIMD portion due to TIF_SVE being clear, discarding any SVE state.

The broken logic was originally introduced in commit:

  93ae6b01bafee8fa ("KVM: arm64: Discard any SVE state when entering KVM guests")

... though at the time fp_user_discard() would reconfigure CPACR_EL1.ZEN
to trap subsequent SVE usage, masking the issue until that logic was
removed in commit:

  8c845e2731041f0f ("arm64/sve: Leave SVE enabled on syscall if we don't context switch")

Avoid this issue by reconfiguring CPACR_EL1.ZEN when clearing
TIF_SVE. At the same time, add a comment to explain why
current->thread.fp_type must be set even though the FPSIMD state is not
foreign. A similar issue exists when SME is enabled, and will require
further rework. As SME currently depends on BROKEN, a BUILD_BUG() and
comment are added for now, and this issue will need to be fixed properly
in a follow-up patch.

Commit 93ae6b01bafee8fa also introduced an unintended ptrace ABI change.
Unconditionally clearing TIF_SVE regardless of whether the state is
foreign discards saved SVE state created by ptrace after syscall entry.
Avoid this by only clearing TIF_SVE when the FPSIMD/SVE state is not
foreign. When the state is foreign, KVM hyp code does not need to save
any host state, and so this will not affect KVM.

There appear to be further issues with unintentional SVE state
discarding, largely impacting ptrace and signal handling, which will
need to be addressed in separate patches.

Reported-by: Eric Auger <eauger@redhat.com>
Reported-by: Wilco Dijkstra <wilco.dijkstra@arm.com>
Cc: stable@vger.kernel.org
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
---
 arch/arm64/kernel/fpsimd.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

I believe there are some other issues in this area, but I'm sending this
out on its own because I beleive the other issues are more complex while
this is self-contained, and people are actively hitting this case in
production.

I intend to follow-up with fixes for the other cases I mention in the
commit message, and for the SME case with the BUILD_BUG_ON().

Mark.

Comments

Marc Zyngier Jan. 21, 2025, 11:20 a.m. UTC | #1
Hi Mark,

On Tue, 21 Jan 2025 10:00:26 +0000,
Mark Rutland <mark.rutland@arm.com> wrote:
> 
> There is a period of time after returning from a KVM_RUN ioctl where
> userspace may use SVE without trapping, but the kernel can unexpectedly
> discard the live SVE state. Eric Auger has observed this causing QEMU
> crashes where SVE is used by memmove():
> 
>   https://issues.redhat.com/browse/RHEL-68997
> 
> The only state discarded is the user SVE state of the task which issued
> the KVM_RUN ioctl. Other tasks are unaffected, plain FPSIMD state is
> unaffected, and kernel state is unaffected.
> 
> This happens because fpsimd_kvm_prepare() incorrectly manipulates the
> FPSIMD/SVE state. When the vCPU is loaded, fpsimd_kvm_prepare()
> unconditionally clears TIF_SVE but does not reconfigure CPACR_EL1.ZEN to
> trap userspace SVE usage. If the vCPU does not use FPSIMD/SVE and hyp
> does not save the host's FPSIMD/SVE state, the kernel may return to
> userspace with TIF_SVE clear while SVE is still enabled in
> CPACR_EL1.ZEN. Subsequent userspace usage of SVE will not be trapped,
> and the next save of userspace FPSIMD/SVE state will only store the
> FPSIMD portion due to TIF_SVE being clear, discarding any SVE state.
> 
> The broken logic was originally introduced in commit:
> 
>   93ae6b01bafee8fa ("KVM: arm64: Discard any SVE state when entering KVM guests")
> 
> ... though at the time fp_user_discard() would reconfigure CPACR_EL1.ZEN
> to trap subsequent SVE usage, masking the issue until that logic was
> removed in commit:
> 
>   8c845e2731041f0f ("arm64/sve: Leave SVE enabled on syscall if we don't context switch")
> 
> Avoid this issue by reconfiguring CPACR_EL1.ZEN when clearing
> TIF_SVE. At the same time, add a comment to explain why
> current->thread.fp_type must be set even though the FPSIMD state is not
> foreign. A similar issue exists when SME is enabled, and will require
> further rework. As SME currently depends on BROKEN, a BUILD_BUG() and
> comment are added for now, and this issue will need to be fixed properly
> in a follow-up patch.
> 
> Commit 93ae6b01bafee8fa also introduced an unintended ptrace ABI change.
> Unconditionally clearing TIF_SVE regardless of whether the state is
> foreign discards saved SVE state created by ptrace after syscall entry.
> Avoid this by only clearing TIF_SVE when the FPSIMD/SVE state is not
> foreign. When the state is foreign, KVM hyp code does not need to save
> any host state, and so this will not affect KVM.
> 
> There appear to be further issues with unintentional SVE state
> discarding, largely impacting ptrace and signal handling, which will
> need to be addressed in separate patches.
> 
> Reported-by: Eric Auger <eauger@redhat.com>
> Reported-by: Wilco Dijkstra <wilco.dijkstra@arm.com>
> Cc: stable@vger.kernel.org
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Florian Weimer <fweimer@redhat.com>
> Cc: Jeremy Linton <jeremy.linton@arm.com>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Mark Brown <broonie@kernel.org>
> Cc: Oliver Upton <oliver.upton@linux.dev>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Will Deacon <will@kernel.org>
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> ---
>  arch/arm64/kernel/fpsimd.c | 20 ++++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
> 
> I believe there are some other issues in this area, but I'm sending this
> out on its own because I beleive the other issues are more complex while
> this is self-contained, and people are actively hitting this case in
> production.
> 
> I intend to follow-up with fixes for the other cases I mention in the
> commit message, and for the SME case with the BUILD_BUG_ON().
> 
> Mark.
> 
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index 8c4c1a2186cc5..e4053a90ed240 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -1711,8 +1711,24 @@ void fpsimd_kvm_prepare(void)
>  	 */
>  	get_cpu_fpsimd_context();
>  
> -	if (test_and_clear_thread_flag(TIF_SVE)) {
> -		sve_to_fpsimd(current);
> +	if (!test_thread_flag(TIF_FOREIGN_FPSTATE) &&
> +	    test_and_clear_thread_flag(TIF_SVE)) {
> +		sve_user_disable();

I'm pretty happy with this fix. However...

> +
> +		/*
> +		 * The KVM hyp code doesn't set fp_type when saving the host's
> +		 * FPSIMD state. Set fp_type here in case the hyp code saves
> +		 * the host state.

Should KVM do that? The comment seems to indicate that this is
papering over yet another bug...

> +		 *
> +		 * If hyp code does not save the host state, then the host
> +		 * state remains live on the CPU and saved fp_type is
> +		 * irrelevant until it is overwritten by a later call to
> +		 * fpsimd_save_user_state().

I'm not sure I understand this. If fp_type is irrelevant, surely it is
*forever* irrelevant, not until something else happens. Or am I
missing something?

> +		 *
> +		 * This is *NOT* sufficient when CONFIG_ARM64_SME=y, where
> +		 * fp_type can be FP_STATE_SVE regardless of TIF_SVE.
> +		 */
> +		BUILD_BUG_ON(IS_ENABLED(CONFIG_ARM64_SME));

I'd rather not have this build-time failure, as this is very likely to
annoy a lot of people. Instead, just make SME unselectable with KVM:

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 100570a048c5e..88bedf95a3662 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2271,6 +2271,7 @@ config ARM64_SME
 	default y
 	depends on ARM64_SVE
 	depends on BROKEN
+	depends on !KVM
 	help
 	  The Scalable Matrix Extension (SME) is an extension to the AArch64
 	  execution state which utilises a substantial subset of the SVE

Thanks,

	M.
Mark Brown Jan. 21, 2025, 1:33 p.m. UTC | #2
On Tue, Jan 21, 2025 at 11:20:04AM +0000, Marc Zyngier wrote:
> Mark Rutland <mark.rutland@arm.com> wrote:

> > +		/*
> > +		 * The KVM hyp code doesn't set fp_type when saving the host's
> > +		 * FPSIMD state. Set fp_type here in case the hyp code saves
> > +		 * the host state.

> Should KVM do that? The comment seems to indicate that this is
> papering over yet another bug...

It should, we need to record what format the data we saved was in so
that if the saved state is loaded we load what we actually saved.

> > +		 * If hyp code does not save the host state, then the host
> > +		 * state remains live on the CPU and saved fp_type is
> > +		 * irrelevant until it is overwritten by a later call to
> > +		 * fpsimd_save_user_state().

> I'm not sure I understand this. If fp_type is irrelevant, surely it is
> *forever* irrelevant, not until something else happens. Or am I
> missing something?

The confusion stems from the fact that the setting of fp_type is done
separately to the actual state save since we don't map fp_type to the
hypervisor.  Either we don't save in KVM, in which case any later save
in the host kernel will write both state and fp_type, or we save in KVM
in which case this results in the correct fp_type being set.  In both
cases we have live state in the registers and whatever is currently in
memory should be considered stale and it's not the end of the world if
it's corrupt.  It would be a lot more straightforward if we wrote
fp_type from the hypervisor as part of the save.

I want to rework the way we represent the state so that we don't have
the state spread around like this, avoiding the need to map individual
parts of it separately to the hypervisor.  We need something like that
to sensibly implement in kernel SVE/SME which we will need at some
point.
Mark Brown Jan. 21, 2025, 1:59 p.m. UTC | #3
On Tue, Jan 21, 2025 at 10:00:26AM +0000, Mark Rutland wrote:
> There is a period of time after returning from a KVM_RUN ioctl where
> userspace may use SVE without trapping, but the kernel can unexpectedly
> discard the live SVE state. Eric Auger has observed this causing QEMU
> crashes where SVE is used by memmove():

Reviewed-by: Mark Brown <broonie@kernel.org>
Eric Auger Jan. 21, 2025, 3:32 p.m. UTC | #4
Hi Mark,

On 1/21/25 11:00 AM, Mark Rutland wrote:
> There is a period of time after returning from a KVM_RUN ioctl where
> userspace may use SVE without trapping, but the kernel can unexpectedly
> discard the live SVE state. Eric Auger has observed this causing QEMU
> crashes where SVE is used by memmove():
> 
>   https://issues.redhat.com/browse/RHEL-68997
> 
> The only state discarded is the user SVE state of the task which issued
> the KVM_RUN ioctl. Other tasks are unaffected, plain FPSIMD state is
> unaffected, and kernel state is unaffected.
> 
> This happens because fpsimd_kvm_prepare() incorrectly manipulates the
> FPSIMD/SVE state. When the vCPU is loaded, fpsimd_kvm_prepare()
> unconditionally clears TIF_SVE but does not reconfigure CPACR_EL1.ZEN to
> trap userspace SVE usage. If the vCPU does not use FPSIMD/SVE and hyp
> does not save the host's FPSIMD/SVE state, the kernel may return to
> userspace with TIF_SVE clear while SVE is still enabled in
> CPACR_EL1.ZEN. Subsequent userspace usage of SVE will not be trapped,
> and the next save of userspace FPSIMD/SVE state will only store the
> FPSIMD portion due to TIF_SVE being clear, discarding any SVE state.
> 
> The broken logic was originally introduced in commit:
> 
>   93ae6b01bafee8fa ("KVM: arm64: Discard any SVE state when entering KVM guests")
> 
> ... though at the time fp_user_discard() would reconfigure CPACR_EL1.ZEN
> to trap subsequent SVE usage, masking the issue until that logic was
> removed in commit:
> 
>   8c845e2731041f0f ("arm64/sve: Leave SVE enabled on syscall if we don't context switch")
> 
> Avoid this issue by reconfiguring CPACR_EL1.ZEN when clearing
> TIF_SVE. At the same time, add a comment to explain why
> current->thread.fp_type must be set even though the FPSIMD state is not
> foreign. A similar issue exists when SME is enabled, and will require
> further rework. As SME currently depends on BROKEN, a BUILD_BUG() and
> comment are added for now, and this issue will need to be fixed properly
> in a follow-up patch.
> 
> Commit 93ae6b01bafee8fa also introduced an unintended ptrace ABI change.
> Unconditionally clearing TIF_SVE regardless of whether the state is
> foreign discards saved SVE state created by ptrace after syscall entry.
> Avoid this by only clearing TIF_SVE when the FPSIMD/SVE state is not
> foreign. When the state is foreign, KVM hyp code does not need to save
> any host state, and so this will not affect KVM.
> 
> There appear to be further issues with unintentional SVE state
> discarding, largely impacting ptrace and signal handling, which will
> need to be addressed in separate patches.
> 
> Reported-by: Eric Auger <eauger@redhat.com>
> Reported-by: Wilco Dijkstra <wilco.dijkstra@arm.com>
> Cc: stable@vger.kernel.org
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Florian Weimer <fweimer@redhat.com>
> Cc: Jeremy Linton <jeremy.linton@arm.com>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Mark Brown <broonie@kernel.org>
> Cc: Oliver Upton <oliver.upton@linux.dev>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Will Deacon <will@kernel.org>
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Eric Auger <eric.auger@redhat.com>

Thanks!

Eric

> ---
>  arch/arm64/kernel/fpsimd.c | 20 ++++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
> 
> I believe there are some other issues in this area, but I'm sending this
> out on its own because I beleive the other issues are more complex while
> this is self-contained, and people are actively hitting this case in
> production.
> 
> I intend to follow-up with fixes for the other cases I mention in the
> commit message, and for the SME case with the BUILD_BUG_ON().
> 
> Mark.
> 
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index 8c4c1a2186cc5..e4053a90ed240 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -1711,8 +1711,24 @@ void fpsimd_kvm_prepare(void)
>  	 */
>  	get_cpu_fpsimd_context();
>  
> -	if (test_and_clear_thread_flag(TIF_SVE)) {
> -		sve_to_fpsimd(current);
> +	if (!test_thread_flag(TIF_FOREIGN_FPSTATE) &&
> +	    test_and_clear_thread_flag(TIF_SVE)) {
> +		sve_user_disable();
> +
> +		/*
> +		 * The KVM hyp code doesn't set fp_type when saving the host's
> +		 * FPSIMD state. Set fp_type here in case the hyp code saves
> +		 * the host state.
> +		 *
> +		 * If hyp code does not save the host state, then the host
> +		 * state remains live on the CPU and saved fp_type is
> +		 * irrelevant until it is overwritten by a later call to
> +		 * fpsimd_save_user_state().
> +		 *
> +		 * This is *NOT* sufficient when CONFIG_ARM64_SME=y, where
> +		 * fp_type can be FP_STATE_SVE regardless of TIF_SVE.
> +		 */
> +		BUILD_BUG_ON(IS_ENABLED(CONFIG_ARM64_SME));
>  		current->thread.fp_type = FP_STATE_FPSIMD;
>  	}
>
Mark Rutland Jan. 21, 2025, 3:37 p.m. UTC | #5
On Tue, Jan 21, 2025 at 11:20:04AM +0000, Marc Zyngier wrote:
> Hi Mark,
> 
> On Tue, 21 Jan 2025 10:00:26 +0000,
> Mark Rutland <mark.rutland@arm.com> wrote:
> > 
> > There is a period of time after returning from a KVM_RUN ioctl where
> > userspace may use SVE without trapping, but the kernel can unexpectedly
> > discard the live SVE state. Eric Auger has observed this causing QEMU
> > crashes where SVE is used by memmove():
> > 
> >   https://issues.redhat.com/browse/RHEL-68997
> > 
> > The only state discarded is the user SVE state of the task which issued
> > the KVM_RUN ioctl. Other tasks are unaffected, plain FPSIMD state is
> > unaffected, and kernel state is unaffected.
> > 
> > This happens because fpsimd_kvm_prepare() incorrectly manipulates the
> > FPSIMD/SVE state. When the vCPU is loaded, fpsimd_kvm_prepare()
> > unconditionally clears TIF_SVE but does not reconfigure CPACR_EL1.ZEN to
> > trap userspace SVE usage. If the vCPU does not use FPSIMD/SVE and hyp
> > does not save the host's FPSIMD/SVE state, the kernel may return to
> > userspace with TIF_SVE clear while SVE is still enabled in
> > CPACR_EL1.ZEN. Subsequent userspace usage of SVE will not be trapped,
> > and the next save of userspace FPSIMD/SVE state will only store the
> > FPSIMD portion due to TIF_SVE being clear, discarding any SVE state.
> > 
> > The broken logic was originally introduced in commit:
> > 
> >   93ae6b01bafee8fa ("KVM: arm64: Discard any SVE state when entering KVM guests")
> > 
> > ... though at the time fp_user_discard() would reconfigure CPACR_EL1.ZEN
> > to trap subsequent SVE usage, masking the issue until that logic was
> > removed in commit:
> > 
> >   8c845e2731041f0f ("arm64/sve: Leave SVE enabled on syscall if we don't context switch")
> > 
> > Avoid this issue by reconfiguring CPACR_EL1.ZEN when clearing
> > TIF_SVE. At the same time, add a comment to explain why
> > current->thread.fp_type must be set even though the FPSIMD state is not
> > foreign. A similar issue exists when SME is enabled, and will require
> > further rework. As SME currently depends on BROKEN, a BUILD_BUG() and
> > comment are added for now, and this issue will need to be fixed properly
> > in a follow-up patch.
> > 
> > Commit 93ae6b01bafee8fa also introduced an unintended ptrace ABI change.
> > Unconditionally clearing TIF_SVE regardless of whether the state is
> > foreign discards saved SVE state created by ptrace after syscall entry.
> > Avoid this by only clearing TIF_SVE when the FPSIMD/SVE state is not
> > foreign. When the state is foreign, KVM hyp code does not need to save
> > any host state, and so this will not affect KVM.
> > 
> > There appear to be further issues with unintentional SVE state
> > discarding, largely impacting ptrace and signal handling, which will
> > need to be addressed in separate patches.
> > 
> > Reported-by: Eric Auger <eauger@redhat.com>
> > Reported-by: Wilco Dijkstra <wilco.dijkstra@arm.com>
> > Cc: stable@vger.kernel.org
> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Florian Weimer <fweimer@redhat.com>
> > Cc: Jeremy Linton <jeremy.linton@arm.com>
> > Cc: Marc Zyngier <maz@kernel.org>
> > Cc: Mark Brown <broonie@kernel.org>
> > Cc: Oliver Upton <oliver.upton@linux.dev>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Will Deacon <will@kernel.org>
> > Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> > ---
> >  arch/arm64/kernel/fpsimd.c | 20 ++++++++++++++++++--
> >  1 file changed, 18 insertions(+), 2 deletions(-)
> > 
> > I believe there are some other issues in this area, but I'm sending this
> > out on its own because I beleive the other issues are more complex while
> > this is self-contained, and people are actively hitting this case in
> > production.
> > 
> > I intend to follow-up with fixes for the other cases I mention in the
> > commit message, and for the SME case with the BUILD_BUG_ON().
> > 
> > Mark.
> > 
> > diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> > index 8c4c1a2186cc5..e4053a90ed240 100644
> > --- a/arch/arm64/kernel/fpsimd.c
> > +++ b/arch/arm64/kernel/fpsimd.c
> > @@ -1711,8 +1711,24 @@ void fpsimd_kvm_prepare(void)
> >  	 */
> >  	get_cpu_fpsimd_context();
> >  
> > -	if (test_and_clear_thread_flag(TIF_SVE)) {
> > -		sve_to_fpsimd(current);
> > +	if (!test_thread_flag(TIF_FOREIGN_FPSTATE) &&
> > +	    test_and_clear_thread_flag(TIF_SVE)) {
> > +		sve_user_disable();
> 
> I'm pretty happy with this fix. However...
> 
> > +
> > +		/*
> > +		 * The KVM hyp code doesn't set fp_type when saving the host's
> > +		 * FPSIMD state. Set fp_type here in case the hyp code saves
> > +		 * the host state.
> 
> Should KVM do that? The comment seems to indicate that this is
> papering over yet another bug...

Yes; really this should be done at hyp (and at that point, hyp could
actually save the entire host SVE state), but that's a larger change and
more painful for backporting, which is why I didn't go that route. I'm
happy to go try to fix hyp to do that, or I can make the comment more
explicit that this is a bodge, if that's all you're after?

Alternatively, we could take the large hammer approach and always save
and unbind the host state prior to entering the guest, so that hyp
doesn't need to save anything. An unconditional call to
fpsimd_save_and_flush_cpu_state() would suffice, and that'd also
implicitly fix the SME issue below.

> > +		 *
> > +		 * If hyp code does not save the host state, then the host
> > +		 * state remains live on the CPU and saved fp_type is
> > +		 * irrelevant until it is overwritten by a later call to
> > +		 * fpsimd_save_user_state().
> 
> I'm not sure I understand this. If fp_type is irrelevant, surely it is
> *forever* irrelevant, not until something else happens. Or am I
> missing something?

Sorry, this was not very clear.

What this is trying to say is that *while the state is live on a CPU*
fp_type is irrelevant, and it's only meaningful when saving/restoring
state. As above, the only reason to set it here is so that *if* hyp
saves and unbinds the state, fp_type will accurately describe what the
hyp code saved.

The key thing is that there are two possibilities:

(1) The guest doesn't use FPSIMD/SVE, and no trap is taken to save the
    host state. In this case, fp_type is not consumed before the next
    time state has to be written back to memory (the act of which will
    set fp_type).

    So in this case, setting fp_type is redundant but benign.

(2) The guest *does* use FPSIMD/SVE, and a trap is taken to hyp to save
    the host state. In this case the hyp code will save the task's
    FPSIMD state to task->thread.uw.fpsimd_state, but will not update
    task->thread.fp_type accordingly, and:

    * If fp_type happened to be FP_STATE_FPSIMD, all is good and a later
      restore will load the state saved by the hyp code.

    * If fp_type happened to be FP_STATE_SVE, a later restore will load
      stale state from task->thread.sve_state.

... does that make sense?

> > +		 *
> > +		 * This is *NOT* sufficient when CONFIG_ARM64_SME=y, where
> > +		 * fp_type can be FP_STATE_SVE regardless of TIF_SVE.
> > +		 */
> > +		BUILD_BUG_ON(IS_ENABLED(CONFIG_ARM64_SME));
> 
> I'd rather not have this build-time failure, as this is very likely to
> annoy a lot of people. Instead, just make SME unselectable with KVM:

I'm happy to change this, but FWIW I'd used BUILD_BUG() here because it
kept that associated with the comment and logic, and because we disabled
SME in commit:

  81235ae0c846e1fb4 ("arm64: Kconfig: Make SME depend on BROKEN for now)"

... which was CC'd stable, and so this *shouldn't* blow up on anything
with that commit.

So I can:

(a) Add the dependency, as you suggest.

(b) Leave that as-is.

(c) Solve this in a different way so that we don't need a BUILD_BUG() or
    dependency. e.g. fix the SME case at the same time, at the cost of
    possibly needing to do a bit more work when backporting.

... any preference?

Mark.

> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 100570a048c5e..88bedf95a3662 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -2271,6 +2271,7 @@ config ARM64_SME
>  	default y
>  	depends on ARM64_SVE
>  	depends on BROKEN
> +	depends on !KVM
>  	help
>  	  The Scalable Matrix Extension (SME) is an extension to the AArch64
>  	  execution state which utilises a substantial subset of the SVE
> 
> Thanks,
> 
> 	M.
> 
> -- 
> Without deviation from the norm, progress is not possible.
diff mbox series

Patch

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 8c4c1a2186cc5..e4053a90ed240 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -1711,8 +1711,24 @@  void fpsimd_kvm_prepare(void)
 	 */
 	get_cpu_fpsimd_context();
 
-	if (test_and_clear_thread_flag(TIF_SVE)) {
-		sve_to_fpsimd(current);
+	if (!test_thread_flag(TIF_FOREIGN_FPSTATE) &&
+	    test_and_clear_thread_flag(TIF_SVE)) {
+		sve_user_disable();
+
+		/*
+		 * The KVM hyp code doesn't set fp_type when saving the host's
+		 * FPSIMD state. Set fp_type here in case the hyp code saves
+		 * the host state.
+		 *
+		 * If hyp code does not save the host state, then the host
+		 * state remains live on the CPU and saved fp_type is
+		 * irrelevant until it is overwritten by a later call to
+		 * fpsimd_save_user_state().
+		 *
+		 * This is *NOT* sufficient when CONFIG_ARM64_SME=y, where
+		 * fp_type can be FP_STATE_SVE regardless of TIF_SVE.
+		 */
+		BUILD_BUG_ON(IS_ENABLED(CONFIG_ARM64_SME));
 		current->thread.fp_type = FP_STATE_FPSIMD;
 	}