[0/6] x86/fpu: Preparatory changes for guest AMX support

Message ID 20211214022825.563892248@linutronix.de (mailing list archive)

Message

Thomas Gleixner Dec. 14, 2021, 2:50 a.m. UTC
Folks,

this is a follow up to the initial sketch of patches which got picked up by
Jing and have been posted in combination with the KVM parts:

   https://lore.kernel.org/r/20211208000359.2853257-1-yang.zhong@intel.com

This update is only touching the x86/fpu code and not changing anything on
the KVM side.

    BIG FAT WARNING: This is compile tested only!

In the course of the discussion of the above patchset it turned out that there
are a few conceptual issues vs. hardware and software state and also
vs. guest restore.

This series addresses this with the following changes vs. the original
approach:

  1) fpstate reallocation is now independent of fpu_swap_kvm_fpstate()

     It is triggered directly via XSETBV and XFD MSR write emulation which
     are used both for runtime and restore purposes.

     For this it provides two wrappers around a common update function, one
     for XCR0 and one for XFD.

     Both check the validity of the arguments and the correct sizing of the
     guest FPU fpstate. If the size is not sufficient, fpstate is
     reallocated.

     The functions can fail.

  2) XFD synchronization

     KVM must neither touch the XFD MSR nor the fpstate->xfd software state
     in order to guarantee state consistency.

     In the MSR write emulation case the XFD specific update handler has to
     be invoked. See #1

     If MSR write emulation is disabled because the buffer size is
     sufficient for all use cases, i.e.:

     		guest_fpu::xfeatures == guest_fpu::perm

     then there is no guarantee that the XFD software state on VMEXIT is
     the same as the state on VMENTER.

     A separate synchronization function is provided which reads the XFD
     MSR and updates the relevant software state. This function has to be
     invoked after a VMEXIT before reenabling interrupts.

With that the KVM logic looks like this:

     xsetbv_emulate()
	ret = fpu_update_guest_xcr0(&vcpu->arch.guest_fpu, xcr0);
	if (ret)
		handle_fail();
	....


     kvm_emulate_wrmsr()
	....
	case MSR_IA32_XFD:
		ret = fpu_update_guest_xfd(&vcpu->arch.guest_fpu, vcpu->arch.xcr0, msrval);
		if (ret)
			handle_fail();
		....

This covers both the case of a running vCPU and the case of restore.
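
The update scheme from 1) can be modelled in user space roughly as follows.
This is only a sketch under stated assumptions: struct guest_fpu_model, the
model_*() helpers and the 64-bytes-per-feature sizing rule are made up for
illustration and reflect neither the real fpstate layout nor the code in
this series.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct fpu_guest plus its fpstate buffer. */
struct guest_fpu_model {
	uint64_t xfeatures;	/* features the current buffer is sized for */
	uint64_t perm;		/* features the guest is permitted to use */
	uint64_t xfd;		/* software copy of the guest XFD value */
	size_t size;		/* size of the buffer in bytes */
	void *buf;		/* stands in for the fpstate buffer */
};

/* Made-up sizing rule: 64 bytes per feature bit. Not the real XSAVE layout. */
static size_t model_required_size(uint64_t features)
{
	size_t size = 0;

	for (; features; features &= features - 1)
		size += 64;
	return size;
}

/* Common update function: validate, then grow the buffer if needed. */
static int model_update_guest(struct guest_fpu_model *g, uint64_t xcr0,
			      uint64_t xfd)
{
	uint64_t requested, expand;
	size_t need;
	void *newbuf;

	assert(g != NULL);

	/* Only permitted features may be enabled in XCR0. */
	if (xcr0 & ~g->perm)
		return -EPERM;

	/* Features enabled in XCR0 and not disabled via XFD need buffer space. */
	requested = xcr0 & ~xfd;
	expand = requested & ~g->xfeatures;
	if (expand) {
		need = model_required_size(g->xfeatures | expand);
		newbuf = realloc(g->buf, need);
		if (!newbuf)
			return -ENOMEM;	/* reallocation can fail */
		g->buf = newbuf;
		g->size = need;
		g->xfeatures |= expand;
	}
	g->xfd = xfd;
	return 0;
}

/* Wrapper for XSETBV emulation: the XFD value stays as it is. */
static int model_update_guest_xcr0(struct guest_fpu_model *g, uint64_t xcr0)
{
	return model_update_guest(g, xcr0, g->xfd);
}

/* Wrapper for XFD MSR write emulation: XCR0 is passed in by the caller. */
static int model_update_guest_xfd(struct guest_fpu_model *g, uint64_t xcr0,
				  uint64_t xfd)
{
	return model_update_guest(g, xcr0, xfd);
}
```

In this model the running vCPU and the restore path go through the same two
entry points, so validation and sizing live in exactly one place.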

The XFD synchronization mechanism is only relevant for a running vCPU after
VMEXIT when XFD MSR write emulation is disabled:

     vcpu_run()
	vcpu_enter_guest()
	  for (;;) {
	      ...
	      vmenter();
	      ...
	  };
	  ...

	  if (!xfd_write_emulated(vcpu))
		fpu_sync_guest_vmexit_xfd_state();

	  local_irq_enable();

It has no relevance for the guest restore case.

With that all XFD/fpstate related issues should be covered in a consistent
way.

CPUID validation can be done without exporting yet more FPU functions:

      if (requested_xfeatures & ~vcpu->arch.guest_fpu.perm)
      		return -ENOPONY;

That's the purpose of fpu_guest::perm from the beginning along with
fpu_guest::xfeatures for other validation purposes.

XFD_ERR MSR handling is completely separate and as discussed a KVM only
issue for now. KVM has to ensure that the MSR is 0 before interrupts are
enabled. So this is not touched here.

The only remaining issue is the KVM XSTATE save/restore size checking which
probably requires some FPU core assistance. But that requires some more
thoughts vs. the IOCTL interface extension and once that is settled it
needs to be solved in one go. But that's an orthogonal issue to the above.

The series is also available from git:

   git://git.kernel.org/pub/scm/linux/kernel/git/people/tglx/devel.git x86/fpu-kvm

Thanks,

	tglx
---
 include/asm/fpu/api.h    |   63 ++++++++++++++++++++++++
 include/asm/fpu/types.h  |   22 ++++++++
 include/uapi/asm/prctl.h |   26 +++++----
 kernel/fpu/core.c        |  123 ++++++++++++++++++++++++++++++++++++++++++++---
 kernel/fpu/xstate.c      |  118 +++++++++++++++++++++++++++------------------
 kernel/fpu/xstate.h      |   20 ++++++-
 kernel/process.c         |    2 
 7 files changed, 307 insertions(+), 67 deletions(-)

Comments

Tian, Kevin Dec. 14, 2021, 6:50 a.m. UTC | #1
> From: Thomas Gleixner <tglx@linutronix.de>
> Sent: Tuesday, December 14, 2021 10:50 AM
> 
> Folks,
> 
> this is a follow up to the initial sketch of patches which got picked up by
> Jing and have been posted in combination with the KVM parts:
> 
>    https://lore.kernel.org/r/20211208000359.2853257-1-yang.zhong@intel.com
> 
> This update is only touching the x86/fpu code and not changing anything on
> the KVM side.
> 
>     BIG FAT WARNING: This is compile tested only!
> 
> In the course of the discussion of the above patchset it turned out that there
> are a few conceptual issues vs. hardware and software state and also
> vs. guest restore.

Overall this is definitely a good move and also helps simplify the
KVM side logic.
Liu, Jing2 Dec. 14, 2021, 6:52 a.m. UTC | #2
Hi Thomas,

On 12/14/2021 10:50 AM, Thomas Gleixner wrote:
> Folks,
>
> this is a follow up to the initial sketch of patches which got picked up by
> Jing and have been posted in combination with the KVM parts:
>
>     https://lore.kernel.org/r/20211208000359.2853257-1-yang.zhong@intel.com
>
> This update is only touching the x86/fpu code and not changing anything on
> the KVM side.
>
>      BIG FAT WARNING: This is compile tested only!
>
> In the course of the discussion of the above patchset it turned out that there
> are a few conceptual issues vs. hardware and software state and also
> vs. guest restore.
>
> This series addresses this with the following changes vs. the original
> approach:
>
>    1) fpstate reallocation is now independent of fpu_swap_kvm_fpstate()
>
>       It is triggered directly via XSETBV and XFD MSR write emulation which
>       are used both for runtime and restore purposes.
>
>       For this it provides two wrappers around a common update function, one
>       for XCR0 and one for XFD.
>
>       Both check the validity of the arguments and the correct sizing of the
>       guest FPU fpstate. If the size is not sufficient, fpstate is
>       reallocated.
>
>       The functions can fail.
>
>    2) XFD synchronization
>
>       KVM must neither touch the XFD MSR nor the fpstate->xfd software state
>       in order to guarantee state consistency.
>
>       In the MSR write emulation case the XFD specific update handler has to
>       be invoked. See #1
>
>       If MSR write emulation is disabled because the buffer size is
>       sufficient for all use cases, i.e.:
>
>       		guest_fpu::xfeatures == guest_fpu::perm
>
The buffer size can already be sufficient once one of the features is
requested, since the kernel FPU code reallocates to the full (permitted)
size. And I think we don't want to disable interception until all the
features have been detected, e.g. one by one.

Thus it can be guest_fpu::xfeatures != guest_fpu::perm.


Thanks,
Jing
Tian, Kevin Dec. 14, 2021, 7:54 a.m. UTC | #3
> From: Liu, Jing2 <jing2.liu@linux.intel.com>
> Sent: Tuesday, December 14, 2021 2:52 PM
> 
> On 12/14/2021 10:50 AM, Thomas Gleixner wrote:
> >       If MSR write emulation is disabled because the buffer size is
> >       sufficient for all use cases, i.e.:
> >
> >       		guest_fpu::xfeatures == guest_fpu::perm
> >
> The buffer size can be sufficient once one of the features is requested
> since kernel fpu realloc full size (permitted). And I think we don't want
> to disable interception until all the features are detected e.g., one by
> one.
> 
> Thus it can be guest_fpu::xfeatures != guest_fpu::perm.
> 

There are two options to handle multiple xfd features.

a) a conservative approach as Thomas suggested, i.e. don't disable emulation
   until all the features in guest_fpu::perm are requested by the guest. This
   definitely performs poorly if the guest only wants to use a subset of the
   perm features, but from a functional point of view it just works.

   Given we only have one xfeature today, let's just use this simple check which
   has ZERO negative impact.

b) an optimized approach which dynamically enables/disables emulation, e.g.
   we can disable emulation after the 1st xfd feature is enabled and then
   reenable it in the #NM vmexit handler when XFD_ERR includes a bit which is
   not in guest_fpu::xfeatures, sort of like:

	--xfd trapped, perm has two xfd features--
	(G) access xfd_feature1;
	(H) trap #NM (XFD_ERR = xfd_feature1) and inject #NM;
	(G) WRMSR(IA32_XFD, (-1ULL) & ~xfd_feature1);
	(H) reallocate fpstate and disable write emulation for XFD;

	--xfd passed through--
	(G) do something...
	(G) access xfd_feature2;
	(H) trap #NM (XFD_ERR = xfd_feature2), enable emulation, inject #NM;

	--xfd trapped--
	(G) WRMSR(IA32_XFD, (-1ULL) & ~(xfd_feature1 | xfd_feature2));
	(H) reallocate fpstate and disable write emulation for XFD;

	--xfd passed through--
	(G) do something...

Thanks
Kevin
Paolo Bonzini Dec. 14, 2021, 10:42 a.m. UTC | #4
On 12/14/21 03:50, Thomas Gleixner wrote:
> The only remaining issue is the KVM XSTATE save/restore size checking which
> probably requires some FPU core assistance. But that requires some more
> thoughts vs. the IOCTL interface extension and once that is settled it
> needs to be solved in one go. But that's an orthogonal issue to the above.

That's not a big deal because KVM uses the uncompacted format.  So 
KVM_CHECK_EXTENSION and KVM_GET_XSAVE can just use CPUID to retrieve the 
size and uncompacted offset of the largest bit that is set in 
kvm_supported_xcr0, while KVM_SET_XSAVE can do the same with the largest 
bit that is set in the xstate_bv.
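
The size computation described here can be sketched as a pure function. On
real hardware the per-component size and offset would come from CPUID leaf
0xD, sub-leaf i (EAX = size, EBX = offset in the uncompacted format); in this
sketch they are passed in as a table so that it stays self-contained.
struct xcomp_layout and model_uncompacted_size() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-component layout, indexed by xfeature bit number (64 entries). */
struct xcomp_layout {
	uint32_t offset;	/* offset in the uncompacted format */
	uint32_t size;		/* size of the component in bytes */
};

/* Size of an uncompacted XSAVE image which covers all bits set in mask. */
static uint32_t model_uncompacted_size(uint64_t mask,
				       const struct xcomp_layout *layout)
{
	/* The legacy region (512) and the XSAVE header (64) are always there. */
	uint32_t size = 512 + 64;
	int bit;

	assert(layout != NULL);

	/* Bits 0 and 1 (x87, SSE) live in the legacy region: start at bit 63. */
	for (bit = 63; bit >= 2; bit--) {
		if (mask & (1ULL << bit)) {
			size = layout[bit].offset + layout[bit].size;
			break;
		}
	}
	return size;
}
```

With only x87/SSE set the result is the fixed 576 bytes; with AVX (bit 2,
offset 576, size 256) it becomes 832.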

Paolo



> The series is also available from git:
> 
>     git://git.kernel.org/pub/scm/linux/kernel/git/people/tglx/devel.git x86/fpu-kvm
Thomas Gleixner Dec. 14, 2021, 1:24 p.m. UTC | #5
On Tue, Dec 14 2021 at 11:42, Paolo Bonzini wrote:
> On 12/14/21 03:50, Thomas Gleixner wrote:
>> The only remaining issue is the KVM XSTATE save/restore size checking which
>> probably requires some FPU core assistance. But that requires some more
>> thoughts vs. the IOCTL interface extension and once that is settled it
>> needs to be solved in one go. But that's an orthogonal issue to the above.
>
> That's not a big deal because KVM uses the uncompacted format.  So 
> KVM_CHECK_EXTENSION and KVM_GET_XSAVE can just use CPUID to retrieve the 
> size and uncompacted offset of the largest bit that is set in 
> kvm_supported_xcr0, while KVM_SET_XSAVE can do the same with the largest 
> bit that is set in the xstate_bv.

For simplicity you can just get that information from guest_fpu. See
below.

Thanks,

        tglx
---
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -518,6 +518,11 @@ struct fpu_guest {
 	u64				perm;
 
 	/*
+	 * @uabi_size:			Size required for save/restore
+	 */
+	unsigned int			uabi_size;
+
+	/*
 	 * @fpstate:			Pointer to the allocated guest fpstate
 	 */
 	struct fpstate			*fpstate;
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -240,6 +240,7 @@ bool fpu_alloc_guest_fpstate(struct fpu_
 	gfpu->fpstate		= fpstate;
 	gfpu->xfeatures		= fpu_user_cfg.default_features;
 	gfpu->perm		= fpu_user_cfg.default_features;
+	gfpu->uabi_size		= fpu_user_cfg.default_size;
 	fpu_init_guest_permissions(gfpu);
 
 	return true;
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1545,6 +1545,7 @@ static int fpstate_realloc(u64 xfeatures
 		newfps->is_confidential = curfps->is_confidential;
 		newfps->in_use = curfps->in_use;
 		guest_fpu->xfeatures |= xfeatures;
+		guest_fpu->uabi_size = usize;
 	}
 
 	fpregs_lock();