diff mbox series

[02/19] x86/fpu: Prepare KVM for dynamically enabled states

Message ID 20211208000359.2853257-3-yang.zhong@intel.com (mailing list archive)
State New, archived
Headers show
Series AMX Support in KVM | expand

Commit Message

Yang Zhong Dec. 8, 2021, 12:03 a.m. UTC
From: Thomas Gleixner <tglx@linutronix.de>

Add more fields for tracking per-vCPU permissions for dynamic XSTATE
components:

  - user_xfeatures

    Track which features are currently enabled for the vCPU

  - user_perm

    Copied from guest_perm of the group leader thread. The first
    vCPU which does the copy locks the guest_perm

  - realloc_request

    KVM sets this field to request dynamically-enabled features
    which require reallocation of @fpstate

Initialize those fields properly.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
---
 arch/x86/include/asm/fpu/types.h | 23 +++++++++++++++++++++++
 arch/x86/kernel/fpu/core.c       | 26 +++++++++++++++++++++++++-
 2 files changed, 48 insertions(+), 1 deletion(-)

Comments

Paolo Bonzini Dec. 13, 2021, 9:12 a.m. UTC | #1
On 12/8/21 01:03, Yang Zhong wrote:
>    - user_xfeatures
> 
>      Track which features are currently enabled for the vCPU

Please rename to alloc_xfeatures

>    - user_perm
> 
>      Copied from guest_perm of the group leader thread. The first
>      vCPU which does the copy locks the guest_perm

Please rename to perm_xfeatures.

>    - realloc_request
> 
>      KVM sets this field to request dynamically-enabled features
>      which require reallocation of @fpstate

This field should be in vcpu->arch, and there is no need for 
fpu_guest_realloc_fpstate.  Rename __xfd_enable_feature to 
fpu_enable_xfd_feature and add it to the public API, then just do

	if (unlikely(vcpu->arch.xfd_realloc_request)) {
		u64 request = vcpu->arch.xfd_realloc_request;
		ret = fpu_enable_xfd(request, enter_guest);
	}

to kvm_put_guest_fpu.

Paolo
Thomas Gleixner Dec. 13, 2021, noon UTC | #2
On Mon, Dec 13 2021 at 10:12, Paolo Bonzini wrote:
> On 12/8/21 01:03, Yang Zhong wrote:
>>    - user_xfeatures
>> 
>>      Track which features are currently enabled for the vCPU
>
> Please rename to alloc_xfeatures

That name makes no sense at all. This has nothing to do with alloc.

>>    - user_perm
>> 
>>      Copied from guest_perm of the group leader thread. The first
>>      vCPU which does the copy locks the guest_perm
>
> Please rename to perm_xfeatures.

All of that is following the naming conventions in the FPU code related
to permissions etc.

>>    - realloc_request
>> 
>>      KVM sets this field to request dynamically-enabled features
>>      which require reallocation of @fpstate
>
> This field should be in vcpu->arch, and there is no need for 
> fpu_guest_realloc_fpstate.  Rename __xfd_enable_feature to 
> fpu_enable_xfd_feature and add it to the public API, then just do
>
> 	if (unlikely(vcpu->arch.xfd_realloc_request)) {
> 		u64 request = vcpu->arch.xfd_realloc_request;
> 		ret = fpu_enable_xfd(request, enter_guest);
> 	}
>
> to kvm_put_guest_fpu.

Why? Yet another export of FPU internals just because?

Also what clears the reallocation request and what is the @enter_guest
argument supposed to help with?

I have no idea what you are trying to achieve.

Thanks,

        tglx
Paolo Bonzini Dec. 13, 2021, 12:45 p.m. UTC | #3
On 12/13/21 13:00, Thomas Gleixner wrote:
> On Mon, Dec 13 2021 at 10:12, Paolo Bonzini wrote:
>> On 12/8/21 01:03, Yang Zhong wrote:
>>>     - user_xfeatures
>>>
>>>       Track which features are currently enabled for the vCPU
>>
>> Please rename to alloc_xfeatures
> 
> That name makes no sense at all. This has nothing to do with alloc.

Isn't that the features for which space is currently allocated?

fpstate_realloc does

+	if (guest_fpu) {
+		newfps->is_guest = true;
+		newfps->is_confidential = curfps->is_confidential;
+		guest_fpu->user_xfeatures |= xfeatures;
+	}
+

and kvm_check_guest_realloc_fpstate does


+		if ((guest_fpu->user_xfeatures & request) != request) {
+			vcpu->arch.guest_fpu.realloc_request |= request;
+			return true;
+		}

Reading "user_xfeatures" in there is cryptic, it seems like it's 
something related to the userspace thread or group that has invoked the 
KVM ioctl.  If it's renamed to alloc_xfeatures, then this:

+		missing = request & ~guest_fpu->alloc_xfeatures;
+		if (missing) {
+			vcpu->arch.guest_fpu.realloc_request |= missing;
+			return true;
+		}

makes it obvious that the allocation is for features that are requested 
but haven't been allocated in the xstate yet.

>>>     - user_perm
>>>
>>>       Copied from guest_perm of the group leader thread. The first
>>>       vCPU which does the copy locks the guest_perm
>>
>> Please rename to perm_xfeatures.
> 
> All of that is following the naming conventions in the FPU code related
> to permissions etc.

perm or guest_perm is just fine as well; but user_perm is not (there's 
no preexisting reference to user_perm in the code, in fact, as far as I 
can see).

>>>     - realloc_request
>>>
>>>       KVM sets this field to request dynamically-enabled features
>>>       which require reallocation of @fpstate
>>
>> This field should be in vcpu->arch, and there is no need for
>> fpu_guest_realloc_fpstate.  Rename __xfd_enable_feature to
>> fpu_enable_xfd_feature and add it to the public API, then just do
>>
>> 	if (unlikely(vcpu->arch.xfd_realloc_request)) {
>> 		u64 request = vcpu->arch.xfd_realloc_request;
>> 		ret = fpu_enable_xfd(request, enter_guest);
>> 	}
>>
>> to kvm_put_guest_fpu.
> 
> Why? Yet another export of FPU internals just because?

It's one function more and one field less.  I prefer another export of 
FPU internals, to a write to a random field with undocumented invariants.

For example, why WARN_ON_ONCE if enter_guest == true?  If you enter the 
guest after the host has restored MSR_IA32_XFD with KVM_SET_MSR, the 
WARN_ON_ONCE can actually fire.  It would be just fine to skip the 
reallocation until enter_guest == false, which is you actually XSAVE 
into the guest_fpu.

As an aside, realloc_request (if it survives, see below) shouldn't be 
added until patch 6.

> Also what clears the reallocation request and what is the @enter_guest
> argument supposed to help with?

Nothing---make it

  	if (unlikely(vcpu->arch.xfd_realloc_request)) {
  		u64 request = vcpu->arch.xfd_realloc_request;
  		ret = fpu_enable_xfd(request, &vcpu->arch.guest_fpu);
		if (!ret)
			vcpu->arch.xfd_realloc_request = 0;
  	}

but in fact, I'm not sure why the request has to be delayed at all.  The 
obvious implementation of a write to XFD, after all the validity checks, 
is just

	/* This function just calls xfd_enable_feature.  */
	r = fpu_guest_alloc_xfeatures(&vcpu->arch.guest_fpu,
				      vcpu->arch.xcr0 & ~xfd);
	/*
	 * An error means that userspace has screwed up by not doing
	 * arch_prctl(ARCH_REQ_XCOMP_GUEST_PERM).  If we are here coming
	 * from a ioctl fail it, if the guest has done WRMSR or XSETBV
	 * it will inject a #GP.
	 */
	if (r < 0)
		return 1;

	vcpu->arch.xfd = xfd;

with none of the kvm_check_guest_realloc_fpstate or KVM_EXIT_FPU_REALLOC 
business.  No field and a simple, self-contained external API.  The same 
code works for __kvm_set_xcr as well, just with "xcr0 & ~vcpu->arch.xfd" 
as the second argument instead.

(Maybe I'm missing why KVM_EXIT_FPU_REALLOC is needed, but I'm not 
ashamed to say that, given that new userspace API was added with 
documentation or tests.  The only comment in the code is:

+		/*
+		 * Check if fpstate reallocate is required. If yes, then
+		 * let the fpu core do reallocation and update xfd;
+		 * otherwise, update xfd here.
+		 */
+		if (kvm_check_guest_realloc_fpstate(vcpu, data)) {
+			vcpu->run->exit_reason = KVM_EXIT_FPU_REALLOC;
+			vcpu->arch.complete_userspace_io =
+				kvm_skip_emulated_instruction;
+			return KVM_MSR_RET_USERSPACE;
+		}
+

which has nothing to do with the actual content of either 
kvm_check_guest_realloc_fpstate or the "then" branch.  Userspace can 
just do ARCH_REQ_XCOMP_GUEST_PERM based on the guest CPUID, before 
KVM_SET_CPUID2, KVM_SET_MSR or KVM_SET_XCR).

Paolo
Thomas Gleixner Dec. 13, 2021, 7:50 p.m. UTC | #4
Paolo,

On Mon, Dec 13 2021 at 13:45, Paolo Bonzini wrote:
> On 12/13/21 13:00, Thomas Gleixner wrote:
>> On Mon, Dec 13 2021 at 10:12, Paolo Bonzini wrote:
>>> Please rename to alloc_xfeatures
>> 
>> That name makes no sense at all. This has nothing to do with alloc.
>
> Isn't that the features for which space is currently allocated?

It is, but from the kernel POV this is user. :)

> Reading "user_xfeatures" in there is cryptic, it seems like it's 
> something related to the userspace thread or group that has invoked the 
> KVM ioctl.  If it's renamed to alloc_xfeatures, then this:
>
> +		missing = request & ~guest_fpu->alloc_xfeatures;
> +		if (missing) {
> +			vcpu->arch.guest_fpu.realloc_request |= missing;
> +			return true;
> +		}
>
> makes it obvious that the allocation is for features that are requested 
> but haven't been allocated in the xstate yet.

Let's rename it to xfeatures and perm and be done with it.

>> Why? Yet another export of FPU internals just because?
>
> It's one function more and one field less.  I prefer another export of 
> FPU internals, to a write to a random field with undocumented
> invariants.

We want less not more exports. :)

> For example, why WARN_ON_ONCE if enter_guest == true?  If you enter the 
> guest after the host has restored MSR_IA32_XFD with KVM_SET_MSR, the

Indeed restoring a guest might require buffer reallocation, I missed
that, duh!

On restore the following components are involved:

   XCR0, XFD, XSTATE

XCR0 and XFD have to be restored _before_ XSTATE and that needs to
be enforced.

But independent of the ordering of XCR0 and XFD restore the following
check applies to both the restore and the runtime logic:

int kvm_fpu_realloc(struct kvm_vcpu *vcpu, u64 xcr0, u64 xfd)
{
   	u64 expand, enabled = xcr0 & ~xfd;

        expand = enabled & ~vcpu->arch.guest_fpu.xfeatures;
        if (!expand)
        	return 0;
        
        return fpu_enable_guest_features(&vcpu->arch.guest_fpu, expand);
}

int fpu_enable_guest_features(struct guest_fpu *gfpu, u64 which)
{
        permission_checks();
        ...
        return fpstate_realloc(.....)
}

fpstate_realloc() needs to be careful about flipping the pointers
depending on the question whether guest_fpu->fpstate is actually active,
i.e.:

        current->thread.fpu.fpstate == gfpu->fpstate

I'm halfways done with that. Will send something soonish.

Thanks,

        tglx
diff mbox series

Patch

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index 6ddf80637697..861cffca3209 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -504,6 +504,29 @@  struct fpu {
  * Guest pseudo FPU container
  */
 struct fpu_guest {
+	/*
+	 * @user_xfeatures:		xfeature bitmap of features which are
+	 *				currently enabled for the guest vCPU.
+	 */
+	u64				user_xfeatures;
+
+	/*
+	 * @user_perm:			xfeature bitmap of features which are
+	 *				permitted to be enabled for the guest
+	 *				vCPU.
+	 */
+	u64				user_perm;
+
+	/*
+	 * @realloc_request:		xfeature bitmap of features which are
+	 *				requested to be enabled dynamically
+	 *				which requires reallocation of @fpstate
+	 *
+	 *				Set by an intercept handler and
+	 *				evaluated in fpu_swap_kvm_fpstate()
+	 */
+	u64				realloc_request;
+
 	/*
 	 * @fpstate:			Pointer to the allocated guest fpstate
 	 */
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index ab19b3d8b2f7..fe592799508c 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -201,6 +201,26 @@  void fpu_reset_from_exception_fixup(void)
 #if IS_ENABLED(CONFIG_KVM)
 static void __fpstate_reset(struct fpstate *fpstate);
 
+static void fpu_init_guest_permissions(struct fpu_guest *gfpu)
+{
+	struct fpu_state_perm *fpuperm;
+	u64 perm;
+
+	if (!IS_ENABLED(CONFIG_X86_64))
+		return;
+
+	spin_lock_irq(&current->sighand->siglock);
+	fpuperm = &current->group_leader->thread.fpu.guest_perm;
+	perm = fpuperm->__state_perm;
+
+	/* First fpstate allocation locks down permissions. */
+	WRITE_ONCE(fpuperm->__state_perm, perm | FPU_GUEST_PERM_LOCKED);
+
+	spin_unlock_irq(&current->sighand->siglock);
+
+	gfpu->user_perm = perm & ~FPU_GUEST_PERM_LOCKED;
+}
+
 bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu)
 {
 	struct fpstate *fpstate;
@@ -216,7 +236,11 @@  bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu)
 	fpstate->is_valloc	= true;
 	fpstate->is_guest	= true;
 
-	gfpu->fpstate = fpstate;
+	gfpu->fpstate		= fpstate;
+	gfpu->user_xfeatures	= fpu_user_cfg.default_features;
+	gfpu->user_perm		= fpu_user_cfg.default_features;
+	fpu_init_guest_permissions(gfpu);
+
 	return true;
 }
 EXPORT_SYMBOL_GPL(fpu_alloc_guest_fpstate);