diff mbox series

[RFC,1/2] KVM: x86: Add a new system attribute for dynamic XSTATE component

Message ID 20220823231402.7839-2-chang.seok.bae@intel.com (mailing list archive)
State New, archived
Headers show
Series KVM: x86: Add a new attribute to control dynamic XSTATE components | expand

Commit Message

Chang S. Bae Aug. 23, 2022, 11:14 p.m. UTC
== Background ==

A set of architecture-specific prctl() options offer to control dynamic
XSTATE components in VCPUs. Userspace VMMs may interact with the host using
ARCH_GET_XCOMP_GUEST_PERM and ARCH_REQ_XCOMP_GUEST_PERM.

However, they are separated from the KVM API. KVM may select features that
the host supports and advertise them through the KVM_X86_XCOMP_GUEST_SUPP
attribute.

== Problem ==

QEMU [1] queries the features through the KVM API instead of using the x86
arch_prctl() option. But it still needs to use arch_prctl() to request the
permission. Then this step may become fragile because it does not guarantee
to comply with the KVM policy.

== Solution ==

Introduce a new attribute: KVM_X86_XCOMP_GUEST_PERM, and make it available
via the KVM_GET_DEVICE_ATTR and KVM_SET_DEVICE_ATTR APIs.

The implementation needs to use the established fpu_xstate_prctl()
extension for guest permissions. Export it via a new function
xstate_req_guest_perm() that KVM may use.

[1] https://gitlab.com/qemu-project/qemu/-/commit/19db68ca68a7
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Yang Zhong <yang.zhong@intel.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 arch/x86/include/asm/fpu/api.h  |  1 +
 arch/x86/include/uapi/asm/kvm.h |  1 +
 arch/x86/kernel/fpu/xstate.c    |  6 ++++++
 arch/x86/kvm/x86.c              | 31 +++++++++++++++++++++++++++++++
 4 files changed, 39 insertions(+)

Comments

Sean Christopherson Aug. 24, 2022, 9:42 p.m. UTC | #1
On Tue, Aug 23, 2022, Chang S. Bae wrote:
> == Background ==
> 
> A set of architecture-specific prctl() options offer to control dynamic
> XSTATE components in VCPUs. Userspace VMMs may interact with the host using
> ARCH_GET_XCOMP_GUEST_PERM and ARCH_REQ_XCOMP_GUEST_PERM.
> 
> However, they are separated from the KVM API. KVM may select features that
> the host supports and advertise them through the KVM_X86_XCOMP_GUEST_SUPP
> attribute.
> 
> == Problem ==
> 
> QEMU [1] queries the features through the KVM API instead of using the x86
> arch_prctl() option. But it still needs to use arch_prctl() to request the
> permission. Then this step may become fragile because it does not guarantee
> to comply with the KVM policy.

But backdooring through KVM doesn't prevent usersepace from walking in through
the front door (arch_prctl()), i.e. this doesn't protect the kernel in any way.
KVM needs to ensure that _KVM_ doesn't screw up and let userspace use features
that KVM doesn't support.  The kernel's restrictions on using features goes on
top, i.e. KVM must behave correctly irrespective of kernel restrictions.

If QEMU wants to assert that it didn't misconfigure itself, it can assert on the
config in any number of ways, e.g. assert that ARCH_GET_XCOMP_GUEST_PERM is a
subset of KVM_X86_XCOMP_GUEST_SUPP at the end of kvm_request_xsave_components().
Chang S. Bae Aug. 24, 2022, 10:49 p.m. UTC | #2
On 8/24/2022 2:42 PM, Sean Christopherson wrote:
> On Tue, Aug 23, 2022, Chang S. Bae wrote:
>> == Background ==
>>
>> A set of architecture-specific prctl() options offer to control dynamic
>> XSTATE components in VCPUs. Userspace VMMs may interact with the host using
>> ARCH_GET_XCOMP_GUEST_PERM and ARCH_REQ_XCOMP_GUEST_PERM.
>>
>> However, they are separated from the KVM API. KVM may select features that
>> the host supports and advertise them through the KVM_X86_XCOMP_GUEST_SUPP
>> attribute.
>>
>> == Problem ==
>>
>> QEMU [1] queries the features through the KVM API instead of using the x86
>> arch_prctl() option. But it still needs to use arch_prctl() to request the
>> permission. Then this step may become fragile because it does not guarantee
>> to comply with the KVM policy.
> 
> But backdooring through KVM doesn't prevent usersepace from walking in through
> the front door (arch_prctl()), i.e. this doesn't protect the kernel in any way.

No, I don't think backdooring is established in this proposal. The body 
of the arch_prctl() support is encapsulated inside of the x86 core code. 
KVM is simply calling it like arch_prctl() does.

> KVM needs to ensure that _KVM_ doesn't screw up and let userspace use features
> that KVM doesn't support.  The kernel's restrictions on using features goes on
> top, i.e. KVM must behave correctly irrespective of kernel restrictions.

Maybe this is a policy decision. I don't think that 
ARCH_REQ_XCOMP_GUEST_PERM goes away with this. Userspace may still use 
the arch_prctl() set. But then it makes more sense and consistent to use 
ARCH_GET_XCOMP_SUPP in first place, instead of KVM_X86_XCOMP_GUEST_SUPP, no?

> If QEMU wants to assert that it didn't misconfigure itself, it can assert on the
> config in any number of ways, e.g. assert that ARCH_GET_XCOMP_GUEST_PERM is a
> subset of KVM_X86_XCOMP_GUEST_SUPP at the end of kvm_request_xsave_components().

Yes, but I guess the new attribute can make it simple.

Thanks,
Chang
Sean Christopherson Aug. 25, 2022, 4:19 p.m. UTC | #3
On Wed, Aug 24, 2022, Chang S. Bae wrote:
> On 8/24/2022 2:42 PM, Sean Christopherson wrote:
> Maybe this is a policy decision. I don't think that
> ARCH_REQ_XCOMP_GUEST_PERM goes away with this. Userspace may still use the
> arch_prctl() set. But then it makes more sense and consistent to use
> ARCH_GET_XCOMP_SUPP in first place, instead of KVM_X86_XCOMP_GUEST_SUPP, no?

KVM_X86_XCOMP_GUEST_SUPP is needed so that userspace understands what _KVM_
supports.

> > If QEMU wants to assert that it didn't misconfigure itself, it can assert on the
> > config in any number of ways, e.g. assert that ARCH_GET_XCOMP_GUEST_PERM is a
> > subset of KVM_X86_XCOMP_GUEST_SUPP at the end of kvm_request_xsave_components().
> 
> Yes, but I guess the new attribute can make it simple.

Adding new uAPI and new exports to eliminate one line of userspace code is not a
good tradeoff.  Am I missing something?  This really seems like solution looking
for a problem.
Chang S. Bae Aug. 25, 2022, 8:45 p.m. UTC | #4
On 8/25/2022 9:19 AM, Sean Christopherson wrote:
> 
> Adding new uAPI and new exports to eliminate one line of userspace code is not a
> good tradeoff.  Am I missing something?  This really seems like solution looking
> for a problem.

Well, then that's your call.

Yeah, that simplification is really minor. With this, I would rather 
think KVM wants enforcement before relaying the request to the host. 
That enforcement is unthinkable without supported_xcr0. But it looks 
like userspace is somehow trusted here for KVM.

Thanks,
Chang
Sean Christopherson Aug. 25, 2022, 9:54 p.m. UTC | #5
On Thu, Aug 25, 2022, Chang S. Bae wrote:
> On 8/25/2022 9:19 AM, Sean Christopherson wrote:
> > 
> > Adding new uAPI and new exports to eliminate one line of userspace code is not a
> > good tradeoff.  Am I missing something?  This really seems like solution looking
> > for a problem.
> 
> Well, then that's your call.
> 
> Yeah, that simplification is really minor. With this, I would rather think
> KVM wants enforcement before relaying the request to the host. That
> enforcement is unthinkable without supported_xcr0.

My point is that KVM needs supported_xcr0 (or similar behavior) regardless of
whether or not the kernel further restricts feature usage.

> But it looks like userspace is somehow trusted here for KVM.
diff mbox series

Patch

diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index 503a577814b2..e4670d56b695 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -133,6 +133,7 @@  static inline void fpstate_free(struct fpu *fpu) { }
 extern void fpstate_clear_xstate_component(struct fpstate *fps, unsigned int xfeature);
 
 extern u64 xstate_get_guest_group_perm(void);
+extern int xstate_req_guest_perm(unsigned long idx);
 
 /* KVM specific functions */
 extern bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu);
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 46de10a809ec..6ab9a2b38061 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -461,6 +461,7 @@  struct kvm_sync_regs {
 
 /* attributes for system fd (group 0) */
 #define KVM_X86_XCOMP_GUEST_SUPP	0
+#define KVM_X86_XCOMP_GUEST_PERM	1
 
 struct kvm_vmx_nested_state_data {
 	__u8 vmcs12[KVM_STATE_NESTED_VMX_VMCS_SIZE];
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index c8340156bfd2..ac365cb96304 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1687,6 +1687,12 @@  u64 xstate_get_guest_group_perm(void)
 }
 EXPORT_SYMBOL_GPL(xstate_get_guest_group_perm);
 
+int xstate_req_guest_perm(unsigned long idx)
+{
+	return xstate_request_perm(idx, true);
+}
+EXPORT_SYMBOL_GPL(xstate_req_guest_perm);
+
 /**
  * fpu_xstate_prctl - xstate permission operations
  * @tsk:	Redundant pointer to current
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 205ebdc2b11b..f4a1e94117d6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4514,12 +4514,34 @@  static int kvm_x86_dev_get_attr(struct kvm_device_attr *attr)
 		if (put_user(kvm_caps.supported_xcr0, uaddr))
 			return -EFAULT;
 		return 0;
+	case KVM_X86_XCOMP_GUEST_PERM: {
+		u64 permitted = xstate_get_guest_group_perm() & kvm_caps.supported_xcr0;
+
+		return put_user(permitted, uaddr);
+	}
 	default:
 		return -ENXIO;
 		break;
 	}
 }
 
+static int kvm_x86_dev_set_attr(struct kvm_device_attr *attr)
+{
+	unsigned long idx = (unsigned long) kvm_get_attr_addr(attr);
+
+	if (attr->group)
+		return -ENXIO;
+
+	switch (attr->attr) {
+	case KVM_X86_XCOMP_GUEST_PERM:
+		if (!((1ULL << idx) & kvm_caps.supported_xcr0))
+			return -EOPNOTSUPP;
+		return xstate_req_guest_perm(idx);
+	default:
+		return -ENXIO;
+	}
+}
+
 static int kvm_x86_dev_has_attr(struct kvm_device_attr *attr)
 {
 	if (attr->group)
@@ -4629,6 +4651,15 @@  long kvm_arch_dev_ioctl(struct file *filp,
 		r = kvm_x86_dev_get_attr(&attr);
 		break;
 	}
+	case KVM_SET_DEVICE_ATTR: {
+		struct kvm_device_attr attr;
+
+		r = -EFAULT;
+		if (copy_from_user(&attr, (void __user *)arg, sizeof(attr)))
+			break;
+		r = kvm_x86_dev_set_attr(&attr);
+		break;
+	}
 	case KVM_HAS_DEVICE_ATTR: {
 		struct kvm_device_attr attr;
 		r = -EFAULT;