[RFC] KVM: x86: Disallow KVM_SET_CPUID{2} if the vCPU is in guest mode

Message ID 20191218174255.30773-1-sean.j.christopherson@intel.com (mailing list archive)
State New, archived

Commit Message

Sean Christopherson Dec. 18, 2019, 5:42 p.m. UTC
Reject KVM_SET_CPUID{2} with -EBUSY if the vCPU is in guest mode (L2) to
avoid complications and potentially undesirable KVM behavior.  Allowing
userspace to change a guest's capabilities while L2 is active would at
best result in unexpected behavior in the guest (L1 or L2), and at worst
induce bad KVM behavior by breaking fundamental assumptions regarding
transitions between L0, L1 and L2.

Cc: Jim Mattson <jmattson@google.com>
Cc: Weijiang Yang <weijiang.yang@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---

This came up in the context of the CET series, where passing through MSRs
to L1 depends on the CPUID-based capabilities of the guest[*].  The CET
problem is solvable, but IMO unnecessarily complex.   And I'm more
concerned that userspace would be able to induce bad behavior in KVM by
changing core capabilities while L2 is active, e.g. VMX, LM, LA57, etc...

Tagged RFC as this is an ABI change, though I highly doubt it actually
affects a real world VMM.

[*] https://lkml.kernel.org/r/20191218160228.GB25201@linux.intel.com/

 arch/x86/kvm/x86.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Jim Mattson Dec. 18, 2019, 7:38 p.m. UTC | #1
On Wed, Dec 18, 2019 at 9:42 AM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> Reject KVM_SET_CPUID{2} with -EBUSY if the vCPU is in guest mode (L2) to
> avoid complications and potentially undesirable KVM behavior.  Allowing
> userspace to change a guest's capabilities while L2 is active would at
> best result in unexpected behavior in the guest (L1 or L2), and at worst
> induce bad KVM behavior by breaking fundamental assumptions regarding
> transitions between L0, L1 and L2.

This seems a bit contrived. As long as we're breaking the ABI, can we
disallow changes to CPUID once the vCPU has been powered on?

Sean Christopherson Dec. 18, 2019, 8:10 p.m. UTC | #2
On Wed, Dec 18, 2019 at 11:38:43AM -0800, Jim Mattson wrote:
> On Wed, Dec 18, 2019 at 9:42 AM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> >
> > Reject KVM_SET_CPUID{2} with -EBUSY if the vCPU is in guest mode (L2) to
> > avoid complications and potentially undesirable KVM behavior.  Allowing
> > userspace to change a guest's capabilities while L2 is active would at
> > best result in unexpected behavior in the guest (L1 or L2), and at worst
> > induce bad KVM behavior by breaking fundamental assumptions regarding
> > transitions between L0, L1 and L2.
> 
> This seems a bit contrived. As long as we're breaking the ABI, can we
> disallow changes to CPUID once the vCPU has been powered on?

I can at least concoct scenarios where changing CPUID after KVM_RUN
provides value, e.g. effectively creating a new VM/vCPU without destroying
the kernel's underlying data structures and without putting the file
descriptors, for performance (especially if KVM avoids its hardware on/off
paths) or sandboxing (process has access to a VM fd, but not /dev/kvm).

A truly contrived, but technically architecturally accurate, scenario would
be modeling SGX interaction with the machine check architecture.  Per the
SDM, #MCs or clearing bits in IA32_MCi_CTL disable SGX, which is reflected
in CPUID:

  Any machine check exception (#MC) that occurs after Intel SGX is first
  enabled causes Intel SGX to be disabled, (CPUID.SGX_Leaf.0:EAX[SGX1] == 0)
  It cannot be enabled until after the next reset.

  Any act of clearing bits from '1 to '0 in any of the IA32_MCi_CTL register
  may disable Intel SGX (set CPUID.SGX_Leaf.0:EAX[SGX1] to 0) until the next
  reset.

I doubt a userspace VMM would actively model that behavior, but it's at
least theoretically possible.  Yes, it would technically be possible for
SGX to be disabled while L2 is active, but I don't think it's unreasonable
to require userspace to first force the vCPU out of L2.
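
As an illustration of the SGX scenario above, here is a minimal sketch of the userspace half: clearing SGX1 in the VMM's own copy of the CPUID table after a modeled #MC. The struct and function names are hypothetical, and the leaf number and bit position (leaf 0x12, subleaf 0, EAX bit 0) are assumptions taken from the SDM's SGX enumeration rather than from this thread. A real VMM would then have to push the table back with KVM_SET_CPUID2, which is the call this patch would reject with -EBUSY while L2 is active.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SGX_LEAF     0x12u       /* assumed: CPUID leaf that enumerates SGX */
#define SGX1_EAX_BIT (1u << 0)   /* assumed: CPUID.(EAX=12H,ECX=0):EAX[0] */

/* Hypothetical, simplified stand-in for struct kvm_cpuid_entry2. */
struct toy_cpuid_entry {
	uint32_t function;
	uint32_t index;
	uint32_t eax, ebx, ecx, edx;
};

/*
 * Clear SGX1 in the VMM's cached CPUID table.
 * Returns 1 if the bit was set and has been cleared, 0 if there was
 * nothing to do (no SGX leaf, or SGX1 already clear).
 */
static int toy_clear_sgx1(struct toy_cpuid_entry *entries, size_t nent)
{
	assert(entries != NULL || nent == 0);

	for (size_t i = 0; i < nent; i++) {
		struct toy_cpuid_entry *e = &entries[i];

		if (e->function != SGX_LEAF || e->index != 0)
			continue;
		if (!(e->eax & SGX1_EAX_BIT))
			return 0;
		e->eax &= ~SGX1_EAX_BIT;
		return 1;
	}
	return 0;
}
```

A return value of 1 is the point at which the VMM would need to reissue KVM_SET_CPUID2 for the guest to observe the change.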

Jim Mattson Dec. 18, 2019, 8:57 p.m. UTC | #3
On Wed, Dec 18, 2019 at 12:10 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> On Wed, Dec 18, 2019 at 11:38:43AM -0800, Jim Mattson wrote:
> > On Wed, Dec 18, 2019 at 9:42 AM Sean Christopherson
> > <sean.j.christopherson@intel.com> wrote:
> > >
> > > Reject KVM_SET_CPUID{2} with -EBUSY if the vCPU is in guest mode (L2) to
> > > avoid complications and potentially undesirable KVM behavior.  Allowing
> > > userspace to change a guest's capabilities while L2 is active would at
> > > best result in unexpected behavior in the guest (L1 or L2), and at worst
> > > induce bad KVM behavior by breaking fundamental assumptions regarding
> > > transitions between L0, L1 and L2.
> >
> > This seems a bit contrived. As long as we're breaking the ABI, can we
> > disallow changes to CPUID once the vCPU has been powered on?
>
> I can at least concoct scenarios where changing CPUID after KVM_RUN
> provides value, e.g. effectively creating a new VM/vCPU without destroying
> the kernel's underlying data structures and without putting the file
> descriptors, for performance (especially if KVM avoids its hardware on/off
> paths) or sandboxing (process has access to a VM fd, but not /dev/kvm).
>
> A truly contrived, but technically architecturally accurate, scenario would
> be modeling SGX interaction with the machine check architecture.  Per the
> SDM, #MCs or clearing bits in IA32_MCi_CTL disable SGX, which is reflected
> in CPUID:
>
>   Any machine check exception (#MC) that occurs after Intel SGX is first
>   enabled causes Intel SGX to be disabled, (CPUID.SGX_Leaf.0:EAX[SGX1] == 0)
>   It cannot be enabled until after the next reset.
>
>   Any act of clearing bits from '1 to '0 in any of the IA32_MCi_CTL register
>   may disable Intel SGX (set CPUID.SGX_Leaf.0:EAX[SGX1] to 0) until the next
>   reset.
>
> I doubt a userspace VMM would actively model that behavior, but it's at
> least theoretically possible.  Yes, it would technically be possible for
> SGX to be disabled while L2 is active, but I don't think it's unreasonable
> to require userspace to first force the vCPU out of L2.

It's perfectly reasonable for a machine check to be handled by L2, in
which case, it would be rather onerous to require userspace to force
the vCPU out of L2 to clear CPUID.SGX_Leaf.0:EAX[SGX1].

Sean Christopherson Dec. 18, 2019, 9:24 p.m. UTC | #4
On Wed, Dec 18, 2019 at 12:57:41PM -0800, Jim Mattson wrote:
> On Wed, Dec 18, 2019 at 12:10 PM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> >
> > On Wed, Dec 18, 2019 at 11:38:43AM -0800, Jim Mattson wrote:
> > > On Wed, Dec 18, 2019 at 9:42 AM Sean Christopherson
> > > <sean.j.christopherson@intel.com> wrote:
> > > >
> > > > Reject KVM_SET_CPUID{2} with -EBUSY if the vCPU is in guest mode (L2) to
> > > > avoid complications and potentially undesirable KVM behavior.  Allowing
> > > > userspace to change a guest's capabilities while L2 is active would at
> > > > best result in unexpected behavior in the guest (L1 or L2), and at worst
> > > > induce bad KVM behavior by breaking fundamental assumptions regarding
> > > > transitions between L0, L1 and L2.
> > >
> > > This seems a bit contrived. As long as we're breaking the ABI, can we
> > > disallow changes to CPUID once the vCPU has been powered on?
> >
> > I can at least concoct scenarios where changing CPUID after KVM_RUN
> > provides value, e.g. effectively creating a new VM/vCPU without destroying
> > the kernel's underlying data structures and without putting the file
> > descriptors, for performance (especially if KVM avoids its hardware on/off
> > paths) or sandboxing (process has access to a VM fd, but not /dev/kvm).
> >
> > A truly contrived, but technically architecturally accurate, scenario would
> > be modeling SGX interaction with the machine check architecture.  Per the
> > SDM, #MCs or clearing bits in IA32_MCi_CTL disable SGX, which is reflected
> > in CPUID:
> >
> >   Any machine check exception (#MC) that occurs after Intel SGX is first
> >   enabled causes Intel SGX to be disabled, (CPUID.SGX_Leaf.0:EAX[SGX1] == 0)
> >   It cannot be enabled until after the next reset.
> >
> >   Any act of clearing bits from '1 to '0 in any of the IA32_MCi_CTL register
> >   may disable Intel SGX (set CPUID.SGX_Leaf.0:EAX[SGX1] to 0) until the next
> >   reset.
> >
> > I doubt a userspace VMM would actively model that behavior, but it's at
> > least theoretically possible.  Yes, it would technically be possible for
> > SGX to be disabled while L2 is active, but I don't think it's unreasonable
> > to require userspace to first force the vCPU out of L2.
> 
> It's perfectly reasonable for a machine check to be handled by L2, in
> which case, it would be rather onerous to require userspace to force
> the vCPU out of L2 to clear CPUID.SGX_Leaf.0:EAX[SGX1].

Hrm.  I just had to go and think of SGX...  I guess it's probably best to
suck it up and have CET update the right bitmaps.

Patch

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8bb2fb1705ff..974983140e42 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4189,6 +4189,10 @@  long kvm_arch_vcpu_ioctl(struct file *filp,
 		struct kvm_cpuid __user *cpuid_arg = argp;
 		struct kvm_cpuid cpuid;
 
+		r = -EBUSY;
+		if (is_guest_mode(vcpu))
+			goto out;
+
 		r = -EFAULT;
 		if (copy_from_user(&cpuid, cpuid_arg, sizeof(cpuid)))
 			goto out;
@@ -4199,6 +4203,10 @@  long kvm_arch_vcpu_ioctl(struct file *filp,
 		struct kvm_cpuid2 __user *cpuid_arg = argp;
 		struct kvm_cpuid2 cpuid;
 
+		r = -EBUSY;
+		if (is_guest_mode(vcpu))
+			goto out;
+
 		r = -EFAULT;
 		if (copy_from_user(&cpuid, cpuid_arg, sizeof(cpuid)))
 			goto out;
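
The ordering both hunks introduce (check vCPU state first, only then copy from userspace) can be modeled outside the kernel. The sketch below is a toy userspace model, not KVM code: `struct toy_vcpu` and `toy_set_cpuid()` are hypothetical names, the `guest_mode` flag stands in for is_guest_mode(), and a NULL pointer stands in for copy_from_user() failing.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the parts of struct kvm_vcpu that matter here. */
struct toy_vcpu {
	bool guest_mode;     /* true while L2 is active */
	unsigned int nent;   /* number of CPUID entries currently installed */
};

/*
 * Mirrors the check order in the patch: reject with -EBUSY while in
 * guest mode, and only then look at the user-supplied argument.
 */
static int toy_set_cpuid(struct toy_vcpu *vcpu, const unsigned int *user_nent)
{
	assert(vcpu != NULL);

	if (vcpu->guest_mode)
		return -EBUSY;
	if (user_nent == NULL)   /* models copy_from_user() failing */
		return -EFAULT;

	vcpu->nent = *user_nent;
	return 0;
}
```

One consequence of this ordering is that a call made while L2 is active reports -EBUSY even if the user pointer is bad, and leaves the installed CPUID state untouched.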