Message ID | 20191218174255.30773-1-sean.j.christopherson@intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | [RFC] KVM: x86: Disallow KVM_SET_CPUID{2} if the vCPU is in guest mode |
On Wed, Dec 18, 2019 at 9:42 AM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> Reject KVM_SET_CPUID{2} with -EBUSY if the vCPU is in guest mode (L2) to
> avoid complications and potentially undesirable KVM behavior. Allowing
> userspace to change a guest's capabilities while L2 is active would at
> best result in unexpected behavior in the guest (L1 or L2), and at worst
> induce bad KVM behavior by breaking fundamental assumptions regarding
> transitions between L0, L1 and L2.

This seems a bit contrived. As long as we're breaking the ABI, can we
disallow changes to CPUID once the vCPU has been powered on?
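For comparison, the two policies under discussion (the RFC's guest-mode check and the stricter "not after power-on" rule suggested above) can be sketched as a small userspace model. This is only an illustration, not KVM code: struct policy_vcpu, its has_run flag, and both function names are hypothetical, and returning -EBUSY for the stricter rule is an assumption, since the thread does not name an error code for it.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the vCPU state that each policy would consult. */
struct policy_vcpu {
	bool guest_mode;	/* vCPU is currently in L2 */
	bool has_run;		/* assumed flag: vCPU has been run at least once */
};

/* RFC policy: reject a CPUID update only while L2 is active. */
int policy_rfc(const struct policy_vcpu *v)
{
	return v->guest_mode ? -EBUSY : 0;
}

/* Stricter alternative: reject any CPUID update once the vCPU has run. */
int policy_after_run(const struct policy_vcpu *v)
{
	return v->has_run ? -EBUSY : 0;
}
```

The only case where the two differ is a vCPU that has run but is currently in L1: the RFC policy still allows the update, the stricter one does not.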
On Wed, Dec 18, 2019 at 11:38:43AM -0800, Jim Mattson wrote:
> On Wed, Dec 18, 2019 at 9:42 AM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> >
> > Reject KVM_SET_CPUID{2} with -EBUSY if the vCPU is in guest mode (L2) to
> > avoid complications and potentially undesirable KVM behavior. Allowing
> > userspace to change a guest's capabilities while L2 is active would at
> > best result in unexpected behavior in the guest (L1 or L2), and at worst
> > induce bad KVM behavior by breaking fundamental assumptions regarding
> > transitions between L0, L1 and L2.
>
> This seems a bit contrived. As long as we're breaking the ABI, can we
> disallow changes to CPUID once the vCPU has been powered on?

I can at least concoct scenarios where changing CPUID after KVM_RUN
provides value, e.g. effectively creating a new VM/vCPU without destroying
the kernel's underlying data structures and without putting the file
descriptors, for performance (especially if KVM avoids its hardware on/off
paths) or sandboxing (process has access to a VM fd, but not /dev/kvm).

A truly contrived, but technically architecturally accurate, scenario would
be modeling SGX interaction with the machine check architecture. Per the
SDM, #MCs or clearing bits in IA32_MCi_CTL disable SGX, which is reflected
in CPUID:

  Any machine check exception (#MC) that occurs after Intel SGX is first
  enabled causes Intel SGX to be disabled, (CPUID.SGX_Leaf.0:EAX[SGX1] == 0)
  It cannot be enabled until after the next reset.

  Any act of clearing bits from '1 to '0 in any of the IA32_MCi_CTL register
  may disable Intel SGX (set CPUID.SGX_Leaf.0:EAX[SGX1] to 0) until the next
  reset.

I doubt a userspace VMM would actively model that behavior, but it's at
least theoretically possible. Yes, it would technically be possible for
SGX to be disabled while L2 is active, but I don't think it's unreasonable
to require userspace to first force the vCPU out of L2.
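As a rough sketch of what modeling that SDM behavior in a userspace VMM might involve, the helper below clears the SGX1 bit in a model CPUID table, as a VMM might do after observing a #MC. The struct and function names are hypothetical stand-ins, not KVM's types. The leaf number (0x12, sub-leaf 0) and bit position (EAX bit 0) are taken from the SDM's SGX enumeration rather than from this thread. A real VMM would then have to push the modified table to KVM with KVM_SET_CPUID2, which is the call this patch would reject while L2 is active.

```c
#include <stddef.h>
#include <stdint.h>

#define SGX_LEAF	0x12u		/* CPUID leaf enumerating SGX (per the SDM) */
#define SGX1_BIT	(1u << 0)	/* CPUID.(EAX=12H,ECX=0):EAX[0] */

/* Hypothetical, trimmed-down stand-in for one CPUID table entry. */
struct model_cpuid_entry {
	uint32_t function;
	uint32_t index;
	uint32_t eax, ebx, ecx, edx;
};

/*
 * Clear SGX1 in the model table.
 * Returns 1 if the bit was set and has been cleared, 0 otherwise.
 */
int model_disable_sgx1(struct model_cpuid_entry *entries, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (entries[i].function != SGX_LEAF || entries[i].index != 0)
			continue;
		if (!(entries[i].eax & SGX1_BIT))
			return 0;	/* already disabled */
		entries[i].eax &= ~SGX1_BIT;
		return 1;
	}
	return 0;	/* no SGX leaf in the table */
}
```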
On Wed, Dec 18, 2019 at 12:10 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> On Wed, Dec 18, 2019 at 11:38:43AM -0800, Jim Mattson wrote:
> > On Wed, Dec 18, 2019 at 9:42 AM Sean Christopherson
> > <sean.j.christopherson@intel.com> wrote:
> > >
> > > Reject KVM_SET_CPUID{2} with -EBUSY if the vCPU is in guest mode (L2) to
> > > avoid complications and potentially undesirable KVM behavior. Allowing
> > > userspace to change a guest's capabilities while L2 is active would at
> > > best result in unexpected behavior in the guest (L1 or L2), and at worst
> > > induce bad KVM behavior by breaking fundamental assumptions regarding
> > > transitions between L0, L1 and L2.
> >
> > This seems a bit contrived. As long as we're breaking the ABI, can we
> > disallow changes to CPUID once the vCPU has been powered on?
>
> I can at least concoct scenarios where changing CPUID after KVM_RUN
> provides value, e.g. effectively creating a new VM/vCPU without destroying
> the kernel's underlying data structures and without putting the file
> descriptors, for performance (especially if KVM avoids its hardware on/off
> paths) or sandboxing (process has access to a VM fd, but not /dev/kvm).
>
> A truly contrived, but technically architecturally accurate, scenario would
> be modeling SGX interaction with the machine check architecture. Per the
> SDM, #MCs or clearing bits in IA32_MCi_CTL disable SGX, which is reflected
> in CPUID:
>
>   Any machine check exception (#MC) that occurs after Intel SGX is first
>   enabled causes Intel SGX to be disabled, (CPUID.SGX_Leaf.0:EAX[SGX1] == 0)
>   It cannot be enabled until after the next reset.
>
>   Any act of clearing bits from '1 to '0 in any of the IA32_MCi_CTL register
>   may disable Intel SGX (set CPUID.SGX_Leaf.0:EAX[SGX1] to 0) until the next
>   reset.
>
> I doubt a userspace VMM would actively model that behavior, but it's at
> least theoretically possible. Yes, it would technically be possible for
> SGX to be disabled while L2 is active, but I don't think it's unreasonable
> to require userspace to first force the vCPU out of L2.

It's perfectly reasonable for a machine check to be handled by L2, in
which case, it would be rather onerous to require userspace to force
the vCPU out of L2 to clear CPUID.SGX_Leaf.0:EAX[SGX1].
On Wed, Dec 18, 2019 at 12:57:41PM -0800, Jim Mattson wrote:
> On Wed, Dec 18, 2019 at 12:10 PM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> >
> > On Wed, Dec 18, 2019 at 11:38:43AM -0800, Jim Mattson wrote:
> > > On Wed, Dec 18, 2019 at 9:42 AM Sean Christopherson
> > > <sean.j.christopherson@intel.com> wrote:
> > > >
> > > > Reject KVM_SET_CPUID{2} with -EBUSY if the vCPU is in guest mode (L2) to
> > > > avoid complications and potentially undesirable KVM behavior. Allowing
> > > > userspace to change a guest's capabilities while L2 is active would at
> > > > best result in unexpected behavior in the guest (L1 or L2), and at worst
> > > > induce bad KVM behavior by breaking fundamental assumptions regarding
> > > > transitions between L0, L1 and L2.
> > >
> > > This seems a bit contrived. As long as we're breaking the ABI, can we
> > > disallow changes to CPUID once the vCPU has been powered on?
> >
> > I can at least concoct scenarios where changing CPUID after KVM_RUN
> > provides value, e.g. effectively creating a new VM/vCPU without destroying
> > the kernel's underlying data structures and without putting the file
> > descriptors, for performance (especially if KVM avoids its hardware on/off
> > paths) or sandboxing (process has access to a VM fd, but not /dev/kvm).
> >
> > A truly contrived, but technically architecturally accurate, scenario would
> > be modeling SGX interaction with the machine check architecture. Per the
> > SDM, #MCs or clearing bits in IA32_MCi_CTL disable SGX, which is reflected
> > in CPUID:
> >
> >   Any machine check exception (#MC) that occurs after Intel SGX is first
> >   enabled causes Intel SGX to be disabled, (CPUID.SGX_Leaf.0:EAX[SGX1] == 0)
> >   It cannot be enabled until after the next reset.
> >
> >   Any act of clearing bits from '1 to '0 in any of the IA32_MCi_CTL register
> >   may disable Intel SGX (set CPUID.SGX_Leaf.0:EAX[SGX1] to 0) until the next
> >   reset.
> >
> > I doubt a userspace VMM would actively model that behavior, but it's at
> > least theoretically possible. Yes, it would technically be possible for
> > SGX to be disabled while L2 is active, but I don't think it's unreasonable
> > to require userspace to first force the vCPU out of L2.
>
> It's perfectly reasonable for a machine check to be handled by L2, in
> which case, it would be rather onerous to require userspace to force
> the vCPU out of L2 to clear CPUID.SGX_Leaf.0:EAX[SGX1].

Hrm. I just had to go and think of SGX... I guess it's probably best to
suck it up and have CET update the right bitmaps.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8bb2fb1705ff..974983140e42 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4189,6 +4189,10 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 		struct kvm_cpuid __user *cpuid_arg = argp;
 		struct kvm_cpuid cpuid;
 
+		r = -EBUSY;
+		if (is_guest_mode(vcpu))
+			goto out;
+
 		r = -EFAULT;
 		if (copy_from_user(&cpuid, cpuid_arg, sizeof(cpuid)))
 			goto out;
@@ -4199,6 +4203,10 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 		struct kvm_cpuid2 __user *cpuid_arg = argp;
 		struct kvm_cpuid2 cpuid;
 
+		r = -EBUSY;
+		if (is_guest_mode(vcpu))
+			goto out;
+
 		r = -EFAULT;
 		if (copy_from_user(&cpuid, cpuid_arg, sizeof(cpuid)))
 			goto out;
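One consequence of where the new check sits in the diff above is that -EBUSY takes precedence over -EFAULT: a vCPU in guest mode is rejected before the user buffer is ever touched. The following is a minimal userspace model of that ordering, not the kernel code itself. struct model_vcpu and model_set_cpuid() are hypothetical names, and a NULL pointer stands in for a failing copy_from_user().

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the one piece of vCPU state the check reads. */
struct model_vcpu {
	bool guest_mode;	/* models is_guest_mode(vcpu) */
};

/*
 * Models the order of checks in the patched KVM_SET_CPUID{2} paths:
 * the guest-mode check runs first, then the user-buffer copy.
 */
int model_set_cpuid(const struct model_vcpu *vcpu, const void *user_buf)
{
	if (vcpu->guest_mode)
		return -EBUSY;	/* L2 active: reject before reading anything */
	if (user_buf == NULL)
		return -EFAULT;	/* models copy_from_user() failing */
	return 0;		/* the real code would go on to apply the table */
}
```

A caller seeing -EBUSY from this model would have to get the vCPU back to L1 before retrying, which is the requirement debated in the replies.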
Reject KVM_SET_CPUID{2} with -EBUSY if the vCPU is in guest mode (L2) to
avoid complications and potentially undesirable KVM behavior. Allowing
userspace to change a guest's capabilities while L2 is active would at
best result in unexpected behavior in the guest (L1 or L2), and at worst
induce bad KVM behavior by breaking fundamental assumptions regarding
transitions between L0, L1 and L2.

Cc: Jim Mattson <jmattson@google.com>
Cc: Weijiang Yang <weijiang.yang@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---

This came up in the context of the CET series, where passing through MSRs
to L1 depends on the CPUID-based capabilities of the guest[*]. The CET
problem is solvable, but IMO unnecessarily complex. And I'm more concerned
that userspace would be able to induce bad behavior in KVM by changing core
capabilities while L2 is active, e.g. VMX, LM, LA57, etc...

Tagged RFC as this is an ABI change, though I highly doubt it actually
affects a real world VMM.

[*] https://lkml.kernel.org/r/20191218160228.GB25201@linux.intel.com/

 arch/x86/kvm/x86.c | 8 ++++++++
 1 file changed, 8 insertions(+)