diff mbox series

KVM: x86: Introduce KVM_CAP_DISABLE_QUIRKS2

Message ID 20220226002124.2747985-1-oupton@google.com (mailing list archive)
State New, archived
Headers show
Series KVM: x86: Introduce KVM_CAP_DISABLE_QUIRKS2 | expand

Commit Message

Oliver Upton Feb. 26, 2022, 12:21 a.m. UTC
KVM_CAP_DISABLE_QUIRKS is irrevocably broken. The capability does not
advertise the set of quirks which may be disabled to userspace, so it is
impossible to predict the behavior of KVM. Worse yet,
KVM_CAP_DISABLE_QUIRKS will tolerate any value for cap->args[0], meaning
it fails to reject attempts to set invalid quirk bits.

The only valid workaround for the quirky quirks API is to add a new CAP.
Actually advertise the set of quirks that can be disabled to userspace
so it can predict KVM's behavior. Reject values for cap->args[0] that
contain invalid bits.

Finally, add documentation for the new capability and describe the
existing quirks.

Signed-off-by: Oliver Upton <oupton@google.com>
---

 This patch applies cleanly to 5.17-rc5. I am working on
 another series that introduces yet another KVM quirk [1], but wanted to
 send this patch out ahead of reworking that series.

 Should we introduce another quirk, KVM_X86_QUIRK_QUIRKY_DISABLE_QUIRKS,
 that provides the new ABI to the old quirks capability? :-P

 Documentation/virt/kvm/api.rst  | 51 +++++++++++++++++++++++++++++++++
 arch/x86/include/asm/kvm_host.h |  7 +++++
 arch/x86/kvm/x86.c              |  8 ++++++
 include/uapi/linux/kvm.h        |  1 +
 4 files changed, 67 insertions(+)

Comments

Oliver Upton Feb. 26, 2022, 12:22 a.m. UTC | #1
On Sat, Feb 26, 2022 at 12:21:24AM +0000, Oliver Upton wrote:
> KVM_CAP_DISABLE_QUIRKS is irrevocably broken. The capability does not
> advertise the set of quirks which may be disabled to userspace, so it is
> impossible to predict the behavior of KVM. Worse yet,
> KVM_CAP_DISABLE_QUIRKS will tolerate any value for cap->args[0], meaning
> it fails to reject attempts to set invalid quirk bits.
> 
> The only valid workaround for the quirky quirks API is to add a new CAP.
> Actually advertise the set of quirks that can be disabled to userspace
> so it can predict KVM's behavior. Reject values for cap->args[0] that
> contain invalid bits.
> 
> Finally, add documentation for the new capability and describe the
> existing quirks.
> 
> Signed-off-by: Oliver Upton <oupton@google.com>
> ---
> 
>  This patch applies cleanly to 5.17-rc5. I am working on
>  another series that introduces yet another KVM quirk [1], but wanted to
>  send this patch out ahead of reworking that series.
> 
>  Should we introduce another quirk, KVM_X86_QUIRK_QUIRKY_DISABLE_QUIRKS,
>  that provides the new ABI to the old quirks capability? :-P
> 

 [1]: http://lore.kernel.org/r/20220225200823.2522321-1-oupton@google.com

>  Documentation/virt/kvm/api.rst  | 51 +++++++++++++++++++++++++++++++++
>  arch/x86/include/asm/kvm_host.h |  7 +++++
>  arch/x86/kvm/x86.c              |  8 ++++++
>  include/uapi/linux/kvm.h        |  1 +
>  4 files changed, 67 insertions(+)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index a4267104db50..bad9e54cbb68 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6997,6 +6997,57 @@ indicated by the fd to the VM this is called on.
>  This is intended to support intra-host migration of VMs between userspace VMMs,
>  upgrading the VMM process without interrupting the guest.
>  
> +7.30 KVM_CAP_DISABLE_QUIRKS2
> +----------------------------
> +
> +:Capability: KVM_CAP_DISABLE_QUIRKS2
> +:Parameters: args[0] - set of KVM quirks to disable
> +:Architectures: x86
> +:Type: vm
> +
> +This capability, if enabled, will cause KVM to disable some behavior
> +quirks.
> +
> +Calling KVM_CHECK_EXTENSION for this capability returns a bitmask of
> +quirks that can be disabled in KVM.
> +
> +The argument to KVM_ENABLE_CAP for this capability is a bitmask of
> +quirks to disable, and must be a subset of the bitmask returned by
> +KVM_CHECK_EXTENSION.
> +
> +The valid bits in cap.args[0] are:
> +
> +=================================== ============================================
> + KVM_X86_QUIRK_LINT0_ENABLED        By default, the reset value for the LVT
> +                                    LINT0 register is 0x700 (APIC_MODE_EXTINT).
> +                                    When this quirk is disabled, the reset value
> +                                    is 0x10000 (APIC_LVT_MASKED).
> +
> + KVM_X86_QUIRK_CD_NW_CLEARED        By default, KVM clears CR0.CD and CR0.NW.
> +                                    When this quirk is disabled, KVM does not
> +                                    change the value of CR0.CD and CR0.NW.
> +
> + KVM_X86_QUIRK_LAPIC_MMIO_HOLE      By default, the MMIO LAPIC interface is
> +                                    available even when configured for x2APIC
> +                                    mode. When this quirk is disabled, KVM
> +                                    disables the MMIO LAPIC interface if the
> +                                    LAPIC is in x2APIC mode.
> +
> + KVM_X86_QUIRK_OUT_7E_INC_RIP       By default, KVM pre-increments %rip before
> +                                    exiting to userspace for an OUT instruction
> +                                    to port 0x7e. When this quirk is disabled,
> +                                    KVM does not pre-increment %rip before
> +                                    exiting to userspace.
> +
> + KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT When this quirk is disabled, KVM sets
> +                                    CPUID.01H:ECX[bit 3] (MONITOR/MWAIT) if
> +                                    IA32_MISC_ENABLE[bit 18] (MWAIT) is set.
> +                                    Additionally, when this quirk is disabled,
> +                                    KVM clears CPUID.01H:ECX[bit 3] if
> +                                    IA32_MISC_ENABLE[bit 18] is cleared.
> +=================================== ============================================
> +
> +
>  8. Other capabilities.
>  ======================
>  
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 6dcccb304775..4f01eb977338 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1955,4 +1955,11 @@ int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
>  #define KVM_CLOCK_VALID_FLAGS						\
>  	(KVM_CLOCK_TSC_STABLE | KVM_CLOCK_REALTIME | KVM_CLOCK_HOST_TSC)
>  
> +#define KVM_X86_VALID_QUIRKS			\
> +	(KVM_X86_QUIRK_LINT0_REENABLED |	\
> +	 KVM_X86_QUIRK_CD_NW_CLEARED |		\
> +	 KVM_X86_QUIRK_LAPIC_MMIO_HOLE |	\
> +	 KVM_X86_QUIRK_OUT_7E_INC_RIP |		\
> +	 KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT)
> +
>  #endif /* _ASM_X86_KVM_HOST_H */
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 641044db415d..e5227aca9e7e 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4331,6 +4331,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  			r = sizeof(struct kvm_xsave);
>  		break;
>  	}
> +	case KVM_CAP_DISABLE_QUIRKS2:
> +		r = KVM_X86_VALID_QUIRKS;
> +		break;
>  	default:
>  		break;
>  	}
> @@ -5877,6 +5880,11 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>  		return -EINVAL;
>  
>  	switch (cap->cap) {
> +	case KVM_CAP_DISABLE_QUIRKS2:
> +		r = -EINVAL;
> +		if (cap->args[0] & ~KVM_X86_VALID_QUIRKS)
> +			break;
> +		fallthrough;
>  	case KVM_CAP_DISABLE_QUIRKS:
>  		kvm->arch.disabled_quirks = cap->args[0];
>  		r = 0;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 5191b57e1562..9f7410496b36 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1134,6 +1134,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_VM_GPA_BITS 207
>  #define KVM_CAP_XSAVE2 208
>  #define KVM_CAP_SYS_ATTRIBUTES 209
> +#define KVM_CAP_DISABLE_QUIRKS2 210
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> -- 
> 2.35.1.574.g5d30c73bfb-goog
>
Sean Christopherson Feb. 28, 2022, 4:33 p.m. UTC | #2
On Sat, Feb 26, 2022, Oliver Upton wrote:
> KVM_CAP_DISABLE_QUIRKS is irrevocably broken. The capability does not
> advertise the set of quirks which may be disabled to userspace, so it is
> impossible to predict the behavior of KVM. Worse yet,
> KVM_CAP_DISABLE_QUIRKS will tolerate any value for cap->args[0], meaning
> it fails to reject attempts to set invalid quirk bits.

FWIW, we do have a way out without adding another capability.  The 'flags' field
is enforced for all capabilities, we could use a bit there to add "v2" functionality.
Userspace can assume KVM_QUIRK_ENFORCE_QUIRKS is allowed if the return from probing
the capability is >1.

It's gross and forced, just an idea if we want to avoid yet another cap.

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c712c33c1521..2a8449d1cf24 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4229,7 +4229,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
        case KVM_CAP_HYPERV_TIME:
        case KVM_CAP_IOAPIC_POLARITY_IGNORED:
        case KVM_CAP_TSC_DEADLINE_TIMER:
-       case KVM_CAP_DISABLE_QUIRKS:
        case KVM_CAP_SET_BOOT_CPU_ID:
        case KVM_CAP_SPLIT_IRQCHIP:
        case KVM_CAP_IMMEDIATE_EXIT:
@@ -4254,6 +4253,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
        case KVM_CAP_VAPIC:
                r = 1;
                break;
+       case KVM_CAP_DISABLE_QUIRKS:
+               r = KVM_X86_VALID_QUIRKS;
+               break;
        case KVM_CAP_EXIT_HYPERCALL:
                r = KVM_EXIT_HYPERCALL_VALID_MASK;
                break;
@@ -5892,11 +5894,19 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 {
        int r;

-       if (cap->flags)
+       if (cap->flags && cap->cap != KVM_CAP_DISABLE_QUIRKS)
                return -EINVAL;

        switch (cap->cap) {
        case KVM_CAP_DISABLE_QUIRKS:
+               r = -EINVAL;
+               if (cap->flags & ~KVM_QUIRK_ENFORCE_QUIRKS)
+                       break;
+
+               if ((cap->flags & KVM_QUIRK_ENFORCE_QUIRKS) &&
+                   (cap->args[0] & ~KVM_X86_VALID_QUIRKS))
+                       break;
+
                kvm->arch.disabled_quirks = cap->args[0];
                r = 0;
                break;

> +7.30 KVM_CAP_DISABLE_QUIRKS2
> +----------------------------
> +
> +:Capability: KVM_CAP_DISABLE_QUIRKS2
> +:Parameters: args[0] - set of KVM quirks to disable
> +:Architectures: x86
> +:Type: vm
> +
> +This capability, if enabled, will cause KVM to disable some behavior
> +quirks.
> +
> +Calling KVM_CHECK_EXTENSION for this capability returns a bitmask of
> +quirks that can be disabled in KVM.
> +
> +The argument to KVM_ENABLE_CAP for this capability is a bitmask of
> +quirks to disable, and must be a subset of the bitmask returned by
> +KVM_CHECK_EXTENSION.
> +
> +The valid bits in cap.args[0] are:
> +
> +=================================== ============================================
> + KVM_X86_QUIRK_LINT0_ENABLED        By default, the reset value for the LVT

LINT0_REEANBLED.
Oliver Upton Feb. 28, 2022, 6 p.m. UTC | #3
On Mon, Feb 28, 2022 at 04:33:57PM +0000, Sean Christopherson wrote:
> On Sat, Feb 26, 2022, Oliver Upton wrote:
> > KVM_CAP_DISABLE_QUIRKS is irrevocably broken. The capability does not
> > advertise the set of quirks which may be disabled to userspace, so it is
> > impossible to predict the behavior of KVM. Worse yet,
> > KVM_CAP_DISABLE_QUIRKS will tolerate any value for cap->args[0], meaning
> > it fails to reject attempts to set invalid quirk bits.
> 
> FWIW, we do have a way out without adding another capability.  The 'flags' field
> is enforced for all capabilities, we could use a bit there to add "v2" functionality.
> Userspace can assume KVM_QUIRK_ENFORCE_QUIRKS is allowed if the return from probing
> the capability is >1.
> 
> It's gross and forced, just an idea if we want to avoid yet another cap.

I had considered this before sending out v1, but was concerned if a
userspace didn't correctly handle a return value >1 from
KVM_CHECK_EXTENSION. Turns out, I can't even find any evidence of the
KVM_CAP_DISABLE_QUIRKS used by userspace. I spot checked QEMU, kvmtool,
and a couple of the rusty ones.

The only other thing that comes to mind is it's a bit gross for userspace
to do a graceful fallback if KVM_QUIRK_ENFORCE_QUIRKS isn't valid, since
most userspace would just error out on -EINVAL. At least with a new cap
userspace could follow a somewhat standardized way to discover if the
kernel supports enforced quirks.

[...]

> > +7.30 KVM_CAP_DISABLE_QUIRKS2
> > +----------------------------
> > +
> > +:Capability: KVM_CAP_DISABLE_QUIRKS2
> > +:Parameters: args[0] - set of KVM quirks to disable
> > +:Architectures: x86
> > +:Type: vm
> > +
> > +This capability, if enabled, will cause KVM to disable some behavior
> > +quirks.
> > +
> > +Calling KVM_CHECK_EXTENSION for this capability returns a bitmask of
> > +quirks that can be disabled in KVM.
> > +
> > +The argument to KVM_ENABLE_CAP for this capability is a bitmask of
> > +quirks to disable, and must be a subset of the bitmask returned by
> > +KVM_CHECK_EXTENSION.
> > +
> > +The valid bits in cap.args[0] are:
> > +
> > +=================================== ============================================
> > + KVM_X86_QUIRK_LINT0_ENABLED        By default, the reset value for the LVT
> 
> LINT0_REEANBLED.

Oops. Thanks!

--
Oliver
Sean Christopherson Feb. 28, 2022, 6:22 p.m. UTC | #4
On Mon, Feb 28, 2022, Oliver Upton wrote:
> On Mon, Feb 28, 2022 at 04:33:57PM +0000, Sean Christopherson wrote:
> > On Sat, Feb 26, 2022, Oliver Upton wrote:
> > > KVM_CAP_DISABLE_QUIRKS is irrevocably broken. The capability does not
> > > advertise the set of quirks which may be disabled to userspace, so it is
> > > impossible to predict the behavior of KVM. Worse yet,
> > > KVM_CAP_DISABLE_QUIRKS will tolerate any value for cap->args[0], meaning
> > > it fails to reject attempts to set invalid quirk bits.
> > 
> > FWIW, we do have a way out without adding another capability.  The 'flags' field
> > is enforced for all capabilities, we could use a bit there to add "v2" functionality.
> > Userspace can assume KVM_QUIRK_ENFORCE_QUIRKS is allowed if the return from probing
> > the capability is >1.
> > 
> > It's gross and forced, just an idea if we want to avoid yet another cap.
> 
> I had considered this before sending out v1, but was concerned if a
> userspace didn't correctly handle a return value >1 from
> KVM_CHECK_EXTENSION.

Ah, right, userspace could theoretically check "== 1".  Blech.

> Turns out, I can't even find any evidence of the KVM_CAP_DISABLE_QUIRKS used
> by userspace. I spot checked QEMU, kvmtool,
> and a couple of the rusty ones.
> 
> The only other thing that comes to mind is it's a bit gross for userspace
> to do a graceful fallback if KVM_QUIRK_ENFORCE_QUIRKS isn't valid, since
> most userspace would just error out on -EINVAL. At least with a new cap
> userspace could follow a somewhat standardized way to discover if the
> kernel supports enforced quirks.

Yeah, a QUIRKS2 is probably easier for everyone.
diff mbox series

Patch

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index a4267104db50..bad9e54cbb68 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6997,6 +6997,57 @@  indicated by the fd to the VM this is called on.
 This is intended to support intra-host migration of VMs between userspace VMMs,
 upgrading the VMM process without interrupting the guest.
 
+7.30 KVM_CAP_DISABLE_QUIRKS2
+----------------------------
+
+:Capability: KVM_CAP_DISABLE_QUIRKS2
+:Parameters: args[0] - set of KVM quirks to disable
+:Architectures: x86
+:Type: vm
+
+This capability, if enabled, will cause KVM to disable some behavior
+quirks.
+
+Calling KVM_CHECK_EXTENSION for this capability returns a bitmask of
+quirks that can be disabled in KVM.
+
+The argument to KVM_ENABLE_CAP for this capability is a bitmask of
+quirks to disable, and must be a subset of the bitmask returned by
+KVM_CHECK_EXTENSION.
+
+The valid bits in cap.args[0] are:
+
+=================================== ============================================
+ KVM_X86_QUIRK_LINT0_ENABLED        By default, the reset value for the LVT
+                                    LINT0 register is 0x700 (APIC_MODE_EXTINT).
+                                    When this quirk is disabled, the reset value
+                                    is 0x10000 (APIC_LVT_MASKED).
+
+ KVM_X86_QUIRK_CD_NW_CLEARED        By default, KVM clears CR0.CD and CR0.NW.
+                                    When this quirk is disabled, KVM does not
+                                    change the value of CR0.CD and CR0.NW.
+
+ KVM_X86_QUIRK_LAPIC_MMIO_HOLE      By default, the MMIO LAPIC interface is
+                                    available even when configured for x2APIC
+                                    mode. When this quirk is disabled, KVM
+                                    disables the MMIO LAPIC interface if the
+                                    LAPIC is in x2APIC mode.
+
+ KVM_X86_QUIRK_OUT_7E_INC_RIP       By default, KVM pre-increments %rip before
+                                    exiting to userspace for an OUT instruction
+                                    to port 0x7e. When this quirk is disabled,
+                                    KVM does not pre-increment %rip before
+                                    exiting to userspace.
+
+ KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT When this quirk is disabled, KVM sets
+                                    CPUID.01H:ECX[bit 3] (MONITOR/MWAIT) if
+                                    IA32_MISC_ENABLE[bit 18] (MWAIT) is set.
+                                    Additionally, when this quirk is disabled,
+                                    KVM clears CPUID.01H:ECX[bit 3] if
+                                    IA32_MISC_ENABLE[bit 18] is cleared.
+=================================== ============================================
+
+
 8. Other capabilities.
 ======================
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6dcccb304775..4f01eb977338 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1955,4 +1955,11 @@  int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
 #define KVM_CLOCK_VALID_FLAGS						\
 	(KVM_CLOCK_TSC_STABLE | KVM_CLOCK_REALTIME | KVM_CLOCK_HOST_TSC)
 
+#define KVM_X86_VALID_QUIRKS			\
+	(KVM_X86_QUIRK_LINT0_REENABLED |	\
+	 KVM_X86_QUIRK_CD_NW_CLEARED |		\
+	 KVM_X86_QUIRK_LAPIC_MMIO_HOLE |	\
+	 KVM_X86_QUIRK_OUT_7E_INC_RIP |		\
+	 KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT)
+
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 641044db415d..e5227aca9e7e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4331,6 +4331,9 @@  int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 			r = sizeof(struct kvm_xsave);
 		break;
 	}
+	case KVM_CAP_DISABLE_QUIRKS2:
+		r = KVM_X86_VALID_QUIRKS;
+		break;
 	default:
 		break;
 	}
@@ -5877,6 +5880,11 @@  int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		return -EINVAL;
 
 	switch (cap->cap) {
+	case KVM_CAP_DISABLE_QUIRKS2:
+		r = -EINVAL;
+		if (cap->args[0] & ~KVM_X86_VALID_QUIRKS)
+			break;
+		fallthrough;
 	case KVM_CAP_DISABLE_QUIRKS:
 		kvm->arch.disabled_quirks = cap->args[0];
 		r = 0;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 5191b57e1562..9f7410496b36 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1134,6 +1134,7 @@  struct kvm_ppc_resize_hpt {
 #define KVM_CAP_VM_GPA_BITS 207
 #define KVM_CAP_XSAVE2 208
 #define KVM_CAP_SYS_ATTRIBUTES 209
+#define KVM_CAP_DISABLE_QUIRKS2 210
 
 #ifdef KVM_CAP_IRQ_ROUTING