[v3] kvm: better MWAIT emulation for guests

Message ID 1489448438-29865-1-git-send-email-mst@redhat.com
State New

Commit Message

Michael S. Tsirkin March 13, 2017, 11:44 p.m. UTC
Guests running Mac OS X 10.5, 10.6, and 10.7 (Leopard through Lion) have a
problem: unless explicitly provided with the kernel command line argument
"idlehalt=0", they implicitly assume MONITOR and MWAIT availability,
without checking CPUID.

We currently emulate that as a NOP but on VMX we can do better: let the
guest stop the CPU until a timer, IPI or memory change.  The CPU will be
busy but that isn't any worse than a NOP emulation.

Note that mwait within guests is not the same as on real hardware
because halt causes an exit while mwait doesn't.  For this reason it
might not be a good idea to use the regular MWAIT flag in CPUID to
signal this capability.  Add a flag in the hypervisor leaf instead.
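
As an illustration only (not part of this patch), a guest could test the
new flag roughly as sketched below.  The leaf numbers 0x40000000 and
0x40000001 are KVM's existing signature and feature leaves; bit 8 is the
value proposed here and could change before merge.  The helper names are
hypothetical, and the CPUID reader assumes GCC or Clang on x86.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define KVM_CPUID_SIGNATURE 0x40000000u
#define KVM_CPUID_FEATURES  0x40000001u
#define KVM_FEATURE_MWAIT   8	/* bit number proposed by this patch */

/* Pure check on the EAX value of the KVM feature leaf. */
static inline bool kvm_features_have_mwait(uint32_t features_eax)
{
	return (features_eax >> KVM_FEATURE_MWAIT) & 1u;
}

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* Hypothetical helper: true only when running on KVM with the flag set. */
static inline bool guest_mwait_flag_present(void)
{
	uint32_t eax, ebx, ecx, edx;
	char sig[13];

	/* The signature leaf returns "KVMKVMKVM\0\0\0" in EBX, ECX, EDX. */
	__cpuid(KVM_CPUID_SIGNATURE, eax, ebx, ecx, edx);
	memcpy(sig + 0, &ebx, 4);
	memcpy(sig + 4, &ecx, 4);
	memcpy(sig + 8, &edx, 4);
	sig[12] = '\0';
	if (strcmp(sig, "KVMKVMKVM") != 0)
		return false;

	__cpuid(KVM_CPUID_FEATURES, eax, ebx, ecx, edx);
	return kvm_features_have_mwait(eax);
}
#endif
```

The signature check comes first because leaf 0x40000001 has no defined
meaning outside a KVM guest.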

Additionally, we add a capability for QEMU - e.g. if it knows there's an
isolated CPU dedicated for the VCPU it can set the standard MWAIT flag
to improve guest behaviour.
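
For illustration, host userspace could probe the capability with
KVM_CHECK_EXTENSION along the lines of the sketch below.  This is an
assumption about how a VMM might use it, not code from QEMU.  The number
137 is the value proposed in this patch and is only used as a fallback
when the installed <linux/kvm.h> has no definition; a released kernel may
assign a different number.  The function name is hypothetical.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/kvm.h>

#ifndef KVM_CAP_X86_GUEST_MWAIT
#define KVM_CAP_X86_GUEST_MWAIT 137	/* value proposed by this patch */
#endif

/*
 * Returns a positive value if the capability is reported, 0 if the
 * kernel does not report it, and -1 if the device cannot be opened or
 * the ioctl fails.
 */
static inline int kvm_guest_mwait_supported(const char *kvm_path)
{
	int fd = open(kvm_path, O_RDWR);
	int r;

	if (fd < 0)
		return -1;

	r = ioctl(fd, KVM_CHECK_EXTENSION, KVM_CAP_X86_GUEST_MWAIT);
	close(fd);
	return r;
}
```

A caller would typically pass "/dev/kvm" and treat anything other than a
positive result as "do not expose the standard MWAIT flag".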

Reported-by: "Gabriel L. Somlo" <gsomlo@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---

Note: SVM bits are untested at this point. Seems pretty
obvious though.

changes from v2:
- add a capability to allow host userspace to detect new kernels
- more documentation to clarify the semantics of the feature flag
  and why it's useful
- svm support as suggested by Radim

changes from v1:
- fix a typo that caused the rest of the leaf flags to be overwritten
  Reported by: Wanpeng Li <kernellwp@gmail.com>
- updated commit log with data about guests helped by this feature
- better document differences between mwait and halt for guests

 Documentation/virtual/kvm/api.txt    | 12 ++++++------
 Documentation/virtual/kvm/cpuid.txt  |  6 ++++++
 arch/x86/include/uapi/asm/kvm_para.h |  1 +
 arch/x86/kvm/cpuid.c                 |  3 +++
 arch/x86/kvm/svm.c                   |  2 --
 arch/x86/kvm/vmx.c                   |  4 ----
 arch/x86/kvm/x86.c                   |  3 +++
 include/uapi/linux/kvm.h             |  1 +
 8 files changed, 20 insertions(+), 12 deletions(-)

Comments

Radim Krčmář March 14, 2017, 1:58 p.m. UTC | #1
2017-03-14 01:44+0200, Michael S. Tsirkin:
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> @@ -4135,11 +4135,11 @@ available, means that that the kernel can support guests using the
>  radix MMU defined in Power ISA V3.00 (as implemented in the POWER9
>  processor).
>  
> -8.4 KVM_CAP_PPC_HASH_MMU_V3

This patch should not remove the PPC capability from docs.

(The right name is KVM_CAP_PPC_HASH_V3, but that is for another patch.)

> +8.5 KVM_CAP_X86_GUEST_MWAIT
>  
> -Architectures: ppc
> +Architectures: x86
>  
> -This capability, if KVM_CHECK_EXTENSION indicates that it is
> -available, means that that the kernel can support guests using the
> -hashed page table MMU defined in Power ISA V3.00 (as implemented in
> -the POWER9 processor), including in-memory segment tables.
> +This capability indicates that a guest using memory monitoring instructions
> +(MWAIT/MWAITX) to stop the virtual CPU will not cause a VM exit.  As such, time
> +spent while the virtual CPU is halted in this way will be accounted for as
> +guest running time on the host (as opposed to e.g. HLT).
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> @@ -2684,6 +2684,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  	case KVM_CAP_ADJUST_CLOCK:
>  		r = KVM_CLOCK_TSC_STABLE;
>  		break;
> +	case KVM_CAP_X86_GUEST_MWAIT:
> +		r = !!this_cpu_has(X86_FEATURE_MWAIT);

this_cpu_has already returns bool, so !! is not needed.

I can fix both while applying.
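
To illustrate the point about !! in plain C (a generic sketch, not the
kernel's this_cpu_has() macro; has_feature() below is a hypothetical
stand-in): once a value has been converted to bool it is already 0 or 1,
so a further double negation changes nothing.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a predicate that already returns bool. */
static inline bool has_feature(unsigned int word, unsigned int bit)
{
	return word & (1u << bit);	/* conversion to bool yields 0 or 1 */
}

/* Assigning the bool result to an int gives 0 or 1 directly. */
static inline int cap_value_plain(unsigned int word, unsigned int bit)
{
	return has_feature(word, bit);
}

/* Same result; the !! is redundant here. */
static inline int cap_value_double_negated(unsigned int word, unsigned int bit)
{
	return !!has_feature(word, bit);
}
```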

> +		break;
>  	case KVM_CAP_X86_SMM:
>  		/* SMBASE is usually relocated above 1M on modern chipsets,
>  		 * and SMM handlers might indeed rely on 4G segment limits,
Michael S. Tsirkin March 14, 2017, 3:34 p.m. UTC | #2
On Tue, Mar 14, 2017 at 02:58:24PM +0100, Radim Krčmář wrote:
> 2017-03-14 01:44+0200, Michael S. Tsirkin:
> > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> > @@ -4135,11 +4135,11 @@ available, means that that the kernel can support guests using the
> >  radix MMU defined in Power ISA V3.00 (as implemented in the POWER9
> >  processor).
> >  
> > -8.4 KVM_CAP_PPC_HASH_MMU_V3
> 
> This patch should not remove the PPC capability from docs.
> 
> (The right name is KVM_CAP_PPC_HASH_V3, but that is for another patch.)

Oops my bad. If you do decide you want me to respin because of this,
pls let me know.

> > +8.5 KVM_CAP_X86_GUEST_MWAIT
> >  
> > -Architectures: ppc
> > +Architectures: x86
> >  
> > -This capability, if KVM_CHECK_EXTENSION indicates that it is
> > -available, means that that the kernel can support guests using the
> > -hashed page table MMU defined in Power ISA V3.00 (as implemented in
> > -the POWER9 processor), including in-memory segment tables.
> > +This capability indicates that a guest using memory monitoring instructions
> > +(MWAIT/MWAITX) to stop the virtual CPU will not cause a VM exit.  As such, time
> > +spent while the virtual CPU is halted in this way will be accounted for as
> > +guest running time on the host (as opposed to e.g. HLT).
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > @@ -2684,6 +2684,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> >  	case KVM_CAP_ADJUST_CLOCK:
> >  		r = KVM_CLOCK_TSC_STABLE;
> >  		break;
> > +	case KVM_CAP_X86_GUEST_MWAIT:
> > +		r = !!this_cpu_has(X86_FEATURE_MWAIT);
> 
> this_cpu_has already returns bool, so !! is not needed.
> 
> I can fix both while applying.

OK, pls let me know if you need any more.

> > +		break;
> >  	case KVM_CAP_X86_SMM:
> >  		/* SMBASE is usually relocated above 1M on modern chipsets,
> >  		 * and SMM handlers might indeed rely on 4G segment limits,

Patch

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 0694509..c7beb07 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -4135,11 +4135,11 @@  available, means that that the kernel can support guests using the
 radix MMU defined in Power ISA V3.00 (as implemented in the POWER9
 processor).
 
-8.4 KVM_CAP_PPC_HASH_MMU_V3
+8.5 KVM_CAP_X86_GUEST_MWAIT
 
-Architectures: ppc
+Architectures: x86
 
-This capability, if KVM_CHECK_EXTENSION indicates that it is
-available, means that that the kernel can support guests using the
-hashed page table MMU defined in Power ISA V3.00 (as implemented in
-the POWER9 processor), including in-memory segment tables.
+This capability indicates that a guest using memory monitoring instructions
+(MWAIT/MWAITX) to stop the virtual CPU will not cause a VM exit.  As such, time
+spent while the virtual CPU is halted in this way will be accounted for as
+guest running time on the host (as opposed to e.g. HLT).
diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
index 3c65feb..04c201c 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -54,6 +54,12 @@  KVM_FEATURE_PV_UNHALT              ||     7 || guest checks this feature bit
                                    ||       || before enabling paravirtualized
                                    ||       || spinlock support.
 ------------------------------------------------------------------------------
+KVM_FEATURE_MWAIT                  ||     8 || guest can use monitor/mwait
+                                   ||       || to halt the VCPU without exits,
+                                   ||       || time spent while halted in this
+                                   ||       || way is accounted for on host as
+                                   ||       || VCPU run time.
+------------------------------------------------------------------------------
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no guest-side
                                    ||       || per-cpu warps are expected in
                                    ||       || kvmclock.
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index cff0bb6..9cc77a7 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -24,6 +24,7 @@ 
 #define KVM_FEATURE_STEAL_TIME		5
 #define KVM_FEATURE_PV_EOI		6
 #define KVM_FEATURE_PV_UNHALT		7
+#define KVM_FEATURE_MWAIT		8
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index efde6cc..3c7fca83 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -594,6 +594,9 @@  static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		if (sched_info_on())
 			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
 
+		if (this_cpu_has(X86_FEATURE_MWAIT))
+			entry->eax |= (1 << KVM_FEATURE_MWAIT);
+
 		entry->ebx = 0;
 		entry->ecx = 0;
 		entry->edx = 0;
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index d1efe2c..18e53bc 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1198,8 +1198,6 @@  static void init_vmcb(struct vcpu_svm *svm)
 	set_intercept(svm, INTERCEPT_CLGI);
 	set_intercept(svm, INTERCEPT_SKINIT);
 	set_intercept(svm, INTERCEPT_WBINVD);
-	set_intercept(svm, INTERCEPT_MONITOR);
-	set_intercept(svm, INTERCEPT_MWAIT);
 	set_intercept(svm, INTERCEPT_XSETBV);
 
 	control->iopm_base_pa = iopm_base;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4bfe349..b167aba 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3547,13 +3547,9 @@  static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 	      CPU_BASED_USE_IO_BITMAPS |
 	      CPU_BASED_MOV_DR_EXITING |
 	      CPU_BASED_USE_TSC_OFFSETING |
-	      CPU_BASED_MWAIT_EXITING |
-	      CPU_BASED_MONITOR_EXITING |
 	      CPU_BASED_INVLPG_EXITING |
 	      CPU_BASED_RDPMC_EXITING;
 
-	printk(KERN_ERR "cleared CPU_BASED_MWAIT_EXITING + CPU_BASED_MONITOR_EXITING\n");
-
 	opt = CPU_BASED_TPR_SHADOW |
 	      CPU_BASED_USE_MSR_BITMAPS |
 	      CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1faf620..a507635 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2684,6 +2684,9 @@  int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_ADJUST_CLOCK:
 		r = KVM_CLOCK_TSC_STABLE;
 		break;
+	case KVM_CAP_X86_GUEST_MWAIT:
+		r = !!this_cpu_has(X86_FEATURE_MWAIT);
+		break;
 	case KVM_CAP_X86_SMM:
 		/* SMBASE is usually relocated above 1M on modern chipsets,
 		 * and SMM handlers might indeed rely on 4G segment limits,
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index f51d508..8b6bc06 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -883,6 +883,7 @@  struct kvm_ppc_resize_hpt {
 #define KVM_CAP_PPC_MMU_RADIX 134
 #define KVM_CAP_PPC_MMU_HASH_V3 135
 #define KVM_CAP_IMMEDIATE_EXIT 136
+#define KVM_CAP_X86_GUEST_MWAIT 137
 
 #ifdef KVM_CAP_IRQ_ROUTING