From patchwork Wed Mar 15 19:28:43 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Michael S. Tsirkin"
X-Patchwork-Id: 9626537
Date: Wed, 15 Mar 2017 21:28:43 +0200
From: "Michael S. Tsirkin"
To: linux-kernel@vger.kernel.org
Cc: "Gabriel L. Somlo", Paolo Bonzini, Radim Krčmář, Jonathan Corbet,
	Thomas Gleixner, Ingo Molnar, "H. Peter Anvin", x86@kernel.org,
	Joerg Roedel, kvm@vger.kernel.org, linux-doc@vger.kernel.org
Subject: [PATCH v4] kvm: better MWAIT emulation for guests
Message-ID: <1489605443-21045-1-git-send-email-mst@redhat.com>
Content-Disposition: inline
X-Mailer: git-send-email 2.8.0.287.g0deeb61
Sender: kvm-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org

Guests running Mac OS X 10.5, 10.6, and 10.7 (Leopard through Lion) have
a problem: unless explicitly given the kernel command line argument
"idlehalt=0", they implicitly assume MONITOR and MWAIT are available,
without checking CPUID.

We currently emulate both as a NOP, but on VMX we can do better: let the
guest stop the CPU until a timer, an IPI or a memory change wakes it.
The host CPU stays busy running the guest, but that isn't any worse than
NOP emulation.

Note that mwait within guests is not the same as on real hardware,
because halt causes an exit while mwait doesn't.  For this reason it
might not be a good idea to use the regular MWAIT flag in CPUID to
signal this capability.  Add a flag in the hypervisor leaf instead.

Additionally, add a capability for host userspace such as QEMU: e.g. if
it knows there's an isolated CPU dedicated to the VCPU, it can set the
standard MWAIT flag to improve guest behaviour.

Reported-by: "Gabriel L. Somlo"
Signed-off-by: Michael S. Tsirkin
---

Note: SVM bits are untested at this point. Seems pretty obvious though.
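
For reference, the userspace side of the new capability could be probed
roughly as below. This is only a sketch, not part of the patch: the helper
name host_allows_guest_mwait() is made up for illustration, and the fallback
value 137 is simply the number this patch assigns (an installed uapi header
that already defines the constant takes precedence).

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/kvm.h>

/*
 * Assumption: headers that predate this patch lack the constant, so fall
 * back to the value assigned in this patch.
 */
#ifndef KVM_CAP_X86_GUEST_MWAIT
#define KVM_CAP_X86_GUEST_MWAIT 137
#endif

/*
 * Hypothetical helper: returns 1 if the host reports the capability,
 * 0 if it does not, and -1 if /dev/kvm could not be opened or the
 * ioctl failed.
 */
static int host_allows_guest_mwait(void)
{
	int fd = open("/dev/kvm", O_RDWR);
	int r;

	if (fd < 0)
		return -1;

	/* KVM_CHECK_EXTENSION returns 0 for unknown or absent capabilities. */
	r = ioctl(fd, KVM_CHECK_EXTENSION, KVM_CAP_X86_GUEST_MWAIT);
	close(fd);

	if (r < 0)
		return -1;
	return r > 0;
}
```

A VMM would presumably only consider exposing the standard MWAIT CPUID flag
to the guest when this returns 1 and the VCPU has a dedicated host CPU.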

changes from v3:
- don't enable capability if cli+mwait blocks interrupts
- doc typo fixes (drop ppc doc)

changes from v2:
- add a capability to allow host userspace to detect new kernels
- more documentation to clarify the semantics of the feature flag
  and why it's useful
- svm support as suggested by Radim

changes from v1:
- typo fix resulting in rest of leaf flags being overwritten
  Reported by: Wanpeng Li
- updated commit log with data about guests helped by this feature
- better document differences between mwait and halt for guests

 Documentation/virtual/kvm/api.txt    |  9 +++++++++
 Documentation/virtual/kvm/cpuid.txt  |  6 ++++++
 arch/x86/include/uapi/asm/kvm_para.h |  1 +
 arch/x86/kvm/cpuid.c                 |  3 +++
 arch/x86/kvm/svm.c                   |  2 --
 arch/x86/kvm/vmx.c                   |  6 ++++--
 arch/x86/kvm/x86.c                   |  3 +++
 arch/x86/kvm/x86.h                   | 25 +++++++++++++++++++++++++
 include/uapi/linux/kvm.h             |  1 +
 9 files changed, 52 insertions(+), 4 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 3c248f7..6ee2e43 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -4147,3 +4147,12 @@ This capability, if KVM_CHECK_EXTENSION indicates that it is
 available, means that that the kernel can support guests using the
 hashed page table MMU defined in Power ISA V3.00 (as implemented in
 the POWER9 processor), including in-memory segment tables.
+
+8.5 KVM_CAP_X86_GUEST_MWAIT
+
+Architectures: x86
+
+This capability indicates that a guest using memory monitoring instructions
+(MWAIT/MWAITX) to stop the virtual CPU will not cause a VM exit.  As such, time
+spent while the virtual CPU is halted in this way will be accounted for as
+guest running time on the host (as opposed to e.g. HLT).
diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
index 3c65feb..04c201c 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -54,6 +54,12 @@ KVM_FEATURE_PV_UNHALT              ||     7 || guest checks this feature bit
                                    ||       || before enabling paravirtualized
                                    ||       || spinlock support.
 ------------------------------------------------------------------------------
+KVM_FEATURE_MWAIT                  ||     8 || guest can use monitor/mwait
+                                   ||       || to halt the VCPU without exits,
+                                   ||       || time spent while halted in this
+                                   ||       || way is accounted for on host as
+                                   ||       || VCPU run time.
+------------------------------------------------------------------------------
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no guest-side
                                    ||       || per-cpu warps are expected in
                                    ||       || kvmclock.
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index cff0bb6..9cc77a7 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -24,6 +24,7 @@
 #define KVM_FEATURE_STEAL_TIME		5
 #define KVM_FEATURE_PV_EOI		6
 #define KVM_FEATURE_PV_UNHALT		7
+#define KVM_FEATURE_MWAIT		8
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index efde6cc..5638102 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -594,6 +594,9 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		if (sched_info_on())
 			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
 
+		if (kvm_mwait_in_guest())
+			entry->eax |= (1 << KVM_FEATURE_MWAIT);
+
 		entry->ebx = 0;
 		entry->ecx = 0;
 		entry->edx = 0;
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index d1efe2c..18e53bc 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1198,8 +1198,6 @@ static void init_vmcb(struct vcpu_svm *svm)
 	set_intercept(svm, INTERCEPT_CLGI);
 	set_intercept(svm, INTERCEPT_SKINIT);
 	set_intercept(svm, INTERCEPT_WBINVD);
-	set_intercept(svm, INTERCEPT_MONITOR);
-	set_intercept(svm, INTERCEPT_MWAIT);
 	set_intercept(svm, INTERCEPT_XSETBV);
 
 	control->iopm_base_pa = iopm_base;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 98e82ee..ea0c96a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3547,11 +3547,13 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 	      CPU_BASED_USE_IO_BITMAPS |
 	      CPU_BASED_MOV_DR_EXITING |
 	      CPU_BASED_USE_TSC_OFFSETING |
-	      CPU_BASED_MWAIT_EXITING |
-	      CPU_BASED_MONITOR_EXITING |
 	      CPU_BASED_INVLPG_EXITING |
 	      CPU_BASED_RDPMC_EXITING;
 
+	if (!kvm_mwait_in_guest())
+		min |= CPU_BASED_MWAIT_EXITING |
+			CPU_BASED_MONITOR_EXITING;
+
 	opt = CPU_BASED_TPR_SHADOW |
 	      CPU_BASED_USE_MSR_BITMAPS |
 	      CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1faf620..8c74fff 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2684,6 +2684,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_ADJUST_CLOCK:
 		r = KVM_CLOCK_TSC_STABLE;
 		break;
+	case KVM_CAP_X86_GUEST_MWAIT:
+		r = kvm_mwait_in_guest();
+		break;
 	case KVM_CAP_X86_SMM:
 		/* SMBASE is usually relocated above 1M on modern chipsets,
 		 * and SMM handlers might indeed rely on 4G segment limits,
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index e8ff3e4..e2f6974 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -1,6 +1,7 @@
 #ifndef ARCH_X86_KVM_X86_H
 #define ARCH_X86_KVM_X86_H
 
+#include
 #include
 #include
 #include "kvm_cache_regs.h"
@@ -212,4 +213,28 @@ static inline u64 nsec_to_cycles(struct kvm_vcpu *vcpu, u64 nsec)
 	__rem;						\
 })
 
+static inline bool kvm_mwait_in_guest(void)
+{
+	unsigned int eax, ebx, ecx, edx;
+
+	if (!cpu_has(&boot_cpu_data, X86_FEATURE_MWAIT))
+		return false;
+
+	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
+		return false;
+
+	/*
+	 * Intel CPUs without CPUID5_ECX_INTERRUPT_BREAK are problematic as
+	 * they would allow guest to stop the CPU completely by disabling
+	 * interrupts then invoking MWAIT.
+	 */
+	if (boot_cpu_data.cpuid_level < CPUID_MWAIT_LEAF)
+		return false;
+
+	cpuid(CPUID_MWAIT_LEAF, &eax, &ebx, &ecx, &edx);
+
+	/* Safe only if interrupts break MWAIT even while disabled. */
+	return ecx & CPUID5_ECX_INTERRUPT_BREAK;
+}
+
 #endif
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index f51d508..8b6bc06 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -883,6 +883,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_PPC_MMU_RADIX 134
 #define KVM_CAP_PPC_MMU_HASH_V3 135
 #define KVM_CAP_IMMEDIATE_EXIT 136
+#define KVM_CAP_X86_GUEST_MWAIT 137
 
 #ifdef KVM_CAP_IRQ_ROUTING
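
The decision made by kvm_mwait_in_guest() above can be exercised outside the
kernel with a pure function over raw CPUID values. The sketch below is only
an illustration under stated assumptions: the function name, its parameters
and the SKETCH_* macros are invented here and are not kernel identifiers.
The bit positions (CPUID.01H:ECX bit 3 for MONITOR/MWAIT, CPUID.05H:ECX
bit 1 for interrupts as a break event while disabled) follow the Intel SDM.

```c
#include <stdbool.h>
#include <stdint.h>

/* Names local to this sketch; bit positions per the Intel SDM. */
#define SKETCH_CPUID1_ECX_MONITOR	(1u << 3)	/* CPUID.01H:ECX MONITOR/MWAIT */
#define SKETCH_CPUID5_ECX_INTR_BREAK	(1u << 1)	/* CPUID.05H:ECX interrupt break */
#define SKETCH_MWAIT_LEAF		5u

/*
 * Hypothetical userspace mirror of the checks in kvm_mwait_in_guest().
 * It takes raw register values so it can be tested without executing CPUID.
 */
static bool mwait_safe_for_guest(bool vendor_is_intel, uint32_t max_basic_leaf,
				 uint32_t leaf1_ecx, uint32_t leaf5_ecx)
{
	/* No MONITOR/MWAIT on the host: nothing to pass through. */
	if (!(leaf1_ecx & SKETCH_CPUID1_ECX_MONITOR))
		return false;

	/* This version of the patch only trusts Intel CPUs. */
	if (!vendor_is_intel)
		return false;

	/* The MWAIT leaf must exist before its ECX can be trusted. */
	if (max_basic_leaf < SKETCH_MWAIT_LEAF)
		return false;

	/* Without this bit a guest could do cli; mwait and stall the CPU. */
	return (leaf5_ecx & SKETCH_CPUID5_ECX_INTR_BREAK) != 0;
}
```

Keeping the decision separate from the CPUID reads makes each early return
easy to cover with a small table of inputs.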