From patchwork Mon Apr 11 09:04:39 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zeng Guang X-Patchwork-Id: 12808764 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 75A98C433EF for ; Mon, 11 Apr 2022 09:36:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1344519AbiDKJi6 (ORCPT ); Mon, 11 Apr 2022 05:38:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36654 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1344530AbiDKJiu (ORCPT ); Mon, 11 Apr 2022 05:38:50 -0400 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C4C7B403DD; Mon, 11 Apr 2022 02:36:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1649669794; x=1681205794; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=+PEUzauE89816wJ5786DV+v1jrWHJ+deIxMV0Fh2UqM=; b=RhCJ2qF0/DvaSwbiWSVhNMX38QFbc27udaMZNHW7ktwPI3PS6oAuHmVZ rJpgFrw6gxhHt0VN8Y5nwnspAygDYbnj7ZW2sUEckoBBv9gfj5xUJugf3 tbE6Ri1Y/obejdInyCSEXia8KoJjfktNc3GxrASnVN33c+rCv3oOAg80d XM/Iv8poekbeoGLzoieTGvMyDpDWHXdHQz6fNL50cMYQ7PnYODLTQ2HQ4 X70ot+qX+Ir5abXsbBYVhVHxeamy5V4MEzE6OWRrdt6sNjCcgREIlqZWo /OAAuw1QhcD2QkKw/fF4zP1BqBLFbuZUNadaw18pk0u2tv10CE5TxzaqL w==; X-IronPort-AV: E=McAfee;i="6400,9594,10313"; a="262255635" X-IronPort-AV: E=Sophos;i="5.90,251,1643702400"; d="scan'208";a="262255635" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Apr 2022 02:36:34 -0700 X-IronPort-AV: E=Sophos;i="5.90,251,1643702400"; d="scan'208";a="572050419" Received: from arthur-vostro-3668.sh.intel.com ([10.239.13.120]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Apr 2022 02:36:28 -0700 From: Zeng Guang To: Paolo Bonzini , Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, Dave Hansen , Tony Luck , Kan Liang , Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H. Peter Anvin" , Kim Phillips , Jarkko Sakkinen , Jethro Beekman , Kai Huang Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Robert Hu , Gao Chao , Zeng Guang , Robert Hoo Subject: [PATCH v8 1/9] x86/cpu: Add new VMX feature, Tertiary VM-Execution control Date: Mon, 11 Apr 2022 17:04:39 +0800 Message-Id: <20220411090447.5928-2-guang.zeng@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220411090447.5928-1-guang.zeng@intel.com> References: <20220411090447.5928-1-guang.zeng@intel.com> Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Robert Hoo A new 64-bit control field "tertiary processor-based VM-execution controls", is defined [1]. It's controlled by bit 17 of the primary processor-based VM-execution controls. Different from its brother VM-execution fields, this tertiary VM- execution controls field is 64 bit. So it occupies 2 vmx_feature_leafs, TERTIARY_CTLS_LOW and TERTIARY_CTLS_HIGH. Its companion VMX capability reporting MSR,MSR_IA32_VMX_PROCBASED_CTLS3 (0x492), is also semantically different from its brothers, whose 64 bits consist of all allow-1, rather than 32-bit allow-0 and 32-bit allow-1 [1][2]. 
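To make the layout difference concrete, here is a hedged sketch of how the two kinds of capability MSR are read (condensed from the feat_ctl.c hunk below; variable names are illustrative):

	u32 ign, supported, low, high;

	/*
	 * Classic control MSRs: bits 31:0 report the allowed-0 settings and
	 * bits 63:32 the allowed-1 settings; only the allowed-1 half says
	 * which controls may be enabled.
	 */
	rdmsr_safe(MSR_IA32_VMX_PROCBASED_CTLS2, &ign, &supported);

	/*
	 * MSR_IA32_VMX_PROCBASED_CTLS3: all 64 bits are allowed-1 settings,
	 * so both halves are capability bits and land in two vmx_feature_leafs.
	 */
	rdmsr_safe(MSR_IA32_VMX_PROCBASED_CTLS3, &low, &high);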
Therefore, its init_vmx_capabilities() is a little different from others. [1] ISE 6.2 "VMCS Changes" https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html [2] SDM Vol3. Appendix A.3 Reviewed-by: Sean Christopherson Reviewed-by: Maxim Levitsky Signed-off-by: Robert Hoo Signed-off-by: Zeng Guang --- arch/x86/include/asm/msr-index.h | 1 + arch/x86/include/asm/vmxfeatures.h | 3 ++- arch/x86/kernel/cpu/feat_ctl.c | 9 ++++++++- 3 files changed, 11 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h index 0eb90d21049e..219a97098cf8 100644 --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -961,6 +961,7 @@ #define MSR_IA32_VMX_TRUE_EXIT_CTLS 0x0000048f #define MSR_IA32_VMX_TRUE_ENTRY_CTLS 0x00000490 #define MSR_IA32_VMX_VMFUNC 0x00000491 +#define MSR_IA32_VMX_PROCBASED_CTLS3 0x00000492 /* VMX_BASIC bits and bitmasks */ #define VMX_BASIC_VMCS_SIZE_SHIFT 32 diff --git a/arch/x86/include/asm/vmxfeatures.h b/arch/x86/include/asm/vmxfeatures.h index d9a74681a77d..ff20776dc83b 100644 --- a/arch/x86/include/asm/vmxfeatures.h +++ b/arch/x86/include/asm/vmxfeatures.h @@ -5,7 +5,7 @@ /* * Defines VMX CPU feature bits */ -#define NVMXINTS 3 /* N 32-bit words worth of info */ +#define NVMXINTS 5 /* N 32-bit words worth of info */ /* * Note: If the comment begins with a quoted string, that string is used @@ -43,6 +43,7 @@ #define VMX_FEATURE_RDTSC_EXITING ( 1*32+ 12) /* "" VM-Exit on RDTSC */ #define VMX_FEATURE_CR3_LOAD_EXITING ( 1*32+ 15) /* "" VM-Exit on writes to CR3 */ #define VMX_FEATURE_CR3_STORE_EXITING ( 1*32+ 16) /* "" VM-Exit on reads from CR3 */ +#define VMX_FEATURE_TERTIARY_CONTROLS ( 1*32+ 17) /* "" Enable Tertiary VM-Execution Controls */ #define VMX_FEATURE_CR8_LOAD_EXITING ( 1*32+ 19) /* "" VM-Exit on writes to CR8 */ #define VMX_FEATURE_CR8_STORE_EXITING ( 1*32+ 20) /* "" VM-Exit on reads from CR8 */ #define VMX_FEATURE_VIRTUAL_TPR ( 1*32+ 21) /* "vtpr" TPR virtualization, a.k.a. TPR shadow */ diff --git a/arch/x86/kernel/cpu/feat_ctl.c b/arch/x86/kernel/cpu/feat_ctl.c index da696eb4821a..993697e71854 100644 --- a/arch/x86/kernel/cpu/feat_ctl.c +++ b/arch/x86/kernel/cpu/feat_ctl.c @@ -15,6 +15,8 @@ enum vmx_feature_leafs { MISC_FEATURES = 0, PRIMARY_CTLS, SECONDARY_CTLS, + TERTIARY_CTLS_LOW, + TERTIARY_CTLS_HIGH, NR_VMX_FEATURE_WORDS, }; @@ -22,7 +24,7 @@ enum vmx_feature_leafs { static void init_vmx_capabilities(struct cpuinfo_x86 *c) { - u32 supported, funcs, ept, vpid, ign; + u32 supported, funcs, ept, vpid, ign, low, high; BUILD_BUG_ON(NVMXINTS != NR_VMX_FEATURE_WORDS); @@ -42,6 +44,11 @@ static void init_vmx_capabilities(struct cpuinfo_x86 *c) rdmsr_safe(MSR_IA32_VMX_PROCBASED_CTLS2, &ign, &supported); c->vmx_capability[SECONDARY_CTLS] = supported; + /* All 64 bits of tertiary controls MSR are allowed-1 settings. 
*/ + rdmsr_safe(MSR_IA32_VMX_PROCBASED_CTLS3, &low, &high); + c->vmx_capability[TERTIARY_CTLS_LOW] = low; + c->vmx_capability[TERTIARY_CTLS_HIGH] = high; + rdmsr(MSR_IA32_VMX_PINBASED_CTLS, ign, supported); rdmsr_safe(MSR_IA32_VMX_VMFUNC, &ign, &funcs); From patchwork Mon Apr 11 09:04:40 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zeng Guang X-Patchwork-Id: 12808765 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 33EA9C433FE for ; Mon, 11 Apr 2022 09:36:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1344549AbiDKJi7 (ORCPT ); Mon, 11 Apr 2022 05:38:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36656 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1344538AbiDKJi4 (ORCPT ); Mon, 11 Apr 2022 05:38:56 -0400 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 88416403D6; Mon, 11 Apr 2022 02:36:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1649669802; x=1681205802; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=wTaXARG76bTc7Xk8fzK/DUZQvwtaVFIYOuxxY4pKMbc=; b=Y33I5cKz1GPdi7jbdSjnN/rxyvI0tFx2bvgeYMSut+cgutGxXip6Oun7 G1grfbpU5UsrNu2Rl6gwVzJSvQDu82oqq1QoMXpSBCe9xrDtQ1VCuSJwh 8NSTs7YREu6pvuogzUdWxsVcetOzHmV3DUtJqvRRSteHx9vFD7ItEgJLl vtFPsn+EdiSPrXEtcrIRn8drt9SNKDimNMeB01Fs236HM+5SSacke+EUx bCJqme5YErrYpZUdIoQmM1dmXj+dY5jvBoNyzMpYztKGJ/FJeALfF/O6L OMkmeirQbHMIZgWZwLkdPjmWYt1nWY7F0iPiIRmIy4s6VQuqoeNbNu6ej w==; X-IronPort-AV: E=McAfee;i="6400,9594,10313"; a="243960561" X-IronPort-AV: E=Sophos;i="5.90,251,1643702400"; d="scan'208";a="243960561" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Apr 2022 02:36:40 -0700 X-IronPort-AV: E=Sophos;i="5.90,251,1643702400"; d="scan'208";a="572050444" Received: from arthur-vostro-3668.sh.intel.com ([10.239.13.120]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Apr 2022 02:36:34 -0700 From: Zeng Guang To: Paolo Bonzini , Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, Dave Hansen , Tony Luck , Kan Liang , Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H. Peter Anvin" , Kim Phillips , Jarkko Sakkinen , Jethro Beekman , Kai Huang Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Robert Hu , Gao Chao , Zeng Guang , Robert Hoo Subject: [PATCH v8 2/9] KVM: VMX: Extend BUILD_CONTROLS_SHADOW macro to support 64-bit variation Date: Mon, 11 Apr 2022 17:04:40 +0800 Message-Id: <20220411090447.5928-3-guang.zeng@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220411090447.5928-1-guang.zeng@intel.com> References: <20220411090447.5928-1-guang.zeng@intel.com> Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Robert Hoo The Tertiary VM-Exec Control, different from previous control fields, is 64 bit. So extend BUILD_CONTROLS_SHADOW() by adding a 'bit' parameter, to support both 32 bit and 64 bit fields' auxiliary functions building. 
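For illustration, once the 'bits' parameter is in place, the 64-bit instantiation added later in this series, BUILD_CONTROLS_SHADOW(tertiary_exec, TERTIARY_VM_EXEC_CONTROL, 64), generates helpers along these lines (a paraphrased sketch of the expansion, not literal preprocessor output):

	static inline void tertiary_exec_controls_set(struct vcpu_vmx *vmx, u64 val)
	{
		/* u##bits becomes u64, vmcs_write##bits becomes vmcs_write64 */
		if (vmx->loaded_vmcs->controls_shadow.tertiary_exec != val) {
			vmcs_write64(TERTIARY_VM_EXEC_CONTROL, val);
			vmx->loaded_vmcs->controls_shadow.tertiary_exec = val;
		}
	}

	static inline u64 tertiary_exec_controls_get(struct vcpu_vmx *vmx)
	{
		return vmx->loaded_vmcs->controls_shadow.tertiary_exec;
	}

	static inline void tertiary_exec_controls_setbit(struct vcpu_vmx *vmx, u64 val)
	{
		tertiary_exec_controls_set(vmx, tertiary_exec_controls_get(vmx) | val);
	}

The existing 32-bit users keep their u32/vmcs_write32 helpers unchanged.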
Suggested-by: Sean Christopherson Reviewed-by: Maxim Levitsky Reviewed-by: Sean Christopherson Signed-off-by: Robert Hoo Signed-off-by: Zeng Guang --- arch/x86/kvm/vmx/vmx.h | 56 +++++++++++++++++++++--------------------- 1 file changed, 28 insertions(+), 28 deletions(-) diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index 9c6bfcd84008..122fdbf85a02 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -455,35 +455,35 @@ static inline u8 vmx_get_rvi(void) return vmcs_read16(GUEST_INTR_STATUS) & 0xff; } -#define BUILD_CONTROLS_SHADOW(lname, uname) \ -static inline void lname##_controls_set(struct vcpu_vmx *vmx, u32 val) \ -{ \ - if (vmx->loaded_vmcs->controls_shadow.lname != val) { \ - vmcs_write32(uname, val); \ - vmx->loaded_vmcs->controls_shadow.lname = val; \ - } \ -} \ -static inline u32 __##lname##_controls_get(struct loaded_vmcs *vmcs) \ -{ \ - return vmcs->controls_shadow.lname; \ -} \ -static inline u32 lname##_controls_get(struct vcpu_vmx *vmx) \ -{ \ - return __##lname##_controls_get(vmx->loaded_vmcs); \ -} \ -static inline void lname##_controls_setbit(struct vcpu_vmx *vmx, u32 val) \ -{ \ - lname##_controls_set(vmx, lname##_controls_get(vmx) | val); \ -} \ -static inline void lname##_controls_clearbit(struct vcpu_vmx *vmx, u32 val) \ -{ \ - lname##_controls_set(vmx, lname##_controls_get(vmx) & ~val); \ +#define BUILD_CONTROLS_SHADOW(lname, uname, bits) \ +static inline void lname##_controls_set(struct vcpu_vmx *vmx, u##bits val) \ +{ \ + if (vmx->loaded_vmcs->controls_shadow.lname != val) { \ + vmcs_write##bits(uname, val); \ + vmx->loaded_vmcs->controls_shadow.lname = val; \ + } \ +} \ +static inline u##bits __##lname##_controls_get(struct loaded_vmcs *vmcs) \ +{ \ + return vmcs->controls_shadow.lname; \ +} \ +static inline u##bits lname##_controls_get(struct vcpu_vmx *vmx) \ +{ \ + return __##lname##_controls_get(vmx->loaded_vmcs); \ +} \ +static inline void lname##_controls_setbit(struct vcpu_vmx *vmx, u##bits val) \ +{ \ + lname##_controls_set(vmx, lname##_controls_get(vmx) | val); \ +} \ +static inline void lname##_controls_clearbit(struct vcpu_vmx *vmx, u##bits val) \ +{ \ + lname##_controls_set(vmx, lname##_controls_get(vmx) & ~val); \ } -BUILD_CONTROLS_SHADOW(vm_entry, VM_ENTRY_CONTROLS) -BUILD_CONTROLS_SHADOW(vm_exit, VM_EXIT_CONTROLS) -BUILD_CONTROLS_SHADOW(pin, PIN_BASED_VM_EXEC_CONTROL) -BUILD_CONTROLS_SHADOW(exec, CPU_BASED_VM_EXEC_CONTROL) -BUILD_CONTROLS_SHADOW(secondary_exec, SECONDARY_VM_EXEC_CONTROL) +BUILD_CONTROLS_SHADOW(vm_entry, VM_ENTRY_CONTROLS, 32) +BUILD_CONTROLS_SHADOW(vm_exit, VM_EXIT_CONTROLS, 32) +BUILD_CONTROLS_SHADOW(pin, PIN_BASED_VM_EXEC_CONTROL, 32) +BUILD_CONTROLS_SHADOW(exec, CPU_BASED_VM_EXEC_CONTROL, 32) +BUILD_CONTROLS_SHADOW(secondary_exec, SECONDARY_VM_EXEC_CONTROL, 32) /* * VMX_REGS_LAZY_LOAD_SET - The set of registers that will be updated in the From patchwork Mon Apr 11 09:04:41 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zeng Guang X-Patchwork-Id: 12808766 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 15BF3C433EF for ; Mon, 11 Apr 2022 09:37:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1344558AbiDKJjY (ORCPT ); Mon, 11 Apr 2022 05:39:24 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37380 
"EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S243605AbiDKJjA (ORCPT ); Mon, 11 Apr 2022 05:39:00 -0400 Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C5349403DC; Mon, 11 Apr 2022 02:36:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1649669806; x=1681205806; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=yWZiD92B1+htNs9qMHPpwottBlxEfplMjGMUJXiwhMQ=; b=Fim3AQnH8Swiefvr7hbk6TjC+pFygHnF6rxlqFUZKNW48UgylOtkxk/U TWCDbaRlwOtLddEdLzdT2K4mjGchImYVPqNCzeZ98wDWJ2Vd0pkNSv44I ZKX9ePDOBpN7jrbmMeRH2+bOlKoN5Vw7+4AOBDAjsNej2ZsvQsighjgiH gV2Qi5K9ZNR5rM3mrEG/LgAZ0HHqvrLVExgaXt9wsHQ/y0yJnH9s7ZlyD ji+zCVgUHwPW1S5HYdKojCJ4Mq0CrzN0lHQMdbxQDRGKtI95VZUHnYj/V LGl29DBh6tU6o44PXtmWyTYS9tZCfF/LOvcgRR2cK4NozLQb2gtL/pUsO g==; X-IronPort-AV: E=McAfee;i="6400,9594,10313"; a="249358003" X-IronPort-AV: E=Sophos;i="5.90,251,1643702400"; d="scan'208";a="249358003" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Apr 2022 02:36:46 -0700 X-IronPort-AV: E=Sophos;i="5.90,251,1643702400"; d="scan'208";a="572050476" Received: from arthur-vostro-3668.sh.intel.com ([10.239.13.120]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Apr 2022 02:36:40 -0700 From: Zeng Guang To: Paolo Bonzini , Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, Dave Hansen , Tony Luck , Kan Liang , Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H. Peter Anvin" , Kim Phillips , Jarkko Sakkinen , Jethro Beekman , Kai Huang Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Robert Hu , Gao Chao , Zeng Guang , Robert Hoo Subject: [PATCH v8 3/9] KVM: VMX: Detect Tertiary VM-Execution control when setup VMCS config Date: Mon, 11 Apr 2022 17:04:41 +0800 Message-Id: <20220411090447.5928-4-guang.zeng@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220411090447.5928-1-guang.zeng@intel.com> References: <20220411090447.5928-1-guang.zeng@intel.com> Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Robert Hoo Check VMX features on tertiary execution control in VMCS config setup. Sub-features in tertiary execution control to be enabled are adjusted according to hardware capabilities although no sub-feature is enabled in this patch. EVMCSv1 doesn't support tertiary VM-execution control, so disable it when EVMCSv1 is in use. And define the auxiliary functions for Tertiary control field here, using the new BUILD_CONTROLS_SHADOW(). 
Reviewed-by: Maxim Levitsky Signed-off-by: Robert Hoo Signed-off-by: Zeng Guang --- arch/x86/include/asm/vmx.h | 3 +++ arch/x86/kvm/vmx/capabilities.h | 7 +++++++ arch/x86/kvm/vmx/evmcs.c | 2 ++ arch/x86/kvm/vmx/evmcs.h | 1 + arch/x86/kvm/vmx/vmcs.h | 1 + arch/x86/kvm/vmx/vmx.c | 29 ++++++++++++++++++++++++++++- arch/x86/kvm/vmx/vmx.h | 1 + 7 files changed, 43 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index 0ffaa3156a4e..8c929596a299 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -31,6 +31,7 @@ #define CPU_BASED_RDTSC_EXITING VMCS_CONTROL_BIT(RDTSC_EXITING) #define CPU_BASED_CR3_LOAD_EXITING VMCS_CONTROL_BIT(CR3_LOAD_EXITING) #define CPU_BASED_CR3_STORE_EXITING VMCS_CONTROL_BIT(CR3_STORE_EXITING) +#define CPU_BASED_ACTIVATE_TERTIARY_CONTROLS VMCS_CONTROL_BIT(TERTIARY_CONTROLS) #define CPU_BASED_CR8_LOAD_EXITING VMCS_CONTROL_BIT(CR8_LOAD_EXITING) #define CPU_BASED_CR8_STORE_EXITING VMCS_CONTROL_BIT(CR8_STORE_EXITING) #define CPU_BASED_TPR_SHADOW VMCS_CONTROL_BIT(VIRTUAL_TPR) @@ -221,6 +222,8 @@ enum vmcs_field { ENCLS_EXITING_BITMAP_HIGH = 0x0000202F, TSC_MULTIPLIER = 0x00002032, TSC_MULTIPLIER_HIGH = 0x00002033, + TERTIARY_VM_EXEC_CONTROL = 0x00002034, + TERTIARY_VM_EXEC_CONTROL_HIGH = 0x00002035, GUEST_PHYSICAL_ADDRESS = 0x00002400, GUEST_PHYSICAL_ADDRESS_HIGH = 0x00002401, VMCS_LINK_POINTER = 0x00002800, diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h index 3f430e218375..31f3d88b3e4d 100644 --- a/arch/x86/kvm/vmx/capabilities.h +++ b/arch/x86/kvm/vmx/capabilities.h @@ -59,6 +59,7 @@ struct vmcs_config { u32 pin_based_exec_ctrl; u32 cpu_based_exec_ctrl; u32 cpu_based_2nd_exec_ctrl; + u64 cpu_based_3rd_exec_ctrl; u32 vmexit_ctrl; u32 vmentry_ctrl; struct nested_vmx_msrs nested; @@ -131,6 +132,12 @@ static inline bool cpu_has_secondary_exec_ctrls(void) CPU_BASED_ACTIVATE_SECONDARY_CONTROLS; } +static inline bool cpu_has_tertiary_exec_ctrls(void) +{ + return vmcs_config.cpu_based_exec_ctrl & + CPU_BASED_ACTIVATE_TERTIARY_CONTROLS; +} + static inline bool cpu_has_vmx_virtualize_apic_accesses(void) { return vmcs_config.cpu_based_2nd_exec_ctrl & diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c index 87e3dc10edf4..6a61b1ae7942 100644 --- a/arch/x86/kvm/vmx/evmcs.c +++ b/arch/x86/kvm/vmx/evmcs.c @@ -297,8 +297,10 @@ const unsigned int nr_evmcs_1_fields = ARRAY_SIZE(vmcs_field_to_evmcs_1); #if IS_ENABLED(CONFIG_HYPERV) __init void evmcs_sanitize_exec_ctrls(struct vmcs_config *vmcs_conf) { + vmcs_conf->cpu_based_exec_ctrl &= ~EVMCS1_UNSUPPORTED_EXEC_CTRL; vmcs_conf->pin_based_exec_ctrl &= ~EVMCS1_UNSUPPORTED_PINCTRL; vmcs_conf->cpu_based_2nd_exec_ctrl &= ~EVMCS1_UNSUPPORTED_2NDEXEC; + vmcs_conf->cpu_based_3rd_exec_ctrl = 0; vmcs_conf->vmexit_ctrl &= ~EVMCS1_UNSUPPORTED_VMEXIT_CTRL; vmcs_conf->vmentry_ctrl &= ~EVMCS1_UNSUPPORTED_VMENTRY_CTRL; diff --git a/arch/x86/kvm/vmx/evmcs.h b/arch/x86/kvm/vmx/evmcs.h index 8d70f9aea94b..f886a8ff0342 100644 --- a/arch/x86/kvm/vmx/evmcs.h +++ b/arch/x86/kvm/vmx/evmcs.h @@ -50,6 +50,7 @@ DECLARE_STATIC_KEY_FALSE(enable_evmcs); */ #define EVMCS1_UNSUPPORTED_PINCTRL (PIN_BASED_POSTED_INTR | \ PIN_BASED_VMX_PREEMPTION_TIMER) +#define EVMCS1_UNSUPPORTED_EXEC_CTRL (CPU_BASED_ACTIVATE_TERTIARY_CONTROLS) #define EVMCS1_UNSUPPORTED_2NDEXEC \ (SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY | \ SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES | \ diff --git a/arch/x86/kvm/vmx/vmcs.h b/arch/x86/kvm/vmx/vmcs.h index e325c290a816..e18dc68eeeeb 100644 --- 
a/arch/x86/kvm/vmx/vmcs.h +++ b/arch/x86/kvm/vmx/vmcs.h @@ -50,6 +50,7 @@ struct vmcs_controls_shadow { u32 pin; u32 exec; u32 secondary_exec; + u64 tertiary_exec; }; /* diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 04d170c4b61e..961e61044341 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -2410,6 +2410,15 @@ static __init int adjust_vmx_controls(u32 ctl_min, u32 ctl_opt, return 0; } +static __init u64 adjust_vmx_controls64(u64 ctl_opt, u32 msr) +{ + u64 allowed; + + rdmsrl(msr, allowed); + + return ctl_opt & allowed; +} + static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf, struct vmx_capability *vmx_cap) { @@ -2418,6 +2427,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf, u32 _pin_based_exec_control = 0; u32 _cpu_based_exec_control = 0; u32 _cpu_based_2nd_exec_control = 0; + u64 _cpu_based_3rd_exec_control = 0; u32 _vmexit_control = 0; u32 _vmentry_control = 0; @@ -2439,7 +2449,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf, opt = CPU_BASED_TPR_SHADOW | CPU_BASED_USE_MSR_BITMAPS | - CPU_BASED_ACTIVATE_SECONDARY_CONTROLS; + CPU_BASED_ACTIVATE_SECONDARY_CONTROLS | + CPU_BASED_ACTIVATE_TERTIARY_CONTROLS; if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_PROCBASED_CTLS, &_cpu_based_exec_control) < 0) return -EIO; @@ -2513,6 +2524,13 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf, "1-setting enable VPID VM-execution control\n"); } + if (_cpu_based_exec_control & CPU_BASED_ACTIVATE_TERTIARY_CONTROLS) { + u64 opt3 = 0; + + _cpu_based_3rd_exec_control = adjust_vmx_controls64(opt3, + MSR_IA32_VMX_PROCBASED_CTLS3); + } + min = VM_EXIT_SAVE_DEBUG_CONTROLS | VM_EXIT_ACK_INTR_ON_EXIT; #ifdef CONFIG_X86_64 min |= VM_EXIT_HOST_ADDR_SPACE_SIZE; @@ -2599,6 +2617,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf, vmcs_conf->pin_based_exec_ctrl = _pin_based_exec_control; vmcs_conf->cpu_based_exec_ctrl = _cpu_based_exec_control; vmcs_conf->cpu_based_2nd_exec_ctrl = _cpu_based_2nd_exec_control; + vmcs_conf->cpu_based_3rd_exec_ctrl = _cpu_based_3rd_exec_control; vmcs_conf->vmexit_ctrl = _vmexit_control; vmcs_conf->vmentry_ctrl = _vmentry_control; @@ -4215,6 +4234,11 @@ static u32 vmx_exec_control(struct vcpu_vmx *vmx) return exec_control; } +static u64 vmx_tertiary_exec_control(struct vcpu_vmx *vmx) +{ + return vmcs_config.cpu_based_3rd_exec_ctrl; +} + /* * Adjust a single secondary execution control bit to intercept/allow an * instruction in the guest. 
This is usually done based on whether or not a @@ -4380,6 +4404,9 @@ static void init_vmcs(struct vcpu_vmx *vmx) if (cpu_has_secondary_exec_ctrls()) secondary_exec_controls_set(vmx, vmx_secondary_exec_control(vmx)); + if (cpu_has_tertiary_exec_ctrls()) + tertiary_exec_controls_set(vmx, vmx_tertiary_exec_control(vmx)); + if (kvm_vcpu_apicv_active(&vmx->vcpu)) { vmcs_write64(EOI_EXIT_BITMAP0, 0); vmcs_write64(EOI_EXIT_BITMAP1, 0); diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index 122fdbf85a02..85c067f2d7f2 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -484,6 +484,7 @@ BUILD_CONTROLS_SHADOW(vm_exit, VM_EXIT_CONTROLS, 32) BUILD_CONTROLS_SHADOW(pin, PIN_BASED_VM_EXEC_CONTROL, 32) BUILD_CONTROLS_SHADOW(exec, CPU_BASED_VM_EXEC_CONTROL, 32) BUILD_CONTROLS_SHADOW(secondary_exec, SECONDARY_VM_EXEC_CONTROL, 32) +BUILD_CONTROLS_SHADOW(tertiary_exec, TERTIARY_VM_EXEC_CONTROL, 64) /* * VMX_REGS_LAZY_LOAD_SET - The set of registers that will be updated in the From patchwork Mon Apr 11 09:04:42 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zeng Guang X-Patchwork-Id: 12808767 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 90299C433F5 for ; Mon, 11 Apr 2022 09:37:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243605AbiDKJj0 (ORCPT ); Mon, 11 Apr 2022 05:39:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37486 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1344580AbiDKJjW (ORCPT ); Mon, 11 Apr 2022 05:39:22 -0400 Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D8D91403F4; Mon, 11 Apr 2022 02:36:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1649669812; x=1681205812; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=1FfTghodE/lp1El9D6x6y9QyfhA09HoGqoypZzcTnLQ=; b=iCb4GzRist2FdOYlDLFyChvR074jP3KG0FBaNZoshgaqtyolEzFJgjSG mkNxBQkzqecGgpacY/lMbPJZJT2Zd4UWS1BYVFaaCHxRTXiwoPNFocFmX 6Cv5fZ1fJlFsg03cnZqvnhhFdxWIF9V/qeHHnke/WmcL6XYn7/tM5O+5i 9egbUI86RxyzR48UkvdbtaCkOvFmL16eYCzmnSQt8g79v0cS22qaX4Ibv RpR5xQM864Ek+qYz2e7Yl4by+q0rsPRWTOpuAv1sPwxrBU9+T4L3dThu0 JZ1XSsmQGebr9fAkT7vC/E4GCB6U99CjyT2lZ1tq45w4tfgjNJJgORXBM Q==; X-IronPort-AV: E=McAfee;i="6400,9594,10313"; a="260923467" X-IronPort-AV: E=Sophos;i="5.90,251,1643702400"; d="scan'208";a="260923467" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Apr 2022 02:36:52 -0700 X-IronPort-AV: E=Sophos;i="5.90,251,1643702400"; d="scan'208";a="572050495" Received: from arthur-vostro-3668.sh.intel.com ([10.239.13.120]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Apr 2022 02:36:46 -0700 From: Zeng Guang To: Paolo Bonzini , Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, Dave Hansen , Tony Luck , Kan Liang , Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H. 
Peter Anvin" , Kim Phillips , Jarkko Sakkinen , Jethro Beekman , Kai Huang Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Robert Hu , Gao Chao , Zeng Guang , Robert Hoo Subject: [PATCH v8 4/9] KVM: VMX: Report tertiary_exec_control field in dump_vmcs() Date: Mon, 11 Apr 2022 17:04:42 +0800 Message-Id: <20220411090447.5928-5-guang.zeng@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220411090447.5928-1-guang.zeng@intel.com> References: <20220411090447.5928-1-guang.zeng@intel.com> Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Robert Hoo Add tertiary_exec_control field report in dump_vmcs(). Meanwhile, reorganize the dump output of VMCS category as follows. Before change: *** Control State *** PinBased=0x000000ff CPUBased=0xb5a26dfa SecondaryExec=0x061037eb EntryControls=0000d1ff ExitControls=002befff After change: *** Control State *** CPUBased=0xb5a26dfa SecondaryExec=0x061037eb TertiaryExec=0x0000000000000010 PinBased=0x000000ff EntryControls=0000d1ff ExitControls=002befff Reviewed-by: Maxim Levitsky Signed-off-by: Robert Hoo Signed-off-by: Zeng Guang --- arch/x86/kvm/vmx/vmx.c | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 961e61044341..f439abd52bad 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -5867,6 +5867,7 @@ void dump_vmcs(struct kvm_vcpu *vcpu) struct vcpu_vmx *vmx = to_vmx(vcpu); u32 vmentry_ctl, vmexit_ctl; u32 cpu_based_exec_ctrl, pin_based_exec_ctrl, secondary_exec_control; + u64 tertiary_exec_control; unsigned long cr4; int efer_slot; @@ -5880,9 +5881,16 @@ void dump_vmcs(struct kvm_vcpu *vcpu) cpu_based_exec_ctrl = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL); pin_based_exec_ctrl = vmcs_read32(PIN_BASED_VM_EXEC_CONTROL); cr4 = vmcs_readl(GUEST_CR4); - secondary_exec_control = 0; + if (cpu_has_secondary_exec_ctrls()) secondary_exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL); + else + secondary_exec_control = 0; + + if (cpu_has_tertiary_exec_ctrls()) + tertiary_exec_control = vmcs_read64(TERTIARY_VM_EXEC_CONTROL); + else + tertiary_exec_control = 0; pr_err("VMCS %p, last attempted VM-entry on CPU %d\n", vmx->loaded_vmcs->vmcs, vcpu->arch.last_vmentry_cpu); @@ -5982,9 +5990,10 @@ void dump_vmcs(struct kvm_vcpu *vcpu) vmx_dump_msrs("host autoload", &vmx->msr_autoload.host); pr_err("*** Control State ***\n"); - pr_err("PinBased=%08x CPUBased=%08x SecondaryExec=%08x\n", - pin_based_exec_ctrl, cpu_based_exec_ctrl, secondary_exec_control); - pr_err("EntryControls=%08x ExitControls=%08x\n", vmentry_ctl, vmexit_ctl); + pr_err("CPUBased=0x%08x SecondaryExec=0x%08x TertiaryExec=0x%016llx\n", + cpu_based_exec_ctrl, secondary_exec_control, tertiary_exec_control); + pr_err("PinBased=0x%08x EntryControls=%08x ExitControls=%08x\n", + pin_based_exec_ctrl, vmentry_ctl, vmexit_ctl); pr_err("ExceptionBitmap=%08x PFECmask=%08x PFECmatch=%08x\n", vmcs_read32(EXCEPTION_BITMAP), vmcs_read32(PAGE_FAULT_ERROR_CODE_MASK), From patchwork Mon Apr 11 09:04:43 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zeng Guang X-Patchwork-Id: 12808768 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 62155C433FE for ; Mon, 11 Apr 2022 09:37:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via 
listexpand id S1344575AbiDKJj1 (ORCPT ); Mon, 11 Apr 2022 05:39:27 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38192 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1344529AbiDKJjW (ORCPT ); Mon, 11 Apr 2022 05:39:22 -0400 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 585B9403FC; Mon, 11 Apr 2022 02:36:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1649669818; x=1681205818; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=RSripfRDPX6q0oBFdAW/Me0piSvmrss3ymZMZey5RrI=; b=f4xp2pqaPyrSRFvqXgQHk6aICo+JkFu5BB6H6TaO9tScgtag7YwfW7u2 OU/CUe/nEA/CgKuSflIPxO2utD13OEf5dBYMZxX+0/J0jsPL70OZJCt/L ZylIFpx9fgEgPE+a1mQi6RlvHcqOEA5QDoc6y++4S9Hd0ntTfUDyYVV36 9tl6/M2xoBL/Hrpnk6jwqFGgVJvYi5tXl4IToNbOKd+sqLDe/jTyHMpxI Y2VYlwebAWJp1fF4f04RvQziOVhOafEucfJQKqajBzWVzce7QYYgAoSR6 jnNKCgXfwOLg0AqYV/uJ6iTkX2hPk6tj/biDb3Dw+AFEoeh7XJ7geQUTb w==; X-IronPort-AV: E=McAfee;i="6400,9594,10313"; a="259671443" X-IronPort-AV: E=Sophos;i="5.90,251,1643702400"; d="scan'208";a="259671443" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Apr 2022 02:36:58 -0700 X-IronPort-AV: E=Sophos;i="5.90,251,1643702400"; d="scan'208";a="572050510" Received: from arthur-vostro-3668.sh.intel.com ([10.239.13.120]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Apr 2022 02:36:52 -0700 From: Zeng Guang To: Paolo Bonzini , Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, Dave Hansen , Tony Luck , Kan Liang , Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H. Peter Anvin" , Kim Phillips , Jarkko Sakkinen , Jethro Beekman , Kai Huang Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Robert Hu , Gao Chao , Zeng Guang Subject: [PATCH v8 5/9] KVM: x86: Add support for vICR APIC-write VM-Exits in x2APIC mode Date: Mon, 11 Apr 2022 17:04:43 +0800 Message-Id: <20220411090447.5928-6-guang.zeng@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220411090447.5928-1-guang.zeng@intel.com> References: <20220411090447.5928-1-guang.zeng@intel.com> Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Upcoming Intel CPUs will support virtual x2APIC MSR writes to the vICR, i.e. will trap and generate an APIC-write VM-Exit instead of intercepting the WRMSR. Add support for handling "nodecode" x2APIC writes, which were previously impossible. Note, x2APIC MSR writes are 64 bits wide. 
Signed-off-by: Zeng Guang --- arch/x86/kvm/lapic.c | 24 +++++++++++++++++++++--- 1 file changed, 21 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 66b0eb0bda94..137c3a2f5180 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -67,6 +67,7 @@ static bool lapic_timer_advance_dynamic __read_mostly; #define LAPIC_TIMER_ADVANCE_NS_MAX 5000 /* step-by-step approximation to mitigate fluctuation */ #define LAPIC_TIMER_ADVANCE_ADJUST_STEP 8 +static int kvm_lapic_msr_read(struct kvm_lapic *apic, u32 reg, u64 *data); static inline void __kvm_lapic_set_reg(char *regs, int reg_off, u32 val) { @@ -2230,10 +2231,27 @@ EXPORT_SYMBOL_GPL(kvm_lapic_set_eoi); /* emulate APIC access in a trap manner */ void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset) { - u32 val = kvm_lapic_get_reg(vcpu->arch.apic, offset); + struct kvm_lapic *apic = vcpu->arch.apic; + u64 val; + + if (apic_x2apic_mode(apic)) { + /* + * When guest APIC is in x2APIC mode and IPI virtualization + * is enabled, accessing APIC_ICR may cause trap-like VM-exit + * on Intel hardware. Other offsets are not possible. + */ + if (WARN_ON_ONCE(offset != APIC_ICR)) + return; - /* TODO: optimize to just emulate side effect w/o one more write */ - kvm_lapic_reg_write(vcpu->arch.apic, offset, val); + kvm_lapic_msr_read(apic, offset, &val); + kvm_apic_send_ipi(apic, (u32)val, (u32)(val >> 32)); + trace_kvm_apic_write(APIC_ICR, val); + } else { + val = kvm_lapic_get_reg(apic, offset); + + /* TODO: optimize to just emulate side effect w/o one more write */ + kvm_lapic_reg_write(apic, offset, (u32)val); + } } EXPORT_SYMBOL_GPL(kvm_apic_write_nodecode); From patchwork Mon Apr 11 09:04:44 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zeng Guang X-Patchwork-Id: 12808769 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 60295C433EF for ; Mon, 11 Apr 2022 09:37:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1344593AbiDKJjb (ORCPT ); Mon, 11 Apr 2022 05:39:31 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37588 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1344552AbiDKJjW (ORCPT ); Mon, 11 Apr 2022 05:39:22 -0400 Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3BA634090F; Mon, 11 Apr 2022 02:37:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1649669824; x=1681205824; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=j7yxrIW5JHTanE/rqGKH77B25ab5eWBeaxexb2VIdDk=; b=RpLbOXieJnlXFBKLCHkIGiU0K/MXMW/cB6tUYLDF4zquBKyMAEHUO7rU TU+5TVJoNaZdAFucFU4VVrndcxzPXMJ4zzKb98QFYIjvYKHt/dTQqJPjY oZdXtJVa6OOrvRaFBSQdhr0yzNWyaqFbzeEqMZ5PY4uYb5TQI2+fy6RHe eqow5XXErVT1/SLQV89ZwJWtSp1qG/KoPSf5FomqG/4qP5FalvwIBCTr5 pGyxG4vlCSfmdhGcIHSYmKNyzXmeZZXnYND7BrrKEysKDaOKfG9W7Y9Gf T18K3nMNfydpqWWCqsLbKbwsZeHBL1ikWrv93dYJFJyzmbhhLTfd0QgoO g==; X-IronPort-AV: E=McAfee;i="6400,9594,10313"; a="260923505" X-IronPort-AV: E=Sophos;i="5.90,251,1643702400"; d="scan'208";a="260923505" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Apr 2022 02:37:04 -0700 
X-IronPort-AV: E=Sophos;i="5.90,251,1643702400"; d="scan'208";a="572050545" Received: from arthur-vostro-3668.sh.intel.com ([10.239.13.120]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Apr 2022 02:36:57 -0700 From: Zeng Guang To: Paolo Bonzini , Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, Dave Hansen , Tony Luck , Kan Liang , Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H. Peter Anvin" , Kim Phillips , Jarkko Sakkinen , Jethro Beekman , Kai Huang Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Robert Hu , Gao Chao , Zeng Guang , Maxim Levitsky Subject: [PATCH v8 6/9] KVM: x86: lapic: don't allow to change APIC ID unconditionally Date: Mon, 11 Apr 2022 17:04:44 +0800 Message-Id: <20220411090447.5928-7-guang.zeng@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220411090447.5928-1-guang.zeng@intel.com> References: <20220411090447.5928-1-guang.zeng@intel.com> Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Maxim Levitsky No normal guest has any reason to change physical APIC IDs, and allowing this introduces bugs into APIC acceleration code. And Intel recent hardware just ignores writes to APIC_ID in xAPIC mode. More background can be found at: https://lore.kernel.org/lkml/Yfw5ddGNOnDqxMLs@google.com/ Looks there is no much value to support writable xAPIC ID in guest except supporting some old and crazy use cases which probably would fail on real hardware. So, make xAPIC ID read-only for KVM guests. Signed-off-by: Maxim Levitsky Signed-off-by: Zeng Guang --- arch/x86/kvm/lapic.c | 25 ++++++++++++++++++------- 1 file changed, 18 insertions(+), 7 deletions(-) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 137c3a2f5180..62d5ce4dc0c5 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -2047,10 +2047,17 @@ static int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val) switch (reg) { case APIC_ID: /* Local APIC ID */ - if (!apic_x2apic_mode(apic)) - kvm_apic_set_xapic_id(apic, val >> 24); - else + if (apic_x2apic_mode(apic)) { ret = 1; + break; + } + /* Don't allow changing APIC ID to avoid unexpected issues */ + if ((val >> 24) != apic->vcpu->vcpu_id) { + kvm_vm_bugged(apic->vcpu->kvm); + break; + } + + kvm_apic_set_xapic_id(apic, val >> 24); break; case APIC_TASKPRI: @@ -2635,11 +2642,15 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu) static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s, bool set) { - if (apic_x2apic_mode(vcpu->arch.apic)) { - u32 *id = (u32 *)(s->regs + APIC_ID); - u32 *ldr = (u32 *)(s->regs + APIC_LDR); - u64 icr; + u32 *id = (u32 *)(s->regs + APIC_ID); + u32 *ldr = (u32 *)(s->regs + APIC_LDR); + u64 icr; + if (!apic_x2apic_mode(vcpu->arch.apic)) { + /* Don't allow changing APIC ID to avoid unexpected issues */ + if ((*id >> 24) != vcpu->vcpu_id) + return -EINVAL; + } else { if (vcpu->kvm->arch.x2apic_format) { if (*id != vcpu->vcpu_id) return -EINVAL; From patchwork Mon Apr 11 09:04:45 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zeng Guang X-Patchwork-Id: 12808772 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E75F3C433F5 for ; Mon, 11 Apr 2022 09:37:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via 
listexpand id S1344604AbiDKJkI (ORCPT ); Mon, 11 Apr 2022 05:40:08 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38788 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1344633AbiDKJjt (ORCPT ); Mon, 11 Apr 2022 05:39:49 -0400 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BBCE740923; Mon, 11 Apr 2022 02:37:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1649669853; x=1681205853; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=wVjlORT4Ee/YTCdDKrpjhYEEMJhlDlPVyc5jVtqX37o=; b=IZ6jW7MC2e6S7MLlR1l0KJdC0Xt1PCVweohDP7hW9dBMBhdLRxOGgO2A KPBHdd+qgPIW8CJR+kIbul2s0zQ1kl5/sodNbATZ60PVgTEBqvp6+CiyA JG6v6sEf9Ba8YtdYpGLMWgkob5bIXP28QUYt/w/U89P3EolvULwCxJ3F7 Y2prGcHNRohyVTx24LpqomfQ/zpxrBHJuMVgs51ZgknS4PBCNsAyx6u+P ARNSjWCcuf2CtyWXoUaCC6cNTnVWy4j7gWvt3X4oY638GrAE8Qre2RNJW 80WaRV9v9wHKNVInnSGRlQsUF7LU5JpWuUZyQf+7G6cn0LO9sfUv3uJEc w==; X-IronPort-AV: E=McAfee;i="6400,9594,10313"; a="259671479" X-IronPort-AV: E=Sophos;i="5.90,251,1643702400"; d="scan'208";a="259671479" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Apr 2022 02:37:09 -0700 X-IronPort-AV: E=Sophos;i="5.90,251,1643702400"; d="scan'208";a="572050567" Received: from arthur-vostro-3668.sh.intel.com ([10.239.13.120]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Apr 2022 02:37:03 -0700 From: Zeng Guang To: Paolo Bonzini , Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, Dave Hansen , Tony Luck , Kan Liang , Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H. Peter Anvin" , Kim Phillips , Jarkko Sakkinen , Jethro Beekman , Kai Huang Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Robert Hu , Gao Chao , Zeng Guang Subject: [PATCH v8 7/9] KVM: Move kvm_arch_vcpu_precreate() under kvm->lock Date: Mon, 11 Apr 2022 17:04:45 +0800 Message-Id: <20220411090447.5928-8-guang.zeng@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220411090447.5928-1-guang.zeng@intel.com> References: <20220411090447.5928-1-guang.zeng@intel.com> Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Arch specific KVM common data may require pre-allocation or other preprocess ready before vCPU creation at runtime. It's safe to invoke kvm_arch_vcpu_precreate() within the protection of kvm->lock directly rather than take into account in the implementation for each architecture. Suggested-by: Sean Christopherson Signed-off-by: Zeng Guang --- arch/s390/kvm/kvm-s390.c | 2 -- virt/kvm/kvm_main.c | 2 +- 2 files changed, 1 insertion(+), 3 deletions(-) diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index 156d1c25a3c1..5c795bbcf1ea 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -3042,9 +3042,7 @@ static int sca_can_add_vcpu(struct kvm *kvm, unsigned int id) if (!sclp.has_esca || !sclp.has_64bscao) return false; - mutex_lock(&kvm->lock); rc = kvm->arch.use_esca ? 
0 : sca_switch_to_extended(kvm); - mutex_unlock(&kvm->lock); return rc == 0 && id < KVM_S390_ESCA_CPU_SLOTS; } diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 70e05af5ebea..a452e678a015 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -3732,9 +3732,9 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id) } kvm->created_vcpus++; + r = kvm_arch_vcpu_precreate(kvm, id); mutex_unlock(&kvm->lock); - r = kvm_arch_vcpu_precreate(kvm, id); if (r) goto vcpu_decrement; From patchwork Mon Apr 11 09:04:46 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zeng Guang X-Patchwork-Id: 12808770 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AB7ECC433F5 for ; Mon, 11 Apr 2022 09:37:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1344552AbiDKJjl (ORCPT ); Mon, 11 Apr 2022 05:39:41 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38414 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1344576AbiDKJjg (ORCPT ); Mon, 11 Apr 2022 05:39:36 -0400 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 83594403E7; Mon, 11 Apr 2022 02:37:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1649669836; x=1681205836; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=Ac+Tj+oOHnMDyhI2NE6QYoSazPkEr2C+i7OWolZUv+E=; b=DBy/3LuSo6LwUrj+IYYXRksJa1LAi5N/ejBlIYgk+hlobRdPWxAFB4lF YM1WD4j/Yt1ZYli7SIAWEtXEhkhq8B1mi3F7iMJj4efUMPIn3eP0L53R1 OwcndjDjGxEwPl9Cq9/eFwqCHOGP6pidxteXf4sFSGKjKx9Bvrvlhmf9i Otjw2yfKNBJGaeKEIGXgSM6EmJuv9bSe1mR87CyNjgylsOPUXx8ncTUVQ Vb8EepWUluiz56CnhfCY6Y+SaDvdy9aSyBDLRZ64GNUA3x/ZHC8SNSFPD hl3esKEuRQDFCpKI8kbH/8K5gIIGvgUxaAoiTBFkS/2fpcegJdHS765ob g==; X-IronPort-AV: E=McAfee;i="6400,9594,10313"; a="243960657" X-IronPort-AV: E=Sophos;i="5.90,251,1643702400"; d="scan'208";a="243960657" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Apr 2022 02:37:15 -0700 X-IronPort-AV: E=Sophos;i="5.90,251,1643702400"; d="scan'208";a="572050604" Received: from arthur-vostro-3668.sh.intel.com ([10.239.13.120]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Apr 2022 02:37:09 -0700 From: Zeng Guang To: Paolo Bonzini , Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, Dave Hansen , Tony Luck , Kan Liang , Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H. Peter Anvin" , Kim Phillips , Jarkko Sakkinen , Jethro Beekman , Kai Huang Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Robert Hu , Gao Chao , Zeng Guang Subject: [PATCH v8 8/9] KVM: x86: Allow userspace set maximum VCPU id for VM Date: Mon, 11 Apr 2022 17:04:46 +0800 Message-Id: <20220411090447.5928-9-guang.zeng@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220411090447.5928-1-guang.zeng@intel.com> References: <20220411090447.5928-1-guang.zeng@intel.com> Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Introduce new max_vcpu_ids in KVM for x86 architecture. 
Userspace can assign maximum possible vcpu id for current VM session using KVM_CAP_MAX_VCPU_ID of KVM_ENABLE_CAP ioctl(). This is done for x86 only because the sole use case is to guide memory allocation for PID-pointer table, a structure needed to enable VMX IPI. By default, max_vcpu_ids set as KVM_MAX_VCPU_IDS. Suggested-by: Sean Christopherson Reviewed-by: Maxim Levitsky Signed-off-by: Zeng Guang --- Documentation/virt/kvm/api.rst | 17 +++++++++++++++++ arch/x86/include/asm/kvm_host.h | 6 ++++++ arch/x86/kvm/x86.c | 18 +++++++++++++++++- 3 files changed, 40 insertions(+), 1 deletion(-) diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index d13fa6600467..bb0b0f3edefe 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -7136,6 +7136,23 @@ The valid bits in cap.args[0] are: IA32_MISC_ENABLE[bit 18] is cleared. =================================== ============================================ +7.32 KVM_CAP_MAX_VCPU_ID +------------------------ + +:Architectures: x86 +:Target: VM +:Parameters: args[0] - maximum APIC ID value set for current VM +:Returns: 0 on success, -EINVAL if args[0] is beyond KVM_MAX_VCPU_IDS + supported in KVM or if vCPU has been created. + +Userspace is able to calculate the limit to APIC ID values from designated CPU +topology. This capability allows userspace to specify maximum possible APIC ID +assigned for current VM session prior to the creation of vCPUs. KVM can manage +memory allocation of VM-scope structures which depends on the value of APIC ID. + +Calling KVM_CHECK_EXTENSION for this capability returns the value of maximum APIC +ID that KVM supports at runtime. It sets as KVM_MAX_VCPU_IDS in VM initialization. + 8. Other capabilities. ====================== diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index d23e80a56eb8..cdd14033988d 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1238,6 +1238,12 @@ struct kvm_arch { hpa_t hv_root_tdp; spinlock_t hv_root_tdp_lock; #endif + /* + * VM-scope maximum vCPU ID. Used to determine the size of structures + * that increase along with the maximum vCPU ID, in which case, using + * the global KVM_MAX_VCPU_IDS may lead to significant memory waste. 
+ */ + u32 max_vcpu_ids; }; struct kvm_vm_stat { diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 0c0ca599a353..d1a39285deab 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4320,7 +4320,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) r = KVM_MAX_VCPUS; break; case KVM_CAP_MAX_VCPU_ID: - r = KVM_MAX_VCPU_IDS; + r = kvm->arch.max_vcpu_ids; break; case KVM_CAP_PV_MMU: /* obsolete */ r = 0; @@ -6064,6 +6064,18 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, } mutex_unlock(&kvm->lock); break; + case KVM_CAP_MAX_VCPU_ID: + r = -EINVAL; + if (cap->args[0] > KVM_MAX_VCPU_IDS) + break; + + mutex_lock(&kvm->lock); + if (!kvm->created_vcpus) { + kvm->arch.max_vcpu_ids = cap->args[0]; + r = 0; + } + mutex_unlock(&kvm->lock); + break; default: r = -EINVAL; break; @@ -11180,6 +11192,9 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu) struct page *page; int r; + if (vcpu->vcpu_id >= vcpu->kvm->arch.max_vcpu_ids) + return -EINVAL; + vcpu->arch.last_vmentry_cpu = -1; vcpu->arch.regs_avail = ~0; vcpu->arch.regs_dirty = ~0; @@ -11704,6 +11719,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) spin_lock_init(&kvm->arch.hv_root_tdp_lock); kvm->arch.hv_root_tdp = INVALID_PAGE; #endif + kvm->arch.max_vcpu_ids = KVM_MAX_VCPU_IDS; INIT_DELAYED_WORK(&kvm->arch.kvmclock_update_work, kvmclock_update_fn); INIT_DELAYED_WORK(&kvm->arch.kvmclock_sync_work, kvmclock_sync_fn); From patchwork Mon Apr 11 09:04:47 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zeng Guang X-Patchwork-Id: 12808771 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3B34BC433FE for ; Mon, 11 Apr 2022 09:37:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1344610AbiDKJjo (ORCPT ); Mon, 11 Apr 2022 05:39:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38482 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1344598AbiDKJji (ORCPT ); Mon, 11 Apr 2022 05:39:38 -0400 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5E3D0403F2; Mon, 11 Apr 2022 02:37:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1649669841; x=1681205841; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=fmhK6zVWoVjfoCK6ujK3oqyqir76HFnfQ4FeZ+ddfDw=; b=W84nqVvCCt+sfEJ+w81LHyFyhclynNMpbq6HQEIOag206DOXqTp3N1tz yAfTYU3EZgetwFgKEG0jb768R5IfvmJxvJbD0KzOY/PaubaBoBCJ7Av4w 3J/PfPmXXP2wlj3B5vp9KGX6DowIxQRceLWjzTFA6ehFLa8yXLZZkUhQD +Fkp9Wwi07RiKhvBbV7m5ukhrX6iTtaiTfvFPHcxgIosgg0EN6k/KpF8z DTuD1WuB/JLCCLJX1Dn+pthAa94wHvOxMMQkolYtJ3uYRWpbP2fiaIgL4 YTxwHId5+v6NQEMbXwMeBVPxPHsChQBzagjYGmxSQlVnSIwuWeJCUA5Ng w==; X-IronPort-AV: E=McAfee;i="6400,9594,10313"; a="243960671" X-IronPort-AV: E=Sophos;i="5.90,251,1643702400"; d="scan'208";a="243960671" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Apr 2022 02:37:20 -0700 X-IronPort-AV: E=Sophos;i="5.90,251,1643702400"; d="scan'208";a="572050638" Received: from arthur-vostro-3668.sh.intel.com ([10.239.13.120]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Apr 2022 02:37:15 -0700 From: Zeng Guang 
To: Paolo Bonzini , Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, Dave Hansen , Tony Luck , Kan Liang , Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H. Peter Anvin" , Kim Phillips , Jarkko Sakkinen , Jethro Beekman , Kai Huang Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Robert Hu , Gao Chao , Zeng Guang Subject: [PATCH v8 9/9] KVM: VMX: enable IPI virtualization Date: Mon, 11 Apr 2022 17:04:47 +0800 Message-Id: <20220411090447.5928-10-guang.zeng@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220411090447.5928-1-guang.zeng@intel.com> References: <20220411090447.5928-1-guang.zeng@intel.com> Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Chao Gao With IPI virtualization enabled, the processor emulates writes to APIC registers that would send IPIs. The processor sets the bit corresponding to the vector in target vCPU's PIR and may send a notification (IPI) specified by NDST and NV fields in target vCPU's Posted-Interrupt Descriptor (PID). It is similar to what IOMMU engine does when dealing with posted interrupt from devices. A PID-pointer table is used by the processor to locate the PID of a vCPU with the vCPU's APIC ID. The table size depends on maximum APIC ID assigned for current VM session from userspace. Allocating memory for PID-pointer table is deferred to vCPU creation, because irqchip mode and VM-scope maximum APIC ID is settled at that point. KVM can skip PID-pointer table allocation if !irqchip_in_kernel(). Like VT-d PI, if a vCPU goes to blocked state, VMM needs to switch its notification vector to wakeup vector. This can ensure that when an IPI for blocked vCPUs arrives, VMM can get control and wake up blocked vCPUs. And if a VCPU is preempted, its posted interrupt notification is suppressed. Note that IPI virtualization can only virualize physical-addressing, flat mode, unicast IPIs. Sending other IPIs would still cause a trap-like APIC-write VM-exit and need to be handled by VMM. 
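Since the PID-pointer table is sized from the VM-scope maximum APIC ID introduced earlier in this series, a VMM that wants to keep that allocation small would cap the vCPU-ID space before creating any vCPU. A hedged userspace sketch (vm_fd is assumed to be a VM file descriptor from KVM_CREATE_VM; the args[0] semantics follow the KVM_CAP_MAX_VCPU_ID patch above):

	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	static int limit_vcpu_ids(int vm_fd, __u64 max_ids)
	{
		struct kvm_enable_cap cap = {
			.cap = KVM_CAP_MAX_VCPU_ID,
			.args = { max_ids },	/* vCPU IDs for this VM must stay below this */
		};

		/* Must be issued before the first KVM_CREATE_VCPU. */
		return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
	}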
Signed-off-by: Chao Gao Signed-off-by: Zeng Guang --- arch/x86/include/asm/kvm-x86-ops.h | 1 + arch/x86/include/asm/kvm_host.h | 1 + arch/x86/include/asm/vmx.h | 8 +++ arch/x86/include/asm/vmxfeatures.h | 2 + arch/x86/kvm/vmx/capabilities.h | 6 ++ arch/x86/kvm/vmx/posted_intr.c | 15 ++++- arch/x86/kvm/vmx/posted_intr.h | 2 + arch/x86/kvm/vmx/vmx.c | 89 ++++++++++++++++++++++++++---- arch/x86/kvm/vmx/vmx.h | 7 +++ arch/x86/kvm/x86.c | 6 +- 10 files changed, 124 insertions(+), 13 deletions(-) diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h index 3c368b639c04..357757e7227f 100644 --- a/arch/x86/include/asm/kvm-x86-ops.h +++ b/arch/x86/include/asm/kvm-x86-ops.h @@ -126,6 +126,7 @@ KVM_X86_OP_OPTIONAL(migrate_timers) KVM_X86_OP(msr_filter_changed) KVM_X86_OP(complete_emulated_msr) KVM_X86_OP(vcpu_deliver_sipi_vector) +KVM_X86_OP_OPTIONAL(alloc_ipiv_pid_table) #undef KVM_X86_OP #undef KVM_X86_OP_OPTIONAL diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index cdd14033988d..6c5ec99b22d5 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1505,6 +1505,7 @@ struct kvm_x86_ops { int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err); void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector); + int (*alloc_ipiv_pid_table)(struct kvm *kvm); }; struct kvm_x86_nested_ops { diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index 8c929596a299..b79b6438acaa 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -76,6 +76,11 @@ #define SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE VMCS_CONTROL_BIT(USR_WAIT_PAUSE) #define SECONDARY_EXEC_BUS_LOCK_DETECTION VMCS_CONTROL_BIT(BUS_LOCK_DETECTION) +/* + * Definitions of Tertiary Processor-Based VM-Execution Controls. 
+ */ +#define TERTIARY_EXEC_IPI_VIRT VMCS_CONTROL_BIT(IPI_VIRT) + #define PIN_BASED_EXT_INTR_MASK VMCS_CONTROL_BIT(INTR_EXITING) #define PIN_BASED_NMI_EXITING VMCS_CONTROL_BIT(NMI_EXITING) #define PIN_BASED_VIRTUAL_NMIS VMCS_CONTROL_BIT(VIRTUAL_NMIS) @@ -159,6 +164,7 @@ static inline int vmx_misc_mseg_revid(u64 vmx_misc) enum vmcs_field { VIRTUAL_PROCESSOR_ID = 0x00000000, POSTED_INTR_NV = 0x00000002, + LAST_PID_POINTER_INDEX = 0x00000008, GUEST_ES_SELECTOR = 0x00000800, GUEST_CS_SELECTOR = 0x00000802, GUEST_SS_SELECTOR = 0x00000804, @@ -224,6 +230,8 @@ enum vmcs_field { TSC_MULTIPLIER_HIGH = 0x00002033, TERTIARY_VM_EXEC_CONTROL = 0x00002034, TERTIARY_VM_EXEC_CONTROL_HIGH = 0x00002035, + PID_POINTER_TABLE = 0x00002042, + PID_POINTER_TABLE_HIGH = 0x00002043, GUEST_PHYSICAL_ADDRESS = 0x00002400, GUEST_PHYSICAL_ADDRESS_HIGH = 0x00002401, VMCS_LINK_POINTER = 0x00002800, diff --git a/arch/x86/include/asm/vmxfeatures.h b/arch/x86/include/asm/vmxfeatures.h index ff20776dc83b..589608c157bf 100644 --- a/arch/x86/include/asm/vmxfeatures.h +++ b/arch/x86/include/asm/vmxfeatures.h @@ -86,4 +86,6 @@ #define VMX_FEATURE_ENCLV_EXITING ( 2*32+ 28) /* "" VM-Exit on ENCLV (leaf dependent) */ #define VMX_FEATURE_BUS_LOCK_DETECTION ( 2*32+ 30) /* "" VM-Exit when bus lock caused */ +/* Tertiary Processor-Based VM-Execution Controls, word 3 */ +#define VMX_FEATURE_IPI_VIRT ( 3*32+ 4) /* Enable IPI virtualization */ #endif /* _ASM_X86_VMXFEATURES_H */ diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h index 31f3d88b3e4d..5f656c9e33be 100644 --- a/arch/x86/kvm/vmx/capabilities.h +++ b/arch/x86/kvm/vmx/capabilities.h @@ -13,6 +13,7 @@ extern bool __read_mostly enable_ept; extern bool __read_mostly enable_unrestricted_guest; extern bool __read_mostly enable_ept_ad_bits; extern bool __read_mostly enable_pml; +extern bool __read_mostly enable_ipiv; extern int __read_mostly pt_mode; #define PT_MODE_SYSTEM 0 @@ -283,6 +284,11 @@ static inline bool cpu_has_vmx_apicv(void) cpu_has_vmx_posted_intr(); } +static inline bool cpu_has_vmx_ipiv(void) +{ + return vmcs_config.cpu_based_3rd_exec_ctrl & TERTIARY_EXEC_IPI_VIRT; +} + static inline bool cpu_has_vmx_flexpriority(void) { return cpu_has_vmx_tpr_shadow() && diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c index 3834bb30ce54..1b12f9cfa280 100644 --- a/arch/x86/kvm/vmx/posted_intr.c +++ b/arch/x86/kvm/vmx/posted_intr.c @@ -177,11 +177,24 @@ static void pi_enable_wakeup_handler(struct kvm_vcpu *vcpu) local_irq_restore(flags); } +static bool vmx_can_use_pi_wakeup(struct kvm_vcpu *vcpu) +{ + /* + * If a blocked vCPU can be the target of posted interrupts, + * switching notification vector is needed so that kernel can + * be informed when an interrupt is posted and get the chance + * to wake up the blocked vCPU. For now, using posted interrupt + * for vCPU wakeup when IPI virtualization or VT-d PI can be + * enabled. 
+        */
+       return vmx_can_use_ipiv(vcpu) || vmx_can_use_vtd_pi(vcpu->kvm);
+}
+
 void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu)
 {
        struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);

-       if (!vmx_can_use_vtd_pi(vcpu->kvm))
+       if (!vmx_can_use_pi_wakeup(vcpu))
                return;

        if (kvm_vcpu_is_blocking(vcpu) && !vmx_interrupt_blocked(vcpu))
diff --git a/arch/x86/kvm/vmx/posted_intr.h b/arch/x86/kvm/vmx/posted_intr.h
index 9a45d5c9f116..26992076552e 100644
--- a/arch/x86/kvm/vmx/posted_intr.h
+++ b/arch/x86/kvm/vmx/posted_intr.h
@@ -5,6 +5,8 @@
 #define POSTED_INTR_ON  0
 #define POSTED_INTR_SN  1

+#define PID_TABLE_ENTRY_VALID   1
+
 /* Posted-Interrupt Descriptor */
 struct pi_desc {
        u32 pir[8];     /* Posted interrupt requested */
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index f439abd52bad..a5ef39e53d51 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -105,6 +105,9 @@ module_param(fasteoi, bool, S_IRUGO);

 module_param(enable_apicv, bool, S_IRUGO);

+bool __read_mostly enable_ipiv = true;
+module_param(enable_ipiv, bool, 0444);
+
 /*
  * If nested=1, nested virtualization is supported, i.e., guests may use
  * VMX and be a hypervisor for its own guests. If nested=0, guests may not
@@ -2525,7 +2528,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
        }

        if (_cpu_based_exec_control & CPU_BASED_ACTIVATE_TERTIARY_CONTROLS) {
-               u64 opt3 = 0;
+               u64 opt3 = TERTIARY_EXEC_IPI_VIRT;

                _cpu_based_3rd_exec_control = adjust_vmx_controls64(opt3,
                                              MSR_IA32_VMX_PROCBASED_CTLS3);
@@ -3872,6 +3875,8 @@ static void vmx_update_msr_bitmap_x2apic(struct kvm_vcpu *vcpu)
                vmx_enable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_TMCCT), MSR_TYPE_RW);
                vmx_disable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_EOI), MSR_TYPE_W);
                vmx_disable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_SELF_IPI), MSR_TYPE_W);
+               if (enable_ipiv)
+                       vmx_disable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_ICR), MSR_TYPE_RW);
        }
 }

@@ -4194,15 +4199,19 @@ static void vmx_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
        struct vcpu_vmx *vmx = to_vmx(vcpu);

        pin_controls_set(vmx, vmx_pin_based_exec_ctrl(vmx));
-       if (cpu_has_secondary_exec_ctrls()) {
-               if (kvm_vcpu_apicv_active(vcpu))
-                       secondary_exec_controls_setbit(vmx,
-                                      SECONDARY_EXEC_APIC_REGISTER_VIRT |
-                                      SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
-               else
-                       secondary_exec_controls_clearbit(vmx,
-                                      SECONDARY_EXEC_APIC_REGISTER_VIRT |
-                                      SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
+
+       if (kvm_vcpu_apicv_active(vcpu)) {
+               secondary_exec_controls_setbit(vmx,
+                              SECONDARY_EXEC_APIC_REGISTER_VIRT |
+                              SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
+               if (enable_ipiv)
+                       tertiary_exec_controls_setbit(vmx, TERTIARY_EXEC_IPI_VIRT);
+       } else {
+               secondary_exec_controls_clearbit(vmx,
+                              SECONDARY_EXEC_APIC_REGISTER_VIRT |
+                              SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
+               if (enable_ipiv)
+                       tertiary_exec_controls_clearbit(vmx, TERTIARY_EXEC_IPI_VIRT);
        }

        vmx_update_msr_bitmap_x2apic(vcpu);
@@ -4236,7 +4245,16 @@ static u32 vmx_exec_control(struct vcpu_vmx *vmx)

 static u64 vmx_tertiary_exec_control(struct vcpu_vmx *vmx)
 {
-       return vmcs_config.cpu_based_3rd_exec_ctrl;
+       u64 exec_control = vmcs_config.cpu_based_3rd_exec_ctrl;
+
+       /*
+        * IPI virtualization relies on APICv. Disable IPI virtualization if
+        * APICv is inhibited.
+        */
+       if (!enable_ipiv || !kvm_vcpu_apicv_active(&vmx->vcpu))
+               exec_control &= ~TERTIARY_EXEC_IPI_VIRT;
+
+       return exec_control;
 }

 /*
@@ -4384,10 +4402,37 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx)
        return exec_control;
 }

+int vmx_get_pid_table_order(struct kvm_vmx *kvm_vmx)
+{
+       return get_order(kvm_vmx->kvm.arch.max_vcpu_ids * sizeof(*kvm_vmx->pid_table));
+}
+
+static int vmx_alloc_ipiv_pid_table(struct kvm *kvm)
+{
+       struct page *pages;
+       struct kvm_vmx *kvm_vmx = to_kvm_vmx(kvm);
+
+       if (!irqchip_in_kernel(kvm) || !enable_ipiv)
+               return 0;
+       if (kvm_vmx->pid_table)
+               return 0;
+
+       pages = alloc_pages(GFP_KERNEL | __GFP_ZERO,
+                           vmx_get_pid_table_order(kvm_vmx));
+
+       if (!pages)
+               return -ENOMEM;
+
+       kvm_vmx->pid_table = (void *)page_address(pages);
+       return 0;
+}
+
 #define VMX_XSS_EXIT_BITMAP 0

 static void init_vmcs(struct vcpu_vmx *vmx)
 {
+       struct kvm_vmx *kvm_vmx = to_kvm_vmx(vmx->vcpu.kvm);
+
        if (nested)
                nested_vmx_set_vmcs_shadowing_bitmap();

@@ -4419,6 +4464,11 @@ static void init_vmcs(struct vcpu_vmx *vmx)
                vmcs_write64(POSTED_INTR_DESC_ADDR, __pa((&vmx->pi_desc)));
        }

+       if (vmx_can_use_ipiv(&vmx->vcpu)) {
+               vmcs_write64(PID_POINTER_TABLE, __pa(kvm_vmx->pid_table));
+               vmcs_write16(LAST_PID_POINTER_INDEX, kvm_vmx->kvm.arch.max_vcpu_ids - 1);
+       }
+
        if (!kvm_pause_in_guest(vmx->vcpu.kvm)) {
                vmcs_write32(PLE_GAP, ple_gap);
                vmx->ple_window = ple_window;
@@ -7112,6 +7162,10 @@ static int vmx_vcpu_create(struct kvm_vcpu *vcpu)
                        goto free_vmcs;
        }

+       if (vmx_can_use_ipiv(vcpu))
+               WRITE_ONCE(to_kvm_vmx(vcpu->kvm)->pid_table[vcpu->vcpu_id],
+                          __pa(&vmx->pi_desc) | PID_TABLE_ENTRY_VALID);
+
        return 0;

 free_vmcs:
@@ -7746,6 +7800,14 @@ static bool vmx_check_apicv_inhibit_reasons(enum kvm_apicv_inhibit reason)
        return supported & BIT(reason);
 }

+static void vmx_vm_destroy(struct kvm *kvm)
+{
+       struct kvm_vmx *kvm_vmx = to_kvm_vmx(kvm);
+
+       if (kvm_vmx->pid_table)
+               free_pages((unsigned long)kvm_vmx->pid_table, vmx_get_pid_table_order(kvm_vmx));
+}
+
 static struct kvm_x86_ops vmx_x86_ops __initdata = {
        .name = "kvm_intel",

@@ -7757,6 +7819,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {

        .vm_size = sizeof(struct kvm_vmx),
        .vm_init = vmx_vm_init,
+       .vm_destroy = vmx_vm_destroy,

        .vcpu_create = vmx_vcpu_create,
        .vcpu_free = vmx_vcpu_free,
@@ -7880,6 +7943,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
        .complete_emulated_msr = kvm_complete_insn_gp,

        .vcpu_deliver_sipi_vector = kvm_vcpu_deliver_sipi_vector,
+       .alloc_ipiv_pid_table = vmx_alloc_ipiv_pid_table,
 };

 static unsigned int vmx_handle_intel_pt_intr(void)
@@ -8011,6 +8075,9 @@ static __init int hardware_setup(void)
        if (!enable_apicv)
                vmx_x86_ops.sync_pir_to_irr = NULL;

+       if (!enable_apicv || !cpu_has_vmx_ipiv())
+               enable_ipiv = false;
+
        if (cpu_has_vmx_tsc_scaling())
                kvm_has_tsc_control = true;

diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 85c067f2d7f2..4ab66b683624 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -365,6 +365,8 @@ struct kvm_vmx {
        unsigned int tss_addr;
        bool ept_identity_pagetable_done;
        gpa_t ept_identity_map_addr;
+       /* Posted Interrupt Descriptor (PID) table for IPI virtualization */
+       u64 *pid_table;
 };

 bool nested_vmx_allowed(struct kvm_vcpu *vcpu);
@@ -580,4 +582,9 @@ static inline int vmx_get_instr_info_reg2(u32 vmx_instr_info)
        return (vmx_instr_info >> 28) & 0xf;
 }

+static inline bool vmx_can_use_ipiv(struct kvm_vcpu *vcpu)
+{
+       return lapic_in_kernel(vcpu) && enable_ipiv;
+}
+
 #endif /* __KVM_X86_VMX_H */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d1a39285deab..23fbf52f7bea 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11180,11 +11180,15 @@ static int sync_regs(struct kvm_vcpu *vcpu)

 int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id)
 {
+       int ret = 0;
+
        if (kvm_check_tsc_unstable() && atomic_read(&kvm->online_vcpus) != 0)
                pr_warn_once("kvm: SMP vm created on host with unstable TSC; "
                             "guest TSC will not be reliable\n");

-       return 0;
+       if (kvm_x86_ops.alloc_ipiv_pid_table)
+               ret = static_call(kvm_x86_alloc_ipiv_pid_table)(kvm);
+       return ret;
 }

 int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
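For reference, the PID-pointer table that vmx_alloc_ipiv_pid_table() sizes with get_order() and that vmx_vcpu_create() fills above is just an array of 64-bit entries, one per vCPU ID, each holding the physical address of that vCPU's posted-interrupt descriptor with bit 0 used as the valid bit. Below is a minimal user-space sketch of that arithmetic, not kernel code and not part of this patch: the 4 KiB page size, the vCPU-ID bound, the descriptor address, and the helper names are illustrative assumptions only.

/*
 * User-space sketch of the PID-pointer table sizing and entry encoding
 * used by IPI virtualization. get_order()/__pa() are modeled with plain
 * arithmetic; all concrete values are made up for illustration.
 */
#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE              4096ULL
#define PID_TABLE_ENTRY_VALID  1ULL    /* bit 0: entry holds a valid PID address */

/* Rough equivalent of get_order(): smallest order with (PAGE_SIZE << order) >= size */
static unsigned int pid_table_order(uint64_t max_vcpu_ids)
{
	uint64_t size = max_vcpu_ids * sizeof(uint64_t);  /* one 64-bit entry per vCPU ID */
	unsigned int order = 0;

	while ((PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Encode one table entry: 64-byte-aligned PID physical address | valid bit (bit 0) */
static uint64_t pid_table_entry(uint64_t pi_desc_pa)
{
	return pi_desc_pa | PID_TABLE_ENTRY_VALID;
}

int main(void)
{
	uint64_t max_vcpu_ids = 1024;           /* assumed max_vcpu_ids-style upper bound */
	uint64_t fake_pi_desc_pa = 0x12345000;  /* made-up descriptor physical address */
	unsigned int order = pid_table_order(max_vcpu_ids);

	printf("table order for %llu vCPU IDs: %u (%llu bytes allocated)\n",
	       (unsigned long long)max_vcpu_ids, order,
	       (unsigned long long)(PAGE_SIZE << order));
	printf("entry for PID at 0x%llx: 0x%llx\n",
	       (unsigned long long)fake_pi_desc_pa,
	       (unsigned long long)pid_table_entry(fake_pi_desc_pa));
	return 0;
}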