From patchwork Mon Nov 16 18:25:46 2020
From: isaku.yamahata@intel.com
Subject: [RFC PATCH 01/67] x86/cpufeatures: Add synthetic feature flag for TDX (in host)
Date: Mon, 16 Nov 2020 10:25:46 -0800
Message-Id: <9a74fb153bc21dc5cac46e84913b88182f216d1b.1605232743.git.isaku.yamahata@intel.com>

From: Sean Christopherson

Signed-off-by: Sean Christopherson
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index dad350d42ecf..1bd2a414dcc0 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -230,6 +230,7 @@
 #define X86_FEATURE_FLEXPRIORITY ( 8*32+ 2) /* Intel FlexPriority */
 #define X86_FEATURE_EPT          ( 8*32+ 3) /* Intel Extended Page Table */
 #define X86_FEATURE_VPID         ( 8*32+ 4) /* Intel Virtual Processor ID */
+#define X86_FEATURE_TDX          ( 8*32+ 5) /* Intel Trusted Domain Extensions */
 
 #define X86_FEATURE_VMMCALL      ( 8*32+15) /* Prefer VMMCALL to VMCALL */
 #define X86_FEATURE_XENPV        ( 8*32+16) /* "" Xen paravirtual guest */

From patchwork Mon Nov 16 18:25:47 2020
From: isaku.yamahata@intel.com
Subject: [RFC PATCH 02/67] x86/msr-index: Define MSR_IA32_MKTME_KEYID_PART used by TDX
Date: Mon, 16 Nov 2020 10:25:47 -0800

From: Sean Christopherson

Define MSR_IA32_MKTME_KEYID_PART, used by TDX to enumerate the TDX KeyID
space, which is carved out from the regular MKTME KeyIDs.

Signed-off-by: Sean Christopherson
---
 arch/x86/include/asm/msr-index.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 972a34d93505..aad12236b33c 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -628,6 +628,8 @@
 #define MSR_IA32_UCODE_WRITE     0x00000079
 #define MSR_IA32_UCODE_REV       0x0000008b
 
+#define MSR_IA32_MKTME_KEYID_PART 0x00000087
+
 #define MSR_IA32_SMM_MONITOR_CTL 0x0000009b
 #define MSR_IA32_SMBASE          0x0000009e
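The patch only defines the MSR number; how the KeyID space is actually enumerated is not shown in this series excerpt. Below is a minimal, non-authoritative sketch of such enumeration. The bit layout (low 32 bits holding the number of regular MKTME KeyIDs, high 32 bits holding the number of TDX-private KeyIDs carved out above them) is an assumption based on later public TDX enablement code, not something this patch states, and example_enum_tdx_keyids() is a hypothetical helper:

```c
#include <linux/errno.h>
#include <linux/init.h>
#include <linux/types.h>
#include <asm/msr.h>
#include <asm/msr-index.h>

/* Sketch only: derive the TDX-private KeyID range from the partitioning MSR. */
static int __init example_enum_tdx_keyids(u32 *tdx_keyid_start, u32 *nr_tdx_keyids)
{
	u32 nr_mktme_keyids, nr_tdx_priv_keyids;

	/* Assumed layout: bits 31:0 = MKTME KeyIDs, bits 63:32 = TDX KeyIDs. */
	if (rdmsr_safe(MSR_IA32_MKTME_KEYID_PART, &nr_mktme_keyids,
		       &nr_tdx_priv_keyids))
		return -ENODEV;	/* MSR not present: no KeyID partitioning */

	/* KeyID 0 is the TME KeyID; TDX KeyIDs follow the MKTME range. */
	*tdx_keyid_start = nr_mktme_keyids + 1;
	*nr_tdx_keyids = nr_tdx_priv_keyids;

	return *nr_tdx_keyids ? 0 : -ENODEV;
}
```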
From patchwork Mon Nov 16 18:25:48 2020
From: isaku.yamahata@intel.com
Subject: [RFC PATCH 03/67] x86/cpu: Move get_builtin_firmware() common code (from microcode only)
Date: Mon, 16 Nov 2020 10:25:48 -0800
Message-Id: <46d35ce06d84c55ff02a05610ca3fb6d51ee1a71.1605232743.git.isaku.yamahata@intel.com>

From: Zhang Chen

Move get_builtin_firmware() to common.c so that it can be used to get
non-ucode firmware, e.g. Intel's SEAM modules, even if MICROCODE=n.

Require the consumers to select FW_LOADER, which is already true for
MICROCODE, instead of having dead code that returns false at runtime.

Signed-off-by: Zhang Chen
Co-developed-by: Kai Huang
Signed-off-by: Kai Huang
Co-developed-by: Sean Christopherson
Signed-off-by: Sean Christopherson
---
 arch/x86/include/asm/cpu.h            |  5 +++++
 arch/x86/include/asm/microcode.h      |  3 ---
 arch/x86/kernel/cpu/common.c          | 20 ++++++++++++++++++++
 arch/x86/kernel/cpu/microcode/core.c  | 18 ------------------
 arch/x86/kernel/cpu/microcode/intel.c |  1 +
 5 files changed, 26 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index da78ccbd493b..0096ac7cad0a 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -7,6 +7,7 @@
 #include
 #include
 #include
+#include
 
 #ifdef CONFIG_SMP
@@ -37,6 +38,10 @@ extern int _debug_hotplug_cpu(int cpu, int action);
 
 int mwait_usable(const struct cpuinfo_x86 *);
 
+#if defined(CONFIG_MICROCODE) || defined(CONFIG_KVM_INTEL_TDX)
+bool get_builtin_firmware(struct cpio_data *cd, const char *name);
+#endif
+
 unsigned int x86_family(unsigned int sig);
 unsigned int x86_model(unsigned int sig);
 unsigned int x86_stepping(unsigned int sig);
diff --git a/arch/x86/include/asm/microcode.h b/arch/x86/include/asm/microcode.h
index 2b7cc5397f80..4f10089f30de 100644
--- a/arch/x86/include/asm/microcode.h
+++ b/arch/x86/include/asm/microcode.h
@@ -131,15 +131,12 @@ int __init microcode_init(void);
 extern void __init load_ucode_bsp(void);
 extern void load_ucode_ap(void);
 void reload_early_microcode(void);
-extern bool get_builtin_firmware(struct cpio_data *cd, const char *name);
 extern bool initrd_gone;
 #else
 static inline int __init microcode_init(void)		{ return 0; };
 static inline void __init load_ucode_bsp(void)		{ }
 static inline void load_ucode_ap(void)			{ }
 static inline void reload_early_microcode(void)		{ }
-static inline bool
-get_builtin_firmware(struct cpio_data *cd, const char *name) { return false; }
 #endif
 
 #endif /* _ASM_X86_MICROCODE_H */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 35ad8480c464..87512c5854bb 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -22,6 +22,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -87,6 +88,25 @@ void __init setup_cpu_local_masks(void)
 	alloc_bootmem_cpumask_var(&cpu_sibling_setup_mask);
 }
 
+#if defined(CONFIG_MICROCODE) || defined(CONFIG_KVM_INTEL_TDX)
+extern struct builtin_fw __start_builtin_fw[];
+extern struct builtin_fw __end_builtin_fw[];
+
+bool get_builtin_firmware(struct cpio_data *cd, const char *name)
+{
+	struct builtin_fw *b_fw;
+
+	for (b_fw = __start_builtin_fw; b_fw != __end_builtin_fw; b_fw++) {
+		if (!strcmp(name, b_fw->name)) {
+			cd->size = b_fw->size;
+			cd->data = b_fw->data;
+			return true;
+		}
+	}
+	return false;
+}
+#endif
+
 static void default_init(struct cpuinfo_x86 *c)
 {
 #ifdef CONFIG_X86_64
diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index ec6f0415bc6d..f877a9c19f42 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -22,7 +22,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
 #include
@@ -140,23 +139,6 @@ static bool __init check_loader_disabled_bsp(void)
 	return *res;
 }
 
-extern struct builtin_fw __start_builtin_fw[];
-extern struct builtin_fw __end_builtin_fw[];
-
-bool get_builtin_firmware(struct cpio_data *cd, const char *name)
-{
-	struct builtin_fw *b_fw;
-
-	for (b_fw = __start_builtin_fw; b_fw != __end_builtin_fw; b_fw++) {
-		if (!strcmp(name, b_fw->name)) {
-			cd->size = b_fw->size;
-			cd->data = b_fw->data;
-			return true;
-		}
-	}
-	return false;
-}
-
 void __init load_ucode_bsp(void)
 {
 	unsigned int cpuid_1_eax;
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 6a99535d7f37..50e69d6cb3d9 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -36,6 +36,7 @@
 #include
 #include
 #include
+#include
 
 static const char ucode_path[] = "kernel/x86/microcode/GenuineIntel.bin";
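For illustration, a minimal sketch of how a consumer outside the microcode loader could now use the relocated helper to find a blob linked into the kernel image (CONFIG_EXTRA_FIRMWARE). The firmware name and example_load_builtin_seam() below are purely illustrative; this patch does not define any SEAM module path:

```c
#include <linux/earlycpio.h>
#include <linux/errno.h>
#include <linux/init.h>
#include <linux/printk.h>
#include <asm/cpu.h>

/* Sketch only: look up a built-in firmware blob by name. */
static int __init example_load_builtin_seam(void)
{
	struct cpio_data seam;

	if (!get_builtin_firmware(&seam, "intel-seam/example-module.bin"))
		return -ENOENT;

	pr_info("found built-in SEAM module, %zu bytes at %p\n",
		seam.size, seam.data);
	return 0;
}
```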
From patchwork Mon Nov 16 18:25:49 2020
From: isaku.yamahata@intel.com
Subject: [RFC PATCH 04/67] KVM: Export kvm_io_bus_read for use by TDX for PV MMIO
Date: Mon, 16 Nov 2020 10:25:49 -0800

From: Sean Christopherson

Signed-off-by: Sean Christopherson
---
 virt/kvm/kvm_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 2541a17ff1c4..65e1737c4354 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4288,6 +4288,7 @@ int kvm_io_bus_read(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, gpa_t addr,
 	r = __kvm_io_bus_read(vcpu, bus, &range, val);
 	return r < 0 ? r : 0;
 }
+EXPORT_SYMBOL_GPL(kvm_io_bus_read);
 
 /* Caller must hold slots_lock. */
 int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,

From patchwork Mon Nov 16 18:25:50 2020
From: isaku.yamahata@intel.com
Subject: [RFC PATCH 05/67] KVM: Enable hardware before doing arch VM initialization
Date: Mon, 16 Nov 2020 10:25:50 -0800
Message-Id: <59f40d4844dbdedf42632a3a9e82b86f68060204.1605232743.git.isaku.yamahata@intel.com>

From: Sean Christopherson

Swap the order of hardware_enable_all() and kvm_arch_init_vm() to
accommodate Intel's TDX, which needs VMX to be enabled during VM init in
order to make SEAMCALLs.

This also provides consistent ordering between kvm_create_vm() and
kvm_destroy_vm() with respect to calling kvm_arch_destroy_vm() and
hardware_disable_all().

Signed-off-by: Sean Christopherson
---
 virt/kvm/kvm_main.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 65e1737c4354..11166e901582 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -764,7 +764,7 @@ static struct kvm *kvm_create_vm(unsigned long type)
 		struct kvm_memslots *slots = kvm_alloc_memslots();
 
 		if (!slots)
-			goto out_err_no_arch_destroy_vm;
+			goto out_err_no_disable;
 		/* Generations must be different for each address space. */
 		slots->generation = i;
 		rcu_assign_pointer(kvm->memslots[i], slots);
@@ -774,19 +774,19 @@ static struct kvm *kvm_create_vm(unsigned long type)
 		rcu_assign_pointer(kvm->buses[i],
 			kzalloc(sizeof(struct kvm_io_bus), GFP_KERNEL_ACCOUNT));
 		if (!kvm->buses[i])
-			goto out_err_no_arch_destroy_vm;
+			goto out_err_no_disable;
 	}
 
 	kvm->max_halt_poll_ns = halt_poll_ns;
 
-	r = kvm_arch_init_vm(kvm, type);
-	if (r)
-		goto out_err_no_arch_destroy_vm;
-
 	r = hardware_enable_all();
 	if (r)
 		goto out_err_no_disable;
 
+	r = kvm_arch_init_vm(kvm, type);
+	if (r)
+		goto out_err_no_arch_destroy_vm;
+
 #ifdef CONFIG_HAVE_KVM_IRQFD
 	INIT_HLIST_HEAD(&kvm->irq_ack_notifier_list);
 #endif
@@ -813,10 +813,10 @@ static struct kvm *kvm_create_vm(unsigned long type)
 	mmu_notifier_unregister(&kvm->mmu_notifier, current->mm);
 #endif
 out_err_no_mmu_notifier:
-	hardware_disable_all();
-out_err_no_disable:
 	kvm_arch_destroy_vm(kvm);
 out_err_no_arch_destroy_vm:
+	hardware_disable_all();
+out_err_no_disable:
 	WARN_ON_ONCE(!refcount_dec_and_test(&kvm->users_count));
 	for (i = 0; i < KVM_NR_BUSES; i++)
 		kfree(kvm_get_bus(kvm, i));

From patchwork Mon Nov 16 18:25:51 2020
From: isaku.yamahata@intel.com
Subject: [RFC PATCH 06/67] KVM: x86: Split core of hypercall emulation to helper function
Date: Mon, 16 Nov 2020 10:25:51 -0800
Message-Id: <22558f45bbc04d0b491e11ea321dff0f146e43aa.1605232743.git.isaku.yamahata@intel.com>

From: Sean Christopherson

By necessity, TDX will use a different register ABI for hypercalls.
Break out the core functionality so that it may be reused for TDX.
Signed-off-by: Sean Christopherson
---
 arch/x86/include/asm/kvm_host.h |  4 +++
 arch/x86/kvm/x86.c              | 49 +++++++++++++++++++++------------
 2 files changed, 35 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d44858b69353..c2639744ea09 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1549,6 +1549,10 @@ void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu);
 void kvm_request_apicv_update(struct kvm *kvm, bool activate,
 			      unsigned long bit);
 
+unsigned long __kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr,
+				      unsigned long a0, unsigned long a1,
+				      unsigned long a2, unsigned long a3,
+				      int op_64_bit);
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);
 
 int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f5ede41bf9e6..0f67f762717a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8020,23 +8020,15 @@ static void kvm_sched_yield(struct kvm *kvm, unsigned long dest_id)
 	kvm_vcpu_yield_to(target);
 }
 
-int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
+unsigned long __kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr,
+				      unsigned long a0, unsigned long a1,
+				      unsigned long a2, unsigned long a3,
+				      int op_64_bit)
 {
-	unsigned long nr, a0, a1, a2, a3, ret;
-	int op_64_bit;
-
-	if (kvm_hv_hypercall_enabled(vcpu->kvm))
-		return kvm_hv_hypercall(vcpu);
-
-	nr = kvm_rax_read(vcpu);
-	a0 = kvm_rbx_read(vcpu);
-	a1 = kvm_rcx_read(vcpu);
-	a2 = kvm_rdx_read(vcpu);
-	a3 = kvm_rsi_read(vcpu);
+	unsigned long ret;
 
 	trace_kvm_hypercall(nr, a0, a1, a2, a3);
 
-	op_64_bit = is_64_bit_mode(vcpu);
 	if (!op_64_bit) {
 		nr &= 0xFFFFFFFF;
 		a0 &= 0xFFFFFFFF;
@@ -8045,11 +8037,6 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 		a3 &= 0xFFFFFFFF;
 	}
 
-	if (kvm_x86_ops.get_cpl(vcpu) != 0) {
-		ret = -KVM_EPERM;
-		goto out;
-	}
-
 	ret = -KVM_ENOSYS;
 
 	switch (nr) {
@@ -8086,6 +8073,32 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 		ret = -KVM_ENOSYS;
 		break;
 	}
+	return ret;
+}
+EXPORT_SYMBOL_GPL(__kvm_emulate_hypercall);
+
+int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
+{
+	unsigned long nr, a0, a1, a2, a3, ret;
+	int op_64_bit;
+
+	if (kvm_hv_hypercall_enabled(vcpu->kvm))
+		return kvm_hv_hypercall(vcpu);
+
+	op_64_bit = is_64_bit_mode(vcpu);
+
+	if (kvm_x86_ops.get_cpl(vcpu) != 0) {
+		ret = -KVM_EPERM;
+		goto out;
+	}
+
+	nr = kvm_rax_read(vcpu);
+	a0 = kvm_rbx_read(vcpu);
+	a1 = kvm_rcx_read(vcpu);
+	a2 = kvm_rdx_read(vcpu);
+	a3 = kvm_rsi_read(vcpu);
+
+	ret = __kvm_emulate_hypercall(vcpu, nr, a0, a1, a2, a3, op_64_bit);
 out:
 	if (!op_64_bit)
 		ret = (u32)ret;
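To show why the split helps, here is a rough, non-authoritative sketch of a caller with a different register ABI reusing the new helper. The r10..r14 mapping mirrors what later TDX guest/host ABI patches use for TDG.VP.VMCALL-based KVM hypercalls; it is an assumption here, not something this patch defines, and example_emulate_tdx_hypercall() is hypothetical:

```c
#include <linux/kvm_host.h>

#include "kvm_cache_regs.h"	/* kvm_r10_read() and friends */

/* Sketch only: emulate a KVM hypercall whose arguments arrive in r10-r14. */
static int example_emulate_tdx_hypercall(struct kvm_vcpu *vcpu)
{
	unsigned long nr = kvm_r10_read(vcpu);
	unsigned long a0 = kvm_r11_read(vcpu);
	unsigned long a1 = kvm_r12_read(vcpu);
	unsigned long a2 = kvm_r13_read(vcpu);
	unsigned long a3 = kvm_r14_read(vcpu);
	unsigned long ret;

	/* A TDX guest is always 64-bit, so pass op_64_bit = 1. */
	ret = __kvm_emulate_hypercall(vcpu, nr, a0, a1, a2, a3, 1);

	/* Return the result in r10, per the assumed guest ABI. */
	kvm_r10_write(vcpu, ret);
	return 1;	/* resume the guest */
}
```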
From patchwork Mon Nov 16 18:25:52 2020
From: isaku.yamahata@intel.com
Subject: [RFC PATCH 07/67] KVM: x86: Export kvm_mmio tracepoint for use by TDX for PV MMIO
Date: Mon, 16 Nov 2020 10:25:52 -0800

From: Sean Christopherson

Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/x86.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0f67f762717a..1d999b57f21a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11237,6 +11237,7 @@ int kvm_handle_invpcid(struct kvm_vcpu *vcpu, unsigned long type, gva_t gva)
 EXPORT_SYMBOL_GPL(kvm_handle_invpcid);
 
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_mmio);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);

From patchwork Mon Nov 16 18:25:53 2020
From: isaku.yamahata@intel.com
Subject: [RFC PATCH 08/67] KVM: x86/mmu: Zap only leaf SPTEs for deleted/moved memslot by default
Date: Mon, 16 Nov 2020 10:25:53 -0800
Message-Id: <90feb381301829a1aeaa91440eba9cc49b40ba52.1605232743.git.isaku.yamahata@intel.com>

From: Sean Christopherson

Zap only leaf SPTEs when deleting/moving a memslot by default, and add a
module param to allow reverting to the old behavior of zapping all SPTEs
at all levels and memslots when any memslot is updated.

Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/mmu.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 1f96adff8dc4..3c7e43e12513 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -84,6 +84,9 @@ __MODULE_PARM_TYPE(nx_huge_pages_recovery_ratio, "uint");
 static bool __read_mostly force_flush_and_sync_on_reuse;
 module_param_named(flush_on_reuse, force_flush_and_sync_on_reuse, bool, 0644);
 
+static bool __read_mostly memslot_update_zap_all;
+module_param(memslot_update_zap_all, bool, 0444);
+
 /*
  * When setting this variable to true it enables Two-Dimensional-Paging
  * where the hardware walks 2 page tables:
@@ -5441,11 +5444,26 @@ static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm)
 	return unlikely(!list_empty_careful(&kvm->arch.zapped_obsolete_pages));
 }
 
+static void kvm_mmu_zap_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
+{
+	/*
+	 * Zapping non-leaf SPTEs, a.k.a. not-last SPTEs, isn't required, worst
+	 * case scenario we'll have unused shadow pages lying around until they
+	 * are recycled due to age or when the VM is destroyed.
+	 */
+	spin_lock(&kvm->mmu_lock);
+	slot_handle_all_level(kvm, slot, kvm_zap_rmapp, true);
+	spin_unlock(&kvm->mmu_lock);
+}
+
 static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm,
 			struct kvm_memory_slot *slot,
 			struct kvm_page_track_notifier_node *node)
 {
-	kvm_mmu_zap_all_fast(kvm);
+	if (memslot_update_zap_all)
+		kvm_mmu_zap_all_fast(kvm);
+	else
+		kvm_mmu_zap_memslot(kvm, slot);
 }
 
 void kvm_mmu_init_vm(struct kvm *kvm)

From patchwork Mon Nov 16 18:25:54 2020
From: isaku.yamahata@intel.com
Subject: [RFC PATCH 09/67] KVM: Add infrastructure and macro to mark VM as bugged
Date: Mon, 16 Nov 2020 10:25:54 -0800

From: Sean Christopherson

Signed-off-by: Sean Christopherson
---
 include/linux/kvm_host.h | 27 +++++++++++++++++++++++++++
 virt/kvm/kvm_main.c      | 10 +++++-----
 2 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7f2e2a09ebbd..03c016ff1715 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -146,6 +146,7 @@ static inline bool is_error_page(struct page *page)
 #define KVM_REQ_MMU_RELOAD    (1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_PENDING_TIMER 2
 #define KVM_REQ_UNHALT        3
+#define KVM_REQ_VM_BUGGED     (4 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQUEST_ARCH_BASE 8
 
 #define KVM_ARCH_REQ_FLAGS(nr, flags) ({ \
@@ -505,6 +506,8 @@ struct kvm {
 	struct srcu_struct irq_srcu;
 	pid_t userspace_pid;
 	unsigned int max_halt_poll_ns;
+
+	bool vm_bugged;
 };
 
 #define kvm_err(fmt, ...) \
@@ -533,6 +536,30 @@ struct kvm {
 #define vcpu_err(vcpu, fmt, ...)				\
 	kvm_err("vcpu%i " fmt, (vcpu)->vcpu_id, ## __VA_ARGS__)
 
+static inline void kvm_vm_bugged(struct kvm *kvm)
+{
+	kvm->vm_bugged = true;
+	kvm_make_all_cpus_request(kvm, KVM_REQ_VM_BUGGED);
+}
+
+#define KVM_BUG(cond, kvm, fmt...)				\
+({								\
+	int __ret = (cond);					\
+								\
+	if (WARN_ONCE(__ret && !(kvm)->vm_bugged, fmt))		\
+		kvm_vm_bugged(kvm);				\
+	unlikely(__ret);					\
+})
+
+#define KVM_BUG_ON(cond, kvm)					\
+({								\
+	int __ret = (cond);					\
+								\
+	if (WARN_ON_ONCE(__ret && !(kvm)->vm_bugged))		\
+		kvm_vm_bugged(kvm);				\
+	unlikely(__ret);					\
+})
+
 static inline bool kvm_dirty_log_manual_protect_and_init_set(struct kvm *kvm)
 {
 	return !!(kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_INITIALLY_SET);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 11166e901582..21af4f083674 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3194,7 +3194,7 @@ static long kvm_vcpu_ioctl(struct file *filp,
 	struct kvm_fpu *fpu = NULL;
 	struct kvm_sregs *kvm_sregs = NULL;
 
-	if (vcpu->kvm->mm != current->mm)
+	if (vcpu->kvm->mm != current->mm || vcpu->kvm->vm_bugged)
 		return -EIO;
 
 	if (unlikely(_IOC_TYPE(ioctl) != KVMIO))
@@ -3400,7 +3400,7 @@ static long kvm_vcpu_compat_ioctl(struct file *filp,
 	void __user *argp = compat_ptr(arg);
 	int r;
 
-	if (vcpu->kvm->mm != current->mm)
+	if (vcpu->kvm->mm != current->mm || vcpu->kvm->vm_bugged)
 		return -EIO;
 
 	switch (ioctl) {
@@ -3466,7 +3466,7 @@ static long kvm_device_ioctl(struct file *filp, unsigned int ioctl,
 {
 	struct kvm_device *dev = filp->private_data;
 
-	if (dev->kvm->mm != current->mm)
+	if (dev->kvm->mm != current->mm || dev->kvm->vm_bugged)
 		return -EIO;
 
 	switch (ioctl) {
@@ -3682,7 +3682,7 @@ static long kvm_vm_ioctl(struct file *filp,
 	void __user *argp = (void __user *)arg;
 	int r;
 
-	if (kvm->mm != current->mm)
+	if (kvm->mm != current->mm || kvm->vm_bugged)
 		return -EIO;
 	switch (ioctl) {
 	case KVM_CREATE_VCPU:
@@ -3877,7 +3877,7 @@ static long kvm_vm_compat_ioctl(struct file *filp,
 	struct kvm *kvm = filp->private_data;
 	int r;
 
-	if (kvm->mm != current->mm)
+	if (kvm->mm != current->mm || kvm->vm_bugged)
 		return -EIO;
 	switch (ioctl) {
 	case KVM_GET_DIRTY_LOG: {

From patchwork Mon Nov 16 18:25:55 2020
From: isaku.yamahata@intel.com
Subject: [RFC PATCH 10/67] KVM: Export kvm_make_all_cpus_request() for use in marking VMs as bugged
Date: Mon, 16 Nov 2020 10:25:55 -0800
Message-Id: <9f779ee86c4c69f806616bc05e8f865628c538eb.1605232743.git.isaku.yamahata@intel.com>

From: Sean Christopherson

Export kvm_make_all_cpus_request() and hoist the declarations of the
request helpers up to the KVM_REQ_* definitions in preparation for adding
a "VM bugged" framework.  The framework will add KVM_BUG() and
KVM_BUG_ON() as alternatives to full BUG()/BUG_ON() for cases where KVM
has definitely hit a bug (in itself or in silicon) and the VM is all but
guaranteed to be hosed.  Marking a VM bugged will trigger a request to
all vCPUs to allow arch code to forcefully evict each vCPU from its run
loop.
Signed-off-by: Sean Christopherson
---
 include/linux/kvm_host.h | 18 +++++++++---------
 virt/kvm/kvm_main.c      |  1 +
 2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 03c016ff1715..ad9b6963d19d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -155,6 +155,15 @@ static inline bool is_error_page(struct page *page)
 })
 #define KVM_ARCH_REQ(nr)	KVM_ARCH_REQ_FLAGS(nr, 0)
 
+bool kvm_make_vcpus_request_mask(struct kvm *kvm, unsigned int req,
+				 struct kvm_vcpu *except,
+				 unsigned long *vcpu_bitmap, cpumask_var_t tmp);
+bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req);
+bool kvm_make_all_cpus_request_except(struct kvm *kvm, unsigned int req,
+				      struct kvm_vcpu *except);
+bool kvm_make_cpus_request_mask(struct kvm *kvm, unsigned int req,
+				unsigned long *vcpu_bitmap);
+
 #define KVM_USERSPACE_IRQ_SOURCE_ID		0
 #define KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID	1
 
@@ -874,15 +883,6 @@ void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc);
 void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
 #endif
 
-bool kvm_make_vcpus_request_mask(struct kvm *kvm, unsigned int req,
-				 struct kvm_vcpu *except,
-				 unsigned long *vcpu_bitmap, cpumask_var_t tmp);
-bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req);
-bool kvm_make_all_cpus_request_except(struct kvm *kvm, unsigned int req,
-				      struct kvm_vcpu *except);
-bool kvm_make_cpus_request_mask(struct kvm *kvm, unsigned int req,
-				unsigned long *vcpu_bitmap);
-
 long kvm_arch_dev_ioctl(struct file *filp,
 			unsigned int ioctl, unsigned long arg);
 long kvm_arch_vcpu_ioctl(struct file *filp,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 21af4f083674..b29b6c3484dd 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -304,6 +304,7 @@ bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req)
 {
 	return kvm_make_all_cpus_request_except(kvm, req, NULL);
 }
+EXPORT_SYMBOL_GPL(kvm_make_all_cpus_request);
 
 #ifndef CONFIG_HAVE_KVM_ARCH_TLB_FLUSH_ALL
 void kvm_flush_remote_tlbs(struct kvm *kvm)
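Patches 11 and 12 below put this framework to use; as a condensed sketch of the intended pattern, an exit handler turns a violated invariant into a per-VM failure instead of a host BUG(), and the KVM_REQ_VM_BUGGED request then bounces every vCPU out of its run loop with -EIO. example_exit_handler() is hypothetical; the condition mirrors one used in patch 11:

```c
#include <linux/kvm_host.h>

/* Sketch only: mark the VM bugged on an invariant violation and stop running it. */
static int example_exit_handler(struct kvm_vcpu *vcpu)
{
	/*
	 * The condition should be impossible; if it fires, only this VM is
	 * hosed, so refuse to run it further rather than BUG()ing the host.
	 */
	if (KVM_BUG_ON(!vcpu->arch.apicv_active, vcpu->kvm))
		return -EIO;

	return 1;	/* keep running the guest */
}
```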
From patchwork Mon Nov 16 18:25:56 2020
From: isaku.yamahata@intel.com
Subject: [RFC PATCH 11/67] KVM: x86: Use KVM_BUG/KVM_BUG_ON to handle bugs that are fatal to the VM
Date: Mon, 16 Nov 2020 10:25:56 -0800
Message-Id: <57c00ce15fbf83118ebcc476f9728ebca6326178.1605232743.git.isaku.yamahata@intel.com>

From: Sean Christopherson

Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/svm/svm.c |  2 +-
 arch/x86/kvm/vmx/vmx.c | 23 ++++++++++++++---------
 arch/x86/kvm/x86.c     |  4 ++++
 3 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 2f32fd09e259..e001e3c9e4bc 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1452,7 +1452,7 @@ static void svm_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
 		load_pdptrs(vcpu, vcpu->arch.walk_mmu, kvm_read_cr3(vcpu));
 		break;
 	default:
-		WARN_ON_ONCE(1);
+		KVM_BUG_ON(1, vcpu->kvm);
 	}
 }
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 47b8357b9751..1c9ad3103c87 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2245,7 +2245,7 @@ static void vmx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
 		vcpu->arch.cr4 |= vmcs_readl(GUEST_CR4) & guest_owned_bits;
 		break;
 	default:
-		WARN_ON_ONCE(1);
+		KVM_BUG_ON(1, vcpu->kvm);
 		break;
 	}
 }
@@ -5006,6 +5006,7 @@ static int handle_cr(struct kvm_vcpu *vcpu)
 			return kvm_complete_insn_gp(vcpu, err);
 		case 3:
 			WARN_ON_ONCE(enable_unrestricted_guest);
+
 			err = kvm_set_cr3(vcpu, val);
 			return kvm_complete_insn_gp(vcpu, err);
 		case 4:
@@ -5031,14 +5032,13 @@ static int handle_cr(struct kvm_vcpu *vcpu)
 		}
 		break;
 	case 2: /* clts */
-		WARN_ONCE(1, "Guest should always own CR0.TS");
-		vmx_set_cr0(vcpu, kvm_read_cr0_bits(vcpu, ~X86_CR0_TS));
-		trace_kvm_cr_write(0, kvm_read_cr0(vcpu));
-		return kvm_skip_emulated_instruction(vcpu);
+		KVM_BUG(1, vcpu->kvm, "Guest always owns CR0.TS");
+		return -EIO;
 	case 1: /*mov from cr*/
 		switch (cr) {
 		case 3:
 			WARN_ON_ONCE(enable_unrestricted_guest);
+
 			val = kvm_read_cr3(vcpu);
 			kvm_register_write(vcpu, reg, val);
 			trace_kvm_cr_read(cr, val);
@@ -5377,7 +5377,9 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
 
 static int handle_nmi_window(struct kvm_vcpu *vcpu)
 {
-	WARN_ON_ONCE(!enable_vnmi);
+	if (KVM_BUG_ON(!enable_vnmi, vcpu->kvm))
+		return -EIO;
+
 	exec_controls_clearbit(to_vmx(vcpu), CPU_BASED_NMI_WINDOW_EXITING);
 	++vcpu->stat.nmi_window_exits;
 	kvm_make_request(KVM_REQ_EVENT, vcpu);
@@ -5950,7 +5952,8 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 	 * below) should never happen as that means we incorrectly allowed a
 	 * nested VM-Enter with an invalid vmcs12.
 	 */
-	WARN_ON_ONCE(vmx->nested.nested_run_pending);
+	if (KVM_BUG_ON(vmx->nested.nested_run_pending, vcpu->kvm))
+		return -EIO;
 
 	/* If guest state is invalid, start emulating */
 	if (vmx->emulation_required)
@@ -6300,7 +6303,9 @@ static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
 	int max_irr;
 	bool max_irr_updated;
 
-	WARN_ON(!vcpu->arch.apicv_active);
+	if (KVM_BUG_ON(!vcpu->arch.apicv_active, vcpu->kvm))
+		return -EIO;
+
 	if (pi_test_on(&vmx->pi_desc)) {
 		pi_clear_on(&vmx->pi_desc);
 		/*
@@ -6382,7 +6387,7 @@ static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
 {
 	u32 intr_info = vmx_get_intr_info(vcpu);
 
-	if (WARN_ONCE(!is_external_intr(intr_info),
+	if (KVM_BUG(!is_external_intr(intr_info), vcpu->kvm,
 	    "KVM: unexpected VM-Exit interrupt info: 0x%x", intr_info))
 		return;
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1d999b57f21a..19b53aedc6c8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8722,6 +8722,10 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	bool req_immediate_exit = false;
 
 	if (kvm_request_pending(vcpu)) {
+		if (kvm_check_request(KVM_REQ_VM_BUGGED, vcpu)) {
+			r = -EIO;
+			goto out;
+		}
 		if (kvm_check_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu)) {
 			if (unlikely(!kvm_x86_ops.nested_ops->get_nested_state_pages(vcpu))) {
 				r = 0;

From patchwork Mon Nov 16 18:25:57 2020
From: isaku.yamahata@intel.com
Subject: [RFC PATCH 12/67] KVM: x86/mmu: Mark VM as bugged if page fault returns RET_PF_INVALID
Date: Mon, 16 Nov 2020 10:25:57 -0800

From: Sean Christopherson

Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 3c7e43e12513..bebd2b6ebcad 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5059,7 +5059,7 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code,
 	if (r == RET_PF_INVALID) {
 		r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa,
 					  lower_32_bits(error_code), false);
-		if (WARN_ON_ONCE(r == RET_PF_INVALID))
+		if (KVM_BUG_ON(r == RET_PF_INVALID, vcpu->kvm))
 			return -EIO;
 	}
 

From patchwork Mon Nov 16 18:25:58 2020
From: isaku.yamahata@intel.com
Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson Subject: [RFC PATCH 13/67] KVM: VMX: Explicitly check for hv_remote_flush_tlb when loading pgd() Date: Mon, 16 Nov 2020 10:25:58 -0800 Message-Id: <96617e302a86ee0e4a263051b0cb7e51bf2ae4ed.1605232743.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson Explicitly check that kvm_x86_ops.tlb_remote_flush() points at Hyper-V's implementation for PV flushing instead of assuming that a non-NULL implemenation means running on Hyper-V. Wrap the related logic in ifdeffery as hv_remote_flush_tlb() is defined iff CONFIG_HYPERV!=n. Short term, the explicit check makes it more obvious why a non-NULL tlb_remote_flush() triggers EPTP shenanigans. Long term, this will allow TDX to define its own implementation of tlb_remote_flush() without running afoul of Hyper-V. Signed-off-by: Sean Christopherson --- arch/x86/kvm/vmx/vmx.c | 7 +++++-- arch/x86/kvm/vmx/vmx.h | 2 ++ 2 files changed, 7 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 1c9ad3103c87..0703d82e7bad 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -3072,14 +3072,15 @@ static void vmx_load_mmu_pgd(struct kvm_vcpu *vcpu, unsigned long pgd, eptp = construct_eptp(vcpu, pgd, pgd_level); vmcs_write64(EPT_POINTER, eptp); - if (kvm_x86_ops.tlb_remote_flush) { +#if IS_ENABLED(CONFIG_HYPERV) + if (kvm_x86_ops.tlb_remote_flush == hv_remote_flush_tlb) { spin_lock(&to_kvm_vmx(kvm)->ept_pointer_lock); to_vmx(vcpu)->ept_pointer = eptp; to_kvm_vmx(kvm)->ept_pointers_match = EPT_POINTERS_CHECK; spin_unlock(&to_kvm_vmx(kvm)->ept_pointer_lock); } - +#endif if (!enable_unrestricted_guest && !is_paging(vcpu)) guest_cr3 = to_kvm_vmx(kvm)->ept_identity_map_addr; else if (test_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail)) @@ -6970,7 +6971,9 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu) static int vmx_vm_init(struct kvm *kvm) { +#if IS_ENABLED(CONFIG_HYPERV) spin_lock_init(&to_kvm_vmx(kvm)->ept_pointer_lock); +#endif if (!ple_gap) kvm->arch.pause_in_guest = true; diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index f6f66e5c6510..e9cdb0fb7f56 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -301,8 +301,10 @@ struct kvm_vmx { bool ept_identity_pagetable_done; gpa_t ept_identity_map_addr; +#if IS_ENABLED(CONFIG_HYPERV) enum ept_pointers_status ept_pointers_match; spinlock_t ept_pointer_lock; +#endif }; bool nested_vmx_allowed(struct kvm_vcpu *vcpu); From patchwork Mon Nov 16 18:25:59 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910439 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id F1573C64E69 for ; Mon, 16 Nov 2020 18:33:55 +0000 (UTC) Received: from 
vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B9A4620853 for ; Mon, 16 Nov 2020 18:33:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388691AbgKPSd2 (ORCPT ); Mon, 16 Nov 2020 13:33:28 -0500 Received: from mga06.intel.com ([134.134.136.31]:20628 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1732871AbgKPS17 (ORCPT ); Mon, 16 Nov 2020 13:27:59 -0500 IronPort-SDR: /bewYYkmRqkWWzYmHYosaC8RdkzAybN4SzbCY5f8IlLVAkFLK0C3IfmUdoYf3/WA69JAKRT96N ySmpotGshjwg== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="232410011" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="232410011" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:27:58 -0800 IronPort-SDR: B+v+kirPHf2zOjGQvzRHA0d/GaEuzx4CUa8CKXEbUM76bxxJAELInTJEOA2+VCqfBeC3mqlAJm 7eWry/eBsVDQ== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400527869" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:27:58 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson Subject: [RFC PATCH 14/67] KVM: Add max_vcpus field in common 'struct kvm' Date: Mon, 16 Nov 2020 10:25:59 -0800 Message-Id: X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson Signed-off-by: Sean Christopherson --- arch/arm64/include/asm/kvm_host.h | 3 --- arch/arm64/kvm/arm.c | 7 ++----- arch/arm64/kvm/vgic/vgic-init.c | 6 +++--- include/linux/kvm_host.h | 1 + virt/kvm/kvm_main.c | 3 ++- 5 files changed, 8 insertions(+), 12 deletions(-) diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 781d029b8aa8..259b05376807 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -95,9 +95,6 @@ struct kvm_arch { /* VTCR_EL2 value for this VM */ u64 vtcr; - /* The maximum number of vCPUs depends on the used GIC model */ - int max_vcpus; - /* Interrupt controller */ struct vgic_dist vgic; diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index 5750ec34960e..b3ba6c66183d 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -125,7 +125,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) kvm_vgic_early_init(kvm); /* The maximum number of VCPUs is limited by the host's GIC model */ - kvm->arch.max_vcpus = kvm_arm_default_max_vcpus(); + kvm->max_vcpus = kvm_arm_default_max_vcpus(); return ret; out_free_stage2_pgd: @@ -193,7 +193,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_MAX_VCPUS: case KVM_CAP_MAX_VCPU_ID: if (kvm) - r = kvm->arch.max_vcpus; + r = kvm->max_vcpus; else r = kvm_arm_default_max_vcpus(); break; @@ -247,9 +247,6 @@ int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id) if (irqchip_in_kernel(kvm) && vgic_initialized(kvm)) return -EBUSY; - if (id >= kvm->arch.max_vcpus) - return -EINVAL; - return 0; } diff --git a/arch/arm64/kvm/vgic/vgic-init.c 
b/arch/arm64/kvm/vgic/vgic-init.c index 32e32d67a127..9af003b62509 100644 --- a/arch/arm64/kvm/vgic/vgic-init.c +++ b/arch/arm64/kvm/vgic/vgic-init.c @@ -97,11 +97,11 @@ int kvm_vgic_create(struct kvm *kvm, u32 type) ret = 0; if (type == KVM_DEV_TYPE_ARM_VGIC_V2) - kvm->arch.max_vcpus = VGIC_V2_MAX_CPUS; + kvm->max_vcpus = VGIC_V2_MAX_CPUS; else - kvm->arch.max_vcpus = VGIC_V3_MAX_CPUS; + kvm->max_vcpus = VGIC_V3_MAX_CPUS; - if (atomic_read(&kvm->online_vcpus) > kvm->arch.max_vcpus) { + if (atomic_read(&kvm->online_vcpus) > kvm->max_vcpus) { ret = -E2BIG; goto out_unlock; } diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index ad9b6963d19d..95371750c23f 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -467,6 +467,7 @@ struct kvm { * and is accessed atomically. */ atomic_t online_vcpus; + int max_vcpus; int created_vcpus; int last_boosted_vcpu; struct list_head vm_list; diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index b29b6c3484dd..3dc41b6e12a0 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -752,6 +752,7 @@ static struct kvm *kvm_create_vm(unsigned long type) mutex_init(&kvm->irq_lock); mutex_init(&kvm->slots_lock); INIT_LIST_HEAD(&kvm->devices); + kvm->max_vcpus = KVM_MAX_VCPUS; BUILD_BUG_ON(KVM_MEM_SLOTS_NUM > SHRT_MAX); @@ -3098,7 +3099,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id) return -EINVAL; mutex_lock(&kvm->lock); - if (kvm->created_vcpus == KVM_MAX_VCPUS) { + if (kvm->created_vcpus >= kvm->max_vcpus) { mutex_unlock(&kvm->lock); return -EINVAL; } From patchwork Mon Nov 16 18:26:00 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910435 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B276FC63777 for ; Mon, 16 Nov 2020 18:33:55 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7270A2080A for ; Mon, 16 Nov 2020 18:33:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388592AbgKPSdU (ORCPT ); Mon, 16 Nov 2020 13:33:20 -0500 Received: from mga06.intel.com ([134.134.136.31]:20631 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1732986AbgKPS2A (ORCPT ); Mon, 16 Nov 2020 13:28:00 -0500 IronPort-SDR: l/3V7HYxsx1qJ6wJ0jVQXwNhW1mB085A2zFN3c+gTedQvo4hjWwDqb8x462jC68x1rnNZlUCCR g+aDcp4HpeWw== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="232410013" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="232410013" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:27:59 -0800 IronPort-SDR: no8lWM01D+Mlc0XWjfWSQpaZ3zAQ6POduMmX4kjePNnbjoo1RPwFtpQ8ZiMkbaV6VDbPXTojPV Z5RPIMGF+pEA== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400527880" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with 
ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:27:59 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson , Xiaoyao Li Subject: [RFC PATCH 15/67] KVM: x86: Add vm_type to differentiate legacy VMs from protected VMs Date: Mon, 16 Nov 2020 10:26:00 -0800 Message-Id: X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson Add a capability to effectively allow userspace to query what VM types are supported by KVM. Co-developed-by: Xiaoyao Li Signed-off-by: Xiaoyao Li Signed-off-by: Sean Christopherson --- arch/x86/include/asm/kvm_host.h | 2 ++ arch/x86/include/uapi/asm/kvm.h | 4 ++++ arch/x86/kvm/svm/svm.c | 6 ++++++ arch/x86/kvm/vmx/vmx.c | 6 ++++++ arch/x86/kvm/x86.c | 9 ++++++++- include/uapi/linux/kvm.h | 2 ++ tools/arch/x86/include/uapi/asm/kvm.h | 4 ++++ tools/include/uapi/linux/kvm.h | 2 ++ 8 files changed, 34 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index c2639744ea09..1ff33efd6394 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -897,6 +897,7 @@ enum kvm_irqchip_mode { #define APICV_INHIBIT_REASON_X2APIC 5 struct kvm_arch { + unsigned long vm_type; unsigned long n_used_mmu_pages; unsigned long n_requested_mmu_pages; unsigned long n_max_mmu_pages; @@ -1090,6 +1091,7 @@ struct kvm_x86_ops { bool (*has_emulated_msr)(u32 index); void (*vcpu_after_set_cpuid)(struct kvm_vcpu *vcpu); + bool (*is_vm_type_supported)(unsigned long vm_type); unsigned int vm_size; int (*vm_init)(struct kvm *kvm); void (*vm_destroy)(struct kvm *kvm); diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h index 89e5f3d1bba8..29cdf262e516 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -486,4 +486,8 @@ struct kvm_pmu_event_filter { #define KVM_PMU_EVENT_ALLOW 0 #define KVM_PMU_EVENT_DENY 1 +#define KVM_X86_LEGACY_VM 0 +#define KVM_X86_SEV_ES_VM 1 +#define KVM_X86_TDX_VM 2 + #endif /* _ASM_X86_KVM_H */ diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index e001e3c9e4bc..11ab330a9b55 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -4161,6 +4161,11 @@ static void svm_vm_destroy(struct kvm *kvm) sev_vm_destroy(kvm); } +static bool svm_is_vm_type_supported(unsigned long type) +{ + return type == KVM_X86_LEGACY_VM; +} + static int svm_vm_init(struct kvm *kvm) { if (!pause_filter_count || !pause_filter_thresh) @@ -4187,6 +4192,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = { .vcpu_free = svm_free_vcpu, .vcpu_reset = svm_vcpu_reset, + .is_vm_type_supported = svm_is_vm_type_supported, .vm_size = sizeof(struct kvm_svm), .vm_init = svm_vm_init, .vm_destroy = svm_vm_destroy, diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 0703d82e7bad..b3ecdb96789a 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -6966,6 +6966,11 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu) return err; } +static bool vmx_is_vm_type_supported(unsigned long type) +{ + return type == KVM_X86_LEGACY_VM; +} + #define L1TF_MSG_SMT "L1TF CPU bug present and SMT on, data leak possible. 
See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.\n" #define L1TF_MSG_L1D "L1TF CPU bug present and virtualization mitigation disabled, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.\n" @@ -7603,6 +7608,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = { .cpu_has_accelerated_tpr = report_flexpriority, .has_emulated_msr = vmx_has_emulated_msr, + .is_vm_type_supported = vmx_is_vm_type_supported, .vm_size = sizeof(struct kvm_vmx), .vm_init = vmx_vm_init, diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 19b53aedc6c8..346394d83672 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3771,6 +3771,11 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_STEAL_TIME: r = sched_info_on(); break; + case KVM_CAP_VM_TYPES: + r = BIT(KVM_X86_LEGACY_VM); + if (kvm_x86_ops.is_vm_type_supported(KVM_X86_TDX_VM)) + r |= BIT(KVM_X86_TDX_VM); + break; default: break; } @@ -10249,9 +10254,11 @@ void kvm_arch_free_vm(struct kvm *kvm) int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) { - if (type) + if (!kvm_x86_ops.is_vm_type_supported(type)) return -EINVAL; + kvm->arch.vm_type = type; + INIT_HLIST_HEAD(&kvm->arch.mask_notifier_list); INIT_LIST_HEAD(&kvm->arch.active_mmu_pages); INIT_LIST_HEAD(&kvm->arch.zapped_obsolete_pages); diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index ca41220b40b8..c603e9a004f1 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1054,6 +1054,8 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_X86_MSR_FILTER 189 #define KVM_CAP_ENFORCE_PV_FEATURE_CPUID 190 +#define KVM_CAP_VM_TYPES 1000 + #ifdef KVM_CAP_IRQ_ROUTING struct kvm_irq_routing_irqchip { diff --git a/tools/arch/x86/include/uapi/asm/kvm.h b/tools/arch/x86/include/uapi/asm/kvm.h index 0780f97c1850..44313ac967dd 100644 --- a/tools/arch/x86/include/uapi/asm/kvm.h +++ b/tools/arch/x86/include/uapi/asm/kvm.h @@ -466,4 +466,8 @@ struct kvm_pmu_event_filter { #define KVM_PMU_EVENT_ALLOW 0 #define KVM_PMU_EVENT_DENY 1 +#define KVM_X86_LEGACY_VM 0 +#define KVM_X86_SEV_ES_VM 1 +#define KVM_X86_TDX_VM 2 + #endif /* _ASM_X86_KVM_H */ diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h index 7d8eced6f459..b043b01f0d87 100644 --- a/tools/include/uapi/linux/kvm.h +++ b/tools/include/uapi/linux/kvm.h @@ -1038,6 +1038,8 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_S390_DIAG318 186 #define KVM_CAP_STEAL_TIME 187 +#define KVM_CAP_VM_TYPES 1000 + #ifdef KVM_CAP_IRQ_ROUTING struct kvm_irq_routing_irqchip { From patchwork Mon Nov 16 18:26:01 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910433 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 56D9EC61DD8 for ; Mon, 16 Nov 2020 18:33:55 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 267EE2080A for ; Mon, 16 Nov 2020 18:33:55 +0000 (UTC) Received: 
(majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388661AbgKPSdP (ORCPT ); Mon, 16 Nov 2020 13:33:15 -0500 Received: from mga06.intel.com ([134.134.136.31]:20628 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1733265AbgKPS2A (ORCPT ); Mon, 16 Nov 2020 13:28:00 -0500 IronPort-SDR: wN7mvgKS74vvKtp1y0HXYVhw54pg7YRoULaOeiBbdanZEyYurYoEjoTzZeAMs8llg86CmoYKzB ITymrfDdgm/g== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="232410014" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="232410014" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:27:59 -0800 IronPort-SDR: 9FmXc1jtuMWjLiDU3pLrABXnY1Af9TvpPoa1NSK3ojYR2tsiYfbzTwRLg6xYlwQF8kUBci6/Yu ghzzz3FNwYnQ== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400527893" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:27:59 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson Subject: [RFC PATCH 16/67] KVM: x86: Hoist kvm_dirty_regs check out of sync_regs() Date: Mon, 16 Nov 2020 10:26:01 -0800 Message-Id: <7954b5fb51bf3045db05ce391d6a9ee5512fc4b9.1605232743.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson Move the kvm_dirty_regs vs. KVM_SYNC_X86_VALID_FIELDS check out of sync_regs() and into its sole caller, kvm_arch_vcpu_ioctl_run(). This allows a future patch to allow synchronizing select state for protected VMs. 
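As an illustration of where this is heading, here is a minimal sketch of how the now caller-side check could later be narrowed per VM; the kvm_sync_valid_fields() helper and the empty mask for protected guests are assumptions for illustration only, not part of this patch (a guest_state_protected flag only appears later in this series):

static u32 kvm_sync_valid_fields(struct kvm *kvm)
{
        /* Hypothetical: a protected guest might expose no syncable fields. */
        return kvm->arch.guest_state_protected ? 0 : KVM_SYNC_X86_VALID_FIELDS;
}

        /* In kvm_arch_vcpu_ioctl_run(), both masks then come from one place: */
        if ((kvm_run->kvm_valid_regs & ~kvm_sync_valid_fields(vcpu->kvm)) ||
            (kvm_run->kvm_dirty_regs & ~kvm_sync_valid_fields(vcpu->kvm))) {
                r = -EINVAL;
                goto out;
        }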
Signed-off-by: Sean Christopherson --- arch/x86/kvm/x86.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 346394d83672..1fa6a042984b 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -9257,7 +9257,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu) goto out; } - if (kvm_run->kvm_valid_regs & ~KVM_SYNC_X86_VALID_FIELDS) { + if ((kvm_run->kvm_valid_regs & ~KVM_SYNC_X86_VALID_FIELDS) || + (kvm_run->kvm_dirty_regs & ~KVM_SYNC_X86_VALID_FIELDS)) { r = -EINVAL; goto out; } @@ -9778,9 +9779,6 @@ static void store_regs(struct kvm_vcpu *vcpu) static int sync_regs(struct kvm_vcpu *vcpu) { - if (vcpu->run->kvm_dirty_regs & ~KVM_SYNC_X86_VALID_FIELDS) - return -EINVAL; - if (vcpu->run->kvm_dirty_regs & KVM_SYNC_X86_REGS) { __set_regs(vcpu, &vcpu->run->s.regs.regs); vcpu->run->kvm_dirty_regs &= ~KVM_SYNC_X86_REGS; From patchwork Mon Nov 16 18:26:02 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910417 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A0199C64E7B for ; Mon, 16 Nov 2020 18:33:13 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7F63F2080A for ; Mon, 16 Nov 2020 18:33:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2387974AbgKPS2B (ORCPT ); Mon, 16 Nov 2020 13:28:01 -0500 Received: from mga06.intel.com ([134.134.136.31]:20631 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2387541AbgKPS2B (ORCPT ); Mon, 16 Nov 2020 13:28:01 -0500 IronPort-SDR: Vg83dSd5UFRouAKPihuajBzqn/+XmU/q2YX1+gmmz01qy4awjVpvj7/jmot/zp6Hulv8teIpeu cyuElxd3ZTUQ== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="232410016" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="232410016" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:00 -0800 IronPort-SDR: OI90Sp+0qmyiFgxzCdljXpg25545QW2CfvlbrQzS4Ul6dLLLdHacOO7IzVMVaYWDUsqP0qOCMN gHEYX63lwytQ== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400527908" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:27:59 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . 
Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson , Xiaoyao Li Subject: [RFC PATCH 17/67] KVM: x86: Introduce "protected guest" concept and block disallowed ioctls Date: Mon, 16 Nov 2020 10:26:02 -0800 Message-Id: X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson Add 'guest_state_protected' to mark a VM's state as being protected by hardware/firmware, e.g. SEV-ES or TDX-SEAM. Use the flag to disallow ioctls() and/or flows that attempt to access protected state. Return an error if userspace attempts to get/set register state for a protected VM, e.g. a non-debug TDX guest. KVM can't provide sane data, it's userspace's responsibility to avoid attempting to read guest state when it's known to be inaccessible. Retrieving vCPU events is the one exception, as the userspace VMM is allowed to inject NMIs. Co-developed-by: Xiaoyao Li Signed-off-by: Xiaoyao Li Signed-off-by: Sean Christopherson --- arch/x86/include/asm/kvm_host.h | 2 + arch/x86/kvm/x86.c | 113 +++++++++++++++++++++++++++----- 2 files changed, 97 insertions(+), 18 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 1ff33efd6394..e687a8bd46ad 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -994,6 +994,8 @@ struct kvm_arch { struct msr_bitmap_range ranges[16]; } msr_filter; + bool guest_state_protected; + struct kvm_pmu_event_filter *pmu_event_filter; struct task_struct *nx_lpage_recovery_thread; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 1fa6a042984b..6154abecd546 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3966,7 +3966,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) { int idx; - if (vcpu->preempted) + if (vcpu->preempted && !vcpu->kvm->arch.guest_state_protected) vcpu->arch.preempted_in_kernel = !kvm_x86_ops.get_cpl(vcpu); /* @@ -4074,6 +4074,9 @@ static int kvm_vcpu_ioctl_nmi(struct kvm_vcpu *vcpu) static int kvm_vcpu_ioctl_smi(struct kvm_vcpu *vcpu) { + if (vcpu->kvm->arch.guest_state_protected) + return -EINVAL; + kvm_make_request(KVM_REQ_SMI, vcpu); return 0; @@ -4120,6 +4123,9 @@ static int kvm_vcpu_ioctl_x86_set_mce(struct kvm_vcpu *vcpu, unsigned bank_num = mcg_cap & 0xff; u64 *banks = vcpu->arch.mce_banks; + if (vcpu->kvm->arch.guest_state_protected) + return -EINVAL; + if (mce->bank >= bank_num || !(mce->status & MCI_STATUS_VAL)) return -EINVAL; /* @@ -4212,7 +4218,8 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu, vcpu->arch.interrupt.injected && !vcpu->arch.interrupt.soft; events->interrupt.nr = vcpu->arch.interrupt.nr; events->interrupt.soft = 0; - events->interrupt.shadow = kvm_x86_ops.get_interrupt_shadow(vcpu); + if (!vcpu->kvm->arch.guest_state_protected) + events->interrupt.shadow = kvm_x86_ops.get_interrupt_shadow(vcpu); events->nmi.injected = vcpu->arch.nmi_injected; events->nmi.pending = vcpu->arch.nmi_pending != 0; @@ -4241,11 +4248,16 @@ static void kvm_smm_changed(struct kvm_vcpu *vcpu); static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu, struct kvm_vcpu_events *events) { - if (events->flags & ~(KVM_VCPUEVENT_VALID_NMI_PENDING - | KVM_VCPUEVENT_VALID_SIPI_VECTOR - | KVM_VCPUEVENT_VALID_SHADOW - | KVM_VCPUEVENT_VALID_SMM - | KVM_VCPUEVENT_VALID_PAYLOAD)) + 
u32 allowed_flags = KVM_VCPUEVENT_VALID_NMI_PENDING | + KVM_VCPUEVENT_VALID_SIPI_VECTOR | + KVM_VCPUEVENT_VALID_SHADOW | + KVM_VCPUEVENT_VALID_SMM | + KVM_VCPUEVENT_VALID_PAYLOAD; + + if (vcpu->kvm->arch.guest_state_protected) + allowed_flags = KVM_VCPUEVENT_VALID_NMI_PENDING; + + if (events->flags & ~allowed_flags) return -EINVAL; if (events->flags & KVM_VCPUEVENT_VALID_PAYLOAD) { @@ -4326,17 +4338,22 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu, return 0; } -static void kvm_vcpu_ioctl_x86_get_debugregs(struct kvm_vcpu *vcpu, - struct kvm_debugregs *dbgregs) +static int kvm_vcpu_ioctl_x86_get_debugregs(struct kvm_vcpu *vcpu, + struct kvm_debugregs *dbgregs) { unsigned long val; + if (vcpu->kvm->arch.guest_state_protected) + return -EINVAL; + memcpy(dbgregs->db, vcpu->arch.db, sizeof(vcpu->arch.db)); kvm_get_dr(vcpu, 6, &val); dbgregs->dr6 = val; dbgregs->dr7 = vcpu->arch.dr7; dbgregs->flags = 0; memset(&dbgregs->reserved, 0, sizeof(dbgregs->reserved)); + + return 0; } static int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu, @@ -4350,6 +4367,9 @@ static int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu, if (dbgregs->dr7 & ~0xffffffffull) return -EINVAL; + if (vcpu->kvm->arch.guest_state_protected) + return -EINVAL; + memcpy(vcpu->arch.db, dbgregs->db, sizeof(vcpu->arch.db)); kvm_update_dr0123(vcpu); vcpu->arch.dr6 = dbgregs->dr6; @@ -4445,9 +4465,12 @@ static void load_xsave(struct kvm_vcpu *vcpu, u8 *src) } } -static void kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu, - struct kvm_xsave *guest_xsave) +static int kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu, + struct kvm_xsave *guest_xsave) { + if (vcpu->kvm->arch.guest_state_protected) + return -EINVAL; + if (boot_cpu_has(X86_FEATURE_XSAVE)) { memset(guest_xsave, 0, sizeof(struct kvm_xsave)); fill_xsave((u8 *) guest_xsave->region, vcpu); @@ -4458,6 +4481,8 @@ static void kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu, *(u64 *)&guest_xsave->region[XSAVE_HDR_OFFSET / sizeof(u32)] = XFEATURE_MASK_FPSSE; } + + return 0; } #define XSAVE_MXCSR_OFFSET 24 @@ -4469,6 +4494,9 @@ static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu, *(u64 *)&guest_xsave->region[XSAVE_HDR_OFFSET / sizeof(u32)]; u32 mxcsr = *(u32 *)&guest_xsave->region[XSAVE_MXCSR_OFFSET / sizeof(u32)]; + if (vcpu->kvm->arch.guest_state_protected) + return -EINVAL; + if (boot_cpu_has(X86_FEATURE_XSAVE)) { /* * Here we allow setting states that are not present in @@ -4488,18 +4516,22 @@ static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu, return 0; } -static void kvm_vcpu_ioctl_x86_get_xcrs(struct kvm_vcpu *vcpu, - struct kvm_xcrs *guest_xcrs) +static int kvm_vcpu_ioctl_x86_get_xcrs(struct kvm_vcpu *vcpu, + struct kvm_xcrs *guest_xcrs) { + if (vcpu->kvm->arch.guest_state_protected) + return -EINVAL; + if (!boot_cpu_has(X86_FEATURE_XSAVE)) { guest_xcrs->nr_xcrs = 0; - return; + return 0; } guest_xcrs->nr_xcrs = 1; guest_xcrs->flags = 0; guest_xcrs->xcrs[0].xcr = XCR_XFEATURE_ENABLED_MASK; guest_xcrs->xcrs[0].value = vcpu->arch.xcr0; + return 0; } static int kvm_vcpu_ioctl_x86_set_xcrs(struct kvm_vcpu *vcpu, @@ -4507,6 +4539,9 @@ static int kvm_vcpu_ioctl_x86_set_xcrs(struct kvm_vcpu *vcpu, { int i, r = 0; + if (vcpu->kvm->arch.guest_state_protected) + return -EINVAL; + if (!boot_cpu_has(X86_FEATURE_XSAVE)) return -EINVAL; @@ -4776,7 +4811,9 @@ long kvm_arch_vcpu_ioctl(struct file *filp, case KVM_GET_DEBUGREGS: { struct kvm_debugregs dbgregs; - kvm_vcpu_ioctl_x86_get_debugregs(vcpu, &dbgregs); + r = 
kvm_vcpu_ioctl_x86_get_debugregs(vcpu, &dbgregs); + if (r) + break; r = -EFAULT; if (copy_to_user(argp, &dbgregs, @@ -4802,7 +4839,9 @@ long kvm_arch_vcpu_ioctl(struct file *filp, if (!u.xsave) break; - kvm_vcpu_ioctl_x86_get_xsave(vcpu, u.xsave); + r = kvm_vcpu_ioctl_x86_get_xsave(vcpu, u.xsave); + if (r) + break; r = -EFAULT; if (copy_to_user(argp, u.xsave, sizeof(struct kvm_xsave))) @@ -4826,7 +4865,9 @@ long kvm_arch_vcpu_ioctl(struct file *filp, if (!u.xcrs) break; - kvm_vcpu_ioctl_x86_get_xcrs(vcpu, u.xcrs); + r = kvm_vcpu_ioctl_x86_get_xcrs(vcpu, u.xcrs); + if (r) + break; r = -EFAULT; if (copy_to_user(argp, u.xcrs, @@ -8136,6 +8177,15 @@ static void post_kvm_run_save(struct kvm_vcpu *vcpu) { struct kvm_run *kvm_run = vcpu->run; + if (vcpu->kvm->arch.guest_state_protected) { + kvm_run->if_flag = false; + kvm_run->flags = false; + kvm_run->cr8 = 0; + kvm_run->apic_base = kvm_get_apic_base(vcpu); + kvm_run->ready_for_interrupt_injection = false; + return; + } + kvm_run->if_flag = (kvm_get_rflags(vcpu) & X86_EFLAGS_IF) != 0; kvm_run->flags = is_smm(vcpu) ? KVM_RUN_X86_SMM : 0; kvm_run->cr8 = kvm_get_cr8(vcpu); @@ -9263,6 +9313,12 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu) goto out; } + if (vcpu->kvm->arch.guest_state_protected && + (kvm_run->kvm_valid_regs || kvm_run->kvm_dirty_regs)) { + r = -EINVAL; + goto out; + } + if (kvm_run->kvm_dirty_regs) { r = sync_regs(vcpu); if (r != 0) @@ -9293,7 +9349,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu) out: kvm_put_guest_fpu(vcpu); - if (kvm_run->kvm_valid_regs) + if (kvm_run->kvm_valid_regs && !vcpu->kvm->arch.guest_state_protected) store_regs(vcpu); post_kvm_run_save(vcpu); kvm_sigset_deactivate(vcpu); @@ -9340,6 +9396,9 @@ static void __get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) { + if (vcpu->kvm->arch.guest_state_protected) + return -EINVAL; + vcpu_load(vcpu); __get_regs(vcpu, regs); vcpu_put(vcpu); @@ -9380,6 +9439,9 @@ static void __set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) { + if (vcpu->kvm->arch.guest_state_protected) + return -EINVAL; + vcpu_load(vcpu); __set_regs(vcpu, regs); vcpu_put(vcpu); @@ -9435,6 +9497,9 @@ static void __get_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs) int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs) { + if (vcpu->kvm->arch.guest_state_protected) + return -EINVAL; + vcpu_load(vcpu); __get_sregs(vcpu, sregs); vcpu_put(vcpu); @@ -9634,6 +9699,9 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu, { int ret; + if (vcpu->kvm->arch.guest_state_protected) + return -EINVAL; + vcpu_load(vcpu); ret = __set_sregs(vcpu, sregs); vcpu_put(vcpu); @@ -9646,6 +9714,9 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, unsigned long rflags; int i, r; + if (vcpu->kvm->arch.guest_state_protected) + return -EINVAL; + vcpu_load(vcpu); if (dbg->control & (KVM_GUESTDBG_INJECT_DB | KVM_GUESTDBG_INJECT_BP)) { @@ -9725,6 +9796,9 @@ int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu) { struct fxregs_state *fxsave; + if (vcpu->kvm->arch.guest_state_protected) + return -EINVAL; + vcpu_load(vcpu); fxsave = &vcpu->arch.guest_fpu->state.fxsave; @@ -9745,6 +9819,9 @@ int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu) { struct fxregs_state *fxsave; + if (vcpu->kvm->arch.guest_state_protected) + return -EINVAL; + vcpu_load(vcpu); fxsave 
= &vcpu->arch.guest_fpu->state.fxsave; From patchwork Mon Nov 16 18:26:03 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910427 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 29FBAC6379F for ; Mon, 16 Nov 2020 18:33:13 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id F1BC82080A for ; Mon, 16 Nov 2020 18:33:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388054AbgKPS2C (ORCPT ); Mon, 16 Nov 2020 13:28:02 -0500 Received: from mga06.intel.com ([134.134.136.31]:20628 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2387771AbgKPS2B (ORCPT ); Mon, 16 Nov 2020 13:28:01 -0500 IronPort-SDR: WoVba1Axp/k8FrY97SANrXZ96KizjveiwvyTcmPfkcdjqgeaW34QXXkY+6UTQM8YaTOLQDtY9G Tn7yaLoRDXnw== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="232410017" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="232410017" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:00 -0800 IronPort-SDR: 0JTcbWU0K0RTbpdP8qYFfT7xWmV1wAD3/6l5xaRUGXdlr+rV4VBpoLMgZWynVipuWDrWMDyAUF bMdOrZHl6M0w== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400527917" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:00 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson Subject: [RFC PATCH 18/67] KVM: x86: Add per-VM flag to disable direct IRQ injection Date: Mon, 16 Nov 2020 10:26:03 -0800 Message-Id: <9b3fb23c848a5937b47b6b784aca71427bf2e001.1605232743.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson Add a flag to disable IRQ injection, which is not supported by TDX. 
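The flag is consumed in two places in the diff below: KVM_INTERRUPT fails with -EINVAL, and dm_request_for_irq_injection() stops forcing exits for interrupt-window requests. A minimal sketch of how a backend could opt in; the tdx_vm_init() name and setting the flag unconditionally are assumptions for illustration, not code from this patch:

/* Hypothetical: a protected-guest backend's vm_init hook sets the flag once
 * at VM creation; every later direct-injection attempt is then rejected by
 * the generic x86 code changed below. */
static int tdx_vm_init(struct kvm *kvm)
{
        kvm->arch.irq_injection_disallowed = true;
        return 0;
}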
Signed-off-by: Sean Christopherson --- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/x86.c | 4 +++- 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index e687a8bd46ad..e8180a1fe610 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -995,6 +995,7 @@ struct kvm_arch { } msr_filter; bool guest_state_protected; + bool irq_injection_disallowed; struct kvm_pmu_event_filter *pmu_event_filter; struct task_struct *nx_lpage_recovery_thread; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 6154abecd546..ec66d5d53a1a 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4041,7 +4041,8 @@ static int kvm_vcpu_ready_for_interrupt_injection(struct kvm_vcpu *vcpu) static int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq) { - if (irq->irq >= KVM_NR_INTERRUPTS) + if (irq->irq >= KVM_NR_INTERRUPTS || + vcpu->kvm->arch.irq_injection_disallowed) return -EINVAL; if (!irqchip_in_kernel(vcpu->kvm)) { @@ -8170,6 +8171,7 @@ static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt) static int dm_request_for_irq_injection(struct kvm_vcpu *vcpu) { return vcpu->run->request_interrupt_window && + !vcpu->kvm->arch.irq_injection_disallowed && likely(!pic_in_kernel(vcpu->kvm)); } From patchwork Mon Nov 16 18:26:04 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910425 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4D976C64E75 for ; Mon, 16 Nov 2020 18:33:13 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2476320A8B for ; Mon, 16 Nov 2020 18:33:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388425AbgKPSdD (ORCPT ); Mon, 16 Nov 2020 13:33:03 -0500 Received: from mga06.intel.com ([134.134.136.31]:20632 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2387967AbgKPS2C (ORCPT ); Mon, 16 Nov 2020 13:28:02 -0500 IronPort-SDR: di7sk8XbKdhpmIzmkNq7//68U6Td0IMEP3Td53niZX9srnYPex7W7x2Jp9tId4bbiUha2utAWO e39RmAJyajhQ== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="232410018" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="232410018" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:01 -0800 IronPort-SDR: 3qzqJl0UN+MA5VvL+rL7AjbQ/k+loXJpB2Uj9oqSP6mtBFw2jJwwgWs/xBrkdJb53FHZRv5Iez /mmzD0PYCaxQ== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400527929" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:00 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . 
Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson , Kai Huang Subject: [RFC PATCH 19/67] KVM: x86: Add flag to disallow #MC injection / KVM_X86_SETUP_MCE Date: Mon, 16 Nov 2020 10:26:04 -0800 Message-Id: X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson Add a flag to disallow MCE injection and reject KVM_X86_SETUP_MCE with -EINVAL when set. TDX doesn't support injecting exceptions, including (virtual) #MCs. Signed-off-by: Kai Huang Co-developed-by: Sean Christopherson Signed-off-by: Sean Christopherson --- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/x86.c | 14 +++++++------- 2 files changed, 8 insertions(+), 7 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index e8180a1fe610..70528102d865 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -996,6 +996,7 @@ struct kvm_arch { bool guest_state_protected; bool irq_injection_disallowed; + bool mce_injection_disallowed; struct kvm_pmu_event_filter *pmu_event_filter; struct task_struct *nx_lpage_recovery_thread; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index ec66d5d53a1a..2fb0d20c5788 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4095,15 +4095,16 @@ static int vcpu_ioctl_tpr_access_reporting(struct kvm_vcpu *vcpu, static int kvm_vcpu_ioctl_x86_setup_mce(struct kvm_vcpu *vcpu, u64 mcg_cap) { - int r; unsigned bank_num = mcg_cap & 0xff, bank; - r = -EINVAL; + if (vcpu->kvm->arch.mce_injection_disallowed) + return -EINVAL; + if (!bank_num || bank_num > KVM_MAX_MCE_BANKS) - goto out; + return -EINVAL; if (mcg_cap & ~(kvm_mce_cap_supported | 0xff | 0xff0000)) - goto out; - r = 0; + return -EINVAL; + vcpu->arch.mcg_cap = mcg_cap; /* Init IA32_MCG_CTL to all 1s */ if (mcg_cap & MCG_CTL_P) @@ -4113,8 +4114,7 @@ static int kvm_vcpu_ioctl_x86_setup_mce(struct kvm_vcpu *vcpu, vcpu->arch.mce_banks[bank*4] = ~(u64)0; kvm_x86_ops.setup_mce(vcpu); -out: - return r; + return 0; } static int kvm_vcpu_ioctl_x86_set_mce(struct kvm_vcpu *vcpu, From patchwork Mon Nov 16 18:26:05 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910415 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4E109C61DD8 for ; Mon, 16 Nov 2020 18:33:12 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 29DF320A8B for ; Mon, 16 Nov 2020 18:33:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388076AbgKPS2C (ORCPT ); Mon, 16 Nov 2020 13:28:02 -0500 Received: from mga06.intel.com ([134.134.136.31]:20632 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2387771AbgKPS2C (ORCPT ); Mon, 16 Nov 2020 13:28:02 -0500 
IronPort-SDR: sHx3o9pnSjqvHfNUj30X6Z49HIRj5DwPWT5a3k121bZL/PUDdX8As6H6SWIcjYRg9Xeb1qi13Y y6iKdC0RtfDg== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="232410022" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="232410022" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:01 -0800 IronPort-SDR: i4yQ5Wd0OYqDT6wr/HvI9hI80aFDLX250hmupBv9oUj8JJ2JGixIo65uOqI9s+0VThpRbE+LIw jaGVuFRi4Reg== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400527941" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:01 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Isaku Yamahata , Sean Christopherson Subject: [RFC PATCH 20/67] KVM: x86: Make KVM_CAP_X86_SMM a per-VM capability Date: Mon, 16 Nov 2020 10:26:05 -0800 Message-Id: X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Isaku Yamahata TDX doesn't support SMM, whereas VMX conditionally support SMM. Rework kvm_x86_ops.has_emulated_msr() to take a struct kvm so that TDX can reject SMM by way of the MSR_IA32_SMBASE check. This pair with a QEMU change to query SMM support using a VM ioctl(). Signed-off-by: Isaku Yamahata Co-developed-by: Sean Christopherson Signed-off-by: Sean Christopherson --- arch/x86/include/asm/kvm_host.h | 2 +- arch/x86/kvm/svm/svm.c | 2 +- arch/x86/kvm/vmx/vmx.c | 2 +- arch/x86/kvm/x86.c | 4 ++-- 4 files changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 70528102d865..00b34d8f038b 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1092,7 +1092,7 @@ struct kvm_x86_ops { void (*hardware_disable)(void); void (*hardware_unsetup)(void); bool (*cpu_has_accelerated_tpr)(void); - bool (*has_emulated_msr)(u32 index); + bool (*has_emulated_msr)(struct kvm *kvm, u32 index); void (*vcpu_after_set_cpuid)(struct kvm_vcpu *vcpu); bool (*is_vm_type_supported)(unsigned long vm_type); diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 11ab330a9b55..241a26e1fa71 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -3720,7 +3720,7 @@ static bool svm_cpu_has_accelerated_tpr(void) return false; } -static bool svm_has_emulated_msr(u32 index) +static bool svm_has_emulated_msr(struct kvm *kvm, u32 index) { switch (index) { case MSR_IA32_MCG_EXT_CTL: diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index b3ecdb96789a..2ee7eb7dac26 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -6405,7 +6405,7 @@ static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu) handle_exception_nmi_irqoff(vmx); } -static bool vmx_has_emulated_msr(u32 index) +static bool vmx_has_emulated_msr(struct kvm *kvm, u32 index) { switch (index) { case MSR_IA32_SMBASE: diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 2fb0d20c5788..2f4b226d5b89 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3726,7 +3726,7 @@ int kvm_vm_ioctl_check_extension(struct kvm 
*kvm, long ext) * fringe case that is not enabled except via specific settings * of the module parameters. */ - r = kvm_x86_ops.has_emulated_msr(MSR_IA32_SMBASE); + r = kvm_x86_ops.has_emulated_msr(kvm, MSR_IA32_SMBASE); break; case KVM_CAP_VAPIC: r = !kvm_x86_ops.cpu_has_accelerated_tpr(); @@ -5783,7 +5783,7 @@ static void kvm_init_msr_list(void) } for (i = 0; i < ARRAY_SIZE(emulated_msrs_all); i++) { - if (!kvm_x86_ops.has_emulated_msr(emulated_msrs_all[i])) + if (!kvm_x86_ops.has_emulated_msr(NULL, emulated_msrs_all[i])) continue; emulated_msrs[num_emulated_msrs++] = emulated_msrs_all[i]; From patchwork Mon Nov 16 18:26:06 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910421 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E721BC2D0A3 for ; Mon, 16 Nov 2020 18:33:11 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 907D9207BC for ; Mon, 16 Nov 2020 18:33:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388081AbgKPS2D (ORCPT ); Mon, 16 Nov 2020 13:28:03 -0500 Received: from mga06.intel.com ([134.134.136.31]:20632 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388071AbgKPS2D (ORCPT ); Mon, 16 Nov 2020 13:28:03 -0500 IronPort-SDR: 1HLzF5QyXCIhTflP7BoYF0IM5uel07NygjZgy/oL+3osIbj1WpIQA1LODk9QztwP7l64BV238I jmBh+hWm4U6Q== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="232410025" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="232410025" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:02 -0800 IronPort-SDR: PlpzYBcZGbVAeGL9Uy62ntD/c2SVXtPqBExV8eNBaqm+leIhHAEegXnSXPZ6MTCybggcemF5OU q9zjMBYs+l/Q== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400527961" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:01 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson Subject: [RFC PATCH 21/67] KVM: x86: Add flag to mark TSC as immutable (for TDX) Date: Mon, 16 Nov 2020 10:26:06 -0800 Message-Id: <7db8eb4687c539bbecb3a725e5fb345dd8560ae0.1605232743.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson The TSC for TDX1 guests is fixed at TD creation time. 
Add tsc_immutable to reflect that the TSC of the guest cannot be changed in any way, and use it to short circuit all paths that lead to one of the myriad TSC adjustment flows. Suggested-by: Kai Huang Signed-off-by: Sean Christopherson --- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/x86.c | 35 +++++++++++++++++++++++++-------- 2 files changed, 28 insertions(+), 8 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 00b34d8f038b..e5b706889d09 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -968,6 +968,7 @@ struct kvm_arch { int audit_point; #endif + bool tsc_immutable; bool backwards_tsc_observed; bool boot_vcpu_runs_old_kvmclock; u32 bsp_vcpu_id; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 2f4b226d5b89..01380f057d9f 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2040,7 +2040,9 @@ static int set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz, bool scale) u64 ratio; /* Guest TSC same frequency as host TSC? */ - if (!scale) { + if (!scale || vcpu->kvm->arch.tsc_immutable) { + if (scale) + pr_warn_ratelimited("Guest TSC immutable, scaling not supported\n"); vcpu->arch.tsc_scaling_ratio = kvm_default_tsc_scaling_ratio; return 0; } @@ -2216,6 +2218,9 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data) bool already_matched; bool synchronizing = false; + if (WARN_ON_ONCE(vcpu->kvm->arch.tsc_immutable)) + return; + raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags); offset = kvm_compute_tsc_offset(vcpu, data); ns = get_kvmclock_base_ns(); @@ -2641,6 +2646,10 @@ static int kvm_guest_time_update(struct kvm_vcpu *v) u8 pvclock_flags; bool use_master_clock; + /* Unable to update guest time if the TSC is immutable. */ + if (ka->tsc_immutable) + return 0; + kernel_ns = 0; host_tsc = 0; @@ -3915,7 +3924,8 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu) if (tsc_delta < 0) mark_tsc_unstable("KVM discovered backwards TSC"); - if (kvm_check_tsc_unstable()) { + if (kvm_check_tsc_unstable() && + !vcpu->kvm->arch.tsc_immutable) { u64 offset = kvm_compute_tsc_offset(vcpu, vcpu->arch.last_guest_tsc); kvm_vcpu_write_tsc_offset(vcpu, offset); @@ -3929,7 +3939,8 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu) * On a host with synchronized TSC, there is no need to update * kvmclock on vcpu->cpu migration */ - if (!vcpu->kvm->arch.use_master_clock || vcpu->cpu == -1) + if ((!vcpu->kvm->arch.use_master_clock || vcpu->cpu == -1) && + !vcpu->kvm->arch.tsc_immutable) kvm_make_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu); if (vcpu->cpu != cpu) kvm_make_request(KVM_REQ_MIGRATE_TIMER, vcpu); @@ -4888,10 +4899,11 @@ long kvm_arch_vcpu_ioctl(struct file *filp, break; } case KVM_SET_TSC_KHZ: { - u32 user_tsc_khz; + u32 user_tsc_khz = (u32)arg; r = -EINVAL; - user_tsc_khz = (u32)arg; + if (vcpu->kvm->arch.tsc_immutable) + goto out; if (kvm_has_tsc_control && user_tsc_khz >= kvm_max_guest_tsc_khz) @@ -10013,9 +10025,12 @@ void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu) if (mutex_lock_killable(&vcpu->mutex)) return; - vcpu_load(vcpu); - kvm_synchronize_tsc(vcpu, 0); - vcpu_put(vcpu); + + if (!kvm->arch.tsc_immutable) { + vcpu_load(vcpu); + kvm_synchronize_tsc(vcpu, 0); + vcpu_put(vcpu); + } /* poll control enabled by default */ vcpu->arch.msr_kvm_poll_control = 1; @@ -10209,6 +10224,10 @@ int kvm_arch_hardware_enable(void) if (backwards_tsc) { u64 delta_cyc = max_tsc - local_tsc; list_for_each_entry(kvm, &vm_list, vm_list) { + if (vcpu->kvm->arch.tsc_immutable) { + 
pr_warn_ratelimited("Backwards TSC observed and guest with immutable TSC active\n"); + continue; + } kvm->arch.backwards_tsc_observed = true; kvm_for_each_vcpu(i, vcpu, kvm) { vcpu->arch.tsc_offset_adjustment += delta_cyc; From patchwork Mon Nov 16 18:26:07 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910419 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3257DC4742C for ; Mon, 16 Nov 2020 18:33:12 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E41402231B for ; Mon, 16 Nov 2020 18:33:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732897AbgKPScm (ORCPT ); Mon, 16 Nov 2020 13:32:42 -0500 Received: from mga06.intel.com ([134.134.136.31]:20632 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2387771AbgKPS2D (ORCPT ); Mon, 16 Nov 2020 13:28:03 -0500 IronPort-SDR: ZXajAuBrZ2koRQ28pD2moigbziGlX5TanRdDgllmNqAhUC1OpEKmBDabfHhaqe2DB2jd3nW74q oL90CDntHhzw== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="232410028" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="232410028" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:02 -0800 IronPort-SDR: HSmSmve9NahGtLT9P0ekBHeZ+ETPGer5F6hFhlc8Q5AxThCCdr1jMH67bHx+ERS18CDjE7zU2t i7x5r3Jy1dHw== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400527973" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:02 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson Subject: [RFC PATCH 22/67] KVM: Add per-VM flag to mark read-only memory as unsupported Date: Mon, 16 Nov 2020 10:26:07 -0800 Message-Id: <499bf4c92e07de7e27745cf8d266a1932d18d85f.1605232743.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Isaku Yamahata Add a flag for TDX to flag RO memory as unsupported and propagate it to KVM_MEM_READONLY to allow reporting RO memory as unsupported on a per-VM basis. TDX1 doesn't expose permission bits to the VMM in the SEPT tables, i.e. doesn't support read-only private memory. 
Signed-off-by: Isaku Yamahata Co-developed-by: Sean Christopherson Signed-off-by: Sean Christopherson --- arch/x86/kvm/x86.c | 4 +++- include/linux/kvm_host.h | 4 ++++ virt/kvm/kvm_main.c | 8 +++++--- 3 files changed, 12 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 01380f057d9f..4060f3d91f74 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3695,7 +3695,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_ASYNC_PF_INT: case KVM_CAP_GET_TSC_KHZ: case KVM_CAP_KVMCLOCK_CTRL: - case KVM_CAP_READONLY_MEM: case KVM_CAP_HYPERV_TIME: case KVM_CAP_IOAPIC_POLARITY_IGNORED: case KVM_CAP_TSC_DEADLINE_TIMER: @@ -3785,6 +3784,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) if (kvm_x86_ops.is_vm_type_supported(KVM_X86_TDX_VM)) r |= BIT(KVM_X86_TDX_VM); break; + case KVM_CAP_READONLY_MEM: + r = kvm && kvm->readonly_mem_unsupported ? 0 : 1; + break; default: break; } diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 95371750c23f..1a0df7b83fd0 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -517,6 +517,10 @@ struct kvm { pid_t userspace_pid; unsigned int max_halt_poll_ns; +#ifdef __KVM_HAVE_READONLY_MEM + bool readonly_mem_unsupported; +#endif + bool vm_bugged; }; diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 3dc41b6e12a0..572a66a61c29 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1100,12 +1100,14 @@ static void update_memslots(struct kvm_memslots *slots, } } -static int check_memory_region_flags(const struct kvm_userspace_memory_region *mem) +static int check_memory_region_flags(struct kvm *kvm, + const struct kvm_userspace_memory_region *mem) { u32 valid_flags = KVM_MEM_LOG_DIRTY_PAGES; #ifdef __KVM_HAVE_READONLY_MEM - valid_flags |= KVM_MEM_READONLY; + if (!kvm->readonly_mem_unsupported) + valid_flags |= KVM_MEM_READONLY; #endif if (mem->flags & ~valid_flags) @@ -1278,7 +1280,7 @@ int __kvm_set_memory_region(struct kvm *kvm, int as_id, id; int r; - r = check_memory_region_flags(mem); + r = check_memory_region_flags(kvm, mem); if (r) return r; From patchwork Mon Nov 16 18:26:08 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910411 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E835EC2D0A3 for ; Mon, 16 Nov 2020 18:32:37 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id AF4ED207BC for ; Mon, 16 Nov 2020 18:32:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388097AbgKPS2G (ORCPT ); Mon, 16 Nov 2020 13:28:06 -0500 Received: from mga06.intel.com ([134.134.136.31]:20632 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388086AbgKPS2E (ORCPT ); Mon, 16 Nov 2020 13:28:04 -0500 IronPort-SDR: +JsGjOS9Qt2iT8fBSSW0DBTECaxYSi2JPoak/1nKUGvpnAqCkIQfLHamzv15e0IF2R5PWIfXtG JxDnjdTKzRBg== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="232410031" X-IronPort-AV: 
E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="232410031" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:03 -0800 IronPort-SDR: kmdBeUL4mdS0+jujBwt2ADTN0JowZGlBjbiaErA3r/ls19CNWuswZNlgUfjAmq523RLBg4SlcU 1EbibhERw4fg== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400527986" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:02 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson Subject: [RFC PATCH 23/67] KVM: Add per-VM flag to disable dirty logging of memslots for TDs Date: Mon, 16 Nov 2020 10:26:08 -0800 Message-Id: X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson Add a flag for TDX to mark dirty logging as unsupported. Suggested-by: Kai Huang Signed-off-by: Sean Christopherson --- include/linux/kvm_host.h | 1 + virt/kvm/kvm_main.c | 5 ++++- 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 1a0df7b83fd0..9682282cb258 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -517,6 +517,7 @@ struct kvm { pid_t userspace_pid; unsigned int max_halt_poll_ns; + bool dirty_log_unsupported; #ifdef __KVM_HAVE_READONLY_MEM bool readonly_mem_unsupported; #endif diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 572a66a61c29..aa5f27753756 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1103,7 +1103,10 @@ static void update_memslots(struct kvm_memslots *slots, static int check_memory_region_flags(struct kvm *kvm, const struct kvm_userspace_memory_region *mem) { - u32 valid_flags = KVM_MEM_LOG_DIRTY_PAGES; + u32 valid_flags = 0; + + if (!kvm->dirty_log_unsupported) + valid_flags |= KVM_MEM_LOG_DIRTY_PAGES; #ifdef __KVM_HAVE_READONLY_MEM if (!kvm->readonly_mem_unsupported) From patchwork Mon Nov 16 18:26:09 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910413 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 88472C63798 for ; Mon, 16 Nov 2020 18:32:38 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 52CF120853 for ; Mon, 16 Nov 2020 18:32:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388455AbgKPScT (ORCPT ); Mon, 16 Nov 2020 13:32:19 -0500 Received: from mga06.intel.com ([134.134.136.31]:20632 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org 
with ESMTP id S2388071AbgKPS2E (ORCPT ); Mon, 16 Nov 2020 13:28:04 -0500 IronPort-SDR: xW2djYd6UIkYTVsm89L1yjIXtKJZgBEvrlrawqI9tIGcuVrY0jr8bAVSYA4lVlN0tG1HhWlwgw lPQNWq4VqXmw== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="232410033" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="232410033" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:03 -0800 IronPort-SDR: 9VpXd3tfmiTiB2v42AVl1c4CsTDzG49xG8gcTbofRCACxUXSjLMK35E4OrfO5ttYRpxuxucSkZ 33pCbPmc+L6A== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400527995" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:03 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Kai Huang , Sean Christopherson Subject: [RFC PATCH 24/67] KVM: x86: Add per-VM flag to disable in-kernel I/O APIC and level routes Date: Mon, 16 Nov 2020 10:26:09 -0800 Message-Id: X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Kai Huang Add a flag to let TDX disallow the in-kernel I/O APIC, level triggered routes for a userspace I/O APIC, and anything else that relies on being able to intercept EOIs. TDX-SEAM does not allow intercepting EOI. Note, technically KVM could partially emulate the I/O APIC by allowing only edge triggered interrupts, but that adds a lot of complexity for basically zero benefit. Ideally KVM wouldn't even allow I/O APIC route reservation, but disabling that is a train wreck for Qemu. 
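For reference, the new routing restriction boils down to rejecting level-triggered MSI routes when EOIs cannot be intercepted; below is a minimal standalone model of that check (the flag and MSI_DATA_TRIGGER_SHIFT, bit 15 of the MSI data word, follow the patch; everything else is simplified).

#include <stdbool.h>
#include <stdint.h>

#define MSI_DATA_TRIGGER_SHIFT  15      /* trigger-mode bit in the MSI data word */

/*
 * A level-triggered interrupt can't be deasserted without observing the EOI,
 * so a VM whose EOIs can't be intercepted (a TD) must refuse such routes.
 * Edge-triggered MSIs need no EOI bookkeeping and remain allowed.
 */
static bool msi_route_allowed(bool eoi_intercept_unsupported, uint32_t msi_data)
{
        if (eoi_intercept_unsupported &&
            (msi_data & (1u << MSI_DATA_TRIGGER_SHIFT)))
                return false;
        return true;
}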
Signed-off-by: Kai Huang Co-developed-by: Sean Christopherson Signed-off-by: Sean Christopherson --- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/ioapic.c | 4 ++++ arch/x86/kvm/irq_comm.c | 6 +++++- arch/x86/kvm/lapic.c | 3 ++- arch/x86/kvm/x86.c | 6 ++++++ 5 files changed, 18 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index e5b706889d09..7537ba0bada2 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -977,6 +977,7 @@ struct kvm_arch { enum kvm_irqchip_mode irqchip_mode; u8 nr_reserved_ioapic_pins; + bool eoi_intercept_unsupported; bool disabled_lapic_found; diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c index 698969e18fe3..e2de6e552d25 100644 --- a/arch/x86/kvm/ioapic.c +++ b/arch/x86/kvm/ioapic.c @@ -311,6 +311,10 @@ void kvm_arch_post_irq_ack_notifier_list_update(struct kvm *kvm) { if (!ioapic_in_kernel(kvm)) return; + + if (WARN_ON_ONCE(kvm->arch.eoi_intercept_unsupported)) + return; + kvm_make_scan_ioapic_request(kvm); } diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c index 4aa1c2e00e2a..1523e9d66867 100644 --- a/arch/x86/kvm/irq_comm.c +++ b/arch/x86/kvm/irq_comm.c @@ -307,6 +307,10 @@ int kvm_set_routing_entry(struct kvm *kvm, e->msi.address_hi = ue->u.msi.address_hi; e->msi.data = ue->u.msi.data; + if (kvm->arch.eoi_intercept_unsupported && + e->msi.data & (1 << MSI_DATA_TRIGGER_SHIFT)) + return -EINVAL; + if (kvm_msi_route_invalid(kvm, e)) return -EINVAL; break; @@ -390,7 +394,7 @@ int kvm_setup_empty_irq_routing(struct kvm *kvm) void kvm_arch_post_irq_routing_update(struct kvm *kvm) { - if (!irqchip_split(kvm)) + if (!irqchip_split(kvm) || kvm->arch.eoi_intercept_unsupported) return; kvm_make_scan_ioapic_request(kvm); } diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 105e7859d1f2..e6c0aaf4044e 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -278,7 +278,8 @@ void kvm_recalculate_apic_map(struct kvm *kvm) if (old) call_rcu(&old->rcu, kvm_apic_map_free); - kvm_make_scan_ioapic_request(kvm); + if (!kvm->arch.eoi_intercept_unsupported) + kvm_make_scan_ioapic_request(kvm); } static inline void apic_set_spiv(struct kvm_lapic *apic, u32 val) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 4060f3d91f74..8d58141256c5 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5454,6 +5454,9 @@ long kvm_arch_vm_ioctl(struct file *filp, goto create_irqchip_unlock; r = -EINVAL; + if (kvm->arch.eoi_intercept_unsupported) + goto create_irqchip_unlock; + if (kvm->created_vcpus) goto create_irqchip_unlock; @@ -5484,6 +5487,9 @@ long kvm_arch_vm_ioctl(struct file *filp, u.pit_config.flags = KVM_PIT_SPEAKER_DUMMY; goto create_pit; case KVM_CREATE_PIT2: + r = -EINVAL; + if (kvm->arch.eoi_intercept_unsupported) + goto out; r = -EFAULT; if (copy_from_user(&u.pit_config, argp, sizeof(struct kvm_pit_config))) From patchwork Mon Nov 16 18:26:10 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910423 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org 
[198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0C92FC55ABD for ; Mon, 16 Nov 2020 18:33:12 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B847720A8B for ; Mon, 16 Nov 2020 18:33:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388382AbgKPScS (ORCPT ); Mon, 16 Nov 2020 13:32:18 -0500 Received: from mga06.intel.com ([134.134.136.31]:20636 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388092AbgKPS2F (ORCPT ); Mon, 16 Nov 2020 13:28:05 -0500 IronPort-SDR: HGyHCLRlsR6rhvGFeeAmTMZEHZFDS3CGzP6E63U4g0vHlo/Z48S02Bv/AhZRgxyfRm0HwtiowQ FcZeZRw14xwQ== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="232410034" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="232410034" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:04 -0800 IronPort-SDR: efEQDO8MHzFjirZlU/JQsPOVdBDIf2Lalk0TGZC8zFXHSz75h6ldUESOHHMrjB+dSYEEifotY3 vsOwBahEB5yw== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400528010" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:03 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson Subject: [RFC PATCH 25/67] KVM: x86: Allow host-initiated WRMSR to set X2APIC regardless of CPUID Date: Mon, 16 Nov 2020 10:26:10 -0800 Message-Id: <66de675d14feb088d60051501523848784d94044.1605232743.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson Let userspace, or in the case of TDX, KVM itself, enable X2APIC even if X2APIC is not reported as supported in the guest's CPU model. KVM generally does not force specific ordering between ioctls(), e.g. this forces userspace to configure CPUID before MSRs. And for TDX, vCPUs will always run with X2APIC enabled, e.g. KVM will want/need to enable X2APIC from time zero. Signed-off-by: Sean Christopherson --- arch/x86/kvm/x86.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 8d58141256c5..a1c57d1eb460 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -394,8 +394,11 @@ int kvm_set_apic_base(struct kvm_vcpu *vcpu, struct msr_data *msr_info) { enum lapic_mode old_mode = kvm_get_apic_mode(vcpu); enum lapic_mode new_mode = kvm_apic_mode(msr_info->data); - u64 reserved_bits = ((~0ULL) << cpuid_maxphyaddr(vcpu)) | 0x2ff | - (guest_cpuid_has(vcpu, X86_FEATURE_X2APIC) ? 
0 : X2APIC_ENABLE); + u64 reserved_bits = ((~0ULL) << cpuid_maxphyaddr(vcpu)) | 0x2ff; + + if (!msr_info->host_initiated && + !guest_cpuid_has(vcpu, X86_FEATURE_X2APIC)) + reserved_bits |= X2APIC_ENABLE; if ((msr_info->data & reserved_bits) != 0 || new_mode == LAPIC_MODE_INVALID) return 1; From patchwork Mon Nov 16 18:26:11 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910385 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7BF5EC2D0A3 for ; Mon, 16 Nov 2020 18:32:00 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 415B920780 for ; Mon, 16 Nov 2020 18:32:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388211AbgKPS2I (ORCPT ); Mon, 16 Nov 2020 13:28:08 -0500 Received: from mga06.intel.com ([134.134.136.31]:20638 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388095AbgKPS2G (ORCPT ); Mon, 16 Nov 2020 13:28:06 -0500 IronPort-SDR: cRGVddNAopi48G6pJKgAFoC3dBwnwlX445Yp6GyyEPelr2GOHncisD+6eSGzgQMZ1vOoQvvFTi wSSAyxMsATvQ== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="232410035" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="232410035" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:04 -0800 IronPort-SDR: siudl/3OmboItfUXg82BNh2u5l2IKHgnkwlMqzSDHugZoEFjK/1vQ0YNW39ilQbT044gUIgIWa fHSD1U5mAY1Q== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400528018" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:04 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson , Tom Lendacky Subject: [RFC PATCH 26/67] KVM: x86: Add kvm_x86_ops .cache_gprs() and .flush_gprs() Date: Mon, 16 Nov 2020 10:26:11 -0800 Message-Id: X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson Add hooks to cache and flush GPRs and invoke them from KVM_GET_REGS and KVM_SET_REGS respecitively. TDX will use the hooks to read/write GPRs from TDX-SEAM on-demand (for debug TDs). 
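The hooks follow KVM's usual optional-callback convention; here is a stripped-down model of the dispatch (the struct names are stand-ins, only the NULL-checked calls mirror the patch).

#include <stddef.h>

struct vcpu;                                    /* stand-in for struct kvm_vcpu */

struct x86_ops {                                /* stand-in for struct kvm_x86_ops */
        void (*cache_gprs)(struct vcpu *vcpu);  /* pull GPRs in from the backend */
        void (*flush_gprs)(struct vcpu *vcpu);  /* push GPRs back to the backend */
};

/* KVM_GET_REGS path: a backend that owns the GPRs (e.g. a debug TD) caches them first. */
static void get_regs(const struct x86_ops *ops, struct vcpu *vcpu)
{
        if (ops->cache_gprs)
                ops->cache_gprs(vcpu);
        /* ... copy the now-valid cached GPRs out to userspace ... */
}

/* KVM_SET_REGS path: after the cache is updated, the backend writes the GPRs back. */
static void set_regs(const struct x86_ops *ops, struct vcpu *vcpu)
{
        /* ... copy the userspace-provided GPRs into the cache ... */
        if (ops->flush_gprs)
                ops->flush_gprs(vcpu);
}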
Cc: Tom Lendacky Signed-off-by: Sean Christopherson --- arch/x86/include/asm/kvm_host.h | 2 ++ arch/x86/kvm/x86.c | 6 ++++++ 2 files changed, 8 insertions(+) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 7537ba0bada2..01c78eeefef4 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1130,6 +1130,8 @@ struct kvm_x86_ops { void (*set_gdt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt); void (*sync_dirty_debug_regs)(struct kvm_vcpu *vcpu); void (*set_dr7)(struct kvm_vcpu *vcpu, unsigned long value); + void (*cache_gprs)(struct kvm_vcpu *vcpu); + void (*flush_gprs)(struct kvm_vcpu *vcpu); void (*cache_reg)(struct kvm_vcpu *vcpu, enum kvm_reg reg); unsigned long (*get_rflags)(struct kvm_vcpu *vcpu); void (*set_rflags)(struct kvm_vcpu *vcpu, unsigned long rflags); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index a1c57d1eb460..22e956f01ddc 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -9385,6 +9385,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu) static void __get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) { + if (kvm_x86_ops.cache_gprs) + kvm_x86_ops.cache_gprs(vcpu); + if (vcpu->arch.emulate_regs_need_sync_to_vcpu) { /* * We are here if userspace calls get_regs() in the middle of @@ -9459,6 +9462,9 @@ static void __set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) vcpu->arch.exception.pending = false; + if (kvm_x86_ops.flush_gprs) + kvm_x86_ops.flush_gprs(vcpu); + kvm_make_request(KVM_REQ_EVENT, vcpu); } From patchwork Mon Nov 16 18:26:12 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910407 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5606CC64E7C for ; Mon, 16 Nov 2020 18:32:02 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 22C93206E0 for ; Mon, 16 Nov 2020 18:32:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388282AbgKPSbq (ORCPT ); Mon, 16 Nov 2020 13:31:46 -0500 Received: from mga06.intel.com ([134.134.136.31]:20632 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388086AbgKPS2H (ORCPT ); Mon, 16 Nov 2020 13:28:07 -0500 IronPort-SDR: 1d5VERfjlxyyJO5+vvxDeYX3iJnR8rQBLWOYEMc5rdHAtUKclbwkHnK/AZ/IRl4Wp/Io9a31sA M7uWCFQzO+tw== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="232410036" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="232410036" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:05 -0800 IronPort-SDR: Ws7KFcD1VWTxivoZy45znlk81pJhdslnPXH2pnnNBT1RnLATydKvOSjWlJTNtZY8XBTsj79hzY ZWMm2BnFzHGA== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400528022" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 
16 Nov 2020 10:28:04 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson Subject: [RFC PATCH 27/67] KVM: x86: Add support for vCPU and device-scoped KVM_MEMORY_ENCRYPT_OP Date: Mon, 16 Nov 2020 10:26:12 -0800 Message-Id: <69f91c5f7625f3e63e15bba4de17ac3765853071.1605232743.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson Signed-off-by: Sean Christopherson --- arch/x86/include/asm/kvm_host.h | 2 ++ arch/x86/kvm/x86.c | 12 ++++++++++++ 2 files changed, 14 insertions(+) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 01c78eeefef4..32e995327944 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1276,7 +1276,9 @@ struct kvm_x86_ops { int (*pre_leave_smm)(struct kvm_vcpu *vcpu, const char *smstate); void (*enable_smi_window)(struct kvm_vcpu *vcpu); + int (*mem_enc_op_dev)(void __user *argp); int (*mem_enc_op)(struct kvm *kvm, void __user *argp); + int (*mem_enc_op_vcpu)(struct kvm_vcpu *vcpu, void __user *argp); int (*mem_enc_reg_region)(struct kvm *kvm, struct kvm_enc_region *argp); int (*mem_enc_unreg_region)(struct kvm *kvm, struct kvm_enc_region *argp); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 22e956f01ddc..7b8bbdc98492 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3882,6 +3882,12 @@ long kvm_arch_dev_ioctl(struct file *filp, case KVM_GET_MSRS: r = msr_io(NULL, argp, do_get_msr_feature, 1); break; + case KVM_MEMORY_ENCRYPT_OP: + r = -EINVAL; + if (!kvm_x86_ops.mem_enc_op_dev) + goto out; + r = kvm_x86_ops.mem_enc_op_dev(argp); + break; default: r = -EINVAL; break; @@ -5020,6 +5026,12 @@ long kvm_arch_vcpu_ioctl(struct file *filp, r = 0; break; } + case KVM_MEMORY_ENCRYPT_OP: + r = -EINVAL; + if (!kvm_x86_ops.mem_enc_op_vcpu) + goto out; + r = kvm_x86_ops.mem_enc_op_vcpu(vcpu, argp); + break; default: r = -EINVAL; } From patchwork Mon Nov 16 18:26:13 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910399 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B11ECC55ABD for ; Mon, 16 Nov 2020 18:32:00 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 5DE8B2078D for ; Mon, 16 Nov 2020 18:32:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388168AbgKPS2I (ORCPT ); Mon, 16 Nov 2020 13:28:08 -0500 Received: from mga06.intel.com ([134.134.136.31]:20638 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388103AbgKPS2H (ORCPT ); Mon, 16 Nov 2020 13:28:07 -0500 IronPort-SDR: 
2imAF1Tzfn1GrhVnu5GlKSrsIJ8KowrIddRCZUrhuCm+QFyWVXBzMt+Z+wXR4+MqlqQSGqxK7o SSNIrY8oikFw== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="232410038" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="232410038" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:05 -0800 IronPort-SDR: YSowsfb0gkqGsaTiZQuJSKr8TfV7PdtFN3V8zLg75miiBfviUT6hAjuTQ5tIz2I9W5QxqQknRM C9tDXcdqYNHg== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400528025" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:05 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson Subject: [RFC PATCH 28/67] KVM: x86: Introduce vm_teardown() hook in kvm_arch_vm_destroy() Date: Mon, 16 Nov 2020 10:26:13 -0800 Message-Id: <54b79b2f3571737c0fa7ae516212eef6cc056ccc.1605232743.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson Add a second kvm_x86_ops hook in kvm_arch_vm_destroy() to support TDX's destruction path, which needs to first put the VM into a teardown state, then free per-vCPU resource, and finally free per-VM resources. Note, this knowingly creates a discrepancy in nomenclature for SVM as svm_vm_teardown() invokes avic_vm_destroy() and sev_vm_destroy(). Moving the now-misnamed functions or renaming them is left to a future patch so as not to introduce a functional change for SVM. 
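What matters is the ordering, which the kvm_arch_destroy_vm() hunk below establishes: vm_teardown() runs before the vCPUs and MMU are freed, vm_destroy() runs after. A rough model of the resulting sequence (only the two hook names come from the patch):

struct kvm;

struct x86_ops {
        void (*vm_teardown)(struct kvm *kvm);   /* put the VM into a teardown state */
        void (*vm_destroy)(struct kvm *kvm);    /* free the last per-VM resources */
};

static void arch_destroy_vm(const struct x86_ops *ops, struct kvm *kvm)
{
        ops->vm_teardown(kvm);  /* e.g. TDX: stop the TD so per-vCPU state can be reclaimed */
        /* ... free vCPUs, tear down the MMU ... */
        ops->vm_destroy(kvm);   /* e.g. TDX: release the remaining per-VM resources */
}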
Signed-off-by: Sean Christopherson --- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/svm/svm.c | 8 +++++++- arch/x86/kvm/vmx/vmx.c | 12 ++++++++++++ arch/x86/kvm/x86.c | 4 ++-- 4 files changed, 22 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 32e995327944..a6c89666ec49 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1100,6 +1100,7 @@ struct kvm_x86_ops { bool (*is_vm_type_supported)(unsigned long vm_type); unsigned int vm_size; int (*vm_init)(struct kvm *kvm); + void (*vm_teardown)(struct kvm *kvm); void (*vm_destroy)(struct kvm *kvm); /* Create, but do not attach this VCPU */ diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 241a26e1fa71..15836446a9b8 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -4155,12 +4155,17 @@ static bool svm_apic_init_signal_blocked(struct kvm_vcpu *vcpu) (vmcb_is_intercept(&svm->vmcb->control, INTERCEPT_INIT)); } -static void svm_vm_destroy(struct kvm *kvm) +static void svm_vm_teardown(struct kvm *kvm) { avic_vm_destroy(kvm); sev_vm_destroy(kvm); } +static void svm_vm_destroy(struct kvm *kvm) +{ + +} + static bool svm_is_vm_type_supported(unsigned long type) { return type == KVM_X86_LEGACY_VM; @@ -4195,6 +4200,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = { .is_vm_type_supported = svm_is_vm_type_supported, .vm_size = sizeof(struct kvm_svm), .vm_init = svm_vm_init, + .vm_teardown = svm_vm_teardown, .vm_destroy = svm_vm_destroy, .prepare_guest_switch = svm_prepare_guest_switch, diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 2ee7eb7dac26..3559b51f566d 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7010,6 +7010,16 @@ static int vmx_vm_init(struct kvm *kvm) return 0; } +static void vmx_vm_teardown(struct kvm *kvm) +{ + +} + +static void vmx_vm_destroy(struct kvm *kvm) +{ + +} + static int __init vmx_check_processor_compat(void) { struct vmcs_config vmcs_conf; @@ -7611,6 +7621,8 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = { .is_vm_type_supported = vmx_is_vm_type_supported, .vm_size = sizeof(struct kvm_vmx), .vm_init = vmx_vm_init, + .vm_teardown = vmx_vm_teardown, + .vm_destroy = vmx_vm_destroy, .vcpu_create = vmx_create_vcpu, .vcpu_free = vmx_free_vcpu, diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 7b8bbdc98492..42bd24ba7fdd 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -10533,10 +10533,9 @@ void kvm_arch_destroy_vm(struct kvm *kvm) __x86_set_memory_region(kvm, TSS_PRIVATE_MEMSLOT, 0, 0); mutex_unlock(&kvm->slots_lock); } - if (kvm_x86_ops.vm_destroy) - kvm_x86_ops.vm_destroy(kvm); for (i = 0; i < kvm->arch.msr_filter.count; i++) kfree(kvm->arch.msr_filter.ranges[i].bitmap); + kvm_x86_ops.vm_teardown(kvm); kvm_pic_destroy(kvm); kvm_ioapic_destroy(kvm); kvm_free_vcpus(kvm); @@ -10545,6 +10544,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm) kvm_mmu_uninit_vm(kvm); kvm_page_track_cleanup(kvm); kvm_hv_destroy_vm(kvm); + kvm_x86_ops.vm_destroy(kvm); } void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot) From patchwork Mon Nov 16 18:26:14 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910409 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, 
HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E3302C64E7A for ; Mon, 16 Nov 2020 18:32:01 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A253A206E0 for ; Mon, 16 Nov 2020 18:32:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388621AbgKPSbr (ORCPT ); Mon, 16 Nov 2020 13:31:47 -0500 Received: from mga06.intel.com ([134.134.136.31]:20636 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388104AbgKPS2H (ORCPT ); Mon, 16 Nov 2020 13:28:07 -0500 IronPort-SDR: LyT4YNPKVF3BeXDaQa1ur2noX03aG+tv/L+Ztl3hWzZriUvbcQ6vMhQZne43mxFACzDKxdjZOW LFbldh9G5SLA== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="232410039" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="232410039" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:06 -0800 IronPort-SDR: ASwCFKCLy4OB4+gf56DduJlNArpGMnds+U8gtJMjxBGFNEsyjNPUrk1yfQiQe2DK7ACyGhmNp8 k1mX1KP5X9Iw== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400528033" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:05 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson Subject: [RFC PATCH 29/67] KVM: x86: Add a switch_db_regs flag to handle TDX's auto-switched behavior Date: Mon, 16 Nov 2020 10:26:14 -0800 Message-Id: X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson Add a flag, KVM_DEBUGREG_AUTO_SWITCHED, to skip saving/restoring DRs irrespective of any other flags. TDX-SEAM unconditionally saves and restores host DRs, ergo there is nothing to do. Opportunistically convert the KVM_DEBUGREG_* definitions to use BIT(). 
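For reference, the flag layout after the BIT() conversion and the resulting skip, reduced to a self-contained sketch that mirrors the vcpu_enter_guest() checks in the diff (the helper name is made up):

#include <stdbool.h>

#define BIT(n)  (1u << (n))

enum {
        KVM_DEBUGREG_BP_ENABLED         = BIT(0),
        KVM_DEBUGREG_WONT_EXIT          = BIT(1),
        KVM_DEBUGREG_RELOAD             = BIT(2),
        KVM_DEBUGREG_AUTO_SWITCHED      = BIT(3),       /* DRs switched for KVM, e.g. by TDX-SEAM */
};

/*
 * Guest debug registers need to be loaded by hand only if some flag other
 * than AUTO_SWITCHED is set; with AUTO_SWITCHED alone there is nothing to do.
 */
static bool need_manual_dr_load(unsigned int switch_db_regs)
{
        return switch_db_regs & ~KVM_DEBUGREG_AUTO_SWITCHED;
}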
Reported-by: Xiaoyao Li Signed-off-by: Sean Christopherson --- arch/x86/include/asm/kvm_host.h | 7 ++++--- arch/x86/kvm/x86.c | 6 ++++-- 2 files changed, 8 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index a6c89666ec49..815469875445 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -464,9 +464,10 @@ struct kvm_pmu { struct kvm_pmu_ops; enum { - KVM_DEBUGREG_BP_ENABLED = 1, - KVM_DEBUGREG_WONT_EXIT = 2, - KVM_DEBUGREG_RELOAD = 4, + KVM_DEBUGREG_BP_ENABLED = BIT(0), + KVM_DEBUGREG_WONT_EXIT = BIT(1), + KVM_DEBUGREG_RELOAD = BIT(2), + KVM_DEBUGREG_AUTO_SWITCHED = BIT(3), }; struct kvm_mtrr_range { diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 42bd24ba7fdd..098888edc3ad 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -9009,7 +9009,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) if (test_thread_flag(TIF_NEED_FPU_LOAD)) switch_fpu_return(); - if (unlikely(vcpu->arch.switch_db_regs)) { + if (unlikely(vcpu->arch.switch_db_regs & ~KVM_DEBUGREG_AUTO_SWITCHED)) { set_debugreg(0, 7); set_debugreg(vcpu->arch.eff_db[0], 0); set_debugreg(vcpu->arch.eff_db[1], 1); @@ -9029,6 +9029,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) */ if (unlikely(vcpu->arch.switch_db_regs & KVM_DEBUGREG_WONT_EXIT)) { WARN_ON(vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP); + WARN_ON(vcpu->arch.switch_db_regs & KVM_DEBUGREG_AUTO_SWITCHED); kvm_x86_ops.sync_dirty_debug_regs(vcpu); kvm_update_dr0123(vcpu); kvm_update_dr7(vcpu); @@ -9042,7 +9043,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) * care about the messed up debug address registers. But if * we have some of them active, restore the old state. */ - if (hw_breakpoint_active()) + if (hw_breakpoint_active() && + !(vcpu->arch.switch_db_regs & KVM_DEBUGREG_AUTO_SWITCHED)) hw_breakpoint_restore(); vcpu->arch.last_vmentry_cpu = vcpu->cpu; From patchwork Mon Nov 16 18:26:15 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910403 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 075A9C55ABD for ; Mon, 16 Nov 2020 18:32:38 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C76512080A for ; Mon, 16 Nov 2020 18:32:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388218AbgKPScD (ORCPT ); Mon, 16 Nov 2020 13:32:03 -0500 Received: from mga06.intel.com ([134.134.136.31]:20638 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388105AbgKPS2H (ORCPT ); Mon, 16 Nov 2020 13:28:07 -0500 IronPort-SDR: 8h7dCpqrMi/07FLmw0OTrbtlI3j8yv1h99ZLocsxs2mnesjDXuGRmmo09WV9tHF8lfBSI33Odw 6U+/1YAg7lDg== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="232410041" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="232410041" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga104.jf.intel.com with 
ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:06 -0800 IronPort-SDR: ajiByAb1hVChYsZUFFQ7uIGLi6KwM6u6oxMMjqY9x9doO9Vi3e4cvqsAx9/chXBzCakBLPE+ld XdccLw3JdClQ== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400528044" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:06 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson Subject: [RFC PATCH 30/67] KVM: x86: Check for pending APICv interrupt in kvm_vcpu_has_events() Date: Mon, 16 Nov 2020 10:26:15 -0800 Message-Id: X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson Return true for kvm_vcpu_has_events() if the vCPU has a pending APICv interrupt to support TDX's usage of APICv. Unlike VMX, TDX doesn't have access to vmcs.GUEST_INTR_STATUS and so can't emulate posted interrupts, i.e. needs to generate a posted interrupt and more importantly can't manually move requested interrupts into the vIRR (which it also doesn't have access to). Signed-off-by: Sean Christopherson --- arch/x86/kvm/x86.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 098888edc3ad..c233e7ef3366 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -10813,7 +10813,9 @@ static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu) if (kvm_arch_interrupt_allowed(vcpu) && (kvm_cpu_has_interrupt(vcpu) || - kvm_guest_apic_has_interrupt(vcpu))) + kvm_guest_apic_has_interrupt(vcpu) || + (vcpu->arch.apicv_active && + kvm_x86_ops.dy_apicv_has_pending_interrupt(vcpu)))) return true; if (kvm_hv_has_stimer_pending(vcpu)) From patchwork Mon Nov 16 18:26:16 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910401 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 32AEEC63798 for ; Mon, 16 Nov 2020 18:32:01 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 0146220780 for ; Mon, 16 Nov 2020 18:32:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388669AbgKPSb0 (ORCPT ); Mon, 16 Nov 2020 13:31:26 -0500 Received: from mga06.intel.com ([134.134.136.31]:20638 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388161AbgKPS2I (ORCPT ); Mon, 16 Nov 2020 13:28:08 -0500 IronPort-SDR: tSRz0Edq9M/BnbHOeBeO5QjhyucvHcytcscaINjsciUT0/XXF2pU8K2jx4WDl41zZUrkhGHGns MzAt0sFwkWlw== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="232410044" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="232410044" X-Amp-Result: SKIPPED(no attachment 
in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:07 -0800 IronPort-SDR: KuA76qT6+fBbBZrBxrNugSPCwPfmSI1+L6kV2+9GCK1cQJlWyrEiFsHYKiXVG/MmbMrsXmjLNP wcFA3pYIzMZA== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400528055" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:06 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson Subject: [RFC PATCH 31/67] KVM: x86: Add option to force LAPIC expiration wait Date: Mon, 16 Nov 2020 10:26:16 -0800 Message-Id: <571d4b6878fbc96db4d4584957c8cafd6a01490e.1605232743.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson Add an option to skip the IRR check in kvm_wait_lapic_expire(). This will be used by TDX to wait if there is an outstanding notification for a TD, i.e. a virtual interrupt is being triggered via posted interrupt processing. KVM TDX doesn't emulate PI processing, i.e. there will never be a bit set in IRR/ISR, so the default behavior for APICv of querying the IRR doesn't work as intended. Signed-off-by: Sean Christopherson --- arch/x86/kvm/lapic.c | 6 +++--- arch/x86/kvm/lapic.h | 2 +- arch/x86/kvm/svm/svm.c | 2 +- arch/x86/kvm/vmx/vmx.c | 2 +- 4 files changed, 6 insertions(+), 6 deletions(-) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index e6c0aaf4044e..41dce91f5df0 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1601,12 +1601,12 @@ static void __kvm_wait_lapic_expire(struct kvm_vcpu *vcpu) adjust_lapic_timer_advance(vcpu, apic->lapic_timer.advance_expire_delta); } -void kvm_wait_lapic_expire(struct kvm_vcpu *vcpu) +void kvm_wait_lapic_expire(struct kvm_vcpu *vcpu, bool force_wait) { if (lapic_in_kernel(vcpu) && vcpu->arch.apic->lapic_timer.expired_tscdeadline && vcpu->arch.apic->lapic_timer.timer_advance_ns && - lapic_timer_int_injected(vcpu)) + (force_wait || lapic_timer_int_injected(vcpu))) __kvm_wait_lapic_expire(vcpu); } EXPORT_SYMBOL_GPL(kvm_wait_lapic_expire); @@ -1642,7 +1642,7 @@ static void apic_timer_expired(struct kvm_lapic *apic, bool from_timer_fn) } if (kvm_use_posted_timer_interrupt(apic->vcpu)) { - kvm_wait_lapic_expire(vcpu); + kvm_wait_lapic_expire(vcpu, false); kvm_apic_inject_pending_timer_irqs(apic); return; } diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h index 4fb86e3a9dd3..30f036678f5c 100644 --- a/arch/x86/kvm/lapic.h +++ b/arch/x86/kvm/lapic.h @@ -237,7 +237,7 @@ static inline int kvm_lapic_latched_init(struct kvm_vcpu *vcpu) bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector); -void kvm_wait_lapic_expire(struct kvm_vcpu *vcpu); +void kvm_wait_lapic_expire(struct kvm_vcpu *vcpu, bool force_wait); void kvm_bitmap_or_dest_vcpus(struct kvm *kvm, struct kvm_lapic_irq *irq, unsigned long *vcpu_bitmap); diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 15836446a9b8..8be23240c74f 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -3580,7 +3580,7 @@ static __no_kcsan 
fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu) clgi(); kvm_load_guest_xsave_state(vcpu); - kvm_wait_lapic_expire(vcpu); + kvm_wait_lapic_expire(vcpu, false); /* * If this vCPU has touched SPEC_CTRL, restore the guest's value if diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 3559b51f566d..deeec105e963 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -6720,7 +6720,7 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu) if (enable_preemption_timer) vmx_update_hv_timer(vcpu); - kvm_wait_lapic_expire(vcpu); + kvm_wait_lapic_expire(vcpu, false); /* * If this vCPU has touched SPEC_CTRL, restore the guest's value if From patchwork Mon Nov 16 18:26:17 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910397 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C3846C63697 for ; Mon, 16 Nov 2020 18:32:00 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 859D0206E0 for ; Mon, 16 Nov 2020 18:32:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388649AbgKPSbP (ORCPT ); Mon, 16 Nov 2020 13:31:15 -0500 Received: from mga06.intel.com ([134.134.136.31]:20636 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388175AbgKPS2I (ORCPT ); Mon, 16 Nov 2020 13:28:08 -0500 IronPort-SDR: 7WFjOgM1GJoXgBGEKdjDkMHi1+Ias72Oxv1gjNlaiPvU55ZMCP70sWzGVnSSp0BT0DLwq16dTj MQ2e9ia+kPXw== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="232410045" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="232410045" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:07 -0800 IronPort-SDR: 85+oEt1CoVwHMe8bamduFw/uAex3BLcPyQyUvXecq9zwY6p43Fri22AC6BGwTVvZeMATulvs7o KOtWe5hg/KCQ== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400528069" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:07 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson Subject: [RFC PATCH 32/67] KVM: x86: Add guest_supported_xss placholder Date: Mon, 16 Nov 2020 10:26:17 -0800 Message-Id: <4849ccb92513affa6fdbc83dd96d37b1fa8e5cff.1605232743.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson Add a per-vcpu placeholder for the support XSS of the guest so that the TDX configuration code doesn't need to hack in manual computation of the supported XSS. 
KVM XSS enabling is currently being upstreamed, i.e. guest_supported_xss will no longer be a placeholder by the time TDX is ready for upstreaming (hopefully). Signed-off-by: Sean Christopherson --- arch/x86/include/asm/kvm_host.h | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 815469875445..6dfc09092bc9 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -612,6 +612,7 @@ struct kvm_vcpu_arch { u64 xcr0; u64 guest_supported_xcr0; + u64 guest_supported_xss; struct kvm_pio_request pio; void *pio_data; From patchwork Mon Nov 16 18:26:18 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910405 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1A528C63777 for ; Mon, 16 Nov 2020 18:32:01 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id CF17A206E0 for ; Mon, 16 Nov 2020 18:32:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388662AbgKPSbZ (ORCPT ); Mon, 16 Nov 2020 13:31:25 -0500 Received: from mga06.intel.com ([134.134.136.31]:20632 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388194AbgKPS2I (ORCPT ); Mon, 16 Nov 2020 13:28:08 -0500 IronPort-SDR: ZeG1QOzTL+T10rZfMpToTD/deM3wVL351RpA/tj0fH/Ti8zCXGuD0sBJcfaNvbA8Yart34MPFZ 9lFwKo3/CkhQ== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="232410047" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="232410047" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:07 -0800 IronPort-SDR: IkSg7+/V459wB9fwH3DtZv6KXZtYbDbJNyMmTD5smWpm5ennW+BfsSJNtk3cpdpEIdUSCK//Ux pDoShjzkK2dw== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400528081" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:07 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson Subject: [RFC PATCH 33/67] KVM: Export kvm_is_reserved_pfn() for use by TDX Date: Mon, 16 Nov 2020 10:26:18 -0800 Message-Id: <166f7cdaac4eadf86615bcc508164235e13d76b7.1605232743.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson TDX will use kvm_is_reserved_pfn() to prevent installing a reserved PFN int SEPT. Or rather, to prevent such an attempt, as reserved PFNs are not covered by TDMRs. 
Signed-off-by: Sean Christopherson --- virt/kvm/kvm_main.c | 1 + 1 file changed, 1 insertion(+) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index aa5f27753756..a60dcf682f33 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -185,6 +185,7 @@ bool kvm_is_reserved_pfn(kvm_pfn_t pfn) return true; } +EXPORT_SYMBOL_GPL(kvm_is_reserved_pfn); bool kvm_is_transparent_hugepage(kvm_pfn_t pfn) { From patchwork Mon Nov 16 18:26:19 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910389 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 40BAAC71155 for ; Mon, 16 Nov 2020 18:31:10 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 0786B22370 for ; Mon, 16 Nov 2020 18:31:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388597AbgKPSas (ORCPT ); Mon, 16 Nov 2020 13:30:48 -0500 Received: from mga06.intel.com ([134.134.136.31]:20644 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388214AbgKPS2K (ORCPT ); Mon, 16 Nov 2020 13:28:10 -0500 IronPort-SDR: 0IGTi0UbsnOM4RJDV13UThmz6oVTFbrPPJOo/3vbD6m+KD+oy6yVGRyl2JJZMbujG1weE9T/u7 033cVbgbTz/g== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="232410049" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="232410049" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:08 -0800 IronPort-SDR: +JXMqSAchhfuqtAGaI9orRvYJPjPWVokTjzedWOKXF6BoAC5A8PHhUj48LewtlKGVXsEEUZIP2 UQkknVgTQOqA== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400528094" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:08 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Rick Edgecombe Subject: [RFC PATCH 34/67] KVM: x86: Add infrastructure for stolen GPA bits Date: Mon, 16 Nov 2020 10:26:19 -0800 Message-Id: <7bb16d4be294888bb09e26fd69a4cd346a607aac.1605232743.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Rick Edgecombe Add support in KVM's MMU for aliasing multiple GPAs (from a hardware perspective) to a single GPA (from a memslot perspective). GPA alising will be used to repurpose GPA bits as attribute bits, e.g. to expose an execute-only permission bit to the guest. To keep the implementation simple (relatively speaking), GPA aliasing is only supported via TDP. 
Today KVM assumes two things that are broken by GPA aliasing. 1. GPAs coming from hardware can be simply shifted to get the GFNs. 2. GPA bits 51:MAXPHYADDR are reserved to zero. With GPA aliasing, translating a GPA to GFN requires masking off the repurposed bit, and a repurposed bit may reside in 51:MAXPHYADDR. To support GPA aliasing, introduce the concept of per-VM GPA stolen bits, that is, bits stolen from the GPA to act as new virtualized attribute bits. A bit in the mask will cause the MMU code to create aliases of the GPA. It can also be used to find the GFN out of a GPA coming from a tdp fault. To handle case (1) from above, retain any stolen bits when passing a GPA in KVM's MMU code, but strip them when converting to a GFN so that the GFN contains only the "real" GFN, i.e. never has repurposed bits set. GFNs (without stolen bits) continue to be used to: -Specify physical memory by userspace via memslots -Map GPAs to TDP PTEs via RMAP -Specify dirty tracking and write protection -Look up MTRR types -Inject async page faults Since there are now multiple aliases for the same aliased GPA, when userspace memory backing the memslots is paged out, both aliases need to be modified. Fortunately this happens automatically. Since rmap supports multiple mappings for the same GFN for PTE shadowing based paging, by adding/removing each alias PTE with its GFN, kvm_handle_hva() based operations will be applied to both aliases. In the case of the rmap being removed in the future, the needed information could be recovered by iterating over the stolen bits and walking the TDP page tables. For TLB flushes that are address based, make sure to flush both aliases in the stolen bits case. Only support stolen bits in 64 bit guest paging modes (long, PAE). Features that use this infrastructure should restrict the stolen bits to exclude the other paging modes. Don't support stolen bits for shadow EPT. 
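To make the GPA-to-GFN rule concrete, here is a tiny standalone model of the helpers this patch introduces (the mask value is invented for illustration; the real mask is per-VM and is still zero in this patch):

#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12

typedef uint64_t gpa_t;
typedef uint64_t gfn_t;

/* The "real" GFN never carries repurposed bits... */
static gfn_t gpa_to_gfn_unalias(gpa_t gpa, gpa_t gpa_stolen_mask)
{
        return (gpa & ~gpa_stolen_mask) >> PAGE_SHIFT;
}

/* ...while the stolen bits travel separately, e.g. to tell aliases apart. */
static gfn_t gpa_stolen_bits(gpa_t gpa, gpa_t gpa_stolen_mask)
{
        return (gpa & gpa_stolen_mask) >> PAGE_SHIFT;
}

int main(void)
{
        gpa_t mask = 1ull << 51;                /* hypothetical: GPA bit 51 repurposed */
        gpa_t gpa  = mask | 0x1234000;          /* an aliased access to GFN 0x1234 */

        printf("gfn=0x%llx stolen=0x%llx\n",
               (unsigned long long)gpa_to_gfn_unalias(gpa, mask),
               (unsigned long long)gpa_stolen_bits(gpa, mask));
        return 0;
}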
Signed-off-by: Rick Edgecombe --- arch/x86/kvm/mmu.h | 26 ++++++++++ arch/x86/kvm/mmu/mmu.c | 86 ++++++++++++++++++++++----------- arch/x86/kvm/mmu/mmu_internal.h | 1 + arch/x86/kvm/mmu/paging_tmpl.h | 25 ++++++---- 4 files changed, 101 insertions(+), 37 deletions(-) diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index 9c4a9c8e43d9..7ce8f0256d6d 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -220,4 +220,30 @@ int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu); int kvm_mmu_post_init_vm(struct kvm *kvm); void kvm_mmu_pre_destroy_vm(struct kvm *kvm); +static inline gfn_t kvm_gfn_stolen_mask(struct kvm *kvm) +{ + /* Currently there are no stolen bits in KVM */ + return 0; +} + +static inline gfn_t vcpu_gfn_stolen_mask(struct kvm_vcpu *vcpu) +{ + return kvm_gfn_stolen_mask(vcpu->kvm); +} + +static inline gpa_t kvm_gpa_stolen_mask(struct kvm *kvm) +{ + return kvm_gfn_stolen_mask(kvm) << PAGE_SHIFT; +} + +static inline gpa_t vcpu_gpa_stolen_mask(struct kvm_vcpu *vcpu) +{ + return kvm_gpa_stolen_mask(vcpu->kvm); +} + +static inline gfn_t vcpu_gpa_to_gfn_unalias(struct kvm_vcpu *vcpu, gpa_t gpa) +{ + return (gpa >> PAGE_SHIFT) & ~vcpu_gfn_stolen_mask(vcpu); +} + #endif diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index bebd2b6ebcad..76de8d48165d 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -187,27 +187,37 @@ static inline bool kvm_available_flush_tlb_with_range(void) return kvm_x86_ops.tlb_remote_flush_with_range; } -static void kvm_flush_remote_tlbs_with_range(struct kvm *kvm, - struct kvm_tlb_range *range) -{ - int ret = -ENOTSUPP; - - if (range && kvm_x86_ops.tlb_remote_flush_with_range) - ret = kvm_x86_ops.tlb_remote_flush_with_range(kvm, range); - - if (ret) - kvm_flush_remote_tlbs(kvm); -} - void kvm_flush_remote_tlbs_with_address(struct kvm *kvm, u64 start_gfn, u64 pages) { struct kvm_tlb_range range; + u64 gfn_stolen_mask; + + if (!kvm_x86_ops.tlb_remote_flush_with_range) + goto generic_flush; + + /* + * Fall back to the big hammer flush if there is more than one + * GPA alias that needs to be flushed. 
+ */ + gfn_stolen_mask = kvm_gfn_stolen_mask(kvm); + if (hweight64(gfn_stolen_mask) > 1) + goto generic_flush; range.start_gfn = start_gfn; range.pages = pages; + if (kvm_x86_ops.tlb_remote_flush_with_range(kvm, &range)) + goto generic_flush; + + if (!gfn_stolen_mask) + return; + + range.start_gfn |= gfn_stolen_mask; + kvm_x86_ops.tlb_remote_flush_with_range(kvm, &range); + return; - kvm_flush_remote_tlbs_with_range(kvm, &range); +generic_flush: + kvm_flush_remote_tlbs(kvm); } bool is_nx_huge_page_enabled(void) @@ -2029,14 +2039,16 @@ static void clear_sp_write_flooding_count(u64 *spte) __clear_sp_write_flooding_count(sptep_to_sp(spte)); } -static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, - gfn_t gfn, - gva_t gaddr, - unsigned level, - int direct, - unsigned int access) +static struct kvm_mmu_page *__kvm_mmu_get_page(struct kvm_vcpu *vcpu, + gfn_t gfn, + gfn_t gfn_stolen_bits, + gva_t gaddr, + unsigned level, + int direct, + unsigned int access) { bool direct_mmu = vcpu->arch.mmu->direct_map; + gpa_t gfn_and_stolen = gfn | gfn_stolen_bits; union kvm_mmu_page_role role; struct hlist_head *sp_list; unsigned quadrant; @@ -2058,9 +2070,9 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, role.quadrant = quadrant; } - sp_list = &vcpu->kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)]; + sp_list = &vcpu->kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn_and_stolen)]; for_each_valid_sp(vcpu->kvm, sp, sp_list) { - if (sp->gfn != gfn) { + if ((sp->gfn | sp->gfn_stolen_bits) != gfn_and_stolen) { collisions++; continue; } @@ -2100,6 +2112,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, sp = kvm_mmu_alloc_page(vcpu, direct); sp->gfn = gfn; + sp->gfn_stolen_bits = gfn_stolen_bits; sp->role = role; hlist_add_head(&sp->hash_link, sp_list); if (!direct) { @@ -2124,6 +2137,13 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, return sp; } +static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, gfn_t gfn, + gva_t gaddr, unsigned level, + int direct, unsigned int access) +{ + return __kvm_mmu_get_page(vcpu, gfn, 0, gaddr, level, direct, access); +} + static void shadow_walk_init_using_root(struct kvm_shadow_walk_iterator *iterator, struct kvm_vcpu *vcpu, hpa_t root, u64 addr) @@ -2695,7 +2715,9 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu, gfn = kvm_mmu_page_get_gfn(sp, start - sp->spt); slot = gfn_to_memslot_dirty_bitmap(vcpu, gfn, access & ACC_WRITE_MASK); - if (!slot) + + /* Don't map private memslots for stolen bits */ + if (!slot || (sp->gfn_stolen_bits && slot->id >= KVM_USER_MEM_SLOTS)) return -1; ret = gfn_to_page_many_atomic(slot, gfn, pages, end - start); @@ -2870,7 +2892,9 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, struct kvm_shadow_walk_iterator it; struct kvm_mmu_page *sp; int level, req_level, ret; - gfn_t gfn = gpa >> PAGE_SHIFT; + gpa_t gpa_stolen_mask = vcpu_gpa_stolen_mask(vcpu); + gfn_t gfn = (gpa & ~gpa_stolen_mask) >> PAGE_SHIFT; + gfn_t gfn_stolen_bits = (gpa & gpa_stolen_mask) >> PAGE_SHIFT; gfn_t base_gfn = gfn; if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa))) @@ -2895,8 +2919,9 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, drop_large_spte(vcpu, it.sptep); if (!is_shadow_present_pte(*it.sptep)) { - sp = kvm_mmu_get_page(vcpu, base_gfn, it.addr, - it.level - 1, true, ACC_ALL); + sp = __kvm_mmu_get_page(vcpu, base_gfn, + gfn_stolen_bits, it.addr, + it.level - 1, true, ACC_ALL); link_shadow_page(vcpu, 
it.sptep, sp); if (is_tdp && huge_page_disallowed && @@ -3650,6 +3675,13 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn, struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn); bool async; + /* Don't expose aliases for no slot GFNs or private memslots */ + if ((cr2_or_gpa & vcpu_gpa_stolen_mask(vcpu)) && + !kvm_is_visible_memslot(slot)) { + *pfn = KVM_PFN_NOSLOT; + return false; + } + /* Don't expose private memslots to L2. */ if (is_guest_mode(vcpu) && !kvm_is_visible_memslot(slot)) { *pfn = KVM_PFN_NOSLOT; @@ -3682,7 +3714,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, bool write = error_code & PFERR_WRITE_MASK; bool map_writable; - gfn_t gfn = gpa >> PAGE_SHIFT; + gfn_t gfn = vcpu_gpa_to_gfn_unalias(vcpu, gpa); unsigned long mmu_seq; kvm_pfn_t pfn; int r; @@ -3782,7 +3814,7 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, max_level > PG_LEVEL_4K; max_level--) { int page_num = KVM_PAGES_PER_HPAGE(max_level); - gfn_t base = (gpa >> PAGE_SHIFT) & ~(page_num - 1); + gfn_t base = vcpu_gpa_to_gfn_unalias(vcpu, gpa) & ~(page_num - 1); if (kvm_mtrr_check_gfn_range_consistency(vcpu, base, page_num)) break; diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h index bfc6389edc28..4d30f1562142 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -36,6 +36,7 @@ struct kvm_mmu_page { */ union kvm_mmu_page_role role; gfn_t gfn; + gfn_t gfn_stolen_bits; u64 *spt; /* hold the gfn of each spte inside spt */ diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index 50e268eb8e1a..5d4e9f404018 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -25,7 +25,8 @@ #define guest_walker guest_walker64 #define FNAME(name) paging##64_##name #define PT_BASE_ADDR_MASK PT64_BASE_ADDR_MASK - #define PT_LVL_ADDR_MASK(lvl) PT64_LVL_ADDR_MASK(lvl) + #define PT_LVL_ADDR_MASK(vcpu, lvl) (~vcpu_gpa_stolen_mask(vcpu) & \ + PT64_LVL_ADDR_MASK(lvl)) #define PT_LVL_OFFSET_MASK(lvl) PT64_LVL_OFFSET_MASK(lvl) #define PT_INDEX(addr, level) PT64_INDEX(addr, level) #define PT_LEVEL_BITS PT64_LEVEL_BITS @@ -44,7 +45,7 @@ #define guest_walker guest_walker32 #define FNAME(name) paging##32_##name #define PT_BASE_ADDR_MASK PT32_BASE_ADDR_MASK - #define PT_LVL_ADDR_MASK(lvl) PT32_LVL_ADDR_MASK(lvl) + #define PT_LVL_ADDR_MASK(vcpu, lvl) PT32_LVL_ADDR_MASK(lvl) #define PT_LVL_OFFSET_MASK(lvl) PT32_LVL_OFFSET_MASK(lvl) #define PT_INDEX(addr, level) PT32_INDEX(addr, level) #define PT_LEVEL_BITS PT32_LEVEL_BITS @@ -58,7 +59,7 @@ #define guest_walker guest_walkerEPT #define FNAME(name) ept_##name #define PT_BASE_ADDR_MASK PT64_BASE_ADDR_MASK - #define PT_LVL_ADDR_MASK(lvl) PT64_LVL_ADDR_MASK(lvl) + #define PT_LVL_ADDR_MASK(vcpu, lvl) PT64_LVL_ADDR_MASK(lvl) #define PT_LVL_OFFSET_MASK(lvl) PT64_LVL_OFFSET_MASK(lvl) #define PT_INDEX(addr, level) PT64_INDEX(addr, level) #define PT_LEVEL_BITS PT64_LEVEL_BITS @@ -75,7 +76,7 @@ #define PT_GUEST_ACCESSED_MASK (1 << PT_GUEST_ACCESSED_SHIFT) #define gpte_to_gfn_lvl FNAME(gpte_to_gfn_lvl) -#define gpte_to_gfn(pte) gpte_to_gfn_lvl((pte), PG_LEVEL_4K) +#define gpte_to_gfn(vcpu, pte) gpte_to_gfn_lvl(vcpu, pte, PG_LEVEL_4K) /* * The guest_walker structure emulates the behavior of the hardware page @@ -96,9 +97,9 @@ struct guest_walker { struct x86_exception fault; }; -static gfn_t gpte_to_gfn_lvl(pt_element_t gpte, int lvl) +static gfn_t gpte_to_gfn_lvl(struct kvm_vcpu *vcpu, pt_element_t gpte, 
int lvl) { - return (gpte & PT_LVL_ADDR_MASK(lvl)) >> PAGE_SHIFT; + return (gpte & PT_LVL_ADDR_MASK(vcpu, lvl)) >> PAGE_SHIFT; } static inline void FNAME(protect_clean_gpte)(struct kvm_mmu *mmu, unsigned *access, @@ -366,7 +367,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker, --walker->level; index = PT_INDEX(addr, walker->level); - table_gfn = gpte_to_gfn(pte); + table_gfn = gpte_to_gfn(vcpu, pte); offset = index * sizeof(pt_element_t); pte_gpa = gfn_to_gpa(table_gfn) + offset; @@ -430,7 +431,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker, if (unlikely(errcode)) goto error; - gfn = gpte_to_gfn_lvl(pte, walker->level); + gfn = gpte_to_gfn_lvl(vcpu, pte, walker->level); gfn += (addr & PT_LVL_OFFSET_MASK(walker->level)) >> PAGE_SHIFT; if (PTTYPE == 32 && walker->level > PG_LEVEL_4K && is_cpuid_PSE36()) @@ -533,12 +534,14 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, gfn_t gfn; kvm_pfn_t pfn; + WARN_ON(gpte & vcpu_gpa_stolen_mask(vcpu)); + if (FNAME(prefetch_invalid_gpte)(vcpu, sp, spte, gpte)) return false; pgprintk("%s: gpte %llx spte %p\n", __func__, (u64)gpte, spte); - gfn = gpte_to_gfn(gpte); + gfn = gpte_to_gfn(vcpu, gpte); pte_access = sp->role.access & FNAME(gpte_access)(gpte); FNAME(protect_clean_gpte)(vcpu->arch.mmu, &pte_access, gpte); pfn = pte_prefetch_gfn_to_pfn(vcpu, gfn, @@ -641,6 +644,8 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr, direct_access = gw->pte_access; + WARN_ON(addr & vcpu_gpa_stolen_mask(vcpu)); + top_level = vcpu->arch.mmu->root_level; if (top_level == PT32E_ROOT_LEVEL) top_level = PT32_ROOT_LEVEL; @@ -1054,7 +1059,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp) continue; } - gfn = gpte_to_gfn(gpte); + gfn = gpte_to_gfn(vcpu, gpte); pte_access = sp->role.access; pte_access &= FNAME(gpte_access)(gpte); FNAME(protect_clean_gpte)(vcpu->arch.mmu, &pte_access, gpte); From patchwork Mon Nov 16 18:26:20 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910393 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5A8F8C64E7D for ; Mon, 16 Nov 2020 18:31:09 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 30EE120756 for ; Mon, 16 Nov 2020 18:31:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388227AbgKPS2K (ORCPT ); Mon, 16 Nov 2020 13:28:10 -0500 Received: from mga06.intel.com ([134.134.136.31]:20636 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388095AbgKPS2J (ORCPT ); Mon, 16 Nov 2020 13:28:09 -0500 IronPort-SDR: 0uHw5TVTUp1VNyN+BwGMmMYtZ7bDaRWBqB43hlirwkWog4UIYjv2TG8jzbo2NRkKSVaKAHh7k2 bNtve30jYbWw== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="232410050" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="232410050" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by 
orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:08 -0800 IronPort-SDR: ix4YNSY74kgFgcbyDGWJufKKuGTn9MPfyqAqrt3TjWDCuvFF3FVUDjc7zYwsfO7gBh0SlllzVN qTl/9Pa2C78Q== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400528106" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:08 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson Subject: [RFC PATCH 35/67] KVM: x86/mmu: Explicitly check for MMIO spte in fast page fault Date: Mon, 16 Nov 2020 10:26:20 -0800 Message-Id: <8d129c5e2ab4ca26eaf761a2fd46cb50cf6376ba.1605232743.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson Explicity check for an MMIO spte in the fast page fault flow. TDX will use a not-present entry for MMIO sptes, which can be mistaken for an access-tracked spte since both have SPTE_SPECIAL_MASK set. Signed-off-by: Sean Christopherson --- arch/x86/kvm/mmu/mmu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 76de8d48165d..c4d657b26066 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3090,7 +3090,7 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, break; sp = sptep_to_sp(iterator.sptep); - if (!is_last_spte(spte, sp->role.level)) + if (!is_last_spte(spte, sp->role.level) || is_mmio_spte(spte)) break; /* From patchwork Mon Nov 16 18:26:21 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910365 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9E586C4742C for ; Mon, 16 Nov 2020 18:31:07 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 5D325206F9 for ; Mon, 16 Nov 2020 18:31:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388238AbgKPS2L (ORCPT ); Mon, 16 Nov 2020 13:28:11 -0500 Received: from mga06.intel.com ([134.134.136.31]:20636 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388222AbgKPS2K (ORCPT ); Mon, 16 Nov 2020 13:28:10 -0500 IronPort-SDR: +iIuRkOUHIQnyMiFgwOrDhG0BQPoG9kbJuCHFMAToHqNQCORSodkMbrkUWZC6Ubkko693LPvho 0fQdTh4YtisQ== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="232410051" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="232410051" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 
Nov 2020 10:28:09 -0800 IronPort-SDR: at79M/MEaeJfBfe6CFEYDNwXWRC88rqn6Ywhj97LFbSV3XoFiiOguFtrtAhqjIKUDujowN/eg7 XH8K1Y5GfbBg== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400528118" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:09 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson Subject: [RFC PATCH 36/67] KVM: x86/mmu: Track shadow MMIO value on a per-VM basis Date: Mon, 16 Nov 2020 10:26:21 -0800 Message-Id: <1e5ce007ad1046ee34408708c0e369b7fcab917b.1605232743.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson Signed-off-by: Sean Christopherson --- arch/x86/include/asm/kvm_host.h | 2 ++ arch/x86/kvm/mmu.h | 4 +++- arch/x86/kvm/mmu/mmu.c | 24 +++--------------------- arch/x86/kvm/mmu/spte.c | 26 ++++++++++++++++++++++---- arch/x86/kvm/mmu/spte.h | 2 +- arch/x86/kvm/svm/svm.c | 2 +- arch/x86/kvm/vmx/vmx.c | 18 +++++++----------- 7 files changed, 39 insertions(+), 39 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 6dfc09092bc9..d4fd9859fcd5 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -915,6 +915,8 @@ struct kvm_arch { struct kvm_page_track_notifier_node mmu_sp_tracker; struct kvm_page_track_notifier_head track_notifier_head; + u64 shadow_mmio_value; + struct list_head assigned_dev_head; struct iommu_domain *iommu_domain; bool iommu_noncoherent; diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index 7ce8f0256d6d..05c2898cb2a2 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -52,7 +52,9 @@ static inline u64 rsvd_bits(int s, int e) return ((1ULL << (e - s + 1)) - 1) << s; } -void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 access_mask); +void kvm_mmu_set_mmio_spte_mask(struct kvm *kvm, u64 mmio_value, + u64 access_mask); +void kvm_mmu_set_default_mmio_spte_mask(u64 mask); void reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context); diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index c4d657b26066..da2a58fa86a8 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -5507,6 +5507,9 @@ void kvm_mmu_init_vm(struct kvm *kvm) node->track_write = kvm_mmu_pte_write; node->track_flush_slot = kvm_mmu_invalidate_zap_pages_in_memslot; kvm_page_track_register_notifier(kvm, node); + + kvm_mmu_set_mmio_spte_mask(kvm, shadow_default_mmio_mask, + ACC_WRITE_MASK | ACC_USER_MASK); } void kvm_mmu_uninit_vm(struct kvm *kvm) @@ -5835,25 +5838,6 @@ static void mmu_destroy_caches(void) kmem_cache_destroy(mmu_page_header_cache); } -static void kvm_set_mmio_spte_mask(void) -{ - u64 mask; - - /* - * Set a reserved PA bit in MMIO SPTEs to generate page faults with - * PFEC.RSVD=1 on MMIO accesses. 64-bit PTEs (PAE, x86-64, and EPT - * paging) support a maximum of 52 bits of PA, i.e. if the CPU supports - * 52-bit physical addresses then there are no reserved PA bits in the - * PTEs and so the reserved PA approach must be disabled. 
- */ - if (shadow_phys_bits < 52) - mask = BIT_ULL(51) | PT_PRESENT_MASK; - else - mask = 0; - - kvm_mmu_set_mmio_spte_mask(mask, ACC_WRITE_MASK | ACC_USER_MASK); -} - static bool get_nx_auto_mode(void) { /* Return true when CPU has the bug, and mitigations are ON */ @@ -5919,8 +5903,6 @@ int kvm_mmu_module_init(void) kvm_mmu_reset_all_pte_masks(); - kvm_set_mmio_spte_mask(); - pte_list_desc_cache = kmem_cache_create("pte_list_desc", sizeof(struct pte_list_desc), 0, SLAB_ACCOUNT, NULL); diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c index fcac2cac78fe..574c8ccac0bf 100644 --- a/arch/x86/kvm/mmu/spte.c +++ b/arch/x86/kvm/mmu/spte.c @@ -22,7 +22,7 @@ u64 __read_mostly shadow_x_mask; /* mutual exclusive with nx_mask */ u64 __read_mostly shadow_user_mask; u64 __read_mostly shadow_accessed_mask; u64 __read_mostly shadow_dirty_mask; -u64 __read_mostly shadow_mmio_value; +u64 __read_mostly shadow_default_mmio_mask; u64 __read_mostly shadow_mmio_access_mask; u64 __read_mostly shadow_present_mask; u64 __read_mostly shadow_me_mask; @@ -52,7 +52,7 @@ u64 make_mmio_spte(struct kvm_vcpu *vcpu, u64 gfn, unsigned int access) u64 gpa = gfn << PAGE_SHIFT; access &= shadow_mmio_access_mask; - mask |= shadow_mmio_value | access; + mask |= vcpu->kvm->arch.shadow_mmio_value | SPTE_MMIO_MASK | access; mask |= gpa | shadow_nonpresent_or_rsvd_mask; mask |= (gpa & shadow_nonpresent_or_rsvd_mask) << SHADOW_NONPRESENT_OR_RSVD_MASK_LEN; @@ -242,12 +242,13 @@ u64 mark_spte_for_access_track(u64 spte) return spte; } -void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 access_mask) +void kvm_mmu_set_mmio_spte_mask(struct kvm *kvm, u64 mmio_value, + u64 access_mask) { BUG_ON((u64)(unsigned)access_mask != access_mask); WARN_ON(mmio_value & (shadow_nonpresent_or_rsvd_mask << SHADOW_NONPRESENT_OR_RSVD_MASK_LEN)); WARN_ON(mmio_value & shadow_nonpresent_or_rsvd_lower_gfn_mask); - shadow_mmio_value = mmio_value | SPTE_MMIO_MASK; + kvm->arch.shadow_mmio_value = mmio_value | SPTE_MMIO_MASK; shadow_mmio_access_mask = access_mask; } EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_mask); @@ -289,6 +290,7 @@ void kvm_mmu_reset_all_pte_masks(void) shadow_x_mask = 0; shadow_present_mask = 0; shadow_acc_track_mask = 0; + shadow_default_mmio_mask = 0; shadow_phys_bits = kvm_get_shadow_phys_bits(); @@ -315,4 +317,20 @@ void kvm_mmu_reset_all_pte_masks(void) shadow_nonpresent_or_rsvd_lower_gfn_mask = GENMASK_ULL(low_phys_bits - 1, PAGE_SHIFT); + + /* + * Set a reserved PA bit in MMIO SPTEs to generate page faults with + * PFEC.RSVD=1 on MMIO accesses. 64-bit PTEs (PAE, x86-64, and EPT + * paging) support a maximum of 52 bits of PA, i.e. if the CPU supports + * 52-bit physical addresses then there are no reserved PA bits in the + * PTEs and so the reserved PA approach must be disabled. 
+ */ + if (shadow_phys_bits < 52) + shadow_default_mmio_mask = BIT_ULL(51) | PT_PRESENT_MASK; +} + +void kvm_mmu_set_default_mmio_spte_mask(u64 mask) +{ + shadow_default_mmio_mask = mask; } +EXPORT_SYMBOL_GPL(kvm_mmu_set_default_mmio_spte_mask); diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h index 5c75a451c000..e5c94848ade1 100644 --- a/arch/x86/kvm/mmu/spte.h +++ b/arch/x86/kvm/mmu/spte.h @@ -86,7 +86,7 @@ extern u64 __read_mostly shadow_x_mask; /* mutual exclusive with nx_mask */ extern u64 __read_mostly shadow_user_mask; extern u64 __read_mostly shadow_accessed_mask; extern u64 __read_mostly shadow_dirty_mask; -extern u64 __read_mostly shadow_mmio_value; +extern u64 __read_mostly shadow_default_mmio_mask; extern u64 __read_mostly shadow_mmio_access_mask; extern u64 __read_mostly shadow_present_mask; extern u64 __read_mostly shadow_me_mask; diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 8be23240c74f..0aa29a30f922 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -871,7 +871,7 @@ static __init void svm_adjust_mmio_mask(void) */ mask = (mask_bit < 52) ? rsvd_bits(mask_bit, 51) | PT_PRESENT_MASK : 0; - kvm_mmu_set_mmio_spte_mask(mask, PT_WRITABLE_MASK | PT_USER_MASK); + kvm_mmu_set_default_mmio_spte_mask(mask); } static void svm_hardware_teardown(void) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index deeec105e963..997a391f0842 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -4276,15 +4276,6 @@ static void vmx_compute_secondary_exec_control(struct vcpu_vmx *vmx) vmx->secondary_exec_control = exec_control; } -static void ept_set_mmio_spte_mask(void) -{ - /* - * EPT Misconfigurations can be generated if the value of bits 2:0 - * of an EPT paging-structure entry is 110b (write/execute). - */ - kvm_mmu_set_mmio_spte_mask(VMX_EPT_MISCONFIG_WX_VALUE, 0); -} - #define VMX_XSS_EXIT_BITMAP 0 /* @@ -5473,8 +5464,6 @@ static void vmx_enable_tdp(void) 0ull, VMX_EPT_EXECUTABLE_MASK, cpu_has_vmx_ept_execute_only() ? 0ull : VMX_EPT_READABLE_MASK, VMX_EPT_RWX_MASK, 0ull); - - ept_set_mmio_spte_mask(); } /* @@ -6983,6 +6972,13 @@ static int vmx_vm_init(struct kvm *kvm) if (!ple_gap) kvm->arch.pause_in_guest = true; + /* + * EPT Misconfigurations can be generated if the value of bits 2:0 + * of an EPT paging-structure entry is 110b (write/execute). 
+ */ + if (enable_ept) + kvm_mmu_set_mmio_spte_mask(kvm, VMX_EPT_MISCONFIG_WX_VALUE, 0); + if (boot_cpu_has(X86_BUG_L1TF) && enable_ept) { switch (l1tf_mitigation) { case L1TF_MITIGATION_OFF: From patchwork Mon Nov 16 18:26:22 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910395 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 70ECDC64E7C for ; Mon, 16 Nov 2020 18:31:09 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 48856206F9 for ; Mon, 16 Nov 2020 18:31:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388589AbgKPSas (ORCPT ); Mon, 16 Nov 2020 13:30:48 -0500 Received: from mga06.intel.com ([134.134.136.31]:20645 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388229AbgKPS2K (ORCPT ); Mon, 16 Nov 2020 13:28:10 -0500 IronPort-SDR: YL/rgSuRDWEANztkCs2RMvtR3U4OO0Ing6QQIK3RFzq/QgrW0mlWqHzefqoHnhdekJhUVRhA3a ch3DfDvm2+wg== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="232410052" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="232410052" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:09 -0800 IronPort-SDR: XDNgKlubACap/YjCiy30MO5c7BvGj9iaXY7uhsLE1kPts++UzGrhS27GqeyeIbE7E8WcHIw1g/ hXMG8di6Ouxg== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400528130" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:09 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson Subject: [RFC PATCH 37/67] KVM: x86/mmu: Ignore bits 63 and 62 when checking for "present" SPTEs Date: Mon, 16 Nov 2020 10:26:22 -0800 Message-Id: <7ca4ebee9566d6fb5ecdbffd32468a6b756ab515.1605232743.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson Ignore bits 63 and 62 when checking for present SPTEs to allow setting said bits in not-present SPTEs. TDX will set bit 63 in "zero" SPTEs to suppress #VEs (TDX-SEAM unconditionally enables EPT Violation #VE), and will use bit 62 to track zapped private SPTEs. 
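As a rough illustration of why the shift works, the snippet below mimics the check with an ordinary unsigned 64-bit value: shifting left by two discards bits 63:62, so an SPTE whose only set bits are the "suppress #VE" and "zapped private" bits still evaluates as not present. The bit names and layout here are placeholders chosen for this example, not the patch's actual definitions.

#include <stdint.h>
#include <assert.h>

/* Hypothetical bit assignments, used only for this illustration. */
#define SUPPRESS_VE_BIT (1ULL << 63)   /* "suppress #VE" in an otherwise zero SPTE */
#define ZAPPED_PRIV_BIT (1ULL << 62)   /* marker for a zapped private SPTE         */

/*
 * Mirrors the idea behind __is_shadow_present_pte(): shifting left by two
 * drops bits 63:62, so a PTE whose only set bits are 63 and/or 62 still
 * reads as "not present".
 */
static int pte_present_ignoring_top_bits(uint64_t pte)
{
        return (pte << 2) != 0;
}

int main(void)
{
        assert(!pte_present_ignoring_top_bits(0));
        assert(!pte_present_ignoring_top_bits(SUPPRESS_VE_BIT));
        assert(!pte_present_ignoring_top_bits(SUPPRESS_VE_BIT | ZAPPED_PRIV_BIT));
        assert(pte_present_ignoring_top_bits(0x1000)); /* any lower bit set => present */
        return 0;
}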
Signed-off-by: Sean Christopherson --- arch/x86/kvm/mmu/paging_tmpl.h | 2 +- arch/x86/kvm/mmu/spte.h | 17 +++++++++++++++-- 2 files changed, 16 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index 5d4e9f404018..06659d5c8ba0 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -1039,7 +1039,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp) gpa_t pte_gpa; gfn_t gfn; - if (!sp->spt[i]) + if (!__is_shadow_present_pte(sp->spt[i])) continue; pte_gpa = first_pte_gpa + i * sizeof(pt_element_t); diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h index e5c94848ade1..22256cc8cce6 100644 --- a/arch/x86/kvm/mmu/spte.h +++ b/arch/x86/kvm/mmu/spte.h @@ -174,9 +174,22 @@ static inline bool is_access_track_spte(u64 spte) return !spte_ad_enabled(spte) && (spte & shadow_acc_track_mask) == 0; } -static inline int is_shadow_present_pte(u64 pte) +static inline bool __is_shadow_present_pte(u64 pte) { - return (pte != 0) && !is_mmio_spte(pte); + /* + * Ignore bits 63 and 62 so that they can be set in SPTEs that are well + * and truly not present. We can't use the sane/obvious approach of + * querying bits 2:0 (RWX or P) because EPT without A/D bits will clear + * RWX of a "present" SPTE to do access tracking. Tracking updates can + * be done out of mmu_lock, so even the flushing logic needs to treat + * such SPTEs as present. + */ + return !!(pte << 2); +} + +static inline bool is_shadow_present_pte(u64 pte) +{ + return __is_shadow_present_pte(pte) && !is_mmio_spte(pte); } static inline int is_large_pte(u64 pte) From patchwork Mon Nov 16 18:26:23 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910313 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A22C8C6379F for ; Mon, 16 Nov 2020 18:28:17 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 88FE024680 for ; Mon, 16 Nov 2020 18:28:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388262AbgKPS2N (ORCPT ); Mon, 16 Nov 2020 13:28:13 -0500 Received: from mga06.intel.com ([134.134.136.31]:20636 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388231AbgKPS2L (ORCPT ); Mon, 16 Nov 2020 13:28:11 -0500 IronPort-SDR: ql51VDgYSUtYUgb3wneOzqUFlYshvj3wtpb1HZSK0spOU/GCQ1nN62af6892Bl9sr+rbfjjbxI F745jW3aMz8Q== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="232410053" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="232410053" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:10 -0800 IronPort-SDR: b4d4hd9uvgO57+SWANxqk2M2ikqPLMBIbH0OJjdYSlYo2TR/xLrfxktGCIoWrNUV8XR+w+eufB Zc9gtgltOxvg== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400528141" Received: from ls.sc.intel.com (HELO 
localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:10 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson Subject: [RFC PATCH 38/67] KVM: x86/mmu: Allow non-zero init value for shadow PTE Date: Mon, 16 Nov 2020 10:26:23 -0800 Message-Id: X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson TDX will run with EPT violation #VEs enabled, which means KVM needs to set the "suppress #VE" bit in unused PTEs to avoid unintentionally reflecting not-present EPT violations into the guest. Signed-off-by: Sean Christopherson --- arch/x86/kvm/mmu.h | 1 + arch/x86/kvm/mmu/mmu.c | 50 +++++++++++++++++++++++++++++++++++------ arch/x86/kvm/mmu/spte.c | 10 +++++++++ arch/x86/kvm/mmu/spte.h | 2 ++ 4 files changed, 56 insertions(+), 7 deletions(-) diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index 05c2898cb2a2..e9598a51090b 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -55,6 +55,7 @@ static inline u64 rsvd_bits(int s, int e) void kvm_mmu_set_mmio_spte_mask(struct kvm *kvm, u64 mmio_value, u64 access_mask); void kvm_mmu_set_default_mmio_spte_mask(u64 mask); +void kvm_mmu_set_spte_init_value(u64 init_value); void reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context); diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index da2a58fa86a8..732510ecda36 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -560,9 +560,9 @@ static int mmu_spte_clear_track_bits(u64 *sptep) u64 old_spte = *sptep; if (!spte_has_volatile_bits(old_spte)) - __update_clear_spte_fast(sptep, 0ull); + __update_clear_spte_fast(sptep, shadow_init_value); else - old_spte = __update_clear_spte_slow(sptep, 0ull); + old_spte = __update_clear_spte_slow(sptep, shadow_init_value); if (!is_shadow_present_pte(old_spte)) return 0; @@ -592,7 +592,7 @@ static int mmu_spte_clear_track_bits(u64 *sptep) */ static void mmu_spte_clear_no_track(u64 *sptep) { - __update_clear_spte_fast(sptep, 0ull); + __update_clear_spte_fast(sptep, shadow_init_value); } static u64 mmu_spte_get_lockless(u64 *sptep) @@ -670,6 +670,42 @@ static void walk_shadow_page_lockless_end(struct kvm_vcpu *vcpu) local_irq_enable(); } +static inline void kvm_init_shadow_page(void *page) +{ +#ifdef CONFIG_X86_64 + int ign; + + asm volatile ( + "rep stosq\n\t" + : "=c"(ign), "=D"(page) + : "a"(shadow_init_value), "c"(4096/8), "D"(page) + : "memory" + ); +#else + BUG(); +#endif +} + +static int mmu_topup_shadow_page_cache(struct kvm_vcpu *vcpu) +{ + struct kvm_mmu_memory_cache *mc = &vcpu->arch.mmu_shadow_page_cache; + int start, end, i, r; + + if (shadow_init_value) + start = kvm_mmu_memory_cache_nr_free_objects(mc); + + r = kvm_mmu_topup_memory_cache(mc, PT64_ROOT_MAX_LEVEL); + if (r) + return r; + + if (shadow_init_value) { + end = kvm_mmu_memory_cache_nr_free_objects(mc); + for (i = start; i < end; i++) + kvm_init_shadow_page(mc->objects[i]); + } + return 0; +} + static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect) { int r; @@ -679,8 +715,7 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect) 1 + PT64_ROOT_MAX_LEVEL 
+ PTE_PREFETCH_NUM); if (r) return r; - r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_shadow_page_cache, - PT64_ROOT_MAX_LEVEL); + r = mmu_topup_shadow_page_cache(vcpu); if (r) return r; if (maybe_indirect) { @@ -3074,7 +3109,7 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, struct kvm_shadow_walk_iterator iterator; struct kvm_mmu_page *sp; int ret = RET_PF_INVALID; - u64 spte = 0ull; + u64 spte = shadow_init_value; uint retry_count = 0; if (!page_fault_can_be_fast(error_code)) @@ -5356,7 +5391,8 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu) vcpu->arch.mmu_page_header_cache.kmem_cache = mmu_page_header_cache; vcpu->arch.mmu_page_header_cache.gfp_zero = __GFP_ZERO; - vcpu->arch.mmu_shadow_page_cache.gfp_zero = __GFP_ZERO; + if (!shadow_init_value) + vcpu->arch.mmu_shadow_page_cache.gfp_zero = __GFP_ZERO; vcpu->arch.mmu = &vcpu->arch.root_mmu; vcpu->arch.walk_mmu = &vcpu->arch.root_mmu; diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c index 574c8ccac0bf..079bbef7b8aa 100644 --- a/arch/x86/kvm/mmu/spte.c +++ b/arch/x86/kvm/mmu/spte.c @@ -27,6 +27,7 @@ u64 __read_mostly shadow_mmio_access_mask; u64 __read_mostly shadow_present_mask; u64 __read_mostly shadow_me_mask; u64 __read_mostly shadow_acc_track_mask; +u64 __read_mostly shadow_init_value; u64 __read_mostly shadow_nonpresent_or_rsvd_mask; u64 __read_mostly shadow_nonpresent_or_rsvd_lower_gfn_mask; @@ -195,6 +196,14 @@ u64 kvm_mmu_changed_pte_notifier_make_spte(u64 old_spte, kvm_pfn_t new_pfn) return new_spte; } +void kvm_mmu_set_spte_init_value(u64 init_value) +{ + if (WARN_ON(!IS_ENABLED(CONFIG_X86_64) && init_value)) + init_value = 0; + shadow_init_value = init_value; +} +EXPORT_SYMBOL_GPL(kvm_mmu_set_spte_init_value); + static u8 kvm_get_shadow_phys_bits(void) { /* @@ -291,6 +300,7 @@ void kvm_mmu_reset_all_pte_masks(void) shadow_present_mask = 0; shadow_acc_track_mask = 0; shadow_default_mmio_mask = 0; + shadow_init_value = 0; shadow_phys_bits = kvm_get_shadow_phys_bits(); diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h index 22256cc8cce6..a5eab5607606 100644 --- a/arch/x86/kvm/mmu/spte.h +++ b/arch/x86/kvm/mmu/spte.h @@ -91,6 +91,8 @@ extern u64 __read_mostly shadow_mmio_access_mask; extern u64 __read_mostly shadow_present_mask; extern u64 __read_mostly shadow_me_mask; +extern u64 __read_mostly shadow_init_value; + /* * SPTEs used by MMUs without A/D bits are marked with SPTE_AD_DISABLED_MASK; * shadow_acc_track_mask is the set of bits to be cleared in non-accessed From patchwork Mon Nov 16 18:26:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910391 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2423FC64E7A for ; Mon, 16 Nov 2020 18:31:09 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id ED45120756 for ; Mon, 16 Nov 2020 18:31:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388577AbgKPSaj (ORCPT ); Mon, 16 Nov 2020 13:30:39 -0500 
Received: from mga06.intel.com ([134.134.136.31]:20645 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388232AbgKPS2L (ORCPT ); Mon, 16 Nov 2020 13:28:11 -0500 IronPort-SDR: KeEoDG1/XRI5Ya6bLtbezkv/Sy76zeMOl3CcEFKD1dnWeShcjB7onn5+A0xwEc5fWa+cZVNpMh 3/FkxOVEWp7w== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="232410054" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="232410054" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:10 -0800 IronPort-SDR: bu2bFn08Vmg/gkRfuMIFM+4MC8Z1WaU9xjW2sSXdPprtJ8MDaqAu4pvoPNFF9tOeEPIx/1em+3 JnVesnz854aw== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400528150" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:10 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson Subject: [RFC PATCH 39/67] KVM: x86/mmu: Refactor shadow walk in __direct_map() to reduce indentation Date: Mon, 16 Nov 2020 10:26:24 -0800 Message-Id: X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson Employ a 'continue' to reduce the indentation for linking a new shadow page during __direct_map() in preparation for linking private pages. 
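The transformation being applied is a generic one: invert the guard condition and 'continue' so the main work stays at the top indentation level of the loop body. A trivial standalone example of the same pattern, unrelated to the KVM code itself:

#include <stdio.h>

static void process(int i) { printf("processing %d\n", i); }

int main(void)
{
        int done[4] = { 1, 0, 0, 1 };
        int i;

        /* Nested form: the work sits one level deeper inside the condition. */
        for (i = 0; i < 4; i++) {
                if (!done[i]) {
                        process(i);
                }
        }

        /*
         * 'continue' form: the same loop with the test inverted, keeping the
         * body flat so further steps can be added without extra indentation.
         */
        for (i = 0; i < 4; i++) {
                if (done[i])
                        continue;
                process(i);
        }
        return 0;
}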
Signed-off-by: Sean Christopherson --- arch/x86/kvm/mmu/mmu.c | 19 +++++++++---------- 1 file changed, 9 insertions(+), 10 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 732510ecda36..25aafac9b5de 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -2953,16 +2953,15 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, break; drop_large_spte(vcpu, it.sptep); - if (!is_shadow_present_pte(*it.sptep)) { - sp = __kvm_mmu_get_page(vcpu, base_gfn, - gfn_stolen_bits, it.addr, - it.level - 1, true, ACC_ALL); - - link_shadow_page(vcpu, it.sptep, sp); - if (is_tdp && huge_page_disallowed && - req_level >= it.level) - account_huge_nx_page(vcpu->kvm, sp); - } + if (is_shadow_present_pte(*it.sptep)) + continue; + + sp = __kvm_mmu_get_page(vcpu, base_gfn, gfn_stolen_bits, + it.addr, it.level - 1, true, ACC_ALL); + + link_shadow_page(vcpu, it.sptep, sp); + if (is_tdp && huge_page_disallowed && req_level >= it.level) + account_huge_nx_page(vcpu->kvm, sp); } ret = mmu_set_spte(vcpu, it.sptep, ACC_ALL, From patchwork Mon Nov 16 18:26:25 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910383 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CAC3FC55ABD for ; Mon, 16 Nov 2020 18:31:07 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A4815206F9 for ; Mon, 16 Nov 2020 18:31:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388253AbgKPS2M (ORCPT ); Mon, 16 Nov 2020 13:28:12 -0500 Received: from mga06.intel.com ([134.134.136.31]:20644 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388240AbgKPS2L (ORCPT ); Mon, 16 Nov 2020 13:28:11 -0500 IronPort-SDR: qey2MVox6uZ9UYG4tY+HuK/5jCLzhPB5cyVy5rGQ6SfSB3aM+MoWyikS9hnFnoNHunB35mfC1q uvvuU9x8I4zw== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="232410056" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="232410056" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:11 -0800 IronPort-SDR: WASFe6mo2FhvYxh/F0b5Ofn9gfH7w3BAu3djvU3vyP+KrH7KJt5Lfg0x+wet4fnioi+/z+EjN1 TjJr7SCiz6gA== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400528162" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:10 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . 
Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson Subject: [RFC PATCH 40/67] KVM: x86/mmu: Return old SPTE from mmu_spte_clear_track_bits() Date: Mon, 16 Nov 2020 10:26:25 -0800 Message-Id: <4bdb2c587d917a67eecbae6f02926c31c32e3e39.1605232743.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson Return the old SPTE when clearing a SPTE and push the "old SPTE present" check to the caller. Private shadow page support will use the old SPTE in rmap_remove() to determine whether or not there is a linked private shadow page. Signed-off-by: Sean Christopherson --- arch/x86/kvm/mmu/mmu.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 25aafac9b5de..8d847c3abf1d 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -552,9 +552,9 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte) * Rules for using mmu_spte_clear_track_bits: * It sets the sptep from present to nonpresent, and track the * state bits, it is used to clear the last level sptep. - * Returns non-zero if the PTE was previously valid. + * Returns the old PTE. */ -static int mmu_spte_clear_track_bits(u64 *sptep) +static u64 mmu_spte_clear_track_bits(u64 *sptep) { kvm_pfn_t pfn; u64 old_spte = *sptep; @@ -565,7 +565,7 @@ static int mmu_spte_clear_track_bits(u64 *sptep) old_spte = __update_clear_spte_slow(sptep, shadow_init_value); if (!is_shadow_present_pte(old_spte)) - return 0; + return old_spte; pfn = spte_to_pfn(old_spte); @@ -582,7 +582,7 @@ static int mmu_spte_clear_track_bits(u64 *sptep) if (is_dirty_spte(old_spte)) kvm_set_pfn_dirty(pfn); - return 1; + return old_spte; } /* @@ -1113,7 +1113,9 @@ static u64 *rmap_get_next(struct rmap_iterator *iter) static void drop_spte(struct kvm *kvm, u64 *sptep) { - if (mmu_spte_clear_track_bits(sptep)) + u64 old_spte = mmu_spte_clear_track_bits(sptep); + + if (is_shadow_present_pte(old_spte)) rmap_remove(kvm, sptep); } From patchwork Mon Nov 16 18:26:26 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910379 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A58E0C63697 for ; Mon, 16 Nov 2020 18:30:12 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 78E1820756 for ; Mon, 16 Nov 2020 18:30:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388516AbgKPS3u (ORCPT ); Mon, 16 Nov 2020 13:29:50 -0500 Received: from mga06.intel.com ([134.134.136.31]:20649 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388222AbgKPS2P (ORCPT ); Mon, 16 Nov 2020 13:28:15 -0500 IronPort-SDR: 
2rxnuFspw0Qi4BKqa3IifUxLHHsBDFusgWRMF3jrCesNzaDix7NzDcuLpy8irQ0bzCBaRcYEEp r0ww239Li6vQ== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="232410058" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="232410058" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:11 -0800 IronPort-SDR: 3UtirH/84+e6dvcSYrCP+yRqlh1ZjkSE1JXgMa/E8+gzeH3z1qiCTBbEhC5MglwiZuCnN83dy5 eUzgkP7swLNg== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400528173" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:11 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson Subject: [RFC PATCH 41/67] KVM: x86/mmu: Frame in support for private/inaccessible shadow pages Date: Mon, 16 Nov 2020 10:26:26 -0800 Message-Id: X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson Add kvm_x86_ops hooks to set/clear private SPTEs, i.e. SEPT entries, and to link/free private shadow pages, i.e. non-leaf SEPT pages. Because SEAMCALLs are bloody expensive, and because KVM's MMU is already complex enough, TDX's SEPT will mirror KVM's shadow pages instead of replacing them outright. This costs extra memory, but is simpler and far more performant. Add a separate list for tracking active private shadow pages. Zapping and freeing SEPT entries is subject to very different rules than normal pages/memory, and need to be preserved (along with their shadow page counterparts) when KVM gets trigger happy, e.g. zaps everything during a memslot update. Zap any aliases of a GPA when mapping in a guest that supports guest private GPAs. This is necessary to avoid integrity failures with TDX due to pointing shared and private GPAs at the same HPA. Do not prefetch private pages (this should probably be a property of the VM). Signed-off-by: Sean Christopherson --- arch/x86/include/asm/kvm_host.h | 23 ++- arch/x86/kvm/mmu.h | 3 +- arch/x86/kvm/mmu/mmu.c | 270 +++++++++++++++++++++++++++----- arch/x86/kvm/mmu/mmu_internal.h | 4 + arch/x86/kvm/mmu/spte.h | 11 +- arch/x86/kvm/x86.c | 4 +- 6 files changed, 269 insertions(+), 46 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index d4fd9859fcd5..9f7349aa3c77 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -361,6 +361,7 @@ struct kvm_mmu { void (*update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, u64 *spte, const void *pte); hpa_t root_hpa; + hpa_t private_root_hpa; gpa_t root_pgd; union kvm_mmu_role mmu_role; u8 root_level; @@ -595,6 +596,7 @@ struct kvm_vcpu_arch { struct kvm_mmu_memory_cache mmu_shadow_page_cache; struct kvm_mmu_memory_cache mmu_gfn_array_cache; struct kvm_mmu_memory_cache mmu_page_header_cache; + struct kvm_mmu_memory_cache mmu_private_sp_cache; /* * QEMU userspace and the guest each have their own FPU state. @@ -910,6 +912,7 @@ struct kvm_arch { * Hash table of struct kvm_mmu_page. 
*/ struct list_head active_mmu_pages; + struct list_head private_mmu_pages; struct list_head zapped_obsolete_pages; struct list_head lpage_disallowed_mmu_pages; struct kvm_page_track_notifier_node mmu_sp_tracker; @@ -1020,6 +1023,8 @@ struct kvm_arch { struct list_head tdp_mmu_roots; /* List of struct tdp_mmu_pages not being used as roots */ struct list_head tdp_mmu_pages; + + gfn_t gfn_shared_mask; }; struct kvm_vm_stat { @@ -1199,6 +1204,17 @@ struct kvm_x86_ops { void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, unsigned long pgd, int pgd_level); + void (*set_private_spte)(struct kvm_vcpu *vcpu, gfn_t gfn, int level, + kvm_pfn_t pfn); + void (*drop_private_spte)(struct kvm *kvm, gfn_t gfn, int level, + kvm_pfn_t pfn); + void (*zap_private_spte)(struct kvm *kvm, gfn_t gfn, int level); + void (*unzap_private_spte)(struct kvm *kvm, gfn_t gfn, int level); + int (*link_private_sp)(struct kvm_vcpu *vcpu, gfn_t gfn, int level, + void *private_sp); + int (*free_private_sp)(struct kvm *kvm, gfn_t gfn, int level, + void *private_sp); + bool (*has_wbinvd_exit)(void); /* Returns actual tsc_offset set in active VMCS */ @@ -1378,7 +1394,8 @@ void kvm_mmu_slot_set_dirty(struct kvm *kvm, void kvm_mmu_clear_dirty_pt_masked(struct kvm *kvm, struct kvm_memory_slot *slot, gfn_t gfn_offset, unsigned long mask); -void kvm_mmu_zap_all(struct kvm *kvm); +void kvm_mmu_zap_all_active(struct kvm *kvm); +void kvm_mmu_zap_all_private(struct kvm *kvm); void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen); unsigned long kvm_mmu_calculate_default_mmu_pages(struct kvm *kvm); void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long kvm_nr_mmu_pages); @@ -1532,7 +1549,9 @@ static inline int __kvm_irq_line_state(unsigned long *irq_state, #define KVM_MMU_ROOT_CURRENT BIT(0) #define KVM_MMU_ROOT_PREVIOUS(i) BIT(1+i) -#define KVM_MMU_ROOTS_ALL (~0UL) +#define KVM_MMU_ROOT_PRIVATE BIT(1+KVM_MMU_NUM_PREV_ROOTS) +#define KVM_MMU_ROOTS_ALL ((u32)(~KVM_MMU_ROOT_PRIVATE)) +#define KVM_MMU_ROOTS_ALL_INC_PRIVATE (KVM_MMU_ROOTS_ALL | KVM_MMU_ROOT_PRIVATE) int kvm_pic_set_irq(struct kvm_pic *pic, int irq, int irq_source_id, int level); void kvm_pic_clear_all(struct kvm_pic *pic, int irq_source_id); diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index e9598a51090b..3b1243cfc280 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -225,8 +225,7 @@ void kvm_mmu_pre_destroy_vm(struct kvm *kvm); static inline gfn_t kvm_gfn_stolen_mask(struct kvm *kvm) { - /* Currently there are no stolen bits in KVM */ - return 0; + return kvm->arch.gfn_shared_mask; } static inline gfn_t vcpu_gfn_stolen_mask(struct kvm_vcpu *vcpu) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 8d847c3abf1d..e4e0c883b52d 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -554,15 +554,15 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte) * state bits, it is used to clear the last level sptep. * Returns the old PTE. 
*/ -static u64 mmu_spte_clear_track_bits(u64 *sptep) +static u64 __mmu_spte_clear_track_bits(u64 *sptep, u64 clear_value) { kvm_pfn_t pfn; u64 old_spte = *sptep; if (!spte_has_volatile_bits(old_spte)) - __update_clear_spte_fast(sptep, shadow_init_value); + __update_clear_spte_fast(sptep, clear_value); else - old_spte = __update_clear_spte_slow(sptep, shadow_init_value); + old_spte = __update_clear_spte_slow(sptep, clear_value); if (!is_shadow_present_pte(old_spte)) return old_spte; @@ -585,6 +585,11 @@ static u64 mmu_spte_clear_track_bits(u64 *sptep) return old_spte; } +static inline u64 mmu_spte_clear_track_bits(u64 *sptep) +{ + return __mmu_spte_clear_track_bits(sptep, shadow_init_value); +} + /* * Rules for using mmu_spte_clear_no_track: * Directly clear spte without caring the state bits of sptep, @@ -691,6 +696,13 @@ static int mmu_topup_shadow_page_cache(struct kvm_vcpu *vcpu) struct kvm_mmu_memory_cache *mc = &vcpu->arch.mmu_shadow_page_cache; int start, end, i, r; + if (vcpu->kvm->arch.gfn_shared_mask) { + r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_private_sp_cache, + PT64_ROOT_MAX_LEVEL); + if (r) + return r; + } + if (shadow_init_value) start = kvm_mmu_memory_cache_nr_free_objects(mc); @@ -732,6 +744,7 @@ static void mmu_free_memory_caches(struct kvm_vcpu *vcpu) { kvm_mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache); kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadow_page_cache); + kvm_mmu_free_memory_cache(&vcpu->arch.mmu_private_sp_cache); kvm_mmu_free_memory_cache(&vcpu->arch.mmu_gfn_array_cache); kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache); } @@ -874,6 +887,23 @@ gfn_to_memslot_dirty_bitmap(struct kvm_vcpu *vcpu, gfn_t gfn, return slot; } +static inline bool __is_private_gfn(struct kvm *kvm, gfn_t gfn_stolen_bits) +{ + gfn_t gfn_shared_mask = kvm->arch.gfn_shared_mask; + + return gfn_shared_mask && !(gfn_shared_mask & gfn_stolen_bits); +} + +static inline bool is_private_gfn(struct kvm_vcpu *vcpu, gfn_t gfn_stolen_bits) +{ + return __is_private_gfn(vcpu->kvm, gfn_stolen_bits); +} + +static inline bool is_private_spte(struct kvm *kvm, u64 *sptep) +{ + return __is_private_gfn(kvm, sptep_to_sp(sptep)->gfn); +} + /* * About rmap_head encoding: * @@ -1023,7 +1053,7 @@ static int rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn) return pte_list_add(vcpu, spte, rmap_head); } -static void rmap_remove(struct kvm *kvm, u64 *spte) +static void rmap_remove(struct kvm *kvm, u64 *spte, u64 old_spte) { struct kvm_mmu_page *sp; gfn_t gfn; @@ -1033,6 +1063,10 @@ static void rmap_remove(struct kvm *kvm, u64 *spte) gfn = kvm_mmu_page_get_gfn(sp, spte - sp->spt); rmap_head = gfn_to_rmap(kvm, gfn, sp); __pte_list_remove(spte, rmap_head); + + if (__is_private_gfn(kvm, sp->gfn_stolen_bits)) + kvm_x86_ops.drop_private_spte(kvm, gfn, sp->role.level - 1, + spte_to_pfn(old_spte)); } /* @@ -1070,7 +1104,8 @@ static u64 *rmap_get_first(struct kvm_rmap_head *rmap_head, iter->pos = 0; sptep = iter->desc->sptes[iter->pos]; out: - BUG_ON(!is_shadow_present_pte(*sptep)); + BUG_ON(!is_shadow_present_pte(*sptep) && + !is_zapped_private_pte(*sptep)); return sptep; } @@ -1115,8 +1150,9 @@ static void drop_spte(struct kvm *kvm, u64 *sptep) { u64 old_spte = mmu_spte_clear_track_bits(sptep); - if (is_shadow_present_pte(old_spte)) - rmap_remove(kvm, sptep); + if (is_shadow_present_pte(old_spte) || + is_zapped_private_pte(old_spte)) + rmap_remove(kvm, sptep, old_spte); } @@ -1364,17 +1400,51 @@ static bool rmap_write_protect(struct kvm_vcpu *vcpu, u64 gfn) return 
kvm_mmu_slot_gfn_write_protect(vcpu->kvm, slot, gfn); } +static bool kvm_mmu_zap_private_spte(struct kvm *kvm, u64 *sptep) +{ + struct kvm_mmu_page *sp; + kvm_pfn_t pfn; + gfn_t gfn; + + /* Skip the lookup if the VM doesn't support private memory. */ + if (likely(!kvm->arch.gfn_shared_mask)) + return false; + + sp = sptep_to_sp(sptep); + if (!__is_private_gfn(kvm, sp->gfn_stolen_bits)) + return false; + + gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->spt); + pfn = spte_to_pfn(*sptep); + + kvm_x86_ops.zap_private_spte(kvm, gfn, sp->role.level - 1); + + __mmu_spte_clear_track_bits(sptep, + SPTE_PRIVATE_ZAPPED | pfn << PAGE_SHIFT); + return true; +} + static bool kvm_zap_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head) { u64 *sptep; struct rmap_iterator iter; bool flush = false; - while ((sptep = rmap_get_first(rmap_head, &iter))) { +restart: + for_each_rmap_spte(rmap_head, &iter, sptep) { rmap_printk("%s: spte %p %llx.\n", __func__, sptep, *sptep); - pte_list_remove(rmap_head, sptep); + if (is_zapped_private_pte(*sptep)) + continue; + flush = true; + + /* Keep the rmap if the private SPTE couldn't be zapped. */ + if (kvm_mmu_zap_private_spte(kvm, sptep)) + continue; + + pte_list_remove(rmap_head, sptep); + goto restart; } return flush; @@ -1408,6 +1478,9 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head, need_flush = 1; + /* Private page relocation is not yet supported. */ + KVM_BUG_ON(is_private_spte(kvm, sptep), kvm); + if (pte_write(*ptep)) { pte_list_remove(rmap_head, sptep); goto restart; @@ -1673,7 +1746,7 @@ static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, unsigned long nr) percpu_counter_add(&kvm_total_used_mmu_pages, nr); } -static void kvm_mmu_free_page(struct kvm_mmu_page *sp) +static void kvm_mmu_free_page(struct kvm *kvm, struct kvm_mmu_page *sp) { MMU_WARN_ON(!is_empty_shadow_page(sp->spt)); hlist_del(&sp->hash_link); @@ -1681,6 +1754,11 @@ static void kvm_mmu_free_page(struct kvm_mmu_page *sp) free_page((unsigned long)sp->spt); if (!sp->role.direct) free_page((unsigned long)sp->gfns); + if (sp->private_sp && + !kvm_x86_ops.free_private_sp(kvm, sp->gfn, sp->role.level, + sp->private_sp)) + free_page((unsigned long)sp->private_sp); + kmem_cache_free(mmu_page_header_cache, sp); } @@ -1711,7 +1789,8 @@ static void drop_parent_pte(struct kvm_mmu_page *sp, mmu_spte_clear_no_track(parent_pte); } -static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, int direct) +static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, + int direct, bool private) { struct kvm_mmu_page *sp; @@ -1727,7 +1806,10 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, int direct * comments in kvm_zap_obsolete_pages(). 
*/ sp->mmu_valid_gen = vcpu->kvm->arch.mmu_valid_gen; - list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages); + if (private) + list_add(&sp->link, &vcpu->kvm->arch.private_mmu_pages); + else + list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages); kvm_mod_used_mmu_pages(vcpu->kvm, +1); return sp; } @@ -2146,7 +2228,8 @@ static struct kvm_mmu_page *__kvm_mmu_get_page(struct kvm_vcpu *vcpu, ++vcpu->kvm->stat.mmu_cache_miss; - sp = kvm_mmu_alloc_page(vcpu, direct); + sp = kvm_mmu_alloc_page(vcpu, direct, + is_private_gfn(vcpu, gfn_stolen_bits)); sp->gfn = gfn; sp->gfn_stolen_bits = gfn_stolen_bits; @@ -2213,8 +2296,13 @@ static void shadow_walk_init_using_root(struct kvm_shadow_walk_iterator *iterato static void shadow_walk_init(struct kvm_shadow_walk_iterator *iterator, struct kvm_vcpu *vcpu, u64 addr) { - shadow_walk_init_using_root(iterator, vcpu, vcpu->arch.mmu->root_hpa, - addr); + hpa_t root; + + if (is_private_gfn(vcpu, addr >> PAGE_SHIFT)) + root = vcpu->arch.mmu->private_root_hpa; + else + root = vcpu->arch.mmu->root_hpa; + shadow_walk_init_using_root(iterator, vcpu, root, addr); } static bool shadow_walk_okay(struct kvm_shadow_walk_iterator *iterator) @@ -2291,7 +2379,7 @@ static int mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp, struct kvm_mmu_page *child; pte = *spte; - if (is_shadow_present_pte(pte)) { + if (is_shadow_present_pte(pte) || is_zapped_private_pte(pte)) { if (is_last_spte(pte, sp->role.level)) { drop_spte(kvm, spte); if (is_large_pte(pte)) @@ -2300,6 +2388,9 @@ static int mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp, child = to_shadow_page(pte & PT64_BASE_ADDR_MASK); drop_parent_pte(child, spte); + if (!is_shadow_present_pte(pte)) + return 0; + /* * Recursively zap nested TDP SPs, parentless SPs are * unlikely to be used again in the near future. This @@ -2450,7 +2541,7 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm, list_for_each_entry_safe(sp, nsp, invalid_list, link) { WARN_ON(!sp->role.invalid || sp->root_count); - kvm_mmu_free_page(sp); + kvm_mmu_free_page(kvm, sp); } } @@ -2663,29 +2754,33 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, int set_spte_ret; int ret = RET_PF_FIXED; bool flush = false; + u64 pte = *sptep; pgprintk("%s: spte %llx write_fault %d gfn %llx\n", __func__, *sptep, write_fault, gfn); - if (is_shadow_present_pte(*sptep)) { + if (is_shadow_present_pte(pte)) { /* * If we overwrite a PTE page pointer with a 2MB PMD, unlink * the parent of the now unreachable PTE. 
*/ - if (level > PG_LEVEL_4K && !is_large_pte(*sptep)) { + if (level > PG_LEVEL_4K && !is_large_pte(pte)) { struct kvm_mmu_page *child; - u64 pte = *sptep; child = to_shadow_page(pte & PT64_BASE_ADDR_MASK); drop_parent_pte(child, sptep); flush = true; - } else if (pfn != spte_to_pfn(*sptep)) { + } else if (pfn != spte_to_pfn(pte)) { pgprintk("hfn old %llx new %llx\n", - spte_to_pfn(*sptep), pfn); + spte_to_pfn(pte), pfn); drop_spte(vcpu->kvm, sptep); flush = true; } else was_rmapped = 1; + } else if (is_zapped_private_pte(pte)) { + WARN_ON(pfn != spte_to_pfn(pte)); + ret = RET_PF_UNZAPPED; + was_rmapped = 1; } set_spte_ret = set_spte(vcpu, sptep, pte_access, level, gfn, pfn, @@ -2918,6 +3013,53 @@ void disallowed_hugepage_adjust(u64 spte, gfn_t gfn, int cur_level, } } +static void kvm_mmu_link_private_sp(struct kvm_vcpu *vcpu, + struct kvm_mmu_page *sp) +{ + void *p = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_private_sp_cache); + + if (!kvm_x86_ops.link_private_sp(vcpu, sp->gfn, sp->role.level, p)) + sp->private_sp = p; + else + free_page((unsigned long)p); +} + +static void kvm_mmu_zap_alias_spte(struct kvm_vcpu *vcpu, gfn_t gfn, + gpa_t gpa_alias) +{ + struct kvm_shadow_walk_iterator it; + struct kvm_rmap_head *rmap_head; + struct kvm *kvm = vcpu->kvm; + struct rmap_iterator iter; + struct kvm_mmu_page *sp; + u64 *sptep; + + for_each_shadow_entry(vcpu, gpa_alias, it) { + if (!is_shadow_present_pte(*it.sptep)) + break; + } + + sp = sptep_to_sp(it.sptep); + if (!is_last_spte(*it.sptep, sp->role.level)) + return; + + rmap_head = gfn_to_rmap(kvm, gfn, sp); + if (!kvm_zap_rmapp(kvm, rmap_head)) + return; + + kvm_flush_remote_tlbs_with_address(kvm, gfn, 1); + + if (!is_private_gfn(vcpu, sp->gfn_stolen_bits)) + return; + + for_each_rmap_spte(rmap_head, &iter, sptep) { + if (!is_zapped_private_pte(*sptep)) + continue; + + drop_spte(kvm, sptep); + } +} + static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, int map_writable, int max_level, kvm_pfn_t pfn, bool prefault, bool is_tdp) @@ -2933,10 +3075,18 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, gfn_t gfn = (gpa & ~gpa_stolen_mask) >> PAGE_SHIFT; gfn_t gfn_stolen_bits = (gpa & gpa_stolen_mask) >> PAGE_SHIFT; gfn_t base_gfn = gfn; + bool is_private = is_private_gfn(vcpu, gfn_stolen_bits); if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa))) return RET_PF_RETRY; + if (is_error_noslot_pfn(pfn) || kvm_is_reserved_pfn(pfn)) { + if (is_private) + return -EFAULT; + } else if (vcpu->kvm->arch.gfn_shared_mask) { + kvm_mmu_zap_alias_spte(vcpu, gfn, gpa ^ gpa_stolen_mask); + } + level = kvm_mmu_hugepage_adjust(vcpu, gfn, max_level, &pfn, huge_page_disallowed, &req_level); @@ -2964,6 +3114,8 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, link_shadow_page(vcpu, it.sptep, sp); if (is_tdp && huge_page_disallowed && req_level >= it.level) account_huge_nx_page(vcpu->kvm, sp); + if (is_private) + kvm_mmu_link_private_sp(vcpu, sp); } ret = mmu_set_spte(vcpu, it.sptep, ACC_ALL, @@ -2972,7 +3124,12 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, if (ret == RET_PF_SPURIOUS) return ret; - direct_pte_prefetch(vcpu, it.sptep); + if (!is_private) + direct_pte_prefetch(vcpu, it.sptep); + else if (ret == RET_PF_UNZAPPED) + kvm_x86_ops.unzap_private_spte(vcpu->kvm, gfn, level - 1); + else if (!WARN_ON_ONCE(ret != RET_PF_FIXED)) + kvm_x86_ops.set_private_spte(vcpu, gfn, level, pfn); ++vcpu->stat.pf_fixed; return ret; } @@ -3242,7 +3399,9 @@ void 
kvm_mmu_free_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, VALID_PAGE(mmu->prev_roots[i].hpa)) break; - if (i == KVM_MMU_NUM_PREV_ROOTS) + if (i == KVM_MMU_NUM_PREV_ROOTS && + (!(roots_to_free & KVM_MMU_ROOT_PRIVATE) || + !VALID_PAGE(mmu->private_root_hpa))) return; } @@ -3268,6 +3427,9 @@ void kvm_mmu_free_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, mmu->root_pgd = 0; } + if (roots_to_free & KVM_MMU_ROOT_PRIVATE) + mmu_free_root_page(kvm, &mmu->private_root_hpa, &invalid_list); + kvm_mmu_commit_zap_page(kvm, &invalid_list); spin_unlock(&kvm->mmu_lock); } @@ -3285,8 +3447,9 @@ static int mmu_check_root(struct kvm_vcpu *vcpu, gfn_t root_gfn) return ret; } -static hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, gva_t gva, - u8 level, bool direct) +static hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, + gfn_t gfn_stolen_bits, gva_t gva, u8 level, + bool direct) { struct kvm_mmu_page *sp; @@ -3296,7 +3459,8 @@ static hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, gva_t gva, spin_unlock(&vcpu->kvm->mmu_lock); return INVALID_PAGE; } - sp = kvm_mmu_get_page(vcpu, gfn, gva, level, direct, ACC_ALL); + sp = __kvm_mmu_get_page(vcpu, gfn, gfn_stolen_bits, gva, level, direct, + ACC_ALL); ++sp->root_count; spin_unlock(&vcpu->kvm->mmu_lock); @@ -3306,6 +3470,7 @@ static hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, gva_t gva, static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu) { u8 shadow_root_level = vcpu->arch.mmu->shadow_root_level; + gfn_t gfn_shared = vcpu->kvm->arch.gfn_shared_mask; hpa_t root; unsigned i; @@ -3316,17 +3481,23 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu) return -ENOSPC; vcpu->arch.mmu->root_hpa = root; } else if (shadow_root_level >= PT64_ROOT_4LEVEL) { - root = mmu_alloc_root(vcpu, 0, 0, shadow_root_level, - true); - + if (gfn_shared && !VALID_PAGE(vcpu->arch.mmu->private_root_hpa)) { + root = mmu_alloc_root(vcpu, 0, 0, 0, shadow_root_level, true); + if (!VALID_PAGE(root)) + return -ENOSPC; + vcpu->arch.mmu->private_root_hpa = root; + } + root = mmu_alloc_root(vcpu, 0, gfn_shared, 0, shadow_root_level, true); if (!VALID_PAGE(root)) return -ENOSPC; vcpu->arch.mmu->root_hpa = root; } else if (shadow_root_level == PT32E_ROOT_LEVEL) { + WARN_ON_ONCE(gfn_shared); + for (i = 0; i < 4; ++i) { MMU_WARN_ON(VALID_PAGE(vcpu->arch.mmu->pae_root[i])); - root = mmu_alloc_root(vcpu, i << (30 - PAGE_SHIFT), + root = mmu_alloc_root(vcpu, i << (30 - PAGE_SHIFT), 0, i << 30, PT32_ROOT_LEVEL, true); if (!VALID_PAGE(root)) return -ENOSPC; @@ -3362,7 +3533,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu) if (vcpu->arch.mmu->root_level >= PT64_ROOT_4LEVEL) { MMU_WARN_ON(VALID_PAGE(vcpu->arch.mmu->root_hpa)); - root = mmu_alloc_root(vcpu, root_gfn, 0, + root = mmu_alloc_root(vcpu, root_gfn, 0, 0, vcpu->arch.mmu->shadow_root_level, false); if (!VALID_PAGE(root)) return -ENOSPC; @@ -3392,7 +3563,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu) return 1; } - root = mmu_alloc_root(vcpu, root_gfn, i << 30, + root = mmu_alloc_root(vcpu, root_gfn, 0, i << 30, PT32_ROOT_LEVEL, false); if (!VALID_PAGE(root)) return -ENOSPC; @@ -4871,13 +5042,18 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu) } EXPORT_SYMBOL_GPL(kvm_mmu_load); -void kvm_mmu_unload(struct kvm_vcpu *vcpu) +static void __kvm_mmu_unload(struct kvm_vcpu *vcpu, u32 roots_to_free) { - kvm_mmu_free_roots(vcpu, &vcpu->arch.root_mmu, KVM_MMU_ROOTS_ALL); + kvm_mmu_free_roots(vcpu, &vcpu->arch.root_mmu, roots_to_free); WARN_ON(VALID_PAGE(vcpu->arch.root_mmu.root_hpa)); - 
kvm_mmu_free_roots(vcpu, &vcpu->arch.guest_mmu, KVM_MMU_ROOTS_ALL); + kvm_mmu_free_roots(vcpu, &vcpu->arch.guest_mmu, roots_to_free); WARN_ON(VALID_PAGE(vcpu->arch.guest_mmu.root_hpa)); } + +void kvm_mmu_unload(struct kvm_vcpu *vcpu) +{ + __kvm_mmu_unload(vcpu, KVM_MMU_ROOTS_ALL); +} EXPORT_SYMBOL_GPL(kvm_mmu_unload); static void mmu_pte_write_new_pte(struct kvm_vcpu *vcpu, @@ -5354,6 +5530,7 @@ static int __kvm_mmu_create(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu) int i; mmu->root_hpa = INVALID_PAGE; + mmu->private_root_hpa = INVALID_PAGE; mmu->root_pgd = 0; mmu->translate_gpa = translate_gpa; for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) @@ -5640,6 +5817,9 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm, sp = sptep_to_sp(sptep); pfn = spte_to_pfn(*sptep); + /* Private page dirty logging is not supported. */ + KVM_BUG_ON(is_private_spte(kvm, sptep), kvm); + /* * We cannot do huge page mapping for indirect shadow pages, * which are found on the last rmap (level = 1) when not using @@ -5748,7 +5928,7 @@ void kvm_mmu_slot_set_dirty(struct kvm *kvm, } EXPORT_SYMBOL_GPL(kvm_mmu_slot_set_dirty); -void kvm_mmu_zap_all(struct kvm *kvm) +static void __kvm_mmu_zap_all(struct kvm *kvm, struct list_head *mmu_pages) { struct kvm_mmu_page *sp, *node; LIST_HEAD(invalid_list); @@ -5756,7 +5936,7 @@ void kvm_mmu_zap_all(struct kvm *kvm) spin_lock(&kvm->mmu_lock); restart: - list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link) { + list_for_each_entry_safe(sp, node, mmu_pages, link) { if (WARN_ON(sp->role.invalid)) continue; if (__kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list, &ign)) @@ -5764,7 +5944,6 @@ void kvm_mmu_zap_all(struct kvm *kvm) if (cond_resched_lock(&kvm->mmu_lock)) goto restart; } - kvm_mmu_commit_zap_page(kvm, &invalid_list); if (kvm->arch.tdp_mmu_enabled) @@ -5773,6 +5952,17 @@ void kvm_mmu_zap_all(struct kvm *kvm) spin_unlock(&kvm->mmu_lock); } +void kvm_mmu_zap_all_active(struct kvm *kvm) +{ + __kvm_mmu_zap_all(kvm, &kvm->arch.active_mmu_pages); +} + +void kvm_mmu_zap_all_private(struct kvm *kvm) +{ + __kvm_mmu_zap_all(kvm, &kvm->arch.private_mmu_pages); +} +EXPORT_SYMBOL_GPL(kvm_mmu_zap_all_private); + void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen) { WARN_ON(gen & KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS); @@ -5992,7 +6182,7 @@ unsigned long kvm_mmu_calculate_default_mmu_pages(struct kvm *kvm) void kvm_mmu_destroy(struct kvm_vcpu *vcpu) { - kvm_mmu_unload(vcpu); + __kvm_mmu_unload(vcpu, KVM_MMU_ROOTS_ALL_INC_PRIVATE); free_mmu_pages(&vcpu->arch.root_mmu); free_mmu_pages(&vcpu->arch.guest_mmu); mmu_free_memory_caches(vcpu); diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h index 4d30f1562142..f385a05d5eb7 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -41,6 +41,8 @@ struct kvm_mmu_page { u64 *spt; /* hold the gfn of each spte inside spt */ gfn_t *gfns; + /* associated private shadow page, e.g. SEPT page */ + void *private_sp; int root_count; /* Currently serving as active root */ unsigned int unsync_children; struct kvm_rmap_head parent_ptes; /* rmap pointers to parent sptes */ @@ -120,6 +122,7 @@ static inline bool kvm_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *sp) * RET_PF_INVALID: the spte is invalid, let the real page fault path update it. * RET_PF_FIXED: The faulting entry has been fixed. * RET_PF_SPURIOUS: The faulting entry was already fixed, e.g. by another vCPU. + * RET_PF_UNZAPPED: A private SPTE was unzapped. 
*/ enum { RET_PF_RETRY = 0, @@ -127,6 +130,7 @@ enum { RET_PF_INVALID, RET_PF_FIXED, RET_PF_SPURIOUS, + RET_PF_UNZAPPED, }; /* Bits which may be returned by set_spte() */ diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h index a5eab5607606..89b5fdaf165b 100644 --- a/arch/x86/kvm/mmu/spte.h +++ b/arch/x86/kvm/mmu/spte.h @@ -8,6 +8,9 @@ #define PT_FIRST_AVAIL_BITS_SHIFT 10 #define PT64_SECOND_AVAIL_BITS_SHIFT 54 +/* Masks that used to track metadata for not-present SPTEs. */ +#define SPTE_PRIVATE_ZAPPED BIT_ULL(62) + /* * The mask used to denote special SPTEs, which can be either MMIO SPTEs or * Access Tracking SPTEs. @@ -176,6 +179,11 @@ static inline bool is_access_track_spte(u64 spte) return !spte_ad_enabled(spte) && (spte & shadow_acc_track_mask) == 0; } +static inline bool is_zapped_private_pte(u64 pte) +{ + return !!(pte & SPTE_PRIVATE_ZAPPED); +} + static inline bool __is_shadow_present_pte(u64 pte) { /* @@ -191,7 +199,8 @@ static inline bool __is_shadow_present_pte(u64 pte) static inline bool is_shadow_present_pte(u64 pte) { - return __is_shadow_present_pte(pte) && !is_mmio_spte(pte); + return __is_shadow_present_pte(pte) && !is_mmio_spte(pte) && + !is_zapped_private_pte(pte); } static inline int is_large_pte(u64 pte) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index c233e7ef3366..f7ffb36c318c 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -10388,6 +10388,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) INIT_HLIST_HEAD(&kvm->arch.mask_notifier_list); INIT_LIST_HEAD(&kvm->arch.active_mmu_pages); + INIT_LIST_HEAD(&kvm->arch.private_mmu_pages); INIT_LIST_HEAD(&kvm->arch.zapped_obsolete_pages); INIT_LIST_HEAD(&kvm->arch.lpage_disallowed_mmu_pages); INIT_LIST_HEAD(&kvm->arch.assigned_dev_head); @@ -10771,7 +10772,8 @@ void kvm_arch_commit_memory_region(struct kvm *kvm, void kvm_arch_flush_shadow_all(struct kvm *kvm) { - kvm_mmu_zap_all(kvm); + /* Zapping private pages must be deferred until VM destruction. 
*/ + kvm_mmu_zap_all_active(kvm); } void kvm_arch_flush_shadow_memslot(struct kvm *kvm, From patchwork Mon Nov 16 18:26:27 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910387 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id AC31CC64E75 for ; Mon, 16 Nov 2020 18:31:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8B75220756 for ; Mon, 16 Nov 2020 18:31:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388573AbgKPSa1 (ORCPT ); Mon, 16 Nov 2020 13:30:27 -0500 Received: from mga06.intel.com ([134.134.136.31]:20648 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388248AbgKPS2N (ORCPT ); Mon, 16 Nov 2020 13:28:13 -0500 IronPort-SDR: IFc14gUO2Z21yk1tOapc5cJMFA4y1hLL5SXcpeOQWMhDpfcFFWbJyqwryajO4U+H/+7NMVOU91 lDbJ2k2zBY7g== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="232410060" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="232410060" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:12 -0800 IronPort-SDR: AGHO0QW4ocQG/biBs+I4G+pVYmbyPqPWn2A7phsCgJeVBECckmj1ZtRt5wvYmJ6g9ba3aXxdjU +BgY8hssPPfA== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400528182" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:11 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson Subject: [RFC PATCH 42/67] KVM: x86/mmu: Move 'pfn' variable to caller of direct_page_fault() Date: Mon, 16 Nov 2020 10:26:27 -0800 Message-Id: X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson When adding pages prior to boot, TDX will need the resulting host pfn so that it can be passed to TDADDPAGE (TDX-SEAM always works with physical addresses as it has its own page tables). Start plumbing pfn back up the page fault stack. 
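As a rough sketch of the eventual consumer: the tdx_pre_boot_add_page() and tdh_add_page() names below are hypothetical placeholders (the real TDX-side caller and the TDADDPAGE wrapper arrive only in later patches); the only piece that comes from this patch is direct_page_fault()'s new pfn out-parameter.

int tdh_add_page(gpa_t gpa, hpa_t hpa);   /* hypothetical TDADDPAGE SEAMCALL wrapper */

/*
 * Sketch only: fault a page into the TDP MMU, then hand the resulting
 * host physical address to the TDX module.  This is why the pfn has to
 * be returned to the caller instead of dying inside direct_page_fault().
 */
static int tdx_pre_boot_add_page(struct kvm_vcpu *vcpu, gpa_t gpa)
{
        kvm_pfn_t pfn;
        int r;

        /* 'pfn' is filled in for the caller by this patch's plumbing. */
        r = direct_page_fault(vcpu, gpa, PFERR_WRITE_MASK, false /* prefault */,
                              PG_LEVEL_4K, true /* is_tdp */, &pfn);
        if (r != RET_PF_FIXED)
                return -EIO;

        /* TDADDPAGE (TDX-SEAM) works on host physical addresses. */
        return tdh_add_page(gpa, (hpa_t)pfn << PAGE_SHIFT);
}
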
Signed-off-by: Sean Christopherson --- arch/x86/kvm/mmu/mmu.c | 23 +++++++++++++---------- 1 file changed, 13 insertions(+), 10 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index e4e0c883b52d..474173bceb54 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3916,14 +3916,14 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn, } static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, - bool prefault, int max_level, bool is_tdp) + bool prefault, int max_level, bool is_tdp, + kvm_pfn_t *pfn) { bool write = error_code & PFERR_WRITE_MASK; bool map_writable; gfn_t gfn = vcpu_gpa_to_gfn_unalias(vcpu, gpa); unsigned long mmu_seq; - kvm_pfn_t pfn; int r; if (page_fault_handle_page_track(vcpu, error_code, gfn)) @@ -3942,10 +3942,10 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, mmu_seq = vcpu->kvm->mmu_notifier_seq; smp_rmb(); - if (try_async_pf(vcpu, prefault, gfn, gpa, &pfn, write, &map_writable)) + if (try_async_pf(vcpu, prefault, gfn, gpa, pfn, write, &map_writable)) return RET_PF_RETRY; - if (handle_abnormal_pfn(vcpu, is_tdp ? 0 : gpa, gfn, pfn, ACC_ALL, &r)) + if (handle_abnormal_pfn(vcpu, is_tdp ? 0 : gpa, gfn, *pfn, ACC_ALL, &r)) return r; r = RET_PF_RETRY; @@ -3958,25 +3958,27 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, if (is_tdp_mmu_root(vcpu->kvm, vcpu->arch.mmu->root_hpa)) r = kvm_tdp_mmu_map(vcpu, gpa, error_code, map_writable, max_level, - pfn, prefault); + *pfn, prefault); else - r = __direct_map(vcpu, gpa, error_code, map_writable, max_level, pfn, - prefault, is_tdp); + r = __direct_map(vcpu, gpa, error_code, map_writable, max_level, + *pfn, prefault, is_tdp); out_unlock: spin_unlock(&vcpu->kvm->mmu_lock); - kvm_release_pfn_clean(pfn); + kvm_release_pfn_clean(*pfn); return r; } static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, bool prefault) { + kvm_pfn_t pfn; + pgprintk("%s: gva %lx error %x\n", __func__, gpa, error_code); /* This path builds a PAE pagetable, we can map 2mb pages at maximum. 
*/ return direct_page_fault(vcpu, gpa & PAGE_MASK, error_code, prefault, - PG_LEVEL_2M, false); + PG_LEVEL_2M, false, &pfn); } int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code, @@ -4015,6 +4017,7 @@ EXPORT_SYMBOL_GPL(kvm_handle_page_fault); int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, bool prefault) { + kvm_pfn_t pfn; int max_level; for (max_level = KVM_MAX_HUGEPAGE_LEVEL; @@ -4028,7 +4031,7 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, } return direct_page_fault(vcpu, gpa, error_code, prefault, - max_level, true); + max_level, true, &pfn); } static void nonpaging_init_context(struct kvm_vcpu *vcpu, From patchwork Mon Nov 16 18:26:28 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910315 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B6374C64E75 for ; Mon, 16 Nov 2020 18:28:17 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9DC92241A7 for ; Mon, 16 Nov 2020 18:28:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388274AbgKPS2N (ORCPT ); Mon, 16 Nov 2020 13:28:13 -0500 Received: from mga06.intel.com ([134.134.136.31]:20651 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388259AbgKPS2N (ORCPT ); Mon, 16 Nov 2020 13:28:13 -0500 IronPort-SDR: cRm1lLdo2O7a5ILAzuwGQooya3p3TZdwgyL0d5L27v1sIySSrrZlh/2+zVea/tg8gR7NyM1ZYF 8CYEfPNCwNQg== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="232410061" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="232410061" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:12 -0800 IronPort-SDR: eYT6/Tv30ckhTif6v0quFkZ2VzXtOKcWyKkOqyz5yod4zd0MkZG6YCe2ot40BlSi4h0KWwgO80 +klwdXy+CK0Q== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400528187" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:12 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson Subject: [RFC PATCH 43/67] KVM: x86/mmu: Introduce kvm_mmu_map_tdp_page() for use by TDX Date: Mon, 16 Nov 2020 10:26:28 -0800 Message-Id: <3c4a842b7f989090f58da9ec50e238e5cd06588a.1605232743.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson Introduce a helper to directly (pun intented) fault-in a TDP page without having to go through the full page fault path. 
This allows TDX to get the resulting pfn and also allows the RET_PF_* enums to stay in mmu.c where they belong. Signed-off-by: Sean Christopherson --- arch/x86/kvm/mmu.h | 3 +++ arch/x86/kvm/mmu/mmu.c | 25 +++++++++++++++++++++++++ 2 files changed, 28 insertions(+) diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index 3b1243cfc280..a6bb930d1549 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -115,6 +115,9 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, return vcpu->arch.mmu->page_fault(vcpu, cr2_or_gpa, err, prefault); } +kvm_pfn_t kvm_mmu_map_tdp_page(struct kvm_vcpu *vcpu, gpa_t gpa, + u32 error_code, int max_level); + /* * Currently, we have two sorts of write-protection, a) the first one * write-protects guest page to sync the guest modification, b) another one is diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 474173bceb54..bb59e80ade81 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -4034,6 +4034,31 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, max_level, true, &pfn); } +kvm_pfn_t kvm_mmu_map_tdp_page(struct kvm_vcpu *vcpu, gpa_t gpa, + u32 error_code, int max_level) +{ + kvm_pfn_t pfn; + int r; + + if (mmu_topup_memory_caches(vcpu, false)) + return KVM_PFN_ERR_FAULT; + + /* + * Loop on the page fault path to handle the case where an mmu_notifier + * invalidation triggers RET_PF_RETRY. In the normal page fault path, + * KVM needs to resume the guest in case the invalidation changed any + * of the page fault properties, i.e. the gpa or error code. For this + * path, the gpa and error code are fixed by the caller, and the caller + * expects failure if and only if the page fault can't be fixed. + */ + do { + r = direct_page_fault(vcpu, gpa, error_code, false, max_level, + true, &pfn); + } while (r == RET_PF_RETRY && !is_error_noslot_pfn(pfn)); + return pfn; +} +EXPORT_SYMBOL_GPL(kvm_mmu_map_tdp_page); + static void nonpaging_init_context(struct kvm_vcpu *vcpu, struct kvm_mmu *context) { From patchwork Mon Nov 16 18:26:29 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910381 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2ED0FC63798 for ; Mon, 16 Nov 2020 18:31:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 0479E20756 for ; Mon, 16 Nov 2020 18:31:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388565AbgKPSaS (ORCPT ); Mon, 16 Nov 2020 13:30:18 -0500 Received: from mga06.intel.com ([134.134.136.31]:20648 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388240AbgKPS2N (ORCPT ); Mon, 16 Nov 2020 13:28:13 -0500 IronPort-SDR: YQbaj94sw+AtP1tvP/5R/h80PGTpyMTZjl0PGRhINPKO+fPE/mfq8Ra4p5E8kWWzDYUv+LoNBW U0YEbAhib0Bw== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="232410062" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="232410062" X-Amp-Result: SKIPPED(no attachment in 
message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:13 -0800 IronPort-SDR: MHJoYFi5Z2B04lQPD5n2C8aTvhG0LAxXfzFdOcOSg7d2aB6iJx111GYsH4ertiuWfguigXDUfX eXR3NZzaXBmw== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400528190" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:12 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson Subject: [RFC PATCH 44/67] KVM: VMX: Modify NMI and INTR handlers to take intr_info as param Date: Mon, 16 Nov 2020 10:26:29 -0800 Message-Id: X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson Pass intr_info to the NMI and INTR handlers instead of pulling it from vcpu_vmx in preparation for sharing the bulk of the handlers with TDX. Signed-off-by: Sean Christopherson --- arch/x86/kvm/vmx/vmx.c | 16 ++++++---------- 1 file changed, 6 insertions(+), 10 deletions(-) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 997a391f0842..5d6c3a50230d 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -6358,25 +6358,21 @@ static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu, u32 intr_info) kvm_after_interrupt(vcpu); } -static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx) +static void handle_exception_nmi_irqoff(struct kvm_vcpu *vcpu, u32 intr_info) { - u32 intr_info = vmx_get_intr_info(&vmx->vcpu); - /* if exit due to PF check for async PF */ if (is_page_fault(intr_info)) - vmx->vcpu.arch.apf.host_apf_flags = kvm_read_and_reset_apf_flags(); + vcpu->arch.apf.host_apf_flags = kvm_read_and_reset_apf_flags(); /* Handle machine checks before interrupts are enabled */ else if (is_machine_check(intr_info)) kvm_machine_check(); /* We need to handle NMIs before interrupts are enabled */ else if (is_nmi(intr_info)) - handle_interrupt_nmi_irqoff(&vmx->vcpu, intr_info); + handle_interrupt_nmi_irqoff(vcpu, intr_info); } -static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu) +static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu, u32 intr_info) { - u32 intr_info = vmx_get_intr_info(vcpu); - if (KVM_BUG(!is_external_intr(intr_info), vcpu->kvm, "KVM: unexpected VM-Exit interrupt info: 0x%x", intr_info)) return; @@ -6389,9 +6385,9 @@ static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu) struct vcpu_vmx *vmx = to_vmx(vcpu); if (vmx->exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT) - handle_external_interrupt_irqoff(vcpu); + handle_external_interrupt_irqoff(vcpu, vmx_get_intr_info(vcpu)); else if (vmx->exit_reason == EXIT_REASON_EXCEPTION_NMI) - handle_exception_nmi_irqoff(vmx); + handle_exception_nmi_irqoff(vcpu, vmx_get_intr_info(vcpu)); } static bool vmx_has_emulated_msr(struct kvm *kvm, u32 index) From patchwork Mon Nov 16 18:26:30 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910375 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on 
aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6A580C6379F for ; Mon, 16 Nov 2020 18:30:13 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3DD1E20756 for ; Mon, 16 Nov 2020 18:30:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388534AbgKPSaD (ORCPT ); Mon, 16 Nov 2020 13:30:03 -0500 Received: from mga06.intel.com ([134.134.136.31]:20651 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388273AbgKPS2O (ORCPT ); Mon, 16 Nov 2020 13:28:14 -0500 IronPort-SDR: RHulxorf8wXbsCXABFfQS/uRbN0sjVJydxuPzcgVK/sOoJS66zYmkKqRPQGN5R4v/ChAQFvI6o TK4WvItcHNSw== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="232410064" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="232410064" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:13 -0800 IronPort-SDR: A+uGQgeEKhXMSryBBIqrS0JVVZyQLf7RQ513GN58RePX+xhVjfjzSRJf5XBGz8RL1PlUJxQeLB iZfi8OZPjLbg== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400528200" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:13 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson Subject: [RFC PATCH 45/67] KVM: VMX: Move NMI/exception handler to common helper Date: Mon, 16 Nov 2020 10:26:30 -0800 Message-Id: <81a8753361caa8d1a32a8f125cb07af1e7cc75b8.1605232743.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson Signed-off-by: Sean Christopherson --- arch/x86/kvm/vmx/common.h | 54 +++++++++++++++++++++++++++++++++++++++ arch/x86/kvm/vmx/vmx.c | 42 +++++------------------------- 2 files changed, 60 insertions(+), 36 deletions(-) create mode 100644 arch/x86/kvm/vmx/common.h diff --git a/arch/x86/kvm/vmx/common.h b/arch/x86/kvm/vmx/common.h new file mode 100644 index 000000000000..146f1da9c88d --- /dev/null +++ b/arch/x86/kvm/vmx/common.h @@ -0,0 +1,54 @@ +// SPDX-License-Identifier: GPL-2.0-only +#ifndef __KVM_X86_VMX_COMMON_H +#define __KVM_X86_VMX_COMMON_H + +#include + +#include + +#include "vmcs.h" +#include "x86.h" + +void vmx_handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu, u32 intr_info); + +/* + * Trigger machine check on the host. We assume all the MSRs are already set up + * by the CPU and that we still run on the same CPU as the MCE occurred on. + * We pass a fake environment to the machine check handler because we want + * the guest to be always treated like user space, no matter what context + * it used internally. 
+ */ +static inline void kvm_machine_check(void) +{ +#if defined(CONFIG_X86_MCE) + struct pt_regs regs = { + .cs = 3, /* Fake ring 3 no matter what the guest ran on */ + .flags = X86_EFLAGS_IF, + }; + + do_machine_check(®s); +#endif +} + +static inline void vmx_handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu, + u32 intr_info) +{ + if (KVM_BUG(!is_external_intr(intr_info), vcpu->kvm, + "KVM: unexpected VM-Exit interrupt info: 0x%x", intr_info)) + return; + + vmx_handle_interrupt_nmi_irqoff(vcpu, intr_info); +} + +static inline void vmx_handle_exception_nmi_irqoff(struct kvm_vcpu *vcpu, + u32 intr_info) +{ + /* Handle machine checks before interrupts are enabled */ + if (is_machine_check(intr_info)) + kvm_machine_check(); + /* We need to handle NMIs before interrupts are enabled */ + else if (is_nmi(intr_info)) + vmx_handle_interrupt_nmi_irqoff(vcpu, intr_info); +} + +#endif /* __KVM_X86_VMX_COMMON_H */ diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 5d6c3a50230d..e8b60d447e27 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -49,6 +49,7 @@ #include #include "capabilities.h" +#include "common.h" #include "cpuid.h" #include "evmcs.h" #include "irq.h" @@ -4708,25 +4709,6 @@ static int handle_rmode_exception(struct kvm_vcpu *vcpu, return 1; } -/* - * Trigger machine check on the host. We assume all the MSRs are already set up - * by the CPU and that we still run on the same CPU as the MCE occurred on. - * We pass a fake environment to the machine check handler because we want - * the guest to be always treated like user space, no matter what context - * it used internally. - */ -static void kvm_machine_check(void) -{ -#if defined(CONFIG_X86_MCE) - struct pt_regs regs = { - .cs = 3, /* Fake ring 3 no matter what the guest ran on */ - .flags = X86_EFLAGS_IF, - }; - - do_machine_check(®s); -#endif -} - static int handle_machine_check(struct kvm_vcpu *vcpu) { /* handled by vmx_vcpu_run() */ @@ -6348,7 +6330,7 @@ static void vmx_apicv_post_state_restore(struct kvm_vcpu *vcpu) void vmx_do_interrupt_nmi_irqoff(unsigned long entry); -static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu, u32 intr_info) +void vmx_handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu, u32 intr_info) { unsigned int vector = intr_info & INTR_INFO_VECTOR_MASK; gate_desc *desc = (gate_desc *)host_idt_base + vector; @@ -6363,21 +6345,8 @@ static void handle_exception_nmi_irqoff(struct kvm_vcpu *vcpu, u32 intr_info) /* if exit due to PF check for async PF */ if (is_page_fault(intr_info)) vcpu->arch.apf.host_apf_flags = kvm_read_and_reset_apf_flags(); - /* Handle machine checks before interrupts are enabled */ - else if (is_machine_check(intr_info)) - kvm_machine_check(); - /* We need to handle NMIs before interrupts are enabled */ - else if (is_nmi(intr_info)) - handle_interrupt_nmi_irqoff(vcpu, intr_info); -} - -static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu, u32 intr_info) -{ - if (KVM_BUG(!is_external_intr(intr_info), vcpu->kvm, - "KVM: unexpected VM-Exit interrupt info: 0x%x", intr_info)) - return; - - handle_interrupt_nmi_irqoff(vcpu, intr_info); + else + vmx_handle_exception_nmi_irqoff(vcpu, intr_info); } static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu) @@ -6385,7 +6354,8 @@ static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu) struct vcpu_vmx *vmx = to_vmx(vcpu); if (vmx->exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT) - handle_external_interrupt_irqoff(vcpu, vmx_get_intr_info(vcpu)); + vmx_handle_external_interrupt_irqoff(vcpu, + 
vmx_get_intr_info(vcpu)); else if (vmx->exit_reason == EXIT_REASON_EXCEPTION_NMI) handle_exception_nmi_irqoff(vcpu, vmx_get_intr_info(vcpu)); } From patchwork Mon Nov 16 18:26:31 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910377 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4B878C6379D for ; Mon, 16 Nov 2020 18:30:13 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 206FE2231B for ; Mon, 16 Nov 2020 18:30:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388528AbgKPS34 (ORCPT ); Mon, 16 Nov 2020 13:29:56 -0500 Received: from mga06.intel.com ([134.134.136.31]:20648 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388279AbgKPS2O (ORCPT ); Mon, 16 Nov 2020 13:28:14 -0500 IronPort-SDR: 1OC5OsQfz1WMZZtsR6jl+281GQ6D1FZeTv8n5NgxTEq6W13oLszpaWN1JtLhvjAhz/T5C76pHb 7drPuLJNpCaQ== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="232410065" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="232410065" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:14 -0800 IronPort-SDR: 2BaVWLE3VepDjro43gDMggXU3wky1zPXMm1oOA52AJigpjRXLRQ9LvGnWpk3TpbG6oWNsfgQsA FyPbUwfkQ4BA== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400528211" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:13 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . 
Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson Subject: [RFC PATCH 46/67] KVM: VMX: Split out guts of EPT violation to common/exposed function Date: Mon, 16 Nov 2020 10:26:31 -0800 Message-Id: X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson Signed-off-by: Sean Christopherson --- arch/x86/kvm/vmx/common.h | 29 +++++++++++++++++++++++++++++ arch/x86/kvm/vmx/vmx.c | 32 +++++--------------------------- 2 files changed, 34 insertions(+), 27 deletions(-) diff --git a/arch/x86/kvm/vmx/common.h b/arch/x86/kvm/vmx/common.h index 146f1da9c88d..58edf1296cbd 100644 --- a/arch/x86/kvm/vmx/common.h +++ b/arch/x86/kvm/vmx/common.h @@ -5,8 +5,11 @@ #include #include +#include +#include "mmu.h" #include "vmcs.h" +#include "vmx.h" #include "x86.h" void vmx_handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu, u32 intr_info); @@ -51,4 +54,30 @@ static inline void vmx_handle_exception_nmi_irqoff(struct kvm_vcpu *vcpu, vmx_handle_interrupt_nmi_irqoff(vcpu, intr_info); } +static inline int __vmx_handle_ept_violation(struct kvm_vcpu *vcpu, gpa_t gpa, + unsigned long exit_qualification) +{ + u64 error_code; + + /* Is it a read fault? */ + error_code = (exit_qualification & EPT_VIOLATION_ACC_READ) + ? PFERR_USER_MASK : 0; + /* Is it a write fault? */ + error_code |= (exit_qualification & EPT_VIOLATION_ACC_WRITE) + ? PFERR_WRITE_MASK : 0; + /* Is it a fetch fault? */ + error_code |= (exit_qualification & EPT_VIOLATION_ACC_INSTR) + ? PFERR_FETCH_MASK : 0; + /* ept page table entry is present? */ + error_code |= (exit_qualification & + (EPT_VIOLATION_READABLE | EPT_VIOLATION_WRITABLE | + EPT_VIOLATION_EXECUTABLE)) + ? PFERR_PRESENT_MASK : 0; + + error_code |= (exit_qualification & 0x100) != 0 ? + PFERR_GUEST_FINAL_MASK : PFERR_GUEST_PAGE_MASK; + + return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0); +} + #endif /* __KVM_X86_VMX_COMMON_H */ diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index e8b60d447e27..0dad9d1816b0 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -5277,11 +5277,10 @@ static int handle_task_switch(struct kvm_vcpu *vcpu) static int handle_ept_violation(struct kvm_vcpu *vcpu) { - unsigned long exit_qualification; - gpa_t gpa; - u64 error_code; + unsigned long exit_qualification = vmx_get_exit_qual(vcpu); + gpa_t gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS); - exit_qualification = vmx_get_exit_qual(vcpu); + trace_kvm_page_fault(gpa, exit_qualification); /* * EPT violation happened while executing iret from NMI, @@ -5290,30 +5289,9 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu) * AAK134, BY25. */ if (!(to_vmx(vcpu)->idt_vectoring_info & VECTORING_INFO_VALID_MASK) && - enable_vnmi && - (exit_qualification & INTR_INFO_UNBLOCK_NMI)) + enable_vnmi && (exit_qualification & INTR_INFO_UNBLOCK_NMI)) vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO, GUEST_INTR_STATE_NMI); - gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS); - trace_kvm_page_fault(gpa, exit_qualification); - - /* Is it a read fault? */ - error_code = (exit_qualification & EPT_VIOLATION_ACC_READ) - ? PFERR_USER_MASK : 0; - /* Is it a write fault? */ - error_code |= (exit_qualification & EPT_VIOLATION_ACC_WRITE) - ? PFERR_WRITE_MASK : 0; - /* Is it a fetch fault? 
*/ - error_code |= (exit_qualification & EPT_VIOLATION_ACC_INSTR) - ? PFERR_FETCH_MASK : 0; - /* ept page table entry is present? */ - error_code |= (exit_qualification & - (EPT_VIOLATION_READABLE | EPT_VIOLATION_WRITABLE | - EPT_VIOLATION_EXECUTABLE)) - ? PFERR_PRESENT_MASK : 0; - - error_code |= (exit_qualification & 0x100) != 0 ? - PFERR_GUEST_FINAL_MASK : PFERR_GUEST_PAGE_MASK; vcpu->arch.exit_qualification = exit_qualification; @@ -5328,7 +5306,7 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu) if (unlikely(allow_smaller_maxphyaddr && kvm_vcpu_is_illegal_gpa(vcpu, gpa))) return kvm_emulate_instruction(vcpu, 0); - return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0); + return __vmx_handle_ept_violation(vcpu, gpa, exit_qualification); } static int handle_ept_misconfig(struct kvm_vcpu *vcpu) From patchwork Mon Nov 16 18:26:32 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910349 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id F3AC5C2D0A3 for ; Mon, 16 Nov 2020 18:29:37 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9CF41206F9 for ; Mon, 16 Nov 2020 18:29:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732502AbgKPS2V (ORCPT ); Mon, 16 Nov 2020 13:28:21 -0500 Received: from mga06.intel.com ([134.134.136.31]:20651 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388289AbgKPS2R (ORCPT ); Mon, 16 Nov 2020 13:28:17 -0500 IronPort-SDR: BTeLHS9JlxKxF175a5D8YFPWTwm2SSaQ1u08KgQ50OE7lgwvqgQTFilowAl82s6FA1FFBD1xLl r3kcFr0EsQhQ== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="232410068" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="232410068" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:14 -0800 IronPort-SDR: UD3yKWGQO3tGvFl0MpsnzYn91WJuGOszSka5r9isZ6wW8QEURdUge9LxmwlWloWa4kUclIR8Lk d/1Yd2EEPLMQ== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400528221" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:14 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . 
Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson Subject: [RFC PATCH 47/67] KVM: VMX: Define EPT Violation architectural bits Date: Mon, 16 Nov 2020 10:26:32 -0800 Message-Id: <008b247c268b398120019c316083ae80cd26e2d7.1605232743.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson Define the EPT Violation #VE control bit, #VE info VMCS fields, and the suppress #VE bit for EPT entries. Signed-off-by: Sean Christopherson --- arch/x86/include/asm/vmx.h | 4 ++++ arch/x86/include/asm/vmxfeatures.h | 2 +- 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index f8ba5289ecb0..8a3a2e2dc208 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -67,6 +67,7 @@ #define SECONDARY_EXEC_ENCLS_EXITING VMCS_CONTROL_BIT(ENCLS_EXITING) #define SECONDARY_EXEC_RDSEED_EXITING VMCS_CONTROL_BIT(RDSEED_EXITING) #define SECONDARY_EXEC_ENABLE_PML VMCS_CONTROL_BIT(PAGE_MOD_LOGGING) +#define SECONDARY_EXEC_EPT_VIOLATION_VE VMCS_CONTROL_BIT(EPT_VIOLATION_VE) #define SECONDARY_EXEC_PT_CONCEAL_VMX VMCS_CONTROL_BIT(PT_CONCEAL_VMX) #define SECONDARY_EXEC_XSAVES VMCS_CONTROL_BIT(XSAVES) #define SECONDARY_EXEC_MODE_BASED_EPT_EXEC VMCS_CONTROL_BIT(MODE_BASED_EPT_EXEC) @@ -213,6 +214,8 @@ enum vmcs_field { VMREAD_BITMAP_HIGH = 0x00002027, VMWRITE_BITMAP = 0x00002028, VMWRITE_BITMAP_HIGH = 0x00002029, + VE_INFO_ADDRESS = 0x0000202A, + VE_INFO_ADDRESS_HIGH = 0x0000202B, XSS_EXIT_BITMAP = 0x0000202C, XSS_EXIT_BITMAP_HIGH = 0x0000202D, ENCLS_EXITING_BITMAP = 0x0000202E, @@ -495,6 +498,7 @@ enum vmcs_field { #define VMX_EPT_IPAT_BIT (1ull << 6) #define VMX_EPT_ACCESS_BIT (1ull << 8) #define VMX_EPT_DIRTY_BIT (1ull << 9) +#define VMX_EPT_SUPPRESS_VE_BIT (1ull << 63) #define VMX_EPT_RWX_MASK (VMX_EPT_READABLE_MASK | \ VMX_EPT_WRITABLE_MASK | \ VMX_EPT_EXECUTABLE_MASK) diff --git a/arch/x86/include/asm/vmxfeatures.h b/arch/x86/include/asm/vmxfeatures.h index 9915990fd8cf..9013e383fee6 100644 --- a/arch/x86/include/asm/vmxfeatures.h +++ b/arch/x86/include/asm/vmxfeatures.h @@ -75,7 +75,7 @@ #define VMX_FEATURE_ENCLS_EXITING ( 2*32+ 15) /* "" VM-Exit on ENCLS (leaf dependent) */ #define VMX_FEATURE_RDSEED_EXITING ( 2*32+ 16) /* "" VM-Exit on RDSEED */ #define VMX_FEATURE_PAGE_MOD_LOGGING ( 2*32+ 17) /* "pml" Log dirty pages into buffer */ -#define VMX_FEATURE_EPT_VIOLATION_VE ( 2*32+ 18) /* "" Conditionally reflect EPT violations as #VE exceptions */ +#define VMX_FEATURE_EPT_VIOLATION_VE ( 2*32+ 18) /* Conditionally reflect EPT violations as #VE exceptions */ #define VMX_FEATURE_PT_CONCEAL_VMX ( 2*32+ 19) /* "" Suppress VMX indicators in Processor Trace */ #define VMX_FEATURE_XSAVES ( 2*32+ 20) /* "" Enable XSAVES and XRSTORS in guest */ #define VMX_FEATURE_MODE_BASED_EPT_EXEC ( 2*32+ 22) /* "ept_mode_based_exec" Enable separate EPT EXEC bits for supervisor vs. 
user */ From patchwork Mon Nov 16 18:26:33 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910331 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4FA12C2D0A3 for ; Mon, 16 Nov 2020 18:28:53 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 129AC241A3 for ; Mon, 16 Nov 2020 18:28:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388302AbgKPS2S (ORCPT ); Mon, 16 Nov 2020 13:28:18 -0500 Received: from mga06.intel.com ([134.134.136.31]:20648 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1732502AbgKPS2R (ORCPT ); Mon, 16 Nov 2020 13:28:17 -0500 IronPort-SDR: PAJ7nz3NBshZSxmtMFus+xq40rTTO0f/CJ2a9nd3NnrVYvQrWtn1leR4x5WFfA2pu/UX3drHdF Knt0FBL5/OWA== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="232410069" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="232410069" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:15 -0800 IronPort-SDR: 816ay1Nz1B5a6QyUprlszXtxI8+Lja0EPwUUfcwzcKP6++5IR6dn2qMdo5kO5iQyGSx/KBKzGq BS0S8qCvTXJg== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400528231" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:14 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson Subject: [RFC PATCH 48/67] KVM: VMX: Define VMCS encodings for shared EPT pointer Date: Mon, 16 Nov 2020 10:26:33 -0800 Message-Id: X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson Add the VMCS field encoding for the shared EPTP, which will be used by TDX to have separate EPT walks for private GPAs (existing EPTP) versus shared GPAs (new shared EPTP). 
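For context only: this encoding is where a shared-EPT root would eventually be handed to hardware. The helper below is a hypothetical sketch using the ordinary VMCS accessor; in the actual TDX flow the TD's VMCS is owned by the TDX module, so the write would be routed through the module's field-access interface rather than a direct VMWRITE, and the exact value format is defined by later patches.

/*
 * Hypothetical sketch: point the new field at a shared-EPT root.  For a
 * TD the equivalent write goes through the TDX module, not a VMWRITE.
 */
static void vt_load_shared_ept(hpa_t shared_root_hpa)
{
        vmcs_write64(SHARED_EPT_POINTER, shared_root_hpa & PAGE_MASK);
}
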
Signed-off-by: Sean Christopherson --- arch/x86/include/asm/vmx.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index 8a3a2e2dc208..7c968f66d926 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -222,6 +222,8 @@ enum vmcs_field { ENCLS_EXITING_BITMAP_HIGH = 0x0000202F, TSC_MULTIPLIER = 0x00002032, TSC_MULTIPLIER_HIGH = 0x00002033, + SHARED_EPT_POINTER = 0x0000203C, + SHARED_EPT_POINTER_HIGH = 0x0000203D, GUEST_PHYSICAL_ADDRESS = 0x00002400, GUEST_PHYSICAL_ADDRESS_HIGH = 0x00002401, VMCS_LINK_POINTER = 0x00002800, From patchwork Mon Nov 16 18:26:34 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910369 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E9AC3C64E7C for ; Mon, 16 Nov 2020 18:29:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id CFD30206F9 for ; Mon, 16 Nov 2020 18:29:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388496AbgKPS3T (ORCPT ); Mon, 16 Nov 2020 13:29:19 -0500 Received: from mga06.intel.com ([134.134.136.31]:20649 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388231AbgKPS2T (ORCPT ); Mon, 16 Nov 2020 13:28:19 -0500 IronPort-SDR: qYKxM819L4VGlwK3xwWxDDusbUqc1ZuDawWjhE0urLUwBbwTZWOPRvV+QAfrUHVc9EKUZ35Hps pvkUxfrjJKLw== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="232410072" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="232410072" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:15 -0800 IronPort-SDR: NBqfdcHH2w+k1UpcULrQdGE5u8XKDbvGOXtts7EuMQQSxZb0SOJoDa8EoGAF1Zhsug9ITC94bb b4O52zO9CfOg== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400528242" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:15 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson , Xiaoyao Li Subject: [RFC PATCH 49/67] KVM: VMX: Add 'main.c' to wrap VMX and TDX Date: Mon, 16 Nov 2020 10:26:34 -0800 Message-Id: X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson Wrap the VMX kvm_x86_ops hooks in preparation of adding TDX, which can coexist with VMX, i.e. KVM can run both VMs and TDs. Use 'vt' for the naming scheme as a nod to VT-x and as a concatenation of VmxTdx. 
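For now every 'vt' hook is a straight pass-through to its vmx_* counterpart; the payoff comes once TDX is added and the wrappers can branch on the VM type. A minimal sketch of that end state, assuming hypothetical is_td_vcpu() and tdx_vcpu_run() helpers of the kind later patches introduce:

/*
 * Sketch of where the 'vt' layer is headed: the wrapper stops being a
 * pass-through and dispatches per VM type, without touching vmx.c.
 */
static fastpath_t vt_vcpu_run_sketch(struct kvm_vcpu *vcpu)
{
        if (is_td_vcpu(vcpu))               /* assumed helper from later patches */
                return tdx_vcpu_run(vcpu);  /* assumed TDX counterpart */

        return vmx_vcpu_run(vcpu);
}
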
Co-developed-by: Xiaoyao Li Signed-off-by: Xiaoyao Li Signed-off-by: Sean Christopherson --- arch/x86/kvm/Makefile | 2 +- arch/x86/kvm/vmx/main.c | 720 ++++++++++++++++++++++++++++++++++++++++ arch/x86/kvm/vmx/vmx.c | 304 ++++------------- 3 files changed, 784 insertions(+), 242 deletions(-) create mode 100644 arch/x86/kvm/vmx/main.c diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile index b804444e16d4..4192b252eba0 100644 --- a/arch/x86/kvm/Makefile +++ b/arch/x86/kvm/Makefile @@ -18,7 +18,7 @@ kvm-y += x86.o emulate.o i8259.o irq.o lapic.o \ hyperv.o debugfs.o mmu/mmu.o mmu/page_track.o \ mmu/spte.o mmu/tdp_iter.o mmu/tdp_mmu.o -kvm-intel-y += vmx/vmx.o vmx/vmenter.o vmx/pmu_intel.o vmx/vmcs12.o \ +kvm-intel-y += vmx/main.o vmx/vmenter.o vmx/pmu_intel.o vmx/vmcs12.o \ vmx/evmcs.o vmx/nested.o vmx/posted_intr.o kvm-amd-y += svm/svm.o svm/vmenter.o svm/pmu.o svm/nested.o svm/avic.o svm/sev.o diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c new file mode 100644 index 000000000000..85bc238c0852 --- /dev/null +++ b/arch/x86/kvm/vmx/main.c @@ -0,0 +1,720 @@ +// SPDX-License-Identifier: GPL-2.0 +#include + +#include "vmx.c" + +static struct kvm_x86_ops vt_x86_ops __initdata; + +static int __init vt_cpu_has_kvm_support(void) +{ + return cpu_has_vmx(); +} + +static int __init vt_disabled_by_bios(void) +{ + return vmx_disabled_by_bios(); +} + +static int __init vt_check_processor_compatibility(void) +{ + int ret; + + ret = vmx_check_processor_compat(); + if (ret) + return ret; + + return 0; +} + +static __init int vt_hardware_setup(void) +{ + int ret; + + ret = hardware_setup(&vt_x86_ops); + if (ret) + return ret; + + return 0; +} + +static int vt_hardware_enable(void) +{ + return hardware_enable(); +} + +static void vt_hardware_disable(void) +{ + hardware_disable(); +} + +static bool vt_cpu_has_accelerated_tpr(void) +{ + return report_flexpriority(); +} + +static bool vt_is_vm_type_supported(unsigned long type) +{ + return type == KVM_X86_LEGACY_VM; +} + +static int vt_vm_init(struct kvm *kvm) +{ + return vmx_vm_init(kvm); +} + +static void vt_vm_teardown(struct kvm *kvm) +{ + +} + +static void vt_vm_destroy(struct kvm *kvm) +{ + +} + +static int vt_vcpu_create(struct kvm_vcpu *vcpu) +{ + return vmx_create_vcpu(vcpu); +} + +static fastpath_t vt_vcpu_run(struct kvm_vcpu *vcpu) +{ + return vmx_vcpu_run(vcpu); +} + +static void vt_vcpu_free(struct kvm_vcpu *vcpu) +{ + return vmx_free_vcpu(vcpu); +} + +static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) +{ + return vmx_vcpu_reset(vcpu, init_event); +} + +static void vt_vcpu_load(struct kvm_vcpu *vcpu, int cpu) +{ + return vmx_vcpu_load(vcpu, cpu); +} + +static void vt_vcpu_put(struct kvm_vcpu *vcpu) +{ + return vmx_vcpu_put(vcpu); +} + +static int vt_handle_exit(struct kvm_vcpu *vcpu, + enum exit_fastpath_completion fastpath) +{ + return vmx_handle_exit(vcpu, fastpath); +} + +static void vt_handle_exit_irqoff(struct kvm_vcpu *vcpu) +{ + vmx_handle_exit_irqoff(vcpu); +} + +static int vt_skip_emulated_instruction(struct kvm_vcpu *vcpu) +{ + return vmx_skip_emulated_instruction(vcpu); +} + +static void vt_update_emulated_instruction(struct kvm_vcpu *vcpu) +{ + vmx_update_emulated_instruction(vcpu); +} + +static int vt_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) +{ + return vmx_set_msr(vcpu, msr_info); +} + +static int vt_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection) +{ + return vmx_smi_allowed(vcpu, for_injection); +} + +static int vt_pre_enter_smm(struct kvm_vcpu *vcpu, char *smstate) +{ + 
return vmx_pre_enter_smm(vcpu, smstate); +} + +static int vt_pre_leave_smm(struct kvm_vcpu *vcpu, const char *smstate) +{ + return vmx_pre_leave_smm(vcpu, smstate); +} + +static void vt_enable_smi_window(struct kvm_vcpu *vcpu) +{ + /* RSM will cause a vmexit anyway. */ +} + +static bool vt_can_emulate_instruction(struct kvm_vcpu *vcpu, void *insn, + int insn_len) +{ + return vmx_can_emulate_instruction(vcpu, insn, insn_len); +} + +static int vt_check_intercept(struct kvm_vcpu *vcpu, + struct x86_instruction_info *info, + enum x86_intercept_stage stage, + struct x86_exception *exception) +{ + return vmx_check_intercept(vcpu, info, stage, exception); +} + +static bool vt_apic_init_signal_blocked(struct kvm_vcpu *vcpu) +{ + return vmx_apic_init_signal_blocked(vcpu); +} + +static void vt_migrate_timers(struct kvm_vcpu *vcpu) +{ + vmx_migrate_timers(vcpu); +} + +static void vt_set_virtual_apic_mode(struct kvm_vcpu *vcpu) +{ + return vmx_set_virtual_apic_mode(vcpu); +} + +static void vt_apicv_post_state_restore(struct kvm_vcpu *vcpu) +{ + return vmx_apicv_post_state_restore(vcpu); +} + +static bool vt_check_apicv_inhibit_reasons(ulong bit) +{ + ulong supported = BIT(APICV_INHIBIT_REASON_DISABLE) | + BIT(APICV_INHIBIT_REASON_HYPERV); + + return supported & BIT(bit); +} + +static void vt_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr) +{ + return vmx_hwapic_irr_update(vcpu, max_irr); +} + +static void vt_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr) +{ + return vmx_hwapic_isr_update(vcpu, max_isr); +} + +static bool vt_guest_apic_has_interrupt(struct kvm_vcpu *vcpu) +{ + return vmx_guest_apic_has_interrupt(vcpu); +} + +static int vt_sync_pir_to_irr(struct kvm_vcpu *vcpu) +{ + return vmx_sync_pir_to_irr(vcpu); +} + +static int vt_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector) +{ + return vmx_deliver_posted_interrupt(vcpu, vector); +} + +static void vt_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) +{ + return vmx_vcpu_after_set_cpuid(vcpu); +} + +static bool vt_has_emulated_msr(struct kvm *kvm, u32 index) +{ + return vmx_has_emulated_msr(index); +} + +static void vt_msr_filter_changed(struct kvm_vcpu *vcpu) +{ + vmx_msr_filter_changed(vcpu); +} + +static void vt_prepare_switch_to_guest(struct kvm_vcpu *vcpu) +{ + vmx_prepare_switch_to_guest(vcpu); +} + +static void vt_update_exception_bitmap(struct kvm_vcpu *vcpu) +{ + update_exception_bitmap(vcpu); +} + +static int vt_get_msr_feature(struct kvm_msr_entry *msr) +{ + return vmx_get_msr_feature(msr); +} + +static int vt_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) +{ + return vmx_get_msr(vcpu, msr_info); +} + +static u64 vt_get_segment_base(struct kvm_vcpu *vcpu, int seg) +{ + return vmx_get_segment_base(vcpu, seg); +} + +static void vt_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, + int seg) +{ + vmx_get_segment(vcpu, var, seg); +} + +static void vt_set_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, + int seg) +{ + vmx_set_segment(vcpu, var, seg); +} + +static int vt_get_cpl(struct kvm_vcpu *vcpu) +{ + return vmx_get_cpl(vcpu); +} + +static void vt_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l) +{ + vmx_get_cs_db_l_bits(vcpu, db, l); +} + +static void vt_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) +{ + vmx_set_cr0(vcpu, cr0); +} + +static void vt_load_mmu_pgd(struct kvm_vcpu *vcpu, unsigned long pgd, + int pgd_level) +{ + vmx_load_mmu_pgd(vcpu, pgd, pgd_level); +} + +static int vt_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) +{ + return vmx_set_cr4(vcpu, cr4); +} + 
+static int vt_set_efer(struct kvm_vcpu *vcpu, u64 efer) +{ + return vmx_set_efer(vcpu, efer); +} + +static void vt_get_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) +{ + vmx_get_idt(vcpu, dt); +} + +static void vt_set_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) +{ + vmx_set_idt(vcpu, dt); +} + +static void vt_get_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) +{ + vmx_get_gdt(vcpu, dt); +} + +static void vt_set_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) +{ + vmx_set_gdt(vcpu, dt); +} + +static void vt_set_dr7(struct kvm_vcpu *vcpu, unsigned long val) +{ + vmx_set_dr7(vcpu, val); +} + +static void vt_sync_dirty_debug_regs(struct kvm_vcpu *vcpu) +{ + vmx_sync_dirty_debug_regs(vcpu); +} + +static void vt_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg) +{ + vmx_cache_reg(vcpu, reg); +} + +static unsigned long vt_get_rflags(struct kvm_vcpu *vcpu) +{ + return vmx_get_rflags(vcpu); +} + +static void vt_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags) +{ + vmx_set_rflags(vcpu, rflags); +} + +static void vt_flush_tlb_all(struct kvm_vcpu *vcpu) +{ + vmx_flush_tlb_all(vcpu); +} + +static void vt_flush_tlb_current(struct kvm_vcpu *vcpu) +{ + vmx_flush_tlb_current(vcpu); +} + +static void vt_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t addr) +{ + vmx_flush_tlb_gva(vcpu, addr); +} + +static void vt_flush_tlb_guest(struct kvm_vcpu *vcpu) +{ + vmx_flush_tlb_guest(vcpu); +} + +static void vt_set_interrupt_shadow(struct kvm_vcpu *vcpu, int mask) +{ + vmx_set_interrupt_shadow(vcpu, mask); +} + +static u32 vt_get_interrupt_shadow(struct kvm_vcpu *vcpu) +{ + return vmx_get_interrupt_shadow(vcpu); +} + +static void vt_patch_hypercall(struct kvm_vcpu *vcpu, + unsigned char *hypercall) +{ + vmx_patch_hypercall(vcpu, hypercall); +} + +static void vt_inject_irq(struct kvm_vcpu *vcpu) +{ + vmx_inject_irq(vcpu); +} + +static void vt_inject_nmi(struct kvm_vcpu *vcpu) +{ + vmx_inject_nmi(vcpu); +} + +static void vt_queue_exception(struct kvm_vcpu *vcpu) +{ + vmx_queue_exception(vcpu); +} + +static void vt_cancel_injection(struct kvm_vcpu *vcpu) +{ + vmx_cancel_injection(vcpu); +} + +static int vt_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection) +{ + return vmx_interrupt_allowed(vcpu, for_injection); +} + +static int vt_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection) +{ + return vmx_nmi_allowed(vcpu, for_injection); +} + +static bool vt_get_nmi_mask(struct kvm_vcpu *vcpu) +{ + return vmx_get_nmi_mask(vcpu); +} + +static void vt_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked) +{ + vmx_set_nmi_mask(vcpu, masked); +} + +static void vt_enable_nmi_window(struct kvm_vcpu *vcpu) +{ + enable_nmi_window(vcpu); +} + +static void vt_enable_irq_window(struct kvm_vcpu *vcpu) +{ + enable_irq_window(vcpu); +} + +static void vt_update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr) +{ + update_cr8_intercept(vcpu, tpr, irr); +} + +static void vt_set_apic_access_page_addr(struct kvm_vcpu *vcpu) +{ + vmx_set_apic_access_page_addr(vcpu); +} + +static void vt_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu) +{ + vmx_refresh_apicv_exec_ctrl(vcpu); +} + +static void vt_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap) +{ + vmx_load_eoi_exitmap(vcpu, eoi_exit_bitmap); +} + +static int vt_set_tss_addr(struct kvm *kvm, unsigned int addr) +{ + return vmx_set_tss_addr(kvm, addr); +} + +static int vt_set_identity_map_addr(struct kvm *kvm, u64 ident_addr) +{ + return vmx_set_identity_map_addr(kvm, ident_addr); +} + +static u64 vt_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio) +{ 
+ return vmx_get_mt_mask(vcpu, gfn, is_mmio); +} + +static void vt_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2, + u32 *intr_info, u32 *error_code) +{ + + return vmx_get_exit_info(vcpu, info1, info2, intr_info, error_code); +} + +static u64 vt_write_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 offset) +{ + return vmx_write_l1_tsc_offset(vcpu, offset); +} + +static void vt_request_immediate_exit(struct kvm_vcpu *vcpu) +{ + vmx_request_immediate_exit(vcpu); +} + +static void vt_sched_in(struct kvm_vcpu *vcpu, int cpu) +{ + vmx_sched_in(vcpu, cpu); +} + +static void vt_slot_enable_log_dirty(struct kvm *kvm, + struct kvm_memory_slot *slot) +{ + vmx_slot_enable_log_dirty(kvm, slot); +} + +static void vt_slot_disable_log_dirty(struct kvm *kvm, + struct kvm_memory_slot *slot) +{ + vmx_slot_disable_log_dirty(kvm, slot); +} + +static void vt_flush_log_dirty(struct kvm *kvm) +{ + vmx_flush_log_dirty(kvm); +} + +static void vt_enable_log_dirty_pt_masked(struct kvm *kvm, + struct kvm_memory_slot *memslot, + gfn_t offset, unsigned long mask) +{ + vmx_enable_log_dirty_pt_masked(kvm, memslot, offset, mask); +} + +static int vt_pre_block(struct kvm_vcpu *vcpu) +{ + if (pi_pre_block(vcpu)) + return 1; + + return vmx_pre_block(vcpu); +} + +static void vt_post_block(struct kvm_vcpu *vcpu) +{ + vmx_post_block(vcpu); + + pi_post_block(vcpu); +} + + +#ifdef CONFIG_X86_64 +static int vt_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc, + bool *expired) +{ + return vmx_set_hv_timer(vcpu, guest_deadline_tsc, expired); +} + +static void vt_cancel_hv_timer(struct kvm_vcpu *vcpu) +{ + vmx_cancel_hv_timer(vcpu); +} +#endif + +static void vt_setup_mce(struct kvm_vcpu *vcpu) +{ + vmx_setup_mce(vcpu); +} + +static struct kvm_x86_ops vt_x86_ops __initdata = { + .hardware_unsetup = hardware_unsetup, + + .hardware_enable = vt_hardware_enable, + .hardware_disable = vt_hardware_disable, + .cpu_has_accelerated_tpr = vt_cpu_has_accelerated_tpr, + .has_emulated_msr = vt_has_emulated_msr, + + .is_vm_type_supported = vt_is_vm_type_supported, + .vm_size = sizeof(struct kvm_vmx), + .vm_init = vt_vm_init, + .vm_teardown = vt_vm_teardown, + .vm_destroy = vt_vm_destroy, + + .vcpu_create = vt_vcpu_create, + .vcpu_free = vt_vcpu_free, + .vcpu_reset = vt_vcpu_reset, + + .prepare_guest_switch = vt_prepare_switch_to_guest, + .vcpu_load = vt_vcpu_load, + .vcpu_put = vt_vcpu_put, + + .update_exception_bitmap = vt_update_exception_bitmap, + .get_msr_feature = vt_get_msr_feature, + .get_msr = vt_get_msr, + .set_msr = vt_set_msr, + .get_segment_base = vt_get_segment_base, + .get_segment = vt_get_segment, + .set_segment = vt_set_segment, + .get_cpl = vt_get_cpl, + .get_cs_db_l_bits = vt_get_cs_db_l_bits, + .set_cr0 = vt_set_cr0, + .set_cr4 = vt_set_cr4, + .set_efer = vt_set_efer, + .get_idt = vt_get_idt, + .set_idt = vt_set_idt, + .get_gdt = vt_get_gdt, + .set_gdt = vt_set_gdt, + .set_dr7 = vt_set_dr7, + .sync_dirty_debug_regs = vt_sync_dirty_debug_regs, + .cache_reg = vt_cache_reg, + .get_rflags = vt_get_rflags, + .set_rflags = vt_set_rflags, + + .tlb_flush_all = vt_flush_tlb_all, + .tlb_flush_current = vt_flush_tlb_current, + .tlb_flush_gva = vt_flush_tlb_gva, + .tlb_flush_guest = vt_flush_tlb_guest, + + .run = vt_vcpu_run, + .handle_exit = vt_handle_exit, + .skip_emulated_instruction = vt_skip_emulated_instruction, + .update_emulated_instruction = vt_update_emulated_instruction, + .set_interrupt_shadow = vt_set_interrupt_shadow, + .get_interrupt_shadow = vt_get_interrupt_shadow, + .patch_hypercall = vt_patch_hypercall, + 
.set_irq = vt_inject_irq, + .set_nmi = vt_inject_nmi, + .queue_exception = vt_queue_exception, + .cancel_injection = vt_cancel_injection, + .interrupt_allowed = vt_interrupt_allowed, + .nmi_allowed = vt_nmi_allowed, + .get_nmi_mask = vt_get_nmi_mask, + .set_nmi_mask = vt_set_nmi_mask, + .enable_nmi_window = vt_enable_nmi_window, + .enable_irq_window = vt_enable_irq_window, + .update_cr8_intercept = vt_update_cr8_intercept, + .set_virtual_apic_mode = vt_set_virtual_apic_mode, + .set_apic_access_page_addr = vt_set_apic_access_page_addr, + .refresh_apicv_exec_ctrl = vt_refresh_apicv_exec_ctrl, + .load_eoi_exitmap = vt_load_eoi_exitmap, + .apicv_post_state_restore = vt_apicv_post_state_restore, + .check_apicv_inhibit_reasons = vt_check_apicv_inhibit_reasons, + .hwapic_irr_update = vt_hwapic_irr_update, + .hwapic_isr_update = vt_hwapic_isr_update, + .guest_apic_has_interrupt = vt_guest_apic_has_interrupt, + .sync_pir_to_irr = vt_sync_pir_to_irr, + .deliver_posted_interrupt = vt_deliver_posted_interrupt, + .dy_apicv_has_pending_interrupt = pi_has_pending_interrupt, + + .set_tss_addr = vt_set_tss_addr, + .set_identity_map_addr = vt_set_identity_map_addr, + .get_mt_mask = vt_get_mt_mask, + + .get_exit_info = vt_get_exit_info, + + .vcpu_after_set_cpuid = vt_vcpu_after_set_cpuid, + + .has_wbinvd_exit = cpu_has_vmx_wbinvd_exit, + + .write_l1_tsc_offset = vt_write_l1_tsc_offset, + + .load_mmu_pgd = vt_load_mmu_pgd, + + .check_intercept = vt_check_intercept, + .handle_exit_irqoff = vt_handle_exit_irqoff, + + .request_immediate_exit = vt_request_immediate_exit, + + .sched_in = vt_sched_in, + + .slot_enable_log_dirty = vt_slot_enable_log_dirty, + .slot_disable_log_dirty = vt_slot_disable_log_dirty, + .flush_log_dirty = vt_flush_log_dirty, + .enable_log_dirty_pt_masked = vt_enable_log_dirty_pt_masked, + + .pre_block = vt_pre_block, + .post_block = vt_post_block, + + .pmu_ops = &intel_pmu_ops, + .nested_ops = &vmx_nested_ops, + + .update_pi_irte = pi_update_irte, + +#ifdef CONFIG_X86_64 + .set_hv_timer = vt_set_hv_timer, + .cancel_hv_timer = vt_cancel_hv_timer, +#endif + + .setup_mce = vt_setup_mce, + + .smi_allowed = vt_smi_allowed, + .pre_enter_smm = vt_pre_enter_smm, + .pre_leave_smm = vt_pre_leave_smm, + .enable_smi_window = vt_enable_smi_window, + + .can_emulate_instruction = vt_can_emulate_instruction, + .apic_init_signal_blocked = vt_apic_init_signal_blocked, + .migrate_timers = vt_migrate_timers, + + .msr_filter_changed = vt_msr_filter_changed, +}; + +static struct kvm_x86_init_ops vt_init_ops __initdata = { + .cpu_has_kvm_support = vt_cpu_has_kvm_support, + .disabled_by_bios = vt_disabled_by_bios, + .check_processor_compatibility = vt_check_processor_compatibility, + .hardware_setup = vt_hardware_setup, + + .runtime_ops = &vt_x86_ops, +}; + +static int __init vt_init(void) +{ + unsigned int vcpu_size = 0, vcpu_align = 0; + int r; + + vmx_pre_kvm_init(&vcpu_size, &vcpu_align, &vt_x86_ops); + + r = kvm_init(&vt_init_ops, vcpu_size, vcpu_align, THIS_MODULE); + if (r) + goto err_vmx_post_exit; + + r = vmx_init(); + if (r) + goto err_kvm_exit; + + return 0; + +err_kvm_exit: + kvm_exit(); +err_vmx_post_exit: + vmx_post_kvm_exit(); + return r; +} +module_init(vt_init); + +static void vt_exit(void) +{ + vmx_exit(); + kvm_exit(); + vmx_post_kvm_exit(); +} +module_exit(vt_exit); diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 0dad9d1816b0..966d48eada40 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -2251,11 +2251,6 @@ static void vmx_cache_reg(struct kvm_vcpu *vcpu, 
enum kvm_reg reg) } } -static __init int cpu_has_kvm_support(void) -{ - return cpu_has_vmx(); -} - static __init int vmx_disabled_by_bios(void) { return !boot_cpu_has(X86_FEATURE_MSR_IA32_FEAT_CTL) || @@ -6338,7 +6333,7 @@ static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu) handle_exception_nmi_irqoff(vcpu, vmx_get_intr_info(vcpu)); } -static bool vmx_has_emulated_msr(struct kvm *kvm, u32 index) +static bool vmx_has_emulated_msr(u32 index) { switch (index) { case MSR_IA32_SMBASE: @@ -6899,11 +6894,6 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu) return err; } -static bool vmx_is_vm_type_supported(unsigned long type) -{ - return type == KVM_X86_LEGACY_VM; -} - #define L1TF_MSG_SMT "L1TF CPU bug present and SMT on, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.\n" #define L1TF_MSG_L1D "L1TF CPU bug present and virtualization mitigation disabled, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.\n" @@ -6950,16 +6940,6 @@ static int vmx_vm_init(struct kvm *kvm) return 0; } -static void vmx_vm_teardown(struct kvm *kvm) -{ - -} - -static void vmx_vm_destroy(struct kvm *kvm) -{ - -} - static int __init vmx_check_processor_compat(void) { struct vmcs_config vmcs_conf; @@ -7445,9 +7425,6 @@ static void vmx_enable_log_dirty_pt_masked(struct kvm *kvm, static int vmx_pre_block(struct kvm_vcpu *vcpu) { - if (pi_pre_block(vcpu)) - return 1; - if (kvm_lapic_hv_timer_in_use(vcpu)) kvm_lapic_switch_to_sw_timer(vcpu); @@ -7458,8 +7435,6 @@ static void vmx_post_block(struct kvm_vcpu *vcpu) { if (kvm_x86_ops.set_hv_timer) kvm_lapic_switch_to_hv_timer(vcpu); - - pi_post_block(vcpu); } static void vmx_setup_mce(struct kvm_vcpu *vcpu) @@ -7514,11 +7489,6 @@ static int vmx_pre_leave_smm(struct kvm_vcpu *vcpu, const char *smstate) return 0; } -static void enable_smi_window(struct kvm_vcpu *vcpu) -{ - /* RSM will cause a vmexit anyway. 
*/ -} - static bool vmx_apic_init_signal_blocked(struct kvm_vcpu *vcpu) { return to_vmx(vcpu)->nested.vmxon; @@ -7542,148 +7512,7 @@ static void hardware_unsetup(void) free_kvm_area(); } -static bool vmx_check_apicv_inhibit_reasons(ulong bit) -{ - ulong supported = BIT(APICV_INHIBIT_REASON_DISABLE) | - BIT(APICV_INHIBIT_REASON_HYPERV); - - return supported & BIT(bit); -} - -static struct kvm_x86_ops vmx_x86_ops __initdata = { - .hardware_unsetup = hardware_unsetup, - - .hardware_enable = hardware_enable, - .hardware_disable = hardware_disable, - .cpu_has_accelerated_tpr = report_flexpriority, - .has_emulated_msr = vmx_has_emulated_msr, - - .is_vm_type_supported = vmx_is_vm_type_supported, - .vm_size = sizeof(struct kvm_vmx), - .vm_init = vmx_vm_init, - .vm_teardown = vmx_vm_teardown, - .vm_destroy = vmx_vm_destroy, - - .vcpu_create = vmx_create_vcpu, - .vcpu_free = vmx_free_vcpu, - .vcpu_reset = vmx_vcpu_reset, - - .prepare_guest_switch = vmx_prepare_switch_to_guest, - .vcpu_load = vmx_vcpu_load, - .vcpu_put = vmx_vcpu_put, - - .update_exception_bitmap = update_exception_bitmap, - .get_msr_feature = vmx_get_msr_feature, - .get_msr = vmx_get_msr, - .set_msr = vmx_set_msr, - .get_segment_base = vmx_get_segment_base, - .get_segment = vmx_get_segment, - .set_segment = vmx_set_segment, - .get_cpl = vmx_get_cpl, - .get_cs_db_l_bits = vmx_get_cs_db_l_bits, - .set_cr0 = vmx_set_cr0, - .set_cr4 = vmx_set_cr4, - .set_efer = vmx_set_efer, - .get_idt = vmx_get_idt, - .set_idt = vmx_set_idt, - .get_gdt = vmx_get_gdt, - .set_gdt = vmx_set_gdt, - .set_dr7 = vmx_set_dr7, - .sync_dirty_debug_regs = vmx_sync_dirty_debug_regs, - .cache_reg = vmx_cache_reg, - .get_rflags = vmx_get_rflags, - .set_rflags = vmx_set_rflags, - - .tlb_flush_all = vmx_flush_tlb_all, - .tlb_flush_current = vmx_flush_tlb_current, - .tlb_flush_gva = vmx_flush_tlb_gva, - .tlb_flush_guest = vmx_flush_tlb_guest, - - .run = vmx_vcpu_run, - .handle_exit = vmx_handle_exit, - .skip_emulated_instruction = vmx_skip_emulated_instruction, - .update_emulated_instruction = vmx_update_emulated_instruction, - .set_interrupt_shadow = vmx_set_interrupt_shadow, - .get_interrupt_shadow = vmx_get_interrupt_shadow, - .patch_hypercall = vmx_patch_hypercall, - .set_irq = vmx_inject_irq, - .set_nmi = vmx_inject_nmi, - .queue_exception = vmx_queue_exception, - .cancel_injection = vmx_cancel_injection, - .interrupt_allowed = vmx_interrupt_allowed, - .nmi_allowed = vmx_nmi_allowed, - .get_nmi_mask = vmx_get_nmi_mask, - .set_nmi_mask = vmx_set_nmi_mask, - .enable_nmi_window = enable_nmi_window, - .enable_irq_window = enable_irq_window, - .update_cr8_intercept = update_cr8_intercept, - .set_virtual_apic_mode = vmx_set_virtual_apic_mode, - .set_apic_access_page_addr = vmx_set_apic_access_page_addr, - .refresh_apicv_exec_ctrl = vmx_refresh_apicv_exec_ctrl, - .load_eoi_exitmap = vmx_load_eoi_exitmap, - .apicv_post_state_restore = vmx_apicv_post_state_restore, - .check_apicv_inhibit_reasons = vmx_check_apicv_inhibit_reasons, - .hwapic_irr_update = vmx_hwapic_irr_update, - .hwapic_isr_update = vmx_hwapic_isr_update, - .guest_apic_has_interrupt = vmx_guest_apic_has_interrupt, - .sync_pir_to_irr = vmx_sync_pir_to_irr, - .deliver_posted_interrupt = vmx_deliver_posted_interrupt, - .dy_apicv_has_pending_interrupt = pi_has_pending_interrupt, - - .set_tss_addr = vmx_set_tss_addr, - .set_identity_map_addr = vmx_set_identity_map_addr, - .get_mt_mask = vmx_get_mt_mask, - - .get_exit_info = vmx_get_exit_info, - - .vcpu_after_set_cpuid = vmx_vcpu_after_set_cpuid, - - 
.has_wbinvd_exit = cpu_has_vmx_wbinvd_exit, - - .write_l1_tsc_offset = vmx_write_l1_tsc_offset, - - .load_mmu_pgd = vmx_load_mmu_pgd, - - .check_intercept = vmx_check_intercept, - .handle_exit_irqoff = vmx_handle_exit_irqoff, - - .request_immediate_exit = vmx_request_immediate_exit, - - .sched_in = vmx_sched_in, - - .slot_enable_log_dirty = vmx_slot_enable_log_dirty, - .slot_disable_log_dirty = vmx_slot_disable_log_dirty, - .flush_log_dirty = vmx_flush_log_dirty, - .enable_log_dirty_pt_masked = vmx_enable_log_dirty_pt_masked, - - .pre_block = vmx_pre_block, - .post_block = vmx_post_block, - - .pmu_ops = &intel_pmu_ops, - .nested_ops = &vmx_nested_ops, - - .update_pi_irte = pi_update_irte, - -#ifdef CONFIG_X86_64 - .set_hv_timer = vmx_set_hv_timer, - .cancel_hv_timer = vmx_cancel_hv_timer, -#endif - - .setup_mce = vmx_setup_mce, - - .smi_allowed = vmx_smi_allowed, - .pre_enter_smm = vmx_pre_enter_smm, - .pre_leave_smm = vmx_pre_leave_smm, - .enable_smi_window = enable_smi_window, - - .can_emulate_instruction = vmx_can_emulate_instruction, - .apic_init_signal_blocked = vmx_apic_init_signal_blocked, - .migrate_timers = vmx_migrate_timers, - - .msr_filter_changed = vmx_msr_filter_changed, -}; - -static __init int hardware_setup(void) +static __init int hardware_setup(struct kvm_x86_ops *x86_ops) { unsigned long host_bndcfgs; struct desc_ptr dt; @@ -7738,16 +7567,16 @@ static __init int hardware_setup(void) * using the APIC_ACCESS_ADDR VMCS field. */ if (!flexpriority_enabled) - vmx_x86_ops.set_apic_access_page_addr = NULL; + x86_ops->set_apic_access_page_addr = NULL; if (!cpu_has_vmx_tpr_shadow()) - vmx_x86_ops.update_cr8_intercept = NULL; + x86_ops->update_cr8_intercept = NULL; #if IS_ENABLED(CONFIG_HYPERV) if (ms_hyperv.nested_features & HV_X64_NESTED_GUEST_MAPPING_FLUSH && enable_ept) { - vmx_x86_ops.tlb_remote_flush = hv_remote_flush_tlb; - vmx_x86_ops.tlb_remote_flush_with_range = + x86_ops->tlb_remote_flush = hv_remote_flush_tlb; + x86_ops->tlb_remote_flush_with_range = hv_remote_flush_tlb_with_range; } #endif @@ -7762,7 +7591,7 @@ static __init int hardware_setup(void) if (!cpu_has_vmx_apicv()) { enable_apicv = 0; - vmx_x86_ops.sync_pir_to_irr = NULL; + x86_ops->sync_pir_to_irr = NULL; } if (cpu_has_vmx_tsc_scaling()) { @@ -7794,10 +7623,10 @@ static __init int hardware_setup(void) enable_pml = 0; if (!enable_pml) { - vmx_x86_ops.slot_enable_log_dirty = NULL; - vmx_x86_ops.slot_disable_log_dirty = NULL; - vmx_x86_ops.flush_log_dirty = NULL; - vmx_x86_ops.enable_log_dirty_pt_masked = NULL; + x86_ops->slot_enable_log_dirty = NULL; + x86_ops->slot_disable_log_dirty = NULL; + x86_ops->flush_log_dirty = NULL; + x86_ops->enable_log_dirty_pt_masked = NULL; } if (!cpu_has_vmx_preemption_timer()) @@ -7825,9 +7654,9 @@ static __init int hardware_setup(void) } if (!enable_preemption_timer) { - vmx_x86_ops.set_hv_timer = NULL; - vmx_x86_ops.cancel_hv_timer = NULL; - vmx_x86_ops.request_immediate_exit = __kvm_request_immediate_exit; + x86_ops->set_hv_timer = NULL; + x86_ops->cancel_hv_timer = NULL; + x86_ops->request_immediate_exit = __kvm_request_immediate_exit; } kvm_set_posted_intr_wakeup_handler(pi_wakeup_handler); @@ -7856,15 +7685,6 @@ static __init int hardware_setup(void) return r; } -static struct kvm_x86_init_ops vmx_init_ops __initdata = { - .cpu_has_kvm_support = cpu_has_kvm_support, - .disabled_by_bios = vmx_disabled_by_bios, - .check_processor_compatibility = vmx_check_processor_compat, - .hardware_setup = hardware_setup, - - .runtime_ops = &vmx_x86_ops, -}; - static void 
vmx_cleanup_l1d_flush(void) { if (vmx_l1d_flush_pages) { @@ -7875,45 +7695,14 @@ static void vmx_cleanup_l1d_flush(void) l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_AUTO; } -static void vmx_exit(void) +static void __init vmx_pre_kvm_init(unsigned int *vcpu_size, + unsigned int *vcpu_align, + struct kvm_x86_ops *x86_ops) { -#ifdef CONFIG_KEXEC_CORE - RCU_INIT_POINTER(crash_vmclear_loaded_vmcss, NULL); - synchronize_rcu(); -#endif - - kvm_exit(); - -#if IS_ENABLED(CONFIG_HYPERV) - if (static_branch_unlikely(&enable_evmcs)) { - int cpu; - struct hv_vp_assist_page *vp_ap; - /* - * Reset everything to support using non-enlightened VMCS - * access later (e.g. when we reload the module with - * enlightened_vmcs=0) - */ - for_each_online_cpu(cpu) { - vp_ap = hv_get_vp_assist_page(cpu); - - if (!vp_ap) - continue; - - vp_ap->nested_control.features.directhypercall = 0; - vp_ap->current_nested_vmcs = 0; - vp_ap->enlighten_vmentry = 0; - } - - static_branch_disable(&enable_evmcs); - } -#endif - vmx_cleanup_l1d_flush(); -} -module_exit(vmx_exit); - -static int __init vmx_init(void) -{ - int r, cpu; + if (sizeof(struct vcpu_vmx) > *vcpu_size) + *vcpu_size = sizeof(struct vcpu_vmx); + if (__alignof__(struct vcpu_vmx) > *vcpu_align) + *vcpu_align = __alignof__(struct vcpu_vmx); #if IS_ENABLED(CONFIG_HYPERV) /* @@ -7941,18 +7730,45 @@ static int __init vmx_init(void) } if (ms_hyperv.nested_features & HV_X64_NESTED_DIRECT_FLUSH) - vmx_x86_ops.enable_direct_tlbflush + x86_ops->enable_direct_tlbflush = hv_enable_direct_tlbflush; } else { enlightened_vmcs = false; } #endif +} - r = kvm_init(&vmx_init_ops, sizeof(struct vcpu_vmx), - __alignof__(struct vcpu_vmx), THIS_MODULE); - if (r) - return r; +static void vmx_post_kvm_exit(void) +{ +#if IS_ENABLED(CONFIG_HYPERV) + if (static_branch_unlikely(&enable_evmcs)) { + int cpu; + struct hv_vp_assist_page *vp_ap; + /* + * Reset everything to support using non-enlightened VMCS + * access later (e.g. when we reload the module with + * enlightened_vmcs=0) + */ + for_each_online_cpu(cpu) { + vp_ap = hv_get_vp_assist_page(cpu); + + if (!vp_ap) + continue; + + vp_ap->nested_control.features.directhypercall = 0; + vp_ap->current_nested_vmcs = 0; + vp_ap->enlighten_vmentry = 0; + } + + static_branch_disable(&enable_evmcs); + } +#endif +} + +static int __init vmx_init(void) +{ + int r, cpu; /* * Must be called after kvm_init() so enable_ept is properly set @@ -7962,10 +7778,8 @@ static int __init vmx_init(void) * mitigation mode. 
*/ r = vmx_setup_l1d_flush(vmentry_l1d_flush_param); - if (r) { - vmx_exit(); + if (r) return r; - } for_each_possible_cpu(cpu) { INIT_LIST_HEAD(&per_cpu(loaded_vmcss_on_cpu, cpu)); @@ -7989,4 +7803,12 @@ static int __init vmx_init(void) return 0; } -module_init(vmx_init); + +static void vmx_exit(void) +{ +#ifdef CONFIG_KEXEC_CORE + RCU_INIT_POINTER(crash_vmclear_loaded_vmcss, NULL); + synchronize_rcu(); +#endif + vmx_cleanup_l1d_flush(); +} From patchwork Mon Nov 16 18:26:35 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910371 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E1BC5C2D0A3 for ; Mon, 16 Nov 2020 18:30:11 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9B7A420756 for ; Mon, 16 Nov 2020 18:30:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388259AbgKPS2S (ORCPT ); Mon, 16 Nov 2020 13:28:18 -0500 Received: from mga02.intel.com ([134.134.136.20]:48445 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388286AbgKPS2R (ORCPT ); Mon, 16 Nov 2020 13:28:17 -0500 IronPort-SDR: o6Oy/oSOe80uoASTaoq2hYXUv7to8oL3KnTDo9VGT/7EqR34aCLdbteTAIKCcyJbFSVCT5E2es gqrGAPZ/kevA== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="157819179" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="157819179" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:15 -0800 IronPort-SDR: two99wgknky1TIpk7qAH9FAMPhBzehL+u6OEXQXm11sneW92gZzgirhbmbWAtDroe0mwdh9X0D WClcJEV4St9w== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400528251" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:15 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . 
Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson Subject: [RFC PATCH 50/67] KVM: VMX: Move setting of EPT MMU masks to common VT-x code Date: Mon, 16 Nov 2020 10:26:35 -0800 Message-Id: <2c648e2f7fb9debcef370fcce64219b9a065727f.1605232743.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson Signed-off-by: Sean Christopherson --- arch/x86/kvm/vmx/main.c | 17 +++++++++++++++++ arch/x86/kvm/vmx/vmx.c | 13 ------------- 2 files changed, 17 insertions(+), 13 deletions(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 85bc238c0852..52e7a9d25e9c 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -26,6 +26,20 @@ static int __init vt_check_processor_compatibility(void) return 0; } +static __init void vt_set_ept_masks(void) +{ + const u64 u_mask = VMX_EPT_READABLE_MASK; + const u64 a_mask = enable_ept_ad_bits ? VMX_EPT_ACCESS_BIT : 0ull; + const u64 d_mask = enable_ept_ad_bits ? VMX_EPT_DIRTY_BIT : 0ull; + const u64 p_mask = cpu_has_vmx_ept_execute_only() ? 0ull : + VMX_EPT_READABLE_MASK; + const u64 x_mask = VMX_EPT_EXECUTABLE_MASK; + const u64 nx_mask = 0ull; + + kvm_mmu_set_mask_ptes(u_mask, a_mask, d_mask, nx_mask, x_mask, p_mask, + VMX_EPT_RWX_MASK, 0ull); +} + static __init int vt_hardware_setup(void) { int ret; @@ -34,6 +48,9 @@ static __init int vt_hardware_setup(void) if (ret) return ret; + if (enable_ept) + vt_set_ept_masks(); + return 0; } diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 966d48eada40..f6b2ddff58e1 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -5411,16 +5411,6 @@ static void shrink_ple_window(struct kvm_vcpu *vcpu) } } -static void vmx_enable_tdp(void) -{ - kvm_mmu_set_mask_ptes(VMX_EPT_READABLE_MASK, - enable_ept_ad_bits ? VMX_EPT_ACCESS_BIT : 0ull, - enable_ept_ad_bits ? VMX_EPT_DIRTY_BIT : 0ull, - 0ull, VMX_EPT_EXECUTABLE_MASK, - cpu_has_vmx_ept_execute_only() ? 0ull : VMX_EPT_READABLE_MASK, - VMX_EPT_RWX_MASK, 0ull); -} - /* * Indicate a busy-waiting vcpu in spinlock. We do not enable the PAUSE * exiting, so only get here on cpu with PAUSE-Loop-Exiting. 
@@ -7602,9 +7592,6 @@ static __init int hardware_setup(struct kvm_x86_ops *x86_ops) set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */ - if (enable_ept) - vmx_enable_tdp(); - if (!enable_ept) ept_lpage_level = 0; else if (cpu_has_vmx_ept_1g_page()) From patchwork Mon Nov 16 18:26:36 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910335 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 76133C61DD8 for ; Mon, 16 Nov 2020 18:28:53 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3890622453 for ; Mon, 16 Nov 2020 18:28:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388309AbgKPS2T (ORCPT ); Mon, 16 Nov 2020 13:28:19 -0500 Received: from mga02.intel.com ([134.134.136.20]:48445 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388288AbgKPS2R (ORCPT ); Mon, 16 Nov 2020 13:28:17 -0500 IronPort-SDR: Bd/d/zA8HEltzwF3/gwdHvQ1M89TiZ5gNiZ+HooTzRF7/Viw1W+nH61ldtXoYaAsh8Verlkwpt IWqNyL58ozPA== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="157819182" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="157819182" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:16 -0800 IronPort-SDR: CQUW8HacZbxTSsGpxKVPH/IccyLvj2IEb1u1dKSnjmwAF/uppcw7UKZzOu6uZICDCyTHdyk/cU PcXYCZHY+fhQ== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400528262" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:16 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson Subject: [RFC PATCH 51/67] KVM: VMX: Move register caching logic to common code Date: Mon, 16 Nov 2020 10:26:36 -0800 Message-Id: X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson Move the guts of vmx_cache_reg() to vt_cache_reg() in preparation for reusing the bulk of the code for TDX, which can access guest state for debug TDs. Use kvm_x86_ops.cache_reg() in ept_update_paging_mode_cr0() rather than trying to expose vt_cache_reg() to vmx.c, even though it means taking a retpoline. The code runs if and only if EPT is enabled but unrestricted guest. Only one generation of CPU, Nehalem, supports EPT but not unrestricted guest, and disabling unrestricted guest without also disabling EPT is, to put it bluntly, dumb. 
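For reference, the caller pattern described in the last paragraph looks roughly like the sketch below; it mirrors the ept_update_paging_mode_cr0() hunk further down, with only the comments added here. The indirect call through kvm_x86_ops may go through a retpoline, which is the trade-off being accepted for the rarely-taken EPT-without-unrestricted-guest path.

	/* Reach the common cache logic via the ops table, not a direct vmx.c call. */
	if (!kvm_register_is_available(vcpu, VCPU_EXREG_CR3))
		kvm_x86_ops.cache_reg(vcpu, VCPU_EXREG_CR3);	/* indirect, may retpoline */

	/* vcpu->arch.cr3 is now valid for the CR0.PG update that follows. */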
Signed-off-by: Sean Christopherson --- arch/x86/kvm/vmx/main.c | 37 +++++++++++++++++++++++++++++++++++- arch/x86/kvm/vmx/vmx.c | 42 +---------------------------------------- 2 files changed, 37 insertions(+), 42 deletions(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 52e7a9d25e9c..30b1815fd5a7 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -347,7 +347,42 @@ static void vt_sync_dirty_debug_regs(struct kvm_vcpu *vcpu) static void vt_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg) { - vmx_cache_reg(vcpu, reg); + unsigned long guest_owned_bits; + + kvm_register_mark_available(vcpu, reg); + + switch (reg) { + case VCPU_REGS_RSP: + vcpu->arch.regs[VCPU_REGS_RSP] = vmcs_readl(GUEST_RSP); + break; + case VCPU_REGS_RIP: + vcpu->arch.regs[VCPU_REGS_RIP] = vmcs_readl(GUEST_RIP); + break; + case VCPU_EXREG_PDPTR: + if (enable_ept) + ept_save_pdptrs(vcpu); + break; + case VCPU_EXREG_CR0: + guest_owned_bits = vcpu->arch.cr0_guest_owned_bits; + + vcpu->arch.cr0 &= ~guest_owned_bits; + vcpu->arch.cr0 |= vmcs_readl(GUEST_CR0) & guest_owned_bits; + break; + case VCPU_EXREG_CR3: + if (is_unrestricted_guest(vcpu) || + (enable_ept && is_paging(vcpu))) + vcpu->arch.cr3 = vmcs_readl(GUEST_CR3); + break; + case VCPU_EXREG_CR4: + guest_owned_bits = vcpu->arch.cr4_guest_owned_bits; + + vcpu->arch.cr4 &= ~guest_owned_bits; + vcpu->arch.cr4 |= vmcs_readl(GUEST_CR4) & guest_owned_bits; + break; + default: + KVM_BUG_ON(1, vcpu->kvm); + break; + } } static unsigned long vt_get_rflags(struct kvm_vcpu *vcpu) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index f6b2ddff58e1..85401a7eef9a 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -2211,46 +2211,6 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) return ret; } -static void vmx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg) -{ - unsigned long guest_owned_bits; - - kvm_register_mark_available(vcpu, reg); - - switch (reg) { - case VCPU_REGS_RSP: - vcpu->arch.regs[VCPU_REGS_RSP] = vmcs_readl(GUEST_RSP); - break; - case VCPU_REGS_RIP: - vcpu->arch.regs[VCPU_REGS_RIP] = vmcs_readl(GUEST_RIP); - break; - case VCPU_EXREG_PDPTR: - if (enable_ept) - ept_save_pdptrs(vcpu); - break; - case VCPU_EXREG_CR0: - guest_owned_bits = vcpu->arch.cr0_guest_owned_bits; - - vcpu->arch.cr0 &= ~guest_owned_bits; - vcpu->arch.cr0 |= vmcs_readl(GUEST_CR0) & guest_owned_bits; - break; - case VCPU_EXREG_CR3: - if (is_unrestricted_guest(vcpu) || - (enable_ept && is_paging(vcpu))) - vcpu->arch.cr3 = vmcs_readl(GUEST_CR3); - break; - case VCPU_EXREG_CR4: - guest_owned_bits = vcpu->arch.cr4_guest_owned_bits; - - vcpu->arch.cr4 &= ~guest_owned_bits; - vcpu->arch.cr4 |= vmcs_readl(GUEST_CR4) & guest_owned_bits; - break; - default: - KVM_BUG_ON(1, vcpu->kvm); - break; - } -} - static __init int vmx_disabled_by_bios(void) { return !boot_cpu_has(X86_FEATURE_MSR_IA32_FEAT_CTL) || @@ -2976,7 +2936,7 @@ static void ept_update_paging_mode_cr0(unsigned long *hw_cr0, struct vcpu_vmx *vmx = to_vmx(vcpu); if (!kvm_register_is_available(vcpu, VCPU_EXREG_CR3)) - vmx_cache_reg(vcpu, VCPU_EXREG_CR3); + kvm_x86_ops.cache_reg(vcpu, VCPU_EXREG_CR3); if (!(cr0 & X86_CR0_PG)) { /* From paging/starting to nonpaging */ exec_controls_setbit(vmx, CPU_BASED_CR3_LOAD_EXITING | From patchwork Mon Nov 16 18:26:37 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910351 Return-Path: X-Spam-Checker-Version: 
SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5E986C61DD8 for ; Mon, 16 Nov 2020 18:29:39 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1D948206F9 for ; Mon, 16 Nov 2020 18:29:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388315AbgKPS2V (ORCPT ); Mon, 16 Nov 2020 13:28:21 -0500 Received: from mga02.intel.com ([134.134.136.20]:48448 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388290AbgKPS2R (ORCPT ); Mon, 16 Nov 2020 13:28:17 -0500 IronPort-SDR: 9WHRGbJIanAZq3wqTyGIi2bhtDJ7Xar8K+Z/zuuHD7yayULSgZnm//ZageyB75Lvlpm1v0EALo dZwveqNu81uA== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="157819185" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="157819185" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:16 -0800 IronPort-SDR: bnuKa9p1R8uOIhswaGJbfNd/NrKTgOizES9qjo31UtncRLKA11Mjp8IBqDp4M4D0hjOTaZgPqP Hl41gUKEfFHw== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400528274" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:16 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson Subject: [RFC PATCH 52/67] KVM: TDX: Add TDX "architectural" error codes Date: Mon, 16 Nov 2020 10:26:37 -0800 Message-Id: X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson TDX-SEAM uses bits 31:0 to return more information, so these error codes will only exactly match RAX[63:32]. 
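A minimal sketch of how a caller would use these constants given that statement: mask off the SEAMCALL-specific detail in bits 31:0 before comparing against a code. The mask constant and helper name below are made up for illustration and are not defined by this patch.

	#define TDX_SEAMCALL_STATUS_MASK	0xFFFFFFFF00000000ULL	/* assumption */

	static inline bool tdx_status_matches(u64 rax, u64 code)
	{
		/* Bits 31:0 carry per-call detail; only bits 63:32 identify the code. */
		return (rax & TDX_SEAMCALL_STATUS_MASK) == code;
	}

	/* e.g.: if (tdx_status_matches(err, TDX_OPERAND_BUSY)) retry the SEAMCALL. */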
Signed-off-by: Sean Christopherson --- arch/x86/kvm/vmx/tdx_errno.h | 91 ++++++++++++++++++++++++++++++++++++ 1 file changed, 91 insertions(+) create mode 100644 arch/x86/kvm/vmx/tdx_errno.h diff --git a/arch/x86/kvm/vmx/tdx_errno.h b/arch/x86/kvm/vmx/tdx_errno.h new file mode 100644 index 000000000000..802ddc169d58 --- /dev/null +++ b/arch/x86/kvm/vmx/tdx_errno.h @@ -0,0 +1,91 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __KVM_X86_TDX_ERRNO_H +#define __KVM_X86_TDX_ERRNO_H + +/* + * TDX SEAMCALL Status Codes (returned in RAX) + */ +#define TDX_SUCCESS 0x0000000000000000 +#define TDX_NON_RECOVERABLE_VCPU 0x4000000100000000 +#define TDX_NON_RECOVERABLE_TD 0x4000000200000000 +#define TDX_INTERRUPTED_RESUMABLE 0x8000000300000000 +#define TDX_INTERRUPTED_RESTARTABLE 0x8000000400000000 +#define TDX_OPERAND_INVALID 0xC000010000000000 +#define TDX_OPERAND_ADDR_RANGE_ERROR 0xC000010100000000 +#define TDX_OPERAND_BUSY 0x8000020000000000 +#define TDX_PREVIOUS_TLB_EPOCH_BUSY 0x8000020100000000 +#define TDX_SYS_BUSY 0x8000020200000000 +#define TDX_OPERAND_PAGE_METADATA_INCORRECT 0xC000030000000000 +#define TDX_PAGE_ALREADY_FREE 0x0000030100000000 +#define TDX_TD_ASSOCIATED_PAGES_EXIST 0xC000040000000000 +#define TDX_SYSINIT_NOT_PENDING 0xC000050000000000 +#define TDX_SYSINIT_NOT_DONE 0xC000050100000000 +#define TDX_SYSINITLP_NOT_DONE 0xC000050200000000 +#define TDX_SYSINITLP_DONE 0xC000050300000000 +#define TDX_SYSCONFIGKEY_NOT_DONE 0xC000050400000000 +#define TDX_SYS_NOT_READY 0xC000050500000000 +#define TDX_SYS_SHUTDOWN 0xC000050600000000 +#define TDX_SYSCONFIG_NOT_DONE 0xC000050700000000 +#define TDX_TD_NOT_INITIALIZED 0xC000060000000000 +#define TDX_TD_INITIALIZED 0xC000060100000000 +#define TDX_TD_NOT_FINALIZED 0xC000060200000000 +#define TDX_TD_FINALIZED 0xC000060300000000 +#define TDX_TD_FATAL 0xC000060400000000 +#define TDX_TD_NON_DEBUG 0xC000060500000000 +#define TDX_TDCX_NUM_INCORRECT 0xC000061000000000 +#define TDX_VCPU_STATE_INCORRECT 0xC000070000000000 +#define TDX_VCPU_ASSOCIATED 0x8000070100000000 +#define TDX_VCPU_NOT_ASSOCIATED 0x8000070200000000 +#define TDX_TDVPX_NUM_INCORRECT 0xC000070300000000 +#define TDX_NO_VALID_VE_INFO 0xC000070400000000 +#define TDX_MAX_VCPUS_EXCEEDED 0xC000070500000000 +#define TDX_TDVPS_FIELD_NOT_WRITABLE 0xC000072000000000 +#define TDX_TDVPS_FIELD_NOT_READABLE 0xC000072100000000 +#define TDX_TD_VMCS_FIELD_NOT_INITIALIZED 0xC000073000000000 +#define TDX_KEY_GENERATION_FAILED 0x8000080000000000 +#define TDX_TD_KEYS_NOT_CONFIGURED 0x8000081000000000 +#define TDX_KEY_STATE_INCORRECT 0xC000081100000000 +#define TDX_KEY_CONFIGURED 0x0000081500000000 +#define TDX_WBCACHE_NOT_COMPLETE 0x8000081700000000 +#define TDX_HKID_NOT_FREE 0xC000082000000000 +#define TDX_NO_HKID_READY_TO_WBCACHE 0x0000082100000000 +#define TDX_WBCACHE_RESUME_ERROR 0xC000082300000000 +#define TDX_FLUSHVP_NOT_DONE 0x8000082400000000 +#define TDX_NUM_ACTIVATED_HKIDS_NOT_SUPPORRTED 0xC000082500000000 +#define TDX_INCORRECT_CPUID_VALUE 0xC000090000000000 +#define TDX_BOOT_NT4_SET 0xC000090100000000 +#define TDX_INCONSISTENT_CPUID_FIELD 0xC000090200000000 +#define TDX_CPUID_LEAF_1F_NOT_SUPPORTED 0xC000090300000000 +#define TDX_CPUID_LEAF_1F_FORMAT_UNRECOGNIZED 0xC000090400000000 +#define TDX_INVALID_WBINVD_SCOPE 0xC000090500000000 +#define TDX_INVALID_PKG_ID 0xC000090600000000 +#define TDX_SMRR_NOT_LOCKED 0xC000091000000000 +#define TDX_INVALID_SMRR_CONFIGURATION 0xC000091100000000 +#define TDX_SMRR_OVERLAPS_CMR 0xC000091200000000 +#define TDX_SMRR_LOCK_NOT_SUPPORTED 0xC000091300000000 
+#define TDX_SMRR_NOT_SUPPORTED 0xC000091400000000 +#define TDX_INCONSISTENT_MSR 0xC000092000000000 +#define TDX_INCORRECT_MSR_VALUE 0xC000092100000000 +#define TDX_SEAMREPORT_NOT_AVAILABLE 0xC000093000000000 +#define TDX_INVALID_TDMR 0xC0000A0000000000 +#define TDX_NON_ORDERED_TDMR 0xC0000A0100000000 +#define TDX_TDMR_OUTSIDE_CMRS 0xC0000A0200000000 +#define TDX_TDMR_ALREADY_INITIALIZED 0x00000A0300000000 +#define TDX_INVALID_PAMT 0xC0000A1000000000 +#define TDX_PAMT_OUTSIDE_CMRS 0xC0000A1100000000 +#define TDX_PAMT_OVERLAP 0xC0000A1200000000 +#define TDX_INVALID_RESERVED_IN_TDMR 0xC0000A2000000000 +#define TDX_NON_ORDERED_RESERVED_IN_TDMR 0xC0000A2100000000 +#define TDX_EPT_WALK_FAILED 0xC0000B0000000000 +#define TDX_EPT_ENTRY_FREE 0xC0000B0100000000 +#define TDX_EPT_ENTRY_NOT_FREE 0xC0000B0200000000 +#define TDX_EPT_ENTRY_NOT_PRESENT 0xC0000B0300000000 +#define TDX_EPT_ENTRY_NOT_LEAF 0xC0000B0400000000 +#define TDX_EPT_ENTRY_LEAF 0xC0000B0500000000 +#define TDX_GPA_RANGE_NOT_BLOCKED 0xC0000B0600000000 +#define TDX_GPA_RANGE_ALREADY_BLOCKED 0x00000B0700000000 +#define TDX_TLB_TRACKING_NOT_DONE 0xC0000B0800000000 +#define TDX_EPT_INVALID_PROMOTE_CONDITIONS 0xC0000B0900000000 +#define TDX_PAGE_ALREADY_ACCEPTED 0x00000B0A00000000 + +#endif /* __KVM_X86_TDX_ERRNO_H */ From patchwork Mon Nov 16 18:26:38 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910359 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 85E1BC64E90 for ; Mon, 16 Nov 2020 18:29:41 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 453DB20756 for ; Mon, 16 Nov 2020 18:29:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388485AbgKPS3g (ORCPT ); Mon, 16 Nov 2020 13:29:36 -0500 Received: from mga02.intel.com ([134.134.136.20]:48445 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1731941AbgKPS2S (ORCPT ); Mon, 16 Nov 2020 13:28:18 -0500 IronPort-SDR: QXOKLmbX/23N4GXPr4JDWrQtUVwsfXpSTg4SX1wy7k8RCYzCADivbaMBvdB+aSaVhcy5GkRjhv R77puYbI7zLg== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="157819186" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="157819186" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:17 -0800 IronPort-SDR: uP+JkfsVTjgEUcMPHXjLS1wTMENYh8D8HmaojHVqwVBavi/ozPBGpzhC3vy5iN083AEwmxYOsU lGi8d6efpuQg== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400528285" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:17 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . 
Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson , Kai Huang , Xiaoyao Li Subject: [RFC PATCH 53/67] KVM: TDX: Add architectural definitions for structures and values Date: Mon, 16 Nov 2020 10:26:38 -0800 Message-Id: <4e6f074f8dcf0e8248870919185539d1f5aa3d62.1605232743.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson Co-developed-by: Kai Huang Signed-off-by: Kai Huang Co-developed-by: Xiaoyao Li Signed-off-by: Xiaoyao Li Signed-off-by: Sean Christopherson --- arch/x86/kvm/vmx/tdx_arch.h | 230 ++++++++++++++++++++++++++++++++++++ 1 file changed, 230 insertions(+) create mode 100644 arch/x86/kvm/vmx/tdx_arch.h diff --git a/arch/x86/kvm/vmx/tdx_arch.h b/arch/x86/kvm/vmx/tdx_arch.h new file mode 100644 index 000000000000..d13db55e5086 --- /dev/null +++ b/arch/x86/kvm/vmx/tdx_arch.h @@ -0,0 +1,230 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __KVM_X86_TDX_ARCH_H +#define __KVM_X86_TDX_ARCH_H + +#include + +/* + * SEAMCALL API function leaf + */ +#define SEAMCALL_TDENTER 0 +#define SEAMCALL_TDADDCX 1 +#define SEAMCALL_TDADDPAGE 2 +#define SEAMCALL_TDADDSEPT 3 +#define SEAMCALL_TDADDVPX 4 +#define SEAMCALL_TDASSIGNHKID 5 +#define SEAMCALL_TDAUGPAGE 6 +#define SEAMCALL_TDBLOCK 7 +#define SEAMCALL_TDCONFIGKEY 8 +#define SEAMCALL_TDCREATE 9 +#define SEAMCALL_TDCREATEVP 10 +#define SEAMCALL_TDDBGRD 11 +#define SEAMCALL_TDDBGRDMEM 12 +#define SEAMCALL_TDDBGWR 13 +#define SEAMCALL_TDDBGWRMEM 14 +#define SEAMCALL_TDDEMOTEPAGE 15 +#define SEAMCALL_TDEXTENDMR 16 +#define SEAMCALL_TDFINALIZEMR 17 +#define SEAMCALL_TDFLUSHVP 18 +#define SEAMCALL_TDFLUSHVPDONE 19 +#define SEAMCALL_TDFREEHKIDS 20 +#define SEAMCALL_TDINIT 21 +#define SEAMCALL_TDINITVP 22 +#define SEAMCALL_TDPROMOTEPAGE 23 +#define SEAMCALL_TDRDPAGEMD 24 +#define SEAMCALL_TDRDSEPT 25 +#define SEAMCALL_TDRDVPS 26 +#define SEAMCALL_TDRECLAIMHKIDS 27 +#define SEAMCALL_TDRECLAIMPAGE 28 +#define SEAMCALL_TDREMOVEPAGE 29 +#define SEAMCALL_TDREMOVESEPT 30 +#define SEAMCALL_TDSYSCONFIGKEY 31 +#define SEAMCALL_TDSYSINFO 32 +#define SEAMCALL_TDSYSINIT 33 + +#define SEAMCALL_TDSYSINITLP 35 +#define SEAMCALL_TDSYSINITTDMR 36 +#define SEAMCALL_TDTEARDOWN 37 +#define SEAMCALL_TDTRACK 38 +#define SEAMCALL_TDUNBLOCK 39 +#define SEAMCALL_TDWBCACHE 40 +#define SEAMCALL_TDWBINVDPAGE 41 +#define SEAMCALL_TDWRSEPT 42 +#define SEAMCALL_TDWRVPS 43 +#define SEAMCALL_TDSYSSHUTDOWNLP 44 +#define SEAMCALL_TDSYSCONFIG 45 + +#define TDVMCALL_MAP_GPA 0x10001 +#define TDVMCALL_REPORT_FATAL_ERROR 0x10003 + +/* TDX control structure (TDR/TDCS/TDVPS) field access codes */ +#define TDX_CLASS_SHIFT 56 +#define TDX_FIELD_MASK GENMASK_ULL(31, 0) + +#define BUILD_TDX_FIELD(class, field) \ + (((u64)(class) << TDX_CLASS_SHIFT) | ((u64)(field) & TDX_FIELD_MASK)) + +/* @field is the VMCS field encoding */ +#define TDVPS_VMCS(field) BUILD_TDX_FIELD(0, (field)) + +/* + * @offset is the offset (in bytes) from the beginning of the architectural + * virtual APIC page. + */ +#define TDVPS_APIC(offset) BUILD_TDX_FIELD(1, (offset)) + +/* @gpr is the index of a general purpose register, e.g. 
eax=0 */ +#define TDVPS_GPR(gpr) BUILD_TDX_FIELD(16, (gpr)) + +#define TDVPS_DR(dr) BUILD_TDX_FIELD(17, (0 + (dr))) + +enum tdx_guest_other_state { + TD_VCPU_XCR0 = 32, + TD_VCPU_IWK_ENCKEY0 = 64, + TD_VCPU_IWK_ENCKEY1, + TD_VCPU_IWK_ENCKEY2, + TD_VCPU_IWK_ENCKEY3, + TD_VCPU_IWK_INTKEY0 = 68, + TD_VCPU_IWK_INTKEY1, + TD_VCPU_IWK_FLAGS = 70, +}; + +/* @field is any of enum tdx_guest_other_state */ +#define TDVPS_STATE(field) BUILD_TDX_FIELD(17, (field)) + +/* @msr is the MSR index */ +#define TDVPS_MSR(msr) BUILD_TDX_FIELD(19, (msr)) + +/* Management class fields */ +enum tdx_guest_management { + TD_VCPU_PEND_NMI = 11, +}; + +/* @field is any of enum tdx_guest_management */ +#define TDVPS_MANAGEMENT(field) BUILD_TDX_FIELD(32, (field)) + +#define TDX1_NR_TDCX_PAGES 4 +#define TDX1_NR_TDVPX_PAGES 5 + +#define TDX1_MAX_NR_CPUID_CONFIGS 6 +#define TDX1_MAX_NR_CMRS 32 +#define TDX1_MAX_NR_TDMRS 64 +#define TDX1_EXTENDMR_CHUNKSIZE 256 + +struct tdx_cpuid_config { + u32 leaf; + u32 sub_leaf; + u32 eax; + u32 ebx; + u32 ecx; + u32 edx; +} __packed; + +struct tdx_cpuid_value { + u32 eax; + u32 ebx; + u32 ecx; + u32 edx; +} __packed; + +#define TDX1_TD_ATTRIBUTE_DEBUG BIT_ULL(0) +#define TDX1_TD_ATTRIBUTE_SYSPROF BIT_ULL(1) +#define TDX1_TD_ATTRIBUTE_PKS BIT_ULL(30) +#define TDX1_TD_ATTRIBUTE_KL BIT_ULL(31) +#define TDX1_TD_ATTRIBUTE_PERFMON BIT_ULL(63) + +/* + * TD_PARAMS is provided as an input to TDINIT, the size of which is 1024B. + */ +struct td_params { + u64 attributes; + u64 xfam; + u32 max_vcpus; + u32 reserved0; + + u64 eptp_controls; + u64 exec_controls; + u16 tsc_frequency; + u8 reserved1[38]; + + u64 mrconfigid[6]; + u64 mrowner[6]; + u64 mrownerconfig[6]; + u64 reserved2[4]; + + union { + struct tdx_cpuid_value cpuid_values[0]; + u8 reserved3[768]; + }; +} __packed __aligned(1024); + +/* Guest uses MAX_PA for GPAW when set. */ +#define TDX1_EXEC_CONTROL_MAX_GPAW BIT_ULL(0) + +/* + * TDX1 requires the frequency to be defined in units of 25MHz, which is the + * frequency of the core crystal clock on TDX-capable platforms, i.e. TDX-SEAM + * can only program frequencies that are multiples of 25MHz. The frequency + * must be between 1ghz and 10ghz (inclusive). 
+ */ +#define TDX1_TSC_KHZ_TO_25MHZ(tsc_in_khz) ((tsc_in_khz) / (25 * 1000)) +#define TDX1_TSC_25MHZ_TO_KHZ(tsc_in_25mhz) ((tsc_in_25mhz) * (25 * 1000)) +#define TDX1_MIN_TSC_FREQUENCY_KHZ 1 * 1000 * 1000 +#define TDX1_MAX_TSC_FREQUENCY_KHZ 10 * 1000 * 1000 + +struct tdmr_reserved_area { + u64 offset; + u64 size; +} __packed; + +struct tdmr_info { + u64 base; + u64 size; + u64 pamt_1g_base; + u64 pamt_1g_size; + u64 pamt_2m_base; + u64 pamt_2m_size; + u64 pamt_4k_base; + u64 pamt_4k_size; + struct tdmr_reserved_area reserved_areas[16]; +} __packed __aligned(4096); + +struct cmr_info { + u64 base; + u64 size; +} __packed; + +struct tdsysinfo_struct { + /* TDX-SEAM Module Info */ + u32 attributes; + u32 vendor_id; + u32 build_date; + u16 build_num; + u16 minor_version; + u16 major_version; + u8 reserved0[14]; + /* Memory Info */ + u16 max_tdmrs; + u16 max_reserved_per_tdmr; + u16 pamt_entry_size; + u8 reserved1[10]; + /* Control Struct Info */ + u16 tdcs_base_size; + u8 reserved2[2]; + u16 tdvps_base_size; + u8 tdvps_xfam_dependent_size; + u8 reserved3[9]; + /* TD Capabilities */ + u64 attributes_fixed0; + u64 attributes_fixed1; + u64 xfam_fixed0; + u64 xfam_fixed1; + u8 reserved4[32]; + u32 num_cpuid_config; + union { + struct tdx_cpuid_config cpuid_configs[0]; + u8 reserved5[892]; + }; +} __packed __aligned(1024); + +#endif /* __KVM_X86_TDX_ARCH_H */ From patchwork Mon Nov 16 18:26:39 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910337 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8A3F4C63697 for ; Mon, 16 Nov 2020 18:28:53 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 60BA0241A3 for ; Mon, 16 Nov 2020 18:28:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388331AbgKPS2W (ORCPT ); Mon, 16 Nov 2020 13:28:22 -0500 Received: from mga02.intel.com ([134.134.136.20]:48448 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388293AbgKPS2S (ORCPT ); Mon, 16 Nov 2020 13:28:18 -0500 IronPort-SDR: ZkgBr9qgBQHl0KTPRhE7skihXqV1c6HJG/da1v+B8vxTmAQLyp92t5T/ioJrukomy6YTToy58o g5ZB1fw3b8bA== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="157819187" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="157819187" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:17 -0800 IronPort-SDR: AHzUtPvzDlGytzqQ3H5S4duGiAnTQGnwSPrF0Y3GWA3O0nEtSFhbWjdKDaKL4xRvMmJrG7X8l8 YjtT5CnHPiVQ== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400528301" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:17 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . 
Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson , Xiaoyao Li Subject: [RFC PATCH 54/67] KVM: TDX: Define TDCALL exit reason Date: Mon, 16 Nov 2020 10:26:39 -0800 Message-Id: X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson Define the TDCALL exit reason, which is carved out from the VMX exit reason namespace as the TDCALL exit from TDX guest to TDX-SEAM is really just a VM-Exit. Co-developed-by: Xiaoyao Li Signed-off-by: Xiaoyao Li Signed-off-by: Sean Christopherson --- arch/x86/include/uapi/asm/vmx.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h index b8ff9e8ac0d5..95fd84bd909a 100644 --- a/arch/x86/include/uapi/asm/vmx.h +++ b/arch/x86/include/uapi/asm/vmx.h @@ -88,6 +88,7 @@ #define EXIT_REASON_XRSTORS 64 #define EXIT_REASON_UMWAIT 67 #define EXIT_REASON_TPAUSE 68 +#define EXIT_REASON_TDCALL 77 #define VMX_EXIT_REASONS \ { EXIT_REASON_EXCEPTION_NMI, "EXCEPTION_NMI" }, \ @@ -148,7 +149,8 @@ { EXIT_REASON_XSAVES, "XSAVES" }, \ { EXIT_REASON_XRSTORS, "XRSTORS" }, \ { EXIT_REASON_UMWAIT, "UMWAIT" }, \ - { EXIT_REASON_TPAUSE, "TPAUSE" } + { EXIT_REASON_TPAUSE, "TPAUSE" }, \ + { EXIT_REASON_TDCALL, "TDCALL" } #define VMX_EXIT_REASON_FLAGS \ { VMX_EXIT_REASONS_FAILED_VMENTRY, "FAILED_VMENTRY" } From patchwork Mon Nov 16 18:26:40 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910373 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 12E0FC55ABD for ; Mon, 16 Nov 2020 18:30:12 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id BE6FF2231B for ; Mon, 16 Nov 2020 18:30:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388478AbgKPS3f (ORCPT ); Mon, 16 Nov 2020 13:29:35 -0500 Received: from mga02.intel.com ([134.134.136.20]:48448 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388296AbgKPS2S (ORCPT ); Mon, 16 Nov 2020 13:28:18 -0500 IronPort-SDR: PcUpB1VtxuHYlEa4fLYRvbx/qwmu5nm7RKQIKF6cLe4Gee1uvDqqR7Rwn0AIF0Y55NiIDYZWU2 W7fkfXhLWE2Q== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="157819189" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="157819189" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:18 -0800 IronPort-SDR: GGz46BLE+1CSeSwbw1SfIA+tXn/PkoV4eAywjZirIFZmcH9evOtMGgfHKwT4MmL1kxx5hYjvNl CoHzMwlKh06A== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400528308" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by 
orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:17 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Kai Huang Subject: [RFC PATCH 55/67] KVM: TDX: Add SEAMRR related MSRs macro definition Date: Mon, 16 Nov 2020 10:26:40 -0800 Message-Id: <7e03253675d49ee0d4af5ade35752e59147a3c69.1605232743.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Kai Huang Two new MSRs IA32_SEAMRR_PHYS_BASE and IA32_SEAMRR_PHYS_MASK are added in SPR for TDX. Add macro definition for both of them. Signed-off-by: Kai Huang --- arch/x86/include/asm/msr-index.h | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h index aad12236b33c..f42da6b11b42 100644 --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -924,4 +924,12 @@ #define MSR_VM_IGNNE 0xc0010115 #define MSR_VM_HSAVE_PA 0xc0010117 +/* Intel SEAMRR */ +#define MSR_IA32_SEAMRR_PHYS_BASE 0x00001400 +#define MSR_IA32_SEAMRR_PHYS_MASK 0x00001401 + +#define MSR_IA32_SEAMRR_PHYS_BASE_CONFIGURED (1ULL << 3) +#define MSR_IA32_SEAMRR_PHYS_MASK_ENABLED (1ULL << 11) +#define MSR_IA32_SEAMRR_PHYS_MASK_LOCKED (1ULL << 10) + #endif /* _ASM_X86_MSR_INDEX_H */ From patchwork Mon Nov 16 18:26:41 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910367 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 193A9C64E7A for ; Mon, 16 Nov 2020 18:29:41 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E7AEB2231B for ; Mon, 16 Nov 2020 18:29:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388487AbgKPS3T (ORCPT ); Mon, 16 Nov 2020 13:29:19 -0500 Received: from mga02.intel.com ([134.134.136.20]:48450 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388305AbgKPS2T (ORCPT ); Mon, 16 Nov 2020 13:28:19 -0500 IronPort-SDR: eI1idiInql3r3siB3hK3HivsGOvAqO3V4KqU4WxD62/ckbk/7qt8ZwaDRfX1ewjAcDVqWo+CVT Ml3ZyzcIFZbg== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="157819190" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="157819190" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:18 -0800 IronPort-SDR: vYXoz2ORHWm3r9tBmm8QiLlZ/HydHSE5UG2K5ulm5jji9al/VgJkMOAvd2ZLU74bMLw0VrNxsG MnDY5OxHA6Gg== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400528319" Received: from ls.sc.intel.com (HELO localhost) 
([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:18 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson , Kai Huang , Xiaoyao Li Subject: [RFC PATCH 56/67] KVM: TDX: Add macro framework to wrap TDX SEAMCALLs Date: Mon, 16 Nov 2020 10:26:41 -0800 Message-Id: <25f0d2c2f73c20309a1b578cc5fc15f4fd6b9a13.1605232743.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson Co-developed-by: Kai Huang Signed-off-by: Kai Huang Co-developed-by: Xiaoyao Li Signed-off-by: Xiaoyao Li Signed-off-by: Sean Christopherson --- arch/x86/kvm/vmx/tdx_ops.h | 531 +++++++++++++++++++++++++++++++++++++ 1 file changed, 531 insertions(+) create mode 100644 arch/x86/kvm/vmx/tdx_ops.h diff --git a/arch/x86/kvm/vmx/tdx_ops.h b/arch/x86/kvm/vmx/tdx_ops.h new file mode 100644 index 000000000000..a6f87cfe9bda --- /dev/null +++ b/arch/x86/kvm/vmx/tdx_ops.h @@ -0,0 +1,531 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __KVM_X86_TDX_OPS_H +#define __KVM_X86_TDX_OPS_H + +#include + +#include +#include + +struct tdx_ex_ret { + union { + /* Used to retrieve values from hardware. */ + struct { + u64 rcx; + u64 rdx; + u64 r8; + u64 r9; + u64 r10; + }; + /* Functions that return SEPT and level that failed. */ + struct { + u64 septep; + int level; + }; + /* TDDBG{RD,WR} return the TDR, field code, and value. */ + struct { + u64 tdr; + u64 field; + u64 field_val; + }; + /* TDDBG{RD,WR}MEM return the address and its value. */ + struct { + u64 addr; + u64 val; + }; + /* TDRDPAGEMD and TDRECLAIMPAGE return page metadata. */ + struct { + u64 page_type; + u64 owner; + u64 page_size; + }; + /* TDRDSEPT returns the contents of the SEPT entry. */ + struct { + u64 septe; + u64 ign; + }; + /* + * TDSYSINFO returns the buffer address and its size, and the + * CMR_INFO address and its number of entries. + */ + struct { + u64 buffer; + u64 nr_bytes; + u64 cmr_info; + u64 nr_cmr_entries; + }; + /* + * TDINIT and TDSYSINIT return CPUID info on error. Note, only + * the leaf and subleaf are valid on TDINIT error. + */ + struct { + u32 leaf; + u32 subleaf; + u32 eax_mask; + u32 ebx_mask; + u32 ecx_mask; + u32 edx_mask; + u32 eax_val; + u32 ebx_val; + u32 ecx_val; + u32 edx_val; + }; + /* TDSYSINITTDMR returns the input PA and next PA. */ + struct { + u64 prev; + u64 next; + }; + }; +}; + +#define pr_seamcall_error(op, err) \ + pr_err_ratelimited("SEAMCALL[" #op "] failed: 0x%llx (cpu %d)\n", \ + SEAMCALL_##op ? (err) : (err), smp_processor_id()); + +#define TDX_ERR(err, op) \ +({ \ + int __ret_warn_on = WARN_ON_ONCE(err); \ + \ + if (unlikely(__ret_warn_on)) \ + pr_seamcall_error(op, err); \ + __ret_warn_on; \ +}) + +#define tdenter(args...) ({ 0; }) + +#define seamcall ".byte 0x66,0x0f,0x01,0xcf" + +#ifndef INTEL_TDX_BOOT_TIME_SEAMCALL +#define __seamcall \ + "1:" seamcall "\n\t" \ + "jmp 3f\n\t" \ + "2: call kvm_spurious_fault\n\t" \ + "3:\n\t" \ + _ASM_EXTABLE(1b, 2b) +#else +/* + * The default BUG()s on faults, which is undesirable during boot, and calls + * kvm_spurious_fault(), which isn't linkable if KVM is built as a module. 
+ * RAX contains '0' on success, TDX-SEAM errno on failure, vector on fault. + */ +#define __seamcall \ + "1:" seamcall "\n\t" \ + "2: \n\t" \ + _ASM_EXTABLE_FAULT(1b, 2b) +#endif + +#define seamcall_N(fn, inputs...) \ +do { \ + u64 ret; \ + \ + asm volatile(__seamcall \ + : ASM_CALL_CONSTRAINT, "=a"(ret) \ + : "a"(SEAMCALL_##fn), inputs \ + : ); \ + return ret; \ +} while (0) + +#define seamcall_0(fn) \ + seamcall_N(fn, "i"(0)) +#define seamcall_1(fn, rcx) \ + seamcall_N(fn, "c"(rcx)) +#define seamcall_2(fn, rcx, rdx) \ + seamcall_N(fn, "c"(rcx), "d"(rdx)) +#define seamcall_3(fn, rcx, rdx, __r8) \ +do { \ + register long r8 asm("r8") = __r8; \ + \ + seamcall_N(fn, "c"(rcx), "d"(rdx), "r"(r8)); \ +} while (0) +#define seamcall_4(fn, rcx, rdx, __r8, __r9) \ +do { \ + register long r8 asm("r8") = __r8; \ + register long r9 asm("r9") = __r9; \ + \ + seamcall_N(fn, "c"(rcx), "d"(rdx), "r"(r8), "r"(r9)); \ +} while (0) + +#define seamcall_N_2(fn, ex, inputs...) \ +do { \ + u64 ret; \ + \ + asm volatile(__seamcall \ + : ASM_CALL_CONSTRAINT, "=a"(ret), \ + "=c"((ex)->rcx), "=d"((ex)->rdx) \ + : "a"(SEAMCALL_##fn), inputs \ + : ); \ + return ret; \ +} while (0) + +#define seamcall_0_2(fn, ex) \ + seamcall_N_2(fn, ex, "i"(0)) +#define seamcall_1_2(fn, rcx, ex) \ + seamcall_N_2(fn, ex, "c"(rcx)) +#define seamcall_2_2(fn, rcx, rdx, ex) \ + seamcall_N_2(fn, ex, "c"(rcx), "d"(rdx)) +#define seamcall_3_2(fn, rcx, rdx, __r8, ex) \ +do { \ + register long r8 asm("r8") = __r8; \ + \ + seamcall_N_2(fn, ex, "c"(rcx), "d"(rdx), "r"(r8)); \ +} while (0) +#define seamcall_4_2(fn, rcx, rdx, __r8, __r9, ex) \ +do { \ + register long r8 asm("r8") = __r8; \ + register long r9 asm("r9") = __r9; \ + \ + seamcall_N_2(fn, ex, "c"(rcx), "d"(rdx), "r"(r8), "r"(r9)); \ +} while (0) + +#define seamcall_N_3(fn, ex, inputs...) \ +do { \ + register long r8_out asm("r8"); \ + u64 ret; \ + \ + asm volatile(__seamcall \ + : ASM_CALL_CONSTRAINT, "=a"(ret), \ + "=c"((ex)->rcx), "=d"((ex)->rdx), "=r"(r8_out) \ + : "a"(SEAMCALL_##fn), inputs \ + : ); \ + (ex)->r8 = r8_out; \ + return ret; \ +} while (0) + +#define seamcall_0_3(fn, ex) \ + seamcall_N_3(fn, ex, "i"(0)) +#define seamcall_1_3(fn, rcx, ex) \ + seamcall_N_3(fn, ex, "c"(rcx)) +#define seamcall_2_3(fn, rcx, rdx, ex) \ + seamcall_N_3(fn, ex, "c"(rcx), "d"(rdx)) +#define seamcall_3_3(fn, rcx, rdx, __r8, ex) \ +do { \ + register long r8 asm("r8") = __r8; \ + \ + seamcall_N_3(fn, ex, "c"(rcx), "d"(rdx), "r"(r8)); \ +} while (0) +#define seamcall_4_3(fn, rcx, rdx, __r8, __r9, ex) \ +do { \ + register long r8 asm("r8") = __r8; \ + register long r9 asm("r9") = __r9; \ + \ + seamcall_N_3(fn, ex, "c"(rcx), "d"(rdx), "r"(r8), "r"(r9)); \ +} while (0) + +#define seamcall_N_4(fn, ex, inputs...) 
\ +do { \ + register long r8_out asm("r8"); \ + register long r9_out asm("r9"); \ + u64 ret; \ + \ + asm volatile(__seamcall \ + : ASM_CALL_CONSTRAINT, "=a"(ret), "=c"((ex)->rcx), \ + "=d"((ex)->rdx), "=r"(r8_out), "=r"(r9_out) \ + : "a"(SEAMCALL_##fn), inputs \ + : ); \ + (ex)->r8 = r8_out; \ + (ex)->r9 = r9_out; \ + return ret; \ +} while (0) + +#define seamcall_0_4(fn, ex) \ + seamcall_N_4(fn, ex, "i"(0)) +#define seamcall_1_4(fn, rcx, ex) \ + seamcall_N_4(fn, ex, "c"(rcx)) +#define seamcall_2_4(fn, rcx, rdx, ex) \ + seamcall_N_4(fn, ex, "c"(rcx), "d"(rdx)) +#define seamcall_3_4(fn, rcx, rdx, __r8, ex) \ +do { \ + register long r8 asm("r8") = __r8; \ + \ + seamcall_N_4(fn, ex, "c"(rcx), "d"(rdx), "r"(r8)); \ +} while (0) +#define seamcall_4_4(fn, rcx, rdx, __r8, __r9, ex) \ +do { \ + register long r8 asm("r8") = __r8; \ + register long r9 asm("r9") = __r9; \ + \ + seamcall_N_4(fn, ex, "c"(rcx), "d"(rdx), "r"(r8), "r"(r9)); \ +} while (0) + +#define seamcall_N_5(fn, ex, inputs...) \ +do { \ + register long r8_out asm("r8"); \ + register long r9_out asm("r9"); \ + register long r10_out asm("r10"); \ + u64 ret; \ + \ + asm volatile(__seamcall \ + : ASM_CALL_CONSTRAINT, "=a"(ret), "=c"((ex)->rcx), \ + "=d"((ex)->rdx), "=r"(r8_out), "=r"(r9_out), \ + "=r"(r10_out) \ + : "a"(SEAMCALL_##fn), inputs \ + : ); \ + (ex)->r8 = r8_out; \ + (ex)->r9 = r9_out; \ + (ex)->r10 = r10_out; \ + return ret; \ +} while (0) + +#define seamcall_0_5(fn, ex) \ + seamcall_N_5(fn, ex, "i"(0)) +#define seamcall_1_5(fn, rcx, ex) \ + seamcall_N_5(fn, ex, "c"(rcx)) +#define seamcall_2_5(fn, rcx, rdx, ex) \ + seamcall_N_5(fn, ex, "c"(rcx), "d"(rdx)) +#define seamcall_3_5(fn, rcx, rdx, __r8, ex) \ +do { \ + register long r8 asm("r8") = __r8; \ + \ + seamcall_N_5(fn, ex, "c"(rcx), "d"(rdx), "r"(r8)); \ +} while (0) +#define seamcall_4_5(fn, rcx, rdx, __r8, __r9, ex) \ +do { \ + register long r8 asm("r8") = __r8; \ + register long r9 asm("r9") = __r9; \ + \ + seamcall_N_5(fn, ex, "c"(rcx), "d"(rdx), "r"(r8), "r"(r9)); \ +} while (0) +#define seamcall_5_5(fn, rcx, rdx, __r8, __r9, __r10, ex) \ +do { \ + register long r8 asm("r8") = __r8; \ + register long r9 asm("r9") = __r9; \ + register long r10 asm("r10") = __r10; \ + \ + seamcall_N_5(fn, ex, "c"(rcx), "d"(rdx), "r"(r8), "r"(r9), "r"(r10)); \ +} while (0) + +static inline u64 tdaddcx(hpa_t tdr, hpa_t addr) +{ + seamcall_2(TDADDCX, addr, tdr); +} + +static inline u64 tdaddpage(hpa_t tdr, gpa_t gpa, hpa_t hpa, hpa_t source, + struct tdx_ex_ret *ex) +{ + seamcall_4_2(TDADDPAGE, gpa, tdr, hpa, source, ex); +} + +static inline u64 tdaddsept(hpa_t tdr, gpa_t gpa, int level, hpa_t page, + struct tdx_ex_ret *ex) +{ + seamcall_3_2(TDADDSEPT, gpa | level, tdr, page, ex); +} + +static inline u64 tdaddvpx(hpa_t tdvpr, hpa_t addr) +{ + seamcall_2(TDADDVPX, addr, tdvpr); +} + +static inline u64 tdassignhkid(hpa_t tdr, int hkid) +{ + seamcall_3(TDASSIGNHKID, tdr, 0, hkid); +} + +static inline u64 tdaugpage(hpa_t tdr, gpa_t gpa, hpa_t hpa, + struct tdx_ex_ret *ex) +{ + seamcall_3_2(TDAUGPAGE, gpa, tdr, hpa, ex); +} + +static inline u64 tdblock(hpa_t tdr, gpa_t gpa, int level, + struct tdx_ex_ret *ex) +{ + seamcall_2_2(TDBLOCK, gpa | level, tdr, ex); +} + +static inline u64 tdconfigkey(hpa_t tdr) +{ + seamcall_1(TDCONFIGKEY, tdr); +} + +static inline u64 tdcreate(hpa_t tdr, int hkid) +{ + seamcall_2(TDCREATE, tdr, hkid); +} + +static inline u64 tdcreatevp(hpa_t tdr, hpa_t tdvpr) +{ + seamcall_2(TDCREATEVP, tdvpr, tdr); +} + +static inline u64 tddbgrd(hpa_t tdr, u64 field, struct 
tdx_ex_ret *ex) +{ + seamcall_2_3(TDDBGRD, tdr, field, ex); +} + +static inline u64 tddbgwr(hpa_t tdr, u64 field, u64 val, u64 mask, + struct tdx_ex_ret *ex) +{ + seamcall_4_3(TDDBGWR, tdr, field, val, mask, ex); +} + +static inline u64 tddbgrdmem(hpa_t addr, struct tdx_ex_ret *ex) +{ + seamcall_1_2(TDDBGRDMEM, addr, ex); +} + +static inline u64 tddbgwrmem(hpa_t addr, u64 val, struct tdx_ex_ret *ex) +{ + seamcall_2_2(TDDBGWRMEM, addr, val, ex); +} + +static inline u64 tddemotepage(hpa_t tdr, gpa_t gpa, int level, hpa_t page, + struct tdx_ex_ret *ex) +{ + seamcall_3_2(TDDEMOTEPAGE, gpa | level, tdr, page, ex); +} + +static inline u64 tdextendmr(hpa_t tdr, gpa_t gpa, struct tdx_ex_ret *ex) +{ + seamcall_2_2(TDEXTENDMR, gpa, tdr, ex); +} + +static inline u64 tdfinalizemr(hpa_t tdr) +{ + seamcall_1(TDFINALIZEMR, tdr); +} + +static inline u64 tdflushvp(hpa_t tdvpr) +{ + seamcall_1(TDFLUSHVP, tdvpr); +} + +static inline u64 tdflushvpdone(hpa_t tdr) +{ + seamcall_1(TDFLUSHVPDONE, tdr); +} + +static inline u64 tdfreehkids(hpa_t tdr) +{ + seamcall_1(TDFREEHKIDS, tdr); +} + +static inline u64 tdinit(hpa_t tdr, hpa_t td_params, struct tdx_ex_ret *ex) +{ + seamcall_2_2(TDINIT, tdr, td_params, ex); +} + +static inline u64 tdinitvp(hpa_t tdvpr, u64 rcx) +{ + seamcall_2(TDINITVP, tdvpr, rcx); +} + +static inline u64 tdpromotepage(hpa_t tdr, gpa_t gpa, int level, + struct tdx_ex_ret *ex) +{ + seamcall_2_2(TDPROMOTEPAGE, gpa | level, tdr, ex); +} + +static inline u64 tdrdpagemd(hpa_t page, struct tdx_ex_ret *ex) +{ + seamcall_1_3(TDRDPAGEMD, page, ex); +} + +static inline u64 tdrdsept(hpa_t tdr, gpa_t gpa, int level, + struct tdx_ex_ret *ex) +{ + seamcall_2_2(TDRDSEPT, gpa | level, tdr, ex); +} + +static inline u64 tdrdvps(hpa_t tdvpr, u64 field, struct tdx_ex_ret *ex) +{ + seamcall_2_3(TDRDVPS, tdvpr, field, ex); +} + +static inline u64 tdreclaimhkids(hpa_t tdr) +{ + seamcall_1(TDRECLAIMHKIDS, tdr); +} + +static inline u64 tdreclaimpage(hpa_t page, struct tdx_ex_ret *ex) +{ + seamcall_1_3(TDRECLAIMPAGE, page, ex); +} + +static inline u64 tdremovepage(hpa_t tdr, gpa_t gpa, int level, + struct tdx_ex_ret *ex) +{ + seamcall_2_2(TDREMOVEPAGE, gpa | level, tdr, ex); +} + +static inline u64 tdremovesept(hpa_t tdr, gpa_t gpa, int level, + struct tdx_ex_ret *ex) +{ + seamcall_2_2(TDREMOVESEPT, gpa | level, tdr, ex); +} + +static inline u64 tdsysconfig(hpa_t tdmr, int nr_entries, int hkid) +{ + seamcall_3(TDSYSCONFIG, tdmr, nr_entries, hkid); +} + +static inline u64 tdsysconfigkey(void) +{ + seamcall_0(TDSYSCONFIGKEY); +} + +static inline u64 tdsysinfo(hpa_t tdsysinfo, int nr_bytes, hpa_t cmr_info, + int nr_cmr_entries, struct tdx_ex_ret *ex) +{ + seamcall_4_4(TDSYSINFO, tdsysinfo, nr_bytes, cmr_info, nr_cmr_entries, ex); +} + +static inline u64 tdsysinit(u64 attributes, struct tdx_ex_ret *ex) +{ + seamcall_1_5(TDSYSINIT, attributes, ex); +} + +static inline u64 tdsysinitlp(struct tdx_ex_ret *ex) +{ + seamcall_0_3(TDSYSINITLP, ex); +} + +static inline u64 tdsysinittdmr(hpa_t tdmr, struct tdx_ex_ret *ex) +{ + seamcall_1_2(TDSYSINITTDMR, tdmr, ex); +} + +static inline u64 tdsysshutdownlp(void) +{ + seamcall_0(TDSYSSHUTDOWNLP); +} + +static inline u64 tdteardown(hpa_t tdr) +{ + seamcall_1(TDTEARDOWN, tdr); +} + +static inline u64 tdtrack(hpa_t tdr) +{ + seamcall_1(TDTRACK, tdr); +} + +static inline u64 tdunblock(hpa_t tdr, gpa_t gpa, int level, + struct tdx_ex_ret *ex) +{ + seamcall_2_2(TDUNBLOCK, gpa | level, tdr, ex); +} + +static inline u64 tdwbcache(bool resume) +{ + seamcall_1(TDWBCACHE, resume ? 
1 : 0); +} + +static inline u64 tdwbinvdpage(hpa_t page) +{ + seamcall_1(TDWBINVDPAGE, page); +} + +static inline u64 tdwrsept(hpa_t tdr, gpa_t gpa, int level, u64 val, + struct tdx_ex_ret *ex) +{ + seamcall_3_2(TDWRSEPT, gpa | level, tdr, val, ex); +} + +static inline u64 tdwrvps(hpa_t tdvpr, u64 field, u64 val, u64 mask, + struct tdx_ex_ret *ex) +{ + seamcall_4_3(TDWRVPS, tdvpr, field, val, mask, ex); +} + +#endif /* __KVM_X86_TDX_OPS_H */ From patchwork Mon Nov 16 18:26:42 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910363 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 83BD0C4742C for ; Mon, 16 Nov 2020 18:29:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 48AA2206F9 for ; Mon, 16 Nov 2020 18:29:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388471AbgKPS3M (ORCPT ); Mon, 16 Nov 2020 13:29:12 -0500 Received: from mga02.intel.com ([134.134.136.20]:48453 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388312AbgKPS2U (ORCPT ); Mon, 16 Nov 2020 13:28:20 -0500 IronPort-SDR: wudqxWdr7QCrVtMNB94Lsv2ChEL2GZKRvpWS1fN1GXTYndes6LgpYOZKY5VpscLEoU6cqlYNyp MB4f9EayHM0g== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="157819193" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="157819193" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:19 -0800 IronPort-SDR: RfjYix6oNY12ACehnzgokwmrMWxMQV9LWXIvpT7Juoyg0j8kM5jeTkBwA18Q+BNkgLYroELfRX 5h5vbnJNsPjw== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400528329" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:18 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson Subject: [RFC PATCH 57/67] KVM: TDX: Stub in tdx.h with structs, accessors, and VMCS helpers Date: Mon, 16 Nov 2020 10:26:42 -0800 Message-Id: X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson Stub in kvm_tdx, vcpu_tdx, their various accessors, and VMCS helpers. The VMCS helpers, which rely on the stubs, will be used by preparatory patches to move VMX functions for accessing VMCS state to common code. 
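
For illustration only (not part of this patch), the generated accessors are intended to be used roughly as follows in later patches. The helper name below is invented for the example; GUEST_RIP is the normal VMCS field encoding from asm/vmx.h, TD_VCPU_PEND_NMI comes from tdx_arch.h, and the is_debug_td() check mirrors the gating that the common VMCS helpers add in a subsequent patch of this series:

#include <linux/kvm_host.h>
#include <asm/vmx.h>
#include "tdx.h"

static void tdx_accessor_example(struct kvm_vcpu *vcpu)
{
	struct vcpu_tdx *tdx = to_tdx(vcpu);

	/* TD VMCS state is only readable for debuggable (non-protected) TDs. */
	if (is_debug_td(vcpu))
		pr_info("guest rip = 0x%llx\n",
			td_vmcs_read64(tdx, GUEST_RIP));

	/* Pend an NMI for this vCPU via the TDVPS management class. */
	td_management_write8(tdx, TD_VCPU_PEND_NMI, 1);
}

A single macro builder keeps the read/write/set-bit/clear-bit variants for every TDVPS class consistent instead of open-coding dozens of nearly identical helpers.
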
Signed-off-by: Sean Christopherson --- arch/x86/kvm/vmx/tdx.h | 167 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 167 insertions(+) create mode 100644 arch/x86/kvm/vmx/tdx.h diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h new file mode 100644 index 000000000000..b55108a8e484 --- /dev/null +++ b/arch/x86/kvm/vmx/tdx.h @@ -0,0 +1,167 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __KVM_X86_TDX_H +#define __KVM_X86_TDX_H + +#include +#include + +#include "tdx_arch.h" +#include "tdx_errno.h" +#include "tdx_ops.h" + +#ifdef CONFIG_KVM_INTEL_TDX + +struct tdx_td_page { + unsigned long va; + hpa_t pa; + bool added; +}; + +struct kvm_tdx { + struct kvm kvm; + + struct tdx_td_page tdr; + struct tdx_td_page tdcs[TDX1_NR_TDCX_PAGES]; +}; + +struct vcpu_tdx { + struct kvm_vcpu vcpu; + + struct tdx_td_page tdvpr; + struct tdx_td_page tdvpx[TDX1_NR_TDVPX_PAGES]; +}; + +static inline bool is_td(struct kvm *kvm) +{ + return kvm->arch.vm_type == KVM_X86_TDX_VM; +} + +static inline bool is_td_vcpu(struct kvm_vcpu *vcpu) +{ + return is_td(vcpu->kvm); +} + +static inline bool is_debug_td(struct kvm_vcpu *vcpu) +{ + return !vcpu->kvm->arch.guest_state_protected; +} + +static inline struct kvm_tdx *to_kvm_tdx(struct kvm *kvm) +{ + return container_of(kvm, struct kvm_tdx, kvm); +} + +static inline struct vcpu_tdx *to_tdx(struct kvm_vcpu *vcpu) +{ + return container_of(vcpu, struct vcpu_tdx, vcpu); +} + +static __always_inline void tdvps_vmcs_check(u32 field, u8 bits) +{ + BUILD_BUG_ON_MSG(__builtin_constant_p(field) && (field) & 0x1, + "Read/Write to TD VMCS *_HIGH fields not supported"); + + BUILD_BUG_ON(bits != 16 && bits != 32 && bits != 64); + + BUILD_BUG_ON_MSG(bits != 64 && __builtin_constant_p(field) && + (((field) & 0x6000) == 0x2000 || + ((field) & 0x6000) == 0x6000), + "Invalid TD VMCS access for 64-bit field"); + BUILD_BUG_ON_MSG(bits != 32 && __builtin_constant_p(field) && + ((field) & 0x6000) == 0x4000, + "Invalid TD VMCS access for 32-bit field"); + BUILD_BUG_ON_MSG(bits != 16 && __builtin_constant_p(field) && + ((field) & 0x6000) == 0x0000, + "Invalid TD VMCS access for 16-bit field"); +} + +static __always_inline void tdvps_gpr_check(u64 field, u8 bits) +{ + BUILD_BUG_ON_MSG(__builtin_constant_p(field) && (field) >= NR_VCPU_REGS, + "Invalid TD guest GPR index"); +} + +static __always_inline void tdvps_apic_check(u64 field, u8 bits) {} +static __always_inline void tdvps_dr_check(u64 field, u8 bits) {} +static __always_inline void tdvps_state_check(u64 field, u8 bits) {} +static __always_inline void tdvps_msr_check(u64 field, u8 bits) {} +static __always_inline void tdvps_management_check(u64 field, u8 bits) {} + +#define TDX_BUILD_TDVPS_ACCESSORS(bits, uclass, lclass) \ +static __always_inline u##bits td_##lclass##_read##bits(struct vcpu_tdx *tdx, \ + u32 field) \ +{ \ + struct tdx_ex_ret ex_ret; \ + u64 err; \ + \ + tdvps_##lclass##_check(field, bits); \ + err = tdrdvps(tdx->tdvpr.pa, TDVPS_##uclass(field), &ex_ret); \ + if (unlikely(err)) { \ + pr_err("TDRDVPS["#uclass".0x%x] failed: 0x%llx\n", field, err);\ + return 0; \ + } \ + return (u##bits)ex_ret.r8; \ +} \ +static __always_inline void td_##lclass##_write##bits(struct vcpu_tdx *tdx, \ + u32 field, u##bits val) \ +{ \ + struct tdx_ex_ret ex_ret; \ + u64 err; \ + \ + tdvps_##lclass##_check(field, bits); \ + err = tdwrvps(tdx->tdvpr.pa, TDVPS_##uclass(field), val, \ + GENMASK_ULL(bits - 1, 0), &ex_ret); \ + if (unlikely(err)) \ + pr_err("TDWRVPS["#uclass".0x%x] = 0x%llx failed: 0x%llx\n", \ + field, (u64)val, 
err); \ +} \ +static __always_inline void td_##lclass##_setbit##bits(struct vcpu_tdx *tdx, \ + u32 field, u64 bit) \ +{ \ + struct tdx_ex_ret ex_ret; \ + u64 err; \ + \ + tdvps_##lclass##_check(field, bits); \ + err = tdwrvps(tdx->tdvpr.pa, TDVPS_##uclass(field), bit, bit, &ex_ret);\ + if (unlikely(err)) \ + pr_err("TDWRVPS["#uclass".0x%x] |= 0x%llx failed: 0x%llx\n", \ + field, bit, err); \ +} \ +static __always_inline void td_##lclass##_clearbit##bits(struct vcpu_tdx *tdx, \ + u32 field, u64 bit) \ +{ \ + struct tdx_ex_ret ex_ret; \ + u64 err; \ + \ + tdvps_##lclass##_check(field, bits); \ + err = tdwrvps(tdx->tdvpr.pa, TDVPS_##uclass(field), 0, bit, &ex_ret); \ + if (unlikely(err)) \ + pr_err("TDWRVPS["#uclass".0x%x] &= ~0x%llx failed: 0x%llx\n", \ + field, bit, err); \ +} + +TDX_BUILD_TDVPS_ACCESSORS(16, VMCS, vmcs); +TDX_BUILD_TDVPS_ACCESSORS(32, VMCS, vmcs); +TDX_BUILD_TDVPS_ACCESSORS(64, VMCS, vmcs); + +TDX_BUILD_TDVPS_ACCESSORS(64, APIC, apic); +TDX_BUILD_TDVPS_ACCESSORS(64, GPR, gpr); +TDX_BUILD_TDVPS_ACCESSORS(64, DR, dr); +TDX_BUILD_TDVPS_ACCESSORS(64, STATE, state); +TDX_BUILD_TDVPS_ACCESSORS(64, MSR, msr); +TDX_BUILD_TDVPS_ACCESSORS(8, MANAGEMENT, management); + +#else + +struct kvm_tdx; +struct vcpu_tdx; + +static inline bool is_td(struct kvm *kvm) { return false; } +static inline bool is_td_vcpu(struct kvm_vcpu *vcpu) { return false; } +static inline bool is_debug_td(struct kvm_vcpu *vcpu) { return false; } +static inline struct kvm_tdx *to_kvm_tdx(struct kvm *kvm) { return NULL; } +static inline struct vcpu_tdx *to_tdx(struct kvm_vcpu *vcpu) { return NULL; } + +#endif /* CONFIG_KVM_INTEL_TDX */ + +#endif /* __KVM_X86_TDX_H */ From patchwork Mon Nov 16 18:26:43 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910361 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4DB0BC6379F for ; Mon, 16 Nov 2020 18:29:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 23B5B2231B for ; Mon, 16 Nov 2020 18:29:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388459AbgKPS3G (ORCPT ); Mon, 16 Nov 2020 13:29:06 -0500 Received: from mga02.intel.com ([134.134.136.20]:48453 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388317AbgKPS2V (ORCPT ); Mon, 16 Nov 2020 13:28:21 -0500 IronPort-SDR: n/m9Zva0Mc5aknYfyhNvp02npyMv6qkbHnAq8wDAQoow8eYlG33n3jNl5MDYqEsIHv0W3k7FZO Yb4kn+Ksz5zQ== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="157819199" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="157819199" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:19 -0800 IronPort-SDR: AqtDAvX5LM2JAXr/Y+/57MQKnPKjTrQw+7pJJA77ng9VkPN5mC3045vHsh8xjvAGGN3RTkBe9v YAFjQXjlrPHw== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400528339" 
Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:19 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson Subject: [RFC PATCH 58/67] KVM: VMX: Add macro framework to read/write VMCS for VMs and TDs Date: Mon, 16 Nov 2020 10:26:43 -0800 Message-Id: <3a5f49671c7bc14323bfb15ac04c90b28b6faaad.1605232743.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson Add a macro framework to hide VMX vs. TDX details of VMREAD and VMWRITE so the VMX and TDX can shared common flows, e.g. accessing DTs. Note, the TDX paths are dead code at this time. There is no great way to deal with the chicken-and-egg scenario of having things in place for TDX without first having TDX. Signed-off-by: Sean Christopherson --- arch/x86/kvm/vmx/common.h | 41 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 41 insertions(+) diff --git a/arch/x86/kvm/vmx/common.h b/arch/x86/kvm/vmx/common.h index 58edf1296cbd..baee96abdd7e 100644 --- a/arch/x86/kvm/vmx/common.h +++ b/arch/x86/kvm/vmx/common.h @@ -11,6 +11,47 @@ #include "vmcs.h" #include "vmx.h" #include "x86.h" +#include "tdx.h" + +#ifdef CONFIG_KVM_INTEL_TDX +#define VT_BUILD_VMCS_HELPERS(type, bits, tdbits) \ +static __always_inline type vmread##bits(struct kvm_vcpu *vcpu, \ + unsigned long field) \ +{ \ + if (unlikely(is_td_vcpu(vcpu))) { \ + if (KVM_BUG_ON(!is_debug_td(vcpu), vcpu->kvm)) \ + return 0; \ + return td_vmcs_read##tdbits(to_tdx(vcpu), field); \ + } \ + return vmcs_read##bits(field); \ +} \ +static __always_inline void vmwrite##bits(struct kvm_vcpu *vcpu, \ + unsigned long field, type value) \ +{ \ + if (unlikely(is_td_vcpu(vcpu))) { \ + if (KVM_BUG_ON(!is_debug_td(vcpu), vcpu->kvm)) \ + return; \ + return td_vmcs_write##tdbits(to_tdx(vcpu), field, value); \ + } \ + vmcs_write##bits(field, value); \ +} +#else +#define VT_BUILD_VMCS_HELPERS(type, bits, tdbits) \ +static __always_inline type vmread##bits(struct kvm_vcpu *vcpu, \ + unsigned long field) \ +{ \ + return vmcs_read##bits(field); \ +} \ +static __always_inline void vmwrite##bits(struct kvm_vcpu *vcpu, \ + unsigned long field, type value) \ +{ \ + vmcs_write##bits(field, value); \ +} +#endif /* CONFIG_KVM_INTEL_TDX */ +VT_BUILD_VMCS_HELPERS(u16, 16, 16); +VT_BUILD_VMCS_HELPERS(u32, 32, 32); +VT_BUILD_VMCS_HELPERS(u64, 64, 64); +VT_BUILD_VMCS_HELPERS(unsigned long, l, 64); void vmx_handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu, u32 intr_info); From patchwork Mon Nov 16 18:26:44 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910357 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by 
smtp.lore.kernel.org (Postfix) with ESMTP id E93EBC6379D for ; Mon, 16 Nov 2020 18:29:39 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C3E5C206F9 for ; Mon, 16 Nov 2020 18:29:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388443AbgKPS25 (ORCPT ); Mon, 16 Nov 2020 13:28:57 -0500 Received: from mga02.intel.com ([134.134.136.20]:48450 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388318AbgKPS2V (ORCPT ); Mon, 16 Nov 2020 13:28:21 -0500 IronPort-SDR: GoqmwuYb1yHWad8PA9dg+Ebf2McGBwXOEKEFSEKf/aWKqwTIfTeP2ONigBcF5KxyXd551/qUxl /SqRLlwss4Yw== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="157819201" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="157819201" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:20 -0800 IronPort-SDR: GMIiLWRsDiTa3jAhAO/6pmbiYxQ8YtlkFI5wH29mDkGhwfhHucPWPhe64exQaAm0Oprn6HUL/e ndOcqO+dIMWg== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400528348" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:20 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson Subject: [RFC PATCH 59/67] KVM: VMX: Move AR_BYTES encoder/decoder helpers to common.h Date: Mon, 16 Nov 2020 10:26:44 -0800 Message-Id: X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson Move the AR_BYTES helpers to common.h so that future patches can reuse them to decode/encode AR for TDX. Signed-off-by: Sean Christopherson --- arch/x86/kvm/vmx/common.h | 41 ++++++++++++++++++++++++++++++++++ arch/x86/kvm/vmx/vmx.c | 46 ++++----------------------------------- 2 files changed, 45 insertions(+), 42 deletions(-) diff --git a/arch/x86/kvm/vmx/common.h b/arch/x86/kvm/vmx/common.h index baee96abdd7e..ad106364c51f 100644 --- a/arch/x86/kvm/vmx/common.h +++ b/arch/x86/kvm/vmx/common.h @@ -4,6 +4,7 @@ #include +#include #include #include @@ -121,4 +122,44 @@ static inline int __vmx_handle_ept_violation(struct kvm_vcpu *vcpu, gpa_t gpa, return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0); } +static inline u32 vmx_encode_ar_bytes(struct kvm_segment *var) +{ + u32 ar; + + if (var->unusable || !var->present) + ar = 1 << 16; + else { + ar = var->type & 15; + ar |= (var->s & 1) << 4; + ar |= (var->dpl & 3) << 5; + ar |= (var->present & 1) << 7; + ar |= (var->avl & 1) << 12; + ar |= (var->l & 1) << 13; + ar |= (var->db & 1) << 14; + ar |= (var->g & 1) << 15; + } + + return ar; +} + +static inline void vmx_decode_ar_bytes(u32 ar, struct kvm_segment *var) +{ + var->unusable = (ar >> 16) & 1; + var->type = ar & 15; + var->s = (ar >> 4) & 1; + var->dpl = (ar >> 5) & 3; + /* + * Some userspaces do not preserve unusable property. 
Since usable + * segment has to be present according to VMX spec we can use present + * property to amend userspace bug by making unusable segment always + * nonpresent. vmx_encode_ar_bytes() already marks nonpresent + * segment as unusable. + */ + var->present = !var->unusable; + var->avl = (ar >> 12) & 1; + var->l = (ar >> 13) & 1; + var->db = (ar >> 14) & 1; + var->g = (ar >> 15) & 1; +} + #endif /* __KVM_X86_VMX_COMMON_H */ diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 85401a7eef9a..8bd71b91c6f0 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -361,7 +361,6 @@ static const struct kernel_param_ops vmentry_l1d_flush_ops = { }; module_param_cb(vmentry_l1d_flush, &vmentry_l1d_flush_ops, NULL, 0644); -static u32 vmx_segment_access_rights(struct kvm_segment *var); static __always_inline void vmx_disable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type); @@ -2736,7 +2735,7 @@ static void fix_rmode_seg(int seg, struct kvm_segment *save) vmcs_write16(sf->selector, var.selector); vmcs_writel(sf->base, var.base); vmcs_write32(sf->limit, var.limit); - vmcs_write32(sf->ar_bytes, vmx_segment_access_rights(&var)); + vmcs_write32(sf->ar_bytes, vmx_encode_ar_bytes(&var)); } static void enter_rmode(struct kvm_vcpu *vcpu) @@ -3131,7 +3130,6 @@ int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) void vmx_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg) { struct vcpu_vmx *vmx = to_vmx(vcpu); - u32 ar; if (vmx->rmode.vm86_active && seg != VCPU_SREG_LDTR) { *var = vmx->rmode.segs[seg]; @@ -3145,23 +3143,7 @@ void vmx_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg) var->base = vmx_read_guest_seg_base(vmx, seg); var->limit = vmx_read_guest_seg_limit(vmx, seg); var->selector = vmx_read_guest_seg_selector(vmx, seg); - ar = vmx_read_guest_seg_ar(vmx, seg); - var->unusable = (ar >> 16) & 1; - var->type = ar & 15; - var->s = (ar >> 4) & 1; - var->dpl = (ar >> 5) & 3; - /* - * Some userspaces do not preserve unusable property. Since usable - * segment has to be present according to VMX spec we can use present - * property to amend userspace bug by making unusable segment always - * nonpresent. vmx_segment_access_rights() already marks nonpresent - * segment as unusable. 
- */ - var->present = !var->unusable; - var->avl = (ar >> 12) & 1; - var->l = (ar >> 13) & 1; - var->db = (ar >> 14) & 1; - var->g = (ar >> 15) & 1; + vmx_decode_ar_bytes(vmx_read_guest_seg_ar(vmx, seg), var); } static u64 vmx_get_segment_base(struct kvm_vcpu *vcpu, int seg) @@ -3187,26 +3169,6 @@ int vmx_get_cpl(struct kvm_vcpu *vcpu) } } -static u32 vmx_segment_access_rights(struct kvm_segment *var) -{ - u32 ar; - - if (var->unusable || !var->present) - ar = 1 << 16; - else { - ar = var->type & 15; - ar |= (var->s & 1) << 4; - ar |= (var->dpl & 3) << 5; - ar |= (var->present & 1) << 7; - ar |= (var->avl & 1) << 12; - ar |= (var->l & 1) << 13; - ar |= (var->db & 1) << 14; - ar |= (var->g & 1) << 15; - } - - return ar; -} - void vmx_set_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg) { struct vcpu_vmx *vmx = to_vmx(vcpu); @@ -3241,7 +3203,7 @@ void vmx_set_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg) if (is_unrestricted_guest(vcpu) && (seg != VCPU_SREG_LDTR)) var->type |= 0x1; /* Accessed */ - vmcs_write32(sf->ar_bytes, vmx_segment_access_rights(var)); + vmcs_write32(sf->ar_bytes, vmx_encode_ar_bytes(var)); out: vmx->emulation_required = emulation_required(vcpu); @@ -3288,7 +3250,7 @@ static bool rmode_segment_valid(struct kvm_vcpu *vcpu, int seg) var.dpl = 0x3; if (seg == VCPU_SREG_CS) var.type = 0x3; - ar = vmx_segment_access_rights(&var); + ar = vmx_encode_ar_bytes(&var); if (var.base != (var.selector << 4)) return false; From patchwork Mon Nov 16 18:26:45 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910355 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 89CC4C55ABD for ; Mon, 16 Nov 2020 18:29:39 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4574F20756 for ; Mon, 16 Nov 2020 18:29:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388434AbgKPS2x (ORCPT ); Mon, 16 Nov 2020 13:28:53 -0500 Received: from mga02.intel.com ([134.134.136.20]:48454 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388320AbgKPS2W (ORCPT ); Mon, 16 Nov 2020 13:28:22 -0500 IronPort-SDR: SRRvIOYy8jzW3cnPpnIcvCTOB3vJvLp6QjLOfrQrbUR7q1oeTAmrvhCp7ai8imYa9c8tffcG7s pnyKz+O3Quyw== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="157819203" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="157819203" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:20 -0800 IronPort-SDR: 6fZ+HPVnHsswxLy/JLzT3ch5u1zP0wpolP1FBu27+bi37mjPfgZHykXx4Ig8MZnBPxnj5Gh6et q6/ZbSH6Y23A== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400528361" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:20 -0800 From: 
isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson Subject: [RFC PATCH 60/67] KVM: VMX: MOVE GDT and IDT accessors to common code Date: Mon, 16 Nov 2020 10:26:45 -0800 Message-Id: X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson Signed-off-by: Sean Christopherson --- arch/x86/kvm/vmx/main.c | 6 ++++-- arch/x86/kvm/vmx/vmx.c | 12 ------------ 2 files changed, 4 insertions(+), 14 deletions(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 30b1815fd5a7..53e1ea8df861 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -317,7 +317,8 @@ static int vt_set_efer(struct kvm_vcpu *vcpu, u64 efer) static void vt_get_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) { - vmx_get_idt(vcpu, dt); + dt->size = vmread32(vcpu, GUEST_IDTR_LIMIT); + dt->address = vmreadl(vcpu, GUEST_IDTR_BASE); } static void vt_set_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) @@ -327,7 +328,8 @@ static void vt_set_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) static void vt_get_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) { - vmx_get_gdt(vcpu, dt); + dt->size = vmread32(vcpu, GUEST_GDTR_LIMIT); + dt->address = vmreadl(vcpu, GUEST_GDTR_BASE); } static void vt_set_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 8bd71b91c6f0..93b319eacdfa 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -3217,24 +3217,12 @@ static void vmx_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l) *l = (ar >> 13) & 1; } -static void vmx_get_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) -{ - dt->size = vmcs_read32(GUEST_IDTR_LIMIT); - dt->address = vmcs_readl(GUEST_IDTR_BASE); -} - static void vmx_set_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) { vmcs_write32(GUEST_IDTR_LIMIT, dt->size); vmcs_writel(GUEST_IDTR_BASE, dt->address); } -static void vmx_get_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) -{ - dt->size = vmcs_read32(GUEST_GDTR_LIMIT); - dt->address = vmcs_readl(GUEST_GDTR_BASE); -} - static void vmx_set_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) { vmcs_write32(GUEST_GDTR_LIMIT, dt->size); From patchwork Mon Nov 16 18:26:46 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910353 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D086BC63697 for ; Mon, 16 Nov 2020 18:29:39 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 947902231B for ; Mon, 16 Nov 2020 18:29:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388448AbgKPS25 (ORCPT ); Mon, 16 Nov 2020 13:28:57 -0500 Received: from 
mga02.intel.com ([134.134.136.20]:48453 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388325AbgKPS2V (ORCPT ); Mon, 16 Nov 2020 13:28:21 -0500 IronPort-SDR: KvZU+V2usUvHeCkxcHIB3Vx7Von7iGxSIfbCTVrNk9Xh7Ne0Rlox4tWoF7aERCfLeXfDIUbeId SY1T4idZOFlg== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="157819210" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="157819210" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:21 -0800 IronPort-SDR: NikdYxL7VqlcIQ45AmL/S77+dt/pBzwYeGgoOlDCgtLmJb+8VgH7n52VsMTjrL0WxDMMpfD3SJ kK7BuhX/jg/A== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400528372" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:21 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson Subject: [RFC PATCH 61/67] KVM: VMX: Move .get_interrupt_shadow() implementation to common VMX code Date: Mon, 16 Nov 2020 10:26:46 -0800 Message-Id: X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson Signed-off-by: Sean Christopherson --- arch/x86/kvm/vmx/common.h | 14 ++++++++++++++ arch/x86/kvm/vmx/vmx.c | 10 +--------- 2 files changed, 15 insertions(+), 9 deletions(-) diff --git a/arch/x86/kvm/vmx/common.h b/arch/x86/kvm/vmx/common.h index ad106364c51f..8519423bfd88 100644 --- a/arch/x86/kvm/vmx/common.h +++ b/arch/x86/kvm/vmx/common.h @@ -122,6 +122,20 @@ static inline int __vmx_handle_ept_violation(struct kvm_vcpu *vcpu, gpa_t gpa, return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0); } +static inline u32 __vmx_get_interrupt_shadow(struct kvm_vcpu *vcpu) +{ + u32 interruptibility; + int ret = 0; + + interruptibility = vmread32(vcpu, GUEST_INTERRUPTIBILITY_INFO); + if (interruptibility & GUEST_INTR_STATE_STI) + ret |= KVM_X86_SHADOW_INT_STI; + if (interruptibility & GUEST_INTR_STATE_MOV_SS) + ret |= KVM_X86_SHADOW_INT_MOV_SS; + + return ret; +} + static inline u32 vmx_encode_ar_bytes(struct kvm_segment *var) { u32 ar; diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 93b319eacdfa..9c15df71700d 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -1461,15 +1461,7 @@ void vmx_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags) u32 vmx_get_interrupt_shadow(struct kvm_vcpu *vcpu) { - u32 interruptibility = vmcs_read32(GUEST_INTERRUPTIBILITY_INFO); - int ret = 0; - - if (interruptibility & GUEST_INTR_STATE_STI) - ret |= KVM_X86_SHADOW_INT_STI; - if (interruptibility & GUEST_INTR_STATE_MOV_SS) - ret |= KVM_X86_SHADOW_INT_MOV_SS; - - return ret; + return __vmx_get_interrupt_shadow(vcpu); } void vmx_set_interrupt_shadow(struct kvm_vcpu *vcpu, int mask) From patchwork Mon Nov 16 18:26:47 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910347 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on 
aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 66E0EC2D0A3 for ; Mon, 16 Nov 2020 18:28:55 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1D98224199 for ; Mon, 16 Nov 2020 18:28:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388354AbgKPS2Y (ORCPT ); Mon, 16 Nov 2020 13:28:24 -0500 Received: from mga02.intel.com ([134.134.136.20]:48454 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388334AbgKPS2X (ORCPT ); Mon, 16 Nov 2020 13:28:23 -0500 IronPort-SDR: tQaWWjdg+iibMpXQ/3mw0MCZjFLjhfLhtlnh8Ku1uqH8AEcr94+l7sRgrIngxCEu63STll88mK TCb5Ic7rmaVg== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="157819211" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="157819211" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:22 -0800 IronPort-SDR: fUllWTxfGS6xw1Qk7QFGZ+35j+NBk0e18JZ7/O/HxgDyKyrQZJU9vOJ/ILaHc6a6qa9jhGBIeH XcH6xKnmEOnw== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400528377" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:21 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson , Kai Huang , Xiaoyao Li Subject: [RFC PATCH 62/67] KVM: TDX: Load and init TDX-SEAM module during boot Date: Mon, 16 Nov 2020 10:26:47 -0800 Message-Id: <542b02522475c69143e3ac8bcf6014b7db03bd55.1605232743.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson

Add a hook into the early boot flow to load TDX-SEAM and do BSP-only init of TDX-SEAM. Perform the TDSYSINIT, TDSYSINITLP sequence to initialize TDX during kernel boot: call TDSYSINIT on the BSP for platform-level initialization, and call TDSYSINITLP on all CPUs for per-cpu initialization. On the BSP, also call TDSYSINFO to retrieve the TDX info right after TDSYSINITLP.

While TDX initialization on APs is done in identify_cpu() when each AP is brought up, on the BSP it is done right after the SEAM module is loaded, not in identify_cpu(). The reason is that constructing TDMRs must happen before the kernel's normal page allocator is up, since it requires reserving a large amount of memory for the PAMTs (>4MB), which the page allocator cannot provide. Determining how much memory to reserve for the PAMTs requires the TDX info returned by TDSYSINFO, so that too must be done on the BSP right after TDSYSINITLP.

Check kernel parameters and other variables that prevent/indicate that not all logical CPUs can be onlined. TDSYSINITLP must be called on all logical CPUs as part of TDX-SEAM configuration, e.g. TDSYSCONFIG is guaranteed to fail if not all CPUs are onlined. Query the 'nr_cpus', 'possible_cpus' and 'maxcpus' kernel parameters, as well as the 'disabled_cpus' counter that can be incremented during ACPI parsing (CPUs marked as disabled cannot be brought up later). Note, the kernel ignores the "Online Capable" bit defined in the ACPI specification v6.3, section 5.2.12.2 "Processor Local APIC Structure" (a CPU marked as disabled, i.e. with the "Enabled" bit cleared, may be brought up later by the OS if "Online Capable" is set) and simply treats ACPI hot-added CPUs as enabled, i.e. with ACPI CPU hotplug the aforementioned variables can change dynamically post-boot. But CPU hotplug is unsupported on TDX-enabled systems, therefore the variables are effectively constant post-boot for TDX. In the post-SMP boot phase (tdx_init()), verify that all present CPUs were successfully booted. Note that this also covers the SMT=off case, i.e. it verifies that to-be-disabled sibling threads are booted and run through TDSYSINITLP.

Detect the TDX private KeyID range by reading MSR_IA32_MKTME_KEYID_PART, which is configured by BIOS and partitions the MKTME KeyID space into regular KeyIDs and TDX-only KeyIDs. Disable TDX if the partitioning is not consistent across all CPUs, i.e. if BIOS screwed up.

Construct Trust Domain Memory Regions (TDMRs) based on the info reported by TDSYSINFO. For simplicity, all system memory is configured as TDMRs; otherwise the page allocator would need to be modified to distinguish normal and TD memory allocations. The overhead of marking all memory as TDMRs is the memory needed for TDX-SEAM's Physical Address Metadata Tables (PAMTs) used to track the TDMRs. TDMRs are constructed (and the PAMTs associated with them are reserved) on a per-NUMA-node basis for better performance: when accessing TD memory in a TDMR, the CPU doesn't have to access a PAMT in a remote node.

Sanity check that the CMRs reported by TDSYSINFO cover all memory reported in e820, and disable TDX if there is a discrepancy. If there is memory available to the kernel (reported in e820) that is not covered by a TDMR, the page allocator could hand out a page that is not usable as TD memory, i.e. would break KVM.

Once all enumeration and sanity checking is done, call TDSYSCONFIG, TDSYSCONFIGKEY and TDSYSINITTDMR to configure and initialize the TDMRs.
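
To make the intended ordering concrete, below is an illustrative-only sketch (not part of the patch) built on the tdx_ops.h wrappers added earlier in this series. The helper name is invented; PAMT reservation and NUMA-aware TDMR construction are elided; the TDSYSINITLP calls on the APs (done from identify_cpu()) are not shown; and in the real tdx.c TDSYSCONFIGKEY must be issued on every package and TDSYSINITTDMR is re-invoked using the next-address it returns, so treat the exact argument handling here as an assumption.

#include <linux/init.h>
#include <linux/errno.h>
#include <asm/page.h>		/* __pa() */
#include "tdx_arch.h"
#include "tdx_ops.h"

static int __init tdx_bsp_init_sketch(u64 *tdmr_pa_array, int nr_tdmrs,
				      int tdx_keyid)
{
	static struct tdsysinfo_struct sysinfo;		/* 1024B aligned per tdx_arch.h */
	static struct cmr_info cmrs[TDX1_MAX_NR_CMRS];
	struct tdx_ex_ret ex;
	int i;

	/* Platform-level init, then per-LP init on the BSP itself. */
	if (tdsysinit(0, &ex) || tdsysinitlp(&ex))
		return -EIO;

	/* Enumerate module and memory info needed to size PAMTs and TDMRs. */
	if (tdsysinfo(__pa(&sysinfo), sizeof(sysinfo), __pa(cmrs),
		      TDX1_MAX_NR_CMRS, &ex))
		return -EIO;

	/* ... caller builds @tdmr_pa_array covering all e820 memory and
	 *     reserves the PAMTs, which is omitted here ... */

	/* Hand the TDMRs to TDX-SEAM and program the global private KeyID. */
	if (tdsysconfig(__pa(tdmr_pa_array), nr_tdmrs, tdx_keyid) ||
	    tdsysconfigkey())
		return -EIO;

	/* Initialize each TDMR (simplified to one call per TDMR). */
	for (i = 0; i < nr_tdmrs; i++) {
		if (tdsysinittdmr(tdmr_pa_array[i], &ex))
			return -EIO;
	}

	return 0;
}
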
Signed-off-by: Kai Huang Co-developed-by: Xiaoyao Li Signed-off-by: Xiaoyao Li Co-developed-by: Sean Christopherson Signed-off-by: Sean Christopherson --- arch/x86/Kbuild | 1 + arch/x86/include/asm/kvm_boot.h | 43 + arch/x86/kernel/cpu/intel.c | 4 + arch/x86/kernel/setup.c | 3 + arch/x86/kvm/Kconfig | 8 + arch/x86/kvm/boot/Makefile | 5 + arch/x86/kvm/boot/seam/seamldr.S | 188 +++++ arch/x86/kvm/boot/seam/seamloader.c | 162 ++++ arch/x86/kvm/boot/seam/tdx.c | 1131 +++++++++++++++++++++++++++ 9 files changed, 1545 insertions(+) create mode 100644 arch/x86/include/asm/kvm_boot.h create mode 100644 arch/x86/kvm/boot/Makefile create mode 100644 arch/x86/kvm/boot/seam/seamldr.S create mode 100644 arch/x86/kvm/boot/seam/seamloader.c create mode 100644 arch/x86/kvm/boot/seam/tdx.c diff --git a/arch/x86/Kbuild b/arch/x86/Kbuild index 30dec019756b..4f35eaad7468 100644 --- a/arch/x86/Kbuild +++ b/arch/x86/Kbuild @@ -4,6 +4,7 @@ obj-y += entry/ obj-$(CONFIG_PERF_EVENTS) += events/ obj-$(CONFIG_KVM) += kvm/ +obj-$(subst m,y,$(CONFIG_KVM)) += kvm/boot/ # Xen paravirtualization support obj-$(CONFIG_XEN) += xen/ diff --git a/arch/x86/include/asm/kvm_boot.h b/arch/x86/include/asm/kvm_boot.h new file mode 100644 index 000000000000..5054fb324283 --- /dev/null +++ b/arch/x86/include/asm/kvm_boot.h @@ -0,0 +1,43 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +#ifndef _ASM_X86_KVM_BOOT_H +#define _ASM_X86_KVM_BOOT_H + +#include +#include +#include +#include +#include + +#ifdef CONFIG_KVM_INTEL_TDX +int __init seam_load_module(void *module, unsigned long module_size, + void *sigstruct, unsigned long sigstruct_size, + void *seamldr, unsigned long seamldr_size); + +void __init tdx_seam_init(void); +void tdx_init_cpu(struct cpuinfo_x86 *c); + +void tdx_seamcall_on_other_pkgs(smp_call_func_t fn, void *param, + struct mutex *lock); +#define tdx_seamcall_on_each_pkg(fn, param, lock) \ +do { \ + fn(param); \ + if (topology_max_packages() > 1) \ + tdx_seamcall_on_other_pkgs(fn, param, lock); \ +} while (0) + +/* + * Return pointer to TDX system info (TDSYSINFO_STRUCT) if TDX has been + * successfully initialized, or NULL. 
+ */ +struct tdsysinfo_struct; +struct tdsysinfo_struct *tdx_get_sysinfo(void); + +/* TDX keyID allocation functions */ +extern int tdx_keyid_alloc(void); +extern void tdx_keyid_free(int keyid); +#else +static inline void __init tdx_seam_init(void) {} +static inline void tdx_init_cpu(struct cpuinfo_x86 *c) {} +#endif + +#endif /* _ASM_X86_KVM_BOOT_H */ diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index 59a1e3ce3f14..bd6338433873 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -15,6 +15,7 @@ #include #include #include +#include #include #include #include @@ -711,6 +712,9 @@ static void init_intel(struct cpuinfo_x86 *c) if (cpu_has(c, X86_FEATURE_TME)) detect_tme(c); + if (cpu_has(c, X86_FEATURE_TDX)) + tdx_init_cpu(c); + init_intel_misc_features(c); if (tsx_ctrl_state == TSX_CTRL_ENABLE) diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index 84f581c91db4..3bf04246efd1 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -38,6 +38,7 @@ #include #include #include +#include #include #include #include @@ -1200,6 +1201,8 @@ void __init setup_arch(char **cmdline_p) prefill_possible_map(); + tdx_seam_init(); + init_cpu_to_node(); init_gi_nodes(); diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index f92dfd8ef10d..6f41966c69a7 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -84,6 +84,14 @@ config KVM_INTEL To compile this as a module, choose M here: the module will be called kvm-intel. +config KVM_INTEL_TDX + bool "Trusted Domain Extensions" + depends on KVM_INTEL && X86_64 + select FW_LOADER + + help + Extends KVM on Intel processors to support Trusted Domain Extensions. + config KVM_AMD tristate "KVM for AMD processors support" depends on KVM diff --git a/arch/x86/kvm/boot/Makefile b/arch/x86/kvm/boot/Makefile new file mode 100644 index 000000000000..8356cbe979b9 --- /dev/null +++ b/arch/x86/kvm/boot/Makefile @@ -0,0 +1,5 @@ +# SPDX-License-Identifier: GPL-2.0 + +ccflags-y += -I$(srctree)/arch/x86/kvm + +obj-$(CONFIG_KVM_INTEL_TDX) += seam/seamldr.o seam/seamloader.o seam/tdx.o diff --git a/arch/x86/kvm/boot/seam/seamldr.S b/arch/x86/kvm/boot/seam/seamldr.S new file mode 100644 index 000000000000..c7d93df62ce3 --- /dev/null +++ b/arch/x86/kvm/boot/seam/seamldr.S @@ -0,0 +1,188 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * ASM helper to load Intel SEAM module. + * + * Copyright (C) 2019 Intel Corporation + * + * Authors: + * Kai Huang @intel.com + */ +#include +#include +#include +#include +#include +#include +#include + +.macro save_msr _msr + movl $(\_msr), %ecx + rdmsr + pushq %rax + pushq %rdx +.endm + +.macro restore_msr _msr + popq %rdx + popq %rax + movl $(\_msr), %ecx + wrmsr +.endm + + .text + __INIT + .code64 +SYM_FUNC_START(launch_seamldr) + + pushq %rbp + movq %rsp, %rbp + pushq %r15 + pushq %r14 + pushq %r13 + pushq %r12 + pushq %rbx + + /* Save DR7, SEAMLDR sets it to 0x400. */ + movq %dr7, %rax + pushq %rax + + /* + * SEAMLDR restores GDTR and CS before ExitAC, DS/ES/SS don't need to + * be manually preserved as this is 64-bit mode, and FS/GS and IDTR are + * not modified by EnterACCS or SEAMLDR. + */ + + /* EnterACCS and SEAMLDR modify CR0 and CR4. */ + movq %cr0, %rax + pushq %rax + movq %cr4, %rax + pushq %rax + + /* Enable CR4.SMXE for GETSEC */ + orq $X86_CR4_SMXE, %rax + movq %rax, %cr4 + + /* + * Load R8-R11 immediately, they won't be clobbered, unlike RDX. 
+ * + * - R8: SEAMLDR_PARAMS physical address + * - R9: GDT base to be setup by SEMALDR when returning to kernel + * - R10: RIP of resume point + * - R11: CR3 when returning to kernel + */ + movq %rdx, %r8 + sgdt kernel_gdt64(%rip) + movq kernel_gdt64_base(%rip), %r9 + leaq .Lseamldr_resume(%rip), %r10 + movq %cr3, %r11 + + /* Save MSRs that are modified by EnterACCS and/or SEAMLDR */ + save_msr MSR_EFER + save_msr MSR_IA32_CR_PAT + save_msr MSR_IA32_MISC_ENABLE + + /* + * MSRs that are clobbered by SEAMLDR but are not enabled during early + * boot and so don't need to be saved/restored. + * + * save_msr MSR_IA32_DEBUGCTLMSR + * save_msr MSR_CORE_PERF_GLOBAL_CTRL + * save_msr MSR_IA32_PEBS_ENABLE + * save_msr MSR_IA32_RTIT_CTL + * save_msr MSR_IA32_LBR_CTRL + */ + + /* Now as last step, save RSP before invoking GETSEC[ENTERACCS] */ + movq %rsp, saved_rsp(%rip) + + /* + * Load the Remaining params for EnterACCS. + * + * - EBX: SEAMLDR ACM physical address + * - ECX: SEAMLDR ACM size + * - EAX: 2 + */ + movl %edi, %ebx + movl %esi, %ecx + + /* Invoke GETSEC[ENTERACCS] */ + movl $2, %eax +.Lseamldr_enteraccs: + getsec + +.Lseamldr_resume: + /* + * SEAMLDR restores CRs and GDT. Segment registers are flat, but + * don't hold kernel selectors. Reload the data segs now. + */ + movl $__KERNEL_DS, %eax + movl %eax, %ds + movl %eax, %es + movl %eax, %ss + + /* + * Restore stack from RIP relative storage, and then restore everything + * else from the stack. + */ + movq saved_rsp(%rip), %rsp + + /* + * Restore CPU status, in reverse order of saving. Firstly, restore + * MSRs. + */ + restore_msr MSR_IA32_MISC_ENABLE + restore_msr MSR_IA32_CR_PAT + restore_msr MSR_EFER + + popq %rax + movq %rax, %cr4 + popq %rax + movq %rax, %cr0 + + popq %rax + movq %rax, %dr7 + + popq %rbx + popq %r12 + popq %r13 + popq %r14 + popq %r15 + popq %rbp + + /* Far return to load the kernel's CS. */ + popq %rax + pushq $__KERNEL_CS + pushq %rax + + movq %r9, %rax + lretq + +.pushsection .fixup, "ax" + /* + * ENTERACCS faulted, return -EFAULT. Restore CR4 (to clear SMXE) and + * GPRs (to make objtool happy, only RBP/RSP are actually modified). + */ +1: movq 8 * 6(%rsp), %rax + movq %rax, %cr4 + addq $(8 *9), %rsp + popq %rbx + popq %r12 + popq %r13 + popq %r14 + popq %r15 + popq %rbp + movq $-EFAULT, %rax + ret +.popsection + _ASM_EXTABLE(.Lseamldr_enteraccs, 1b) + +SYM_FUNC_END(launch_seamldr) + + __INITDATA + .balign 8 +kernel_gdt64: + .word 0 +kernel_gdt64_base: + .quad 0 +saved_rsp: + .quad 0 diff --git a/arch/x86/kvm/boot/seam/seamloader.c b/arch/x86/kvm/boot/seam/seamloader.c new file mode 100644 index 000000000000..00202daeac74 --- /dev/null +++ b/arch/x86/kvm/boot/seam/seamloader.c @@ -0,0 +1,162 @@ +// SPDX-License-Identifier: GPL-2.0 + +#define pr_fmt(fmt) "seam: " fmt + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define MTRRCAP_SEAMRR BIT(15) + +#define SEAMLDR_MAX_NR_MODULE_PAGES 496 + +struct seamldr_params { + u32 version; + u32 scenario; + u64 sigstruct_pa; + u8 reserved[104]; + u64 module_pages; + u64 module_pa_list[SEAMLDR_MAX_NR_MODULE_PAGES]; +} __packed __aligned(PAGE_SIZE); + +/* The ACM and input params need to be below 4G. 
*/ +static phys_addr_t __init seam_alloc_lowmem(phys_addr_t size) +{ + return memblock_phys_alloc_range(size, PAGE_SIZE, 0, BIT_ULL(32)); +} + +static bool __init is_seamrr_enabled(void) +{ + u64 mtrrcap, seamrr_base, seamrr_mask; + + if (!boot_cpu_has(X86_FEATURE_MTRR) || + rdmsrl_safe(MSR_MTRRcap, &mtrrcap) || !(mtrrcap & MTRRCAP_SEAMRR)) + return 0; + + if (rdmsrl_safe(MSR_IA32_SEAMRR_PHYS_BASE, &seamrr_base) || + !(seamrr_base & MSR_IA32_SEAMRR_PHYS_BASE_CONFIGURED)) { + pr_info("SEAMRR base is not configured by BIOS\n"); + return 0; + } + + if (rdmsrl_safe(MSR_IA32_SEAMRR_PHYS_MASK, &seamrr_mask) || + !(seamrr_mask & MSR_IA32_SEAMRR_PHYS_MASK_ENABLED)) { + pr_info("SEAMRR is not enabled by BIOS\n"); + return 0; + } + + return 1; +} + +extern int __init launch_seamldr(unsigned long seamldr_pa, + unsigned long seamldr_size, + unsigned long params_pa); + +int __init seam_load_module(void *module, unsigned long module_size, + void *sigstruct, unsigned long sigstruct_size, + void *seamldr, unsigned long seamldr_size) +{ + phys_addr_t module_pa, seamldr_pa, params_pa; + struct seamldr_params *params; + int enteraccs_attempts = 10; + u32 icr_busy; + int ret; + u64 i; + + if (!is_seamrr_enabled()) + return -ENOTSUPP; + + /* SEAM module must be 4K aligned, and less than 496 pages. */ + if (!module_size || !IS_ALIGNED(module_size, PAGE_SIZE) || + module_size > SEAMLDR_MAX_NR_MODULE_PAGES * PAGE_SIZE) { + pr_err("Invalid SEAM module size 0x%lx\n", module_size); + return -EINVAL; + } + /* SEAM signature structure must be 0x200 DWORDS, which is 2048 bytes */ + if (sigstruct_size != 2048) { + pr_err("Invalid SEAM signature structure size 0x%lx\n", + sigstruct_size); + return -EINVAL; + } + if (!seamldr_size) { + pr_err("Invalid SEAMLDR ACM size\n"); + return -EINVAL; + } + + ret = -ENOMEM; + /* SEAMLDR requires the SEAM module to be 4k aligned. */ + module_pa = __pa(module); + if (!IS_ALIGNED(module_pa, 4096)) { + module_pa = memblock_phys_alloc(module_size, PAGE_SIZE); + if (!module_pa) { + pr_err("Unable to allocate memory to copy SEAM module\n"); + goto out; + } + memcpy(__va(module_pa), module, module_size); + } + + /* GETSEC[EnterACCS] requires the ACM to be 4k aligned and below 4G. */ + seamldr_pa = __pa(seamldr); + if (seamldr_pa >= BIT_ULL(32) || !IS_ALIGNED(seamldr_pa, 4096)) { + seamldr_pa = seam_alloc_lowmem(seamldr_size); + if (!seamldr_pa) + goto free_seam_module; + memcpy(__va(seamldr_pa), seamldr, seamldr_size); + } + + /* + * Allocate and initialize the SEAMLDR params. Pages are passed in as + * a list of physical addresses. + */ + params_pa = seam_alloc_lowmem(PAGE_SIZE); + if (!params_pa) { + pr_err("Unable to allocate memory for SEAMLDR_PARAMS\n"); + goto free_seamldr; + } + + ret = -EIO; + /* Ensure APs are in WFS. 
*/ + apic_icr_write(APIC_DEST_ALLBUT | APIC_INT_LEVELTRIG | APIC_INT_ASSERT | + APIC_DM_INIT, 0); + icr_busy = safe_apic_wait_icr_idle(); + if (WARN_ON(icr_busy)) + goto free_seamldr; + + apic_icr_write(APIC_DEST_ALLBUT | APIC_INT_LEVELTRIG | APIC_DM_INIT, 0); + icr_busy = safe_apic_wait_icr_idle(); + if (WARN_ON(icr_busy)) + goto free_seamldr; + mb(); + + params = __va(params_pa); + memset(params, 0, PAGE_SIZE); + params->sigstruct_pa = __pa(sigstruct); + params->module_pages = PFN_UP(module_size); + for (i = 0; i < params->module_pages; i++) + params->module_pa_list[i] = module_pa + i * PAGE_SIZE; + +retry_enteraccs: + ret = launch_seamldr(seamldr_pa, seamldr_size, params_pa); + if (ret == -EFAULT && !WARN_ON(!enteraccs_attempts--)) { + udelay(1 * USEC_PER_MSEC); + goto retry_enteraccs; + } + pr_info("Launch SEAMLDR returned %d\n", ret); + + memblock_free_early(params_pa, PAGE_SIZE); +free_seamldr: + if (seamldr_pa != __pa(seamldr)) + memblock_free_early(seamldr_pa, seamldr_size); +free_seam_module: + if (module_pa != __pa(module)) + memblock_free_early(module_pa, module_size); +out: + return ret; +} diff --git a/arch/x86/kvm/boot/seam/tdx.c b/arch/x86/kvm/boot/seam/tdx.c new file mode 100644 index 000000000000..98a9e52cc5a6 --- /dev/null +++ b/arch/x86/kvm/boot/seam/tdx.c @@ -0,0 +1,1131 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include + +#undef pr_fmt +#define pr_fmt(fmt) "tdx: " fmt + +/* Instruct tdx_ops.h to do boot-time friendly SEAMCALL exception handling. */ +#define INTEL_TDX_BOOT_TIME_SEAMCALL 1 + +#include "vmx/tdx_arch.h" +#include "vmx/tdx_ops.h" +#include "vmx/tdx_errno.h" + +#include "vmx/vmcs.h" + +static DEFINE_PER_CPU(unsigned long, tdx_vmxon_vmcs); +static atomic_t tdx_init_cpu_errors; + +/* + * TODO: better to have kernel boot parameter to let admin control whether to + * enable TDX with sysprof or not. + * + * Or how to decide tdx_sysprof?? + */ +static bool tdx_sysprof; + +/* KeyID range reserved to TDX by BIOS */ +static u32 tdx_keyids_start; +static u32 tdx_nr_keyids; + +/* TDX keyID pool */ +static DEFINE_IDA(tdx_keyid_pool); + +static int *tdx_package_masters __ro_after_init; + +/* + * TDX system information returned by TDSYSINFO. + */ +static struct tdsysinfo_struct tdx_tdsysinfo; + +/* + * CMR info array returned by TDSYSINFO. + * + * TDSYSINFO doesn't return specific error code indicating whether we didn't + * pass long-enough CMR info array to it, so just reserve enough space for + * the maximum number of CMRs. + */ +static struct cmr_info tdx_cmrs[TDX1_MAX_NR_CMRS] __aligned(512); +static int tdx_nr_cmrs; + +/* + * TDMR info array used as input for TDSYSCONFIG. + */ +static struct tdmr_info tdx_tdmrs[TDX1_MAX_NR_TDMRS] __initdata; +static int tdx_nr_tdmrs __initdata; +static atomic_t tdx_next_tdmr_index; +static atomic_t tdx_nr_initialized_tdmrs; + +/* TDMRs must be 1gb aligned */ +#define TDMR_ALIGNMENT BIT_ULL(30) +#define TDMR_PFN_ALIGNMENT (TDMR_ALIGNMENT >> PAGE_SHIFT) + +/* + * TDSYSCONFIG takes a array of pointers to TDMR infos. Its just big enough + * that allocating it on the stack is undesirable. + */ +static u64 tdx_tdmr_addrs[TDX1_MAX_NR_TDMRS] __aligned(512) __initdata; + +struct pamt_info { + u64 pamt_base; + u64 pamt_size; +}; + +/* + * PAMT info for each TDMR, used to free PAMT when TDX is disabled due to + * whatever reason. 
+ */ +static struct pamt_info tdx_pamts[TDX1_MAX_NR_TDMRS] __initdata; + +static int __init set_tdmr_reserved_area(struct tdmr_info *tdmr, int *p_idx, + u64 offset, u64 size) +{ + int idx = *p_idx; + + if (idx >= tdx_tdsysinfo.max_reserved_per_tdmr) + return -EINVAL; + + /* offset & size must be 4K aligned */ + if (offset & ~PAGE_MASK || size & ~PAGE_MASK) + return -EINVAL; + + tdmr->reserved_areas[idx].offset = offset; + tdmr->reserved_areas[idx].size = size; + + *p_idx = idx + 1; + return 0; +} + +/* + * Construct TDMR reserved areas. + * + * Two types of address range will be put into reserved areas: 1) PAMT range, + * since PAMT cannot overlap with TDMR non-reserved range; 2) any CMR hole + * within TDMR range, since TDMR non-reserved range must be in CMR. + * + * Note: we are not putting any memory hole made by kernel (which is not CMR + * hole -- i.e. some memory range is reserved by kernel and won't be freed to + * page allocator, and it is memory hole from page allocator's view) into + * reserved area for the sake of simplicity of implementation. The other + * reason is for TDX1 one TDMR can only have upto 16 reserved areas so if + * there are lots of holes we won't be have enough reserved areas to hold + * them. This is OK, since kernel page allocator will never allocate pages + * from those areas (as they are invalid). PAMT may internally mark them as + * 'normal' pages but it is OK. + * + * Returns -EINVAL if number of reserved areas exceeds TDX1 limitation. + * + */ +static int __init __construct_tdmr_reserved_areas(struct tdmr_info *tdmr, + u64 pamt_base, u64 pamt_size) +{ + u64 tdmr_start, tdmr_end, offset, size; + struct cmr_info *cmr, *next_cmr; + bool pamt_done = false; + int i, idx, ret; + + memset(tdmr->reserved_areas, 0, sizeof(tdmr->reserved_areas)); + + /* Save some typing later */ + tdmr_start = tdmr->base; + tdmr_end = tdmr->base + tdmr->size; + + if (WARN_ON(!tdx_nr_cmrs)) + return -EINVAL; + /* + * Find the first CMR whose end is greater than tdmr_start_pfn. + */ + cmr = &tdx_cmrs[0]; + for (i = 0; i < tdx_nr_cmrs; i++) { + cmr = &tdx_cmrs[i]; + if ((cmr->base + cmr->size) > tdmr_start) + break; + } + + /* Unable to find ?? Something is wrong here */ + if (i == tdx_nr_cmrs) + return -EINVAL; + + /* + * If CMR base is within TDMR range, [tdmr_start, cmr->base) needs to be + * in reserved area. + */ + idx = 0; + if (cmr->base > tdmr_start) { + offset = 0; + size = cmr->base - tdmr_start; + + ret = set_tdmr_reserved_area(tdmr, &idx, offset, size); + if (ret) + return ret; + } + + /* + * Check whether there's any hole between CMRs within TDMR range. + * If there is any, it needs to be in reserved area. + */ + for (++i; i < tdx_nr_cmrs; i++) { + next_cmr = &tdx_cmrs[i]; + + /* + * If next CMR is beyond TDMR range, there's no CMR hole within + * TDMR range, and we only need to insert PAMT into reserved + * area, thus we are done here. + */ + if (next_cmr->base >= tdmr_end) + break; + + /* Otherwise need to have CMR hole in reserved area */ + if (cmr->base + cmr->size < next_cmr->base) { + offset = cmr->base + cmr->size - tdmr_start; + size = next_cmr->base - (cmr->base + cmr->size); + + /* + * Reserved areas needs to be in physical address + * ascending order, therefore we need to check PAMT + * range before filling any CMR hole into reserved + * area. + */ + if (pamt_base < tdmr_start + offset) { + /* + * PAMT won't overlap with any CMR hole + * otherwise there's bug -- see comments below. 
+ */ + if (WARN_ON((pamt_base + pamt_size) > + (tdmr_start + offset))) + return -EINVAL; + + ret = set_tdmr_reserved_area(tdmr, &idx, + pamt_base - tdmr_start, + pamt_size); + if (ret) + return ret; + + pamt_done = true; + } + + /* Insert CMR hole into reserved area */ + ret = set_tdmr_reserved_area(tdmr, &idx, offset, size); + if (ret) + return ret; + } + + cmr = next_cmr; + } + + if (!pamt_done) { + /* + * PAMT won't overlap with CMR range, otherwise there's bug + * -- we have guaranteed this by checking all CMRs have + * covered all memory in e820. + */ + if (WARN_ON((pamt_base + pamt_size) > (cmr->base + cmr->size))) + return -EINVAL; + + ret = set_tdmr_reserved_area(tdmr, &idx, + pamt_base - tdmr_start, pamt_size); + if (ret) + return ret; + } + + /* + * If CMR end is in TDMR range, [cmr->end, tdmr_end) needs to be in + * reserved area. + */ + if (cmr->base + cmr->size < tdmr_end) { + offset = cmr->base + cmr->size - tdmr_start; + size = tdmr_end - (cmr->base + cmr->size); + + ret = set_tdmr_reserved_area(tdmr, &idx, offset, size); + if (ret) + return ret; + } + + return 0; +} + +static int __init __construct_tdmr_node(int tdmr_idx, + unsigned long tdmr_start_pfn, + unsigned long tdmr_end_pfn) +{ + u64 tdmr_size, pamt_1g_size, pamt_2m_size, pamt_4k_size, pamt_size; + struct pamt_info *pamt = &tdx_pamts[tdmr_idx]; + struct tdmr_info *tdmr = &tdx_tdmrs[tdmr_idx]; + u64 pamt_phys; + int ret; + + tdmr_size = (tdmr_end_pfn - tdmr_start_pfn) << PAGE_SHIFT; + + /* sanity check */ + if (!tdmr_size || !IS_ALIGNED(tdmr_size, TDMR_ALIGNMENT)) + return -EINVAL; + + /* 1 entry to cover 1G */ + pamt_1g_size = (tdmr_size >> 30) * tdx_tdsysinfo.pamt_entry_size; + /* 1 entry to cover 2M */ + pamt_2m_size = (tdmr_size >> 21) * tdx_tdsysinfo.pamt_entry_size; + /* 1 entry to cover 4K */ + pamt_4k_size = (tdmr_size >> 12) * tdx_tdsysinfo.pamt_entry_size; + + pamt_size = ALIGN(pamt_1g_size, PAGE_SIZE) + + ALIGN(pamt_2m_size, PAGE_SIZE) + + ALIGN(pamt_4k_size, PAGE_SIZE); + + pamt_phys = memblock_phys_alloc_range(pamt_size, PAGE_SIZE, + tdmr_start_pfn << PAGE_SHIFT, + tdmr_end_pfn << PAGE_SHIFT); + if (!pamt_phys) + return -ENOMEM; + + tdmr->base = tdmr_start_pfn << PAGE_SHIFT; + tdmr->size = tdmr_size; + + /* PAMT for 1G at first */ + tdmr->pamt_1g_base = pamt_phys; + tdmr->pamt_1g_size = ALIGN(pamt_1g_size, PAGE_SIZE); + /* PAMT for 2M right after PAMT for 1G */ + tdmr->pamt_2m_base = tdmr->pamt_1g_base + tdmr->pamt_1g_size; + tdmr->pamt_2m_size = ALIGN(pamt_2m_size, PAGE_SIZE); + /* PAMT for 4K comes after PAMT for 2M */ + tdmr->pamt_4k_base = tdmr->pamt_2m_base + tdmr->pamt_2m_size; + tdmr->pamt_4k_size = ALIGN(pamt_4k_size, PAGE_SIZE); + + /* Construct TDMR's reserved areas */ + ret = __construct_tdmr_reserved_areas(tdmr, tdmr->pamt_1g_base, + pamt_size); + if (ret) { + memblock_free(pamt_phys, pamt_size); + return ret; + } + + /* Record PAMT info for this TDMR */ + pamt->pamt_base = pamt_phys; + pamt->pamt_size = pamt_size; + + return 0; +} + +/* + * Convert node's memory into TDMRs as less as possible. + * + * @node_start_pfn and @node_end_pfn are not node's real memory region, but + * already 1G aligned passed from caller. + */ +static int __init construct_tdmr_node(int *p_tdmr_idx, + unsigned long tdmr_start_pfn, + unsigned long tdmr_end_pfn) +{ + u64 start_pfn, end_pfn, mid_pfn; + int ret = 0, idx = *p_tdmr_idx; + + start_pfn = tdmr_start_pfn; + end_pfn = tdmr_end_pfn; + + while (start_pfn < tdmr_end_pfn) { + /* Cast to u32, else compiler will sign extend and complain. 
*/ + if (idx >= (u32)tdx_tdsysinfo.max_tdmrs) { + ret = -EINVAL; + break; + } + + ret = __construct_tdmr_node(idx, start_pfn, end_pfn); + + /* + * Try again with smaller TDMR if the failure was due to unable + * to allocate PAMT. + */ + if (ret == -ENOMEM) { + mid_pfn = start_pfn + (end_pfn - start_pfn) / 2; + mid_pfn = ALIGN_DOWN(mid_pfn, TDMR_PFN_ALIGNMENT); + mid_pfn = max(mid_pfn, start_pfn + TDMR_PFN_ALIGNMENT); + if (mid_pfn == end_pfn) + break; + end_pfn = mid_pfn; + continue; + } else if (ret) { + break; + } + + /* Successfully done with one TDMR, and continue if there's remaining */ + start_pfn = end_pfn; + end_pfn = tdmr_end_pfn; + idx++; + } + + /* Setup next TDMR entry to work on */ + *p_tdmr_idx = idx; + return ret; +} + +/* + * Construct TDMR based on system memory info and CMR info. To avoid modifying + * kernel core-mm page allocator to have TDMR specific logic for memory + * allocation in TDMR, we choose to simply convert all memory to TDMR, with the + * disadvantage of wasting some memory for PAMT, but since TDX is mainly a + * virtualization feature so it is expected majority of memory will be used as + * TD guest memory so wasting some memory for PAMT won't be big issue. + * + * There are some restrictions of TDMR/PAMT/CMR: + * + * - TDMR's base and size need to be 1G aligned. + * - TDMR's size need to be multiple of 1G. + * - TDMRs cannot overlap with each other. + * - PAMTs cannot overlap with each other. + * - Each TDMR can have reserved areas (TDX1 upto 16). + * - TDMR reserved areas must be in physical address ascending order. + * - TDMR non-reserved area must be in CMR. + * - TDMR reserved area doesn't have to be in CMR. + * - TDMR non-reserved area cannot overlap with PAMT. + * - PAMT may reside within TDMR reserved area. + * - PAMT must be in CMR. + * + */ +static int __init __construct_tdmrs(void) +{ + u64 tdmr_start_pfn, tdmr_end_pfn, tdmr_start_pfn_next, inc_pfn; + unsigned long start_pfn, end_pfn; + int last_nid, nid, i, idx, ret; + + /* Sanity check on tdx_tdsysinfo... */ + if (!tdx_tdsysinfo.max_tdmrs || !tdx_tdsysinfo.max_reserved_per_tdmr || + !tdx_tdsysinfo.pamt_entry_size) { + pr_err("Invalid TDSYSINFO_STRUCT reported by TDSYSINFO.\n"); + return -ENOTSUPP; + } + + idx = 0; + tdmr_start_pfn = 0; + tdmr_end_pfn = 0; + last_nid = MAX_NUMNODES; + for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) { + if (last_nid == MAX_NUMNODES) { + /* First memory range */ + last_nid = nid; + tdmr_start_pfn = ALIGN_DOWN(start_pfn, TDMR_PFN_ALIGNMENT); + WARN_ON(tdmr_start_pfn != 0); + } else if (nid == last_nid) { + /* + * This memory range is in the same node as previous + * one, update tdmr_end_pfn. + */ + tdmr_end_pfn = ALIGN(end_pfn, TDMR_PFN_ALIGNMENT); + } else if (ALIGN_DOWN(start_pfn, TDMR_PFN_ALIGNMENT) >= tdmr_end_pfn) { + /* This memory range is in next node */ + /* + * If new TDMR start pfn is greater than previous TDMR + * end pfn, then it's ready to convert previous node's + * memory to TDMR. + */ + ret = construct_tdmr_node(&idx, tdmr_start_pfn, + tdmr_end_pfn); + if (ret) + return ret; + tdmr_start_pfn = ALIGN(start_pfn, TDMR_PFN_ALIGNMENT); + tdmr_end_pfn = ALIGN(end_pfn, TDMR_PFN_ALIGNMENT); + last_nid = nid; + } else { + /* + * This memory range is in the next node, and the + * boundary between nodes falls into 1G range. In this + * case, put beginning of second node into the TDMR + * which covers previous node. This is not ideal but + * this case is very unlikely as well so should be OK + * for now. 
+ */ + tdmr_end_pfn = ALIGN(start_pfn, TDMR_PFN_ALIGNMENT); + + ret = construct_tdmr_node(&idx, tdmr_start_pfn, + tdmr_end_pfn); + if (ret) + return ret; + + tdmr_start_pfn = tdmr_end_pfn; + last_nid = nid; + } + } + + /* Spread out the remaining memory across multiple TDMRs. */ + inc_pfn = (tdmr_end_pfn - tdmr_start_pfn) / + (tdx_tdsysinfo.max_tdmrs - idx); + inc_pfn = ALIGN(inc_pfn, TDMR_PFN_ALIGNMENT); + + tdmr_start_pfn_next = tdmr_end_pfn; + while (tdmr_start_pfn < tdmr_start_pfn_next) { + if (idx == tdx_tdsysinfo.max_tdmrs - 1) + tdmr_end_pfn = tdmr_start_pfn_next; + else + tdmr_end_pfn = tdmr_start_pfn + inc_pfn; +retry: + tdmr_end_pfn = min(tdmr_end_pfn, tdmr_start_pfn_next); + + ret = construct_tdmr_node(&idx, tdmr_start_pfn, tdmr_end_pfn); + if (ret == -ENOMEM) { + if (tdmr_end_pfn == tdmr_start_pfn_next) + return -ENOMEM; + tdmr_end_pfn += inc_pfn; + goto retry; + } + if (ret) + return ret; + tdmr_start_pfn = tdmr_end_pfn; + } + + tdx_nr_tdmrs = idx; + + return 0; +} + +static int __init e820_type_cmr_ram(enum e820_type type) +{ + /* + * CMR needs to at least cover e820 memory regions which will be later + * freed to kernel memory allocator, otherwise kernel may allocate + * non-TDMR pages, i.e. when KVM allocates memory. + * + * Note memblock also treats E820_TYPE_RESERVED_KERN as memory so also + * need to cover it. + * + * FIXME: + * + * Need to cover other types which are actually RAM, i.e: + * + * E820_TYPE_ACPI, + * E820_TYPE_NVS + */ + return (type == E820_TYPE_RAM || type == E820_TYPE_RESERVED_KERN); +} + +static int __init in_cmr_range(u64 addr, u64 size) +{ + struct cmr_info *cmr; + u64 cmr_end, end; + int i; + + end = addr + size; + + /* Ignore bad area */ + if (end < addr) + return 1; + + for (i = 0; i < tdx_nr_cmrs; i++) { + cmr = &tdx_cmrs[i]; + cmr_end = cmr->base + cmr->size; + + /* Found one CMR which covers the range [addr, addr + size) */ + if (cmr->base <= addr && cmr_end >= end) + return 1; + } + + return 0; +} + +static int __init sanitize_cmrs(void) +{ + struct e820_entry *entry; + bool observed_empty; + int i, j; + + if (!tdx_nr_cmrs) + return -EIO; + + for (i = 0, j = -1, observed_empty = false; i < tdx_nr_cmrs; i++) { + if (!tdx_cmrs[i].size) { + observed_empty = true; + continue; + } + /* Valid entry after empty entry isn't allowed, per SEAM. */ + if (observed_empty) + return -EIO; + + /* The previous CMR must reside fully below this CMR. */ + if (j >= 0 && + (tdx_cmrs[j].base + tdx_cmrs[j].size) > tdx_cmrs[i].base) + return -EIO; + + if (j < 0 || + (tdx_cmrs[j].base + tdx_cmrs[j].size) != tdx_cmrs[i].base) { + j++; + if (i != j) { + tdx_cmrs[j].base = tdx_cmrs[i].base; + tdx_cmrs[j].size = tdx_cmrs[i].size; + } + } else { + tdx_cmrs[j].size += tdx_cmrs[i].size; + } + } + tdx_nr_cmrs = j + 1; + if (!tdx_nr_cmrs) + return -EINVAL; + + /* + * Sanity check whether CMR has covered all memory in E820. We need + * to make sure that CMR covers all memory that will be freed to page + * allocator, otherwise alloc_pages() may return non-TDMR pages, i.e. + * when KVM allocates memory for VM. Cannot allow that to happen, so + * disable TDX if we found CMR doesn't cover all. + * + * FIXME: + * + * Alternatively we could just check against memblocks? Only memblocks + * are freed to page allocator so it appears to be OK as long as CMR + * covers all memblocks. But CMR should be generated by BIOS thus should + * be cover e820.. 
+ */ + for (i = 0; i < e820_table->nr_entries; i++) { + entry = &e820_table->entries[i]; + + if (!e820_type_cmr_ram(entry->type)) + continue; + + if (!in_cmr_range(entry->addr, entry->size)) + return -EINVAL; + } + + return 0; +} + +static int __init construct_tdmrs(void) +{ + struct pamt_info *pamt; + int ret, i; + + ret = sanitize_cmrs(); + if (ret) + return ret; + + ret = __construct_tdmrs(); + if (ret) + goto free_pamts; + return 0; + +free_pamts: + for (i = 0; i < ARRAY_SIZE(tdx_pamts); i++) { + pamt = &tdx_pamts[i]; + if (pamt->pamt_base && pamt->pamt_size) { + if (WARN_ON(!IS_ALIGNED(pamt->pamt_base, PAGE_SIZE) || + !IS_ALIGNED(pamt->pamt_size, PAGE_SIZE))) + continue; + + memblock_free(pamt->pamt_base, pamt->pamt_size); + } + } + + memset(tdx_pamts, 0, sizeof(tdx_pamts)); + memset(tdx_tdmrs, 0, sizeof(tdx_tdmrs)); + tdx_nr_tdmrs = 0; + return ret; +} + + +/* + * Well.. I guess a better way is to put cpu_vmxon() into asm/virtext.h, + * and split kvm_cpu_vmxon() into cpu_vmxon(), and intel_pt_handle_vmx(), + * so we just only have one cpu_vmxon() in asm/virtext.h.. + */ +static inline void cpu_vmxon(u64 vmxon_region) +{ + cr4_set_bits(X86_CR4_VMXE); + asm volatile ("vmxon %0" : : "m"(vmxon_region)); +} + +static inline int tdx_init_vmxon_vmcs(struct vmcs *vmcs) +{ + u64 msr; + + /* + * Can't enable TDX if VMX is unsupported or disabled by BIOS. + * cpu_has(X86_FEATURE_VMX) can't be relied on as the BSP calls this + * before the kernel has configured feat_ctl(). + */ + if (!cpu_has_vmx()) + return -ENOTSUPP; + + if (rdmsrl_safe(MSR_IA32_FEAT_CTL, &msr) || + !(msr & FEAT_CTL_LOCKED) || + !(msr & FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX)) + return -ENOTSUPP; + + if (rdmsrl_safe(MSR_IA32_VMX_BASIC, &msr)) + return -ENOTSUPP; + + memset(vmcs, 0, PAGE_SIZE); + vmcs->hdr.revision_id = (u32)msr; + + return 0; +} + +#define MSR_IA32_TME_ACTIVATE 0x982 + +static inline void tdx_get_keyids(u32 *keyids_start, u32 *nr_keyids) +{ + u32 nr_mktme_ids; + + rdmsr(MSR_IA32_MKTME_KEYID_PART, nr_mktme_ids, *nr_keyids); + + /* KeyID 0 is reserved, i.e. KeyIDs are 1-based. */ + *keyids_start = nr_mktme_ids + 1; +} + +static int tdx_init_ap(unsigned long vmcs) +{ + u32 keyids_start, nr_keyids; + struct tdx_ex_ret ex_ret; + u64 err; + + /* + * MSR_IA32_MKTME_KEYID_PART is core-scoped, disable TDX if this CPU's + * partitioning doesn't match the BSP's partitioning. + */ + tdx_get_keyids(&keyids_start, &nr_keyids); + if (keyids_start != tdx_keyids_start || nr_keyids != tdx_nr_keyids) { + pr_err("MKTME KeyID partioning inconsistent on CPU %u\n", + smp_processor_id()); + return -ENOTSUPP; + } + + cpu_vmxon(__pa(vmcs)); + err = tdsysinitlp(&ex_ret); + cpu_vmxoff(); + + if (TDX_ERR(err, TDSYSINITLP)) + return -EIO; + + return 0; +} + +void tdx_init_cpu(struct cpuinfo_x86 *c) +{ + unsigned long vmcs; + + /* Allocate VMCS for VMXON. */ + vmcs = __get_free_page(GFP_KERNEL); + if (!vmcs) + goto err; + + /* VMCS configuration shouldn't fail at this point. */ + if (WARN_ON_ONCE(tdx_init_vmxon_vmcs((void *)vmcs))) + goto err_vmcs; + + /* BSP does TDSYSINITLP as part of tdx_seam_init(). */ + if (c != &boot_cpu_data && tdx_init_ap(vmcs)) + goto err_vmcs; + + this_cpu_write(tdx_vmxon_vmcs, vmcs); + return; + +err_vmcs: + free_page(vmcs); +err: + clear_cpu_cap(c, X86_FEATURE_TDX); + atomic_inc(&tdx_init_cpu_errors); +} + +static __init int tdx_init_bsp(void) +{ + struct tdx_ex_ret ex_ret; + void *vmcs; + u64 err; + int ret; + + /* + * Detect HKID for TDX if initialization was successful. 
+ * + * TDX provides core-scoped MSR for us to simply read out TDX start + * keyID and number of keyIDs. + */ + tdx_get_keyids(&tdx_keyids_start, &tdx_nr_keyids); + if (!tdx_nr_keyids) + return -ENOTSUPP; + + /* + * Allocate a temporary VMCS for early BSP init, the VMCS for late(ish) + * init will be allocated after the page allocator is up and running. + */ + vmcs = memblock_alloc(PAGE_SIZE, PAGE_SIZE); + if (!vmcs) + return -ENOMEM; + + ret = tdx_init_vmxon_vmcs(vmcs); + if (ret) + goto out; + + cpu_vmxon(__pa(vmcs)); + + err = tdsysinit(tdx_sysprof ? BIT(0) : 0, &ex_ret); + if (TDX_ERR(err, TDSYSINIT)) { + ret = -EIO; + goto out_vmxoff; + } + + err = tdsysinitlp(&ex_ret); + if (TDX_ERR(err, TDSYSINITLP)) { + ret = -EIO; + goto out_vmxoff; + } + + /* + * Do TDSYSINFO to collect the information needed to construct TDMRs, + * which needs to be done before kernel page allocator is up as the + * page allocator can't provide the large chunk (>4MB) of memory needed + * for the PAMTs. + */ + err = tdsysinfo(__pa(&tdx_tdsysinfo), sizeof(tdx_tdsysinfo), + __pa(tdx_cmrs), TDX1_MAX_NR_CMRS, &ex_ret); + if (TDX_ERR(err, TDSYSINFO)) { + ret = -EIO; + goto out_vmxoff; + } + + tdx_nr_cmrs = ex_ret.nr_cmr_entries; + ret = 0; + +out_vmxoff: + cpu_vmxoff(); +out: + memblock_free(__pa(vmcs), PAGE_SIZE); + return ret; +} + +static bool __init tdx_all_cpus_available(void) +{ + /* + * CPUs detected in ACPI can be marked as disabled due to: + * 1) disabled in ACPI MADT table + * 2) disabled by 'disable_cpu_apicid' kernel parameter, which + * disables CPU with particular APIC id. + * 3) limited by 'nr_cpus' kernel parameter. + */ + if (disabled_cpus) { + pr_info("Disabled CPUs detected"); + goto err; + } + + if (num_possible_cpus() < num_processors) { + pr_info("Number of CPUs limited by 'possible_cpus' kernel param"); + goto err; + } + + if (setup_max_cpus < num_processors) { + pr_info("Boot-time CPUs limited by 'maxcpus' kernel param"); + goto err; + } + + return true; + +err: + pr_cont(", skipping TDX-SEAM load/config.\n"); + return false; +} + +static bool __init tdx_get_firmware(struct cpio_data *blob, const char *name) +{ + char path[64]; + + if (get_builtin_firmware(blob, name)) + return true; + + if (!IS_ENABLED(CONFIG_BLK_DEV_INITRD) || !initrd_start) + return false; + + snprintf(path, sizeof(path), "lib/firmware/%s", name); + *blob = find_cpio_data(path, (void *)initrd_start, + initrd_end - initrd_start, NULL); + + return !!blob->data; +} + +void __init tdx_seam_init(void) +{ + const char *sigstruct_name = "intel-seam/libtdx.so.sigstruct"; + const char *seamldr_name = "intel-seam/seamldr.acm"; + const char *module_name = "intel-seam/libtdx.so"; + struct cpio_data module, sigstruct, seamldr; + + /* + * Don't load/configure SEAM if not all CPUs can be brought up during + * smp_init(), TDX must execute TDSYSINITLP on all logical processors. + */ + if (!tdx_all_cpus_available()) + return; + + if (!tdx_get_firmware(&module, module_name)) + return; + + if (!tdx_get_firmware(&sigstruct, sigstruct_name)) + return; + + if (!tdx_get_firmware(&seamldr, seamldr_name)) + return; + + if (seam_load_module(module.data, module.size, sigstruct.data, + sigstruct.size, seamldr.data, seamldr.size)) + return; + + if (tdx_init_bsp() || construct_tdmrs()) + return; + + setup_force_cpu_cap(X86_FEATURE_TDX); +} + +/* + * Setup one-cpu-per-pkg array to do package-scoped SEAMCALLs. The array is + * only necessary if there are multiple packages. 
+ */ +static int __init init_package_masters(void) +{ + int cpu, pkg, nr_filled, nr_pkgs; + + nr_pkgs = topology_max_packages(); + if (nr_pkgs == 1) + return 0; + + tdx_package_masters = kcalloc(nr_pkgs, sizeof(int), GFP_KERNEL); + if (!tdx_package_masters) + return -ENOMEM; + + memset(tdx_package_masters, -1, nr_pkgs * sizeof(int)); + + nr_filled = 0; + for_each_online_cpu(cpu) { + pkg = topology_physical_package_id(cpu); + if (tdx_package_masters[pkg] >= 0) + continue; + + tdx_package_masters[pkg] = cpu; + if (++nr_filled == topology_max_packages()) + break; + } + + if (WARN_ON(nr_filled != topology_max_packages())) { + kfree(tdx_package_masters); + return -EIO; + } + + return 0; +} + +static void __tdx_seamcall_on_other_pkgs(smp_call_func_t fn, void *param) +{ + int i, cpu, cur_package; + + cpu = raw_smp_processor_id(); + cur_package = topology_physical_package_id(cpu); + + for (i = 0; i < topology_max_packages(); i++) { + if (i == cur_package) + continue; + + smp_call_function_single(tdx_package_masters[i], fn, param, 1); + } +} + +void tdx_seamcall_on_other_pkgs(smp_call_func_t fn, void *param, + struct mutex *lock) +{ + if (WARN_ON_ONCE(!tdx_package_masters)) + return; + + mutex_lock(lock); + preempt_disable(); + + __tdx_seamcall_on_other_pkgs(fn, param); + + preempt_enable(); + mutex_unlock(lock); +} +EXPORT_SYMBOL_GPL(tdx_seamcall_on_other_pkgs); + +static void __init tdx_vmxon(void *ret) +{ + cpu_vmxon(__pa(this_cpu_read(tdx_vmxon_vmcs))); +} + +static void __init tdx_vmxoff(void *ign) +{ + cpu_vmxoff(); +} + +static void __init tdx_free_vmxon_vmcs(void) +{ + int cpu; + + for_each_possible_cpu(cpu) { + free_page(per_cpu(tdx_vmxon_vmcs, cpu)); + per_cpu(tdx_vmxon_vmcs, cpu) = 0; + } +} + +static void __init do_tdsysconfigkey(void *failed) +{ + u64 err; + + if (*(int *)failed) + return; + + do { + err = tdsysconfigkey(); + } while (err == TDX_KEY_GENERATION_FAILED); + TDX_ERR(err, TDSYSCONFIGKEY); + + if (err) + *(int *)failed = -EIO; +} + +static void __init __tdx_init_tdmrs(void *failed) +{ + struct tdx_ex_ret ex_ret; + u64 base, size; + u64 err; + int i; + + for (i = atomic_fetch_add(1, &tdx_next_tdmr_index); + i < tdx_nr_tdmrs; + i = atomic_fetch_add(1, &tdx_next_tdmr_index)) { + base = tdx_tdmrs[i].base; + size = tdx_tdmrs[i].size; + + do { + /* Abort if a different CPU failed. */ + if (atomic_read(failed)) + return; + + err = tdsysinittdmr(base, &ex_ret); + if (TDX_ERR(err, TDSYSINITTDMR)) { + atomic_inc(failed); + return; + } + /* + * Note, "next" is simply an indicator, base is passed to + * TDSYSINTTDMR on every iteration. + */ + } while (ex_ret.next < (base + size)); + + atomic_inc(&tdx_nr_initialized_tdmrs); + } +} + +static int __init tdx_init_tdmrs(void) +{ + atomic_t failed = ATOMIC_INIT(0); + + /* + * Flush the cache to guarantee there no MODIFIED cache lines exist for + * PAMTs before TDSYSINITTDMR, which will initialize PAMT memory using + * TDX-SEAM's reserved/system HKID. + */ + wbinvd_on_all_cpus(); + + on_each_cpu(__tdx_init_tdmrs, &failed, 0); + + while (atomic_read(&tdx_nr_initialized_tdmrs) < tdx_nr_tdmrs) { + if (atomic_read(&failed)) + return -EIO; + } + + return 0; +} + +static int __init tdx_init(void) +{ + int ret, i; + u64 err; + + if (!boot_cpu_has(X86_FEATURE_TDX)) + return -ENOTSUPP; + + /* Disable TDX if any CPU(s) failed to boot. 
*/ + if (!cpumask_equal(cpu_present_mask, &cpus_booted_once_mask)) { + ret = -EIO; + goto err; + } + + if (atomic_read(&tdx_init_cpu_errors)) { + ret = -EIO; + goto err; + } + + ret = init_package_masters(); + if (ret) + goto err; + + on_each_cpu(tdx_vmxon, NULL, 1); + + for (i = 0; i < tdx_nr_tdmrs; i++) + tdx_tdmr_addrs[i] = __pa(&tdx_tdmrs[i]); + + /* Use the first keyID as TDX-SEAM's global key. */ + err = tdsysconfig(__pa(tdx_tdmr_addrs), tdx_nr_tdmrs, tdx_keyids_start); + if (TDX_ERR(err, TDSYSCONFIG)) { + ret = -EIO; + goto err_vmxoff; + } + + do_tdsysconfigkey(&ret); + if (!ret && topology_max_packages() > 1) + __tdx_seamcall_on_other_pkgs(do_tdsysconfigkey, &ret); + if (ret) + goto err_vmxoff; + + ret = tdx_init_tdmrs(); + if (ret) + goto err_vmxoff; + + on_each_cpu(tdx_vmxoff, NULL, 1); + tdx_free_vmxon_vmcs(); + + pr_info("TDX initialized.\n"); + return 0; + +err_vmxoff: + on_each_cpu(tdx_vmxoff, NULL, 1); +err: + tdx_free_vmxon_vmcs(); + clear_cpu_cap(&boot_cpu_data, X86_FEATURE_TDX); + return ret; +} +arch_initcall(tdx_init); + +struct tdsysinfo_struct *tdx_get_sysinfo(void) +{ + if (boot_cpu_has(X86_FEATURE_TDX)) + return &tdx_tdsysinfo; + + return NULL; +} +EXPORT_SYMBOL_GPL(tdx_get_sysinfo); + +int tdx_keyid_alloc(void) +{ + if (!boot_cpu_has(X86_FEATURE_TDX)) + return -EINVAL; + + if (WARN_ON_ONCE(!tdx_keyids_start || !tdx_nr_keyids)) + return -EINVAL; + + /* The first keyID is reserved for the global key. */ + return ida_alloc_range(&tdx_keyid_pool, tdx_keyids_start + 1, + tdx_keyids_start + tdx_nr_keyids - 2, + GFP_KERNEL); +} +EXPORT_SYMBOL_GPL(tdx_keyid_alloc); + +void tdx_keyid_free(int keyid) +{ + if (!keyid || keyid < 0) + return; + + ida_free(&tdx_keyid_pool, keyid); +} +EXPORT_SYMBOL_GPL(tdx_keyid_free); From patchwork Mon Nov 16 18:26:48 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910339 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E0A2FC61DD8 for ; Mon, 16 Nov 2020 18:28:55 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A0E5C2231B for ; Mon, 16 Nov 2020 18:28:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388423AbgKPS2s (ORCPT ); Mon, 16 Nov 2020 13:28:48 -0500 Received: from mga02.intel.com ([134.134.136.20]:48458 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388337AbgKPS2X (ORCPT ); Mon, 16 Nov 2020 13:28:23 -0500 IronPort-SDR: jQlx1HG65Jyy1JiKjz5FLH/EeFSoCpvB8MgDvjwXE+Vl5UPGGkGY/KYO9NHzODsS+Pdt/1xUVb zu4lmfBKtu6g== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="157819213" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="157819213" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:22 -0800 IronPort-SDR: ugxbilTI9eOuuteaQH2t/r+tGicD2+WLH7F8pD2PSYyMubmdgyHpeMa54D5RFrsg6000ZjO8gz 
NDdX2Huf4b/w== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400528381" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:22 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Kai Huang Subject: [RFC PATCH 63/67] cpu/hotplug: Document that TDX also depends on booting CPUs once Date: Mon, 16 Nov 2020 10:26:48 -0800 Message-Id: <1d588f512e13b0342e6e76aabb2263440bdde8f8.1605232743.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Kai Huang Add a comment to explain that TDX also depends on booting logical CPUs at least once. TDSYSINITLP must be run on all CPUs, even software disabled CPUs in the -nosmt case. Fortunately, current SMT handling for #MC already supports booting all CPUs once; the to-be-disabled sibling is booted once (and later put into deep C-state to honor SMT=off) to allow the init code to set CR4.MCE and avoid an unwanted shutdown on a broadcasted MCE. Signed-off-by: Kai Huang --- kernel/cpu.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/kernel/cpu.c b/kernel/cpu.c index 6ff2578ecf17..17a8d7db99b2 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -435,6 +435,10 @@ static inline bool cpu_smt_allowed(unsigned int cpu) * that the init code can get a chance to set CR4.MCE on each * CPU. Otherwise, a broadcasted MCE observing CR4.MCE=0b on any * core will shutdown the machine. + * + * Intel TDX also requires running TDSYSINITLP on all logical CPUs + * during boot, booting all CPUs once allows TDX to play nice with + * 'nosmt'. 
*/ return !cpumask_test_cpu(cpu, &cpus_booted_once_mask); } From patchwork Mon Nov 16 18:26:49 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910341 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6AE94C6379F for ; Mon, 16 Nov 2020 18:28:54 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 312DB2231B for ; Mon, 16 Nov 2020 18:28:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388391AbgKPS2c (ORCPT ); Mon, 16 Nov 2020 13:28:32 -0500 Received: from mga02.intel.com ([134.134.136.20]:48463 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388349AbgKPS2Z (ORCPT ); Mon, 16 Nov 2020 13:28:25 -0500 IronPort-SDR: J/ADQz6Tv5R2hM6HKyt5mYSnvx6Snb57d2g1e9Zzhfhc2VbLVK4yWS0WJn3V5TFYKUR82PD4KP ImqtixvwzLiQ== X-IronPort-AV: E=McAfee;i="6000,8403,9807"; a="157819217" X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="157819217" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:23 -0800 IronPort-SDR: P/PomvdexH5Gh8ak4PBtyPaAIhINRJSP3oqC8LOkyBaiPGctFOlWJEztvzcq6kVo0WvYKBVEdo U4ysYwp4ouIQ== X-IronPort-AV: E=Sophos;i="5.77,483,1596524400"; d="scan'208";a="400528395" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2020 10:28:22 -0800 From: isaku.yamahata@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson , Xiaoyao Li , Kai Huang , Isaku Yamahata Subject: [RFC PATCH 64/67] KVM: TDX: Add "basic" support for building and running Trust Domains Date: Mon, 16 Nov 2020 10:26:49 -0800 Message-Id: X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson Add what is effectively a TDX-specific ioctl for initializing the guest Trust Domain. Implement the functionality as a subcommand of KVM_MEMORY_ENCRYPT_OP, analogous to how the ioctl is used by SVM to manage SEV guests. For easy compatibility with future versions of TDX-SEAM, add a KVM-defined struct, tdx_capabilities, to track requirements/capabilities for the overall system, and define a global instance to serve as the canonical reference. 
Co-developed-by: Xiaoyao Li Signed-off-by: Xiaoyao Li Co-developed-by: Kai Huang Signed-off-by: Kai Huang Co-developed-by: Isaku Yamahata Signed-off-by: Isaku Yamahata Signed-off-by: Sean Christopherson --- arch/x86/include/uapi/asm/kvm.h | 51 + arch/x86/kvm/trace.h | 57 + arch/x86/kvm/vmx/common.h | 1 + arch/x86/kvm/vmx/main.c | 384 ++++- arch/x86/kvm/vmx/posted_intr.c | 6 + arch/x86/kvm/vmx/tdx.c | 1850 +++++++++++++++++++++++++ arch/x86/kvm/vmx/tdx.h | 78 ++ arch/x86/kvm/vmx/tdx_ops.h | 13 + arch/x86/kvm/vmx/tdx_stubs.c | 45 + arch/x86/kvm/vmx/vmenter.S | 140 ++ arch/x86/kvm/x86.c | 5 +- tools/arch/x86/include/uapi/asm/kvm.h | 51 + 12 files changed, 2666 insertions(+), 15 deletions(-) create mode 100644 arch/x86/kvm/vmx/tdx.c create mode 100644 arch/x86/kvm/vmx/tdx_stubs.c diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h index 29cdf262e516..03f7bcc3fb85 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -490,4 +490,55 @@ struct kvm_pmu_event_filter { #define KVM_X86_SEV_ES_VM 1 #define KVM_X86_TDX_VM 2 +/* Trust Domain eXtension sub-ioctl() commands. */ +enum tdx_cmd_id { + KVM_TDX_CAPABILITIES = 0, + KVM_TDX_INIT_VM, + KVM_TDX_INIT_VCPU, + KVM_TDX_INIT_MEM_REGION, + KVM_TDX_FINALIZE_VM, + + KVM_TDX_CMD_NR_MAX, +}; + +struct kvm_tdx_cmd { + __u32 id; + __u32 metadata; + __u64 data; +}; + +struct kvm_tdx_cpuid_config { + __u32 leaf; + __u32 sub_leaf; + __u32 eax; + __u32 ebx; + __u32 ecx; + __u32 edx; +}; + +struct kvm_tdx_capabilities { + __u64 attrs_fixed0; + __u64 attrs_fixed1; + __u64 xfam_fixed0; + __u64 xfam_fixed1; + + __u32 nr_cpuid_configs; + struct kvm_tdx_cpuid_config cpuid_configs[0]; +}; + +struct kvm_tdx_init_vm { + __u32 max_vcpus; + __u32 reserved; + __u64 attributes; + __u64 cpuid; +}; + +#define KVM_TDX_MEASURE_MEMORY_REGION (1UL << 0) + +struct kvm_tdx_init_mem_region { + __u64 source_addr; + __u64 gpa; + __u64 nr_pages; +}; + #endif /* _ASM_X86_KVM_H */ diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h index aef960f90f26..e2d9e5caecc8 100644 --- a/arch/x86/kvm/trace.h +++ b/arch/x86/kvm/trace.h @@ -623,6 +623,63 @@ TRACE_EVENT(kvm_nested_vmexit_inject, __entry->exit_int_info, __entry->exit_int_info_err) ); +/* + * Tracepoint for TDVMCALL from a TDX guest + */ +TRACE_EVENT(kvm_tdvmcall, + TP_PROTO(struct kvm_vcpu *vcpu, __u32 exit_reason, + __u64 p1, __u64 p2, __u64 p3, __u64 p4), + TP_ARGS(vcpu, exit_reason, p1, p2, p3, p4), + + TP_STRUCT__entry( + __field( __u64, rip ) + __field( __u32, exit_reason ) + __field( __u64, p1 ) + __field( __u64, p2 ) + __field( __u64, p3 ) + __field( __u64, p4 ) + ), + + TP_fast_assign( + __entry->rip = kvm_rip_read(vcpu); + __entry->exit_reason = exit_reason; + __entry->p1 = p1; + __entry->p2 = p2; + __entry->p3 = p3; + __entry->p4 = p4; + ), + + TP_printk("rip: %llx reason: %s p1: %llx p2: %llx p3: %llx p4: %llx", + __entry->rip, + __print_symbolic(__entry->exit_reason, VMX_EXIT_REASONS), + __entry->p1, __entry->p2, __entry->p3, __entry->p4) +); + +/* + * Tracepoint for SEPT related SEAMCALLs. 
+ */ +TRACE_EVENT(kvm_sept_seamcall, + TP_PROTO(__u64 op, __u64 gpa, __u64 hpa, int level), + TP_ARGS(op, gpa, hpa, level), + + TP_STRUCT__entry( + __field( __u64, op ) + __field( __u64, gpa ) + __field( __u64, hpa ) + __field( int, level ) + ), + + TP_fast_assign( + __entry->op = op; + __entry->gpa = gpa; + __entry->hpa = hpa; + __entry->level = level; + ), + + TP_printk("op: %llu gpa: 0x%llx hpa: 0x%llx level: %u", + __entry->op, __entry->gpa, __entry->hpa, __entry->level) +); + /* * Tracepoint for nested #vmexit because of interrupt pending */ diff --git a/arch/x86/kvm/vmx/common.h b/arch/x86/kvm/vmx/common.h index 8519423bfd88..a48a683af2c3 100644 --- a/arch/x86/kvm/vmx/common.h +++ b/arch/x86/kvm/vmx/common.h @@ -9,6 +9,7 @@ #include #include "mmu.h" +#include "tdx.h" #include "vmcs.h" #include "vmx.h" #include "x86.h" diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 53e1ea8df861..6437b8b23199 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -1,8 +1,21 @@ // SPDX-License-Identifier: GPL-2.0 #include +#ifdef CONFIG_KVM_INTEL_TDX +static bool __read_mostly enable_tdx = 1; +module_param_named(tdx, enable_tdx, bool, 0444); +#else +#define enable_tdx 0 +#endif + #include "vmx.c" +#ifdef CONFIG_KVM_INTEL_TDX +#include "tdx.c" +#else +#include "tdx_stubs.c" +#endif + static struct kvm_x86_ops vt_x86_ops __initdata; static int __init vt_cpu_has_kvm_support(void) @@ -23,6 +36,16 @@ static int __init vt_check_processor_compatibility(void) if (ret) return ret; + if (enable_tdx) { + /* + * Reject the entire module load if the per-cpu check fails, it + * likely indicates a hardware or system configuration issue. + */ + ret = tdx_check_processor_compatibility(); + if (ret) + return ret; + } + return 0; } @@ -31,13 +54,16 @@ static __init void vt_set_ept_masks(void) const u64 u_mask = VMX_EPT_READABLE_MASK; const u64 a_mask = enable_ept_ad_bits ? VMX_EPT_ACCESS_BIT : 0ull; const u64 d_mask = enable_ept_ad_bits ? VMX_EPT_DIRTY_BIT : 0ull; - const u64 p_mask = cpu_has_vmx_ept_execute_only() ? 0ull : - VMX_EPT_READABLE_MASK; const u64 x_mask = VMX_EPT_EXECUTABLE_MASK; const u64 nx_mask = 0ull; + const u64 init_value = enable_tdx ? VMX_EPT_SUPPRESS_VE_BIT : 0ull; + const u64 p_mask = (cpu_has_vmx_ept_execute_only() ? + 0ull : VMX_EPT_READABLE_MASK) | init_value; kvm_mmu_set_mask_ptes(u_mask, a_mask, d_mask, nx_mask, x_mask, p_mask, VMX_EPT_RWX_MASK, 0ull); + + kvm_mmu_set_spte_init_value(init_value); } static __init int vt_hardware_setup(void) @@ -48,6 +74,11 @@ static __init int vt_hardware_setup(void) if (ret) return ret; +#ifdef CONFIG_KVM_INTEL_TDX + if (enable_tdx && tdx_hardware_setup(&vt_x86_ops)) + enable_tdx = false; +#endif + if (enable_ept) vt_set_ept_masks(); @@ -56,11 +87,23 @@ static __init int vt_hardware_setup(void) static int vt_hardware_enable(void) { - return hardware_enable(); + int ret; + + ret = hardware_enable(); + if (ret) + return ret; + + if (enable_tdx) + tdx_hardware_enable(); + return 0; } static void vt_hardware_disable(void) { + /* Note, TDX *and* VMX need to be disabled if TDX is enabled. 
*/ + if (enable_tdx) + tdx_hardware_disable(); + hardware_disable(); } @@ -71,62 +114,92 @@ static bool vt_cpu_has_accelerated_tpr(void) static bool vt_is_vm_type_supported(unsigned long type) { - return type == KVM_X86_LEGACY_VM; + return type == KVM_X86_LEGACY_VM || + (type == KVM_X86_TDX_VM && enable_tdx); } static int vt_vm_init(struct kvm *kvm) { + if (kvm->arch.vm_type == KVM_X86_TDX_VM) + return tdx_vm_init(kvm); + return vmx_vm_init(kvm); } static void vt_vm_teardown(struct kvm *kvm) { - + if (is_td(kvm)) + return tdx_vm_teardown(kvm); } static void vt_vm_destroy(struct kvm *kvm) { - + if (is_td(kvm)) + return tdx_vm_destroy(kvm); } static int vt_vcpu_create(struct kvm_vcpu *vcpu) { + if (is_td_vcpu(vcpu)) + return tdx_vcpu_create(vcpu); + return vmx_create_vcpu(vcpu); } static fastpath_t vt_vcpu_run(struct kvm_vcpu *vcpu) { + if (is_td_vcpu(vcpu)) + return tdx_vcpu_run(vcpu); + return vmx_vcpu_run(vcpu); } static void vt_vcpu_free(struct kvm_vcpu *vcpu) { + if (is_td_vcpu(vcpu)) + return tdx_vcpu_free(vcpu); + return vmx_free_vcpu(vcpu); } static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) { + if (is_td_vcpu(vcpu)) + return tdx_vcpu_reset(vcpu, init_event); + return vmx_vcpu_reset(vcpu, init_event); } static void vt_vcpu_load(struct kvm_vcpu *vcpu, int cpu) { + if (is_td_vcpu(vcpu)) + return tdx_vcpu_load(vcpu, cpu); + return vmx_vcpu_load(vcpu, cpu); } static void vt_vcpu_put(struct kvm_vcpu *vcpu) { + if (is_td_vcpu(vcpu)) + return tdx_vcpu_put(vcpu); + return vmx_vcpu_put(vcpu); } static int vt_handle_exit(struct kvm_vcpu *vcpu, enum exit_fastpath_completion fastpath) { + if (is_td_vcpu(vcpu)) + return tdx_handle_exit(vcpu, fastpath); + return vmx_handle_exit(vcpu, fastpath); } static void vt_handle_exit_irqoff(struct kvm_vcpu *vcpu) { + if (is_td_vcpu(vcpu)) + return tdx_handle_exit_irqoff(vcpu); + vmx_handle_exit_irqoff(vcpu); } @@ -142,21 +215,33 @@ static void vt_update_emulated_instruction(struct kvm_vcpu *vcpu) static int vt_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) { + if (unlikely(is_td_vcpu(vcpu))) + return tdx_set_msr(vcpu, msr_info); + return vmx_set_msr(vcpu, msr_info); } static int vt_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection) { + if (is_td_vcpu(vcpu)) + return false; + return vmx_smi_allowed(vcpu, for_injection); } static int vt_pre_enter_smm(struct kvm_vcpu *vcpu, char *smstate) { + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return 0; + return vmx_pre_enter_smm(vcpu, smstate); } static int vt_pre_leave_smm(struct kvm_vcpu *vcpu, const char *smstate) { + if (WARN_ON_ONCE(is_td_vcpu(vcpu))) + return 0; + return vmx_pre_leave_smm(vcpu, smstate); } @@ -168,6 +253,9 @@ static void vt_enable_smi_window(struct kvm_vcpu *vcpu) static bool vt_can_emulate_instruction(struct kvm_vcpu *vcpu, void *insn, int insn_len) { + if (is_td_vcpu(vcpu)) + return false; + return vmx_can_emulate_instruction(vcpu, insn, insn_len); } @@ -176,11 +264,17 @@ static int vt_check_intercept(struct kvm_vcpu *vcpu, enum x86_intercept_stage stage, struct x86_exception *exception) { + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return X86EMUL_UNHANDLEABLE; + return vmx_check_intercept(vcpu, info, stage, exception); } static bool vt_apic_init_signal_blocked(struct kvm_vcpu *vcpu) { + if (is_td_vcpu(vcpu)) + return true; + return vmx_apic_init_signal_blocked(vcpu); } @@ -189,13 +283,43 @@ static void vt_migrate_timers(struct kvm_vcpu *vcpu) vmx_migrate_timers(vcpu); } +static int vt_mem_enc_op_dev(void __user *argp) +{ + if (!enable_tdx) + return 
-EINVAL; + + return tdx_dev_ioctl(argp); +} + +static int vt_mem_enc_op(struct kvm *kvm, void __user *argp) +{ + if (!is_td(kvm)) + return -ENOTTY; + + return tdx_vm_ioctl(kvm, argp); +} + +static int vt_mem_enc_op_vcpu(struct kvm_vcpu *vcpu, void __user *argp) +{ + if (!is_td_vcpu(vcpu)) + return -EINVAL; + + return tdx_vcpu_ioctl(vcpu, argp); +} + static void vt_set_virtual_apic_mode(struct kvm_vcpu *vcpu) { + if (is_td_vcpu(vcpu)) + return tdx_set_virtual_apic_mode(vcpu); + return vmx_set_virtual_apic_mode(vcpu); } static void vt_apicv_post_state_restore(struct kvm_vcpu *vcpu) { + if (is_td_vcpu(vcpu)) + return tdx_apicv_post_state_restore(vcpu); + return vmx_apicv_post_state_restore(vcpu); } @@ -209,36 +333,57 @@ static bool vt_check_apicv_inhibit_reasons(ulong bit) static void vt_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr) { + if (is_td_vcpu(vcpu)) + return; + return vmx_hwapic_irr_update(vcpu, max_irr); } static void vt_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr) { + if (is_td_vcpu(vcpu)) + return; + return vmx_hwapic_isr_update(vcpu, max_isr); } static bool vt_guest_apic_has_interrupt(struct kvm_vcpu *vcpu) { + if (WARN_ON_ONCE(is_td_vcpu(vcpu))) + return false; + return vmx_guest_apic_has_interrupt(vcpu); } static int vt_sync_pir_to_irr(struct kvm_vcpu *vcpu) { + if (is_td_vcpu(vcpu)) + return -1; + return vmx_sync_pir_to_irr(vcpu); } static int vt_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector) { + if (is_td_vcpu(vcpu)) + return tdx_deliver_posted_interrupt(vcpu, vector); + return vmx_deliver_posted_interrupt(vcpu, vector); } static void vt_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) { + if (is_td_vcpu(vcpu)) + return; + return vmx_vcpu_after_set_cpuid(vcpu); } static bool vt_has_emulated_msr(struct kvm *kvm, u32 index) { + if (kvm && is_td(kvm)) + return tdx_is_emulated_msr(index, true); + return vmx_has_emulated_msr(index); } @@ -249,11 +394,23 @@ static void vt_msr_filter_changed(struct kvm_vcpu *vcpu) static void vt_prepare_switch_to_guest(struct kvm_vcpu *vcpu) { + /* + * All host state is saved/restored across SEAMCALL/SEAMRET, and the + * guest state of a TD is obviously off limits. Deferring MSRs and DRs + * is pointless because TDX-SEAM needs to load *something* so as not to + * expose guest state. 
+ */ + if (is_td_vcpu(vcpu)) + return; + vmx_prepare_switch_to_guest(vcpu); } static void vt_update_exception_bitmap(struct kvm_vcpu *vcpu) { + if (is_td_vcpu(vcpu)) + return tdx_update_exception_bitmap(vcpu); + update_exception_bitmap(vcpu); } @@ -264,54 +421,84 @@ static int vt_get_msr_feature(struct kvm_msr_entry *msr) static int vt_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) { + if (unlikely(is_td_vcpu(vcpu))) + return tdx_get_msr(vcpu, msr_info); + return vmx_get_msr(vcpu, msr_info); } static u64 vt_get_segment_base(struct kvm_vcpu *vcpu, int seg) { + if (is_td_vcpu(vcpu)) + return tdx_get_segment_base(vcpu, seg); + return vmx_get_segment_base(vcpu, seg); } static void vt_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg) { + if (is_td_vcpu(vcpu)) + return tdx_get_segment(vcpu, var, seg); + vmx_get_segment(vcpu, var, seg); } static void vt_set_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg) { + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return; + vmx_set_segment(vcpu, var, seg); } static int vt_get_cpl(struct kvm_vcpu *vcpu) { + if (is_td_vcpu(vcpu)) + return tdx_get_cpl(vcpu); + return vmx_get_cpl(vcpu); } static void vt_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l) { + if (KVM_BUG_ON(is_td_vcpu(vcpu) && !is_debug_td(vcpu), vcpu->kvm)) + return; + vmx_get_cs_db_l_bits(vcpu, db, l); } static void vt_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) { + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return; + vmx_set_cr0(vcpu, cr0); } static void vt_load_mmu_pgd(struct kvm_vcpu *vcpu, unsigned long pgd, int pgd_level) { + if (is_td_vcpu(vcpu)) + return tdx_load_mmu_pgd(vcpu, pgd, pgd_level); + vmx_load_mmu_pgd(vcpu, pgd, pgd_level); } static int vt_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) { + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return 1; + return vmx_set_cr4(vcpu, cr4); } static int vt_set_efer(struct kvm_vcpu *vcpu, u64 efer) { + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return -EIO; + return vmx_set_efer(vcpu, efer); } @@ -323,6 +510,9 @@ static void vt_get_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) static void vt_set_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) { + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return; + vmx_set_idt(vcpu, dt); } @@ -334,16 +524,30 @@ static void vt_get_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) static void vt_set_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) { + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return; + vmx_set_gdt(vcpu, dt); } static void vt_set_dr7(struct kvm_vcpu *vcpu, unsigned long val) { + if (is_td_vcpu(vcpu)) + return tdx_set_dr7(vcpu, val); + vmx_set_dr7(vcpu, val); } static void vt_sync_dirty_debug_regs(struct kvm_vcpu *vcpu) { + /* + * MOV-DR exiting is always cleared for TD guest, even in debug mode. + * Thus KVM_DEBUGREG_WONT_EXIT can never be set and it should never + * reach here for TD vcpu. + */ + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return; + vmx_sync_dirty_debug_regs(vcpu); } @@ -355,31 +559,41 @@ static void vt_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg) switch (reg) { case VCPU_REGS_RSP: - vcpu->arch.regs[VCPU_REGS_RSP] = vmcs_readl(GUEST_RSP); + vcpu->arch.regs[VCPU_REGS_RSP] = vmreadl(vcpu, GUEST_RSP); break; case VCPU_REGS_RIP: - vcpu->arch.regs[VCPU_REGS_RIP] = vmcs_readl(GUEST_RIP); +#ifdef CONFIG_KVM_INTEL_TDX + /* + * RIP can be read by tracepoints, stuff a bogus value and + * avoid a WARN/error. 
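The 0xdead << 48 constant stuffed into the RIP cache is a deliberate poison: it is non-canonical on x86-64, so it can never collide with a real guest RIP, yet it is easy to recognize in trace output. A small, self-contained illustration of why it is non-canonical and how such a poison might be applied; the helper names here are invented for the example.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Mirrors the patch's 0xdeadul << 48: bits 63:48 differ from bit 47. */
#define GUEST_RIP_POISON        (0xdeadull << 48)

/* Canonical check for 48-bit linear addresses; assumes arithmetic right shift. */
static bool is_canonical_48(uint64_t addr)
{
        return (uint64_t)(((int64_t)(addr << 16)) >> 16) == addr;
}

/* Hypothetical helper: hand out the real RIP only when guest state is visible. */
static uint64_t cached_rip(uint64_t real_rip, bool guest_state_protected)
{
        return guest_state_protected ? GUEST_RIP_POISON : real_rip;
}

int main(void)
{
        printf("poison canonical? %d\n", is_canonical_48(GUEST_RIP_POISON));
        printf("cached rip: 0x%llx\n",
               (unsigned long long)cached_rip(0xffffffff81000000ull, true));
        return 0;
}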
+ */ + if (unlikely(is_td_vcpu(vcpu) && !is_debug_td(vcpu))) { + vcpu->arch.regs[VCPU_REGS_RIP] = 0xdeadul << 48; + break; + } +#endif + vcpu->arch.regs[VCPU_REGS_RIP] = vmreadl(vcpu, GUEST_RIP); break; case VCPU_EXREG_PDPTR: - if (enable_ept) + if (enable_ept && !KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) ept_save_pdptrs(vcpu); break; case VCPU_EXREG_CR0: guest_owned_bits = vcpu->arch.cr0_guest_owned_bits; vcpu->arch.cr0 &= ~guest_owned_bits; - vcpu->arch.cr0 |= vmcs_readl(GUEST_CR0) & guest_owned_bits; + vcpu->arch.cr0 |= vmreadl(vcpu, GUEST_CR0) & guest_owned_bits; break; case VCPU_EXREG_CR3: if (is_unrestricted_guest(vcpu) || (enable_ept && is_paging(vcpu))) - vcpu->arch.cr3 = vmcs_readl(GUEST_CR3); + vcpu->arch.cr3 = vmreadl(vcpu, GUEST_CR3); break; case VCPU_EXREG_CR4: guest_owned_bits = vcpu->arch.cr4_guest_owned_bits; vcpu->arch.cr4 &= ~guest_owned_bits; - vcpu->arch.cr4 |= vmcs_readl(GUEST_CR4) & guest_owned_bits; + vcpu->arch.cr4 |= vmreadl(vcpu, GUEST_CR4) & guest_owned_bits; break; default: KVM_BUG_ON(1, vcpu->kvm); @@ -389,171 +603,284 @@ static void vt_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg) static unsigned long vt_get_rflags(struct kvm_vcpu *vcpu) { + if (is_td_vcpu(vcpu)) + return tdx_get_rflags(vcpu); + return vmx_get_rflags(vcpu); } static void vt_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags) { + if (is_td_vcpu(vcpu)) + return tdx_set_rflags(vcpu, rflags); + vmx_set_rflags(vcpu, rflags); } static void vt_flush_tlb_all(struct kvm_vcpu *vcpu) { + if (is_td_vcpu(vcpu)) + return tdx_flush_tlb(vcpu); + vmx_flush_tlb_all(vcpu); } static void vt_flush_tlb_current(struct kvm_vcpu *vcpu) { + if (is_td_vcpu(vcpu)) + return tdx_flush_tlb(vcpu); + vmx_flush_tlb_current(vcpu); } static void vt_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t addr) { + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return; + vmx_flush_tlb_gva(vcpu, addr); } static void vt_flush_tlb_guest(struct kvm_vcpu *vcpu) { + if (is_td_vcpu(vcpu)) + return; + vmx_flush_tlb_guest(vcpu); } static void vt_set_interrupt_shadow(struct kvm_vcpu *vcpu, int mask) { + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return; + vmx_set_interrupt_shadow(vcpu, mask); } static u32 vt_get_interrupt_shadow(struct kvm_vcpu *vcpu) { - return vmx_get_interrupt_shadow(vcpu); + return __vmx_get_interrupt_shadow(vcpu); } static void vt_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall) { + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return; + vmx_patch_hypercall(vcpu, hypercall); } static void vt_inject_irq(struct kvm_vcpu *vcpu) { + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return; + vmx_inject_irq(vcpu); } static void vt_inject_nmi(struct kvm_vcpu *vcpu) { + if (is_td_vcpu(vcpu)) + return tdx_inject_nmi(vcpu); + vmx_inject_nmi(vcpu); } static void vt_queue_exception(struct kvm_vcpu *vcpu) { + if (KVM_BUG_ON(is_td_vcpu(vcpu) && !is_debug_td(vcpu), vcpu->kvm)) + return; + vmx_queue_exception(vcpu); } static void vt_cancel_injection(struct kvm_vcpu *vcpu) { + if (is_td_vcpu(vcpu)) + return; + vmx_cancel_injection(vcpu); } static int vt_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection) { + if (is_td_vcpu(vcpu)) + return true; + return vmx_interrupt_allowed(vcpu, for_injection); } static int vt_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection) { + /* + * TDX-SEAM manages NMI windows and NMI reinjection, and hides NMI + * blocking, all KVM can do is throw an NMI over the wall. 
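Because the module owns NMI windows, reinjection and blocking state, KVM's entire job is to mark an NMI pending and let TDX-SEAM deliver it; that is also why nmi_allowed() and get_nmi_mask() can return constants for a TD. A toy model of that split follows; td_vcpu_state and its field are stand-ins for the TD_VCPU_PEND_NMI field the patch writes via td_management_write8(), not the real layout.

#include <stdbool.h>
#include <stdio.h>

/* Stand-in for the TD's module-managed PEND_NMI field. */
struct td_vcpu_state {
        bool pend_nmi;          /* consumed by the TDX module, not by KVM */
};

/* KVM's whole job: mark the NMI pending and let the module deliver it. */
static void td_request_nmi(struct td_vcpu_state *td)
{
        td->pend_nmi = true;
}

/* KVM never observes TD NMI blocking, so these are constants for a TD. */
static bool td_nmi_allowed(void)        { return true; }
static bool td_get_nmi_mask(void)       { return false; }

int main(void)
{
        struct td_vcpu_state td = { .pend_nmi = false };

        td_request_nmi(&td);
        printf("pend_nmi=%d allowed=%d masked=%d\n",
               td.pend_nmi, td_nmi_allowed(), td_get_nmi_mask());
        return 0;
}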
+ */ + if (is_td_vcpu(vcpu)) + return true; + return vmx_nmi_allowed(vcpu, for_injection); } static bool vt_get_nmi_mask(struct kvm_vcpu *vcpu) { + /* + * Assume NMIs are always unmasked. KVM could query PEND_NMI and treat + * NMIs as masked if a previous NMI is still pending, but SEAMCALLs are + * expensive and the end result is unchanged as the only relevant usage + * of get_nmi_mask() is to limit the number of pending NMIs, i.e. it + * only changes whether KVM or TDX-SEAM drops an NMI. + */ + if (is_td_vcpu(vcpu)) + return false; + return vmx_get_nmi_mask(vcpu); } static void vt_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked) { + if (is_td_vcpu(vcpu)) + return; + vmx_set_nmi_mask(vcpu, masked); } static void vt_enable_nmi_window(struct kvm_vcpu *vcpu) { + /* TDX-SEAM handles NMI windows, KVM always reports NMIs as unblocked. */ + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return; + enable_nmi_window(vcpu); } static void vt_enable_irq_window(struct kvm_vcpu *vcpu) { + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return; + enable_irq_window(vcpu); } static void vt_update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr) { + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return; + update_cr8_intercept(vcpu, tpr, irr); } static void vt_set_apic_access_page_addr(struct kvm_vcpu *vcpu) { + if (WARN_ON_ONCE(is_td_vcpu(vcpu))) + return; + vmx_set_apic_access_page_addr(vcpu); } static void vt_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu) { + if (WARN_ON_ONCE(is_td_vcpu(vcpu))) + return; + vmx_refresh_apicv_exec_ctrl(vcpu); } static void vt_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap) { + if (WARN_ON_ONCE(is_td_vcpu(vcpu))) + return; + vmx_load_eoi_exitmap(vcpu, eoi_exit_bitmap); } static int vt_set_tss_addr(struct kvm *kvm, unsigned int addr) { + /* TODO: Reject this and update Qemu, or eat it? */ + if (is_td(kvm)) + return 0; + return vmx_set_tss_addr(kvm, addr); } static int vt_set_identity_map_addr(struct kvm *kvm, u64 ident_addr) { + /* TODO: Reject this and update Qemu, or eat it? 
*/ + if (is_td(kvm)) + return 0; + return vmx_set_identity_map_addr(kvm, ident_addr); } static u64 vt_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio) { + if (is_td_vcpu(vcpu)) { + if (is_mmio) + return MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT; + return MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT; + } + return vmx_get_mt_mask(vcpu, gfn, is_mmio); } static void vt_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code) { + if (is_td_vcpu(vcpu)) + return tdx_get_exit_info(vcpu, info1, info2, intr_info, + error_code); return vmx_get_exit_info(vcpu, info1, info2, intr_info, error_code); } static u64 vt_write_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 offset) { + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return 0; + return vmx_write_l1_tsc_offset(vcpu, offset); } static void vt_request_immediate_exit(struct kvm_vcpu *vcpu) { + if (is_td_vcpu(vcpu)) + return __kvm_request_immediate_exit(vcpu); + vmx_request_immediate_exit(vcpu); } static void vt_sched_in(struct kvm_vcpu *vcpu, int cpu) { + if (is_td_vcpu(vcpu)) + return; + vmx_sched_in(vcpu, cpu); } static void vt_slot_enable_log_dirty(struct kvm *kvm, struct kvm_memory_slot *slot) { + if (is_td(kvm)) + return; + vmx_slot_enable_log_dirty(kvm, slot); } static void vt_slot_disable_log_dirty(struct kvm *kvm, struct kvm_memory_slot *slot) { + if (is_td(kvm)) + return; + vmx_slot_disable_log_dirty(kvm, slot); } static void vt_flush_log_dirty(struct kvm *kvm) { + if (is_td(kvm)) + return; + vmx_flush_log_dirty(kvm); } @@ -561,6 +888,9 @@ static void vt_enable_log_dirty_pt_masked(struct kvm *kvm, struct kvm_memory_slot *memslot, gfn_t offset, unsigned long mask) { + if (is_td(kvm)) + return; + vmx_enable_log_dirty_pt_masked(kvm, memslot, offset, mask); } @@ -569,12 +899,16 @@ static int vt_pre_block(struct kvm_vcpu *vcpu) if (pi_pre_block(vcpu)) return 1; + if (is_td_vcpu(vcpu)) + return 0; + return vmx_pre_block(vcpu); } static void vt_post_block(struct kvm_vcpu *vcpu) { - vmx_post_block(vcpu); + if (!is_td_vcpu(vcpu)) + vmx_post_block(vcpu); pi_post_block(vcpu); } @@ -584,17 +918,26 @@ static void vt_post_block(struct kvm_vcpu *vcpu) static int vt_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc, bool *expired) { + if (is_td_vcpu(vcpu)) + return -EINVAL; + return vmx_set_hv_timer(vcpu, guest_deadline_tsc, expired); } static void vt_cancel_hv_timer(struct kvm_vcpu *vcpu) { + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return; + vmx_cancel_hv_timer(vcpu); } #endif static void vt_setup_mce(struct kvm_vcpu *vcpu) { + if (is_td_vcpu(vcpu)) + return; + vmx_setup_mce(vcpu); } @@ -729,6 +1072,10 @@ static struct kvm_x86_ops vt_x86_ops __initdata = { .migrate_timers = vt_migrate_timers, .msr_filter_changed = vt_msr_filter_changed, + + .mem_enc_op_dev = vt_mem_enc_op_dev, + .mem_enc_op = vt_mem_enc_op, + .mem_enc_op_vcpu = vt_mem_enc_op_vcpu, }; static struct kvm_x86_init_ops vt_init_ops __initdata = { @@ -745,6 +1092,9 @@ static int __init vt_init(void) unsigned int vcpu_size = 0, vcpu_align = 0; int r; + /* tdx_pre_kvm_init must be called before vmx_pre_kvm_init(). 
*/ + tdx_pre_kvm_init(&vcpu_size, &vcpu_align, &vt_x86_ops.vm_size); + + vmx_pre_kvm_init(&vcpu_size, &vcpu_align, &vt_x86_ops); r = kvm_init(&vt_init_ops, vcpu_size, vcpu_align, THIS_MODULE); @@ -755,8 +1105,14 @@ static int __init vt_init(void) if (r) goto err_kvm_exit; + r = tdx_init(); + if (r) + goto err_vmx_exit; + return 0; +err_vmx_exit: + vmx_exit(); err_kvm_exit: kvm_exit(); err_vmx_post_exit: diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c index f02962dcc72c..86c3ae5ee27e 100644 --- a/arch/x86/kvm/vmx/posted_intr.c +++ b/arch/x86/kvm/vmx/posted_intr.c @@ -6,6 +6,7 @@ #include "lapic.h" #include "posted_intr.h" +#include "tdx.h" #include "trace.h" #include "vmx.h" @@ -18,6 +19,11 @@ static DEFINE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock); static inline struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu) { +#ifdef CONFIG_KVM_INTEL_TDX + if (is_td_vcpu(vcpu)) + return &(to_tdx(vcpu)->pi_desc); +#endif + return &(to_vmx(vcpu)->pi_desc); } diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c new file mode 100644 index 000000000000..adcb866861b7 --- /dev/null +++ b/arch/x86/kvm/vmx/tdx.c @@ -0,0 +1,1850 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include +#include + +#include +#include + +#include "common.h" +#include "cpuid.h" +#include "lapic.h" +#include "tdx.h" +#include "tdx_errno.h" +#include "tdx_ops.h" + +#include +#include "trace.h" + +#undef pr_fmt +#define pr_fmt(fmt) "tdx: " fmt + +/* Capabilities of KVM + TDX-SEAM. */ +struct tdx_capabilities tdx_caps; + +static DEFINE_MUTEX(tdwbcache_lock); +static DEFINE_MUTEX(tdconfigkey_lock); + +/* + * A per-CPU list of TD vCPUs associated with a given CPU. Used when a CPU + * is brought down to invoke TDFLUSHVP on the appropriate TD vCPUs.
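The per-CPU bookkeeping here exists so that a CPU going offline can flush every TD vCPU still associated with it. A toy, userspace-only model of that idea follows; the real code uses DEFINE_PER_CPU, struct list_head and smp_call_function_single(), whereas the array of singly linked lists below is purely illustrative.

#include <stdio.h>
#include <stddef.h>

#define NR_CPUS 4

/* Invented stand-in for a TD vCPU; only the fields the example needs. */
struct toy_vcpu {
        int id;
        int cpu;                        /* -1: not associated with any CPU */
        struct toy_vcpu *next;          /* link in the per-CPU list */
};

/* Toy replacement for DEFINE_PER_CPU(struct list_head, associated_tdvcpus). */
static struct toy_vcpu *associated[NR_CPUS];

static void associate(struct toy_vcpu *v, int cpu)
{
        v->cpu = cpu;
        v->next = associated[cpu];
        associated[cpu] = v;
}

/* What CPU offlining must do: flush and disassociate every vCPU on the list. */
static void flush_cpu(int cpu)
{
        struct toy_vcpu *v = associated[cpu];

        while (v) {
                struct toy_vcpu *next = v->next;

                printf("TDFLUSHVP equivalent for vcpu %d\n", v->id);
                v->cpu = -1;
                v->next = NULL;
                v = next;
        }
        associated[cpu] = NULL;
}

int main(void)
{
        struct toy_vcpu a = { .id = 0 }, b = { .id = 1 };

        associate(&a, 2);
        associate(&b, 2);
        flush_cpu(2);
        return 0;
}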
+ */ +static DEFINE_PER_CPU(struct list_head, associated_tdvcpus); + +static __always_inline unsigned long tdexit_exit_qual(struct kvm_vcpu *vcpu) +{ + return kvm_rcx_read(vcpu); +} +static __always_inline unsigned long tdexit_ext_exit_qual(struct kvm_vcpu *vcpu) +{ + return kvm_rdx_read(vcpu); +} +static __always_inline unsigned long tdexit_gpa(struct kvm_vcpu *vcpu) +{ + return kvm_r8_read(vcpu); +} +static __always_inline unsigned long tdexit_intr_info(struct kvm_vcpu *vcpu) +{ + return kvm_r9_read(vcpu); +} + +#define BUILD_TDVMCALL_ACCESSORS(param, gpr) \ +static __always_inline \ +unsigned long tdvmcall_##param##_read(struct kvm_vcpu *vcpu) \ +{ \ + return kvm_##gpr##_read(vcpu); \ +} \ +static __always_inline void tdvmcall_##param##_write(struct kvm_vcpu *vcpu, \ + unsigned long val) \ +{ \ + kvm_##gpr##_write(vcpu, val); \ +} +BUILD_TDVMCALL_ACCESSORS(p1, r12); +BUILD_TDVMCALL_ACCESSORS(p2, r13); +BUILD_TDVMCALL_ACCESSORS(p3, r14); +BUILD_TDVMCALL_ACCESSORS(p4, r15); + +static __always_inline unsigned long tdvmcall_exit_type(struct kvm_vcpu *vcpu) +{ + return kvm_r10_read(vcpu); +} +static __always_inline unsigned long tdvmcall_exit_reason(struct kvm_vcpu *vcpu) +{ + return kvm_r11_read(vcpu); +} +static __always_inline void tdvmcall_set_return_code(struct kvm_vcpu *vcpu, + long val) +{ + kvm_r10_write(vcpu, val); +} +static __always_inline void tdvmcall_set_return_val(struct kvm_vcpu *vcpu, + unsigned long val) +{ + kvm_r11_write(vcpu, val); +} + +static inline bool is_td_vcpu_created(struct vcpu_tdx *tdx) +{ + return tdx->tdvpr.added; +} + +static inline bool is_td_created(struct kvm_tdx *kvm_tdx) +{ + return kvm_tdx->tdr.added; +} + +static inline bool is_hkid_assigned(struct kvm_tdx *kvm_tdx) +{ + return kvm_tdx->hkid >= 0; +} + +static inline bool is_td_initialized(struct kvm *kvm) +{ + return !!kvm->max_vcpus; +} + +static inline bool is_td_finalized(struct kvm_tdx *kvm_tdx) +{ + return kvm_tdx->finalized; +} + +static void tdx_clear_page(unsigned long page) +{ + const void *zero_page = (const void *) __va(page_to_phys(ZERO_PAGE(0))); + unsigned long i; + + /* Zeroing the page is only necessary for systems with MKTME-i. 
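The page-clearing helper above only scrubs when MOVDIR64B is available; per the comment, zeroing is only needed on MKTME-i systems, and the patch does it with a direct 64-byte store, one cache line at a time. A stand-alone sketch of the chunked walk; direct_store_64B() is a stand-in for the MOVDIR64B inline asm, not the real instruction.

#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE   4096
#define CHUNK           64      /* MOVDIR64B moves one 64-byte line */

/* Stand-in for the MOVDIR64B direct store the patch emits via inline asm. */
static void direct_store_64B(void *dst, const void *src)
{
        memcpy(dst, src, CHUNK);
}

static void clear_page_by_cachelines(void *page)
{
        static const uint8_t zeros[CHUNK];
        unsigned long i;

        for (i = 0; i < TOY_PAGE_SIZE; i += CHUNK)
                direct_store_64B((uint8_t *)page + i, zeros);
}

int main(void)
{
        static uint8_t page[TOY_PAGE_SIZE];

        memset(page, 0xa5, sizeof(page));
        clear_page_by_cachelines(page);
        return page[123];       /* 0 once the page has been cleared */
}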
*/ + if (!static_cpu_has(X86_FEATURE_MOVDIR64B)) + return; + + for (i = 0; i < 4096; i += 64) + /* MOVDIR64B [rdx], es:rdi */ + asm (".byte 0x66, 0x0f, 0x38, 0xf8, 0x3a" + : : "d" (zero_page), "D" (page + i) : "memory"); +} + +static int __tdx_reclaim_page(unsigned long va, hpa_t pa, bool do_wb) +{ + struct tdx_ex_ret ex_ret; + u64 err; + + err = tdreclaimpage(pa, &ex_ret); + if (TDX_ERR(err, TDRECLAIMPAGE)) + return -EIO; + + if (do_wb) { + err = tdwbinvdpage(pa); + if (TDX_ERR(err, TDWBINVDPAGE)) + return -EIO; + } + + tdx_clear_page(va); + return 0; +} + +static int tdx_reclaim_page(unsigned long va, hpa_t pa) +{ + return __tdx_reclaim_page(va, pa, false); +} + +static int tdx_alloc_td_page(struct tdx_td_page *page) +{ + page->va = __get_free_page(GFP_KERNEL_ACCOUNT); + if (!page->va) + return -ENOMEM; + + page->pa = __pa(page->va); + return 0; +} + +static void tdx_add_td_page(struct tdx_td_page *page) +{ + WARN_ON_ONCE(page->added); + page->added = true; +} + +static void tdx_reclaim_td_page(struct tdx_td_page *page) +{ + if (page->added) { + if (tdx_reclaim_page(page->va, page->pa)) + return; + + page->added = false; + } + free_page(page->va); +} + +static inline void tdx_disassociate_vp(struct kvm_vcpu *vcpu) +{ + list_del(&to_tdx(vcpu)->cpu_list); + + /* + * Ensure tdx->cpu_list is updated is before setting vcpu->cpu to -1, + * otherwise, a different CPU can see vcpu->cpu = -1 and add the vCPU + * to its list before its deleted from this CPUs list. + */ + smp_wmb(); + + vcpu->cpu = -1; +} + +static void tdx_flush_vp(void *arg) +{ + struct kvm_vcpu *vcpu = arg; + u64 err; + + /* Task migration can race with CPU offlining. */ + if (vcpu->cpu != raw_smp_processor_id()) + return; + + err = tdflushvp(to_tdx(vcpu)->tdvpr.pa); + if (unlikely(err && err != TDX_VCPU_NOT_ASSOCIATED)) + TDX_ERR(err, TDFLUSHVP); + + tdx_disassociate_vp(vcpu); +} + +static void tdx_flush_vp_on_cpu(struct kvm_vcpu *vcpu) +{ + if (vcpu->cpu == -1) + return; + + /* + * No need to do TDFLUSHVP if the vCPU hasn't been initialized. The + * list tracking still needs to be updated so that it's correct if/when + * the vCPU does get initialized. + */ + if (is_td_vcpu_created(to_tdx(vcpu))) + smp_call_function_single(vcpu->cpu, tdx_flush_vp, vcpu, 1); + else + tdx_disassociate_vp(vcpu); +} + +static void tdx_do_tdwbcache(void *data) +{ + u64 err = 0; + + do { + err = tdwbcache(!!err); + } while (err == TDX_INTERRUPTED_RESUMABLE); + + if (err && cmpxchg64((u64 *)data, 0, err) == 0) + TDX_ERR(err, TDWBCACHE); +} + +static void tdx_vm_teardown(struct kvm *kvm) +{ + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); + struct kvm_vcpu *vcpu; + u64 err; + int i; + + if (!is_hkid_assigned(kvm_tdx)) + return; + + if (!is_td_created(kvm_tdx)) + goto free_hkid; + + err = tdreclaimhkids(kvm_tdx->tdr.pa); + if (TDX_ERR(err, TDRECLAIMHKIDS)) + return; + + kvm_for_each_vcpu(i, vcpu, (&kvm_tdx->kvm)) + tdx_flush_vp_on_cpu(vcpu); + + err = tdflushvpdone(kvm_tdx->tdr.pa); + if (TDX_ERR(err, TDFLUSHVPDONE)) + return; + + tdx_seamcall_on_each_pkg(tdx_do_tdwbcache, &err, &tdwbcache_lock); + + if (unlikely(err)) + return; + + err = tdfreehkids(kvm_tdx->tdr.pa); + if (TDX_ERR(err, TDFREEHKIDS)) + return; + +free_hkid: + tdx_keyid_free(kvm_tdx->hkid); + kvm_tdx->hkid = -1; +} + +static void tdx_vm_destroy(struct kvm *kvm) +{ + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); + int i; + + /* Can't reclaim or free TD pages if teardown failed. 
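Reclaiming a TD-owned page is a fixed sequence: hand it back with TDRECLAIMPAGE, optionally write back and invalidate its cache lines with TDWBINVDPAGE, then scrub it before the kernel reuses it, bailing out on the first SEAMCALL failure. A self-contained sketch of that error-propagation shape; the fake_* helpers are stand-ins for the real SEAMCALL wrappers.

#include <errno.h>
#include <stdio.h>

/* fake_* helpers stand in for the SEAMCALL wrappers; 0 means success. */
static int fake_tdreclaimpage(unsigned long pa)  { (void)pa; return 0; }
static int fake_tdwbinvdpage(unsigned long pa)   { (void)pa; return 0; }
static void fake_clear_page(unsigned long va)    { (void)va; }

/*
 * Same shape as __tdx_reclaim_page(): give the page back to the module,
 * optionally write back + invalidate its lines, then scrub before reuse.
 */
static int reclaim_page(unsigned long va, unsigned long pa, int do_wb)
{
        if (fake_tdreclaimpage(pa))
                return -EIO;

        if (do_wb && fake_tdwbinvdpage(pa))
                return -EIO;

        fake_clear_page(va);
        return 0;
}

int main(void)
{
        printf("reclaim: %d\n", reclaim_page(0x1000, 0x2000, 1));
        return 0;
}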
*/ + if (is_hkid_assigned(kvm_tdx)) + return; + + kvm_mmu_zap_all_private(kvm); + + for (i = 0; i < tdx_caps.tdcs_nr_pages; i++) + tdx_reclaim_td_page(&kvm_tdx->tdcs[i]); + + if (kvm_tdx->tdr.added && + __tdx_reclaim_page(kvm_tdx->tdr.va, kvm_tdx->tdr.pa, true)) + return; + + free_page(kvm_tdx->tdr.va); +} + +struct tdx_tdconfigkey { + hpa_t tdr; + u64 err; +}; + +static void tdx_do_tdconfigkey(void *data) +{ + struct tdx_tdconfigkey *configkey = data; + u64 err; + + if (configkey->err) + return; + + do { + err = tdconfigkey(configkey->tdr); + } while (err == TDX_KEY_GENERATION_FAILED); + + if (TDX_ERR(err, TDCONFIGKEY)) + configkey->err = err; +} + +static int tdx_vm_init(struct kvm *kvm) +{ + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); + struct tdx_tdconfigkey configkey; + int ret, i; + u64 err; + + kvm->dirty_log_unsupported = true; + kvm->readonly_mem_unsupported = true; + + kvm->arch.tsc_immutable = true; + kvm->arch.eoi_intercept_unsupported = true; + kvm->arch.guest_state_protected = true; + kvm->arch.irq_injection_disallowed = true; + kvm->arch.mce_injection_disallowed = true; + kvm_mmu_set_mmio_spte_mask(kvm, 0, 0); + + /* TODO: Enable 2mb and 1gb large page support. */ + kvm->arch.tdp_max_page_level = PG_LEVEL_4K; + + kvm_apicv_init(kvm, true); + + /* vCPUs can't be created until after KVM_TDX_INIT_VM. */ + kvm->max_vcpus = 0; + + kvm_tdx->hkid = tdx_keyid_alloc(); + if (kvm_tdx->hkid < 0) + return -EBUSY; + if (WARN_ON_ONCE(kvm_tdx->hkid >> 16)) { + ret = -EIO; + goto free_hkid; + } + + ret = tdx_alloc_td_page(&kvm_tdx->tdr); + if (ret) + goto free_hkid; + + for (i = 0; i < tdx_caps.tdcs_nr_pages; i++) { + ret = tdx_alloc_td_page(&kvm_tdx->tdcs[i]); + if (ret) + goto free_tdcs; + } + + ret = -EIO; + err = tdcreate(kvm_tdx->tdr.pa, kvm_tdx->hkid); + if (TDX_ERR(err, TDCREATE)) + goto free_tdcs; + tdx_add_td_page(&kvm_tdx->tdr); + + configkey.tdr = kvm_tdx->tdr.pa; + configkey.err = 0; + + tdx_seamcall_on_each_pkg(tdx_do_tdconfigkey, &configkey, + &tdconfigkey_lock); + if (configkey.err) + goto teardown; + + for (i = 0; i < tdx_caps.tdcs_nr_pages; i++) { + err = tdaddcx(kvm_tdx->tdr.pa, kvm_tdx->tdcs[i].pa); + if (TDX_ERR(err, TDADDCX)) + goto teardown; + tdx_add_td_page(&kvm_tdx->tdcs[i]); + } + + /* + * Note, TDINIT cannot be invoked here. TDINIT requires a dedicated + * ioctl() to define the configure CPUID values for the TD. + */ + return 0; + + /* + * The sequence for freeing resources from a partially initialized TD + * varies based on where in the initialization flow failure occurred. + * Simply use the full teardown and destroy, which naturally play nice + * with partial initialization. + */ +teardown: + tdx_vm_teardown(kvm); + tdx_vm_destroy(kvm); + return ret; + +free_tdcs: + /* @i points at the TDCS page that failed allocation. */ + for (--i; i >= 0; i--) + free_page(kvm_tdx->tdcs[i].va); + + free_page(kvm_tdx->tdr.va); +free_hkid: + tdx_keyid_free(kvm_tdx->hkid); + return ret; +} + +static int tdx_vcpu_create(struct kvm_vcpu *vcpu) +{ + struct vcpu_tdx *tdx = to_tdx(vcpu); + int cpu, ret, i; + + ret = tdx_alloc_td_page(&tdx->tdvpr); + if (ret) + return ret; + + for (i = 0; i < tdx_caps.tdvpx_nr_pages; i++) { + ret = tdx_alloc_td_page(&tdx->tdvpx[i]); + if (ret) + goto free_tdvpx; + } + + vcpu->arch.efer = EFER_SCE | EFER_LME | EFER_LMA | EFER_NX; + + vcpu->arch.switch_db_regs = KVM_DEBUGREG_AUTO_SWITCHED; + vcpu->arch.cr0_guest_owned_bits = -1ul; + vcpu->arch.cr4_guest_owned_bits = -1ul; + + /* TODO: Grab TSC_OFFSET from the TDCS (need updated API). 
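tdx_vm_init() above follows the usual kernel unwind convention: resources are taken in order, early failures roll back in reverse via goto labels, and once the TD is partially created the full teardown/destroy pair is reused instead. A compact illustration of the goto-unwind shape using plain heap allocations; toy_td_init() and its fields are invented for the example and only mimic the hkid/TDR/TDCS ordering.

#include <errno.h>
#include <stdlib.h>

/* Invented context; the three buffers play the role of hkid/TDR/TDCS. */
struct toy_td {
        void *key;
        void *tdr;
        void *tdcs;
};

static int toy_td_init(struct toy_td *td)
{
        td->key = malloc(32);
        if (!td->key)
                return -ENOMEM;

        td->tdr = malloc(4096);
        if (!td->tdr)
                goto free_key;

        td->tdcs = malloc(4096);
        if (!td->tdcs)
                goto free_tdr;

        return 0;

free_tdr:
        free(td->tdr);
free_key:
        free(td->key);
        return -ENOMEM;
}

int main(void)
{
        struct toy_td td;

        if (toy_td_init(&td))
                return 1;

        free(td.tdcs);
        free(td.tdr);
        free(td.key);
        return 0;
}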
*/ + vcpu->arch.tsc_offset = 0; + vcpu->arch.l1_tsc_offset = vcpu->arch.tsc_offset; + + tdx->pi_desc.nv = POSTED_INTR_VECTOR; + tdx->pi_desc.sn = 1; + + cpu = get_cpu(); + list_add(&tdx->cpu_list, &per_cpu(associated_tdvcpus, cpu)); + vcpu->cpu = cpu; + put_cpu(); + + return 0; + +free_tdvpx: + /* @i points at the TDVPX page that failed allocation. */ + for (--i; i >= 0; i--) + free_page(tdx->tdvpx[i].va); + + free_page(tdx->tdvpr.va); + + return ret; +} + +static void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) +{ + struct vcpu_tdx *tdx = to_tdx(vcpu); + + if (vcpu->cpu != cpu) { + tdx_flush_vp_on_cpu(vcpu); + + /* + * Pairs with the smp_wmb() in tdx_disassociate_vp() to ensure + * vcpu->cpu is read before tdx->cpu_list. + */ + smp_rmb(); + + list_add(&tdx->cpu_list, &per_cpu(associated_tdvcpus, cpu)); + } + + vmx_vcpu_pi_load(vcpu, cpu); +} + +static void tdx_vcpu_put(struct kvm_vcpu *vcpu) +{ + vmx_vcpu_pi_put(vcpu); +} + +static void tdx_vcpu_free(struct kvm_vcpu *vcpu) +{ + struct vcpu_tdx *tdx = to_tdx(vcpu); + int i; + + /* Can't reclaim or free pages if teardown failed. */ + if (is_hkid_assigned(to_kvm_tdx(vcpu->kvm))) + return; + + for (i = 0; i < tdx_caps.tdvpx_nr_pages; i++) + tdx_reclaim_td_page(&tdx->tdvpx[i]); + + tdx_reclaim_td_page(&tdx->tdvpr); +} + +static void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) +{ + struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm); + struct vcpu_tdx *tdx = to_tdx(vcpu); + struct msr_data apic_base_msr; + u64 err; + int i; + + if (WARN_ON(init_event) || !vcpu->arch.apic) + goto td_bugged; + + err = tdcreatevp(kvm_tdx->tdr.pa, tdx->tdvpr.pa); + if (TDX_ERR(err, TDCREATEVP)) + goto td_bugged; + tdx_add_td_page(&tdx->tdvpr); + + for (i = 0; i < tdx_caps.tdvpx_nr_pages; i++) { + err = tdaddvpx(tdx->tdvpr.pa, tdx->tdvpx[i].pa); + if (TDX_ERR(err, TDADDVPX)) + goto td_bugged; + tdx_add_td_page(&tdx->tdvpx[i]); + } + + apic_base_msr.data = APIC_DEFAULT_PHYS_BASE | LAPIC_MODE_X2APIC; + if (kvm_vcpu_is_reset_bsp(vcpu)) + apic_base_msr.data |= MSR_IA32_APICBASE_BSP; + apic_base_msr.host_initiated = true; + if (WARN_ON(kvm_set_apic_base(vcpu, &apic_base_msr))) + goto td_bugged; + + vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE; + + return; + +td_bugged: + vcpu->kvm->vm_bugged = true; + return; +} + +static void tdx_inject_nmi(struct kvm_vcpu *vcpu) +{ + td_management_write8(to_tdx(vcpu), TD_VCPU_PEND_NMI, 1); +} + +u64 __tdx_vcpu_run(hpa_t tdvpr, void *regs, u32 regs_mask); + +static fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu) +{ + struct vcpu_tdx *tdx = to_tdx(vcpu); + + if (unlikely(vcpu->kvm->vm_bugged)) { + tdx->exit_reason.full = TDX_NON_RECOVERABLE_VCPU; + return EXIT_FASTPATH_NONE; + } + + if (pi_test_on(&tdx->pi_desc)) { + apic->send_IPI_self(POSTED_INTR_VECTOR); + + kvm_wait_lapic_expire(vcpu, true); + } + + tdx->exit_reason.full = __tdx_vcpu_run(tdx->tdvpr.pa, vcpu->arch.regs, + tdx->tdvmcall.regs_mask); + + vmx_register_cache_reset(vcpu); + + trace_kvm_exit((unsigned int)tdx->exit_reason.full, vcpu, KVM_ISA_VMX); + + if (tdx->exit_reason.error || tdx->exit_reason.non_recoverable) + return EXIT_FASTPATH_NONE; + + if (tdx->exit_reason.basic == EXIT_REASON_TDCALL) + tdx->tdvmcall.rcx = vcpu->arch.regs[VCPU_REGS_RCX]; + else + tdx->tdvmcall.rcx = 0; + + return EXIT_FASTPATH_NONE; +} + +static void tdx_hardware_enable(void) +{ + INIT_LIST_HEAD(&per_cpu(associated_tdvcpus, raw_smp_processor_id())); +} + +static void tdx_hardware_disable(void) +{ + int cpu = raw_smp_processor_id(); + struct list_head *tdvcpus = &per_cpu(associated_tdvcpus, 
cpu); + struct vcpu_tdx *tdx, *tmp; + + /* Safe variant needed as tdx_disassociate_vp() deletes the entry. */ + list_for_each_entry_safe(tdx, tmp, tdvcpus, cpu_list) + tdx_disassociate_vp(&tdx->vcpu); +} + +static void tdx_handle_exit_irqoff(struct kvm_vcpu *vcpu) +{ + u16 exit_reason = to_tdx(vcpu)->exit_reason.basic; + + if (exit_reason == EXIT_REASON_EXCEPTION_NMI) + vmx_handle_exception_nmi_irqoff(vcpu, tdexit_intr_info(vcpu)); + else if (exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT) + vmx_handle_external_interrupt_irqoff(vcpu, + tdexit_intr_info(vcpu)); +} + +static int tdx_handle_exception(struct kvm_vcpu *vcpu) +{ + u32 intr_info = tdexit_intr_info(vcpu); + + if (is_nmi(intr_info) || is_machine_check(intr_info)) + return 1; + + kvm_pr_unimpl("unexpected exception 0x%x\n", intr_info); + return -EFAULT; +} + +static int tdx_handle_external_interrupt(struct kvm_vcpu *vcpu) +{ + ++vcpu->stat.irq_exits; + return 1; +} + +static int tdx_handle_triple_fault(struct kvm_vcpu *vcpu) +{ + vcpu->run->exit_reason = KVM_EXIT_SHUTDOWN; + vcpu->mmio_needed = 0; + return 0; +} + +static int tdx_emulate_cpuid(struct kvm_vcpu *vcpu) +{ + u32 eax, ebx, ecx, edx; + + eax = tdvmcall_p1_read(vcpu); + ecx = tdvmcall_p2_read(vcpu); + + kvm_cpuid(vcpu, &eax, &ebx, &ecx, &edx, true); + + tdvmcall_p1_write(vcpu, eax); + tdvmcall_p2_write(vcpu, ebx); + tdvmcall_p3_write(vcpu, ecx); + tdvmcall_p4_write(vcpu, edx); + + tdvmcall_set_return_code(vcpu, 0); + + return 1; +} + +static int tdx_emulate_hlt(struct kvm_vcpu *vcpu) +{ + tdvmcall_set_return_code(vcpu, 0); + + return kvm_vcpu_halt(vcpu); +} + +static int tdx_complete_pio_in(struct kvm_vcpu *vcpu) +{ + struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt; + unsigned long val = 0; + int ret; + + BUG_ON(vcpu->arch.pio.count != 1); + + ret = ctxt->ops->pio_in_emulated(ctxt, vcpu->arch.pio.size, + vcpu->arch.pio.port, &val, 1); + WARN_ON(!ret); + + tdvmcall_set_return_code(vcpu, 0); + tdvmcall_set_return_val(vcpu, val); + + return 1; +} + +static int tdx_emulate_io(struct kvm_vcpu *vcpu) +{ + struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt; + unsigned long val = 0; + unsigned port; + int size, ret; + + ++vcpu->stat.io_exits; + + size = tdvmcall_p1_read(vcpu); + port = tdvmcall_p3_read(vcpu); + + if (size > 4) { + tdvmcall_set_return_code(vcpu, -E2BIG); + return 1; + } + + if (!tdvmcall_p2_read(vcpu)) { + ret = ctxt->ops->pio_in_emulated(ctxt, size, port, &val, 1); + if (!ret) + vcpu->arch.complete_userspace_io = tdx_complete_pio_in; + else + tdvmcall_set_return_val(vcpu, val); + } else { + val = tdvmcall_p4_read(vcpu); + ret = ctxt->ops->pio_out_emulated(ctxt, size, port, &val, 1); + + // No need for a complete_userspace_io callback. 
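For the emulated port I/O below, only the input direction needs a completion callback: the value produced by userspace has to be propagated into guest state on re-entry, whereas an output is fire-and-forget. A toy model of that deferred-completion idea; struct io_req and its fields are made up, the real mechanism is vcpu->arch.complete_userspace_io.

#include <stdbool.h>
#include <stdio.h>

/* Invented request type; models "exit to userspace, finish on re-entry". */
struct io_req {
        bool is_read;
        unsigned long result;                   /* filled in by "userspace" */
        void (*complete)(struct io_req *req);   /* only needed for reads */
};

static void complete_pio_in(struct io_req *req)
{
        /* On re-entry the value produced by userspace reaches guest state. */
        printf("guest register <- 0x%lx\n", req->result);
}

static void start_io(struct io_req *req, bool is_read)
{
        req->is_read = is_read;
        req->result = 0;
        /* A write is fire-and-forget; a read must be completed later. */
        req->complete = is_read ? complete_pio_in : NULL;
}

int main(void)
{
        struct io_req req;

        start_io(&req, true);
        req.result = 0xab;              /* userspace handled the port read */
        if (req.complete)
                req.complete(&req);
        return 0;
}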
+ vcpu->arch.pio.count = 0; + } + if (ret) + tdvmcall_set_return_code(vcpu, 0); + return ret; +} + +static int tdx_emulate_vmcall(struct kvm_vcpu *vcpu) +{ + unsigned long nr, a0, a1, a2, a3, ret; + + nr = tdvmcall_exit_reason(vcpu); + a0 = tdvmcall_p1_read(vcpu); + a1 = tdvmcall_p2_read(vcpu); + a2 = tdvmcall_p3_read(vcpu); + a3 = tdvmcall_p4_read(vcpu); + + ret = __kvm_emulate_hypercall(vcpu, nr, a0, a1, a2, a3, true); + + tdvmcall_set_return_code(vcpu, ret); + + return 1; +} + +static int tdx_complete_mmio(struct kvm_vcpu *vcpu) +{ + unsigned long val = 0; + gpa_t gpa; + int size; + + BUG_ON(vcpu->mmio_needed != 1); + vcpu->mmio_needed = 0; + + if (!vcpu->mmio_is_write) { + gpa = vcpu->mmio_fragments[0].gpa; + size = vcpu->mmio_fragments[0].len; + + memcpy(&val, vcpu->run->mmio.data, size); + tdvmcall_set_return_val(vcpu, val); + trace_kvm_mmio(KVM_TRACE_MMIO_READ, size, gpa, &val); + } + return 1; +} + +static inline int tdx_mmio_write(struct kvm_vcpu *vcpu, gpa_t gpa, int size, + unsigned long val) +{ + if (kvm_iodevice_write(vcpu, &vcpu->arch.apic->dev, gpa, size, &val) && + kvm_io_bus_write(vcpu, KVM_MMIO_BUS, gpa, size, &val)) + return -EOPNOTSUPP; + + trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, size, gpa, &val); + return 0; +} + +static inline int tdx_mmio_read(struct kvm_vcpu *vcpu, gpa_t gpa, int size) +{ + unsigned long val; + + if (kvm_iodevice_read(vcpu, &vcpu->arch.apic->dev, gpa, size, &val) && + kvm_io_bus_read(vcpu, KVM_MMIO_BUS, gpa, size, &val)) + return -EOPNOTSUPP; + + tdvmcall_set_return_val(vcpu, val); + trace_kvm_mmio(KVM_TRACE_MMIO_READ, size, gpa, &val); + return 0; +} + +static int tdx_emulate_mmio(struct kvm_vcpu *vcpu) +{ + struct kvm_memory_slot *slot; + int size, write, r; + unsigned long val; + gpa_t gpa; + + BUG_ON(vcpu->mmio_needed); + + size = tdvmcall_p1_read(vcpu); + write = tdvmcall_p2_read(vcpu); + gpa = tdvmcall_p3_read(vcpu); + val = write ? tdvmcall_p4_read(vcpu) : 0; + + /* Strip the shared bit, allow MMIO with and without it set. 
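Whether a GPA is private or shared is encoded in the top GPA bit (bit 51 with the maximum GPAW, bit 47 otherwise), and the MMIO path simply strips it so lookups work either way. A small sketch of the mask/classify/strip helpers, mirroring the gfn_shared_mask arithmetic used in the patch; the function names here are illustrative.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define TOY_PAGE_SHIFT  12

/* Shared/private is the top GPA bit: 51 with MAX_GPAW, otherwise 47. */
static uint64_t gfn_shared_mask(bool max_gpaw)
{
        return (max_gpaw ? (1ull << 51) : (1ull << 47)) >> TOY_PAGE_SHIFT;
}

static bool gpa_is_private(uint64_t gpa, uint64_t shared_mask)
{
        return !((gpa >> TOY_PAGE_SHIFT) & shared_mask);
}

/* Normalize a GPA so MMIO lookups work with or without the shared bit set. */
static uint64_t strip_shared_bit(uint64_t gpa, uint64_t shared_mask)
{
        return gpa & ~(shared_mask << TOY_PAGE_SHIFT);
}

int main(void)
{
        uint64_t mask = gfn_shared_mask(true);
        uint64_t gpa = (1ull << 51) | 0xfee00000ull;

        printf("private? %d\n", gpa_is_private(gpa, mask));
        printf("stripped: 0x%llx\n",
               (unsigned long long)strip_shared_bit(gpa, mask));
        return 0;
}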
*/ + gpa &= ~(vcpu->kvm->arch.gfn_shared_mask << PAGE_SHIFT); + + if (size > 8u || ((gpa + size - 1) ^ gpa) & PAGE_MASK) { + tdvmcall_set_return_code(vcpu, -E2BIG); + return 1; + } + + slot = kvm_vcpu_gfn_to_memslot(vcpu, gpa >> PAGE_SHIFT); + if (slot && !(slot->flags & KVM_MEMSLOT_INVALID)) { + tdvmcall_set_return_code(vcpu, -EFAULT); + return 1; + } + + if (!kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) { + trace_kvm_fast_mmio(gpa); + return 1; + } + + if (write) + r = tdx_mmio_write(vcpu, gpa, size, val); + else + r = tdx_mmio_read(vcpu, gpa, size); + if (!r) { + tdvmcall_set_return_code(vcpu, 0); + return 1; + } + + vcpu->mmio_needed = 1; + vcpu->mmio_is_write = write; + vcpu->arch.complete_userspace_io = tdx_complete_mmio; + + vcpu->run->mmio.phys_addr = gpa; + vcpu->run->mmio.len = size; + vcpu->run->mmio.is_write = write; + vcpu->run->exit_reason = KVM_EXIT_MMIO; + + if (write) { + memcpy(vcpu->run->mmio.data, &val, size); + } else { + vcpu->mmio_fragments[0].gpa = gpa; + vcpu->mmio_fragments[0].len = size; + trace_kvm_mmio(KVM_TRACE_MMIO_READ_UNSATISFIED, size, gpa, NULL); + } + return 0; +} + +static int tdx_emulate_rdmsr(struct kvm_vcpu *vcpu) +{ + u32 index = tdvmcall_p1_read(vcpu); + u64 data; + + if (kvm_get_msr(vcpu, index, &data)) { + trace_kvm_msr_read_ex(index); + tdvmcall_set_return_code(vcpu, -EFAULT); + return 1; + } + trace_kvm_msr_read(index, data); + + tdvmcall_set_return_code(vcpu, 0); + tdvmcall_set_return_val(vcpu, data); + return 1; +} + +static int tdx_emulate_wrmsr(struct kvm_vcpu *vcpu) +{ + u32 index = tdvmcall_p1_read(vcpu); + u64 data = tdvmcall_p2_read(vcpu); + + if (kvm_set_msr(vcpu, index, data)) { + trace_kvm_msr_write_ex(index, data); + tdvmcall_set_return_code(vcpu, -EFAULT); + return 1; + } + + trace_kvm_msr_write(index, data); + tdvmcall_set_return_code(vcpu, 0); + return 1; +} + +static int tdx_map_gpa(struct kvm_vcpu *vcpu) +{ + gpa_t gpa = tdvmcall_p1_read(vcpu); + gpa_t size = tdvmcall_p2_read(vcpu); + + if (!IS_ALIGNED(gpa, 4096) || !IS_ALIGNED(size, 4096) || + (gpa + size) < gpa || + (gpa + size) > vcpu->kvm->arch.gfn_shared_mask << (PAGE_SHIFT + 1)) + tdvmcall_set_return_code(vcpu, -EINVAL); + else + tdvmcall_set_return_code(vcpu, 0); + + return 1; +} + +static int tdx_report_fatal_error(struct kvm_vcpu *vcpu) +{ + vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT; + vcpu->run->system_event.type = KVM_SYSTEM_EVENT_CRASH; + vcpu->run->system_event.flags = tdvmcall_p1_read(vcpu); + return 0; +} + +static int handle_tdvmcall(struct kvm_vcpu *vcpu) +{ + struct vcpu_tdx *tdx = to_tdx(vcpu); + unsigned long exit_reason; + + if (unlikely(tdx->tdvmcall.xmm_mask)) + goto unsupported; + + if (tdvmcall_exit_type(vcpu)) + return tdx_emulate_vmcall(vcpu); + + exit_reason = tdvmcall_exit_reason(vcpu); + + trace_kvm_tdvmcall(vcpu, exit_reason, + tdvmcall_p1_read(vcpu), tdvmcall_p2_read(vcpu), + tdvmcall_p3_read(vcpu), tdvmcall_p4_read(vcpu)); + + switch (exit_reason) { + case EXIT_REASON_CPUID: + return tdx_emulate_cpuid(vcpu); + case EXIT_REASON_HLT: + return tdx_emulate_hlt(vcpu); + // case EXIT_REASON_RDPMC: + // ret = tdx_emulate_rdpmc(vcpu); + // break; + // case EXIT_REASON_VMCALL: + // + // break; + case EXIT_REASON_IO_INSTRUCTION: + return tdx_emulate_io(vcpu); + case EXIT_REASON_MSR_READ: + return tdx_emulate_rdmsr(vcpu); + case EXIT_REASON_MSR_WRITE: + return tdx_emulate_wrmsr(vcpu); + case EXIT_REASON_EPT_VIOLATION: + return tdx_emulate_mmio(vcpu); + case TDVMCALL_MAP_GPA: + return tdx_map_gpa(vcpu); + case TDVMCALL_REPORT_FATAL_ERROR: + 
return tdx_report_fatal_error(vcpu); + default: + break; + } + +unsupported: + tdvmcall_set_return_code(vcpu, -EOPNOTSUPP); + return 1; +} + +static void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, unsigned long pgd, + int pgd_level) +{ + td_vmcs_write64(to_tdx(vcpu), SHARED_EPT_POINTER, pgd & PAGE_MASK); +} + +#define SEPT_ERR(err, op, kvm) \ +({ \ + int __ret = KVM_BUG_ON(err, kvm); \ + \ + if (unlikely(__ret)) \ + pr_seamcall_error(op, err); \ + __ret; \ +}) + +static void tdx_measure_page(struct kvm_tdx *kvm_tdx, hpa_t gpa) +{ + struct tdx_ex_ret ex_ret; + u64 err; + int i; + + for (i = 0; i < PAGE_SIZE; i += TDX1_EXTENDMR_CHUNKSIZE) { + err = tdextendmr(kvm_tdx->tdr.pa, gpa + i, &ex_ret); + if (SEPT_ERR(err, TDEXTENDMR, &kvm_tdx->kvm)) + break; + } +} + +static void tdx_sept_set_private_spte(struct kvm_vcpu *vcpu, gfn_t gfn, + int level, kvm_pfn_t pfn) +{ + struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm); + hpa_t hpa = pfn << PAGE_SHIFT; + gpa_t gpa = gfn << PAGE_SHIFT; + struct tdx_ex_ret ex_ret; + hpa_t source_pa; + u64 err; + + if (WARN_ON_ONCE(is_error_noslot_pfn(pfn) || kvm_is_reserved_pfn(pfn))) + return; + + /* TODO: handle large pages. */ + if (KVM_BUG_ON(level != PG_LEVEL_4K, vcpu->kvm)) + return; + + /* Pin the page, KVM doesn't yet support page migration. */ + get_page(pfn_to_page(pfn)); + + /* Build-time faults are induced and handled via TDADDPAGE. */ + if (is_td_finalized(kvm_tdx)) { + trace_kvm_sept_seamcall(SEAMCALL_TDAUGPAGE, gpa, hpa, level); + + err = tdaugpage(kvm_tdx->tdr.pa, gpa, hpa, &ex_ret); + SEPT_ERR(err, TDAUGPAGE, vcpu->kvm); + return; + } + + trace_kvm_sept_seamcall(SEAMCALL_TDADDPAGE, gpa, hpa, level); + + source_pa = kvm_tdx->source_pa & ~KVM_TDX_MEASURE_MEMORY_REGION; + + err = tdaddpage(kvm_tdx->tdr.pa, gpa, hpa, source_pa, &ex_ret); + if (!SEPT_ERR(err, TDADDPAGE, vcpu->kvm) && + (kvm_tdx->source_pa & KVM_TDX_MEASURE_MEMORY_REGION)) + tdx_measure_page(kvm_tdx, gpa); +} + +static void tdx_sept_drop_private_spte(struct kvm *kvm, gfn_t gfn, int level, + kvm_pfn_t pfn) +{ + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); + gpa_t gpa = gfn << PAGE_SHIFT; + hpa_t hpa = pfn << PAGE_SHIFT; + struct tdx_ex_ret ex_ret; + u64 err; + + /* TODO: handle large pages. 
*/ + if (KVM_BUG_ON(level != PG_LEVEL_NONE, kvm)) + return; + + if (is_hkid_assigned(kvm_tdx)) { + trace_kvm_sept_seamcall(SEAMCALL_TDREMOVEPAGE, gpa, hpa, level); + + err = tdremovepage(kvm_tdx->tdr.pa, gpa, level, &ex_ret); + if (SEPT_ERR(err, TDREMOVEPAGE, kvm)) + return; + } else if (tdx_reclaim_page((unsigned long)__va(hpa), hpa)) { + return; + } + + put_page(pfn_to_page(pfn)); +} + +static int tdx_sept_link_private_sp(struct kvm_vcpu *vcpu, gfn_t gfn, + int level, void *sept_page) +{ + struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm); + gpa_t gpa = gfn << PAGE_SHIFT; + hpa_t hpa = __pa(sept_page); + struct tdx_ex_ret ex_ret; + u64 err; + + trace_kvm_sept_seamcall(SEAMCALL_TDADDSEPT, gpa, hpa, level); + + err = tdaddsept(kvm_tdx->tdr.pa, gpa, level, hpa, &ex_ret); + if (SEPT_ERR(err, TDADDSEPT, vcpu->kvm)) + return -EIO; + + return 0; +} + +static void tdx_sept_zap_private_spte(struct kvm *kvm, gfn_t gfn, int level) +{ + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); + gpa_t gpa = gfn << PAGE_SHIFT; + struct tdx_ex_ret ex_ret; + u64 err; + + trace_kvm_sept_seamcall(SEAMCALL_TDBLOCK, gpa, -1ull, level); + + err = tdblock(kvm_tdx->tdr.pa, gpa, level, &ex_ret); + SEPT_ERR(err, TDBLOCK, kvm); +} + +static void tdx_sept_unzap_private_spte(struct kvm *kvm, gfn_t gfn, int level) +{ + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); + gpa_t gpa = gfn << PAGE_SHIFT; + struct tdx_ex_ret ex_ret; + u64 err; + + trace_kvm_sept_seamcall(SEAMCALL_TDUNBLOCK, gpa, -1ull, level); + + err = tdunblock(kvm_tdx->tdr.pa, gpa, level, &ex_ret); + SEPT_ERR(err, TDUNBLOCK, kvm); +} + +static int tdx_sept_free_private_sp(struct kvm *kvm, gfn_t gfn, int level, + void *sept_page) +{ + /* + * free_private_sp() is (obviously) called when a shadow page is being + * zapped. KVM doesn't (yet) zap private SPs while the TD is active. + */ + if (KVM_BUG_ON(is_hkid_assigned(to_kvm_tdx(kvm)), kvm)) + return -EINVAL; + + return tdx_reclaim_page((unsigned long)sept_page, __pa(sept_page)); +} + +static int tdx_sept_tlb_remote_flush(struct kvm *kvm) +{ + struct kvm_tdx *kvm_tdx; + u64 err; + + if (!is_td(kvm)) + return -ENOTSUPP; + + kvm_tdx = to_kvm_tdx(kvm); + kvm_tdx->tdtrack = true; + + kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH); + + if (is_hkid_assigned(kvm_tdx) && is_td_finalized(kvm_tdx)) { + err = tdtrack(to_kvm_tdx(kvm)->tdr.pa); + SEPT_ERR(err, TDTRACK, kvm); + } + + WRITE_ONCE(kvm_tdx->tdtrack, false); + + return 0; +} + +static void tdx_flush_tlb(struct kvm_vcpu *vcpu) +{ + struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm); + struct kvm_mmu *mmu = vcpu->arch.mmu; + u64 root_hpa = mmu->root_hpa; + + /* Flush the shared EPTP, if it's valid. 
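The remote-flush path publishes a "tracking in progress" flag, kicks the vCPUs, issues TDTRACK and then clears the flag, while each vCPU spins on that flag before re-entering the TD so it cannot run with stale translations. A toy model of the handshake using C11 atomics; it is single-threaded just to show the shape, whereas the real code pairs KVM_REQ_TLB_FLUSH with cpu_relax().

#include <stdatomic.h>
#include <stdio.h>

/* Toy model of the kvm_tdx->tdtrack handshake. */
static atomic_bool tdtrack;

/* Remote-flush side: publish the epoch, run TDTRACK, then clear the flag. */
static void remote_flush(void)
{
        atomic_store(&tdtrack, true);
        /* ... kick all vCPUs and issue the TDTRACK SEAMCALL here ... */
        atomic_store(&tdtrack, false);
}

/* vCPU side: do not re-enter the TD until the tracking epoch has finished. */
static void vcpu_flush_tlb(void)
{
        while (atomic_load(&tdtrack))
                ;       /* the real loop uses cpu_relax() */
}

int main(void)
{
        remote_flush();
        vcpu_flush_tlb();       /* returns immediately, the flag is clear */
        printf("flush handshake done\n");
        return 0;
}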
*/ + if (VALID_PAGE(root_hpa)) + ept_sync_context(construct_eptp(vcpu, root_hpa, + mmu->shadow_root_level)); + + while (READ_ONCE(kvm_tdx->tdtrack)) + cpu_relax(); +} + +static inline bool tdx_is_private_gpa(struct kvm *kvm, gpa_t gpa) +{ + return !((gpa >> PAGE_SHIFT) & kvm->arch.gfn_shared_mask); +} + +#define TDX_SEPT_PFERR (PFERR_WRITE_MASK | PFERR_USER_MASK) + +static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu) +{ + unsigned long exit_qual; + + if (tdx_is_private_gpa(vcpu->kvm, tdexit_gpa(vcpu))) + exit_qual = TDX_SEPT_PFERR; + else + exit_qual = tdexit_exit_qual(vcpu); + return __vmx_handle_ept_violation(vcpu, tdexit_gpa(vcpu), exit_qual); +} + +static int tdx_handle_ept_misconfig(struct kvm_vcpu *vcpu) +{ + WARN_ON(1); + + vcpu->run->exit_reason = KVM_EXIT_UNKNOWN; + vcpu->run->hw.hardware_exit_reason = EXIT_REASON_EPT_MISCONFIG; + + return 0; +} + +static int tdx_handle_exit(struct kvm_vcpu *vcpu, + enum exit_fastpath_completion fastpath) +{ + union tdx_exit_reason exit_reason = to_tdx(vcpu)->exit_reason; + + if (unlikely(exit_reason.non_recoverable)) + return tdx_handle_triple_fault(vcpu); + + if (unlikely(exit_reason.error)) + goto unhandled_exit; + + WARN_ON_ONCE(fastpath != EXIT_FASTPATH_NONE); + + switch (exit_reason.basic) { + case EXIT_REASON_EXCEPTION_NMI: + return tdx_handle_exception(vcpu); + case EXIT_REASON_EXTERNAL_INTERRUPT: + return tdx_handle_external_interrupt(vcpu); + case EXIT_REASON_TDCALL: + return handle_tdvmcall(vcpu); + case EXIT_REASON_EPT_VIOLATION: + return tdx_handle_ept_violation(vcpu); + case EXIT_REASON_EPT_MISCONFIG: + return tdx_handle_ept_misconfig(vcpu); + default: + break; + } + +unhandled_exit: + kvm_pr_unimpl("Unhandled TD-Exit Reason 0x%llx\n", exit_reason.full); + vcpu->run->exit_reason = KVM_EXIT_UNKNOWN; + vcpu->run->hw.hardware_exit_reason = exit_reason.full; + return 0; +} + +static void tdx_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2, + u32 *intr_info, u32 *error_code) +{ + *info1 = tdexit_exit_qual(vcpu); + *info2 = 0; + + *intr_info = tdexit_intr_info(vcpu); + *error_code = 0; +} + +static int __init tdx_check_processor_compatibility(void) +{ + /* TDX-SEAM itself verifies compatibility on all CPUs. */ + return 0; +} + +static void tdx_set_virtual_apic_mode(struct kvm_vcpu *vcpu) +{ + WARN_ON_ONCE(kvm_get_apic_mode(vcpu) != LAPIC_MODE_X2APIC); +} + +static void tdx_apicv_post_state_restore(struct kvm_vcpu *vcpu) +{ + struct vcpu_tdx *tdx = to_tdx(vcpu); + + pi_clear_on(&tdx->pi_desc); + memset(tdx->pi_desc.pir, 0, sizeof(tdx->pi_desc.pir)); +} + +/* + * Send interrupt to vcpu via posted interrupt way. + * 1. If target vcpu is running(non-root mode), send posted interrupt + * notification to vcpu and hardware will sync PIR to vIRR atomically. + * 2. If target vcpu isn't running(root mode), kick it to pick up the + * interrupt from PIR in next vmentry. + */ +static int tdx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector) +{ + struct vcpu_tdx *tdx = to_tdx(vcpu); + + if (pi_test_and_set_pir(vector, &tdx->pi_desc)) + return 0; + + /* If a previous notification has sent the IPI, nothing to do. 
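Posted-interrupt delivery for a TD is the standard two-step: atomically set the vector's bit in the PIR, then atomically set ON, and only send the notification IPI (or kick) if ON was previously clear, because an earlier notification will sweep up any later PIR bits. A stand-alone sketch with C11 atomics; toy_pi_desc is an invented stand-in for the real pi_desc layout.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Invented descriptor: a 256-bit PIR plus the outstanding-notification bit. */
struct toy_pi_desc {
        _Atomic uint64_t pir[4];
        atomic_bool on;
};

/* Returns true if a notification (IPI or kick) still has to be sent. */
static bool post_interrupt(struct toy_pi_desc *pi, int vector)
{
        uint64_t bit = 1ull << (vector & 63);
        uint64_t old = atomic_fetch_or(&pi->pir[(vector >> 6) & 3], bit);

        if (old & bit)
                return false;   /* vector already pending, nothing to do */

        /* If ON was already set, the earlier notification covers this one. */
        return !atomic_exchange(&pi->on, true);
}

int main(void)
{
        static struct toy_pi_desc pi;

        printf("notify? %d\n", post_interrupt(&pi, 0x20));      /* 1 */
        printf("notify? %d\n", post_interrupt(&pi, 0x21));      /* 0, ON set */
        return 0;
}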
*/ + if (pi_test_and_set_on(&tdx->pi_desc)) + return 0; + + if (vcpu != kvm_get_running_vcpu() && + !kvm_vcpu_trigger_posted_interrupt(vcpu, false)) + kvm_vcpu_kick(vcpu); + + return 0; +} + +static int tdx_dev_ioctl(void __user *argp) +{ + struct kvm_tdx_capabilities __user *user_caps; + struct kvm_tdx_capabilities caps; + struct kvm_tdx_cmd cmd; + + BUILD_BUG_ON(sizeof(struct kvm_tdx_cpuid_config) != + sizeof(struct tdx_cpuid_config)); + + if (copy_from_user(&cmd, argp, sizeof(cmd))) + return -EFAULT; + + if (cmd.metadata || cmd.id != KVM_TDX_CAPABILITIES) + return -EINVAL; + + user_caps = (void __user *)cmd.data; + if (copy_from_user(&caps, user_caps, sizeof(caps))) + return -EFAULT; + + if (caps.nr_cpuid_configs < tdx_caps.nr_cpuid_configs) + return -E2BIG; + caps.nr_cpuid_configs = tdx_caps.nr_cpuid_configs; + + if (copy_to_user(user_caps->cpuid_configs, &tdx_caps.cpuid_configs, + tdx_caps.nr_cpuid_configs * sizeof(struct tdx_cpuid_config))) + return -EFAULT; + + caps.attrs_fixed0 = tdx_caps.attrs_fixed0; + caps.attrs_fixed1 = tdx_caps.attrs_fixed1; + caps.xfam_fixed0 = tdx_caps.xfam_fixed0; + caps.xfam_fixed1 = tdx_caps.xfam_fixed1; + + if (copy_to_user((void __user *)cmd.data, &caps, sizeof(caps))) + return -EFAULT; + + return 0; +} + +/* + * TDX-SEAM definitions for fixed{0,1} are inverted relative to VMX. The TDX + * definitions are sane, the VMX definitions are backwards. + * + * if fixed0[i] == 0: val[i] must be 0 + * if fixed1[i] == 1: val[i] must be 1 + */ +static inline bool tdx_fixed_bits_valid(u64 val, u64 fixed0, u64 fixed1) +{ + return ((val & fixed0) | fixed1) == val; +} + +static struct kvm_cpuid_entry2 *tdx_find_cpuid_entry(struct kvm_tdx *kvm_tdx, + u32 function, u32 index) +{ + struct kvm_cpuid_entry2 *e; + int i; + + for (i = 0; i < kvm_tdx->cpuid_nent; i++) { + e = &kvm_tdx->cpuid_entries[i]; + + if (e->function == function && (e->index == index || + !(e->flags & KVM_CPUID_FLAG_SIGNIFCANT_INDEX))) + return e; + } + return NULL; +} + +static int setup_tdparams(struct kvm *kvm, struct td_params *td_params, + struct kvm_tdx_init_vm *init_vm) +{ + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); + struct tdx_cpuid_config *config; + struct kvm_cpuid_entry2 *entry; + struct tdx_cpuid_value *value; + u64 guest_supported_xcr0; + u64 guest_supported_xss; + u32 guest_tsc_khz; + int max_pa; + int i; + + td_params->attributes = init_vm->attributes; + td_params->max_vcpus = init_vm->max_vcpus; + + /* TODO: Enforce consistent CPUID features for all vCPUs. */ + for (i = 0; i < tdx_caps.nr_cpuid_configs; i++) { + config = &tdx_caps.cpuid_configs[i]; + + entry = tdx_find_cpuid_entry(kvm_tdx, config->leaf, + config->sub_leaf); + if (!entry) + continue; + + /* + * Non-configurable bits must be '0', even if they are fixed to + * '1' by TDX-SEAM, i.e. mask off non-configurable bits. 
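When building TD_PARAMS, each CPUID leaf requested by userspace is ANDed with the module-reported configurable mask, so non-configurable bits end up zero even if the module would force them to one anyway. A minimal sketch of that per-register masking; struct cpuid_leaf and mask_leaf() are invented for the example.

#include <stdint.h>
#include <stdio.h>

/* Invented leaf type: what userspace wants vs. what the module allows. */
struct cpuid_leaf {
        uint32_t eax, ebx, ecx, edx;
};

/*
 * Only configurable bits pass through; everything else must be zero in
 * TD_PARAMS even if the module would fix it to '1' on its own.
 */
static struct cpuid_leaf mask_leaf(struct cpuid_leaf want,
                                   struct cpuid_leaf configurable)
{
        struct cpuid_leaf out = {
                .eax = want.eax & configurable.eax,
                .ebx = want.ebx & configurable.ebx,
                .ecx = want.ecx & configurable.ecx,
                .edx = want.edx & configurable.edx,
        };
        return out;
}

int main(void)
{
        struct cpuid_leaf want = { .ecx = 0xffffffffu };
        struct cpuid_leaf cfg  = { .ecx = 0x00000004u };

        printf("ecx = 0x%x\n", mask_leaf(want, cfg).ecx);       /* 0x4 */
        return 0;
}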
+ */ + value = &td_params->cpuid_values[i]; + value->eax = entry->eax & config->eax; + value->ebx = entry->ebx & config->ebx; + value->ecx = entry->ecx & config->ecx; + value->edx = entry->edx & config->edx; + } + + entry = tdx_find_cpuid_entry(kvm_tdx, 0xd, 0); + if (entry) + guest_supported_xcr0 = (entry->eax | ((u64)entry->edx << 32)); + else + guest_supported_xcr0 = 0; + guest_supported_xcr0 &= supported_xcr0; + + entry = tdx_find_cpuid_entry(kvm_tdx, 0xd, 1); + if (entry) + guest_supported_xss = (entry->ecx | ((u64)entry->edx << 32)); + else + guest_supported_xss = 0; + guest_supported_xss &= supported_xss; + + max_pa = 36; + entry = tdx_find_cpuid_entry(kvm_tdx, 0x80000008, 0); + if (entry) + max_pa = entry->eax & 0xff; + + td_params->eptp_controls = VMX_EPTP_MT_WB; + + if (cpu_has_vmx_ept_5levels() && max_pa > 48) { + td_params->eptp_controls |= VMX_EPTP_PWL_5; + td_params->exec_controls |= TDX1_EXEC_CONTROL_MAX_GPAW; + } else { + td_params->eptp_controls |= VMX_EPTP_PWL_4; + } + + if (!tdx_fixed_bits_valid(td_params->attributes, + tdx_caps.attrs_fixed0, + tdx_caps.attrs_fixed1)) + return -EINVAL; + + /* Setup td_params.xfam */ + td_params->xfam = guest_supported_xcr0 | guest_supported_xss; + if (!tdx_fixed_bits_valid(td_params->xfam, + tdx_caps.xfam_fixed0, + tdx_caps.xfam_fixed1)) + return -EINVAL; + + /* TODO: Support a scaled guest TSC, i.e. take this from userspace. */ + guest_tsc_khz = tsc_khz; + if (guest_tsc_khz < TDX1_MIN_TSC_FREQUENCY_KHZ || + guest_tsc_khz > TDX1_MAX_TSC_FREQUENCY_KHZ) + return -EINVAL; + + td_params->tsc_frequency = TDX1_TSC_KHZ_TO_25MHZ(guest_tsc_khz); + if (TDX1_TSC_25MHZ_TO_KHZ(td_params->tsc_frequency) != guest_tsc_khz) + pr_warn_once("KVM: TD TSC not a multiple of 25Mhz\n"); + + /* TODO + * - MRCONFIGID + * - MROWNER + * - MROWNERCONFIG + */ + return 0; +} + +static int tdx_td_init(struct kvm *kvm, struct kvm_tdx_cmd *cmd) +{ + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); + struct kvm_cpuid2 __user *user_cpuid; + struct kvm_tdx_init_vm init_vm; + struct td_params *td_params; + struct tdx_ex_ret ex_ret; + struct kvm_cpuid2 cpuid; + int ret; + u64 err; + + if (is_td_initialized(kvm)) + return -EINVAL; + + if (cmd->metadata) + return -EINVAL; + + if (copy_from_user(&init_vm, (void __user *)cmd->data, sizeof(init_vm))) + return -EFAULT; + + if (init_vm.max_vcpus > KVM_MAX_VCPUS || init_vm.reserved) + return -EINVAL; + + user_cpuid = (void *)init_vm.cpuid; + if (copy_from_user(&cpuid, user_cpuid, sizeof(cpuid))) + return -EFAULT; + + if (cpuid.nent > KVM_MAX_CPUID_ENTRIES) + return -E2BIG; + + if (copy_from_user(&kvm_tdx->cpuid_entries, user_cpuid->entries, + cpuid.nent * sizeof(struct kvm_cpuid_entry2))) + return -EFAULT; + + BUILD_BUG_ON(sizeof(struct td_params) != 1024); + + td_params = kzalloc(sizeof(struct td_params), GFP_KERNEL_ACCOUNT); + if (!td_params) + return -ENOMEM; + + kvm_tdx->cpuid_nent = cpuid.nent; + + ret = setup_tdparams(kvm, td_params, &init_vm); + if (ret) + goto free_tdparams; + + err = tdinit(kvm_tdx->tdr.pa, __pa(td_params), &ex_ret); + if (TDX_ERR(err, TDINIT)) { + ret = -EIO; + goto free_tdparams; + } + + kvm->max_vcpus = td_params->max_vcpus; + kvm->arch.guest_state_protected = !(td_params->attributes & + TDX1_TD_ATTRIBUTE_DEBUG); + + if (td_params->exec_controls & TDX1_EXEC_CONTROL_MAX_GPAW) + kvm->arch.gfn_shared_mask = BIT_ULL(51) >> PAGE_SHIFT; + else + kvm->arch.gfn_shared_mask = BIT_ULL(47) >> PAGE_SHIFT; + +free_tdparams: + kfree(td_params); + if (ret) + kvm_tdx->cpuid_nent = 0; + return ret; +} + +static int 
tdx_init_mem_region(struct kvm *kvm, struct kvm_tdx_cmd *cmd) +{ + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); + struct kvm_tdx_init_mem_region region; + struct kvm_vcpu *vcpu; + struct page *page; + kvm_pfn_t pfn; + int idx, ret; + + /* The BSP vCPU must be created before initializing memory regions. */ + if (!atomic_read(&kvm->online_vcpus)) + return -EINVAL; + + if (cmd->metadata & ~KVM_TDX_MEASURE_MEMORY_REGION) + return -EINVAL; + + if (copy_from_user(®ion, (void __user *)cmd->data, sizeof(region))) + return -EFAULT; + + /* Sanity check */ + if (!IS_ALIGNED(region.source_addr, PAGE_SIZE)) + return -EINVAL; + if (!IS_ALIGNED(region.gpa, PAGE_SIZE)) + return -EINVAL; + if (region.gpa + (region.nr_pages << PAGE_SHIFT) <= region.gpa) + return -EINVAL; + if (!tdx_is_private_gpa(kvm, region.gpa)) + return -EINVAL; + + vcpu = kvm_get_vcpu(kvm, 0); + if (mutex_lock_killable(&vcpu->mutex)) + return -EINTR; + + vcpu_load(vcpu); + idx = srcu_read_lock(&kvm->srcu); + + kvm_mmu_reload(vcpu); + + while (region.nr_pages) { + if (signal_pending(current)) { + ret = -ERESTARTSYS; + break; + } + + if (need_resched()) + cond_resched(); + + + /* Pin the source page. */ + ret = get_user_pages_fast(region.source_addr, 1, 0, &page); + if (ret < 0) + break; + if (ret != 1) { + ret = -ENOMEM; + break; + } + + kvm_tdx->source_pa = pfn_to_hpa(page_to_pfn(page)) | + (cmd->metadata & KVM_TDX_MEASURE_MEMORY_REGION); + + pfn = kvm_mmu_map_tdp_page(vcpu, region.gpa, TDX_SEPT_PFERR, + PG_LEVEL_4K); + if (is_error_noslot_pfn(pfn) || kvm->vm_bugged) + ret = -EFAULT; + else + ret = 0; + + put_page(page); + if (ret) + break; + + region.source_addr += PAGE_SIZE; + region.gpa += PAGE_SIZE; + region.nr_pages--; + } + + srcu_read_unlock(&kvm->srcu, idx); + vcpu_put(vcpu); + + mutex_unlock(&vcpu->mutex); + + if (copy_to_user((void __user *)cmd->data, ®ion, sizeof(region))) + ret = -EFAULT; + + return ret; +} + +static int tdx_td_finalizemr(struct kvm *kvm) +{ + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); + u64 err; + + if (!is_td_initialized(kvm) || is_td_finalized(kvm_tdx)) + return -EINVAL; + + err = tdfinalizemr(kvm_tdx->tdr.pa); + if (TDX_ERR(err, TDFINALIZEMR)) + return -EIO; + + kvm_tdx->finalized = true; + return 0; +} + +static int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) +{ + struct kvm_tdx_cmd tdx_cmd; + int r; + + if (copy_from_user(&tdx_cmd, argp, sizeof(struct kvm_tdx_cmd))) + return -EFAULT; + + mutex_lock(&kvm->lock); + + switch (tdx_cmd.id) { + case KVM_TDX_INIT_VM: + r = tdx_td_init(kvm, &tdx_cmd); + break; + case KVM_TDX_INIT_MEM_REGION: + r = tdx_init_mem_region(kvm, &tdx_cmd); + break; + case KVM_TDX_FINALIZE_VM: + r = tdx_td_finalizemr(kvm); + break; + default: + r = -EINVAL; + goto out; + } + + if (copy_to_user(argp, &tdx_cmd, sizeof(struct kvm_tdx_cmd))) + r = -EFAULT; + +out: + mutex_unlock(&kvm->lock); + return r; +} + +static int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) +{ + struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm); + struct vcpu_tdx *tdx = to_tdx(vcpu); + struct kvm_tdx_cmd cmd; + u64 err; + + if (tdx->initialized) + return -EINVAL; + + if (!is_td_initialized(vcpu->kvm) || is_td_finalized(kvm_tdx)) + return -EINVAL; + + if (copy_from_user(&cmd, argp, sizeof(cmd))) + return -EFAULT; + + if (cmd.metadata || cmd.id != KVM_TDX_INIT_VCPU) + return -EINVAL; + + err = tdinitvp(tdx->tdvpr.pa, cmd.data); + if (TDX_ERR(err, TDINITVP)) + return -EIO; + + tdx->initialized = true; + + td_vmcs_write16(tdx, POSTED_INTR_NV, POSTED_INTR_VECTOR); + td_vmcs_write64(tdx, 
POSTED_INTR_DESC_ADDR, __pa(&tdx->pi_desc)); + td_vmcs_setbit32(tdx, PIN_BASED_VM_EXEC_CONTROL, PIN_BASED_POSTED_INTR); + return 0; +} + +static void tdx_update_exception_bitmap(struct kvm_vcpu *vcpu) +{ + /* TODO: Figure out exception bitmap for debug TD. */ +} + +static void tdx_set_dr7(struct kvm_vcpu *vcpu, unsigned long val) +{ + /* TODO: Add TDWRVPS(GUEST_DR7) for debug TDs. */ + if (is_debug_td(vcpu)) + return; + + KVM_BUG_ON(val != DR7_FIXED_1, vcpu->kvm); +} + +static int tdx_get_cpl(struct kvm_vcpu *vcpu) +{ + if (KVM_BUG_ON(!is_debug_td(vcpu), vcpu->kvm)) + return 0; + + /* + * For debug TDs, tdx_get_cpl() may be called before the vCPU is + * initialized, i.e. before TDRDVPS is legal, if the vCPU is scheduled + * out. If this happens, simply return CPL0 to avoid TDRDVPS failure. + */ + if (!to_tdx(vcpu)->initialized) + return 0; + + return VMX_AR_DPL(td_vmcs_read32(to_tdx(vcpu), GUEST_SS_AR_BYTES)); +} + +static unsigned long tdx_get_rflags(struct kvm_vcpu *vcpu) +{ + if (KVM_BUG_ON(!is_debug_td(vcpu), vcpu->kvm)) + return 0; + + return td_vmcs_read64(to_tdx(vcpu), GUEST_RFLAGS); +} + +static void tdx_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags) +{ + if (KVM_BUG_ON(!is_debug_td(vcpu), vcpu->kvm)) + return; + + /* + * TODO: This is currently disallowed by TDX-SEAM, which breaks single- + * step debug. + */ + td_vmcs_write64(to_tdx(vcpu), GUEST_RFLAGS, rflags); +} + +static bool tdx_is_emulated_msr(u32 index, bool write) +{ + switch (index) { + case MSR_IA32_UCODE_REV: + case MSR_IA32_ARCH_CAPABILITIES: + case MSR_IA32_POWER_CTL: + case MSR_MTRRcap: + case 0x200 ... 0x2ff: + case MSR_IA32_TSCDEADLINE: + case MSR_IA32_MISC_ENABLE: + case MSR_KVM_STEAL_TIME: + case MSR_KVM_POLL_CONTROL: + case MSR_PLATFORM_INFO: + case MSR_MISC_FEATURES_ENABLES: + case MSR_IA32_MCG_CTL: + case MSR_IA32_MCG_STATUS: + case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(32) - 1: + return true; + case APIC_BASE_MSR ... APIC_BASE_MSR + 0xff: + /* + * x2APIC registers that are virtualized by the CPU can't be + * emulated, KVM doesn't have access to the virtual APIC page. + */ + switch (index) { + case X2APIC_MSR(APIC_TASKPRI): + case X2APIC_MSR(APIC_PROCPRI): + case X2APIC_MSR(APIC_EOI): + case X2APIC_MSR(APIC_ISR) ... X2APIC_MSR(APIC_ISR + APIC_ISR_NR): + case X2APIC_MSR(APIC_TMR) ... X2APIC_MSR(APIC_TMR + APIC_ISR_NR): + case X2APIC_MSR(APIC_IRR) ... 
X2APIC_MSR(APIC_IRR + APIC_ISR_NR): + return false; + default: + return true; + } + case MSR_IA32_APICBASE: + case MSR_EFER: + return !write; + default: + return false; + } +} + +static int tdx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) +{ + if (tdx_is_emulated_msr(msr->index, false)) + return kvm_get_msr_common(vcpu, msr); + return 1; +} + +static int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) +{ + if (tdx_is_emulated_msr(msr->index, true)) + return kvm_set_msr_common(vcpu, msr); + return 1; +} + +static u64 tdx_get_segment_base(struct kvm_vcpu *vcpu, int seg) +{ + if (!is_debug_td(vcpu)) + return 0; + + return td_vmcs_read64(to_tdx(vcpu), GUEST_ES_BASE + seg * 2); +} + +static void tdx_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, + int seg) +{ + struct vcpu_tdx *tdx = to_tdx(vcpu); + + if (!is_debug_td(vcpu)) { + memset(var, 0, sizeof(*var)); + return; + } + + seg *= 2; + var->base = td_vmcs_read64(tdx, GUEST_ES_BASE + seg); + var->limit = td_vmcs_read32(tdx, GUEST_ES_LIMIT + seg); + var->selector = td_vmcs_read16(tdx, GUEST_ES_SELECTOR + seg); + vmx_decode_ar_bytes(td_vmcs_read32(tdx, GUEST_ES_AR_BYTES + seg), var); +} + +static void tdx_cache_gprs(struct kvm_vcpu *vcpu) +{ + struct vcpu_tdx *tdx = to_tdx(vcpu); + int i; + + if (!is_td_vcpu(vcpu) || !is_debug_td(vcpu)) + return; + + for (i = 0; i < NR_VCPU_REGS; i++) { + if (i == VCPU_REGS_RSP || i == VCPU_REGS_RIP) + continue; + + vcpu->arch.regs[i] = td_gpr_read64(tdx, i); + } +} + +static void tdx_flush_gprs(struct kvm_vcpu *vcpu) +{ + struct vcpu_tdx *tdx = to_tdx(vcpu); + int i; + + if (!is_td_vcpu(vcpu) || KVM_BUG_ON(!is_debug_td(vcpu), vcpu->kvm)) + return; + + for (i = 0; i < NR_VCPU_REGS; i++) + td_gpr_write64(tdx, i, vcpu->arch.regs[i]); +} + +static void __init tdx_pre_kvm_init(unsigned int *vcpu_size, + unsigned int *vcpu_align, + unsigned int *vm_size) +{ + *vcpu_size = sizeof(struct vcpu_tdx); + *vcpu_align = __alignof__(struct vcpu_tdx); + + if (sizeof(struct kvm_tdx) > *vm_size) + *vm_size = sizeof(struct kvm_tdx); +} + +static int __init tdx_init(void) +{ + return 0; +} + +static int __init tdx_hardware_setup(struct kvm_x86_ops *x86_ops) +{ + struct tdsysinfo_struct *tdsysinfo = tdx_get_sysinfo(); + + if (tdsysinfo == NULL) { + WARN_ON_ONCE(boot_cpu_has(X86_FEATURE_TDX)); + return -ENODEV; + } + + if (WARN_ON_ONCE(x86_ops->tlb_remote_flush)) + return -EIO; + + tdx_caps.tdcs_nr_pages = tdsysinfo->tdcs_base_size / PAGE_SIZE; + if (tdx_caps.tdcs_nr_pages != TDX1_NR_TDCX_PAGES) + return -EIO; + + tdx_caps.tdvpx_nr_pages = tdsysinfo->tdvps_base_size / PAGE_SIZE - 1; + if (tdx_caps.tdvpx_nr_pages != TDX1_NR_TDVPX_PAGES) + return -EIO; + + tdx_caps.attrs_fixed0 = tdsysinfo->attributes_fixed0; + tdx_caps.attrs_fixed1 = tdsysinfo->attributes_fixed1; + tdx_caps.xfam_fixed0 = tdsysinfo->xfam_fixed0; + tdx_caps.xfam_fixed1 = tdsysinfo->xfam_fixed1; + + tdx_caps.nr_cpuid_configs = tdsysinfo->num_cpuid_config; + if (tdx_caps.nr_cpuid_configs > TDX1_MAX_NR_CPUID_CONFIGS) + return -EIO; + + if (!memcpy(tdx_caps.cpuid_configs, tdsysinfo->cpuid_configs, + tdsysinfo->num_cpuid_config * sizeof(struct tdx_cpuid_config))) + return -EIO; + + x86_ops->cache_gprs = tdx_cache_gprs; + x86_ops->flush_gprs = tdx_flush_gprs; + + x86_ops->tlb_remote_flush = tdx_sept_tlb_remote_flush; + x86_ops->set_private_spte = tdx_sept_set_private_spte; + x86_ops->drop_private_spte = tdx_sept_drop_private_spte; + x86_ops->zap_private_spte = tdx_sept_zap_private_spte; + x86_ops->unzap_private_spte = 
tdx_sept_unzap_private_spte; + x86_ops->link_private_sp = tdx_sept_link_private_sp; + x86_ops->free_private_sp = tdx_sept_free_private_sp; + + return 0; +} + diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index b55108a8e484..e6e768b40eaf 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -8,6 +8,7 @@ #include "tdx_arch.h" #include "tdx_errno.h" #include "tdx_ops.h" +#include "posted_intr.h" #ifdef CONFIG_KVM_INTEL_TDX @@ -22,6 +23,47 @@ struct kvm_tdx { struct tdx_td_page tdr; struct tdx_td_page tdcs[TDX1_NR_TDCX_PAGES]; + + int hkid; + + int cpuid_nent; + struct kvm_cpuid_entry2 cpuid_entries[KVM_MAX_CPUID_ENTRIES]; + + bool finalized; + bool tdtrack; + + hpa_t source_pa; +}; + +union tdx_exit_reason { + struct { + /* 31:0 mirror the VMX Exit Reason format */ + u64 basic : 16; + u64 reserved16 : 1; + u64 reserved17 : 1; + u64 reserved18 : 1; + u64 reserved19 : 1; + u64 reserved20 : 1; + u64 reserved21 : 1; + u64 reserved22 : 1; + u64 reserved23 : 1; + u64 reserved24 : 1; + u64 reserved25 : 1; + u64 reserved26 : 1; + u64 enclave_mode : 1; + u64 smi_pending_mtf : 1; + u64 smi_from_vmx_root : 1; + u64 reserved30 : 1; + u64 failed_vmentry : 1; + + /* 63:32 are TDX specific */ + u64 details_l1 : 8; + u64 class : 8; + u64 reserved61_48 : 14; + u64 non_recoverable : 1; + u64 error : 1; + }; + u64 full; }; struct vcpu_tdx { @@ -29,6 +71,42 @@ struct vcpu_tdx { struct tdx_td_page tdvpr; struct tdx_td_page tdvpx[TDX1_NR_TDVPX_PAGES]; + + struct list_head cpu_list; + + /* Posted interrupt descriptor */ + struct pi_desc pi_desc; + + union { + struct { + union { + struct { + u16 gpr_mask; + u16 xmm_mask; + }; + u32 regs_mask; + }; + u32 reserved; + }; + u64 rcx; + } tdvmcall; + + union tdx_exit_reason exit_reason; + + bool initialized; +}; + +struct tdx_capabilities { + u8 tdcs_nr_pages; + u8 tdvpx_nr_pages; + + u64 attrs_fixed0; + u64 attrs_fixed1; + u64 xfam_fixed0; + u64 xfam_fixed1; + + u32 nr_cpuid_configs; + struct tdx_cpuid_config cpuid_configs[TDX1_MAX_NR_CPUID_CONFIGS]; }; static inline bool is_td(struct kvm *kvm) diff --git a/arch/x86/kvm/vmx/tdx_ops.h b/arch/x86/kvm/vmx/tdx_ops.h index a6f87cfe9bda..9e76a2a9763b 100644 --- a/arch/x86/kvm/vmx/tdx_ops.h +++ b/arch/x86/kvm/vmx/tdx_ops.h @@ -6,6 +6,7 @@ #include #include +#include struct tdx_ex_ret { union { @@ -294,25 +295,34 @@ do { \ seamcall_N_5(fn, ex, "c"(rcx), "d"(rdx), "r"(r8), "r"(r9), "r"(r10)); \ } while (0) +static inline void tdx_clflush_page(hpa_t addr) +{ + clflush_cache_range(__va(addr), PAGE_SIZE); +} + static inline u64 tdaddcx(hpa_t tdr, hpa_t addr) { + tdx_clflush_page(addr); seamcall_2(TDADDCX, addr, tdr); } static inline u64 tdaddpage(hpa_t tdr, gpa_t gpa, hpa_t hpa, hpa_t source, struct tdx_ex_ret *ex) { + tdx_clflush_page(hpa); seamcall_4_2(TDADDPAGE, gpa, tdr, hpa, source, ex); } static inline u64 tdaddsept(hpa_t tdr, gpa_t gpa, int level, hpa_t page, struct tdx_ex_ret *ex) { + tdx_clflush_page(page); seamcall_3_2(TDADDSEPT, gpa | level, tdr, page, ex); } static inline u64 tdaddvpx(hpa_t tdvpr, hpa_t addr) { + tdx_clflush_page(addr); seamcall_2(TDADDVPX, addr, tdvpr); } @@ -324,6 +334,7 @@ static inline u64 tdassignhkid(hpa_t tdr, int hkid) static inline u64 tdaugpage(hpa_t tdr, gpa_t gpa, hpa_t hpa, struct tdx_ex_ret *ex) { + tdx_clflush_page(hpa); seamcall_3_2(TDAUGPAGE, gpa, tdr, hpa, ex); } @@ -340,11 +351,13 @@ static inline u64 tdconfigkey(hpa_t tdr) static inline u64 tdcreate(hpa_t tdr, int hkid) { + tdx_clflush_page(tdr); seamcall_2(TDCREATE, tdr, hkid); } static inline u64 
tdcreatevp(hpa_t tdr, hpa_t tdvpr) { + tdx_clflush_page(tdvpr); seamcall_2(TDCREATEVP, tdvpr, tdr); } diff --git a/arch/x86/kvm/vmx/tdx_stubs.c b/arch/x86/kvm/vmx/tdx_stubs.c new file mode 100644 index 000000000000..def5b0789bf7 --- /dev/null +++ b/arch/x86/kvm/vmx/tdx_stubs.c @@ -0,0 +1,45 @@ +// SPDX-License-Identifier: GPL-2.0 +#include + +static int tdx_vm_init(struct kvm *kvm) { return 0; } +static void tdx_vm_teardown(struct kvm *kvm) {} +static void tdx_vm_destroy(struct kvm *kvm) {} +static int tdx_vcpu_create(struct kvm_vcpu *vcpu) { return 0; } +static void tdx_vcpu_free(struct kvm_vcpu *vcpu) {} +static void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) {} +static void tdx_inject_nmi(struct kvm_vcpu *vcpu) {} +static fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu) { return EXIT_FASTPATH_NONE; } +static void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) {} +static void tdx_vcpu_put(struct kvm_vcpu *vcpu) {} +static void tdx_hardware_enable(void) {} +static void tdx_hardware_disable(void) {} +static void tdx_handle_exit_irqoff(struct kvm_vcpu *vcpu) {} +static int tdx_handle_exit(struct kvm_vcpu *vcpu, + enum exit_fastpath_completion fastpath) { return 0; } +static int tdx_dev_ioctl(void __user *argp) { return -EINVAL; } +static int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { return -EINVAL; } +static int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EINVAL; } +static void tdx_flush_tlb(struct kvm_vcpu *vcpu) {} +static void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, unsigned long pgd, + int pgd_level) {} +static void tdx_set_virtual_apic_mode(struct kvm_vcpu *vcpu) {} +static void tdx_apicv_post_state_restore(struct kvm_vcpu *vcpu) {} +static int tdx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector) { return -1; } +static void tdx_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2, + u32 *intr_info, u32 *error_code) { } +static int __init tdx_check_processor_compatibility(void) { return 0; } +static void __init tdx_pre_kvm_init(unsigned int *vcpu_size, + unsigned int *vcpu_align, + unsigned int *vm_size) {} +static int __init tdx_init(void) { return 0; } +static void tdx_update_exception_bitmap(struct kvm_vcpu *vcpu) {} +static void tdx_set_dr7(struct kvm_vcpu *vcpu, unsigned long val) {} +static int tdx_get_cpl(struct kvm_vcpu *vcpu) { return 0; } +static unsigned long tdx_get_rflags(struct kvm_vcpu *vcpu) { return 0; } +static void tdx_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags) {} +static bool tdx_is_emulated_msr(u32 index, bool write) { return false; } +static int tdx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) { return 1; } +static int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) { return 1; } +static u64 tdx_get_segment_base(struct kvm_vcpu *vcpu, int seg) { return 0; } +static void tdx_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, + int seg) {} diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S index 90ad7a6246e3..ddefa2e80441 100644 --- a/arch/x86/kvm/vmx/vmenter.S +++ b/arch/x86/kvm/vmx/vmenter.S @@ -2,6 +2,7 @@ #include #include #include +#include #include #include #include @@ -28,6 +29,13 @@ #define VCPU_R15 __VCPU_REGS_R15 * WORD_SIZE #endif +#ifdef CONFIG_KVM_INTEL_TDX +#define TDENTER 0 +#define EXIT_REASON_TDCALL 77 +#define TDENTER_ERROR_BIT 63 +#define seamcall .byte 0x66,0x0f,0x01,0xcf +#endif + .section .noinstr.text, "ax" /** @@ -328,3 +336,135 @@ SYM_FUNC_START(vmx_do_interrupt_nmi_irqoff) pop %_ASM_BP ret 
SYM_FUNC_END(vmx_do_interrupt_nmi_irqoff) + +#ifdef CONFIG_KVM_INTEL_TDX +/** + * __tdx_vcpu_run - Call SEAMCALL(TDENTER) to run a TD vcpu + * @tdvpr: physical address of TDVPR + * @regs: void * (to registers of TDVCPU) + * @gpr_mask: non-zero if guest registers need to be loaded prior to TDENTER + * + * Returns: + * TD-Exit Reason + * + * Note: KVM doesn't support using XMM in its hypercalls, it's the HyperV + * code's responsibility to save/restore XMM registers on TDVMCALL. + */ +SYM_FUNC_START(__tdx_vcpu_run) + push %rbp + mov %rsp, %rbp + + push %r15 + push %r14 + push %r13 + push %r12 + push %rbx + + /* Save @regs, which is needed after TDENTER to capture output. */ + push %rsi + + /* Load @tdvpr to RCX */ + mov %rdi, %rcx + + /* No need to load guest GPRs if the last exit wasn't a TDVMCALL. */ + test %dx, %dx + je 1f + + /* Load @regs to RAX, which will be clobbered with $TDENTER anyways. */ + mov %rsi, %rax + + mov VCPU_RBX(%rax), %rbx + mov VCPU_RDX(%rax), %rdx + mov VCPU_RBP(%rax), %rbp + mov VCPU_RSI(%rax), %rsi + mov VCPU_RDI(%rax), %rdi + + mov VCPU_R8 (%rax), %r8 + mov VCPU_R9 (%rax), %r9 + mov VCPU_R10(%rax), %r10 + mov VCPU_R11(%rax), %r11 + mov VCPU_R12(%rax), %r12 + mov VCPU_R13(%rax), %r13 + mov VCPU_R14(%rax), %r14 + mov VCPU_R15(%rax), %r15 + + /* Load TDENTER to RAX. This kills the @regs pointer! */ +1: mov $TDENTER, %rax + +2: seamcall + + /* Skip to the exit path if TDENTER failed. */ + bt $TDENTER_ERROR_BIT, %rax + jc 4f + + /* Temporarily save the TD-Exit reason. */ + push %rax + + /* check if TD-exit due to TDVMCALL */ + cmp $EXIT_REASON_TDCALL, %ax + + /* Reload @regs to RAX. */ + mov 8(%rsp), %rax + + /* Jump on non-TDVMCALL */ + jne 3f + + /* Save all output from SEAMCALL(TDENTER) */ + mov %rbx, VCPU_RBX(%rax) + mov %rbp, VCPU_RBP(%rax) + mov %rsi, VCPU_RSI(%rax) + mov %rdi, VCPU_RDI(%rax) + mov %r10, VCPU_R10(%rax) + mov %r11, VCPU_R11(%rax) + mov %r12, VCPU_R12(%rax) + mov %r13, VCPU_R13(%rax) + mov %r14, VCPU_R14(%rax) + mov %r15, VCPU_R15(%rax) + +3: mov %rcx, VCPU_RCX(%rax) + mov %rdx, VCPU_RDX(%rax) + mov %r8, VCPU_R8 (%rax) + mov %r9, VCPU_R9 (%rax) + + /* + * Clear all general purpose registers except RSP and RAX to prevent + * speculative use of the guest's values. + */ + xor %rbx, %rbx + xor %rcx, %rcx + xor %rdx, %rdx + xor %rsi, %rsi + xor %rdi, %rdi + xor %rbp, %rbp + xor %r8, %r8 + xor %r9, %r9 + xor %r10, %r10 + xor %r11, %r11 + xor %r12, %r12 + xor %r13, %r13 + xor %r14, %r14 + xor %r15, %r15 + + /* Restore the TD-Exit reason to RAX for return. */ + pop %rax + + /* "POP" @regs. 
*/ +4: add $8, %rsp + pop %rbx + pop %r12 + pop %r13 + pop %r14 + pop %r15 + + pop %rbp + ret + +5: cmpb $0, kvm_rebooting + je 6f + mov $-EFAULT, %rax + jmp 4b +6: ud2 + _ASM_EXTABLE(2b, 5b) + +SYM_FUNC_END(__tdx_vcpu_run) +#endif diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index f7ffb36c318c..5566e7f25ce6 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -9744,7 +9744,8 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu, { int ret; - if (vcpu->kvm->arch.guest_state_protected) + if (vcpu->kvm->arch.guest_state_protected || + vcpu->kvm->arch.vm_type == KVM_X86_TDX_VM) return -EINVAL; vcpu_load(vcpu); @@ -11388,6 +11389,8 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_cr); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmrun); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmexit); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmexit_inject); +EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_tdvmcall); +EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_sept_seamcall); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_intr_vmexit); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmenter_failed); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_invlpga); diff --git a/tools/arch/x86/include/uapi/asm/kvm.h b/tools/arch/x86/include/uapi/asm/kvm.h index 44313ac967dd..959dc883fb11 100644 --- a/tools/arch/x86/include/uapi/asm/kvm.h +++ b/tools/arch/x86/include/uapi/asm/kvm.h @@ -470,4 +470,55 @@ struct kvm_pmu_event_filter { #define KVM_X86_SEV_ES_VM 1 #define KVM_X86_TDX_VM 2 +/* Trust Domain eXtension command*/ +enum tdx_cmd_id { + KVM_TDX_CAPABILITIES = 0, + KVM_TDX_INIT_VM, + KVM_TDX_INIT_VCPU, + KVM_TDX_INIT_MEM_REGION, + KVM_TDX_FINALIZE_VM, + + KVM_TDX_CMD_NR_MAX, +}; + +struct kvm_tdx_cmd { + __u32 id; + __u32 metadata; + __u64 data; +}; + +struct kvm_tdx_cpuid_config { + __u32 leaf; + __u32 sub_leaf; + __u32 eax; + __u32 ebx; + __u32 ecx; + __u32 edx; +}; + +struct kvm_tdx_capabilities { + __u64 attrs_fixed0; + __u64 attrs_fixed1; + __u64 xfam_fixed0; + __u64 xfam_fixed1; + + __u32 nr_cpuid_configs; + struct kvm_tdx_cpuid_config cpuid_configs[0]; +}; + +struct kvm_tdx_init_vm { + __u32 max_vcpus; + __u32 reserved; + __u64 attributes; + __u64 cpuid; +}; + +#define KVM_TDX_MEASURE_MEMORY_REGION (1UL << 0) + +struct kvm_tdx_init_mem_region { + __u64 source_addr; + __u64 gpa; + __u64 nr_pages; +}; + #endif /* _ASM_X86_KVM_H */ From patchwork Mon Nov 16 18:26:50 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910333 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9F856C64E69 for ; Mon, 16 Nov 2020 18:28:54 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 78753241A3 for ; Mon, 16 Nov 2020 18:28:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388409AbgKPS2i (ORCPT ); Mon, 16 Nov 2020 13:28:38 -0500 Received: from mga02.intel.com ([134.134.136.20]:48461 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388350AbgKPS2Y (ORCPT ); Mon, 16 Nov 2020 13:28:24 -0500 IronPort-SDR: 
From: isaku.yamahata@intel.com
To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson
Subject: [RFC PATCH 65/67] KVM: x86: Mark the VM (TD) as bugged if non-coherent DMA is detected
Date: Mon, 16 Nov 2020 10:26:50 -0800
From: Sean Christopherson

TDX is not supported on platforms with non-coherent IOMMUs because SEPT does not provide the memtype control needed to support non-coherent DMA. Freak out, i.e. mark the VM as bugged, if non-coherent DMA is detected for a TD.

Signed-off-by: Sean Christopherson
--- arch/x86/kvm/x86.c | 1 + 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 5566e7f25ce6..05dbdfdd7a8b 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -11144,6 +11144,7 @@ EXPORT_SYMBOL_GPL(kvm_arch_has_assigned_device); void kvm_arch_register_noncoherent_dma(struct kvm *kvm) { + KVM_BUG_ON(kvm->arch.vm_type == KVM_X86_TDX_VM, kvm); atomic_inc(&kvm->arch.noncoherent_dma_count); } EXPORT_SYMBOL_GPL(kvm_arch_register_noncoherent_dma);

From patchwork Mon Nov 16 18:26:51 2020 X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910345
From: isaku.yamahata@intel.com
To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com
Subject: [RFC PATCH 66/67] fixup! KVM: TDX: Add "basic" support for building and running Trust Domains
Date: Mon, 16 Nov 2020 10:26:51 -0800 Message-Id: <53e3cbd5e790fd41a8c12c3560409da7d19e1523.1605232743.git.isaku.yamahata@intel.com>
From: Isaku Yamahata
--- arch/x86/kvm/vmx/tdx.c | 3 --- 1 file changed, 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index adcb866861b7..d2c1766416f2 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -331,9 +331,6 @@ static int tdx_vm_init(struct kvm *kvm) kvm->arch.mce_injection_disallowed = true; kvm_mmu_set_mmio_spte_mask(kvm, 0, 0); - /* TODO: Enable 2mb and 1gb large page support. */ - kvm->arch.tdp_max_page_level = PG_LEVEL_4K; - kvm_apicv_init(kvm, true); /* vCPUs can't be created until after KVM_TDX_INIT_VM.
*/
From patchwork Mon Nov 16 18:26:52 2020 X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 11910343
From: isaku.yamahata@intel.com
To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com
Subject: [RFC PATCH 67/67] KVM: X86: not for review: add dummy file for TDX-SEAM module
Date: Mon, 16 Nov 2020 10:26:52 -0800
From: Isaku Yamahata

This patch is not for review; it only exists to make the build succeed. Add a dummy empty file for the TDX-SEAM module as linux/lib/firmware/intel-seam/libtdx.so. The TDX-SEAM module isn't published; its specification is at [1].

[1] Intel TDX Module 1.0 EAS https://software.intel.com/content/dam/develop/external/us/en/documents/intel-tdx-module-1eas.pdf

Signed-off-by: Isaku Yamahata
--- lib/firmware/intel-seam/libtdx.so | 0 1 file changed, 0 insertions(+), 0 deletions(-) create mode 100644 lib/firmware/intel-seam/libtdx.so

diff --git a/lib/firmware/intel-seam/libtdx.so b/lib/firmware/intel-seam/libtdx.so new file mode 100644 index 000000000000..e69de29bb2d1
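For reference, the KVM_TDX_CAPABILITIES command defined in the uapi additions earlier in this section (struct kvm_tdx_cmd, struct kvm_tdx_capabilities) could be exercised from userspace roughly as sketched below. This is an illustration only, not part of the series: it assumes struct kvm_tdx_cmd is dispatched through the existing KVM_MEMORY_ENCRYPT_OP vm ioctl (the dispatch path is not shown in this excerpt), that the installed <linux/kvm.h> exports the TDX additions, and that the caller pre-sizes the capabilities buffer via nr_cpuid_configs; the value 32 used here is an arbitrary guess, not a limit defined by the series.

/*
 * Illustrative userspace sketch, not part of the patch series.
 * Assumes <linux/kvm.h> exports the TDX uapi above and that
 * struct kvm_tdx_cmd is routed through KVM_MEMORY_ENCRYPT_OP.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int main(void)
{
	struct kvm_tdx_capabilities *caps;
	struct kvm_tdx_cmd cmd = { .id = KVM_TDX_CAPABILITIES };
	int nr = 32;	/* arbitrary buffer size for cpuid_configs[] */
	int kvm_fd, vm_fd;

	kvm_fd = open("/dev/kvm", O_RDWR);
	if (kvm_fd < 0)
		return 1;

	/* The vm type argument of KVM_CREATE_VM selects a TD (KVM_X86_TDX_VM). */
	vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, KVM_X86_TDX_VM);
	if (vm_fd < 0)
		return 1;

	caps = calloc(1, sizeof(*caps) + nr * sizeof(caps->cpuid_configs[0]));
	if (!caps)
		return 1;
	caps->nr_cpuid_configs = nr;	/* tell KVM how many entries fit */
	cmd.data = (__u64)(unsigned long)caps;

	if (ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd))
		perror("KVM_TDX_CAPABILITIES");
	else
		printf("attrs_fixed0=%llx xfam_fixed0=%llx nr_cpuid_configs=%u\n",
		       (unsigned long long)caps->attrs_fixed0,
		       (unsigned long long)caps->xfam_fixed0,
		       caps->nr_cpuid_configs);

	free(caps);
	return 0;
}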