From patchwork Tue Dec 10 00:49:27 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Binbin Wu X-Patchwork-Id: 13900570 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.9]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9630D26AEC; Tue, 10 Dec 2024 00:47:52 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.9 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733791674; cv=none; b=D7Rc2CD3oA44UxAGLT/Ago5pmFO3EZ3dsAO7iFao4apu64znKureKGp8pxOT0zpwGIC8EINhNs2JtHRC+XdQ0XFt5u5SYoYbVFlQVh9/n3fe2ytBX/hfbmtQE/JyJx1NNrghqLwJ3wsrpB9x7aNdpg/GCQyaPSRp1bVM7R++fMs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733791674; c=relaxed/simple; bh=ep4sns6fiJ21YowEfKsA+fUSVKFZ3KDavuMkAMgE58c=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=io/aUqPqYzlFIKh4hgrwY5kFJYkzOMPESjCDWSo/eNaiCszTnu3l526cH927ZlkI0hJFUrW/b3GOTy8om2CBrexWGLWPI/r+pZJYGFgrHGRBBA4ktdYazd+khWt+q22amxl49RmaEv2Bjm3jz8I+xTQaHhuOWzKv3mTtja0hTKM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=hQiLyMok; arc=none smtp.client-ip=192.198.163.9 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="hQiLyMok" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1733791673; x=1765327673; 
h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ep4sns6fiJ21YowEfKsA+fUSVKFZ3KDavuMkAMgE58c=; b=hQiLyMoklFUovG2Fp3JuimQRioQoEp4ZfxOFudc5p12nmU0V9nRA3Hvp u1NBu3VrUHTbuVVR8eTkCeJsPLSzotx/jPgTZiCPooT2kiL1365zjh091 Nu7mHREiP5/8FWKoV7faUUe91/20fB/Nvl02QAPHW24MbjBub2Ouh1y1X sjnORtPiR/PhbtrRgFJSFBAxjsQAlajXrdc/Ujb2+rZtR1uKKji9oKnY2 OH2SwmaY1k8SQPPiKaqduEXCg8IL/uGL0iMHhvEr4I/0ZPwijyYvuw9CK mkknJyqJ8tWWYgFbVxP+MKaABYhai07q/xl2IzzmveacHfJFVKbnI+ERN w==; X-CSE-ConnectionGUID: wWA63IemSyqX7SF5lkBDGg== X-CSE-MsgGUID: U8hQsCfLT/KjtEWmv/m8Ng== X-IronPort-AV: E=McAfee;i="6700,10204,11281"; a="44793677" X-IronPort-AV: E=Sophos;i="6.12,220,1728975600"; d="scan'208";a="44793677" Received: from orviesa008.jf.intel.com ([10.64.159.148]) by fmvoesa103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Dec 2024 16:47:53 -0800 X-CSE-ConnectionGUID: ObZ2XeshQneaKj7fiIuDOw== X-CSE-MsgGUID: qKH5w1WhSKSySGCDAWthuQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.12,220,1728975600"; d="scan'208";a="96033006" Received: from litbin-desktop.sh.intel.com ([10.239.156.93]) by orviesa008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Dec 2024 16:47:48 -0800 From: Binbin Wu To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@linux.intel.com, isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com Subject: [PATCH 01/18] KVM: x86: Add a switch_db_regs flag to handle TDX's auto-switched behavior Date: Tue, 10 Dec 2024 08:49:27 +0800 Message-ID: <20241210004946.3718496-2-binbin.wu@linux.intel.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20241210004946.3718496-1-binbin.wu@linux.intel.com> References: <20241210004946.3718496-1-binbin.wu@linux.intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org 
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0

From: Isaku Yamahata

Add a flag KVM_DEBUGREG_AUTO_SWITCH to skip saving/restoring guest DRs.

TDX-SEAM unconditionally saves/restores guest DRs on TD exit/enter, and
resets DRs to architectural INIT state on TD exit. Use the new flag
KVM_DEBUGREG_AUTO_SWITCH to indicate that KVM doesn't need to save/restore
guest DRs. KVM still needs to restore host DRs after TD exit if there are
active breakpoints in the host, which is covered by the existing code.

MOV-DR exiting is always cleared for TDX guests, so the handler for DR
access is never called, and KVM_DEBUGREG_WONT_EXIT is never set. Add a
warning if both KVM_DEBUGREG_WONT_EXIT and KVM_DEBUGREG_AUTO_SWITCH are
set.

Opportunistically convert the KVM_DEBUGREG_* definitions to use BIT().

Reported-by: Xiaoyao Li
Signed-off-by: Sean Christopherson
Co-developed-by: Chao Gao
Signed-off-by: Chao Gao
Signed-off-by: Isaku Yamahata
[binbin: rework changelog]
Signed-off-by: Binbin Wu
---
TDX "the rest" breakout:
- Update the comment about KVM_DEBUGREG_AUTO_SWITCH.
- Explicitly check that KVM_DEBUGREG_AUTO_SWITCH is not set in
  switch_db_regs before restoring guest DRs, because
  KVM_DEBUGREG_BP_ENABLED could be set by userspace. (Paolo)
  https://lore.kernel.org/lkml/ea136ac6-53cf-cdc5-a741-acfb437819b1@redhat.com/
- Fix the issue that host DRs are not restored in v19 (Binbin)
  https://lore.kernel.org/kvm/20240413002026.GP3039520@ls.amr.corp.intel.com/
- Update the changelog a bit.
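
For illustration only (not part of the patch): a minimal user-space sketch
of the flag logic described in the changelog above. The flag values mirror
the BIT() definitions in this patch; the helper names
must_load_guest_drs() and is_invalid_dr_flag_combo() are hypothetical and
exist only to make the two conditions explicit.

```c
#include <stdbool.h>

/* Same bit positions as the KVM_DEBUGREG_* enum in this patch. */
enum {
	DEBUGREG_BP_ENABLED  = 1u << 0,
	DEBUGREG_WONT_EXIT   = 1u << 1,
	DEBUGREG_AUTO_SWITCH = 1u << 2,
};

/*
 * Hypothetical helper: true when the host-side code itself has to load the
 * guest DR values before entering the guest. With AUTO_SWITCH set, the
 * guest DRs are switched on the host's behalf, so the load is skipped even
 * if userspace also set BP_ENABLED.
 */
static bool must_load_guest_drs(unsigned int switch_db_regs)
{
	return switch_db_regs && !(switch_db_regs & DEBUGREG_AUTO_SWITCH);
}

/*
 * Hypothetical helper: the combination that the new WARN_ON() reports.
 * WONT_EXIT and AUTO_SWITCH are not expected to be set together.
 */
static bool is_invalid_dr_flag_combo(unsigned int switch_db_regs)
{
	return (switch_db_regs & DEBUGREG_WONT_EXIT) &&
	       (switch_db_regs & DEBUGREG_AUTO_SWITCH);
}
```

The point of testing the flag explicitly, rather than relying on
switch_db_regs being zero, is that BP_ENABLED may be set alongside
AUTO_SWITCH.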
--- arch/x86/include/asm/kvm_host.h | 11 +++++++++-- arch/x86/kvm/vmx/tdx.c | 1 + arch/x86/kvm/x86.c | 4 +++- 3 files changed, 13 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index df535f08e004..a0814079777f 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -604,8 +604,15 @@ struct kvm_pmu { struct kvm_pmu_ops; enum { - KVM_DEBUGREG_BP_ENABLED = 1, - KVM_DEBUGREG_WONT_EXIT = 2, + KVM_DEBUGREG_BP_ENABLED = BIT(0), + KVM_DEBUGREG_WONT_EXIT = BIT(1), + /* + * Guest debug registers (DR0-3, DR6 and DR7) are saved/restored by + * hardware on exit from or enter to guest. KVM needn't switch them. + * DR0-3, DR6 and DR7 are set to their architectural INIT value on VM + * exit, host values need to be restored. + */ + KVM_DEBUGREG_AUTO_SWITCH = BIT(2), }; struct kvm_mtrr { diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 3cf8a4e1fc29..b87daa643e6e 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -727,6 +727,7 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu) vcpu->arch.efer = EFER_SCE | EFER_LME | EFER_LMA | EFER_NX; + vcpu->arch.switch_db_regs = KVM_DEBUGREG_AUTO_SWITCH; vcpu->arch.cr0_guest_owned_bits = -1ul; vcpu->arch.cr4_guest_owned_bits = -1ul; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index e155ae90e9fa..2b4bd56e9fb4 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -10965,7 +10965,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) if (vcpu->arch.guest_fpu.xfd_err) wrmsrl(MSR_IA32_XFD_ERR, vcpu->arch.guest_fpu.xfd_err); - if (unlikely(vcpu->arch.switch_db_regs)) { + if (unlikely(vcpu->arch.switch_db_regs && + !(vcpu->arch.switch_db_regs & KVM_DEBUGREG_AUTO_SWITCH))) { set_debugreg(0, 7); set_debugreg(vcpu->arch.eff_db[0], 0); set_debugreg(vcpu->arch.eff_db[1], 1); @@ -11012,6 +11013,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) */ if (unlikely(vcpu->arch.switch_db_regs & KVM_DEBUGREG_WONT_EXIT)) { 
WARN_ON(vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP); + WARN_ON(vcpu->arch.switch_db_regs & KVM_DEBUGREG_AUTO_SWITCH); kvm_x86_call(sync_dirty_debug_regs)(vcpu); kvm_update_dr0123(vcpu); kvm_update_dr7(vcpu); From patchwork Tue Dec 10 00:49:28 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Binbin Wu X-Patchwork-Id: 13900571 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.9]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 503D170824; Tue, 10 Dec 2024 00:47:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.9 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733791678; cv=none; b=hXJkAbZciynuL57RsFOUgR7x94CbSp5K6KiXuky3sur0/VAbau2j2atJh50xB16zL4jhQQczRxP4OqaddipVimvJPDk1s2agEjHHrgPkWBu0CJapwmcZl58YjuOGJkQ0dihdvxT7kohW1j8T6YkQVs/EwU8Oks+EwM8vk5rpqWo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733791678; c=relaxed/simple; bh=LtHixNDxmGs+Yn/0JWV3y5pxILC73hM/NhFnT+0NrSQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=tknyiIftHGJl0lCNVJXC4RLZW/Ru5Nbwcgnb5O7FH7NVqO7vT1EPcx73J9hjz2D6A1h0b8OHxUbyk0/dfuwsjZ5RNcyau0knxy6jW+aqCxAcb5Tv61xccvPx1fbBD56tbMMUUm30FbNA7ddqEHC5IEyK35YSBkezeMPJwlLF8qU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=QNMqZB4+; arc=none smtp.client-ip=192.198.163.9 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) 
header.d=intel.com header.i=@intel.com header.b="QNMqZB4+" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1733791677; x=1765327677; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=LtHixNDxmGs+Yn/0JWV3y5pxILC73hM/NhFnT+0NrSQ=; b=QNMqZB4+NQ5CVvd4YdiTX74gO22YrMWlIxp5Pyo+LJmT0o8cC7sOc7Sz K19BKkTBc9rAc2RiYhMfNPflOFQBgdP/3XfA45Giy2sLXjFpDvgS8FJII H32J5KtXA0TB6tpF7L50Tpab/ENkLotcWHaDNFCvoiw0fUQyWY8Kue/lm OlOAcpuNFThPF4bgLjT4iW92p4PYQ6u81tqIn3I4KZ2gWoum9B2Q27j1O 7tbQvfl+d0GTp3leh8/oNGI7D1MACKrQx4bapI9Z6O4A1xcm7/pxHvMDL N0ajmDBkZuruluq5OYIdXjuizbsIUy0kOWBFqvRhX9e88YmbAfUvvisnn g==; X-CSE-ConnectionGUID: rzmp08vyQWSWxtyLjUyEJQ== X-CSE-MsgGUID: TyNTIKgMR3uuJYNYr+ln2w== X-IronPort-AV: E=McAfee;i="6700,10204,11281"; a="44793684" X-IronPort-AV: E=Sophos;i="6.12,220,1728975600"; d="scan'208";a="44793684" Received: from orviesa008.jf.intel.com ([10.64.159.148]) by fmvoesa103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Dec 2024 16:47:56 -0800 X-CSE-ConnectionGUID: dtiCF5rUT064Wama6d62VA== X-CSE-MsgGUID: PtePnla1Qi+CwaWSxPFEfw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.12,220,1728975600"; d="scan'208";a="96033012" Received: from litbin-desktop.sh.intel.com ([10.239.156.93]) by orviesa008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Dec 2024 16:47:52 -0800 From: Binbin Wu To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@linux.intel.com, isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com Subject: [PATCH 02/18] KVM: TDX: Handle EPT violation/misconfig exit Date: Tue, 10 Dec 2024 08:49:28 +0800 Message-ID: <20241210004946.3718496-3-binbin.wu@linux.intel.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: 
<20241210004946.3718496-1-binbin.wu@linux.intel.com>
References: <20241210004946.3718496-1-binbin.wu@linux.intel.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0

From: Isaku Yamahata

For TDX, on EPT violation, call common __vmx_handle_ept_violation() to
trigger x86 MMU code; on EPT misconfiguration, bug the VM since it
shouldn't happen.

EPT violation due to instruction fetch should never be triggered from
shared memory in a TDX guest. If such an EPT violation occurs, treat it as
broken hardware.

EPT misconfiguration shouldn't happen on either shared or secure EPT for
TDX guests.
- The TDX module guarantees no EPT misconfiguration on secure EPT. Per TDX
  module v1.5 spec section 9.4 "Secure EPT Induced TD Exits":
  "By design, since secure EPT is fully controlled by the TDX module, an
  EPT misconfiguration on a private GPA indicates a TDX module bug and is
  handled as a fatal error."
- For shared EPT, the MMIO caching optimization, which is the only case
  where current KVM configures EPT entries to generate EPT
  misconfiguration, is implemented in a different way for TDX guests. KVM
  sets EPT entries to a non-present value without the suppress-#VE bit
  set. This causes a #VE in the TDX guest, and the guest then calls
  TDG.VP.VMCALL to request MMIO emulation.

Signed-off-by: Isaku Yamahata
[binbin: rework changelog]
Signed-off-by: Binbin Wu
---
TDX "the rest" breakout:
- Renamed from "KVM: TDX: Handle ept violation/misconfig exit"
  to "KVM: TDX: Handle EPT violation/misconfig exit" (Reinette)
- Removed WARN_ON_ONCE(1) in tdx_handle_ept_misconfig(). (Rick)
- Add comment above EPT_VIOLATION_ACC_INSTR check. (Chao)
  https://lore.kernel.org/lkml/Zgoz0sizgEZhnQ98@chao-email/
  https://lore.kernel.org/lkml/ZjiE+O9fct5zI4Sf@chao-email/
- Remove unnecessary define of TDX_SEPT_VIOLATION_EXIT_QUAL. (Sean)
- Replace pr_warn() and KVM_EXIT_EXCEPTION with KVM_BUG_ON(). (Sean)
- KVM_BUG_ON() for EPT misconfig. (Sean)
- Rework changelog.
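
For illustration only (not part of the patch): a user-space sketch of how
the exit qualification handed to the common MMU code is chosen, as the
changelog describes. The bit positions match EPT_VIOLATION_ACC_WRITE_BIT
and EPT_VIOLATION_ACC_INSTR_BIT in asm/vmx.h; pick_exit_qual() is a
hypothetical name, and returning -EIO stands in for the KVM_BUG_ON() path.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define ACC_WRITE (1ull << 1)	/* same bit as EPT_VIOLATION_ACC_WRITE */
#define ACC_INSTR (1ull << 2)	/* same bit as EPT_VIOLATION_ACC_INSTR */

/*
 * Hypothetical helper. On success, returns 0 and stores the exit
 * qualification to pass on to the MMU code in *out.
 *
 * - Private GPA: the reported value is ignored and the fault is always
 *   treated as a write fault.
 * - Shared GPA: the reported value is used as-is, except that an
 *   instruction fetch is rejected because it should never happen.
 */
static int pick_exit_qual(bool gpa_is_private, uint64_t reported,
			  uint64_t *out)
{
	if (gpa_is_private) {
		*out = ACC_WRITE;
		return 0;
	}

	if (reported & ACC_INSTR)
		return -EIO;

	*out = reported;
	return 0;
}
```

Note that the instruction-fetch check applies to shared GPAs only; for
private GPAs the reported qualification is never inspected in this sketch.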
v14 -> v15: - use PFERR_GUEST_ENC_MASK to tell the fault is private --- arch/x86/kvm/vmx/tdx.c | 35 +++++++++++++++++++++++++++++++++++ 1 file changed, 35 insertions(+) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index b87daa643e6e..aecf52dda00d 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1770,6 +1770,36 @@ void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode, __vmx_deliver_posted_interrupt(vcpu, &tdx->pi_desc, vector); } +static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu) +{ + unsigned long exit_qual; + + if (vt_is_tdx_private_gpa(vcpu->kvm, tdexit_gpa(vcpu))) { + /* + * Always treat SEPT violations as write faults. Ignore the + * EXIT_QUALIFICATION reported by TDX-SEAM for SEPT violations. + * TD private pages are always RWX in the SEPT tables, + * i.e. they're always mapped writable. Just as importantly, + * treating SEPT violations as write faults is necessary to + * avoid COW allocations, which will cause TDAUGPAGE failures + * due to aliasing a single HPA to multiple GPAs. + */ + exit_qual = EPT_VIOLATION_ACC_WRITE; + } else { + exit_qual = tdexit_exit_qual(vcpu); + /* + * EPT violation due to instruction fetch should never be + * triggered from shared memory in TDX guest. If such EPT + * violation occurs, treat it as broken hardware. 
+ */ + if (KVM_BUG_ON(exit_qual & EPT_VIOLATION_ACC_INSTR, vcpu->kvm)) + return -EIO; + } + + trace_kvm_page_fault(vcpu, tdexit_gpa(vcpu), exit_qual); + return __vmx_handle_ept_violation(vcpu, tdexit_gpa(vcpu), exit_qual); +} + int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath) { struct vcpu_tdx *tdx = to_tdx(vcpu); @@ -1814,6 +1844,11 @@ int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath) return tdx_handle_external_interrupt(vcpu); case EXIT_REASON_TDCALL: return handle_tdvmcall(vcpu); + case EXIT_REASON_EPT_VIOLATION: + return tdx_handle_ept_violation(vcpu); + case EXIT_REASON_EPT_MISCONFIG: + KVM_BUG_ON(1, vcpu->kvm); + return -EIO; case EXIT_REASON_OTHER_SMI: /* * Unlike VMX, SMI in SEAM non-root mode (i.e. when From patchwork Tue Dec 10 00:49:29 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Binbin Wu X-Patchwork-Id: 13900572 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.9]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DA155130E27; Tue, 10 Dec 2024 00:47:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.9 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733791681; cv=none; b=JDr5u/btpECDKB6oERg62kD6qH2lqQMFSNw98oEERkj9Lw8nF9O7tZ1u8lJ47QXQpmEv9LnyoVHY5Mki59L5OOJ+vvLKQzq7schusViWBO0qOAUeZbjLH2qdRLXYfwar4APYawzk8gVOOjSdvc6Uy4IV3ggpuUORnCgsdp2owiY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733791681; c=relaxed/simple; bh=PPv9bUtXMfcaRh4a+/udp92Rcvcy+pazYoHCs/v8+OA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=IEtzXqZNrK77ECPjEk8XV4/OQv+rM6EHV9RetSKHsHjfTyt5FEf1iRk6usedfXBOUQCJ+ahPrRZaHumcMacMbvwv6GD0M6qgRoz0v2ErrAjbQTYgMFXtKtfZH/k4i9IuO4HKmXnDQSUtUF4wqx/YUtBxWWdzHQPtFSJX7HKHcqM= 
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=InL111gK; arc=none smtp.client-ip=192.198.163.9 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="InL111gK" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1733791680; x=1765327680; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=PPv9bUtXMfcaRh4a+/udp92Rcvcy+pazYoHCs/v8+OA=; b=InL111gKiCJRRvQkndWjD37xHnQlJfl6TMzOyrH+1pVncKDPiJI5NZBa lhiJrBiFgT0y7PZ5V2uKFfxik+qbntUadt7xKIsNa5WhT0tNxqEhzYPHW gnKIlyNxQf2LRm03oxNf34W2Z+OIhxXg1ACI3YQp4xoeEqHmhaDBnyDPV 9eX7+/8p6UcomVzWPUzfsuhCUed89JWTMzsL+KS02LaI2/CPQggbbxuk3 QyM/MjV5sZCUKc9UUZaA/CMkGO4d5ZsiOQZLs4Z+oQnkXPZdHxkrdsDZl fUmUYi8rdKtKVvrIGJdyIai/IF98ZxssJsq4ifIcGU42iCG77dgZaY70d w==; X-CSE-ConnectionGUID: yZorrUv0SQyWR1mBdm3QZg== X-CSE-MsgGUID: Ov2RD2kqTuSJqR26ZTdmqQ== X-IronPort-AV: E=McAfee;i="6700,10204,11281"; a="44793694" X-IronPort-AV: E=Sophos;i="6.12,220,1728975600"; d="scan'208";a="44793694" Received: from orviesa008.jf.intel.com ([10.64.159.148]) by fmvoesa103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Dec 2024 16:48:00 -0800 X-CSE-ConnectionGUID: nSr9tx8hTPS2+Gvc49WZqg== X-CSE-MsgGUID: abC5L7DbSBCKtU7awHVS+A== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.12,220,1728975600"; d="scan'208";a="96033017" Received: from litbin-desktop.sh.intel.com ([10.239.156.93]) by orviesa008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Dec 2024 16:47:56 -0800 From: Binbin Wu To: pbonzini@redhat.com, 
seanjc@google.com, kvm@vger.kernel.org
Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com,
 adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com,
 tony.lindgren@linux.intel.com, isaku.yamahata@intel.com,
 yan.y.zhao@intel.com, chao.gao@intel.com, linux-kernel@vger.kernel.org,
 binbin.wu@linux.intel.com
Subject: [PATCH 03/18] KVM: TDX: Detect unexpected SEPT violations due to pending SPTEs
Date: Tue, 10 Dec 2024 08:49:29 +0800
Message-ID: <20241210004946.3718496-4-binbin.wu@linux.intel.com>
X-Mailer: git-send-email 2.46.0
In-Reply-To: <20241210004946.3718496-1-binbin.wu@linux.intel.com>
References: <20241210004946.3718496-1-binbin.wu@linux.intel.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0

From: Yan Zhao

Detect SEPT violations that occur when an SEPT entry is in PENDING state
while the TD is configured not to receive #VE on SEPT violations.

A TD guest can be configured not to receive #VE by setting SEPT_VE_DISABLE
to 1 in tdh_mng_init() or by changing pending_ve_disable to 1 in the TDCS
when flexible_pending_ve is permitted. In such cases, the TDX module will
not inject #VE into the TD upon encountering an EPT violation caused by an
SEPT entry in the PENDING state. Instead, the TDX module will exit to the
VMM, set the extended exit qualification type to PENDING_EPT_VIOLATION and
set exit qualification bits 6:3 to 0.

Since #VE will not be injected into such TDs, they cannot be notified that
they need to accept a GPA. A TD accessing a private GPA before accepting
it is regarded as an error within the guest.

Detect such guest errors by inspecting the (extended) exit qualification
bits and mark the VM as dead.
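
For illustration only (not part of the patch): a user-space sketch of the
check described above. The type mask (bits 3:0) and the
PENDING_EPT_VIOLATION type value (6) mirror the definitions this patch adds
to tdx_arch.h; the function name is_unexpected_pending_violation() is
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXT_EXIT_QUAL_TYPE_MASK			0xfull	/* bits 3:0 */
#define EXT_EXIT_QUAL_TYPE_PENDING_EPT_VIOLATION	6ull
#define EXIT_QUAL_BITS_6_3			(0xfull << 3)

/*
 * Hypothetical helper: true when the exit looks like an access to a
 * PENDING private page by a TD that cannot receive #VE, i.e. the extended
 * exit qualification type is PENDING_EPT_VIOLATION and exit qualification
 * bits 6:3 are all zero.
 */
static bool is_unexpected_pending_violation(uint64_t ext_exit_qual,
					    uint64_t exit_qual)
{
	if ((ext_exit_qual & EXT_EXIT_QUAL_TYPE_MASK) !=
	    EXT_EXIT_QUAL_TYPE_PENDING_EPT_VIOLATION)
		return false;

	return !(exit_qual & EXIT_QUAL_BITS_6_3);
}
```

Only bits 3:0 of the extended exit qualification and bits 6:3 of the exit
qualification take part in the decision; all other bits are ignored.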
Cc: Xiaoyao Li Cc: Rick Edgecombe Signed-off-by: Yan Zhao Signed-off-by: Binbin Wu --- TDX "the rest" breakout: - New patch --- arch/x86/include/asm/vmx.h | 2 ++ arch/x86/kvm/vmx/tdx.c | 20 +++++++++++++++++++- arch/x86/kvm/vmx/tdx_arch.h | 2 ++ 3 files changed, 23 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index 9298fb9d4bb3..028f3b8db2af 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -585,12 +585,14 @@ enum vm_entry_failure_code { #define EPT_VIOLATION_ACC_WRITE_BIT 1 #define EPT_VIOLATION_ACC_INSTR_BIT 2 #define EPT_VIOLATION_RWX_SHIFT 3 +#define EPT_VIOLATION_EXEC_R3_LIN_BIT 6 #define EPT_VIOLATION_GVA_IS_VALID_BIT 7 #define EPT_VIOLATION_GVA_TRANSLATED_BIT 8 #define EPT_VIOLATION_ACC_READ (1 << EPT_VIOLATION_ACC_READ_BIT) #define EPT_VIOLATION_ACC_WRITE (1 << EPT_VIOLATION_ACC_WRITE_BIT) #define EPT_VIOLATION_ACC_INSTR (1 << EPT_VIOLATION_ACC_INSTR_BIT) #define EPT_VIOLATION_RWX_MASK (VMX_EPT_RWX_MASK << EPT_VIOLATION_RWX_SHIFT) +#define EPT_VIOLATION_EXEC_FOR_RING3_LIN (1 << EPT_VIOLATION_EXEC_R3_LIN_BIT) #define EPT_VIOLATION_GVA_IS_VALID (1 << EPT_VIOLATION_GVA_IS_VALID_BIT) #define EPT_VIOLATION_GVA_TRANSLATED (1 << EPT_VIOLATION_GVA_TRANSLATED_BIT) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index aecf52dda00d..96b05e445837 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1770,11 +1770,29 @@ void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode, __vmx_deliver_posted_interrupt(vcpu, &tdx->pi_desc, vector); } +static inline bool tdx_is_sept_violation_unexpected_pending(struct kvm_vcpu *vcpu) +{ + u64 eeq_type = tdexit_ext_exit_qual(vcpu) & TDX_EXT_EXIT_QUAL_TYPE_MASK; + u64 eq = tdexit_exit_qual(vcpu); + + if (eeq_type != TDX_EXT_EXIT_QUAL_TYPE_PENDING_EPT_VIOLATION) + return false; + + return !(eq & EPT_VIOLATION_RWX_MASK) && !(eq & EPT_VIOLATION_EXEC_FOR_RING3_LIN); +} + static int tdx_handle_ept_violation(struct kvm_vcpu 
*vcpu) { + gpa_t gpa = tdexit_gpa(vcpu); unsigned long exit_qual; - if (vt_is_tdx_private_gpa(vcpu->kvm, tdexit_gpa(vcpu))) { + if (vt_is_tdx_private_gpa(vcpu->kvm, gpa)) { + if (tdx_is_sept_violation_unexpected_pending(vcpu)) { + pr_warn("Guest access before accepting 0x%llx on vCPU %d\n", + gpa, vcpu->vcpu_id); + kvm_vm_dead(vcpu->kvm); + return -EIO; + } /* * Always treat SEPT violations as write faults. Ignore the * EXIT_QUALIFICATION reported by TDX-SEAM for SEPT violations. diff --git a/arch/x86/kvm/vmx/tdx_arch.h b/arch/x86/kvm/vmx/tdx_arch.h index 289728f1611f..2f9e88f497bc 100644 --- a/arch/x86/kvm/vmx/tdx_arch.h +++ b/arch/x86/kvm/vmx/tdx_arch.h @@ -104,6 +104,8 @@ struct tdx_cpuid_value { #define TDX_TD_ATTR_KL BIT_ULL(31) #define TDX_TD_ATTR_PERFMON BIT_ULL(63) +#define TDX_EXT_EXIT_QUAL_TYPE_MASK GENMASK(3, 0) +#define TDX_EXT_EXIT_QUAL_TYPE_PENDING_EPT_VIOLATION 6 /* * TD_PARAMS is provided as an input to TDH_MNG_INIT, the size of which is 1024B. */ From patchwork Tue Dec 10 00:49:30 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Binbin Wu X-Patchwork-Id: 13900573 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.9]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8F5681632FB; Tue, 10 Dec 2024 00:48:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.9 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733791685; cv=none; b=gWXeHOzYyXZH4rYKDJAkBFVSxI/ITnbInMc2zlUiqrtG/a/EkrX0ZWyImJVLxSi9sExHOqTGaVl8F/nyP8NIvg1cwAE7q/SsNbQ/TiDJ0xlVIf//SZNnUcdYjU7vTBZRoxuFWMA+l9nejbpZyMi93Jg20j9P8yOO4oKi1mr7mMI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733791685; c=relaxed/simple; bh=27fiCt6h2LdRcruWOgtZdm64bm9I78m5Xkchkn9Mrfs=; 
h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=VkHl6HbTqKYtvHlrNv8DxCFk0IYXX4J5IGlt+crgsZzTLHPBZJk8BNMKd3vkY9XCWPuVJF5obJ3g/8jMIxVGnj5hBwxNvSsKuhiA9zoizjDLefT8/GaEmL8m5fh0bgJ5AnxSC7nkpvTv5pvO4wosRAWxzijTpMXbnAOpnf69p7A= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=HxT5aiOP; arc=none smtp.client-ip=192.198.163.9 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="HxT5aiOP" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1733791683; x=1765327683; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=27fiCt6h2LdRcruWOgtZdm64bm9I78m5Xkchkn9Mrfs=; b=HxT5aiOPIf2E0dEankFLE50yXjasP01G0orz9n0NNIt2Vgs8dHeyiiDu BI0D7fm+EoUR0QHhqlj4WVKIBSMHPtT4xcoCILPauRwGNjPMrPoyOP/Lu 6k3I3sYGj6eJCRD8X6nOQ1WM9WJYFZMwP+d9Npw+nAxakTXwtLdQ3C3Pw cWZFeZOWYCslzlLdXUyDlwNrSCx/ZxIsA2rrYlNklryEFB8EEnYPAEHbW JP/XX6PgSQt+e7D/KVBFmp8oMf6gCNjbHiB12rTB1aYBOUgZoBV7yKnUn 4jE9nH6CiBk09pihN7eyHcJAYctr3+nl5S2a+vbugyp6t3OiNY6z0m0fy Q==; X-CSE-ConnectionGUID: sBnmWoTxRdahhADLrpGBYA== X-CSE-MsgGUID: 09idXb/nQ4SMp8Ga4yPnHg== X-IronPort-AV: E=McAfee;i="6700,10204,11281"; a="44793701" X-IronPort-AV: E=Sophos;i="6.12,220,1728975600"; d="scan'208";a="44793701" Received: from orviesa008.jf.intel.com ([10.64.159.148]) by fmvoesa103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Dec 2024 16:48:03 -0800 X-CSE-ConnectionGUID: F3peKpcIQp+7gbQdnkxqUQ== X-CSE-MsgGUID: mGgRAXP9SZOBR59jaeBLIA== X-ExtLoop1: 1 X-IronPort-AV: 
E=Sophos;i="6.12,220,1728975600"; d="scan'208";a="96033020"
Received: from litbin-desktop.sh.intel.com ([10.239.156.93])
 by orviesa008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 09 Dec 2024 16:47:59 -0800
From: Binbin Wu
To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org
Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com,
 adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com,
 tony.lindgren@linux.intel.com, isaku.yamahata@intel.com,
 yan.y.zhao@intel.com, chao.gao@intel.com, linux-kernel@vger.kernel.org,
 binbin.wu@linux.intel.com
Subject: [PATCH 04/18] KVM: TDX: Handle TDX PV CPUID hypercall
Date: Tue, 10 Dec 2024 08:49:30 +0800
Message-ID: <20241210004946.3718496-5-binbin.wu@linux.intel.com>
X-Mailer: git-send-email 2.46.0
In-Reply-To: <20241210004946.3718496-1-binbin.wu@linux.intel.com>
References: <20241210004946.3718496-1-binbin.wu@linux.intel.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0

From: Isaku Yamahata

Handle the TDX PV CPUID hypercall for the CPUIDs virtualized by the VMM
according to the TDX Guest Host Communication Interface (GHCI).

For TDX, most CPUID leaf/sub-leaf combinations are virtualized by the TDX
module while some trigger #VE. On #VE, the TDX guest can issue
TDG.VP.VMCALL with the CPUID leaf (which has the same value as
EXIT_REASON_CPUID) to ask the VMM to emulate the CPUID operation.

Wire up the TDX PV CPUID hypercall to the KVM backend function.

Signed-off-by: Isaku Yamahata
[binbin: rewrite changelog]
Signed-off-by: Binbin Wu
---
TDX "the rest" breakout:
- Rewrite changelog.
- Use TDVMCALL_STATUS prefix for TDX call status codes (Binbin) --- arch/x86/kvm/vmx/tdx.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 96b05e445837..62dbb47ead21 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1274,6 +1274,26 @@ static int tdx_report_fatal_error(struct kvm_vcpu *vcpu) return 0; } +static int tdx_emulate_cpuid(struct kvm_vcpu *vcpu) +{ + u32 eax, ebx, ecx, edx; + + /* EAX and ECX for cpuid is stored in R12 and R13. */ + eax = tdvmcall_a0_read(vcpu); + ecx = tdvmcall_a1_read(vcpu); + + kvm_cpuid(vcpu, &eax, &ebx, &ecx, &edx, false); + + tdvmcall_a0_write(vcpu, eax); + tdvmcall_a1_write(vcpu, ebx); + tdvmcall_a2_write(vcpu, ecx); + tdvmcall_a3_write(vcpu, edx); + + tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_SUCCESS); + + return 1; +} + static int tdx_complete_pio_out(struct kvm_vcpu *vcpu) { vcpu->arch.pio.count = 0; @@ -1455,6 +1475,8 @@ static int handle_tdvmcall(struct kvm_vcpu *vcpu) return tdx_map_gpa(vcpu); case TDVMCALL_REPORT_FATAL_ERROR: return tdx_report_fatal_error(vcpu); + case EXIT_REASON_CPUID: + return tdx_emulate_cpuid(vcpu); case EXIT_REASON_IO_INSTRUCTION: return tdx_emulate_io(vcpu); case EXIT_REASON_EPT_VIOLATION: From patchwork Tue Dec 10 00:49:31 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Binbin Wu X-Patchwork-Id: 13900574 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.9]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3503119D897; Tue, 10 Dec 2024 00:48:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.9 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733791689; cv=none; 
b=DWjdODusal1gBR6ZkJiMRAs+8YBDVr9HvoMsMvonRNo8/pDKuxIMxcTDRnNVROWR+7h/k5NE8m6or2crMG8W2GH9RIjyn/XKDIFhILvYDkiIhH4rVFK6oWoAVMEMqnsBqiHpKX1+QU1IhAoZKfqqDnHb6rOY2VvKhzZOw/Z+4VE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733791689; c=relaxed/simple; bh=PSSo+9eYAoqziDacJv2YmgxniM4s+DtQkJOYkubWaHY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=tx3eJa6LzEJIIuGw7AtCVNFtJD/MPeK2qVUBJAdg0udr0Wyo1Mpd2SJPQulN0V77JV1iHsesQneLBZi+WDtzW1azHVUiff/LDtsRl35U60j57P/kyVuGtd/v6dDJbzQbD7tdEYkM0VoJpUb+gr8qkWomn2fNpHoQjl4Ut2L+x8E= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=gBTz32e4; arc=none smtp.client-ip=192.198.163.9 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="gBTz32e4" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1733791687; x=1765327687; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=PSSo+9eYAoqziDacJv2YmgxniM4s+DtQkJOYkubWaHY=; b=gBTz32e4be436OFxfNbiU/u877gn2xf6/n5BiXaQCQnDCkby2qSI5X41 LOjhWE3XbP6q+BjmtoUEIRHygTyUFsRSmPDGnB1B6HmoazmEgyC/22h/r 5nXilNLIPdYYc5TaYESR4wAO7Ly4ew2dFppAtAjVxkyPfrqCLh7ZWk9V3 qRCeTjFEfsInIf/eE3PuoCAn5rwaiRSFqd6MoXf9qIIRjXpXnmk/5s/Vc Gru6psoUgk8Gr9BPQ2QLYcBPBM4dBQsJ3XSRVxg2mYsmG9EMRyFK5oJeQ 9OMbE4BDD9rbCu8tRLw6PFrBKQzkLkTC5aOdZvH7wFsKBp3CqyP6vuuVw g==; X-CSE-ConnectionGUID: hK8d6tUdSqyI4bhI5og7nA== X-CSE-MsgGUID: Ati0TWEqRpCVJ1X1LaVdxQ== X-IronPort-AV: E=McAfee;i="6700,10204,11281"; a="44793704" X-IronPort-AV: 
E=Sophos;i="6.12,220,1728975600"; d="scan'208";a="44793704"
Received: from orviesa008.jf.intel.com ([10.64.159.148])
 by fmvoesa103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 09 Dec 2024 16:48:06 -0800
X-CSE-ConnectionGUID: WrPuodzPQa28/GpCTSmOmg==
X-CSE-MsgGUID: WTfxpc9mRKS6Fl65lCrX+Q==
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="6.12,220,1728975600"; d="scan'208";a="96033023"
Received: from litbin-desktop.sh.intel.com ([10.239.156.93])
 by orviesa008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 09 Dec 2024 16:48:03 -0800
From: Binbin Wu
To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org
Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com,
 adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com,
 tony.lindgren@linux.intel.com, isaku.yamahata@intel.com,
 yan.y.zhao@intel.com, chao.gao@intel.com, linux-kernel@vger.kernel.org,
 binbin.wu@linux.intel.com
Subject: [PATCH 05/18] KVM: TDX: Handle TDX PV HLT hypercall
Date: Tue, 10 Dec 2024 08:49:31 +0800
Message-ID: <20241210004946.3718496-6-binbin.wu@linux.intel.com>
X-Mailer: git-send-email 2.46.0
In-Reply-To: <20241210004946.3718496-1-binbin.wu@linux.intel.com>
References: <20241210004946.3718496-1-binbin.wu@linux.intel.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0

From: Isaku Yamahata

Handle the TDX PV HLT hypercall and the interrupt status due to it.

TDX guest state is protected, so KVM can't get the interrupt status of a
TDX guest; it assumes interrupts are always allowed unless the TDX guest
calls TDVMCALL with HLT, which passes the interrupt-blocked flag.

Update vt_interrupt_allowed() for TDX based on the interrupt-blocked flag
passed by the HLT TDVMCALL. Do not wake up the TD vCPU for VT-d PI if
interrupts are blocked.

For NMIs, KVM cannot determine the NMI blocking status for TDX guests, so
KVM always assumes NMIs are not blocked.
In the unlikely scenario where a guest invokes the PV HLT hypercall within an NMI handler, this could result in a spurious wakeup. The guest should implement the PV HLT hypercall within a loop if it truly requires no interruptions, since NMI could be unblocked by an IRET due to an exception occurring before the PV HLT is executed in the NMI handler. Suggested-by: Sean Christopherson Signed-off-by: Isaku Yamahata Co-developed-by: Binbin Wu Signed-off-by: Binbin Wu --- TDX "the rest" breakout: - Update the changelog. - Remove the interrupt_disabled_hlt field (Sean) https://lore.kernel.org/kvm/Zg1seIaTmM94IyR8@google.com/ - Move the logic of interrupt status to vt_interrupt_allowed() (Chao) https://lore.kernel.org/kvm/ZhIX7K0WK+gYtcan@chao-email/ - Add suggested-by tag. - Use tdx_check_exit_reason() - Use TDVMCALL_STATUS prefix for TDX call status codes (Binbin) v19: - move tdvps_state_non_arch_check() to this patch v18: - drop buggy_hlt_workaround and use TDH.VP.RD(TD_VCPU_STATE_DETAILS) --- arch/x86/kvm/vmx/main.c | 2 +- arch/x86/kvm/vmx/posted_intr.c | 3 ++- arch/x86/kvm/vmx/tdx.c | 32 +++++++++++++++++++++++++++++++- arch/x86/kvm/vmx/tdx.h | 6 ++++++ arch/x86/kvm/vmx/tdx_arch.h | 11 +++++++++++ 5 files changed, 51 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 305425b19cb5..bfe848083eb9 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -424,7 +424,7 @@ static void vt_cancel_injection(struct kvm_vcpu *vcpu) static int vt_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection) { if (is_td_vcpu(vcpu)) - return true; + return tdx_interrupt_allowed(vcpu); return vmx_interrupt_allowed(vcpu, for_injection); } diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c index 87a6964c662a..1ce9b9e93a26 100644 --- a/arch/x86/kvm/vmx/posted_intr.c +++ b/arch/x86/kvm/vmx/posted_intr.c @@ -223,7 +223,8 @@ void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu) return; if 
(kvm_vcpu_is_blocking(vcpu) && - (is_td_vcpu(vcpu) || !vmx_interrupt_blocked(vcpu))) + ((is_td_vcpu(vcpu) && tdx_interrupt_allowed(vcpu)) || + (!is_td_vcpu(vcpu) && !vmx_interrupt_blocked(vcpu)))) pi_enable_wakeup_handler(vcpu); /* diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 62dbb47ead21..2b64652a0d05 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -771,9 +771,31 @@ void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) local_irq_enable(); } +bool tdx_interrupt_allowed(struct kvm_vcpu *vcpu) +{ + /* + * KVM can't get the interrupt status of TDX guest and it assumes + * interrupt is always allowed unless TDX guest calls TDVMCALL with HLT, + * which passes the interrupt blocked flag. + */ + if (!tdx_check_exit_reason(vcpu, EXIT_REASON_TDCALL) || + tdvmcall_exit_type(vcpu) || tdvmcall_leaf(vcpu) != EXIT_REASON_HLT) + return true; + + return !tdvmcall_a0_read(vcpu); +} + bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu) { - return pi_has_pending_interrupt(vcpu); + u64 vcpu_state_details; + + if (pi_has_pending_interrupt(vcpu)) + return true; + + vcpu_state_details = + td_state_non_arch_read64(to_tdx(vcpu), TD_VCPU_STATE_DETAILS_NON_ARCH); + + return tdx_vcpu_state_details_intr_pending(vcpu_state_details); } /* @@ -1294,6 +1316,12 @@ static int tdx_emulate_cpuid(struct kvm_vcpu *vcpu) return 1; } +static int tdx_emulate_hlt(struct kvm_vcpu *vcpu) +{ + tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_SUCCESS); + return kvm_emulate_halt_noskip(vcpu); +} + static int tdx_complete_pio_out(struct kvm_vcpu *vcpu) { vcpu->arch.pio.count = 0; @@ -1477,6 +1505,8 @@ static int handle_tdvmcall(struct kvm_vcpu *vcpu) return tdx_report_fatal_error(vcpu); case EXIT_REASON_CPUID: return tdx_emulate_cpuid(vcpu); + case EXIT_REASON_HLT: + return tdx_emulate_hlt(vcpu); case EXIT_REASON_IO_INSTRUCTION: return tdx_emulate_io(vcpu); case EXIT_REASON_EPT_VIOLATION: diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index 
b553dd9b0b06..008180c0c30f 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -152,6 +152,7 @@ static __always_inline void tdvps_vmcs_check(u32 field, u8 bits) } static __always_inline void tdvps_management_check(u64 field, u8 bits) {} +static __always_inline void tdvps_state_non_arch_check(u64 field, u8 bits) {} #define TDX_BUILD_TDVPS_ACCESSORS(bits, uclass, lclass) \ static __always_inline u##bits td_##lclass##_read##bits(struct vcpu_tdx *tdx, \ @@ -199,11 +200,15 @@ static __always_inline void td_##lclass##_clearbit##bits(struct vcpu_tdx *tdx, \ tdh_vp_wr_failed(tdx, #uclass, " &= ~", field, bit, err);\ } + +bool tdx_interrupt_allowed(struct kvm_vcpu *vcpu); + TDX_BUILD_TDVPS_ACCESSORS(16, VMCS, vmcs); TDX_BUILD_TDVPS_ACCESSORS(32, VMCS, vmcs); TDX_BUILD_TDVPS_ACCESSORS(64, VMCS, vmcs); TDX_BUILD_TDVPS_ACCESSORS(8, MANAGEMENT, management); +TDX_BUILD_TDVPS_ACCESSORS(64, STATE_NON_ARCH, state_non_arch); #else static inline void tdx_bringup(void) {} @@ -223,6 +228,7 @@ static inline bool is_td(struct kvm *kvm) { return false; } static inline bool is_td_vcpu(struct kvm_vcpu *vcpu) { return false; } static inline struct kvm_tdx *to_kvm_tdx(struct kvm *kvm) { return NULL; } static inline struct vcpu_tdx *to_tdx(struct kvm_vcpu *vcpu) { return NULL; } +static inline bool tdx_interrupt_allowed(struct kvm_vcpu *vcpu) { return false; } #endif diff --git a/arch/x86/kvm/vmx/tdx_arch.h b/arch/x86/kvm/vmx/tdx_arch.h index 2f9e88f497bc..861c0f649b69 100644 --- a/arch/x86/kvm/vmx/tdx_arch.h +++ b/arch/x86/kvm/vmx/tdx_arch.h @@ -71,6 +71,17 @@ enum tdx_tdcs_execution_control { TD_TDCS_EXEC_TSC_OFFSET = 10, }; +enum tdx_vcpu_guest_other_state { + TD_VCPU_STATE_DETAILS_NON_ARCH = 0x100, +}; + +#define TDX_VCPU_STATE_DETAILS_INTR_PENDING BIT_ULL(0) + +static inline bool tdx_vcpu_state_details_intr_pending(u64 vcpu_state_details) +{ + return !!(vcpu_state_details & TDX_VCPU_STATE_DETAILS_INTR_PENDING); +} + /* @field is any of enum tdx_tdcs_execution_control */ 
#define TDCS_EXEC(field) BUILD_TDX_FIELD(TD_CLASS_EXECUTION_CONTROLS, (field)) From patchwork Tue Dec 10 00:49:32 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Binbin Wu X-Patchwork-Id: 13900575 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.9]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CEB7919DFA2; Tue, 10 Dec 2024 00:48:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.9 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733791692; cv=none; b=QwNRQJ2JWHNUYF/Vbtq8QybHWNmVcK9fwfcW7N2h6m1eQ/S86khYhgMowxcNQiNiCxBaMLMjGXl0/fS+0FiBuyID8kk2w1gkA3L+VnbcnLqUVpM9EP+Ndvby/5PmB2W0ud/ADMWZT3jWkFpdHmdjecrQDKC+CLTTNIMecsogP44= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733791692; c=relaxed/simple; bh=6qbe1nTxwtwr/1P8atYc9YKT0eYcr+2D+LcplNIsU2E=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=SDAoFKExiEmIxV9wFvH9EGhSfgEn9+/oexOiJ/IT5A4HYdmf3WVlBm1IRZ93nx+lSdqjYX+bylUP8ZcjnX3DVTTFEvVW0bMTtQn3El0r8E1eFbukdfrCxKkLfAqvlztT1O7qJiWLfl3aw0YTEP0kfWyDVqllEErBxBjwnMPrYdw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=VD+OlRCV; arc=none smtp.client-ip=192.198.163.9 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="VD+OlRCV" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; 
i=@intel.com; q=dns/txt; s=Intel; t=1733791690; x=1765327690; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=6qbe1nTxwtwr/1P8atYc9YKT0eYcr+2D+LcplNIsU2E=; b=VD+OlRCV7NwI8jHtgrxGrI0S/LMQEeicmFqJkbk1tAUi3Hst6eaE7896 Ne7Ptn3xqsYKW7scn+QyZ/BE/o3CQtcBYtoGswsLsSAD31HxLzR3BOCk7 8xbr0d6ISYlrN2SLJfqoNZu7FYoWJ2U0a696I92zF3f5sayYqS80Inq8f hTwdvsZoXLmfRkoNYS3o8+fCfNdtaV5/gfcBDtvwAod4LdPB6BR75YmK3 p9wxySPx6ai6E0uVkojJEUzSRjcaL1uTjTAy1v77370BbN1JDdmsHFrDz gLP5CWxR82yJ9C4r0Uy8NGjaQGEOqRLPAE5bhkvFeUailxJmQ45vR21tu Q==; X-CSE-ConnectionGUID: 8pbkv7qgSyOlC/Nk+8KcaQ== X-CSE-MsgGUID: BVqSpzp9RqOyUSS/r4J9Bw== X-IronPort-AV: E=McAfee;i="6700,10204,11281"; a="44793708" X-IronPort-AV: E=Sophos;i="6.12,220,1728975600"; d="scan'208";a="44793708" Received: from orviesa008.jf.intel.com ([10.64.159.148]) by fmvoesa103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Dec 2024 16:48:10 -0800 X-CSE-ConnectionGUID: 8XJbrh5uQ/qG/mNoEAGEUw== X-CSE-MsgGUID: HK1mZo+6QM6tCeW/m2xq/g== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.12,220,1728975600"; d="scan'208";a="96033028" Received: from litbin-desktop.sh.intel.com ([10.239.156.93]) by orviesa008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Dec 2024 16:48:07 -0800 From: Binbin Wu To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@linux.intel.com, isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com Subject: [PATCH 06/18] KVM: x86: Move KVM_MAX_MCE_BANKS to header file Date: Tue, 10 Dec 2024 08:49:32 +0800 Message-ID: <20241210004946.3718496-7-binbin.wu@linux.intel.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20241210004946.3718496-1-binbin.wu@linux.intel.com> References: <20241210004946.3718496-1-binbin.wu@linux.intel.com> Precedence: bulk 
X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata Move KVM_MAX_MCE_BANKS to header file so that it can be used for TDX in a future patch. Signed-off-by: Isaku Yamahata [binbin: split into new patch] Signed-off-by: Binbin Wu --- TDX "the rest" breakout: - New patch. Split from "KVM: TDX: Implement callbacks for MSR operations for TDX". (Sean) https://lore.kernel.org/kvm/Zg1yPIV6cVJrwGxX@google.com/ --- arch/x86/kvm/x86.c | 1 - arch/x86/kvm/x86.h | 2 ++ 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 2b4bd56e9fb4..5eacdb5b9737 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -90,7 +90,6 @@ #include "trace.h" #define MAX_IO_MSRS 256 -#define KVM_MAX_MCE_BANKS 32 /* * Note, kvm_caps fields should *never* have default values, all fields must be diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index d69074a05779..0a1946368439 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -9,6 +9,8 @@ #include "kvm_cache_regs.h" #include "kvm_emulate.h" +#define KVM_MAX_MCE_BANKS 32 + struct kvm_caps { /* control of guest tsc rate supported? 
*/ bool has_tsc_control; From patchwork Tue Dec 10 00:49:33 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Binbin Wu X-Patchwork-Id: 13900576 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.9]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BA14A1C6F55; Tue, 10 Dec 2024 00:48:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.9 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733791696; cv=none; b=B/XnDYieZhdjA7vLkiXhvQjSQovJT75jMUulAU1BCAKiNNy1XzDYNnfAessabTILt7SNS3lOEnSSu7KFFV8fwxRBqzW55bRhY9gjEDaffXW7RUyEkOHlcQGWo7fZUEP4Tcps6cyu7OOEkAaKeEM32cd09BtuojBHtJfoeBiLiFM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733791696; c=relaxed/simple; bh=TLv9MNhfqNUbqTDbMVPcnS18Mpe8c6Qa8beuTcU+HtM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=KFbZzTnAA6WkwG1ATxXp9r4b4hBj9BMCUUlPoBrWf5ERtJthe78iRzJJvf9qLZKS7gzjjqT5t9gnZbRk6oNpAnxhD1uSWGEmxWveSMigvTVGNrJJUqQackuuriXooLkLfApQ1PExj/lZ9Qm+2CJHQ/4zArAWtRXoNcyI9nf72Lo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=AwNc39OY; arc=none smtp.client-ip=192.198.163.9 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="AwNc39OY" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1733791694; x=1765327694; 
h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=TLv9MNhfqNUbqTDbMVPcnS18Mpe8c6Qa8beuTcU+HtM=; b=AwNc39OYPMWSGJI77HD/M3QAj3UCfiFzL4tOUv7p7hCN6gyc/fUW6C9G buADWmkZ2sFRqTQ+4GD5g3lclAi1jU4jmD0YDLChhhokIDzMZc2j0eUJg 510AyNSlQ35N1US2WY9CfEh8ADyD7zXLFJiQ7rYfY9t0MSVifCAPSAg3e L0xNuTgJ0s77qKhhH9KN+F+wIDJ38UEKWoLGVhXFqOOV+hJnJYmufSP70 c07tR+uu+AQc28ophgeKeBawcyaI8yTvo98TnEt/JtPjy6wY/weWADrU8 L80b/dxOacOOwnPSCHKsbsn22Ep+rGXjyH+cCkc0K2Rm7DMs9QmQbqmxe g==; X-CSE-ConnectionGUID: MTek+1kIQEOwZVNkU4Nv2w== X-CSE-MsgGUID: GuzvO0KqTd+8E/4LCPLNCQ== X-IronPort-AV: E=McAfee;i="6700,10204,11281"; a="44793711" X-IronPort-AV: E=Sophos;i="6.12,220,1728975600"; d="scan'208";a="44793711" Received: from orviesa008.jf.intel.com ([10.64.159.148]) by fmvoesa103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Dec 2024 16:48:14 -0800 X-CSE-ConnectionGUID: dA49xAljT7qJ0/izZPr+Rw== X-CSE-MsgGUID: 2bhwuea6QLeh676yhYt3uQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.12,220,1728975600"; d="scan'208";a="96033039" Received: from litbin-desktop.sh.intel.com ([10.239.156.93]) by orviesa008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Dec 2024 16:48:10 -0800 From: Binbin Wu To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@linux.intel.com, isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com Subject: [PATCH 07/18] KVM: TDX: Implement callbacks for MSR operations Date: Tue, 10 Dec 2024 08:49:33 +0800 Message-ID: <20241210004946.3718496-8-binbin.wu@linux.intel.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20241210004946.3718496-1-binbin.wu@linux.intel.com> References: <20241210004946.3718496-1-binbin.wu@linux.intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: 
List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata

Add functions to implement the MSR-related callbacks .set_msr(), .get_msr(), and .has_emulated_msr(), in preparation for handling hypercalls from a TDX guest for para-virtualized RDMSR and WRMSR. Ignore KVM_REQ_MSR_FILTER_CHANGED for TDX.

There are three classes of MSR virtualization for TDX.
- Non-configurable: The TDX module directly virtualizes the MSR. The VMM can't configure it; the value set by KVM_SET_MSRS is ignored.
- Configurable: The TDX module directly virtualizes the MSR. The VMM can configure it at VM creation time. The value set by KVM_SET_MSRS is used.
- #VE case: The TDX guest issues TDG.VP.VMCALL and the VMM handles the MSR hypercall. The value set by KVM_SET_MSRS is used.

For the MSRs belonging to the #VE case, the TDX module injects #VE into the TDX guest upon RDMSR or WRMSR. The exact list of such MSRs is defined in the TDX Module ABI Spec.

Upon #VE, the TDX guest may call TDG.VP.VMCALL, which is defined in the GHCI (Guest-Host Communication Interface), so that the host VMM (e.g. KVM) can virtualize the MSRs.

TDX doesn't allow the VMM to configure interception of MSR accesses. Ignore KVM_REQ_MSR_FILTER_CHANGED for TDX guests. If userspace has set any MSR filters, they will be applied when handling TDG.VP.VMCALL in a later patch.

Suggested-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
Co-developed-by: Binbin Wu
Signed-off-by: Binbin Wu
Reviewed-by: Paolo Bonzini
---
TDX "the rest" breakout:
- Renamed from "KVM: TDX: Implement callbacks for MSR operations for TDX" to "KVM: TDX: Implement callbacks for MSR operations"
- Update changelog.
- Remove the @write parameter of tdx_has_emulated_msr(), check whether the MSR is read-only or not in tdx_set_msr(). (Sean) https://lore.kernel.org/kvm/Zg1yPIV6cVJrwGxX@google.com/
- Change the code to align with the pattern "make the happy path the not-taken path". (Sean)
- Open code the handling for an emulated KVM MSR MSR_KVM_POLL_CONTROL, and let others go through the default statement.
(Sean) - Split macro KVM_MAX_MCE_BANKS move to a separate patch. (Sean) - Add comments in vt_msr_filter_changed(). - Add Suggested-by tag. --- arch/x86/kvm/vmx/main.c | 50 +++++++++++++++++++++++++--- arch/x86/kvm/vmx/tdx.c | 67 ++++++++++++++++++++++++++++++++++++++ arch/x86/kvm/vmx/x86_ops.h | 6 ++++ 3 files changed, 119 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index bfe848083eb9..a224e9d32701 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -191,6 +191,48 @@ static void vt_handle_exit_irqoff(struct kvm_vcpu *vcpu) vmx_handle_exit_irqoff(vcpu); } +static int vt_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) +{ + if (unlikely(is_td_vcpu(vcpu))) + return tdx_set_msr(vcpu, msr_info); + + return vmx_set_msr(vcpu, msr_info); +} + +/* + * The kvm parameter can be NULL (module initialization, or invocation before + * VM creation). Be sure to check the kvm parameter before using it. + */ +static bool vt_has_emulated_msr(struct kvm *kvm, u32 index) +{ + if (kvm && is_td(kvm)) + return tdx_has_emulated_msr(index); + + return vmx_has_emulated_msr(kvm, index); +} + +static int vt_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) +{ + if (unlikely(is_td_vcpu(vcpu))) + return tdx_get_msr(vcpu, msr_info); + + return vmx_get_msr(vcpu, msr_info); +} + +static void vt_msr_filter_changed(struct kvm_vcpu *vcpu) +{ + /* + * TDX doesn't allow VMM to configure interception of MSR accesses. + * TDX guest requests MSR accesses by calling TDVMCALL. The MSR + * filters will be applied when handling the TDVMCALL for RDMSR/WRMSR + * if the userspace has set any. 
+ */ + if (is_td_vcpu(vcpu)) + return; + + vmx_msr_filter_changed(vcpu); +} + #ifdef CONFIG_KVM_SMM static int vt_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection) { @@ -510,7 +552,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .disable_virtualization_cpu = vt_disable_virtualization_cpu, .emergency_disable_virtualization_cpu = vmx_emergency_disable_virtualization_cpu, - .has_emulated_msr = vmx_has_emulated_msr, + .has_emulated_msr = vt_has_emulated_msr, .vm_size = sizeof(struct kvm_vmx), @@ -529,8 +571,8 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .update_exception_bitmap = vmx_update_exception_bitmap, .get_feature_msr = vmx_get_feature_msr, - .get_msr = vmx_get_msr, - .set_msr = vmx_set_msr, + .get_msr = vt_get_msr, + .set_msr = vt_set_msr, .get_segment_base = vmx_get_segment_base, .get_segment = vmx_get_segment, .set_segment = vmx_set_segment, @@ -637,7 +679,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .apic_init_signal_blocked = vt_apic_init_signal_blocked, .migrate_timers = vmx_migrate_timers, - .msr_filter_changed = vmx_msr_filter_changed, + .msr_filter_changed = vt_msr_filter_changed, .complete_emulated_msr = kvm_complete_insn_gp, .vcpu_deliver_sipi_vector = kvm_vcpu_deliver_sipi_vector, diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 2b64652a0d05..770e3b847cd6 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1984,6 +1984,73 @@ void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, *error_code = 0; } +bool tdx_has_emulated_msr(u32 index) +{ + switch (index) { + case MSR_IA32_UCODE_REV: + case MSR_IA32_ARCH_CAPABILITIES: + case MSR_IA32_POWER_CTL: + case MSR_IA32_CR_PAT: + case MSR_IA32_TSC_DEADLINE: + case MSR_IA32_MISC_ENABLE: + case MSR_PLATFORM_INFO: + case MSR_MISC_FEATURES_ENABLES: + case MSR_IA32_APICBASE: + case MSR_EFER: + case MSR_IA32_MCG_CAP: + case MSR_IA32_MCG_STATUS: + case MSR_IA32_MCG_CTL: + case MSR_IA32_MCG_EXT_CTL: + case MSR_IA32_MC0_CTL ... 
MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1: + case MSR_IA32_MC0_CTL2 ... MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) - 1: + /* MSR_IA32_MCx_{CTL, STATUS, ADDR, MISC, CTL2} */ + case MSR_KVM_POLL_CONTROL: + return true; + case APIC_BASE_MSR ... APIC_BASE_MSR + 0xff: + /* + * x2APIC registers that are virtualized by the CPU can't be + * emulated, KVM doesn't have access to the virtual APIC page. + */ + switch (index) { + case X2APIC_MSR(APIC_TASKPRI): + case X2APIC_MSR(APIC_PROCPRI): + case X2APIC_MSR(APIC_EOI): + case X2APIC_MSR(APIC_ISR) ... X2APIC_MSR(APIC_ISR + APIC_ISR_NR): + case X2APIC_MSR(APIC_TMR) ... X2APIC_MSR(APIC_TMR + APIC_ISR_NR): + case X2APIC_MSR(APIC_IRR) ... X2APIC_MSR(APIC_IRR + APIC_ISR_NR): + return false; + default: + return true; + } + default: + return false; + } +} + +static bool tdx_is_read_only_msr(u32 index) +{ + return index == MSR_IA32_APICBASE || index == MSR_EFER; +} + +int tdx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) +{ + if (!tdx_has_emulated_msr(msr->index)) + return 1; + + return kvm_get_msr_common(vcpu, msr); +} + +int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) +{ + if (tdx_is_read_only_msr(msr->index)) + return 1; + + if (!tdx_has_emulated_msr(msr->index)) + return 1; + + return kvm_set_msr_common(vcpu, msr); +} + static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd) { const struct tdx_sys_info_td_conf *td_conf = &tdx_sysinfo->td_conf; diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index 28dbef77700c..7fb1bbf12b39 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -142,6 +142,9 @@ void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode, void tdx_inject_nmi(struct kvm_vcpu *vcpu); void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code); +bool tdx_has_emulated_msr(u32 index); +int tdx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr); +int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_data 
*msr); int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp); @@ -186,6 +189,9 @@ static inline void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mo static inline void tdx_inject_nmi(struct kvm_vcpu *vcpu) {} static inline void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code) {} +static inline bool tdx_has_emulated_msr(u32 index) { return false; } +static inline int tdx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) { return 1; } +static inline int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) { return 1; } static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; } From patchwork Tue Dec 10 00:49:34 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Binbin Wu X-Patchwork-Id: 13900577 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.9]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6CB3A1CEAC0; Tue, 10 Dec 2024 00:48:18 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.9 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733791700; cv=none; b=IOPGw6svb/OWf7nO7nUPRZxVTVfpt/bvZ+qpGbbJNo+heN44mBpUq9QMJnazXZXQX7iBPh2vP/V6fwdLuPw3oryJ0rGnTg9BrlZywkbUN1vzlQE+jVgKfa00XxXYJW3bM/LVK7YBaXOk1uQELkevKTorT5Bd7qOHduFmHSXJG5k= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733791700; c=relaxed/simple; bh=gBLPzmj8cVe3Ps1IqTTp5ny47gy8z17oWB2zry3ynkY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=mRF20KX+FE2sDgYYQwQeQtt6tnjAOe1UT+HC34Vem7rUO5angNJyONM1Grrhs6gQ/bBsraS4YCJ9XufDhQhsMK+CCwkAFFAyT3Kc6RGH5nOscJbEAZUeyidtpo5vJVx+NDZ4CaUgCkMfh88KLPR3cQUl+amAsndaEKM4GiTfoJg= ARC-Authentication-Results: i=1; 
smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=Z30iF2uE; arc=none smtp.client-ip=192.198.163.9 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="Z30iF2uE" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1733791698; x=1765327698; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=gBLPzmj8cVe3Ps1IqTTp5ny47gy8z17oWB2zry3ynkY=; b=Z30iF2uErzbo94PF7S4RXEgW6wcz/SXrbWDOTB3TUW7g4f99PtEMVmXH fQcduoTeaRrnpK3U5cfMgrytTwhNQbN2zUM3xtAebZHWtxDOi9SoH+xmW 17xWe1njF/JSgmq3YBaOTUfmJWfwf73eG++EQAU1D/CJ7t7VHmkZrvEZ5 PASeG4UX94w5z3PrQ4Ou/yqBngl95tQ9a9zHvu4jaDLqEhqs+dWVicP03 GBNQKUAw6y5PRCIAxDo9ihGW9eUm5x9l5LQpMsjSPG+143OxtfEVWoaBd ejRMBltFoqSJp4GMHWom5vB1VeGb0X/jU6swGMD29qaFQaC1GQp/1e3YY w==; X-CSE-ConnectionGUID: XuUGAJHKSCS+pZP2dTswgQ== X-CSE-MsgGUID: Poj29lamS8CAPUTWNxpYcA== X-IronPort-AV: E=McAfee;i="6700,10204,11281"; a="44793714" X-IronPort-AV: E=Sophos;i="6.12,220,1728975600"; d="scan'208";a="44793714" Received: from orviesa008.jf.intel.com ([10.64.159.148]) by fmvoesa103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Dec 2024 16:48:18 -0800 X-CSE-ConnectionGUID: pQULy6oqR7mRScT//9vZSw== X-CSE-MsgGUID: L4vpe74tQLqkV83G5LtrxA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.12,220,1728975600"; d="scan'208";a="96033046" Received: from litbin-desktop.sh.intel.com ([10.239.156.93]) by orviesa008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Dec 2024 16:48:14 -0800 From: Binbin Wu To: pbonzini@redhat.com, seanjc@google.com, 
kvm@vger.kernel.org Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@linux.intel.com, isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com Subject: [PATCH 08/18] KVM: TDX: Handle TDX PV rdmsr/wrmsr hypercall Date: Tue, 10 Dec 2024 08:49:34 +0800 Message-ID: <20241210004946.3718496-9-binbin.wu@linux.intel.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20241210004946.3718496-1-binbin.wu@linux.intel.com> References: <20241210004946.3718496-1-binbin.wu@linux.intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata Wire up TDX PV rdmsr/wrmsr hypercall to the KVM backend function. Signed-off-by: Isaku Yamahata Signed-off-by: Binbin Wu Reviewed-by: Paolo Bonzini --- TDX "the rest" breakout: - Use TDVMCALL_STATUS prefix for TDX call status codes (Binbin) --- arch/x86/kvm/vmx/tdx.c | 39 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 39 insertions(+) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 770e3b847cd6..d5343e2ba509 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1493,6 +1493,41 @@ static int tdx_emulate_mmio(struct kvm_vcpu *vcpu) return 1; } +static int tdx_emulate_rdmsr(struct kvm_vcpu *vcpu) +{ + u32 index = tdvmcall_a0_read(vcpu); + u64 data; + + if (!kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_READ) || + kvm_get_msr(vcpu, index, &data)) { + trace_kvm_msr_read_ex(index); + tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_INVALID_OPERAND); + return 1; + } + trace_kvm_msr_read(index, data); + + tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_SUCCESS); + tdvmcall_set_return_val(vcpu, data); + return 1; +} + +static int tdx_emulate_wrmsr(struct kvm_vcpu *vcpu) +{ + u32 index = tdvmcall_a0_read(vcpu); + u64 data = tdvmcall_a1_read(vcpu); + + if (!kvm_msr_allowed(vcpu, 
index, KVM_MSR_FILTER_WRITE) || + kvm_set_msr(vcpu, index, data)) { + trace_kvm_msr_write_ex(index, data); + tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_INVALID_OPERAND); + return 1; + } + + trace_kvm_msr_write(index, data); + tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_SUCCESS); + return 1; +} + static int handle_tdvmcall(struct kvm_vcpu *vcpu) { if (tdvmcall_exit_type(vcpu)) @@ -1511,6 +1546,10 @@ static int handle_tdvmcall(struct kvm_vcpu *vcpu) return tdx_emulate_io(vcpu); case EXIT_REASON_EPT_VIOLATION: return tdx_emulate_mmio(vcpu); + case EXIT_REASON_MSR_READ: + return tdx_emulate_rdmsr(vcpu); + case EXIT_REASON_MSR_WRITE: + return tdx_emulate_wrmsr(vcpu); default: break; } From patchwork Tue Dec 10 00:49:35 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Binbin Wu X-Patchwork-Id: 13900578 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.9]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1C3AD1D5ADE; Tue, 10 Dec 2024 00:48:22 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.9 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733791703; cv=none; b=Dfco3AlKeFUQ9Juj3LN6KNJVNUWXxpuK8Hmunc8yHiKKjWGZvNZY3nYXmbH4h6/zr8J1OBpBV0iAiWM+oWYP/p1ZEgNsZWMFkvTacLvXZI8FFCB0yo6shP8BkKVeh83g1TFUagYML0c3aFK5hRcTv2phAZ5lAj7M19yF817Unxs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733791703; c=relaxed/simple; bh=6TNcsdWCwMI7RV7T6/6/KVAHkwUaJrJW56PM6AigFd0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Pp36eXIHkvSAWhZ1AMeMvo8i7A7BQiVNikSOh5BLCounLPAGVU0QXy3KH5J+vi6QoGpxU9QC2m2ThVaucjkCbzV2Nk5ldSeWrw+UaCtHQ8CsqQaDb7yrpapAimKfYF59wUL1pD6poTazmnaHWnJO48yK+XE+NGSrsKv07hsS4XA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass 
(p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=iUvAczlz; arc=none smtp.client-ip=192.198.163.9 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="iUvAczlz" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1733791702; x=1765327702; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=6TNcsdWCwMI7RV7T6/6/KVAHkwUaJrJW56PM6AigFd0=; b=iUvAczlzJasnlL3yxaQ6B98ndtUTHhbJINsh17ZzB+NBT2QbgPUGAdMd I3IRq60Mte2dhSuYPO9PDzGXiHrkN6Kf9NOWlD31dDsvZ3jb8IaLpwSId I5ow7F9mcLKeg9PfqauXVYJvwc2ZIjy1sOkKl4IDT4rg04KjRgEGsHyLm T2VwMHa50d3JC1EDkG0FYVchrSLNKJQoynRBhPHZ9M8zM3cYNAkqkpkpW rwg+GxVzNZ7puvOFcfXc29KI0CZJw9uYSelEaYK6SuWlnYgrqAU5Fu/Y6 /+K1uXfWDy9VF+TmEOzzoTwwwV131Yfo6JxR/kRBNf1AU7plSKhMXmNkC Q==; X-CSE-ConnectionGUID: 1SIF906yTNyRkOnfKMYKSw== X-CSE-MsgGUID: 3jkBUaviT0K6J6puXIqKug== X-IronPort-AV: E=McAfee;i="6700,10204,11281"; a="44793725" X-IronPort-AV: E=Sophos;i="6.12,220,1728975600"; d="scan'208";a="44793725" Received: from orviesa008.jf.intel.com ([10.64.159.148]) by fmvoesa103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Dec 2024 16:48:21 -0800 X-CSE-ConnectionGUID: pWyvm5F1T4Synv+KNOgWxg== X-CSE-MsgGUID: V1AerEyeTBOxunGO53/Yyg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.12,220,1728975600"; d="scan'208";a="96033050" Received: from litbin-desktop.sh.intel.com ([10.239.156.93]) by orviesa008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Dec 2024 16:48:18 -0800 From: Binbin Wu To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org Cc: rick.p.edgecombe@intel.com, 
kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@linux.intel.com, isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com
Subject: [PATCH 09/18] KVM: TDX: Enable guest access to LMCE related MSRs
Date: Tue, 10 Dec 2024 08:49:35 +0800
Message-ID: <20241210004946.3718496-10-binbin.wu@linux.intel.com>
X-Mailer: git-send-email 2.46.0
In-Reply-To: <20241210004946.3718496-1-binbin.wu@linux.intel.com>
References: <20241210004946.3718496-1-binbin.wu@linux.intel.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0

From: Isaku Yamahata

Allow a TDX guest to configure LMCE (Local Machine Check Event) by handling MSR IA32_FEAT_CTL and IA32_MCG_EXT_CTL.

MCE and MCA are advertised via CPUID based on the TDX module spec. The guest kernel can access IA32_FEAT_CTL to check whether LMCE is opted in by the platform. If it is, the guest kernel can access IA32_MCG_EXT_CTL to enable or disable LMCE.

Handle MSR IA32_FEAT_CTL and IA32_MCG_EXT_CTL for TDX guests to avoid failure when a guest accesses them with TDG.VP.VMCALL on #VE. E.g., a Linux guest treats the failure as a #GP(0).

A userspace VMM may not opt in to LMCE by default. For example, QEMU disables it by default, and "-cpu lmce=on" is needed on the QEMU command line to opt in.

Signed-off-by: Isaku Yamahata
[binbin: rework changelog]
Signed-off-by: Binbin Wu
---
TDX "the rest" breakout:
- Renamed from "KVM: TDX: Handle MSR IA32_FEAT_CTL MSR and IA32_MCG_EXT_CTL" to "KVM: TDX: Enable guest access to LMCE related MSRs".
- Update changelog.
- Check that reserved bits are not set when setting MSR_IA32_MCG_EXT_CTL.
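
For reviewers, the MSR semantics described in the changelog can be modelled in plain userspace C. This is only an illustrative sketch, not KVM code: struct model_vcpu and the model_* helpers are hypothetical names, and the bit positions are assumed to match the kernel's definitions (MCG_LMCE_P at bit 27, MCG_EXT_CTL_LMCE_EN at bit 0, FEAT_CTL_LOCKED at bit 0, FEAT_CTL_LMCE_ENABLED at bit 20).

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions assumed to mirror the kernel's definitions. */
#define MCG_LMCE_P		(1ULL << 27)	/* IA32_MCG_CAP: LMCE supported */
#define MCG_EXT_CTL_LMCE_EN	(1ULL << 0)	/* IA32_MCG_EXT_CTL: LMCE enable */
#define FEAT_CTL_LOCKED		(1ULL << 0)	/* IA32_FEAT_CTL: lock bit */
#define FEAT_CTL_LMCE_ENABLED	(1ULL << 20)	/* IA32_FEAT_CTL: LMCE opt-in */

/* Hypothetical stand-in for the two vcpu->arch fields the patch uses. */
struct model_vcpu {
	uint64_t mcg_cap;
	uint64_t mcg_ext_ctl;
};

/* Read of IA32_FEAT_CTL: always locked, LMCE bit follows MCG_CAP. */
static uint64_t model_read_feat_ctl(const struct model_vcpu *vcpu)
{
	uint64_t data = FEAT_CTL_LOCKED;

	if (vcpu->mcg_cap & MCG_LMCE_P)
		data |= FEAT_CTL_LMCE_ENABLED;
	return data;
}

/* Write of IA32_MCG_EXT_CTL: returns 0 on success, 1 on rejection. */
static int model_write_mcg_ext_ctl(struct model_vcpu *vcpu, uint64_t data,
				   bool host_initiated)
{
	/* A guest may only touch the MSR when LMCE is advertised. */
	if (!host_initiated && !(vcpu->mcg_cap & MCG_LMCE_P))
		return 1;
	/* Every bit other than LMCE_EN is treated as reserved. */
	if (data & ~MCG_EXT_CTL_LMCE_EN)
		return 1;
	vcpu->mcg_ext_ctl = data;
	return 0;
}
```

In this model a guest write fails when LMCE is not advertised, and any write with a bit other than LMCE_EN set fails regardless of who initiated it, which is the reserved-bit check mentioned in the notes.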
--- arch/x86/kvm/vmx/tdx.c | 46 +++++++++++++++++++++++++++++++++--------- 1 file changed, 37 insertions(+), 9 deletions(-) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index d5343e2ba509..b5aae9d784f7 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -2036,6 +2036,7 @@ bool tdx_has_emulated_msr(u32 index) case MSR_MISC_FEATURES_ENABLES: case MSR_IA32_APICBASE: case MSR_EFER: + case MSR_IA32_FEAT_CTL: case MSR_IA32_MCG_CAP: case MSR_IA32_MCG_STATUS: case MSR_IA32_MCG_CTL: @@ -2068,26 +2069,53 @@ bool tdx_has_emulated_msr(u32 index) static bool tdx_is_read_only_msr(u32 index) { - return index == MSR_IA32_APICBASE || index == MSR_EFER; + return index == MSR_IA32_APICBASE || index == MSR_EFER || + index == MSR_IA32_FEAT_CTL; } int tdx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) { - if (!tdx_has_emulated_msr(msr->index)) - return 1; + switch (msr->index) { + case MSR_IA32_FEAT_CTL: + /* + * MCE and MCA are advertised via cpuid. Guest kernel could + * check if LMCE is enabled or not. 
+ */ + msr->data = FEAT_CTL_LOCKED; + if (vcpu->arch.mcg_cap & MCG_LMCE_P) + msr->data |= FEAT_CTL_LMCE_ENABLED; + return 0; + case MSR_IA32_MCG_EXT_CTL: + if (!msr->host_initiated && !(vcpu->arch.mcg_cap & MCG_LMCE_P)) + return 1; + msr->data = vcpu->arch.mcg_ext_ctl; + return 0; + default: + if (!tdx_has_emulated_msr(msr->index)) + return 1; - return kvm_get_msr_common(vcpu, msr); + return kvm_get_msr_common(vcpu, msr); + } } int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) { - if (tdx_is_read_only_msr(msr->index)) - return 1; + switch (msr->index) { + case MSR_IA32_MCG_EXT_CTL: + if ((!msr->host_initiated && !(vcpu->arch.mcg_cap & MCG_LMCE_P)) || + (msr->data & ~MCG_EXT_CTL_LMCE_EN)) + return 1; + vcpu->arch.mcg_ext_ctl = msr->data; + return 0; + default: + if (tdx_is_read_only_msr(msr->index)) + return 1; - if (!tdx_has_emulated_msr(msr->index)) - return 1; + if (!tdx_has_emulated_msr(msr->index)) + return 1; - return kvm_set_msr_common(vcpu, msr); + return kvm_set_msr_common(vcpu, msr); + } } static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd) From patchwork Tue Dec 10 00:49:36 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Binbin Wu X-Patchwork-Id: 13900579 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.9]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B1D591D6DB8; Tue, 10 Dec 2024 00:48:25 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.9 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733791707; cv=none; b=p8ohiur4R8moBuYXc2JnLt63wn7f3kcxaWPIWRq7sy2ajMDa92lPb2oDdCl+/x6OjuKUUkoFIZuEkVeOqE13Y7AAgmVb/gSqgxSP0nxEpJBK9ibaM8RxxXepe/XusimAU16E/bT3fHbNz1qKlj3Qcxz2azCQ/4mrGyUCjd6jqGY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733791707; 
c=relaxed/simple; bh=3FAr91JGgHlr+EI5Okkc7KA7AOh/3MQ8oJ1eubya5lc=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=CeLlkaS8+Ea8gRLNwCGzZXWoafD1UuM4KWa1FPo6T9AGjUQ01aTDkNG3a3mFdhkaaqJMJsA+qi02fYpYEEDyMCbunkFenYCrt/LvDEahsh7941Gne4kMdW6vRYoxRPmVcPNPi+oDfqTwy0cNc+sdH7pjbSNAiMttKd9gSYh+ymE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=ke8FZ1PP; arc=none smtp.client-ip=192.198.163.9 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="ke8FZ1PP" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1733791705; x=1765327705; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=3FAr91JGgHlr+EI5Okkc7KA7AOh/3MQ8oJ1eubya5lc=; b=ke8FZ1PPGLECU9K/8vCdQTIlj98O5I9cUZJQjfELpMd+odWqEIw/wBYd PzG9/1r8q6hkfJd6nDRWSjYnzkIik3sDHUjStfr1YuqmOOXiuZSxV5N2W zgXJ3E4xdDfsnQuwW2t2Jynz01YOoD7RtYRlRn4E4gnA6o5jKzyR0DY9E 8DleLNN+4gaVBTv03piBvrmb+WzFvDDi5UGv6/VPsUyyhw2z5on2LkEFK ZYujVr4Cz3vdlv4lnvhl6IBl1EVVHgSnH2PqJ310XLVUs8t5WtCJkU4lO uCcq9PkmhDQPsZDZpu32nvNYuITf17ObCpIzTaMVqruv4mrN4DS1mUHyo A==; X-CSE-ConnectionGUID: xMof4d6CTNCJcPAG5z9wAw== X-CSE-MsgGUID: uSTHTZZSScGn23NNbZypxQ== X-IronPort-AV: E=McAfee;i="6700,10204,11281"; a="44793732" X-IronPort-AV: E=Sophos;i="6.12,220,1728975600"; d="scan'208";a="44793732" Received: from orviesa008.jf.intel.com ([10.64.159.148]) by fmvoesa103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Dec 2024 16:48:25 -0800 X-CSE-ConnectionGUID: FMUVT5oISIiFE8aw8bxf+g== X-CSE-MsgGUID: 
a3HxPKJGTNSJgJkL3zBVKQ==
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="6.12,220,1728975600"; d="scan'208";a="96033053"
Received: from litbin-desktop.sh.intel.com ([10.239.156.93]) by orviesa008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Dec 2024 16:48:21 -0800
From: Binbin Wu
To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org
Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@linux.intel.com, isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com
Subject: [PATCH 10/18] KVM: TDX: Handle TDG.VP.VMCALL<GetTdVmCallInfo> hypercall
Date: Tue, 10 Dec 2024 08:49:36 +0800
Message-ID: <20241210004946.3718496-11-binbin.wu@linux.intel.com>
X-Mailer: git-send-email 2.46.0
In-Reply-To: <20241210004946.3718496-1-binbin.wu@linux.intel.com>
References: <20241210004946.3718496-1-binbin.wu@linux.intel.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0

From: Isaku Yamahata

Implement the TDG.VP.VMCALL<GetTdVmCallInfo> hypercall. If the input value is zero, return the success code and zero in the output registers.

TDG.VP.VMCALL<GetTdVmCallInfo> is a subleaf of TDG.VP.VMCALL that enumerates which TDG.VP.VMCALL subleaves are supported. This hypercall exists for future enhancement of the Guest-Host-Communication Interface (GHCI) specification. GHCI version 344426-001US requires input R12 to be zero and returns zero in output registers R11, R12, R13, and R14, so that the guest TD enumerates no enhancements.
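
The register contract described above can be sketched in userspace C. This is a hedged illustration only: struct model_tdvmcall_regs and model_get_td_vm_call_info() are hypothetical names, the status is assumed to be returned in R10, and the invalid-operand value is assumed to be 0x8000000000000000 as in the GHCI status encoding.

```c
#include <stdint.h>

/* Status values assumed from the GHCI encoding; names are hypothetical. */
#define MODEL_STATUS_SUCCESS		0x0ULL
#define MODEL_STATUS_INVALID_OPERAND	0x8000000000000000ULL

/* Hypothetical view of the guest registers used by the hypercall. */
struct model_tdvmcall_regs {
	uint64_t r10;	/* return status (assumption) */
	uint64_t r11;
	uint64_t r12;	/* input: must be zero; also an output */
	uint64_t r13;
	uint64_t r14;
};

/* Model of the leaf handled via TDVMCALL_GET_TD_VM_CALL_INFO in the diff. */
static void model_get_td_vm_call_info(struct model_tdvmcall_regs *regs)
{
	if (regs->r12) {
		/* Non-zero input is rejected; outputs are left untouched. */
		regs->r10 = MODEL_STATUS_INVALID_OPERAND;
		return;
	}

	/* Zero input: report success and enumerate no enhancements. */
	regs->r10 = MODEL_STATUS_SUCCESS;
	regs->r11 = 0;
	regs->r12 = 0;
	regs->r13 = 0;
	regs->r14 = 0;
}
```

A guest that sees all-zero outputs concludes that no optional TDG.VP.VMCALL subleaves are offered.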
Signed-off-by: Isaku Yamahata Signed-off-by: Binbin Wu --- TDX "the rest" breakout: - Use TDVMCALL_STATUS prefix for TDX call status codes (Binbin) v19: - rename TDG_VP_VMCALL_GET_TD_VM_CALL_INFO => TDVMCALL_GET_TD_VM_CALL_INFO --- arch/x86/include/asm/shared/tdx.h | 1 + arch/x86/kvm/vmx/tdx.c | 16 ++++++++++++++++ 2 files changed, 17 insertions(+) diff --git a/arch/x86/include/asm/shared/tdx.h b/arch/x86/include/asm/shared/tdx.h index a602d081cf1c..192ae798b214 100644 --- a/arch/x86/include/asm/shared/tdx.h +++ b/arch/x86/include/asm/shared/tdx.h @@ -22,6 +22,7 @@ #define TDCS_NOTIFY_ENABLES 0x9100000000000010 /* TDX hypercall Leaf IDs */ +#define TDVMCALL_GET_TD_VM_CALL_INFO 0x10000 #define TDVMCALL_MAP_GPA 0x10001 #define TDVMCALL_GET_QUOTE 0x10002 #define TDVMCALL_REPORT_FATAL_ERROR 0x10003 diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index b5aae9d784f7..413359741085 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1528,6 +1528,20 @@ static int tdx_emulate_wrmsr(struct kvm_vcpu *vcpu) return 1; } +static int tdx_get_td_vm_call_info(struct kvm_vcpu *vcpu) +{ + if (tdvmcall_a0_read(vcpu)) + tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_INVALID_OPERAND); + else { + tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_SUCCESS); + kvm_r11_write(vcpu, 0); + tdvmcall_a0_write(vcpu, 0); + tdvmcall_a1_write(vcpu, 0); + tdvmcall_a2_write(vcpu, 0); + } + return 1; +} + static int handle_tdvmcall(struct kvm_vcpu *vcpu) { if (tdvmcall_exit_type(vcpu)) @@ -1550,6 +1564,8 @@ static int handle_tdvmcall(struct kvm_vcpu *vcpu) return tdx_emulate_rdmsr(vcpu); case EXIT_REASON_MSR_WRITE: return tdx_emulate_wrmsr(vcpu); + case TDVMCALL_GET_TD_VM_CALL_INFO: + return tdx_get_td_vm_call_info(vcpu); default: break; } From patchwork Tue Dec 10 00:49:37 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Binbin Wu X-Patchwork-Id: 13900580 Received: from mgamail.intel.com (mgamail.intel.com 
[192.198.163.9]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 63B901D7E41; Tue, 10 Dec 2024 00:48:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.9 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733791711; cv=none; b=uT1gOJcg4a2zaZQ77/77EOZD7h2cj71rBaIRLMhWxxuyF0Wf9yqt0QX2ZPMTNwhp99hD9WJ2GFLKD02aWhMs1+7cDXfaJUsnA+MBtpZHaYD4QLcwt7zaC5V6HGWqAPMBOVadtdTp++KJxHSUGyNHV55eYChYgIWENmicDe/o29o= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733791711; c=relaxed/simple; bh=EYUts2Z2lAdPO9LcGK1xT8MkrrbCZ7m4cknAafwwwDg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=XRUdWGLQP2wdLhEsJnJt4k3MWOHHyZAHPOomQtQjvdZfagVTXPyitPLM29con1o2pRZWW6D++Hqb5TrAoLR9a00YifPGIBOY425vvoN9P2EUePAs8bqHlGJRtB5goL+NiaGDE9adOQtRpCfOH9Bf+i/ETUfCyYQd1A0ZLqftrEg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=cnXeVBvJ; arc=none smtp.client-ip=192.198.163.9 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="cnXeVBvJ" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1733791709; x=1765327709; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=EYUts2Z2lAdPO9LcGK1xT8MkrrbCZ7m4cknAafwwwDg=; b=cnXeVBvJLmqfnc6SlRslqvxgw0pITrBWEeNF2tTECgjv3oWFVsKxsGz0 bWiYMdHGnv18Y2hySNA4tcDidc8DvLMQODgFsHIA2Y9D6GSOWDqdk7gyZ 
NK31zBtjNKI/zUvrkmbIujC9tyBZBlDj7hPg0lPrhWI/q84aEdOYrl1GK ZnaEEfWZGQFpVttVyW8c6PVRvftx4ydTcm05JG4hIDlo1hPvq6CzaJDA1 mwa5igNVP5+OISo4xZU7w1kEIsnjjG4x9IiZ+tGR6azVp3or0CD6JWVKv nlfoA6X0Hj7gk/Hlg2DqUsXAv8b21wu0YBzK+kJF0ymR5HovZuImTCoBz Q==; X-CSE-ConnectionGUID: v5jzqwRpTz2WwwvHXXOubw== X-CSE-MsgGUID: AafqVSnDQiC8UVwNK9pMfA== X-IronPort-AV: E=McAfee;i="6700,10204,11281"; a="44793745" X-IronPort-AV: E=Sophos;i="6.12,220,1728975600"; d="scan'208";a="44793745" Received: from orviesa008.jf.intel.com ([10.64.159.148]) by fmvoesa103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Dec 2024 16:48:29 -0800 X-CSE-ConnectionGUID: hro0ympISc6t/f83oqdnKQ== X-CSE-MsgGUID: vgeUAg6nSamHS4zhJp+VYQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.12,220,1728975600"; d="scan'208";a="96033061" Received: from litbin-desktop.sh.intel.com ([10.239.156.93]) by orviesa008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Dec 2024 16:48:25 -0800 From: Binbin Wu To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@linux.intel.com, isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com Subject: [PATCH 11/18] KVM: TDX: Add methods to ignore accesses to CPU state Date: Tue, 10 Dec 2024 08:49:37 +0800 Message-ID: <20241210004946.3718496-12-binbin.wu@linux.intel.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20241210004946.3718496-1-binbin.wu@linux.intel.com> References: <20241210004946.3718496-1-binbin.wu@linux.intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata TDX protects TDX guest state from VMM. Implement access methods for TDX guest state to ignore them or return zero. 
Because those methods can be called by kvm ioctls to set/get cpu registers, they don't have KVM_BUG_ON. Signed-off-by: Isaku Yamahata Co-developed-by: Binbin Wu Signed-off-by: Binbin Wu --- TDX "the rest" breakout: - Dropped KVM_BUG_ON() in vt_sync_dirty_debug_regs(). (Rick) Since the KVM_BUG_ON() is removed, change the changelog accordingly. - Remove unnecessary wrappers. (Binbin) --- arch/x86/kvm/vmx/main.c | 288 +++++++++++++++++++++++++++++++++---- arch/x86/kvm/vmx/tdx.c | 28 +++- arch/x86/kvm/vmx/x86_ops.h | 4 + 3 files changed, 291 insertions(+), 29 deletions(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index a224e9d32701..f6b449ae1ef7 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -333,6 +333,200 @@ static void vt_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode, vmx_deliver_interrupt(apic, delivery_mode, trig_mode, vector); } +static void vt_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_vcpu_after_set_cpuid(vcpu); +} + +static void vt_update_exception_bitmap(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_update_exception_bitmap(vcpu); +} + +static u64 vt_get_segment_base(struct kvm_vcpu *vcpu, int seg) +{ + if (is_td_vcpu(vcpu)) + return 0; + + return vmx_get_segment_base(vcpu, seg); +} + +static void vt_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, + int seg) +{ + if (is_td_vcpu(vcpu)) { + memset(var, 0, sizeof(*var)); + return; + } + + vmx_get_segment(vcpu, var, seg); +} + +static void vt_set_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, + int seg) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_set_segment(vcpu, var, seg); +} + +static int vt_get_cpl(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return 0; + + return vmx_get_cpl(vcpu); +} + +static void vt_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l) +{ + if (is_td_vcpu(vcpu)) { + *db = 0; + *l = 0; + return; + } + + vmx_get_cs_db_l_bits(vcpu, db, l); 
+} + +static bool vt_is_valid_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) +{ + if (is_td_vcpu(vcpu)) + return true; + + return vmx_is_valid_cr0(vcpu, cr0); +} + +static void vt_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_set_cr0(vcpu, cr0); +} + +static bool vt_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) +{ + if (is_td_vcpu(vcpu)) + return true; + + return vmx_is_valid_cr4(vcpu, cr4); +} + +static void vt_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_set_cr4(vcpu, cr4); +} + +static int vt_set_efer(struct kvm_vcpu *vcpu, u64 efer) +{ + if (is_td_vcpu(vcpu)) + return 0; + + return vmx_set_efer(vcpu, efer); +} + +static void vt_get_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) +{ + if (is_td_vcpu(vcpu)) { + memset(dt, 0, sizeof(*dt)); + return; + } + + vmx_get_idt(vcpu, dt); +} + +static void vt_set_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_set_idt(vcpu, dt); +} + +static void vt_get_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) +{ + if (is_td_vcpu(vcpu)) { + memset(dt, 0, sizeof(*dt)); + return; + } + + vmx_get_gdt(vcpu, dt); +} + +static void vt_set_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_set_gdt(vcpu, dt); +} + +static void vt_set_dr7(struct kvm_vcpu *vcpu, unsigned long val) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_set_dr7(vcpu, val); +} + +static void vt_sync_dirty_debug_regs(struct kvm_vcpu *vcpu) +{ + /* + * MOV-DR exiting is always cleared for TD guest, even in debug mode. + * Thus KVM_DEBUGREG_WONT_EXIT can never be set and it should never + * reach here for TD vcpu. 
+ */ + if (is_td_vcpu(vcpu)) + return; + + vmx_sync_dirty_debug_regs(vcpu); +} + +static void vt_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg) +{ + if (is_td_vcpu(vcpu)) { + tdx_cache_reg(vcpu, reg); + return; + } + + vmx_cache_reg(vcpu, reg); +} + +static unsigned long vt_get_rflags(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return 0; + + return vmx_get_rflags(vcpu); +} + +static void vt_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_set_rflags(vcpu, rflags); +} + +static bool vt_get_if_flag(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return false; + + return vmx_get_if_flag(vcpu); +} + static void vt_flush_tlb_all(struct kvm_vcpu *vcpu) { if (is_td_vcpu(vcpu)) { @@ -455,6 +649,14 @@ static void vt_inject_irq(struct kvm_vcpu *vcpu, bool reinjected) vmx_inject_irq(vcpu, reinjected); } +static void vt_inject_exception(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_inject_exception(vcpu); +} + static void vt_cancel_injection(struct kvm_vcpu *vcpu) { if (is_td_vcpu(vcpu)) @@ -491,6 +693,14 @@ static void vt_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, vmx_get_exit_info(vcpu, reason, info1, info2, intr_info, error_code); } +static void vt_update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_update_cr8_intercept(vcpu, tpr, irr); +} + static void vt_set_apic_access_page_addr(struct kvm_vcpu *vcpu) { if (is_td_vcpu(vcpu)) @@ -507,6 +717,30 @@ static void vt_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu) vmx_refresh_apicv_exec_ctrl(vcpu); } +static void vt_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_load_eoi_exitmap(vcpu, eoi_exit_bitmap); +} + +static int vt_set_tss_addr(struct kvm *kvm, unsigned int addr) +{ + if (is_td(kvm)) + return 0; + + return vmx_set_tss_addr(kvm, addr); +} + +static int vt_set_identity_map_addr(struct kvm *kvm, u64 ident_addr) +{ + 
if (is_td(kvm)) + return 0; + + return vmx_set_identity_map_addr(kvm, ident_addr); +} + static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp) { if (!is_td(kvm)) @@ -569,30 +803,30 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .vcpu_load = vt_vcpu_load, .vcpu_put = vt_vcpu_put, - .update_exception_bitmap = vmx_update_exception_bitmap, + .update_exception_bitmap = vt_update_exception_bitmap, .get_feature_msr = vmx_get_feature_msr, .get_msr = vt_get_msr, .set_msr = vt_set_msr, - .get_segment_base = vmx_get_segment_base, - .get_segment = vmx_get_segment, - .set_segment = vmx_set_segment, - .get_cpl = vmx_get_cpl, - .get_cs_db_l_bits = vmx_get_cs_db_l_bits, - .is_valid_cr0 = vmx_is_valid_cr0, - .set_cr0 = vmx_set_cr0, - .is_valid_cr4 = vmx_is_valid_cr4, - .set_cr4 = vmx_set_cr4, - .set_efer = vmx_set_efer, - .get_idt = vmx_get_idt, - .set_idt = vmx_set_idt, - .get_gdt = vmx_get_gdt, - .set_gdt = vmx_set_gdt, - .set_dr7 = vmx_set_dr7, - .sync_dirty_debug_regs = vmx_sync_dirty_debug_regs, - .cache_reg = vmx_cache_reg, - .get_rflags = vmx_get_rflags, - .set_rflags = vmx_set_rflags, - .get_if_flag = vmx_get_if_flag, + .get_segment_base = vt_get_segment_base, + .get_segment = vt_get_segment, + .set_segment = vt_set_segment, + .get_cpl = vt_get_cpl, + .get_cs_db_l_bits = vt_get_cs_db_l_bits, + .is_valid_cr0 = vt_is_valid_cr0, + .set_cr0 = vt_set_cr0, + .is_valid_cr4 = vt_is_valid_cr4, + .set_cr4 = vt_set_cr4, + .set_efer = vt_set_efer, + .get_idt = vt_get_idt, + .set_idt = vt_set_idt, + .get_gdt = vt_get_gdt, + .set_gdt = vt_set_gdt, + .set_dr7 = vt_set_dr7, + .sync_dirty_debug_regs = vt_sync_dirty_debug_regs, + .cache_reg = vt_cache_reg, + .get_rflags = vt_get_rflags, + .set_rflags = vt_set_rflags, + .get_if_flag = vt_get_if_flag, .flush_tlb_all = vt_flush_tlb_all, .flush_tlb_current = vt_flush_tlb_current, @@ -609,7 +843,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .patch_hypercall = vmx_patch_hypercall, .inject_irq = vt_inject_irq, .inject_nmi = 
vt_inject_nmi, - .inject_exception = vmx_inject_exception, + .inject_exception = vt_inject_exception, .cancel_injection = vt_cancel_injection, .interrupt_allowed = vt_interrupt_allowed, .nmi_allowed = vt_nmi_allowed, @@ -617,13 +851,13 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .set_nmi_mask = vt_set_nmi_mask, .enable_nmi_window = vt_enable_nmi_window, .enable_irq_window = vt_enable_irq_window, - .update_cr8_intercept = vmx_update_cr8_intercept, + .update_cr8_intercept = vt_update_cr8_intercept, .x2apic_icr_is_split = false, .set_virtual_apic_mode = vt_set_virtual_apic_mode, .set_apic_access_page_addr = vt_set_apic_access_page_addr, .refresh_apicv_exec_ctrl = vt_refresh_apicv_exec_ctrl, - .load_eoi_exitmap = vmx_load_eoi_exitmap, + .load_eoi_exitmap = vt_load_eoi_exitmap, .apicv_pre_state_restore = vt_apicv_pre_state_restore, .required_apicv_inhibits = VMX_REQUIRED_APICV_INHIBITS, .hwapic_irr_update = vt_hwapic_irr_update, @@ -633,13 +867,13 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .dy_apicv_has_pending_interrupt = pi_has_pending_interrupt, .protected_apic_has_interrupt = tdx_protected_apic_has_interrupt, - .set_tss_addr = vmx_set_tss_addr, - .set_identity_map_addr = vmx_set_identity_map_addr, + .set_tss_addr = vt_set_tss_addr, + .set_identity_map_addr = vt_set_identity_map_addr, .get_mt_mask = vmx_get_mt_mask, .get_exit_info = vt_get_exit_info, - .vcpu_after_set_cpuid = vmx_vcpu_after_set_cpuid, + .vcpu_after_set_cpuid = vt_vcpu_after_set_cpuid, .has_wbinvd_exit = cpu_has_vmx_wbinvd_exit, diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 413359741085..4bf3a6dc66fc 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -733,8 +733,15 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu) vcpu->arch.tsc_offset = kvm_tdx->tsc_offset; vcpu->arch.l1_tsc_offset = vcpu->arch.tsc_offset; - vcpu->arch.guest_state_protected = - !(to_kvm_tdx(vcpu->kvm)->attributes & TDX_TD_ATTR_DEBUG); + /* + * TODO: support off-TD debug. 
If TD DEBUG is enabled, guest state + * can be accessed. guest_state_protected = false. and kvm ioctl to + * access CPU states should be usable for user space VMM (e.g. qemu). + * + * vcpu->arch.guest_state_protected = + * !(to_kvm_tdx(vcpu->kvm)->attributes & TDX_TD_ATTR_DEBUG); + */ + vcpu->arch.guest_state_protected = true; if ((kvm_tdx->xfam & XFEATURE_MASK_XTILE) == XFEATURE_MASK_XTILE) vcpu->arch.xfd_no_write_intercept = true; @@ -2134,6 +2141,23 @@ int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) } } +void tdx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg) +{ + kvm_register_mark_available(vcpu, reg); + switch (reg) { + case VCPU_REGS_RSP: + case VCPU_REGS_RIP: + case VCPU_EXREG_PDPTR: + case VCPU_EXREG_CR0: + case VCPU_EXREG_CR3: + case VCPU_EXREG_CR4: + break; + default: + KVM_BUG_ON(1, vcpu->kvm); + break; + } +} + static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd) { const struct tdx_sys_info_td_conf *td_conf = &tdx_sysinfo->td_conf; diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index 7fb1bbf12b39..5b79bb72b13a 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -146,6 +146,8 @@ bool tdx_has_emulated_msr(u32 index); int tdx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr); int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr); +void tdx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg); + int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp); int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn, @@ -193,6 +195,8 @@ static inline bool tdx_has_emulated_msr(u32 index) { return false; } static inline int tdx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) { return 1; } static inline int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) { return 1; } +static inline void tdx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg) {} + static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; } static inline int 
tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn, From patchwork Tue Dec 10 00:49:38 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Binbin Wu X-Patchwork-Id: 13900581 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.9]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 18C5E70839; Tue, 10 Dec 2024 00:48:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.9 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733791714; cv=none; b=N58rCOi1kF3zc5b2sEjIL0SCYw3D/afK0GZCVd/IZ2kUHiYBskqZSNFdkz9MFRSa5CwOWH7W/PHZM3P8n1FNP3pYj8gr1+OwQgrdZtCKTjlhvBeKZqbztRd3mf0HlE4iMvomGEjBXHWZdeGEMvu/3Cro9aw02gK+Rn96TgrGwU8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733791714; c=relaxed/simple; bh=gp8+ZOguOpyMDySL02wo3b9lFOkGzOg9r3xwCJHI8Ng=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=NRuZ+je+FxzwmzLCIgjKP5t5CfrxR4YcmWpIGXLpDu4+86/qKVvMMPvzG9KUVlYMR/zkMjb7tfrCqrQ5AvKriQIL09EtuTt8+SzX9myec0zKCUND4oj1a4C1rSDq6w078pP/I6VoVftmKxsXjxjhUSoHRmfIs7BGOTi6U1moKcM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=SBj/8/g8; arc=none smtp.client-ip=192.198.163.9 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="SBj/8/g8" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; 
t=1733791713; x=1765327713; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=gp8+ZOguOpyMDySL02wo3b9lFOkGzOg9r3xwCJHI8Ng=; b=SBj/8/g82fvFAUUAYqTjrvsn5nGbfObGaQLROFQVHlPH82YQe8+3YjiO 22AblyPfzH+5BXD+qjwQqTaw7vgl8XG78sfQLVLcQjE0DNy8QyJgZmXW9 SfX3JCHJUp1pZqMdgqrs/o85r1+55muQsUiYEWOaf0k1XpZva/d2N2vG9 SSgsJDHR+8ip9oHpQkf4CLx7cLbzYs7u8i67WYoH5l2z9L4A6/vvImT1O IJ0NBKs/wyHFVF9hfn8raKaVL6MghLyh4Lakvx8PondnpgkIwMgcQxsdG cNF6b7F2mH7jfFLDqpSnUW5IiE5ZazG7JlBUXIo0UWovIVwoZPp2mvF0d Q==; X-CSE-ConnectionGUID: TqU+v5OTQvCamg+YDRu7Rg== X-CSE-MsgGUID: Dfbi9Wd2QyGedAO5iGRG5A== X-IronPort-AV: E=McAfee;i="6700,10204,11281"; a="44793754" X-IronPort-AV: E=Sophos;i="6.12,220,1728975600"; d="scan'208";a="44793754" Received: from orviesa008.jf.intel.com ([10.64.159.148]) by fmvoesa103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Dec 2024 16:48:32 -0800 X-CSE-ConnectionGUID: IMFRehYkT9KlmoJICb4Yjw== X-CSE-MsgGUID: 6htOobOoRxqq8fvLW3SD0w== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.12,220,1728975600"; d="scan'208";a="96033068" Received: from litbin-desktop.sh.intel.com ([10.239.156.93]) by orviesa008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Dec 2024 16:48:29 -0800 From: Binbin Wu To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@linux.intel.com, isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com Subject: [PATCH 12/18] KVM: TDX: Add method to ignore guest instruction emulation Date: Tue, 10 Dec 2024 08:49:38 +0800 Message-ID: <20241210004946.3718496-13-binbin.wu@linux.intel.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20241210004946.3718496-1-binbin.wu@linux.intel.com> References: <20241210004946.3718496-1-binbin.wu@linux.intel.com> Precedence: bulk X-Mailing-List: 
kvm@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0

From: Isaku Yamahata

Skip instruction emulation and let the TDX guest retry for MMIO emulation after installing the MMIO SPTE with the suppress #VE bit cleared.

Because TDX protects TDX guest state from the VMM, instructions in guest memory cannot be emulated. MMIO emulation is the only case that triggers the instruction emulation code path for a TDX guest.

The MMIO emulation handling flow is as follows:
- The TDX guest issues a vMMIO instruction. (The GPA must be shared and not covered by a KVM memory slot.)
- The default SPTE that KVM installs for shared EPT has the suppress #VE bit set, so the EPT violation causes a TD exit to KVM.
- KVM's page fault handler runs and installs a new SPTE with the suppress #VE bit cleared.
- KVM skips instruction emulation and returns X86EMUL_RETRY_INSTR to let the vCPU retry.
- The TDX guest re-executes the vMMIO instruction.
- The TDX guest gets a #VE because KVM has cleared the suppress #VE bit.
- The TDX guest's #VE handler converts the MMIO access into TDG.VP.VMCALL.

Return X86EMUL_RETRY_INSTR in the check_emulate_instruction() callback for TDX guests so that they retry the MMIO instruction. Because instruction emulation handling is skipped, the check_intercept() callback is never called for a TDX guest.

Signed-off-by: Isaku Yamahata
Co-developed-by: Binbin Wu
Signed-off-by: Binbin Wu
---
TDX "the rest" breakout:
- Dropped vt_check_intercept().
- Add a comment in vt_check_emulate_instruction().
- Update the changelog.
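
The retry flow in the changelog can be sketched as a small userspace model. This is an illustration under stated assumptions, not KVM code: struct model_spte, the MODEL_* values and the model_* helpers are hypothetical, and the model tracks only the suppress #VE bit of a single SPTE.

```c
#include <stdbool.h>

/* Hypothetical outcomes of a guest access to the MMIO GPA. */
enum model_access_result {
	MODEL_TD_EXIT_EPT_VIOLATION,	/* suppress #VE set: exit to the host */
	MODEL_GUEST_VE,			/* suppress #VE clear: #VE in the guest */
};

/* Hypothetical SPTE that carries only the suppress #VE bit. */
struct model_spte {
	bool suppress_ve;
};

/* What the guest observes when it executes the MMIO instruction. */
static enum model_access_result model_guest_access(const struct model_spte *spte)
{
	return spte->suppress_ve ? MODEL_TD_EXIT_EPT_VIOLATION : MODEL_GUEST_VE;
}

/* Host side: install the SPTE with suppress #VE cleared, then retry. */
static void model_host_handle_fault(struct model_spte *spte)
{
	spte->suppress_ve = false;
}

/* Returns how many times the guest executes the instruction before #VE. */
static int model_mmio_attempts(struct model_spte *spte)
{
	int attempts = 1;

	while (model_guest_access(spte) == MODEL_TD_EXIT_EPT_VIOLATION) {
		model_host_handle_fault(spte);
		attempts++;	/* the guest re-executes the instruction */
	}
	return attempts;
}
```

Starting from the default state (suppress #VE set), the model needs exactly one retry before the guest takes the #VE and can issue its hypercall; no host-side instruction decoding is involved at any point.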
--- arch/x86/kvm/vmx/main.c | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index f6b449ae1ef7..c97d0540a385 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -268,6 +268,22 @@ static void vt_enable_smi_window(struct kvm_vcpu *vcpu) } #endif +static int vt_check_emulate_instruction(struct kvm_vcpu *vcpu, int emul_type, + void *insn, int insn_len) +{ + /* + * For TDX, this can only be triggered for MMIO emulation. Let the + * guest retry after installing the SPTE with suppress #VE bit cleared, + * so that the guest will receive #VE when retry. The guest is expected + * to call TDG.VP.VMCALL to request VMM to do MMIO emulation on + * #VE. + */ + if (is_td_vcpu(vcpu)) + return X86EMUL_RETRY_INSTR; + + return vmx_check_emulate_instruction(vcpu, emul_type, insn, insn_len); +} + static bool vt_apic_init_signal_blocked(struct kvm_vcpu *vcpu) { /* @@ -909,7 +925,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .enable_smi_window = vt_enable_smi_window, #endif - .check_emulate_instruction = vmx_check_emulate_instruction, + .check_emulate_instruction = vt_check_emulate_instruction, .apic_init_signal_blocked = vt_apic_init_signal_blocked, .migrate_timers = vmx_migrate_timers, From patchwork Tue Dec 10 00:49:39 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Binbin Wu X-Patchwork-Id: 13900582 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.9]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id ABEC61D968E; Tue, 10 Dec 2024 00:48:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.9 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733791718; cv=none; 
From: Binbin Wu
To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org
Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com,
 reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@linux.intel.com,
 isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com,
 linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com
Subject: [PATCH 13/18] KVM: TDX: Add methods to ignore VMX preemption timer
Date: Tue, 10 Dec 2024 08:49:39 +0800
Message-ID: <20241210004946.3718496-14-binbin.wu@linux.intel.com>
X-Mailer: git-send-email 2.46.0
In-Reply-To: <20241210004946.3718496-1-binbin.wu@linux.intel.com>
References: <20241210004946.3718496-1-binbin.wu@linux.intel.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org
List-Id: List-Subscribe: List-Unsubscribe:
MIME-Version: 1.0

From: Isaku Yamahata

TDX doesn't support the VMX preemption timer. Implement access methods for
the VMM to ignore the VMX preemption timer.

Signed-off-by: Isaku Yamahata
Signed-off-by: Binbin Wu
---
TDX "the rest" breakout:
- Dropped KVM_BUG_ON() in vt_cancel_hv_timer(). (Rick)
---
 arch/x86/kvm/vmx/main.c | 25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index c97d0540a385..4a9b176b8a36 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -757,6 +757,27 @@ static int vt_set_identity_map_addr(struct kvm *kvm, u64 ident_addr)
 	return vmx_set_identity_map_addr(kvm, ident_addr);
 }
 
+#ifdef CONFIG_X86_64
+static int vt_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc,
+			      bool *expired)
+{
+	/* VMX-preemption timer isn't available for TDX. */
+	if (is_td_vcpu(vcpu))
+		return -EINVAL;
+
+	return vmx_set_hv_timer(vcpu, guest_deadline_tsc, expired);
+}
+
+static void vt_cancel_hv_timer(struct kvm_vcpu *vcpu)
+{
+	/* VMX-preemption timer can't be set. See vt_set_hv_timer(). */
+	if (is_td_vcpu(vcpu))
+		return;
+
+	vmx_cancel_hv_timer(vcpu);
+}
+#endif
+
 static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 {
 	if (!is_td(kvm))
@@ -912,8 +933,8 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.pi_start_assignment = vmx_pi_start_assignment,
 
 #ifdef CONFIG_X86_64
-	.set_hv_timer = vmx_set_hv_timer,
-	.cancel_hv_timer = vmx_cancel_hv_timer,
+	.set_hv_timer = vt_set_hv_timer,
+	.cancel_hv_timer = vt_cancel_hv_timer,
 #endif
 
 	.setup_mce = vmx_setup_mce,

From patchwork Tue Dec 10 00:49:40 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Binbin Wu
X-Patchwork-Id: 13900583
From: Binbin Wu
To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org
Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com,
 reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@linux.intel.com,
 isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com,
 linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com
Subject: [PATCH 14/18] KVM: TDX: Add methods to ignore accesses to TSC
Date: Tue, 10 Dec 2024 08:49:40 +0800
Message-ID: <20241210004946.3718496-15-binbin.wu@linux.intel.com>
X-Mailer: git-send-email 2.46.0
In-Reply-To: <20241210004946.3718496-1-binbin.wu@linux.intel.com>
References: <20241210004946.3718496-1-binbin.wu@linux.intel.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org
List-Id: List-Subscribe: List-Unsubscribe:
MIME-Version: 1.0

From: Isaku Yamahata

TDX protects the TDX guest's TSC state from the VMM. Implement access
methods that ignore accesses to the guest TSC.

Signed-off-by: Isaku Yamahata
Signed-off-by: Binbin Wu
---
TDX "the rest" breakout:
- Dropped KVM_BUG_ON() in vt_get_l2_tsc_offset(). (Rick)
---
 arch/x86/kvm/vmx/main.c | 44 +++++++++++++++++++++++++++++++++++++----
 1 file changed, 40 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 4a9b176b8a36..81ca5acb9964 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -757,6 +757,42 @@ static int vt_set_identity_map_addr(struct kvm *kvm, u64 ident_addr)
 	return vmx_set_identity_map_addr(kvm, ident_addr);
 }
 
+static u64 vt_get_l2_tsc_offset(struct kvm_vcpu *vcpu)
+{
+	/* TDX doesn't support L2 guest at the moment. */
+	if (is_td_vcpu(vcpu))
+		return 0;
+
+	return vmx_get_l2_tsc_offset(vcpu);
+}
+
+static u64 vt_get_l2_tsc_multiplier(struct kvm_vcpu *vcpu)
+{
+	/* TDX doesn't support L2 guest at the moment. */
+	if (is_td_vcpu(vcpu))
+		return 0;
+
+	return vmx_get_l2_tsc_multiplier(vcpu);
+}
+
+static void vt_write_tsc_offset(struct kvm_vcpu *vcpu)
+{
+	/* In TDX, tsc offset can't be changed. */
+	if (is_td_vcpu(vcpu))
+		return;
+
+	vmx_write_tsc_offset(vcpu);
+}
+
+static void vt_write_tsc_multiplier(struct kvm_vcpu *vcpu)
+{
+	/* In TDX, tsc multiplier can't be changed. */
+	if (is_td_vcpu(vcpu))
+		return;
+
+	vmx_write_tsc_multiplier(vcpu);
+}
+
 #ifdef CONFIG_X86_64
 static int vt_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc,
 			      bool *expired)
@@ -914,10 +950,10 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 
 	.has_wbinvd_exit = cpu_has_vmx_wbinvd_exit,
 
-	.get_l2_tsc_offset = vmx_get_l2_tsc_offset,
-	.get_l2_tsc_multiplier = vmx_get_l2_tsc_multiplier,
-	.write_tsc_offset = vmx_write_tsc_offset,
-	.write_tsc_multiplier = vmx_write_tsc_multiplier,
+	.get_l2_tsc_offset = vt_get_l2_tsc_offset,
+	.get_l2_tsc_multiplier = vt_get_l2_tsc_multiplier,
+	.write_tsc_offset = vt_write_tsc_offset,
+	.write_tsc_multiplier = vt_write_tsc_multiplier,
 
 	.load_mmu_pgd = vt_load_mmu_pgd,
 

From patchwork Tue Dec 10 00:49:41 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Binbin Wu
X-Patchwork-Id: 13900584
From: Binbin Wu
To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org
Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com,
 reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@linux.intel.com,
 isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com,
 linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com
Subject: [PATCH 15/18] KVM: TDX: Ignore setting up mce
Date: Tue, 10 Dec 2024 08:49:41 +0800
Message-ID: <20241210004946.3718496-16-binbin.wu@linux.intel.com>
X-Mailer: git-send-email 2.46.0
In-Reply-To: <20241210004946.3718496-1-binbin.wu@linux.intel.com>
References: <20241210004946.3718496-1-binbin.wu@linux.intel.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org
List-Id: List-Subscribe: List-Unsubscribe:
MIME-Version: 1.0

From: Isaku Yamahata

The vmx_setup_mce() function is VMX specific and cannot be used for TDX.
Add a vt stub that ignores setting up MCE for TDX.

Signed-off-by: Isaku Yamahata
Signed-off-by: Binbin Wu
---
 arch/x86/kvm/vmx/main.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 81ca5acb9964..01ad3865d54f 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -814,6 +814,14 @@ static void vt_cancel_hv_timer(struct kvm_vcpu *vcpu)
 }
 #endif
 
+static void vt_setup_mce(struct kvm_vcpu *vcpu)
+{
+	if (is_td_vcpu(vcpu))
+		return;
+
+	vmx_setup_mce(vcpu);
+}
+
 static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 {
 	if (!is_td(kvm))
@@ -973,7 +981,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.cancel_hv_timer = vt_cancel_hv_timer,
 #endif
 
-	.setup_mce = vmx_setup_mce,
+	.setup_mce = vt_setup_mce,
 
 #ifdef CONFIG_KVM_SMM
 	.smi_allowed = vt_smi_allowed,

From patchwork Tue Dec 10 00:49:42 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Binbin Wu
X-Patchwork-Id: 13900585
From: Binbin Wu
To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org
Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com,
 reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@linux.intel.com,
 isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com,
 linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com
Subject: [PATCH 16/18] KVM: TDX: Add a method to ignore hypercall patching
Date: Tue, 10 Dec 2024 08:49:42 +0800
Message-ID: <20241210004946.3718496-17-binbin.wu@linux.intel.com>
X-Mailer: git-send-email 2.46.0
In-Reply-To: <20241210004946.3718496-1-binbin.wu@linux.intel.com>
References: <20241210004946.3718496-1-binbin.wu@linux.intel.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org
List-Id: List-Subscribe: List-Unsubscribe:
MIME-Version: 1.0

From: Isaku Yamahata

Because guest TD memory is protected, the VMM cannot patch the guest binary
to replace the hypercall instruction. Add a method to ignore hypercall
patching.

Note: the guest TD kernel needs to be modified to use TDG.VP.VMCALL for
hypercalls.

Signed-off-by: Isaku Yamahata
Signed-off-by: Binbin Wu
---
TDX "the rest" breakout:
- Renamed from
  "KVM: TDX: Add a method to ignore for TDX to ignore hypercall patch"
  to "KVM: TDX: Add a method to ignore hypercall patching".
- Dropped KVM_BUG_ON() in vt_patch_hypercall(). (Rick)
- Remove "with a warning" from "Add a method to ignore hypercall
  patching with a warning." in changelog to reflect code change.
---
 arch/x86/kvm/vmx/main.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 01ad3865d54f..81b9d2379a74 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -657,6 +657,19 @@ static u32 vt_get_interrupt_shadow(struct kvm_vcpu *vcpu)
 	return vmx_get_interrupt_shadow(vcpu);
 }
 
+static void vt_patch_hypercall(struct kvm_vcpu *vcpu,
+				  unsigned char *hypercall)
+{
+	/*
+	 * Because guest memory is protected, guest can't be patched. TD kernel
+	 * is modified to use TDG.VP.VMCALL for hypercall.
+	 */
+	if (is_td_vcpu(vcpu))
+		return;
+
+	vmx_patch_hypercall(vcpu, hypercall);
+}
+
 static void vt_inject_irq(struct kvm_vcpu *vcpu, bool reinjected)
 {
 	if (is_td_vcpu(vcpu))
@@ -921,7 +934,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.update_emulated_instruction = vmx_update_emulated_instruction,
 	.set_interrupt_shadow = vt_set_interrupt_shadow,
 	.get_interrupt_shadow = vt_get_interrupt_shadow,
-	.patch_hypercall = vmx_patch_hypercall,
+	.patch_hypercall = vt_patch_hypercall,
 	.inject_irq = vt_inject_irq,
 	.inject_nmi = vt_inject_nmi,
 	.inject_exception = vt_inject_exception,

From patchwork Tue Dec 10 00:49:43 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Binbin Wu
X-Patchwork-Id: 13900586
From: Binbin Wu
To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org
Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com,
 reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@linux.intel.com,
 isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com,
 linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com
Subject: [PATCH 17/18] KVM: TDX: Make TDX VM type supported
Date: Tue, 10 Dec 2024 08:49:43 +0800
Message-ID: <20241210004946.3718496-18-binbin.wu@linux.intel.com>
X-Mailer: git-send-email 2.46.0
In-Reply-To: <20241210004946.3718496-1-binbin.wu@linux.intel.com>
References: <20241210004946.3718496-1-binbin.wu@linux.intel.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org
List-Id: List-Subscribe: List-Unsubscribe:
MIME-Version: 1.0

From: Isaku Yamahata

Now that all the necessary code for TDX is in place, KVM is ready to run
TDX guests. Advertise the KVM_X86_TDX_VM VM type so that a user space VMM
such as QEMU can start to use it.

Signed-off-by: Isaku Yamahata
Signed-off-by: Binbin Wu
---
TDX "the rest" breakout:
- Move down to the end of patch series.
---
 arch/x86/kvm/vmx/main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 81b9d2379a74..d1f58f9552ea 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -1060,6 +1060,7 @@ static int __init vt_init(void)
 				  sizeof(struct vcpu_tdx));
 		vcpu_align = max_t(unsigned, vcpu_align,
 				   __alignof__(struct vcpu_tdx));
+		kvm_caps.supported_vm_types |= BIT(KVM_X86_TDX_VM);
 	}
 
 	/*

From patchwork Tue Dec 10 00:49:44 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Binbin Wu
X-Patchwork-Id: 13900587
E=Sophos;i="6.12,220,1728975600"; d="scan'208";a="44793799" Received: from orviesa008.jf.intel.com ([10.64.159.148]) by fmvoesa103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Dec 2024 16:48:54 -0800 X-CSE-ConnectionGUID: FIGGAXF/QTmA3ReAjkGiag== X-CSE-MsgGUID: J74XKwKXQ0K0kbqvaoqNyA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.12,220,1728975600"; d="scan'208";a="96033125" Received: from litbin-desktop.sh.intel.com ([10.239.156.93]) by orviesa008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Dec 2024 16:48:51 -0800 From: Binbin Wu To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@linux.intel.com, isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com Subject: [PATCH 18/18] Documentation/virt/kvm: Document on Trust Domain Extensions(TDX) Date: Tue, 10 Dec 2024 08:49:44 +0800 Message-ID: <20241210004946.3718496-19-binbin.wu@linux.intel.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20241210004946.3718496-1-binbin.wu@linux.intel.com> References: <20241210004946.3718496-1-binbin.wu@linux.intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata Add documentation to Intel Trusted Domain Extensions(TDX) support. 
Signed-off-by: Isaku Yamahata Signed-off-by: Binbin Wu --- TDX "the rest" breakout: - Updates to match code changes (Tony) --- Documentation/virt/kvm/api.rst | 9 +- Documentation/virt/kvm/x86/index.rst | 1 + Documentation/virt/kvm/x86/intel-tdx.rst | 357 +++++++++++++++++++++++ 3 files changed, 366 insertions(+), 1 deletion(-) create mode 100644 Documentation/virt/kvm/x86/intel-tdx.rst diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index bb39da72c647..c5da37565e1e 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -1394,6 +1394,9 @@ the memory region are automatically reflected into the guest. For example, an mmap() that affects the region will be made visible immediately. Another example is madvise(MADV_DROP). +For TDX guest, deleting/moving memory region loses guest memory contents. +Read only region isn't supported. Only as-id 0 is supported. + Note: On arm64, a write generated by the page-table walker (to update the Access and Dirty flags, for example) never results in a KVM_EXIT_MMIO exit when the slot has the KVM_MEM_READONLY flag. This @@ -4758,7 +4761,7 @@ H_GET_CPU_CHARACTERISTICS hypercall. :Capability: basic :Architectures: x86 -:Type: vm +:Type: vm ioctl, vcpu ioctl :Parameters: an opaque platform specific structure (in/out) :Returns: 0 on success; -1 on error @@ -4770,6 +4773,10 @@ Currently, this ioctl is used for issuing Secure Encrypted Virtualization (SEV) commands on AMD Processors. The SEV commands are defined in Documentation/virt/kvm/x86/amd-memory-encryption.rst. +Currently, this ioctl is used for issuing Trusted Domain Extensions +(TDX) commands on Intel Processors. The TDX commands are defined in +Documentation/virt/kvm/x86/intel-tdx.rst. 
+
 4.111 KVM_MEMORY_ENCRYPT_REG_REGION
 -----------------------------------
 
diff --git a/Documentation/virt/kvm/x86/index.rst b/Documentation/virt/kvm/x86/index.rst
index 9ece6b8dc817..851e99174762 100644
--- a/Documentation/virt/kvm/x86/index.rst
+++ b/Documentation/virt/kvm/x86/index.rst
@@ -11,6 +11,7 @@ KVM for x86 systems
    cpuid
    errata
    hypercalls
+   intel-tdx
    mmu
    msr
    nested-vmx
diff --git a/Documentation/virt/kvm/x86/intel-tdx.rst b/Documentation/virt/kvm/x86/intel-tdx.rst
new file mode 100644
index 000000000000..12531c4c09e1
--- /dev/null
+++ b/Documentation/virt/kvm/x86/intel-tdx.rst
@@ -0,0 +1,357 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===================================
+Intel Trust Domain Extensions (TDX)
+===================================
+
+Overview
+========
+TDX stands for Trust Domain Extensions, which isolate VMs from
+the virtual-machine manager (VMM)/hypervisor and any other software on
+the platform. For details, see the specifications [1]_, whitepaper [2]_,
+architectural extensions specification [3]_, module documentation [4]_,
+loader interface specification [5]_, guest-hypervisor communication
+interface [6]_, virtual firmware design guide [7]_, and other resources
+([8]_, [9]_, [10]_, [11]_, and [12]_).
+
+
+API description
+===============
+
+KVM_MEMORY_ENCRYPT_OP
+---------------------
+:Type: vm ioctl, vcpu ioctl
+
+For TDX operations, KVM_MEMORY_ENCRYPT_OP is repurposed as a generic ioctl
+with TDX-specific sub-ioctl commands.
+
+::
+
+  /* Trust Domain eXtension sub-ioctl() commands. */
+  enum kvm_tdx_cmd_id {
+          KVM_TDX_CAPABILITIES = 0,
+          KVM_TDX_INIT_VM,
+          KVM_TDX_INIT_VCPU,
+          KVM_TDX_INIT_MEM_REGION,
+          KVM_TDX_FINALIZE_VM,
+          KVM_TDX_GET_CPUID,
+
+          KVM_TDX_CMD_NR_MAX,
+  };
+
+  struct kvm_tdx_cmd {
+          /* enum kvm_tdx_cmd_id */
+          __u32 id;
+          /* flags for sub-command. If sub-command doesn't use this, set to zero. */
+          __u32 flags;
+          /*
+           * data for each sub-command. An immediate or a pointer to the actual
+           * data in process virtual address. If sub-command doesn't use it,
+           * set to zero.
+           */
+          __u64 data;
+          /*
+           * Auxiliary error code. The sub-command may return TDX SEAMCALL
+           * status code in addition to -Exxx.
+           * Defined for consistency with struct kvm_sev_cmd.
+           */
+          __u64 hw_error;
+  };
+
+KVM_TDX_CAPABILITIES
+--------------------
+:Type: vm ioctl
+
+Return a subset of TDSYSINFO_STRUCT retrieved by the TDH.SYS.INFO TDX SEAM
+call. It describes the Intel TDX module.
+
+- id: KVM_TDX_CAPABILITIES
+- flags: must be 0
+- data: pointer to struct kvm_tdx_capabilities
+- hw_error: must be 0
+- unused: must be 0
+
+::
+
+  struct kvm_tdx_capabilities {
+          __u64 supported_attrs;
+          __u64 supported_xfam;
+          __u64 reserved[254];
+          struct kvm_cpuid2 cpuid;
+  };
+
+
+KVM_TDX_INIT_VM
+---------------
+:Type: vm ioctl
+
+Does additional VM initialization specific to TDX, which corresponds to the
+TDH.MNG.INIT TDX SEAM call.
+
+- id: KVM_TDX_INIT_VM
+- flags: must be 0
+- data: pointer to struct kvm_tdx_init_vm
+- hw_error: must be 0
+- unused: must be 0
+
+::
+
+  struct kvm_tdx_init_vm {
+          __u64 attributes;
+          __u64 xfam;
+          __u64 mrconfigid[6];    /* sha384 digest */
+          __u64 mrowner[6];       /* sha384 digest */
+          __u64 mrownerconfig[6]; /* sha384 digest */
+
+          /* The total space for TD_PARAMS before the CPUIDs is 256 bytes */
+          __u64 reserved[12];
+
+          /*
+           * Call KVM_TDX_INIT_VM before vcpu creation, thus before
+           * KVM_SET_CPUID2.
+           * This configuration supersedes KVM_SET_CPUID2s for VCPUs because the
+           * TDX module directly virtualizes those CPUIDs without VMM. The user
+           * space VMM, e.g. qemu, should make KVM_SET_CPUID2 consistent with
+           * those values. If it doesn't, KVM may have wrong idea of vCPUIDs of
+           * the guest, and KVM may wrongly emulate CPUIDs or MSRs that the TDX
+           * module doesn't virtualize.
+           */
+          struct kvm_cpuid2 cpuid;
+  };
+
+
+KVM_TDX_INIT_VCPU
+-----------------
+:Type: vcpu ioctl
+
+Does additional VCPU initialization specific to TDX, which corresponds to the
+TDH.VP.INIT TDX SEAM call.
+
+- id: KVM_TDX_INIT_VCPU
+- flags: must be 0
+- data: initial value of the guest TD VCPU RCX
+- hw_error: must be 0
+- unused: must be 0
+
+KVM_TDX_INIT_MEM_REGION
+-----------------------
+:Type: vcpu ioctl
+
+Encrypt a contiguous memory region, which corresponds to the TDH.MEM.PAGE.ADD
+TDX SEAM call.
+If the KVM_TDX_MEASURE_MEMORY_REGION flag is specified, it also extends the
+measurement, which corresponds to the TDH.MR.EXTEND TDX SEAM call.
+
+- id: KVM_TDX_INIT_MEM_REGION
+- flags: flags
+  currently only KVM_TDX_MEASURE_MEMORY_REGION is defined
+- data: pointer to struct kvm_tdx_init_mem_region
+- hw_error: must be 0
+- unused: must be 0
+
+::
+
+  #define KVM_TDX_MEASURE_MEMORY_REGION   (1UL << 0)
+
+  struct kvm_tdx_init_mem_region {
+          __u64 source_addr;
+          __u64 gpa;
+          __u64 nr_pages;
+  };
+
+
+KVM_TDX_FINALIZE_VM
+-------------------
+:Type: vm ioctl
+
+Complete measurement of the initial TD contents and mark it ready to run,
+which corresponds to the TDH.MR.FINALIZE TDX SEAM call.
+
+- id: KVM_TDX_FINALIZE_VM
+- flags: must be 0
+- data: must be 0
+- hw_error: must be 0
+- unused: must be 0
+
+KVM TDX creation flow
+=====================
+In addition to the normal KVM flow, new TDX ioctls need to be called. The
+control flow is as follows.
+
+#. system wide capability check
+
+   * KVM_CAP_VM_TYPES: check if VM type is supported and if KVM_X86_TDX_VM
+     is supported.
+
+#. creating VM
+
+   * KVM_CREATE_VM
+   * KVM_TDX_CAPABILITIES: query if TDX is supported on the platform.
+   * KVM_ENABLE_CAP_VM(KVM_CAP_MAX_VCPUS): set max_vcpus. KVM_MAX_VCPUS by
+     default. KVM_MAX_VCPUS is not a part of the ABI, but a kernel internal
+     constant that is subject to change. Because max vcpus is a part of
+     attestation, max vcpus should be explicitly set.
+   * KVM_SET_TSC_KHZ for the VM (optional).
+   * KVM_TDX_INIT_VM: pass TDX specific VM parameters.
+
+#. creating VCPU
+
+   * KVM_CREATE_VCPU
+   * KVM_TDX_INIT_VCPU: pass TDX specific VCPU parameters.
+   * KVM_SET_CPUID2: Enable CPUID[0x1].ECX.X2APIC(bit 21)=1 so that the following
+     setting of MSR_IA32_APIC_BASE succeeds. Without this,
+     KVM_SET_MSRS(MSR_IA32_APIC_BASE) fails.
+   * KVM_SET_MSRS: Set the initial reset value of MSR_IA32_APIC_BASE to
+     APIC_DEFAULT_ADDRESS(0xfee00000) | XAPIC_ENABLE(bit 11) |
+     X2APIC_ENABLE(bit 10) [| MSR_IA32_APICBASE_BSP(bit 8) optional]
+
+#. initializing guest memory
+
+   * Allocate guest memory and initialize pages the same as in the normal KVM
+     case. In the TDX case, parse and load TDVF into guest memory in addition.
+   * KVM_TDX_INIT_MEM_REGION to add and measure guest pages.
+     If the pages have contents from the step above, those pages need to be
+     added. Otherwise the contents will be lost and the guest sees zero pages.
+   * KVM_TDX_FINALIZE_VM: Finalize VM and measurement
+     This must be after KVM_TDX_INIT_MEM_REGION.
+
+#. run vcpu
+
+Design discussion
+=================
+
+Coexistence of normal(VMX) VM and TD VM
+---------------------------------------
+Both legacy (normal VMX) VMs and new TD VMs must be allowed to coexist.
+Otherwise the benefits of VM flexibility would be eliminated.
+The main issue is that the logic of the kvm_x86_ops callbacks for
+TDX differs from that for VMX, while the variable kvm_x86_ops is a
+single global variable: it is neither per-VM nor per-vcpu.
+
+Several points to be considered:
+
+  * No or minimal overhead when TDX is disabled(CONFIG_INTEL_TDX_HOST=n).
+  * Avoid overhead of indirect call via function pointers.
+  * Contain the changes under the arch/x86/kvm/vmx directory and share logic
+    with VMX for maintenance.
+    Even though the ways to operate on a VM (VMX instruction vs TDX
+    SEAM call) are different, the basic idea remains the same. So, much
+    logic can be shared.
+  * Future maintenance
+    A huge change of kvm_x86_ops in the (near) future isn't expected.
+    So a centralized file is acceptable.
+
+- Wrapping kvm x86_ops: The current choice
+
+  Introduce a dedicated file, arch/x86/kvm/vmx/main.c (the name,
+  main.c, is just chosen to show main entry points for callbacks), and
+  wrapper functions around all the callbacks with
+  "if (is-tdx) tdx-callback() else vmx-callback()".
+
+  Pros:
+
+  - No major change in common x86 KVM code. The change is (mostly)
+    contained under arch/x86/kvm/vmx/.
+  - When TDX is disabled(CONFIG_INTEL_TDX_HOST=n), the overhead is
+    optimized out.
+  - Micro optimization by avoiding function pointers.
+
+  Cons:
+
+  - Much boilerplate in arch/x86/kvm/vmx/main.c.
+
+KVM MMU Changes
+---------------
+KVM MMU needs to be enhanced to handle Secure/Shared-EPT. The
+high-level execution flow is mostly the same as the normal EPT case:
+EPT violation/misconfiguration -> invoke TDP fault handler ->
+resolve TDP fault -> resume execution (or emulate MMIO).
+The difference is that S-EPT is operated on (read/write) via expensive TDX SEAM
+calls instead of direct reads/writes of EPT entries.
+One bit of the GPA (bit 51 or 47) is repurposed so that it means shared
+with the host (if set to 1) or private to the TD (if cleared to 0).
+
+- The current implementation
+
+  * Reuse the existing MMU code with minimal updates, because the
+    execution flow is mostly the same. An additional operation, a TDX call
+    for S-EPT, is needed, so add hooks for it to kvm_x86_ops.
+  * For performance, minimize TDX SEAM calls that operate on S-EPT. When
+    getting the corresponding S-EPT pages/entry from a faulting GPA, don't
+    use a TDX SEAM call to read the S-EPT entry. Instead create a shadow copy
+    in host memory.
+    Repurpose the existing kvm_mmu_page as the shadow copy of S-EPT and
+    associate S-EPT with it.
+  * Treat the shared bit as an attribute. Mask/unmask the bit where
+    necessary to keep the existing traversal code working.
+    Introduce kvm.arch.gfn_shared_mask and use "if (gfn_shared_mask)"
+    for the special case.
+
+    * 0 : for non-TDX case
+    * 51 or 47 bit set for TDX case.
+
+  Pros:
+
+  - Large code reuse with minimal new hooks.
+  - The execution path is the same.
+
+  Cons:
+
+  - Complicates the existing code.
+  - Repurposing kvm_mmu_page as a shadow of Secure-EPT can be confusing.
+
+New KVM API, ioctl (sub)command, to manage TD VMs
+-------------------------------------------------
+Additional KVM APIs are needed to control TD VMs. The operations on TD
+VMs are specific to TDX.
+
+- Piggyback and repurpose KVM_MEMORY_ENCRYPT_OP
+
+  Although operations for TD VMs aren't necessarily related to memory
+  encryption, define sub operations of KVM_MEMORY_ENCRYPT_OP for TDX specific
+  ioctls.
+
+  Pros:
+
+  - No major change in common x86 KVM code.
+  - Follows the SEV case.
+
+  Cons:
+
+  - The sub operations of KVM_MEMORY_ENCRYPT_OP aren't necessarily memory
+    encryption, but operations on TD VMs.
+
+References
+==========
+
+.. [1] TDX specification
+   https://software.intel.com/content/www/us/en/develop/articles/intel-trust-domain-extensions.html
+.. [2] Intel Trust Domain Extensions (Intel TDX)
+   https://software.intel.com/content/dam/develop/external/us/en/documents/tdx-whitepaper-final9-17.pdf
+.. [3] Intel CPU Architectural Extensions Specification
+   https://software.intel.com/content/dam/develop/external/us/en/documents/intel-tdx-cpu-architectural-specification.pdf
+.. [4] Intel TDX Module 1.0 EAS
+   https://software.intel.com/content/dam/develop/external/us/en/documents/intel-tdx-module-1eas.pdf
+.. [5] Intel TDX Loader Interface Specification
+   https://software.intel.com/content/dam/develop/external/us/en/documents/intel-tdx-seamldr-interface-specification.pdf
+.. [6] Intel TDX Guest-Hypervisor Communication Interface
+   https://software.intel.com/content/dam/develop/external/us/en/documents/intel-tdx-guest-hypervisor-communication-interface.pdf
+.. [7] Intel TDX Virtual Firmware Design Guide
+   https://software.intel.com/content/dam/develop/external/us/en/documents/tdx-virtual-firmware-design-guide-rev-1.
+.. [8] intel public github
+
+   * kvm TDX branch: https://github.com/intel/tdx/tree/kvm
+   * TDX guest branch: https://github.com/intel/tdx/tree/guest
+
+.. [9] tdvf
+   https://github.com/tianocore/edk2-staging/tree/TDVF
+.. [10] KVM forum 2020: Intel Virtualization Technology Extensions to
+   Enable Hardware Isolated VMs
+   https://osseu2020.sched.com/event/eDzm/intel-virtualization-technology-extensions-to-enable-hardware-isolated-vms-sean-christopherson-intel
+.. [11] Linux Security Summit EU 2020:
+   Architectural Extensions for Hardware Virtual Machine Isolation
+   to Advance Confidential Computing in Public Clouds - Ravi Sahita
+   & Jun Nakajima, Intel Corporation
+   https://osseu2020.sched.com/event/eDOx/architectural-extensions-for-hardware-virtual-machine-isolation-to-advance-confidential-computing-in-public-clouds-ravi-sahita-jun-nakajima-intel-corporation
+.. [12] [RFCv2,00/16] KVM protected memory extension
+   https://lore.kernel.org/all/20201020061859.18385-1-kirill.shutemov@linux.intel.com/