From patchwork Tue Dec 10 00:49:37 2024
X-Patchwork-Submitter: Binbin Wu <binbin.wu@linux.intel.com>
X-Patchwork-Id: 13900580
From: Binbin Wu <binbin.wu@linux.intel.com>
To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org
Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com,
	reinette.chatre@intel.com, xiaoyao.li@intel.com,
	tony.lindgren@linux.intel.com, isaku.yamahata@intel.com,
	yan.y.zhao@intel.com, chao.gao@intel.com,
	linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com
Subject: [PATCH 11/18] KVM: TDX: Add methods to ignore accesses to CPU state
Date: Tue, 10 Dec 2024 08:49:37 +0800
Message-ID: <20241210004946.3718496-12-binbin.wu@linux.intel.com>
X-Mailer: git-send-email 2.46.0
In-Reply-To: <20241210004946.3718496-1-binbin.wu@linux.intel.com>
References: <20241210004946.3718496-1-binbin.wu@linux.intel.com>
X-Mailing-List: kvm@vger.kernel.org

From: Isaku Yamahata <isaku.yamahata@intel.com>

TDX protects TDX guest state from the VMM.  Implement access methods
for TDX guest state so that writes are ignored and reads return zero.

Because these methods can be called by KVM ioctls that set/get CPU
registers, they don't use KVM_BUG_ON().

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
TDX "the rest" breakout:
- Dropped KVM_BUG_ON() in vt_sync_dirty_debug_regs(). (Rick)
  Since the KVM_BUG_ON() is removed, change the changelog accordingly.
- Remove unnecessary wrappers. (Binbin)
---
 arch/x86/kvm/vmx/main.c    | 288 +++++++++++++++++++++++++++++++++----
 arch/x86/kvm/vmx/tdx.c     |  28 +++-
 arch/x86/kvm/vmx/x86_ops.h |   4 +
 3 files changed, 291 insertions(+), 29 deletions(-)
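[Editor's note, not part of the patch.] Every wrapper added below follows the
same guard-clause shape: bail out (or hand back a fixed value) when the vCPU
belongs to a TD, otherwise fall through to the existing vmx_* implementation.
The following is a self-contained userspace sketch of that shape, using
simplified stand-ins ("struct vcpu", the vmx_* stubs) rather than the real
KVM types, only to make the pattern easy to see and compile:

/* Standalone sketch (not kernel code) of the vt_* guard-clause pattern. */
#include <stdbool.h>
#include <stdio.h>

struct vcpu {
	bool is_td;		/* true for a TDX (TD) guest vCPU */
	unsigned long rflags;	/* stand-in for VMCS-backed state */
};

static bool is_td_vcpu(struct vcpu *vcpu)
{
	return vcpu->is_td;
}

/* Stand-ins for the real vmx_get_rflags()/vmx_set_rflags(). */
static unsigned long vmx_get_rflags(struct vcpu *vcpu)
{
	return vcpu->rflags;
}

static void vmx_set_rflags(struct vcpu *vcpu, unsigned long rflags)
{
	vcpu->rflags = rflags;
}

/* Reads of protected TD state return zero... */
static unsigned long vt_get_rflags(struct vcpu *vcpu)
{
	if (is_td_vcpu(vcpu))
		return 0;

	return vmx_get_rflags(vcpu);
}

/*
 * ...and writes are silently ignored.  No KVM_BUG_ON() here: per the
 * changelog, these paths are reachable from ordinary KVM ioctls that
 * set/get CPU registers, so hitting them for a TD is not a KVM bug.
 */
static void vt_set_rflags(struct vcpu *vcpu, unsigned long rflags)
{
	if (is_td_vcpu(vcpu))
		return;

	vmx_set_rflags(vcpu, rflags);
}

int main(void)
{
	struct vcpu td = { .is_td = true, .rflags = 0x2 };

	vt_set_rflags(&td, 0x202);			/* ignored */
	printf("rflags = %#lx\n", vt_get_rflags(&td));	/* prints 0 */
	return 0;
}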
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index a224e9d32701..f6b449ae1ef7 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -333,6 +333,200 @@ static void vt_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
 	vmx_deliver_interrupt(apic, delivery_mode, trig_mode, vector);
 }
 
+static void vt_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
+{
+	if (is_td_vcpu(vcpu))
+		return;
+
+	vmx_vcpu_after_set_cpuid(vcpu);
+}
+
+static void vt_update_exception_bitmap(struct kvm_vcpu *vcpu)
+{
+	if (is_td_vcpu(vcpu))
+		return;
+
+	vmx_update_exception_bitmap(vcpu);
+}
+
+static u64 vt_get_segment_base(struct kvm_vcpu *vcpu, int seg)
+{
+	if (is_td_vcpu(vcpu))
+		return 0;
+
+	return vmx_get_segment_base(vcpu, seg);
+}
+
+static void vt_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var,
+			   int seg)
+{
+	if (is_td_vcpu(vcpu)) {
+		memset(var, 0, sizeof(*var));
+		return;
+	}
+
+	vmx_get_segment(vcpu, var, seg);
+}
+
+static void vt_set_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var,
+			   int seg)
+{
+	if (is_td_vcpu(vcpu))
+		return;
+
+	vmx_set_segment(vcpu, var, seg);
+}
+
+static int vt_get_cpl(struct kvm_vcpu *vcpu)
+{
+	if (is_td_vcpu(vcpu))
+		return 0;
+
+	return vmx_get_cpl(vcpu);
+}
+
+static void vt_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l)
+{
+	if (is_td_vcpu(vcpu)) {
+		*db = 0;
+		*l = 0;
+		return;
+	}
+
+	vmx_get_cs_db_l_bits(vcpu, db, l);
+}
+
+static bool vt_is_valid_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
+{
+	if (is_td_vcpu(vcpu))
+		return true;
+
+	return vmx_is_valid_cr0(vcpu, cr0);
+}
+
+static void vt_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
+{
+	if (is_td_vcpu(vcpu))
+		return;
+
+	vmx_set_cr0(vcpu, cr0);
+}
+
+static bool vt_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
+{
+	if (is_td_vcpu(vcpu))
+		return true;
+
+	return vmx_is_valid_cr4(vcpu, cr4);
+}
+
+static void vt_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
+{
+	if (is_td_vcpu(vcpu))
+		return;
+
+	vmx_set_cr4(vcpu, cr4);
+}
+
+static int vt_set_efer(struct kvm_vcpu *vcpu, u64 efer)
+{
+	if (is_td_vcpu(vcpu))
+		return 0;
+
+	return vmx_set_efer(vcpu, efer);
+}
+
+static void vt_get_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt)
+{
+	if (is_td_vcpu(vcpu)) {
+		memset(dt, 0, sizeof(*dt));
+		return;
+	}
+
+	vmx_get_idt(vcpu, dt);
+}
+
+static void vt_set_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt)
+{
+	if (is_td_vcpu(vcpu))
+		return;
+
+	vmx_set_idt(vcpu, dt);
+}
+
+static void vt_get_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt)
+{
+	if (is_td_vcpu(vcpu)) {
+		memset(dt, 0, sizeof(*dt));
+		return;
+	}
+
+	vmx_get_gdt(vcpu, dt);
+}
+
+static void vt_set_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt)
+{
+	if (is_td_vcpu(vcpu))
+		return;
+
+	vmx_set_gdt(vcpu, dt);
+}
+
+static void vt_set_dr7(struct kvm_vcpu *vcpu, unsigned long val)
+{
+	if (is_td_vcpu(vcpu))
+		return;
+
+	vmx_set_dr7(vcpu, val);
+}
+
+static void vt_sync_dirty_debug_regs(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * MOV-DR exiting is always cleared for TD guest, even in debug mode.
+	 * Thus KVM_DEBUGREG_WONT_EXIT can never be set and it should never
+	 * reach here for TD vcpu.
+	 */
+	if (is_td_vcpu(vcpu))
+		return;
+
+	vmx_sync_dirty_debug_regs(vcpu);
+}
+
+static void vt_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
+{
+	if (is_td_vcpu(vcpu)) {
+		tdx_cache_reg(vcpu, reg);
+		return;
+	}
+
+	vmx_cache_reg(vcpu, reg);
+}
+
+static unsigned long vt_get_rflags(struct kvm_vcpu *vcpu)
+{
+	if (is_td_vcpu(vcpu))
+		return 0;
+
+	return vmx_get_rflags(vcpu);
+}
+
+static void vt_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
+{
+	if (is_td_vcpu(vcpu))
+		return;
+
+	vmx_set_rflags(vcpu, rflags);
+}
+
+static bool vt_get_if_flag(struct kvm_vcpu *vcpu)
+{
+	if (is_td_vcpu(vcpu))
+		return false;
+
+	return vmx_get_if_flag(vcpu);
+}
+
 static void vt_flush_tlb_all(struct kvm_vcpu *vcpu)
 {
 	if (is_td_vcpu(vcpu)) {
@@ -455,6 +649,14 @@ static void vt_inject_irq(struct kvm_vcpu *vcpu, bool reinjected)
 	vmx_inject_irq(vcpu, reinjected);
 }
 
+static void vt_inject_exception(struct kvm_vcpu *vcpu)
+{
+	if (is_td_vcpu(vcpu))
+		return;
+
+	vmx_inject_exception(vcpu);
+}
+
 static void vt_cancel_injection(struct kvm_vcpu *vcpu)
 {
 	if (is_td_vcpu(vcpu))
@@ -491,6 +693,14 @@ static void vt_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason,
 	vmx_get_exit_info(vcpu, reason, info1, info2, intr_info, error_code);
 }
 
+static void vt_update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
+{
+	if (is_td_vcpu(vcpu))
+		return;
+
+	vmx_update_cr8_intercept(vcpu, tpr, irr);
+}
+
 static void vt_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
 {
 	if (is_td_vcpu(vcpu))
@@ -507,6 +717,30 @@ static void vt_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
 	vmx_refresh_apicv_exec_ctrl(vcpu);
 }
 
+static void vt_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
+{
+	if (is_td_vcpu(vcpu))
+		return;
+
+	vmx_load_eoi_exitmap(vcpu, eoi_exit_bitmap);
+}
+
+static int vt_set_tss_addr(struct kvm *kvm, unsigned int addr)
+{
+	if (is_td(kvm))
+		return 0;
+
+	return vmx_set_tss_addr(kvm, addr);
+}
+
+static int vt_set_identity_map_addr(struct kvm *kvm, u64 ident_addr)
+{
+	if (is_td(kvm))
+		return 0;
+
+	return vmx_set_identity_map_addr(kvm, ident_addr);
+}
+
 static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 {
 	if (!is_td(kvm))
@@ -569,30 +803,30 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.vcpu_load = vt_vcpu_load,
 	.vcpu_put = vt_vcpu_put,
 
-	.update_exception_bitmap = vmx_update_exception_bitmap,
+	.update_exception_bitmap = vt_update_exception_bitmap,
 	.get_feature_msr = vmx_get_feature_msr,
 	.get_msr = vt_get_msr,
 	.set_msr = vt_set_msr,
-	.get_segment_base = vmx_get_segment_base,
-	.get_segment = vmx_get_segment,
-	.set_segment = vmx_set_segment,
-	.get_cpl = vmx_get_cpl,
-	.get_cs_db_l_bits = vmx_get_cs_db_l_bits,
-	.is_valid_cr0 = vmx_is_valid_cr0,
-	.set_cr0 = vmx_set_cr0,
-	.is_valid_cr4 = vmx_is_valid_cr4,
-	.set_cr4 = vmx_set_cr4,
-	.set_efer = vmx_set_efer,
-	.get_idt = vmx_get_idt,
-	.set_idt = vmx_set_idt,
-	.get_gdt = vmx_get_gdt,
-	.set_gdt = vmx_set_gdt,
-	.set_dr7 = vmx_set_dr7,
-	.sync_dirty_debug_regs = vmx_sync_dirty_debug_regs,
-	.cache_reg = vmx_cache_reg,
-	.get_rflags = vmx_get_rflags,
-	.set_rflags = vmx_set_rflags,
-	.get_if_flag = vmx_get_if_flag,
+	.get_segment_base = vt_get_segment_base,
+	.get_segment = vt_get_segment,
+	.set_segment = vt_set_segment,
+	.get_cpl = vt_get_cpl,
+	.get_cs_db_l_bits = vt_get_cs_db_l_bits,
+	.is_valid_cr0 = vt_is_valid_cr0,
+	.set_cr0 = vt_set_cr0,
+	.is_valid_cr4 = vt_is_valid_cr4,
+	.set_cr4 = vt_set_cr4,
+	.set_efer = vt_set_efer,
+	.get_idt = vt_get_idt,
+	.set_idt = vt_set_idt,
+	.get_gdt = vt_get_gdt,
+	.set_gdt = vt_set_gdt,
+	.set_dr7 = vt_set_dr7,
+	.sync_dirty_debug_regs = vt_sync_dirty_debug_regs,
+	.cache_reg = vt_cache_reg,
+	.get_rflags = vt_get_rflags,
+	.set_rflags = vt_set_rflags,
+	.get_if_flag = vt_get_if_flag,
 
 	.flush_tlb_all = vt_flush_tlb_all,
 	.flush_tlb_current = vt_flush_tlb_current,
@@ -609,7 +843,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.patch_hypercall = vmx_patch_hypercall,
 	.inject_irq = vt_inject_irq,
 	.inject_nmi = vt_inject_nmi,
-	.inject_exception = vmx_inject_exception,
+	.inject_exception = vt_inject_exception,
 	.cancel_injection = vt_cancel_injection,
 	.interrupt_allowed = vt_interrupt_allowed,
 	.nmi_allowed = vt_nmi_allowed,
@@ -617,13 +851,13 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.set_nmi_mask = vt_set_nmi_mask,
 	.enable_nmi_window = vt_enable_nmi_window,
 	.enable_irq_window = vt_enable_irq_window,
-	.update_cr8_intercept = vmx_update_cr8_intercept,
+	.update_cr8_intercept = vt_update_cr8_intercept,
 
 	.x2apic_icr_is_split = false,
 	.set_virtual_apic_mode = vt_set_virtual_apic_mode,
 	.set_apic_access_page_addr = vt_set_apic_access_page_addr,
 	.refresh_apicv_exec_ctrl = vt_refresh_apicv_exec_ctrl,
-	.load_eoi_exitmap = vmx_load_eoi_exitmap,
+	.load_eoi_exitmap = vt_load_eoi_exitmap,
 	.apicv_pre_state_restore = vt_apicv_pre_state_restore,
 	.required_apicv_inhibits = VMX_REQUIRED_APICV_INHIBITS,
 	.hwapic_irr_update = vt_hwapic_irr_update,
@@ -633,13 +867,13 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.dy_apicv_has_pending_interrupt = pi_has_pending_interrupt,
 	.protected_apic_has_interrupt = tdx_protected_apic_has_interrupt,
 
-	.set_tss_addr = vmx_set_tss_addr,
-	.set_identity_map_addr = vmx_set_identity_map_addr,
+	.set_tss_addr = vt_set_tss_addr,
+	.set_identity_map_addr = vt_set_identity_map_addr,
 	.get_mt_mask = vmx_get_mt_mask,
 
 	.get_exit_info = vt_get_exit_info,
 
-	.vcpu_after_set_cpuid = vmx_vcpu_after_set_cpuid,
+	.vcpu_after_set_cpuid = vt_vcpu_after_set_cpuid,
 
 	.has_wbinvd_exit = cpu_has_vmx_wbinvd_exit,
 
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 413359741085..4bf3a6dc66fc 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -733,8 +733,15 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu)
 	vcpu->arch.tsc_offset = kvm_tdx->tsc_offset;
 	vcpu->arch.l1_tsc_offset = vcpu->arch.tsc_offset;
 
-	vcpu->arch.guest_state_protected =
-		!(to_kvm_tdx(vcpu->kvm)->attributes & TDX_TD_ATTR_DEBUG);
+	/*
+	 * TODO: support off-TD debug.  If TD DEBUG is enabled, guest state can
+	 * be accessed, guest_state_protected should be false, and KVM ioctls
+	 * to access CPU state should be usable for the userspace VMM (e.g. QEMU).
+	 *
+	 * vcpu->arch.guest_state_protected =
+	 *	!(to_kvm_tdx(vcpu->kvm)->attributes & TDX_TD_ATTR_DEBUG);
+	 */
+	vcpu->arch.guest_state_protected = true;
 
 	if ((kvm_tdx->xfam & XFEATURE_MASK_XTILE) == XFEATURE_MASK_XTILE)
 		vcpu->arch.xfd_no_write_intercept = true;
@@ -2134,6 +2141,23 @@ int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 	}
 }
 
+void tdx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
+{
+	kvm_register_mark_available(vcpu, reg);
+	switch (reg) {
+	case VCPU_REGS_RSP:
+	case VCPU_REGS_RIP:
+	case VCPU_EXREG_PDPTR:
+	case VCPU_EXREG_CR0:
+	case VCPU_EXREG_CR3:
+	case VCPU_EXREG_CR4:
+		break;
+	default:
+		KVM_BUG_ON(1, vcpu->kvm);
+		break;
+	}
+}
+
 static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd)
 {
 	const struct tdx_sys_info_td_conf *td_conf = &tdx_sysinfo->td_conf;
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index 7fb1bbf12b39..5b79bb72b13a 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -146,6 +146,8 @@ bool tdx_has_emulated_msr(u32 index);
 int tdx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr);
 int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr);
 
+void tdx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg);
+
 int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp);
 
 int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn,
@@ -193,6 +195,8 @@ static inline bool tdx_has_emulated_msr(u32 index) { return false; }
 static inline int tdx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) { return 1; }
 static inline int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) { return 1; }
 
+static inline void tdx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg) {}
+
 static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; }
 
 static inline int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn,