From patchwork Mon Jan 13 02:10:50 2025
X-Patchwork-Submitter: Yan Zhao
X-Patchwork-Id: 13936657
From: Yan Zhao
To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, rick.p.edgecombe@intel.com,
    kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com,
    xiaoyao.li@intel.com, tony.lindgren@intel.com, binbin.wu@linux.intel.com,
    dmatlack@google.com, isaku.yamahata@intel.com, isaku.yamahata@gmail.com
Subject: [PATCH 1/7] KVM: TDX: Return -EBUSY when tdh_mem_page_add() encounters TDX_OPERAND_BUSY
Date: Mon, 13 Jan 2025 10:10:50 +0800
Message-ID: <20250113021050.18828-1-yan.y.zhao@intel.com>
In-Reply-To: <20250113020925.18789-1-yan.y.zhao@intel.com>
References: <20250113020925.18789-1-yan.y.zhao@intel.com>
tdh_mem_page_add() is called during TD build time. Within the TDX module,
it acquires the exclusive lock on the TDR resource (eliminating the need to
hold locks for the TDCS/SEPT tree) and the exclusive lock on the PAMT entry
for the page to be added. The TDX module returns TDX_OPERAND_BUSY if
tdh_mem_page_add() contends with other SEAMCALLs.

SEAMCALL                Lock Type    Resource
-----------------------------------------------------------------------
tdh_mem_page_add        EXCLUSIVE    TDR
                        NO_LOCK      TDCS
                        NO_LOCK      SEPT tree
                        EXCLUSIVE    PAMT entry for the page to add

Given (1)-(4) and the expected behavior from userspace (5), KVM doesn't
expect tdh_mem_page_add() to encounter TDX_OPERAND_BUSY:

(1) tdx_vcpu_create() only allows vCPU creation when the TD state is
    TD_STATE_INITIALIZED, so tdh_mem_page_add(), as invoked in a vCPU
    ioctl, does not contend with
    tdh_mng_create()/tdh_mng_addcx()/tdh_mng_key_config()/tdh_mng_init().

(2) tdx_vcpu_ioctl() bails out on TD_STATE_RUNNABLE, so tdh_mem_page_add()
    does not contend with tdh_vp_enter()/tdh_mem_page_aug()/tdh_mem_track()
    and TDCALLs.

(3) By holding slots_lock and the filemap invalidate lock,
    tdh_mem_page_add() does not contend with tdh_mr_finalize(),
    tdh_mem_page_remove()/tdh_mem_range_block()/
    tdh_phymem_page_wbinvd_hkid(), or another tdh_mem_page_add()/
    tdh_mem_sept_add()/tdh_mr_extend().

(4) By holding a reference to kvm, tdh_mem_page_add() does not contend with
    tdh_mng_vpflushdone()/tdh_phymem_cache_wb()/tdh_mng_key_freeid()/
    tdh_phymem_page_wbinvd_tdr()/tdh_phymem_page_reclaim().

(5) A well-behaved userspace invokes the ioctl KVM_TDX_INIT_MEM_REGION on
    one vCPU after initializing all vCPUs and does not invoke ioctls on the
    other vCPUs before KVM_TDX_INIT_MEM_REGION completes. Thus,
    tdh_mem_page_add() does not contend with
    tdh_vp_create()/tdh_vp_addcx()/tdh_vp_init*()/tdh_vp_rd()/tdh_vp_wr()/
    tdh_mng_rd()/tdh_vp_flush() on the other vCPUs.
However, if userspace breaks (5), tdh_mem_page_add() could encounter
TDX_OPERAND_BUSY when trying to acquire the exclusive lock on the TDR
resource in the TDX module. In this case, simply return -EBUSY.

Signed-off-by: Yan Zhao
---
tdx_vcpu_pre_run() will check TD_STATE_RUNNABLE for (2).
https://lore.kernel.org/kvm/3576c721-3ef2-40bd-8764-b50912df93a2@intel.com/
---
 arch/x86/kvm/vmx/tdx.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index d0dc3200fa37..1cf3ef0faff7 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -3024,13 +3024,11 @@ static int tdx_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
 	}
 
 	ret = 0;
-	do {
-		err = tdh_mem_page_add(kvm_tdx->tdr_pa, gpa, pfn_to_hpa(pfn),
-				       pfn_to_hpa(page_to_pfn(page)),
-				       &entry, &level_state);
-	} while (err == TDX_ERROR_SEPT_BUSY);
+	err = tdh_mem_page_add(kvm_tdx->tdr_pa, gpa, pfn_to_hpa(pfn),
+			       pfn_to_hpa(page_to_pfn(page)),
+			       &entry, &level_state);
 	if (err) {
-		ret = -EIO;
+		ret = unlikely(err & TDX_OPERAND_BUSY) ? -EBUSY : -EIO;
 		goto out;
 	}

From patchwork Mon Jan 13 02:11:38 2025
X-Patchwork-Submitter: Yan Zhao
X-Patchwork-Id: 13936658
From: Yan Zhao
To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, rick.p.edgecombe@intel.com,
    kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com,
    xiaoyao.li@intel.com, tony.lindgren@intel.com, binbin.wu@linux.intel.com,
    dmatlack@google.com, isaku.yamahata@intel.com, isaku.yamahata@gmail.com
Subject: [PATCH 2/7] KVM: x86/mmu: Return RET_PF* instead of 1 in kvm_mmu_page_fault()
Date: Mon, 13 Jan 2025 10:11:38 +0800
Message-ID: <20250113021138.18875-1-yan.y.zhao@intel.com>
In-Reply-To: <20250113020925.18789-1-yan.y.zhao@intel.com>
References: <20250113020925.18789-1-yan.y.zhao@intel.com>
Return RET_PF* (excluding RET_PF_EMULATE/RET_PF_CONTINUE/RET_PF_INVALID)
instead of 1 in kvm_mmu_page_fault().

The callers of kvm_mmu_page_fault() are KVM page fault handlers (i.e.,
npf_interception(), handle_ept_misconfig(), __vmx_handle_ept_violation(),
kvm_handle_page_fault()). They either check if the return value is > 0 (as
in npf_interception()) or pass it further to vcpu_run() to decide whether
to break out of the kernel loop and return to the user when r <= 0.
Therefore, returning any positive value is equivalent to returning 1. Warn
if r == RET_PF_CONTINUE (which should not be a valid value) to ensure a
positive return value.

This is a preparation to allow TDX's EPT violation handler to check the
RET_PF* value and retry internally for RET_PF_RETRY.

No functional changes are intended.

Signed-off-by: Yan Zhao
---
 arch/x86/kvm/mmu/mmu.c          | 10 +++++++++-
 arch/x86/kvm/mmu/mmu_internal.h | 12 +++++++++---
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index eedc6ff37b89..53dcf600e934 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6120,8 +6120,16 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
 	else if (r == RET_PF_SPURIOUS)
 		vcpu->stat.pf_spurious++;
 
+	/*
+	 * None of handle_mmio_page_fault(), kvm_mmu_do_page_fault(), or
+	 * kvm_mmu_write_protect_fault() return RET_PF_CONTINUE.
+	 * kvm_mmu_do_page_fault() only uses RET_PF_CONTINUE internally to
+	 * indicate continuing the page fault handling until the final
+	 * page table mapping phase.
+	 */
+	WARN_ON_ONCE(r == RET_PF_CONTINUE);
 	if (r != RET_PF_EMULATE)
-		return 1;
+		return r;
 
 emulate:
 	return x86_emulate_instruction(vcpu, cr2_or_gpa, emulation_type, insn,
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 957636660149..4fde91cade1b 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -315,9 +315,7 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
 * tracepoints via TRACE_DEFINE_ENUM() in mmutrace.h
 *
 * Note, all values must be greater than or equal to zero so as not to encroach
- * on -errno return values.  Somewhat arbitrarily use '0' for CONTINUE, which
- * will allow for efficient machine code when checking for CONTINUE, e.g.
- * "TEST %rax, %rax, JNZ", as all "stop!" values are non-zero.
+ * on -errno return values.
 */
 enum {
 	RET_PF_CONTINUE = 0,
@@ -329,6 +327,14 @@ enum {
 	RET_PF_SPURIOUS,
 };
 
+/*
+ * Define RET_PF_CONTINUE as 0 to allow for
+ * - efficient machine code when checking for CONTINUE, e.g.
+ *   "TEST %rax, %rax, JNZ", as all "stop!" values are non-zero,
+ * - kvm_mmu_do_page_fault() to return other RET_PF_* as a positive value.
+ */
+static_assert(RET_PF_CONTINUE == 0);
+
 static inline void kvm_mmu_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
 						     struct kvm_page_fault *fault)
 {

From patchwork Mon Jan 13 02:12:18 2025
X-Patchwork-Submitter: Yan Zhao
X-Patchwork-Id: 13936659
From: Yan Zhao
To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, rick.p.edgecombe@intel.com,
    kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com,
    xiaoyao.li@intel.com, tony.lindgren@intel.com, binbin.wu@linux.intel.com,
    dmatlack@google.com, isaku.yamahata@intel.com, isaku.yamahata@gmail.com
Subject: [PATCH 3/7] KVM: TDX: Retry locally in TDX EPT violation handler on RET_PF_RETRY
Date: Mon, 13 Jan 2025 10:12:18 +0800
Message-ID: <20250113021218.18922-1-yan.y.zhao@intel.com>
In-Reply-To: <20250113020925.18789-1-yan.y.zhao@intel.com>
References: <20250113020925.18789-1-yan.y.zhao@intel.com>

Retry locally in the TDX EPT violation handler for private memory to
reduce the chances for tdh_mem_sept_add()/tdh_mem_page_aug() to contend
with tdh_vp_enter().

TDX EPT violation installs private pages via tdh_mem_sept_add() and
tdh_mem_page_aug(). The two may contend with tdh_vp_enter() or TDCALLs.

Resources    SHARED users         EXCLUSIVE users
------------------------------------------------------------
SEPT tree    tdh_mem_sept_add     tdh_vp_enter (0-step mitigation)
             tdh_mem_page_aug
------------------------------------------------------------
SEPT entry                        tdh_mem_sept_add (Host lock)
                                  tdh_mem_page_aug (Host lock)
                                  tdg_mem_page_accept (Guest lock)
                                  tdg_mem_page_attr_rd (Guest lock)
                                  tdg_mem_page_attr_wr (Guest lock)

Though the contention between tdh_mem_sept_add()/tdh_mem_page_aug() and
TDCALLs may be removed in a future TDX module, their contention with
tdh_vp_enter() due to 0-step mitigation still persists.

The TDX module may trigger 0-step mitigation in SEAMCALL TDH.VP.ENTER,
which works as follows:
0. Each TDH.VP.ENTER records the guest RIP on TD entry.
1. When the TDX module encounters a VM exit with reason EPT_VIOLATION, it
   checks if the guest RIP is the same as the last guest RIP on TD entry.
   - if yes, the EPT violation was caused by the same instruction that
     caused the last VM exit. The TDX module then increases the guest RIP
     no-progress count. When the count increases from 0 to the threshold
     (currently 6), the TDX module records the faulting GPA into a
     last_epf_gpa_list.
   - if no, the guest RIP has made progress, so the TDX module resets the
     RIP no-progress count and the last_epf_gpa_list.
2. On the next TDH.VP.ENTER, the TDX module (after saving the guest RIP on
   TD entry) checks if the last_epf_gpa_list is empty.
   - if yes, TD entry continues without acquiring the lock on the SEPT
     tree.
   - if no, it triggers the 0-step mitigation by acquiring the exclusive
     lock on the SEPT tree and walking the EPT tree to check that all page
     faults caused by the GPAs in the last_epf_gpa_list have been resolved
     before continuing TD entry.

Since the KVM TDP MMU usually re-enters the guest whenever it exits to
userspace (e.g. for KVM_EXIT_MEMORY_FAULT) or encounters a BUSY, it is
possible for tdh_vp_enter() to be called more than the threshold count
before a page fault is addressed, triggering contention when
tdh_vp_enter() attempts to acquire the exclusive lock on the SEPT tree.

Retry locally in the TDX EPT violation handler to reduce the number of
tdh_vp_enter() invocations, hence reducing the possibility of contention
with tdh_mem_sept_add()/tdh_mem_page_aug(). However, the 0-step mitigation
and the contention are still not eliminated due to KVM_EXIT_MEMORY_FAULT,
signals/interrupts, and cases where one instruction faults more GFNs than
the threshold count.

Signed-off-by: Yan Zhao
---
 arch/x86/kvm/vmx/tdx.c | 39 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 1cf3ef0faff7..bb9d914765fc 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1854,6 +1854,8 @@ static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu)
 {
 	gpa_t gpa = tdexit_gpa(vcpu);
 	unsigned long exit_qual;
+	bool local_retry = false;
+	int ret;
 
 	if (vt_is_tdx_private_gpa(vcpu->kvm, gpa)) {
 		if (tdx_is_sept_violation_unexpected_pending(vcpu)) {
@@ -1872,6 +1874,24 @@ static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu)
 		 * due to aliasing a single HPA to multiple GPAs.
 		 */
 		exit_qual = EPT_VIOLATION_ACC_WRITE;
+
+		/*
+		 * Mapping of private memory may return RET_PF_RETRY due to
+		 * SEAMCALL contention, e.g.
+		 * - TDH.MEM.PAGE.AUG/TDH.MEM.SEPT.ADD on the local vCPU may
+		 *   contend with TDH.VP.ENTER (due to 0-step mitigation)
+		 *   on a remote vCPU.
+		 * - TDH.MEM.PAGE.AUG/TDH.MEM.SEPT.ADD on the local vCPU may
+		 *   contend with TDG.MEM.PAGE.ACCEPT on a remote vCPU.
+		 *
+		 * Retry internally in TDX to prevent exacerbating the
+		 * activation of 0-step mitigation on the local vCPU.
+		 * However, despite these retries, the 0-step mitigation on
+		 * the local vCPU may still be triggered due to:
+		 * - Exiting on signals, interrupts.
+		 * - KVM_EXIT_MEMORY_FAULT.
+		 */
+		local_retry = true;
 	} else {
 		exit_qual = tdexit_exit_qual(vcpu);
 		/*
@@ -1884,7 +1904,24 @@ static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu)
 	}
 
 	trace_kvm_page_fault(vcpu, tdexit_gpa(vcpu), exit_qual);
-	return __vmx_handle_ept_violation(vcpu, tdexit_gpa(vcpu), exit_qual);
+
+	while (1) {
+		ret = __vmx_handle_ept_violation(vcpu, gpa, exit_qual);
+
+		if (ret != RET_PF_RETRY || !local_retry)
+			break;
+
+		/*
+		 * Break and keep the original return value.
+		 * Signal & irq handling will be done later in vcpu_run().
+		 */
+		if (signal_pending(current) || pi_has_pending_interrupt(vcpu) ||
+		    kvm_test_request(KVM_REQ_NMI, vcpu) || vcpu->arch.nmi_pending)
+			break;
+
+		cond_resched();
+	}
+	return ret;
 }
 
 int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath)

From patchwork Mon Jan 13 02:12:50 2025
X-Patchwork-Submitter: Yan Zhao
X-Patchwork-Id: 13936660
From: Yan Zhao
To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, rick.p.edgecombe@intel.com,
    kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com,
    xiaoyao.li@intel.com, tony.lindgren@intel.com, binbin.wu@linux.intel.com,
    dmatlack@google.com, isaku.yamahata@intel.com, isaku.yamahata@gmail.com
Subject: [PATCH 4/7] KVM: TDX: Kick off vCPUs when SEAMCALL is busy during TD page removal
Date: Mon, 13 Jan 2025 10:12:50 +0800
Message-ID: <20250113021250.18948-1-yan.y.zhao@intel.com>
In-Reply-To: <20250113020925.18789-1-yan.y.zhao@intel.com>
References: <20250113020925.18789-1-yan.y.zhao@intel.com>

Kick off all vCPUs and prevent tdh_vp_enter() from executing whenever
tdh_mem_range_block()/tdh_mem_track()/tdh_mem_page_remove() encounters
contention, since the page removal path does not expect errors and is less
sensitive to the performance penalty caused by kicking off vCPUs.

Although KVM has protected SEPT zap-related SEAMCALLs with kvm->mmu_lock,
KVM may still encounter TDX_OPERAND_BUSY due to contention in the TDX
module:
- tdh_mem_track() may contend with tdh_vp_enter().
- tdh_mem_range_block()/tdh_mem_page_remove() may contend with
  tdh_vp_enter() and TDCALLs.
Resources SHARED users EXCLUSIVE users ------------------------------------------------------------ TDCS epoch tdh_vp_enter tdh_mem_track ------------------------------------------------------------ SEPT tree tdh_mem_page_remove tdh_vp_enter (0-step mitigation) tdh_mem_range_block ------------------------------------------------------------ SEPT entry tdh_mem_range_block (Host lock) tdh_mem_page_remove (Host lock) tdg_mem_page_accept (Guest lock) tdg_mem_page_attr_rd (Guest lock) tdg_mem_page_attr_wr (Guest lock) Use a TDX specific per-VM flag wait_for_sept_zap along with KVM_REQ_OUTSIDE_GUEST_MODE to kick off vCPUs and prevent them from entering TD, thereby avoiding the potential contention. Apply the kick-off and no vCPU entering only after each SEAMCALL busy error to minimize the window of no TD entry, as the contention due to 0-step mitigation or TDCALLs is expected to be rare. Suggested-by: Sean Christopherson Signed-off-by: Yan Zhao --- arch/x86/kvm/vmx/tdx.c | 62 ++++++++++++++++++++++++++++++++++++------ arch/x86/kvm/vmx/tdx.h | 7 +++++ 2 files changed, 60 insertions(+), 9 deletions(-) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index bb9d914765fc..09677a4cd605 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -312,6 +312,26 @@ static void tdx_clear_page(unsigned long page_pa) __mb(); } +static void tdx_no_vcpus_enter_start(struct kvm *kvm) +{ + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); + + lockdep_assert_held_write(&kvm->mmu_lock); + + WRITE_ONCE(kvm_tdx->wait_for_sept_zap, true); + + kvm_make_all_cpus_request(kvm, KVM_REQ_OUTSIDE_GUEST_MODE); +} + +static void tdx_no_vcpus_enter_stop(struct kvm *kvm) +{ + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); + + lockdep_assert_held_write(&kvm->mmu_lock); + + WRITE_ONCE(kvm_tdx->wait_for_sept_zap, false); +} + /* TDH.PHYMEM.PAGE.RECLAIM is allowed only when destroying the TD. 
*/ static int __tdx_reclaim_page(hpa_t pa) { @@ -979,6 +999,14 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit) return EXIT_FASTPATH_NONE; } + /* + * Wait until retry of SEPT-zap-related SEAMCALL completes before + * allowing vCPU entry to avoid contention with tdh_vp_enter() and + * TDCALLs. + */ + if (unlikely(READ_ONCE(to_kvm_tdx(vcpu->kvm)->wait_for_sept_zap))) + return EXIT_FASTPATH_EXIT_HANDLED; + trace_kvm_entry(vcpu, force_immediate_exit); if (pi_test_on(&tdx->pi_desc)) { @@ -1647,15 +1675,23 @@ static int tdx_sept_drop_private_spte(struct kvm *kvm, gfn_t gfn, if (KVM_BUG_ON(!is_hkid_assigned(kvm_tdx), kvm)) return -EINVAL; - do { + /* + * When zapping private page, write lock is held. So no race condition + * with other vcpu sept operation. + * Race with TDH.VP.ENTER due to (0-step mitigation) and Guest TDCALLs. + */ + err = tdh_mem_page_remove(kvm_tdx->tdr_pa, gpa, tdx_level, &entry, + &level_state); + if ((err & TDX_OPERAND_BUSY)) { /* - * When zapping private page, write lock is held. So no race - * condition with other vcpu sept operation. Race only with - * TDH.VP.ENTER. + * The second retry is expected to succeed after kicking off all + * other vCPUs and prevent them from invoking TDH.VP.ENTER. 
*/ + tdx_no_vcpus_enter_start(kvm); err = tdh_mem_page_remove(kvm_tdx->tdr_pa, gpa, tdx_level, &entry, &level_state); - } while (unlikely(err == TDX_ERROR_SEPT_BUSY)); + tdx_no_vcpus_enter_stop(kvm); + } if (unlikely(kvm_tdx->state != TD_STATE_RUNNABLE && err == (TDX_EPT_WALK_FAILED | TDX_OPERAND_ID_RCX))) { @@ -1726,8 +1762,12 @@ static int tdx_sept_zap_private_spte(struct kvm *kvm, gfn_t gfn, WARN_ON_ONCE(level != PG_LEVEL_4K); err = tdh_mem_range_block(kvm_tdx->tdr_pa, gpa, tdx_level, &entry, &level_state); - if (unlikely(err == TDX_ERROR_SEPT_BUSY)) - return -EAGAIN; + if (unlikely(err & TDX_OPERAND_BUSY)) { + /* After no vCPUs enter, the second retry is expected to succeed */ + tdx_no_vcpus_enter_start(kvm); + err = tdh_mem_range_block(kvm_tdx->tdr_pa, gpa, tdx_level, &entry, &level_state); + tdx_no_vcpus_enter_stop(kvm); + } if (KVM_BUG_ON(err, kvm)) { pr_tdx_error_2(TDH_MEM_RANGE_BLOCK, err, entry, level_state); return -EIO; @@ -1770,9 +1810,13 @@ static void tdx_track(struct kvm *kvm) lockdep_assert_held_write(&kvm->mmu_lock); - do { + err = tdh_mem_track(kvm_tdx->tdr_pa); + if ((err & TDX_SEAMCALL_STATUS_MASK) == TDX_OPERAND_BUSY) { + /* After no vCPUs enter, the second retry is expected to succeed */ + tdx_no_vcpus_enter_start(kvm); err = tdh_mem_track(kvm_tdx->tdr_pa); - } while (unlikely((err & TDX_SEAMCALL_STATUS_MASK) == TDX_OPERAND_BUSY)); + tdx_no_vcpus_enter_stop(kvm); + } if (KVM_BUG_ON(err, kvm)) pr_tdx_error(TDH_MEM_TRACK, err); diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index 0833d1084331..e369a6f8721b 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -48,6 +48,13 @@ struct kvm_tdx { /* For KVM_TDX_INIT_MEM_REGION. */ atomic64_t nr_premapped; + + /* + * Prevent vCPUs from TD entry to ensure SEPT zap related SEAMCALLs do + * not contend with tdh_vp_enter() and TDCALLs. + * Set/unset is protected with kvm->mmu_lock. 
+	 */
+	bool wait_for_sept_zap;
 };
 
 /* TDX module vCPU states */

From patchwork Mon Jan 13 02:13:00 2025
X-Patchwork-Submitter: Yan Zhao
X-Patchwork-Id: 13936661
From: Yan Zhao
To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, rick.p.edgecombe@intel.com,
	kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com,
	xiaoyao.li@intel.com, tony.lindgren@intel.com, binbin.wu@linux.intel.com,
	dmatlack@google.com, isaku.yamahata@intel.com, isaku.yamahata@gmail.com
Subject: [PATCH 5/7] fixup! KVM: TDX: Implement hooks to propagate changes of
 TDP MMU mirror page table
Date: Mon, 13 Jan 2025 10:13:00 +0800
Message-ID: <20250113021301.18962-1-yan.y.zhao@intel.com>
In-Reply-To: <20250113020925.18789-1-yan.y.zhao@intel.com>
References: <20250113020925.18789-1-yan.y.zhao@intel.com>

Return -EBUSY instead of -EAGAIN when tdh_mem_sept_add() returns any
error with TDX_OPERAND_BUSY set.

Signed-off-by: Yan Zhao
---
 arch/x86/kvm/vmx/tdx.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 09677a4cd605..4fb9faca5db2 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1740,8 +1740,9 @@ int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn,
 	err = tdh_mem_sept_add(to_kvm_tdx(kvm)->tdr_pa, gpa, tdx_level, hpa,
 			       &entry, &level_state);
-	if (unlikely(err == TDX_ERROR_SEPT_BUSY))
-		return -EAGAIN;
+	if (unlikely(err & TDX_OPERAND_BUSY))
+		return -EBUSY;
+
 	if (KVM_BUG_ON(err, kvm)) {
 		pr_tdx_error_2(TDH_MEM_SEPT_ADD, err, entry, level_state);
 		return -EIO;

From patchwork Mon Jan 13 02:13:12 2025
X-Patchwork-Submitter: Yan Zhao
X-Patchwork-Id: 13936662
From: Yan Zhao
To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, rick.p.edgecombe@intel.com,
	kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com,
	xiaoyao.li@intel.com, tony.lindgren@intel.com, binbin.wu@linux.intel.com,
	dmatlack@google.com, isaku.yamahata@intel.com, isaku.yamahata@gmail.com
Subject: [PATCH 6/7] fixup! KVM: TDX: Implement hooks to propagate changes of
 TDP MMU mirror page table
Date: Mon, 13 Jan 2025 10:13:12 +0800
Message-ID: <20250113021312.18976-1-yan.y.zhao@intel.com>
In-Reply-To: <20250113020925.18789-1-yan.y.zhao@intel.com>
References: <20250113020925.18789-1-yan.y.zhao@intel.com>

Remove the retry loop around tdh_phymem_page_wbinvd_hkid().
tdh_phymem_page_wbinvd_hkid() only acquires the lock on the PAMT entry of
the page being invalidated, so it is not expected to encounter
TDX_OPERAND_BUSY.

Signed-off-by: Yan Zhao
---
 arch/x86/kvm/vmx/tdx.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 4fb9faca5db2..baabae95504b 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1712,14 +1712,7 @@ static int tdx_sept_drop_private_spte(struct kvm *kvm, gfn_t gfn,
 		return -EIO;
 	}
 
-	do {
-		/*
-		 * TDX_OPERAND_BUSY can happen on locking PAMT entry. Because
-		 * this page was removed above, other thread shouldn't be
-		 * repeatedly operating on this page. Just retry loop.
-		 */
-		err = tdh_phymem_page_wbinvd_hkid(hpa, (u16)kvm_tdx->hkid);
-	} while (unlikely(err == (TDX_OPERAND_BUSY | TDX_OPERAND_ID_RCX)));
+	err = tdh_phymem_page_wbinvd_hkid(hpa, (u16)kvm_tdx->hkid);
 
 	if (KVM_BUG_ON(err, kvm)) {
 		pr_tdx_error(TDH_PHYMEM_PAGE_WBINVD, err);

From patchwork Mon Jan 13 02:13:22 2025
X-Patchwork-Submitter: Yan Zhao
X-Patchwork-Id: 13936663
From: Yan Zhao
To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, rick.p.edgecombe@intel.com,
	kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com,
	xiaoyao.li@intel.com, tony.lindgren@intel.com, binbin.wu@linux.intel.com,
	dmatlack@google.com, isaku.yamahata@intel.com, isaku.yamahata@gmail.com
Subject: [PATCH 7/7] fixup!
 KVM: TDX: Implement TDX vcpu enter/exit path
Date: Mon, 13 Jan 2025 10:13:22 +0800
Message-ID: <20250113021322.18991-1-yan.y.zhao@intel.com>
In-Reply-To: <20250113020925.18789-1-yan.y.zhao@intel.com>
References: <20250113020925.18789-1-yan.y.zhao@intel.com>

Warn on force_immediate_exit in tdx_vcpu_run().

force_immediate_exit requires entering the vCPU to inject events, followed
by an immediate exit. But the TDX module doesn't guarantee entry: it's
already possible for KVM to _think_ it completely entered the guest
without actually having done so.

Since KVM never needs to force an immediate exit for TDX, and can't do
direct injection anyway, there's no need to implement force_immediate_exit,
i.e. to kick the vCPU and reinject events. Simply warn on
force_immediate_exit.

Suggested-by: Sean Christopherson
Signed-off-by: Yan Zhao
---
 arch/x86/kvm/vmx/tdx.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index baabae95504b..0e684f4683f2 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -999,6 +999,16 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
 		return EXIT_FASTPATH_NONE;
 	}
 
+	/*
+	 * force_immediate_exit requires entering the vCPU to inject events,
+	 * followed by an immediate exit. But the TDX module doesn't guarantee
+	 * entry: it's already possible for KVM to _think_ it completely
+	 * entered the guest without actually having done so.
+	 * Since KVM never needs to force an immediate exit for TDX, and can't
+	 * do direct injection, just warn on force_immediate_exit.
+	 */
+	WARN_ON_ONCE(force_immediate_exit);
+
 	/*
 	 * Wait until retry of SEPT-zap-related SEAMCALL completes before
 	 * allowing vCPU entry to avoid contention with tdh_vp_enter() and
 	 * TDCALLs.
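The pattern this series converges on — issue the SEPT-zap SEAMCALL once, and on TDX_OPERAND_BUSY block vCPU entries, retry once, then unblock — can be sketched in plain C. This is a simplified userspace model, not kernel code: `struct kvm_tdx_sim`, `fake_tdh_mem_range_block()`, and the counters are illustrative stand-ins, and the real implementation additionally kicks vCPUs out of the guest and relies on kvm->mmu_lock for the flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status bit, standing in for the real TDX_OPERAND_BUSY. */
#define TDX_SUCCESS       0x0ULL
#define TDX_OPERAND_BUSY  (1ULL << 62)

/* Simplified stand-in for struct kvm_tdx: just the gate flag. */
struct kvm_tdx_sim {
	bool wait_for_sept_zap;   /* set/cleared under mmu_lock in real code */
	int  entries_blocked;     /* vCPU entries refused by the gate */
};

/*
 * Model of the SEAMCALL: busy while vCPUs may still enter the TD,
 * guaranteed to succeed once entries are blocked.
 */
static uint64_t fake_tdh_mem_range_block(const struct kvm_tdx_sim *kvm_tdx)
{
	return kvm_tdx->wait_for_sept_zap ? TDX_SUCCESS : TDX_OPERAND_BUSY;
}

static void tdx_no_vcpus_enter_start(struct kvm_tdx_sim *kvm_tdx)
{
	/* Real code also kicks running vCPUs out of the guest here. */
	kvm_tdx->wait_for_sept_zap = true;
}

static void tdx_no_vcpus_enter_stop(struct kvm_tdx_sim *kvm_tdx)
{
	kvm_tdx->wait_for_sept_zap = false;
}

/* The "try once, then block entries and retry once" pattern. */
static uint64_t zap_with_single_retry(struct kvm_tdx_sim *kvm_tdx)
{
	uint64_t err = fake_tdh_mem_range_block(kvm_tdx);

	if (err & TDX_OPERAND_BUSY) {
		tdx_no_vcpus_enter_start(kvm_tdx);
		err = fake_tdh_mem_range_block(kvm_tdx);
		tdx_no_vcpus_enter_stop(kvm_tdx);
	}
	return err;
}

/* Model of tdx_vcpu_run()'s early wait_for_sept_zap check. */
static bool vcpu_entry_allowed(struct kvm_tdx_sim *kvm_tdx)
{
	if (kvm_tdx->wait_for_sept_zap) {
		kvm_tdx->entries_blocked++;
		return false;   /* EXIT_FASTPATH_EXIT_HANDLED in real code */
	}
	return true;
}
```

The single bounded retry replaces the old unbounded `do { } while (BUSY)` loops: because entries are blocked before the second attempt, the model's SEAMCALL cannot stay busy forever.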