Message ID | 20250113021050.18828-1-yan.y.zhao@intel.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | KVM: TDX SEPT SEAMCALL retry | expand |
On Mon, 2025-01-13 at 10:10 +0800, Yan Zhao wrote: > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c > index d0dc3200fa37..1cf3ef0faff7 100644 > --- a/arch/x86/kvm/vmx/tdx.c > +++ b/arch/x86/kvm/vmx/tdx.c > @@ -3024,13 +3024,11 @@ static int tdx_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, > } > > ret = 0; > - do { > - err = tdh_mem_page_add(kvm_tdx->tdr_pa, gpa, pfn_to_hpa(pfn), > - pfn_to_hpa(page_to_pfn(page)), > - &entry, &level_state); > - } while (err == TDX_ERROR_SEPT_BUSY); > + err = tdh_mem_page_add(kvm_tdx->tdr_pa, gpa, pfn_to_hpa(pfn), > + pfn_to_hpa(page_to_pfn(page)), > + &entry, &level_state); > if (err) { > - ret = -EIO; > + ret = unlikely(err & TDX_OPERAND_BUSY) ? -EBUSY : -EIO; > goto out; > } Should we just squash this into "KVM: TDX: Add an ioctl to create initial guest memory"? I guess we get a little more specific log history on this corner as a separate patch, but seems strange to add and remove a loop before it even can get exercised.
On Wed, Jan 15, 2025 at 06:24:34AM +0800, Edgecombe, Rick P wrote: > On Mon, 2025-01-13 at 10:10 +0800, Yan Zhao wrote: > > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c > > index d0dc3200fa37..1cf3ef0faff7 100644 > > --- a/arch/x86/kvm/vmx/tdx.c > > +++ b/arch/x86/kvm/vmx/tdx.c > > @@ -3024,13 +3024,11 @@ static int tdx_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, > > } > > > > ret = 0; > > - do { > > - err = tdh_mem_page_add(kvm_tdx->tdr_pa, gpa, pfn_to_hpa(pfn), > > - pfn_to_hpa(page_to_pfn(page)), > > - &entry, &level_state); > > - } while (err == TDX_ERROR_SEPT_BUSY); > > + err = tdh_mem_page_add(kvm_tdx->tdr_pa, gpa, pfn_to_hpa(pfn), > > + pfn_to_hpa(page_to_pfn(page)), > > + &entry, &level_state); > > if (err) { > > - ret = -EIO; > > + ret = unlikely(err & TDX_OPERAND_BUSY) ? -EBUSY : -EIO; > > goto out; > > } > > Should we just squash this into "KVM: TDX: Add an ioctl to create initial guest > memory"? I guess we get a little more specific log history on this corner as a > separate patch, but seems strange to add and remove a loop before it even can > get exercised. No problem to me.
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index d0dc3200fa37..1cf3ef0faff7 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -3024,13 +3024,11 @@ static int tdx_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, } ret = 0; - do { - err = tdh_mem_page_add(kvm_tdx->tdr_pa, gpa, pfn_to_hpa(pfn), - pfn_to_hpa(page_to_pfn(page)), - &entry, &level_state); - } while (err == TDX_ERROR_SEPT_BUSY); + err = tdh_mem_page_add(kvm_tdx->tdr_pa, gpa, pfn_to_hpa(pfn), + pfn_to_hpa(page_to_pfn(page)), + &entry, &level_state); if (err) { - ret = -EIO; + ret = unlikely(err & TDX_OPERAND_BUSY) ? -EBUSY : -EIO; goto out; }
tdh_mem_page_add() is called during TD build time. Within the TDX module, it acquires the exclusive lock on the TDR resource (eliminating the need to hold locks for TDCS/SEPT tree) and the exclusive lock on the PAMT entry for the page to be added. The TDX module returns TDX_OPERAND_BUSY if tdh_mem_page_add() contends with other SEAMCALLs. SEAMCALL Lock Type Resource ----------------------------------------------------------------------- tdh_mem_page_add EXCLUSIVE TDR NO_LOCK TDCS NO_LOCK SEPT tree EXCLUSIVE PAMT entry for the page to add Given (1)-(4) and the expected behavior from userspace (5), KVM doesn't expect tdh_mem_page_add() to encounter TDX_OPERAND_BUSY: (1) tdx_vcpu_create() only allows vCPU creation when the TD state is TD_STATE_INITIALIZED, so tdh_mem_page_add(), as invoked in vCPU ioctl, does not contend with tdh_mng_create()/tdh_mng_addcx()/tdh_mng_key_config()/tdh_mng_init(). (2) tdx_vcpu_ioctl() bails out on TD_STATE_RUNNABLE, so tdh_mem_page_add() does not contend with tdh_vp_enter()/tdh_mem_page_aug()/tdh_mem_track() and TDCALLs. (3) By holding slots_lock and the filemap invalidate lock, tdh_mem_page_add() does not contend with tdh_mr_finalize(), tdh_mem_page_remove()/tdh_mem_range_block()/ tdh_phymem_page_wbinvd_hkid() or another tdh_mem_page_add(), tdh_mem_sept_add()/tdh_mr_extend(). (4) By holding reference to kvm, tdh_mem_page_add() does not contend with tdh_mng_vpflushdone()/tdh_phymem_cache_wb()/tdh_mng_key_freeid()/ tdh_phymem_page_wbinvd_tdr()/tdh_phymem_page_reclaim(). (5) A well-behaved userspace invokes ioctl KVM_TDX_INIT_MEM_REGION on one vCPU after initializing all vCPUs and does not invoke ioctls on the other vCPUs before the KVM_TDX_INIT_MEM_REGION completes. Thus, tdh_mem_page_add() does not contend with tdh_vp_create()/tdh_vp_addcx()/tdh_vp_init*()/tdh_vp_rd()/tdh_vp_wr()/ tdh_mng_rd()/tdh_vp_flush() on the other vCPUs. However, if the userspace breaks (5), tdh_mem_page_add() could encounter TDX_OPERAND_BUSY when trying to acquire the exclusive lock on the TDR resource in the TDX module. In this case, simply return -EBUSY. Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> --- tdx_vcpu_pre_run() will check TD_STATE_RUNNABLE for (2). https://lore.kernel.org/kvm/3576c721-3ef2-40bd-8764-b50912df93a2@intel.com/ --- arch/x86/kvm/vmx/tdx.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-)