diff mbox series

[1/7] KVM: TDX: Return -EBUSY when tdh_mem_page_add() encounters TDX_OPERAND_BUSY

Message ID 20250113021050.18828-1-yan.y.zhao@intel.com (mailing list archive)
State New
Headers show
Series KVM: TDX SEPT SEAMCALL retry | expand

Commit Message

Yan Zhao Jan. 13, 2025, 2:10 a.m. UTC
tdh_mem_page_add() is called during TD build time. Within the TDX module,
it acquires the exclusive lock on the TDR resource (eliminating the need to
hold locks for TDCS/SEPT tree) and the exclusive lock on the PAMT entry for
the page to be added. The TDX module returns TDX_OPERAND_BUSY if
tdh_mem_page_add() contends with other SEAMCALLs.

SEAMCALL                Lock Type        Resource
-----------------------------------------------------------------------
tdh_mem_page_add        EXCLUSIVE        TDR
                        NO_LOCK          TDCS
                        NO_LOCK          SEPT tree
                        EXCLUSIVE        PAMT entry for the page to add

Given (1)-(4) and the expected behavior from userspace (5), KVM doesn't
expect tdh_mem_page_add() to encounter TDX_OPERAND_BUSY:
(1) tdx_vcpu_create() only allows vCPU creation when the TD state is
    TD_STATE_INITIALIZED, so tdh_mem_page_add(), as invoked in vCPU ioctl,
    does not contend with
    tdh_mng_create()/tdh_mng_addcx()/tdh_mng_key_config()/tdh_mng_init().

(2) tdx_vcpu_ioctl() bails out on TD_STATE_RUNNABLE, so tdh_mem_page_add()
    does not contend with tdh_vp_enter()/tdh_mem_page_aug()/tdh_mem_track()
    and TDCALLs.

(3) By holding slots_lock and the filemap invalidate lock,
    tdh_mem_page_add() does not contend with tdh_mr_finalize(),
    tdh_mem_page_remove()/tdh_mem_range_block()/
    tdh_phymem_page_wbinvd_hkid() or another tdh_mem_page_add(),
    tdh_mem_sept_add()/tdh_mr_extend().

(4) By holding reference to kvm, tdh_mem_page_add() does not contend with
    tdh_mng_vpflushdone()/tdh_phymem_cache_wb()/tdh_mng_key_freeid()/
    tdh_phymem_page_wbinvd_tdr()/tdh_phymem_page_reclaim().

(5) A well-behaved userspace invokes ioctl KVM_TDX_INIT_MEM_REGION on one
    vCPU after initializing all vCPUs and does not invoke ioctls on the
    other vCPUs before the KVM_TDX_INIT_MEM_REGION completes. Thus,
    tdh_mem_page_add() does not contend with
    tdh_vp_create()/tdh_vp_addcx()/tdh_vp_init*()/tdh_vp_rd()/tdh_vp_wr()/
    tdh_mng_rd()/tdh_vp_flush() on the other vCPUs.

However, if the userspace breaks (5), tdh_mem_page_add() could encounter
TDX_OPERAND_BUSY when trying to acquire the exclusive lock on the TDR
resource in the TDX module. In this case, simply return -EBUSY.

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
tdx_vcpu_pre_run() will check TD_STATE_RUNNABLE for (2).
https://lore.kernel.org/kvm/3576c721-3ef2-40bd-8764-b50912df93a2@intel.com/
---
 arch/x86/kvm/vmx/tdx.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

Comments

Edgecombe, Rick P Jan. 14, 2025, 10:24 p.m. UTC | #1
On Mon, 2025-01-13 at 10:10 +0800, Yan Zhao wrote:
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index d0dc3200fa37..1cf3ef0faff7 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -3024,13 +3024,11 @@ static int tdx_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
>  	}
>  
>  	ret = 0;
> -	do {
> -		err = tdh_mem_page_add(kvm_tdx->tdr_pa, gpa, pfn_to_hpa(pfn),
> -				       pfn_to_hpa(page_to_pfn(page)),
> -				       &entry, &level_state);
> -	} while (err == TDX_ERROR_SEPT_BUSY);
> +	err = tdh_mem_page_add(kvm_tdx->tdr_pa, gpa, pfn_to_hpa(pfn),
> +			       pfn_to_hpa(page_to_pfn(page)),
> +			       &entry, &level_state);
>  	if (err) {
> -		ret = -EIO;
> +		ret = unlikely(err & TDX_OPERAND_BUSY) ? -EBUSY : -EIO;
>  		goto out;
>  	}

Should we just squash this into "KVM: TDX: Add an ioctl to create initial guest
memory"? I guess we get a little more specific log history on this corner as a
separate patch, but seems strange to add and remove a loop before it even can
get exercised.
Yan Zhao Jan. 15, 2025, 4:59 a.m. UTC | #2
On Wed, Jan 15, 2025 at 06:24:34AM +0800, Edgecombe, Rick P wrote:
> On Mon, 2025-01-13 at 10:10 +0800, Yan Zhao wrote:
> > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> > index d0dc3200fa37..1cf3ef0faff7 100644
> > --- a/arch/x86/kvm/vmx/tdx.c
> > +++ b/arch/x86/kvm/vmx/tdx.c
> > @@ -3024,13 +3024,11 @@ static int tdx_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
> >  	}
> >  
> >  	ret = 0;
> > -	do {
> > -		err = tdh_mem_page_add(kvm_tdx->tdr_pa, gpa, pfn_to_hpa(pfn),
> > -				       pfn_to_hpa(page_to_pfn(page)),
> > -				       &entry, &level_state);
> > -	} while (err == TDX_ERROR_SEPT_BUSY);
> > +	err = tdh_mem_page_add(kvm_tdx->tdr_pa, gpa, pfn_to_hpa(pfn),
> > +			       pfn_to_hpa(page_to_pfn(page)),
> > +			       &entry, &level_state);
> >  	if (err) {
> > -		ret = -EIO;
> > +		ret = unlikely(err & TDX_OPERAND_BUSY) ? -EBUSY : -EIO;
> >  		goto out;
> >  	}
> 
> Should we just squash this into "KVM: TDX: Add an ioctl to create initial guest
> memory"? I guess we get a little more specific log history on this corner as a
> separate patch, but seems strange to add and remove a loop before it even can
> get exercised.
No problem to me.
diff mbox series

Patch

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index d0dc3200fa37..1cf3ef0faff7 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -3024,13 +3024,11 @@  static int tdx_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
 	}
 
 	ret = 0;
-	do {
-		err = tdh_mem_page_add(kvm_tdx->tdr_pa, gpa, pfn_to_hpa(pfn),
-				       pfn_to_hpa(page_to_pfn(page)),
-				       &entry, &level_state);
-	} while (err == TDX_ERROR_SEPT_BUSY);
+	err = tdh_mem_page_add(kvm_tdx->tdr_pa, gpa, pfn_to_hpa(pfn),
+			       pfn_to_hpa(page_to_pfn(page)),
+			       &entry, &level_state);
 	if (err) {
-		ret = -EIO;
+		ret = unlikely(err & TDX_OPERAND_BUSY) ? -EBUSY : -EIO;
 		goto out;
 	}