[21/25] KVM: x86: Introduce KVM_TDX_GET_CPUID

Message ID	20240812224820.34826-22-rick.p.edgecombe@intel.com (mailing list archive)
State	New, archived
Headers	show Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E86781A0708; Mon, 12 Aug 2024 22:48:46 +0000 (UTC) From: Rick Edgecombe <rick.p.edgecombe@intel.com> To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org Cc: kai.huang@intel.com, isaku.yamahata@gmail.com, tony.lindgren@linux.intel.com, xiaoyao.li@intel.com, linux-kernel@vger.kernel.org, rick.p.edgecombe@intel.com Subject: [PATCH 21/25] KVM: x86: Introduce KVM_TDX_GET_CPUID Date: Mon, 12 Aug 2024 15:48:16 -0700 Message-Id: <20240812224820.34826-22-rick.p.edgecombe@intel.com> In-Reply-To: <20240812224820.34826-1-rick.p.edgecombe@intel.com> References: <20240812224820.34826-1-rick.p.edgecombe@intel.com> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	TDX vCPU/VM creation \| expand [00/25] TDX vCPU/VM creation [01/25] KVM: TDX: Add placeholders for TDX VM/vCPU structures [02/25] KVM: TDX: Define TDX architectural definitions [03/25] KVM: TDX: Add TDX "architectural" error codes [04/25] KVM: TDX: Add C wrapper functions for SEAMCALLs to the TDX module [05/25] KVM: TDX: Add helper functions to print TDX SEAMCALL error [06/25] x86/virt/tdx: Export TDX KeyID information [07/25] KVM: TDX: Add helper functions to allocate/free TDX private host key id [08/25] KVM: TDX: Add place holder for TDX VM specific mem_enc_op ioctl [09/25] KVM: TDX: Get system-wide info about TDX module on initialization [10/25] KVM: TDX: Initialize KVM supported capabilities when module setup [11/25] KVM: TDX: Report kvm_tdx_caps in KVM_TDX_CAPABILITIES [12/25] KVM: TDX: Allow userspace to configure maximum vCPUs for TDX guests [13/25] KVM: TDX: create/destroy VM structure [14/25] KVM: TDX: initialize VM with TDX specific parameters [15/25] KVM: TDX: Make pmu_intel.c ignore guest TD case [16/25] KVM: TDX: Don't offline the last cpu of one package when there's TDX guest [17/25] KVM: TDX: create/free TDX vcpu structure [18/25] KVM: TDX: Do TDX specific vcpu initialization [19/25] KVM: X86: Introduce kvm_get_supported_cpuid_internal() [20/25] KVM: X86: Introduce tdx_get_kvm_supported_cpuid() [21/25] KVM: x86: Introduce KVM_TDX_GET_CPUID [22/25] KVM: TDX: Use guest physical address to configure EPT level and GPAW [23/25] KVM: x86/mmu: Taking guest pa into consideration when calculate tdp level [24/25] KVM: x86: Filter directly configurable TDX CPUID bits [25/25] KVM: x86: Add CPUID bits missing from KVM_GET_SUPPORTED_CPUID

Edgecombe, Rick P Aug. 12, 2024, 10:48 p.m. UTC

From: Xiaoyao Li <xiaoyao.li@intel.com>

Implement an IOCTL to allow userspace to read the CPUID bit values for a
configured TD.

The TDX module doesn't provide the ability to set all CPUID bits. Instead
some are configured indirectly, or have fixed values. But it does allow
for the final resulting CPUID bits to be read. This information will be
useful for userspace to understand the configuration of the TD, and set
KVM's copy via KVM_SET_CPUID2.

To prevent userspace from starting to use features that might not have KVM
support yet, filter the reported values by KVM's support CPUID bits.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
uAPI breakout v1:
 - New patch
---
 arch/x86/include/uapi/asm/kvm.h |   1 +
 arch/x86/kvm/vmx/tdx.c          | 131 ++++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/tdx.h          |   5 ++
 arch/x86/kvm/vmx/tdx_arch.h     |   5 ++
 arch/x86/kvm/vmx/tdx_errno.h    |   1 +
 5 files changed, 143 insertions(+)

Tao Su Aug. 19, 2024, 2:59 a.m. UTC | #1

On Mon, Aug 12, 2024 at 03:48:16PM -0700, Rick Edgecombe wrote:
> From: Xiaoyao Li <xiaoyao.li@intel.com>
> 
> Implement an IOCTL to allow userspace to read the CPUID bit values for a
> configured TD.
> 
> The TDX module doesn't provide the ability to set all CPUID bits. Instead
> some are configured indirectly, or have fixed values. But it does allow
> for the final resulting CPUID bits to be read. This information will be
> useful for userspace to understand the configuration of the TD, and set
> KVM's copy via KVM_SET_CPUID2.
> 
> To prevent userspace from starting to use features that might not have KVM
> support yet, filter the reported values by KVM's support CPUID bits.
> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
> uAPI breakout v1:
>  - New patch
> ---
>  arch/x86/include/uapi/asm/kvm.h |   1 +
>  arch/x86/kvm/vmx/tdx.c          | 131 ++++++++++++++++++++++++++++++++
>  arch/x86/kvm/vmx/tdx.h          |   5 ++
>  arch/x86/kvm/vmx/tdx_arch.h     |   5 ++
>  arch/x86/kvm/vmx/tdx_errno.h    |   1 +
>  5 files changed, 143 insertions(+)
> 
> diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> index b4f12997052d..39636be5c891 100644
> --- a/arch/x86/include/uapi/asm/kvm.h
> +++ b/arch/x86/include/uapi/asm/kvm.h
> @@ -931,6 +931,7 @@ enum kvm_tdx_cmd_id {
>  	KVM_TDX_CAPABILITIES = 0,
>  	KVM_TDX_INIT_VM,
>  	KVM_TDX_INIT_VCPU,
> +	KVM_TDX_GET_CPUID,
>  
>  	KVM_TDX_CMD_NR_MAX,
>  };
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index b2ed031ac0d6..fe2bbc2ced41 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -813,6 +813,76 @@ static int __tdx_td_init(struct kvm *kvm, struct td_params *td_params,
>  	return ret;
>  }
>  
> +static u64 tdx_td_metadata_field_read(struct kvm_tdx *tdx, u64 field_id,
> +				      u64 *data)
> +{
> +	u64 err;
> +
> +	err = tdh_mng_rd(tdx, field_id, data);
> +
> +	return err;
> +}
> +
> +#define TDX_MD_UNREADABLE_LEAF_MASK	GENMASK(30, 7)
> +#define TDX_MD_UNREADABLE_SUBLEAF_MASK	GENMASK(31, 7)
> +
> +static int tdx_mask_cpuid(struct kvm_tdx *tdx, struct kvm_cpuid_entry2 *entry)
> +{
> +	u64 field_id = TD_MD_FIELD_ID_CPUID_VALUES;
> +	u64 ebx_eax, edx_ecx;
> +	u64 err = 0;
> +
> +	if (entry->function & TDX_MD_UNREADABLE_LEAF_MASK ||
> +	    entry->index & TDX_MD_UNREADABLE_SUBLEAF_MASK)
> +		return -EINVAL;
> +
> +	/*
> +	 * bit 23:17, REVSERVED: reserved, must be 0;
> +	 * bit 16,    LEAF_31: leaf number bit 31;
> +	 * bit 15:9,  LEAF_6_0: leaf number bits 6:0, leaf bits 30:7 are
> +	 *                      implicitly 0;
> +	 * bit 8,     SUBLEAF_NA: sub-leaf not applicable flag;
> +	 * bit 7:1,   SUBLEAF_6_0: sub-leaf number bits 6:0. If SUBLEAF_NA is 1,
> +	 *                         the SUBLEAF_6_0 is all-1.
> +	 *                         sub-leaf bits 31:7 are implicitly 0;
> +	 * bit 0,     ELEMENT_I: Element index within field;
> +	 */
> +	field_id |= ((entry->function & 0x80000000) ? 1 : 0) << 16;
> +	field_id |= (entry->function & 0x7f) << 9;
> +	if (entry->flags & KVM_CPUID_FLAG_SIGNIFCANT_INDEX)
> +		field_id |= (entry->index & 0x7f) << 1;
> +	else
> +		field_id |= 0x1fe;
> +
> +	err = tdx_td_metadata_field_read(tdx, field_id, &ebx_eax);
> +	if (err) //TODO check for specific errors
> +		goto err_out;
> +
> +	entry->eax &= (u32) ebx_eax;
> +	entry->ebx &= (u32) (ebx_eax >> 32);
> +
> +	field_id++;
> +	err = tdx_td_metadata_field_read(tdx, field_id, &edx_ecx);
> +	/*
> +	 * It's weird that reading edx_ecx fails while reading ebx_eax
> +	 * succeeded.
> +	 */
> +	if (WARN_ON_ONCE(err))
> +		goto err_out;
> +
> +	entry->ecx &= (u32) edx_ecx;
> +	entry->edx &= (u32) (edx_ecx >> 32);
> +	return 0;
> +
> +err_out:
> +	entry->eax = 0;
> +	entry->ebx = 0;
> +	entry->ecx = 0;
> +	entry->edx = 0;
> +
> +	return -EIO;
> +}
> +
>  static int tdx_td_init(struct kvm *kvm, struct kvm_tdx_cmd *cmd)
>  {
>  	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> @@ -1038,6 +1108,64 @@ static int __maybe_unused tdx_get_kvm_supported_cpuid(struct kvm_cpuid2 **cpuid)
>  	return r;
>  }
>  
> +static int tdx_vcpu_get_cpuid(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd)
> +{
> +	struct kvm_cpuid2 __user *output, *td_cpuid;
> +	struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
> +	struct kvm_cpuid2 *supported_cpuid;
> +	int r = 0, i, j = 0;
> +
> +	output = u64_to_user_ptr(cmd->data);
> +	td_cpuid = kzalloc(sizeof(*td_cpuid) +
> +			sizeof(output->entries[0]) * KVM_MAX_CPUID_ENTRIES,
> +			GFP_KERNEL);
> +	if (!td_cpuid)
> +		return -ENOMEM;
> +
> +	r = tdx_get_kvm_supported_cpuid(&supported_cpuid);
> +	if (r)
> +		goto out;
> +
> +	for (i = 0; i < supported_cpuid->nent; i++) {
> +		struct kvm_cpuid_entry2 *supported = &supported_cpuid->entries[i];
> +		struct kvm_cpuid_entry2 *output_e = &td_cpuid->entries[j];
> +
> +		*output_e = *supported;
> +
> +		/* Only allow values of bits that KVM's supports to be exposed */
> +		if (tdx_mask_cpuid(kvm_tdx, output_e))
> +			continue;
> +
> +		/*
> +		 * Work around missing support on old TDX modules, fetch
> +		 * guest maxpa from gfn_direct_bits.
> +		 */
> +		if (output_e->function == 0x80000008) {
> +			gpa_t gpa_bits = gfn_to_gpa(kvm_gfn_direct_bits(vcpu->kvm));
> +			unsigned int g_maxpa = __ffs(gpa_bits) + 1;
> +
> +			output_e->eax &= ~0x00ff0000;
> +			output_e->eax |= g_maxpa << 16;
> +		}

I suggest putting all guest_phys_bits related WA in a WA-only patch, which will
be clearer.

> +
> +		j++;
> +	}
> +	td_cpuid->nent = j;
> +
> +	if (copy_to_user(output, td_cpuid, sizeof(*output))) {
> +		r = -EFAULT;
> +		goto out;
> +	}
> +	if (copy_to_user(output->entries, td_cpuid->entries,
> +			 td_cpuid->nent * sizeof(struct kvm_cpuid_entry2)))
> +		r = -EFAULT;
> +
> +out:
> +	kfree(td_cpuid);
> +	kfree(supported_cpuid);
> +	return r;
> +}
> +
>  static int tdx_vcpu_init(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd)
>  {
>  	struct msr_data apic_base_msr;
> @@ -1089,6 +1217,9 @@ int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)
>  	case KVM_TDX_INIT_VCPU:
>  		ret = tdx_vcpu_init(vcpu, &cmd);
>  		break;
> +	case KVM_TDX_GET_CPUID:
> +		ret = tdx_vcpu_get_cpuid(vcpu, &cmd);
> +		break;
>  	default:
>  		ret = -EINVAL;
>  		break;
> diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
> index 8349b542836e..7eeb54fbcae1 100644
> --- a/arch/x86/kvm/vmx/tdx.h
> +++ b/arch/x86/kvm/vmx/tdx.h
> @@ -25,6 +25,11 @@ struct kvm_tdx {
>  	bool finalized;
>  
>  	u64 tsc_offset;
> +
> +	/* For KVM_MAP_MEMORY and KVM_TDX_INIT_MEM_REGION. */
> +	atomic64_t nr_premapped;

I don't see it is used in this patch set.

> +
> +	struct kvm_cpuid2 *cpuid;
>  };
>  
>  struct vcpu_tdx {
> diff --git a/arch/x86/kvm/vmx/tdx_arch.h b/arch/x86/kvm/vmx/tdx_arch.h
> index d2d7f9cab740..815e74408a34 100644
> --- a/arch/x86/kvm/vmx/tdx_arch.h
> +++ b/arch/x86/kvm/vmx/tdx_arch.h
> @@ -157,4 +157,9 @@ struct td_params {
>  
>  #define MD_FIELD_ID_FEATURES0_TOPOLOGY_ENUM	BIT_ULL(20)
>  
> +/*
> + * TD scope metadata field ID.
> + */
> +#define TD_MD_FIELD_ID_CPUID_VALUES		0x9410000300000000ULL
> +
>  #endif /* __KVM_X86_TDX_ARCH_H */
> diff --git a/arch/x86/kvm/vmx/tdx_errno.h b/arch/x86/kvm/vmx/tdx_errno.h
> index dc3fa2a58c2c..f9dbb3a065cc 100644
> --- a/arch/x86/kvm/vmx/tdx_errno.h
> +++ b/arch/x86/kvm/vmx/tdx_errno.h
> @@ -23,6 +23,7 @@
>  #define TDX_FLUSHVP_NOT_DONE			0x8000082400000000ULL
>  #define TDX_EPT_WALK_FAILED			0xC0000B0000000000ULL
>  #define TDX_EPT_ENTRY_STATE_INCORRECT		0xC0000B0D00000000ULL
> +#define TDX_METADATA_FIELD_NOT_READABLE		0xC0000C0200000000ULL
>  
>  /*
>   * TDX module operand ID, appears in 31:0 part of error code as
> -- 
> 2.34.1
> 
>

Xu Yilun Aug. 19, 2024, 5:02 a.m. UTC | #2

On Mon, Aug 12, 2024 at 03:48:16PM -0700, Rick Edgecombe wrote:
> From: Xiaoyao Li <xiaoyao.li@intel.com>
> 
> Implement an IOCTL to allow userspace to read the CPUID bit values for a
> configured TD.
> 
> The TDX module doesn't provide the ability to set all CPUID bits. Instead
> some are configured indirectly, or have fixed values. But it does allow
> for the final resulting CPUID bits to be read. This information will be
> useful for userspace to understand the configuration of the TD, and set
> KVM's copy via KVM_SET_CPUID2.
> 
> To prevent userspace from starting to use features that might not have KVM
> support yet, filter the reported values by KVM's support CPUID bits.
> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
> uAPI breakout v1:
>  - New patch
> ---
>  arch/x86/include/uapi/asm/kvm.h |   1 +
>  arch/x86/kvm/vmx/tdx.c          | 131 ++++++++++++++++++++++++++++++++
>  arch/x86/kvm/vmx/tdx.h          |   5 ++
>  arch/x86/kvm/vmx/tdx_arch.h     |   5 ++
>  arch/x86/kvm/vmx/tdx_errno.h    |   1 +
>  5 files changed, 143 insertions(+)
> 
> diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> index b4f12997052d..39636be5c891 100644
> --- a/arch/x86/include/uapi/asm/kvm.h
> +++ b/arch/x86/include/uapi/asm/kvm.h
> @@ -931,6 +931,7 @@ enum kvm_tdx_cmd_id {
>  	KVM_TDX_CAPABILITIES = 0,
>  	KVM_TDX_INIT_VM,
>  	KVM_TDX_INIT_VCPU,
> +	KVM_TDX_GET_CPUID,
>  
>  	KVM_TDX_CMD_NR_MAX,
>  };
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index b2ed031ac0d6..fe2bbc2ced41 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -813,6 +813,76 @@ static int __tdx_td_init(struct kvm *kvm, struct td_params *td_params,
>  	return ret;
>  }
>  
> +static u64 tdx_td_metadata_field_read(struct kvm_tdx *tdx, u64 field_id,
> +				      u64 *data)
> +{
> +	u64 err;
> +
> +	err = tdh_mng_rd(tdx, field_id, data);
> +
> +	return err;
> +}
> +
> +#define TDX_MD_UNREADABLE_LEAF_MASK	GENMASK(30, 7)
> +#define TDX_MD_UNREADABLE_SUBLEAF_MASK	GENMASK(31, 7)
> +
> +static int tdx_mask_cpuid(struct kvm_tdx *tdx, struct kvm_cpuid_entry2 *entry)
> +{
> +	u64 field_id = TD_MD_FIELD_ID_CPUID_VALUES;
> +	u64 ebx_eax, edx_ecx;
> +	u64 err = 0;
> +
> +	if (entry->function & TDX_MD_UNREADABLE_LEAF_MASK ||
> +	    entry->index & TDX_MD_UNREADABLE_SUBLEAF_MASK)
> +		return -EINVAL;
> +
> +	/*
> +	 * bit 23:17, REVSERVED: reserved, must be 0;
> +	 * bit 16,    LEAF_31: leaf number bit 31;
> +	 * bit 15:9,  LEAF_6_0: leaf number bits 6:0, leaf bits 30:7 are
> +	 *                      implicitly 0;
> +	 * bit 8,     SUBLEAF_NA: sub-leaf not applicable flag;
> +	 * bit 7:1,   SUBLEAF_6_0: sub-leaf number bits 6:0. If SUBLEAF_NA is 1,
> +	 *                         the SUBLEAF_6_0 is all-1.
> +	 *                         sub-leaf bits 31:7 are implicitly 0;
> +	 * bit 0,     ELEMENT_I: Element index within field;
> +	 */
> +	field_id |= ((entry->function & 0x80000000) ? 1 : 0) << 16;
> +	field_id |= (entry->function & 0x7f) << 9;
> +	if (entry->flags & KVM_CPUID_FLAG_SIGNIFCANT_INDEX)
> +		field_id |= (entry->index & 0x7f) << 1;
> +	else
> +		field_id |= 0x1fe;
> +
> +	err = tdx_td_metadata_field_read(tdx, field_id, &ebx_eax);
> +	if (err) //TODO check for specific errors
> +		goto err_out;
> +
> +	entry->eax &= (u32) ebx_eax;
> +	entry->ebx &= (u32) (ebx_eax >> 32);

Some fields contains a N-bits wide value instead of a bitmask, why a &=
just work?

> +
> +	field_id++;
> +	err = tdx_td_metadata_field_read(tdx, field_id, &edx_ecx);
> +	/*
> +	 * It's weird that reading edx_ecx fails while reading ebx_eax
> +	 * succeeded.
> +	 */
> +	if (WARN_ON_ONCE(err))
> +		goto err_out;
> +
> +	entry->ecx &= (u32) edx_ecx;
> +	entry->edx &= (u32) (edx_ecx >> 32);
> +	return 0;
> +
> +err_out:
> +	entry->eax = 0;
> +	entry->ebx = 0;
> +	entry->ecx = 0;
> +	entry->edx = 0;
> +
> +	return -EIO;
> +}
> +
>  static int tdx_td_init(struct kvm *kvm, struct kvm_tdx_cmd *cmd)
>  {
>  	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> @@ -1038,6 +1108,64 @@ static int __maybe_unused tdx_get_kvm_supported_cpuid(struct kvm_cpuid2 **cpuid)
>  	return r;
>  }
>  
> +static int tdx_vcpu_get_cpuid(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd)
> +{
> +	struct kvm_cpuid2 __user *output, *td_cpuid;
> +	struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
> +	struct kvm_cpuid2 *supported_cpuid;
> +	int r = 0, i, j = 0;
> +
> +	output = u64_to_user_ptr(cmd->data);
> +	td_cpuid = kzalloc(sizeof(*td_cpuid) +
> +			sizeof(output->entries[0]) * KVM_MAX_CPUID_ENTRIES,
> +			GFP_KERNEL);
> +	if (!td_cpuid)
> +		return -ENOMEM;
> +
> +	r = tdx_get_kvm_supported_cpuid(&supported_cpuid);

Personally I don't like the definition of this function. I need to look
into the inner implementation to see if kfree(supported_cpuid); is needed
or safe. How about:

  supported_cpuid = tdx_get_kvm_supported_cpuid();
  if (!supported_cpuid)
	goto out_td_cpuid;

> +	if (r)
> +		goto out;
> +
> +	for (i = 0; i < supported_cpuid->nent; i++) {
> +		struct kvm_cpuid_entry2 *supported = &supported_cpuid->entries[i];
> +		struct kvm_cpuid_entry2 *output_e = &td_cpuid->entries[j];
> +
> +		*output_e = *supported;
> +
> +		/* Only allow values of bits that KVM's supports to be exposed */
> +		if (tdx_mask_cpuid(kvm_tdx, output_e))
> +			continue;
> +
> +		/*
> +		 * Work around missing support on old TDX modules, fetch
> +		 * guest maxpa from gfn_direct_bits.
> +		 */
> +		if (output_e->function == 0x80000008) {
> +			gpa_t gpa_bits = gfn_to_gpa(kvm_gfn_direct_bits(vcpu->kvm));
> +			unsigned int g_maxpa = __ffs(gpa_bits) + 1;
> +
> +			output_e->eax &= ~0x00ff0000;
> +			output_e->eax |= g_maxpa << 16;

Is it possible this workaround escapes the KVM supported bits check?

> +		}
> +
> +		j++;
> +	}
> +	td_cpuid->nent = j;
> +
> +	if (copy_to_user(output, td_cpuid, sizeof(*output))) {
> +		r = -EFAULT;
> +		goto out;
> +	}
> +	if (copy_to_user(output->entries, td_cpuid->entries,
> +			 td_cpuid->nent * sizeof(struct kvm_cpuid_entry2)))
> +		r = -EFAULT;
> +
> +out:
> +	kfree(td_cpuid);
> +	kfree(supported_cpuid);

Traditionally we do:

  out_supported_cpuid:
	kfree(supported_cpuid);
  out_td_cpuid:
	kfree(td_cpuid);

I'm not sure what's the advantage to make people think more about whether
kfree is safe.

> +	return r;
> +}
> +
>  static int tdx_vcpu_init(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd)
>  {
>  	struct msr_data apic_base_msr;
> @@ -1089,6 +1217,9 @@ int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)
>  	case KVM_TDX_INIT_VCPU:
>  		ret = tdx_vcpu_init(vcpu, &cmd);
>  		break;
> +	case KVM_TDX_GET_CPUID:
> +		ret = tdx_vcpu_get_cpuid(vcpu, &cmd);
> +		break;
>  	default:
>  		ret = -EINVAL;
>  		break;
> diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
> index 8349b542836e..7eeb54fbcae1 100644
> --- a/arch/x86/kvm/vmx/tdx.h
> +++ b/arch/x86/kvm/vmx/tdx.h
> @@ -25,6 +25,11 @@ struct kvm_tdx {
>  	bool finalized;
>  
>  	u64 tsc_offset;
> +
> +	/* For KVM_MAP_MEMORY and KVM_TDX_INIT_MEM_REGION. */
> +	atomic64_t nr_premapped;

This doesn't belong to this patch.

> +
> +	struct kvm_cpuid2 *cpuid;

Didn't find the usage of this field.

Thanks,
Yilun

>  };
>  
>  struct vcpu_tdx {
> diff --git a/arch/x86/kvm/vmx/tdx_arch.h b/arch/x86/kvm/vmx/tdx_arch.h
> index d2d7f9cab740..815e74408a34 100644
> --- a/arch/x86/kvm/vmx/tdx_arch.h
> +++ b/arch/x86/kvm/vmx/tdx_arch.h
> @@ -157,4 +157,9 @@ struct td_params {
>  
>  #define MD_FIELD_ID_FEATURES0_TOPOLOGY_ENUM	BIT_ULL(20)
>  
> +/*
> + * TD scope metadata field ID.
> + */
> +#define TD_MD_FIELD_ID_CPUID_VALUES		0x9410000300000000ULL
> +
>  #endif /* __KVM_X86_TDX_ARCH_H */
> diff --git a/arch/x86/kvm/vmx/tdx_errno.h b/arch/x86/kvm/vmx/tdx_errno.h
> index dc3fa2a58c2c..f9dbb3a065cc 100644
> --- a/arch/x86/kvm/vmx/tdx_errno.h
> +++ b/arch/x86/kvm/vmx/tdx_errno.h
> @@ -23,6 +23,7 @@
>  #define TDX_FLUSHVP_NOT_DONE			0x8000082400000000ULL
>  #define TDX_EPT_WALK_FAILED			0xC0000B0000000000ULL
>  #define TDX_EPT_ENTRY_STATE_INCORRECT		0xC0000B0D00000000ULL
> +#define TDX_METADATA_FIELD_NOT_READABLE		0xC0000C0200000000ULL
>  
>  /*
>   * TDX module operand ID, appears in 31:0 part of error code as
> -- 
> 2.34.1
> 
>

Nikolay Borisov Aug. 26, 2024, 2:09 p.m. UTC | #3

On 13.08.24 г. 1:48 ч., Rick Edgecombe wrote:
> From: Xiaoyao Li <xiaoyao.li@intel.com>
> 
> Implement an IOCTL to allow userspace to read the CPUID bit values for a
> configured TD.
> 
> The TDX module doesn't provide the ability to set all CPUID bits. Instead
> some are configured indirectly, or have fixed values. But it does allow
> for the final resulting CPUID bits to be read. This information will be
> useful for userspace to understand the configuration of the TD, and set
> KVM's copy via KVM_SET_CPUID2.
> 
> To prevent userspace from starting to use features that might not have KVM
> support yet, filter the reported values by KVM's support CPUID bits.
> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
> uAPI breakout v1:
>   - New patch
> ---
>   arch/x86/include/uapi/asm/kvm.h |   1 +
>   arch/x86/kvm/vmx/tdx.c          | 131 ++++++++++++++++++++++++++++++++
>   arch/x86/kvm/vmx/tdx.h          |   5 ++
>   arch/x86/kvm/vmx/tdx_arch.h     |   5 ++
>   arch/x86/kvm/vmx/tdx_errno.h    |   1 +
>   5 files changed, 143 insertions(+)
> 
> diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> index b4f12997052d..39636be5c891 100644
> --- a/arch/x86/include/uapi/asm/kvm.h
> +++ b/arch/x86/include/uapi/asm/kvm.h
> @@ -931,6 +931,7 @@ enum kvm_tdx_cmd_id {
>   	KVM_TDX_CAPABILITIES = 0,
>   	KVM_TDX_INIT_VM,
>   	KVM_TDX_INIT_VCPU,
> +	KVM_TDX_GET_CPUID,
>   
>   	KVM_TDX_CMD_NR_MAX,
>   };
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index b2ed031ac0d6..fe2bbc2ced41 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -813,6 +813,76 @@ static int __tdx_td_init(struct kvm *kvm, struct td_params *td_params,
>   	return ret;
>   }
>   
> +static u64 tdx_td_metadata_field_read(struct kvm_tdx *tdx, u64 field_id,
> +				      u64 *data)
> +{
> +	u64 err;
> +
> +	err = tdh_mng_rd(tdx, field_id, data);
> +
> +	return err;
> +}
> +
> +#define TDX_MD_UNREADABLE_LEAF_MASK	GENMASK(30, 7)
> +#define TDX_MD_UNREADABLE_SUBLEAF_MASK	GENMASK(31, 7)
> +
> +static int tdx_mask_cpuid(struct kvm_tdx *tdx, struct kvm_cpuid_entry2 *entry)
> +{
> +	u64 field_id = TD_MD_FIELD_ID_CPUID_VALUES;
> +	u64 ebx_eax, edx_ecx;
> +	u64 err = 0;
> +
> +	if (entry->function & TDX_MD_UNREADABLE_LEAF_MASK ||
> +	    entry->index & TDX_MD_UNREADABLE_SUBLEAF_MASK)
> +		return -EINVAL;
> +
> +	/*
> +	 * bit 23:17, REVSERVED: reserved, must be 0;
> +	 * bit 16,    LEAF_31: leaf number bit 31;
> +	 * bit 15:9,  LEAF_6_0: leaf number bits 6:0, leaf bits 30:7 are
> +	 *                      implicitly 0;
> +	 * bit 8,     SUBLEAF_NA: sub-leaf not applicable flag;
> +	 * bit 7:1,   SUBLEAF_6_0: sub-leaf number bits 6:0. If SUBLEAF_NA is 1,
> +	 *                         the SUBLEAF_6_0 is all-1.
> +	 *                         sub-leaf bits 31:7 are implicitly 0;
> +	 * bit 0,     ELEMENT_I: Element index within field;
> +	 */
> +	field_id |= ((entry->function & 0x80000000) ? 1 : 0) << 16;
> +	field_id |= (entry->function & 0x7f) << 9;
> +	if (entry->flags & KVM_CPUID_FLAG_SIGNIFCANT_INDEX)
> +		field_id |= (entry->index & 0x7f) << 1;
> +	else
> +		field_id |= 0x1fe;
> +
> +	err = tdx_td_metadata_field_read(tdx, field_id, &ebx_eax);
> +	if (err) //TODO check for specific errors
> +		goto err_out;
> +
> +	entry->eax &= (u32) ebx_eax;
> +	entry->ebx &= (u32) (ebx_eax >> 32);
> +
> +	field_id++;
> +	err = tdx_td_metadata_field_read(tdx, field_id, &edx_ecx);
> +	/*
> +	 * It's weird that reading edx_ecx fails while reading ebx_eax
> +	 * succeeded.
> +	 */
> +	if (WARN_ON_ONCE(err))
> +		goto err_out;
> +
> +	entry->ecx &= (u32) edx_ecx;
> +	entry->edx &= (u32) (edx_ecx >> 32);
> +	return 0;
> +
> +err_out:
> +	entry->eax = 0;
> +	entry->ebx = 0;
> +	entry->ecx = 0;
> +	entry->edx = 0;
> +
> +	return -EIO;
> +}
> +
>   static int tdx_td_init(struct kvm *kvm, struct kvm_tdx_cmd *cmd)
>   {
>   	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> @@ -1038,6 +1108,64 @@ static int __maybe_unused tdx_get_kvm_supported_cpuid(struct kvm_cpuid2 **cpuid)
>   	return r;
>   }
>   
> +static int tdx_vcpu_get_cpuid(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd)
> +{
> +	struct kvm_cpuid2 __user *output, *td_cpuid;
> +	struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
> +	struct kvm_cpuid2 *supported_cpuid;
> +	int r = 0, i, j = 0;
> +
> +	output = u64_to_user_ptr(cmd->data);
> +	td_cpuid = kzalloc(sizeof(*td_cpuid) +
> +			sizeof(output->entries[0]) * KVM_MAX_CPUID_ENTRIES,
> +			GFP_KERNEL);
> +	if (!td_cpuid)
> +		return -ENOMEM;
> +
> +	r = tdx_get_kvm_supported_cpuid(&supported_cpuid);
> +	if (r)
> +		goto out;
> +
> +	for (i = 0; i < supported_cpuid->nent; i++) {
> +		struct kvm_cpuid_entry2 *supported = &supported_cpuid->entries[i];
> +		struct kvm_cpuid_entry2 *output_e = &td_cpuid->entries[j];
> +
> +		*output_e = *supported;
> +
> +		/* Only allow values of bits that KVM's supports to be exposed */
> +		if (tdx_mask_cpuid(kvm_tdx, output_e))
> +			continue;
> +
> +		/*
> +		 * Work around missing support on old TDX modules, fetch
> +		 * guest maxpa from gfn_direct_bits.
> +		 */


Define old TDX module? I believe the minimum supported TDX version is 
1.5 as EMR are the first public CPUs to support this, no? Module 1.0 was 
used for private previews etc? Can this be dropped altogether? It is 
much easier to mandate the minimum supported version now when nothing 
has been merged. Furthermore, in some of the earlier patches it's 
specifically required that the TDX module support NO_RBP_MOD which 
became available in 1.5, which already dictates that the minimum version 
we should care about is 1.5.


> +		if (output_e->function == 0x80000008) {
> +			gpa_t gpa_bits = gfn_to_gpa(kvm_gfn_direct_bits(vcpu->kvm));
> +			unsigned int g_maxpa = __ffs(gpa_bits) + 1;
> +
> +			output_e->eax &= ~0x00ff0000;
> +			output_e->eax |= g_maxpa << 16;
> +		}
> +
> +		j++;
> +	}
> +	td_cpuid->nent = j;
> +
> +	if (copy_to_user(output, td_cpuid, sizeof(*output))) {
> +		r = -EFAULT;
> +		goto out;
> +	}
> +	if (copy_to_user(output->entries, td_cpuid->entries,
> +			 td_cpuid->nent * sizeof(struct kvm_cpuid_entry2)))
> +		r = -EFAULT;
> +
> +out:
> +	kfree(td_cpuid);
> +	kfree(supported_cpuid);
> +	return r;
> +}
> +
>   static int tdx_vcpu_init(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd)
>   {
>   	struct msr_data apic_base_msr;
> @@ -1089,6 +1217,9 @@ int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)
>   	case KVM_TDX_INIT_VCPU:
>   		ret = tdx_vcpu_init(vcpu, &cmd);
>   		break;
> +	case KVM_TDX_GET_CPUID:
> +		ret = tdx_vcpu_get_cpuid(vcpu, &cmd);
> +		break;
>   	default:
>   		ret = -EINVAL;
>   		break;
> diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
> index 8349b542836e..7eeb54fbcae1 100644
> --- a/arch/x86/kvm/vmx/tdx.h
> +++ b/arch/x86/kvm/vmx/tdx.h
> @@ -25,6 +25,11 @@ struct kvm_tdx {
>   	bool finalized;
>   
>   	u64 tsc_offset;
> +
> +	/* For KVM_MAP_MEMORY and KVM_TDX_INIT_MEM_REGION. */
> +	atomic64_t nr_premapped;
> +
> +	struct kvm_cpuid2 *cpuid;
>   };
>   
>   struct vcpu_tdx {
> diff --git a/arch/x86/kvm/vmx/tdx_arch.h b/arch/x86/kvm/vmx/tdx_arch.h
> index d2d7f9cab740..815e74408a34 100644
> --- a/arch/x86/kvm/vmx/tdx_arch.h
> +++ b/arch/x86/kvm/vmx/tdx_arch.h
> @@ -157,4 +157,9 @@ struct td_params {
>   
>   #define MD_FIELD_ID_FEATURES0_TOPOLOGY_ENUM	BIT_ULL(20)
>   
> +/*
> + * TD scope metadata field ID.
> + */
> +#define TD_MD_FIELD_ID_CPUID_VALUES		0x9410000300000000ULL
> +
>   #endif /* __KVM_X86_TDX_ARCH_H */
> diff --git a/arch/x86/kvm/vmx/tdx_errno.h b/arch/x86/kvm/vmx/tdx_errno.h
> index dc3fa2a58c2c..f9dbb3a065cc 100644
> --- a/arch/x86/kvm/vmx/tdx_errno.h
> +++ b/arch/x86/kvm/vmx/tdx_errno.h
> @@ -23,6 +23,7 @@
>   #define TDX_FLUSHVP_NOT_DONE			0x8000082400000000ULL
>   #define TDX_EPT_WALK_FAILED			0xC0000B0000000000ULL
>   #define TDX_EPT_ENTRY_STATE_INCORRECT		0xC0000B0D00000000ULL
> +#define TDX_METADATA_FIELD_NOT_READABLE		0xC0000C0200000000ULL
>   
>   /*
>    * TDX module operand ID, appears in 31:0 part of error code as

Edgecombe, Rick P Aug. 26, 2024, 5:46 p.m. UTC | #4

On Mon, 2024-08-26 at 17:09 +0300, Nikolay Borisov wrote:
> > +               /*
> > +                * Work around missing support on old TDX modules, fetch
> > +                * guest maxpa from gfn_direct_bits.
> > +                */
> 
> 
> Define old TDX module? I believe the minimum supported TDX version is 
> 1.5 as EMR are the first public CPUs to support this, no? Module 1.0 was 
> used for private previews etc? Can this be dropped altogether? 

Well, today "old" means all released TDX modules. This is a new feature under
development, that KVM maintainers were ok working around being missing for now.
The comment should be improved.

See here for discussion of the design and purpose of the feature:
https://lore.kernel.org/kvm/f9f1da5dc94ad6b776490008dceee5963b451cda.camel@intel.com/

> It is 
> much easier to mandate the minimum supported version now when nothing 
> has been merged. Furthermore, in some of the earlier patches it's 
> specifically required that the TDX module support NO_RBP_MOD which 
> became available in 1.5, which already dictates that the minimum version 
> we should care about is 1.5.

There is some checking in Kai's TDX module init patches:
https://lore.kernel.org/kvm/d307d82a52ef604cfff8c7745ad8613d3ddfa0c8.1721186590.git.kai.huang@intel.com/

But beyond checking for supported features, there are also bug fixes that can
affect usability. In the NO_RBP_MOD case we need a specific recent TDX module in
order to remove the RBP workaround patches.

We could just check for a specific TDX module version instead, but I'm not sure
whether KVM would want to get into the game of picking preferred TDX module
versions. I guess in the case of any bugs that affect the host it will have to
do it though. So we will have to add a version check before live KVM support
lands upstream.

Hmm, thanks for the question.

Nikolay Borisov Aug. 27, 2024, 12:19 p.m. UTC | #5

On 26.08.24 г. 20:46 ч., Edgecombe, Rick P wrote:
> On Mon, 2024-08-26 at 17:09 +0300, Nikolay Borisov wrote:
>>> +               /*
>>> +                * Work around missing support on old TDX modules, fetch
>>> +                * guest maxpa from gfn_direct_bits.
>>> +                */
>>
>>
>> Define old TDX module? I believe the minimum supported TDX version is
>> 1.5 as EMR are the first public CPUs to support this, no? Module 1.0 was
>> used for private previews etc? Can this be dropped altogether?
> 
> Well, today "old" means all released TDX modules. This is a new feature under
> development, that KVM maintainers were ok working around being missing for now.
> The comment should be improved.
> 
> See here for discussion of the design and purpose of the feature:
> https://lore.kernel.org/kvm/f9f1da5dc94ad6b776490008dceee5963b451cda.camel@intel.com/
> 
>> It is
>> much easier to mandate the minimum supported version now when nothing
>> has been merged. Furthermore, in some of the earlier patches it's
>> specifically required that the TDX module support NO_RBP_MOD which
>> became available in 1.5, which already dictates that the minimum version
>> we should care about is 1.5.
> 
> There is some checking in Kai's TDX module init patches:
> https://lore.kernel.org/kvm/d307d82a52ef604cfff8c7745ad8613d3ddfa0c8.1721186590.git.kai.huang@intel.com/

Yes, that's why I mentioned this. I have already reviewed those patches :)

> 
> But beyond checking for supported features, there are also bug fixes that can
> affect usability. In the NO_RBP_MOD case we need a specific recent TDX module in
> order to remove the RBP workaround patches.

My point was that if having the NO_RPB_MOD implied that the CPUID 
0x8000000 configuration capability is also there (not that there is a 
direct connection between the too but it seems the TDX module isn't 
being updated that often, I might be wrong of course!), there is no 
point in having the workaround as NO_RPB_MOD is the minimum required 
version.

Anyway, this was an assumption on my part.

> 
> We could just check for a specific TDX module version instead, but I'm not sure
> whether KVM would want to get into the game of picking preferred TDX module
> versions. I guess in the case of any bugs that affect the host it will have to
> do it though. So we will have to add a version check before live KVM support
> lands upstream.
> 
> Hmm, thanks for the question.

Edgecombe, Rick P Aug. 27, 2024, 8:40 p.m. UTC | #6

On Tue, 2024-08-27 at 15:19 +0300, Nikolay Borisov wrote:

> > 
> > But beyond checking for supported features, there are also bug fixes that
> > can
> > affect usability. In the NO_RBP_MOD case we need a specific recent TDX
> > module in
> > order to remove the RBP workaround patches.
> 
> My point was that if having the NO_RPB_MOD implied that the CPUID 
> 0x8000000 configuration capability is also there (not that there is a 
> direct connection between the too but it seems the TDX module isn't 
> being updated that often, I might be wrong of course!), there is no 
> point in having the workaround as NO_RPB_MOD is the minimum required 
> version.

Having NO_RBP_MOD won't imply 0x80000008 configuration capability. We will have
to check for a new feature bit for that. We should wait until it's finalized to
add the code.

Tony Lindgren Sept. 3, 2024, 6:21 a.m. UTC | #7

On Mon, Aug 19, 2024 at 10:59:49AM +0800, Tao Su wrote:
> On Mon, Aug 12, 2024 at 03:48:16PM -0700, Rick Edgecombe wrote:
> > From: Xiaoyao Li <xiaoyao.li@intel.com>
> > +		/*
> > +		 * Work around missing support on old TDX modules, fetch
> > +		 * guest maxpa from gfn_direct_bits.
> > +		 */
> > +		if (output_e->function == 0x80000008) {
> > +			gpa_t gpa_bits = gfn_to_gpa(kvm_gfn_direct_bits(vcpu->kvm));
> > +			unsigned int g_maxpa = __ffs(gpa_bits) + 1;
> > +
> > +			output_e->eax &= ~0x00ff0000;
> > +			output_e->eax |= g_maxpa << 16;
> > +		}
> 
> I suggest putting all guest_phys_bits related WA in a WA-only patch, which will
> be clearer.

The 80000008 workaround needs to be tidied up for sure, it's hard to follow.

> > --- a/arch/x86/kvm/vmx/tdx.h
> > +++ b/arch/x86/kvm/vmx/tdx.h
> > @@ -25,6 +25,11 @@ struct kvm_tdx {
> >  	bool finalized;
> >  
> >  	u64 tsc_offset;
> > +
> > +	/* For KVM_MAP_MEMORY and KVM_TDX_INIT_MEM_REGION. */
> > +	atomic64_t nr_premapped;
> 
> I don't see it is used in this patch set.

Yes that should have been in a later patch.

Regards,

Tony

Tony Lindgren Sept. 3, 2024, 7:19 a.m. UTC | #8

On Mon, Aug 19, 2024 at 01:02:02PM +0800, Xu Yilun wrote:
> On Mon, Aug 12, 2024 at 03:48:16PM -0700, Rick Edgecombe wrote:
> > From: Xiaoyao Li <xiaoyao.li@intel.com>
> > +static int tdx_mask_cpuid(struct kvm_tdx *tdx, struct kvm_cpuid_entry2 *entry)
> > +{
> > +	u64 field_id = TD_MD_FIELD_ID_CPUID_VALUES;
> > +	u64 ebx_eax, edx_ecx;
> > +	u64 err = 0;
> > +
> > +	if (entry->function & TDX_MD_UNREADABLE_LEAF_MASK ||
> > +	    entry->index & TDX_MD_UNREADABLE_SUBLEAF_MASK)
> > +		return -EINVAL;
> > +
> > +	/*
> > +	 * bit 23:17, REVSERVED: reserved, must be 0;
> > +	 * bit 16,    LEAF_31: leaf number bit 31;
> > +	 * bit 15:9,  LEAF_6_0: leaf number bits 6:0, leaf bits 30:7 are
> > +	 *                      implicitly 0;
> > +	 * bit 8,     SUBLEAF_NA: sub-leaf not applicable flag;
> > +	 * bit 7:1,   SUBLEAF_6_0: sub-leaf number bits 6:0. If SUBLEAF_NA is 1,
> > +	 *                         the SUBLEAF_6_0 is all-1.
> > +	 *                         sub-leaf bits 31:7 are implicitly 0;
> > +	 * bit 0,     ELEMENT_I: Element index within field;
> > +	 */
> > +	field_id |= ((entry->function & 0x80000000) ? 1 : 0) << 16;
> > +	field_id |= (entry->function & 0x7f) << 9;
> > +	if (entry->flags & KVM_CPUID_FLAG_SIGNIFCANT_INDEX)
> > +		field_id |= (entry->index & 0x7f) << 1;
> > +	else
> > +		field_id |= 0x1fe;
> > +
> > +	err = tdx_td_metadata_field_read(tdx, field_id, &ebx_eax);
> > +	if (err) //TODO check for specific errors
> > +		goto err_out;
> > +
> > +	entry->eax &= (u32) ebx_eax;
> > +	entry->ebx &= (u32) (ebx_eax >> 32);
> 
> Some fields contains a N-bits wide value instead of a bitmask, why a &=
> just work?

There's the CPUID 0x80000008 workaround, I wonder if we are missing some
other handling though. Do you have some specific CPUIDs bits in mind to
check?

The handling for the supported CPUID values mask from the TDX module is
a bit unclear for sure :)

> > +static int tdx_vcpu_get_cpuid(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd)
> > +{
> > +	struct kvm_cpuid2 __user *output, *td_cpuid;
> > +	struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
> > +	struct kvm_cpuid2 *supported_cpuid;
> > +	int r = 0, i, j = 0;
> > +
> > +	output = u64_to_user_ptr(cmd->data);
> > +	td_cpuid = kzalloc(sizeof(*td_cpuid) +
> > +			sizeof(output->entries[0]) * KVM_MAX_CPUID_ENTRIES,
> > +			GFP_KERNEL);
> > +	if (!td_cpuid)
> > +		return -ENOMEM;
> > +
> > +	r = tdx_get_kvm_supported_cpuid(&supported_cpuid);
> 
> Personally I don't like the definition of this function. I need to look
> into the inner implementation to see if kfree(supported_cpuid); is needed
> or safe. How about:
> 
>   supported_cpuid = tdx_get_kvm_supported_cpuid();
>   if (!supported_cpuid)
> 	goto out_td_cpuid;

So allocate in tdx_get_kvm_supported_cpuid() and the caller frees. Sounds
cleaner to me.

> > +		/*
> > +		 * Work around missing support on old TDX modules, fetch
> > +		 * guest maxpa from gfn_direct_bits.
> > +		 */
> > +		if (output_e->function == 0x80000008) {
> > +			gpa_t gpa_bits = gfn_to_gpa(kvm_gfn_direct_bits(vcpu->kvm));
> > +			unsigned int g_maxpa = __ffs(gpa_bits) + 1;
> > +
> > +			output_e->eax &= ~0x00ff0000;
> > +			output_e->eax |= g_maxpa << 16;
> 
> Is it possible this workaround escapes the KVM supported bits check?

Yes it might need a mask for (g_maxpa << 16) & 0x00ff0000 to avoid setting
the wrong bits, will check.

...
> > +out:
> > +	kfree(td_cpuid);
> > +	kfree(supported_cpuid);
> 
> Traditionally we do:
> 
>   out_supported_cpuid:
> 	kfree(supported_cpuid);
>   out_td_cpuid:
> 	kfree(td_cpuid);
> 
> I'm not sure what's the advantage to make people think more about whether
> kfree is safe.

I'll do a patch for this thanks.

> > --- a/arch/x86/kvm/vmx/tdx.h
> > +++ b/arch/x86/kvm/vmx/tdx.h
> > @@ -25,6 +25,11 @@ struct kvm_tdx {
> >  	bool finalized;
> >  
> >  	u64 tsc_offset;
> > +
> > +	/* For KVM_MAP_MEMORY and KVM_TDX_INIT_MEM_REGION. */
> > +	atomic64_t nr_premapped;
> 
> This doesn't belong to this patch.
> 
> > +
> > +	struct kvm_cpuid2 *cpuid;
> 
> Didn't find the usage of this field.

Thanks will check and drop.

Regards,

Tony

Paolo Bonzini Sept. 10, 2024, 5:27 p.m. UTC | #9

On 9/3/24 08:21, Tony Lindgren wrote:
> On Mon, Aug 19, 2024 at 10:59:49AM +0800, Tao Su wrote:
>> On Mon, Aug 12, 2024 at 03:48:16PM -0700, Rick Edgecombe wrote:
>>> From: Xiaoyao Li <xiaoyao.li@intel.com>
>>> +		/*
>>> +		 * Work around missing support on old TDX modules, fetch
>>> +		 * guest maxpa from gfn_direct_bits.
>>> +		 */
>>> +		if (output_e->function == 0x80000008) {
>>> +			gpa_t gpa_bits = gfn_to_gpa(kvm_gfn_direct_bits(vcpu->kvm));
>>> +			unsigned int g_maxpa = __ffs(gpa_bits) + 1;
>>> +
>>> +			output_e->eax &= ~0x00ff0000;
>>> +			output_e->eax |= g_maxpa << 16;
>>> +		}
>>
>> I suggest putting all guest_phys_bits related WA in a WA-only patch, which will
>> be clearer.
> 
> The 80000008 workaround needs to be tidied up for sure, it's hard to follow.

I think it's okay if you just add a separate 
tdx_get_guest_phys_addr_bits(struct kvm *kvm).

>>> --- a/arch/x86/kvm/vmx/tdx.h
>>> +++ b/arch/x86/kvm/vmx/tdx.h
>>> @@ -25,6 +25,11 @@ struct kvm_tdx {
>>>   	bool finalized;
>>>   
>>>   	u64 tsc_offset;
>>> +
>>> +	/* For KVM_MAP_MEMORY and KVM_TDX_INIT_MEM_REGION. */
>>> +	atomic64_t nr_premapped;
>>
>> I don't see it is used in this patch set.
> 
> Yes that should have been in a later patch.

Yes, it's used in the MMU prep part 2 series.

Paolo

Paolo Bonzini Sept. 10, 2024, 5:29 p.m. UTC | #10

On 9/3/24 09:19, Tony Lindgren wrote:
>>> +			gpa_t gpa_bits = gfn_to_gpa(kvm_gfn_direct_bits(vcpu->kvm));
>>> +			unsigned int g_maxpa = __ffs(gpa_bits) + 1;
>>> +
>>> +			output_e->eax &= ~0x00ff0000;
>>> +			output_e->eax |= g_maxpa << 16;
>> Is it possible this workaround escapes the KVM supported bits check?
>
> Yes it might need a mask for (g_maxpa << 16) & 0x00ff0000 to avoid setting
> the wrong bits, will check.

The mask is okay, __ffs(gpa_bits) + 1 will be between 1 and 64.

The question is whether the TDX module will accept nonzero bits 16..23 
of CPUID[0x80000008].EAX.

Paolo

Tony Lindgren Sept. 11, 2024, 11:11 a.m. UTC | #11

On Tue, Sep 10, 2024 at 07:29:13PM +0200, Paolo Bonzini wrote:
> On 9/3/24 09:19, Tony Lindgren wrote:
> > > > +			gpa_t gpa_bits = gfn_to_gpa(kvm_gfn_direct_bits(vcpu->kvm));
> > > > +			unsigned int g_maxpa = __ffs(gpa_bits) + 1;
> > > > +
> > > > +			output_e->eax &= ~0x00ff0000;
> > > > +			output_e->eax |= g_maxpa << 16;
> > > Is it possible this workaround escapes the KVM supported bits check?
> > 
> > Yes it might need a mask for (g_maxpa << 16) & 0x00ff0000 to avoid setting
> > the wrong bits, will check.
> 
> The mask is okay, __ffs(gpa_bits) + 1 will be between 1 and 64.

OK

> The question is whether the TDX module will accept nonzero bits 16..23 of
> CPUID[0x80000008].EAX.

Just for reference, that's the 0x80000008 quirk as you noticed in 22/25.

Regards,

Tony

Xiaoyao Li Sept. 30, 2024, 6:26 a.m. UTC | #12

On 8/13/2024 6:48 AM, Rick Edgecombe wrote:
> From: Xiaoyao Li <xiaoyao.li@intel.com>
> 
> Implement an IOCTL to allow userspace to read the CPUID bit values for a
> configured TD.
> 
> The TDX module doesn't provide the ability to set all CPUID bits. Instead
> some are configured indirectly, or have fixed values. But it does allow
> for the final resulting CPUID bits to be read. This information will be
> useful for userspace to understand the configuration of the TD, and set
> KVM's copy via KVM_SET_CPUID2.
> 
> To prevent userspace from starting to use features that might not have KVM
> support yet, filter the reported values by KVM's support CPUID bits.

This patch lacks the documentation of KVM_TDX_GET_CPUID to describe what 
it returns and how the returned data is used or expected to be used by 
userspace.

Set aside the filtering of KVM's support CPUID bits, KVM_TDX_GET_CPUID 
only returns the CPUID leaves that can be readable by TDX module (in 
tdx_mask_cpuid()). So what about the leaves that aren't readable? e.g., 
CPUID leaf 6,9,0xc,etc, and leaf 0x40000000 and 0x40000001.

Should userspace to intercept it as the leaves that aren't covered by 
KVM_TDX_GET_CPUID are totally controlled by userspace itself and it's 
userspace's responsibility to set a sane and valid value?

> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
> uAPI breakout v1:
>   - New patch
> ---
>   arch/x86/include/uapi/asm/kvm.h |   1 +
>   arch/x86/kvm/vmx/tdx.c          | 131 ++++++++++++++++++++++++++++++++
>   arch/x86/kvm/vmx/tdx.h          |   5 ++
>   arch/x86/kvm/vmx/tdx_arch.h     |   5 ++
>   arch/x86/kvm/vmx/tdx_errno.h    |   1 +
>   5 files changed, 143 insertions(+)
> 
> diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> index b4f12997052d..39636be5c891 100644
> --- a/arch/x86/include/uapi/asm/kvm.h
> +++ b/arch/x86/include/uapi/asm/kvm.h
> @@ -931,6 +931,7 @@ enum kvm_tdx_cmd_id {
>   	KVM_TDX_CAPABILITIES = 0,
>   	KVM_TDX_INIT_VM,
>   	KVM_TDX_INIT_VCPU,
> +	KVM_TDX_GET_CPUID,
>   
>   	KVM_TDX_CMD_NR_MAX,
>   };
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index b2ed031ac0d6..fe2bbc2ced41 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -813,6 +813,76 @@ static int __tdx_td_init(struct kvm *kvm, struct td_params *td_params,
>   	return ret;
>   }
>   
> +static u64 tdx_td_metadata_field_read(struct kvm_tdx *tdx, u64 field_id,
> +				      u64 *data)
> +{
> +	u64 err;
> +
> +	err = tdh_mng_rd(tdx, field_id, data);
> +
> +	return err;
> +}
> +
> +#define TDX_MD_UNREADABLE_LEAF_MASK	GENMASK(30, 7)
> +#define TDX_MD_UNREADABLE_SUBLEAF_MASK	GENMASK(31, 7)
> +
> +static int tdx_mask_cpuid(struct kvm_tdx *tdx, struct kvm_cpuid_entry2 *entry)
> +{
> +	u64 field_id = TD_MD_FIELD_ID_CPUID_VALUES;
> +	u64 ebx_eax, edx_ecx;
> +	u64 err = 0;
> +
> +	if (entry->function & TDX_MD_UNREADABLE_LEAF_MASK ||
> +	    entry->index & TDX_MD_UNREADABLE_SUBLEAF_MASK)
> +		return -EINVAL;
> +
> +	/*
> +	 * bit 23:17, REVSERVED: reserved, must be 0;
> +	 * bit 16,    LEAF_31: leaf number bit 31;
> +	 * bit 15:9,  LEAF_6_0: leaf number bits 6:0, leaf bits 30:7 are
> +	 *                      implicitly 0;
> +	 * bit 8,     SUBLEAF_NA: sub-leaf not applicable flag;
> +	 * bit 7:1,   SUBLEAF_6_0: sub-leaf number bits 6:0. If SUBLEAF_NA is 1,
> +	 *                         the SUBLEAF_6_0 is all-1.
> +	 *                         sub-leaf bits 31:7 are implicitly 0;
> +	 * bit 0,     ELEMENT_I: Element index within field;
> +	 */
> +	field_id |= ((entry->function & 0x80000000) ? 1 : 0) << 16;
> +	field_id |= (entry->function & 0x7f) << 9;
> +	if (entry->flags & KVM_CPUID_FLAG_SIGNIFCANT_INDEX)
> +		field_id |= (entry->index & 0x7f) << 1;
> +	else
> +		field_id |= 0x1fe;
> +
> +	err = tdx_td_metadata_field_read(tdx, field_id, &ebx_eax);
> +	if (err) //TODO check for specific errors
> +		goto err_out;
> +
> +	entry->eax &= (u32) ebx_eax;
> +	entry->ebx &= (u32) (ebx_eax >> 32);
> +
> +	field_id++;
> +	err = tdx_td_metadata_field_read(tdx, field_id, &edx_ecx);
> +	/*
> +	 * It's weird that reading edx_ecx fails while reading ebx_eax
> +	 * succeeded.
> +	 */
> +	if (WARN_ON_ONCE(err))
> +		goto err_out;
> +
> +	entry->ecx &= (u32) edx_ecx;
> +	entry->edx &= (u32) (edx_ecx >> 32);
> +	return 0;
> +
> +err_out:
> +	entry->eax = 0;
> +	entry->ebx = 0;
> +	entry->ecx = 0;
> +	entry->edx = 0;
> +
> +	return -EIO;
> +}
> +
>   static int tdx_td_init(struct kvm *kvm, struct kvm_tdx_cmd *cmd)
>   {
>   	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> @@ -1038,6 +1108,64 @@ static int __maybe_unused tdx_get_kvm_supported_cpuid(struct kvm_cpuid2 **cpuid)
>   	return r;
>   }
>   
> +static int tdx_vcpu_get_cpuid(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd)
> +{
> +	struct kvm_cpuid2 __user *output, *td_cpuid;
> +	struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
> +	struct kvm_cpuid2 *supported_cpuid;
> +	int r = 0, i, j = 0;
> +
> +	output = u64_to_user_ptr(cmd->data);
> +	td_cpuid = kzalloc(sizeof(*td_cpuid) +
> +			sizeof(output->entries[0]) * KVM_MAX_CPUID_ENTRIES,
> +			GFP_KERNEL);
> +	if (!td_cpuid)
> +		return -ENOMEM;
> +
> +	r = tdx_get_kvm_supported_cpuid(&supported_cpuid);
> +	if (r)
> +		goto out;
> +
> +	for (i = 0; i < supported_cpuid->nent; i++) {
> +		struct kvm_cpuid_entry2 *supported = &supported_cpuid->entries[i];
> +		struct kvm_cpuid_entry2 *output_e = &td_cpuid->entries[j];
> +
> +		*output_e = *supported;
> +
> +		/* Only allow values of bits that KVM's supports to be exposed */
> +		if (tdx_mask_cpuid(kvm_tdx, output_e))
> +			continue;
> +
> +		/*
> +		 * Work around missing support on old TDX modules, fetch
> +		 * guest maxpa from gfn_direct_bits.
> +		 */
> +		if (output_e->function == 0x80000008) {
> +			gpa_t gpa_bits = gfn_to_gpa(kvm_gfn_direct_bits(vcpu->kvm));
> +			unsigned int g_maxpa = __ffs(gpa_bits) + 1;
> +
> +			output_e->eax &= ~0x00ff0000;
> +			output_e->eax |= g_maxpa << 16;
> +		}
> +
> +		j++;
> +	}
> +	td_cpuid->nent = j;
> +
> +	if (copy_to_user(output, td_cpuid, sizeof(*output))) {
> +		r = -EFAULT;
> +		goto out;
> +	}
> +	if (copy_to_user(output->entries, td_cpuid->entries,
> +			 td_cpuid->nent * sizeof(struct kvm_cpuid_entry2)))
> +		r = -EFAULT;
> +
> +out:
> +	kfree(td_cpuid);
> +	kfree(supported_cpuid);
> +	return r;
> +}
> +
>   static int tdx_vcpu_init(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd)
>   {
>   	struct msr_data apic_base_msr;
> @@ -1089,6 +1217,9 @@ int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)
>   	case KVM_TDX_INIT_VCPU:
>   		ret = tdx_vcpu_init(vcpu, &cmd);
>   		break;
> +	case KVM_TDX_GET_CPUID:
> +		ret = tdx_vcpu_get_cpuid(vcpu, &cmd);
> +		break;
>   	default:
>   		ret = -EINVAL;
>   		break;
> diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
> index 8349b542836e..7eeb54fbcae1 100644
> --- a/arch/x86/kvm/vmx/tdx.h
> +++ b/arch/x86/kvm/vmx/tdx.h
> @@ -25,6 +25,11 @@ struct kvm_tdx {
>   	bool finalized;
>   
>   	u64 tsc_offset;
> +
> +	/* For KVM_MAP_MEMORY and KVM_TDX_INIT_MEM_REGION. */
> +	atomic64_t nr_premapped;
> +
> +	struct kvm_cpuid2 *cpuid;
>   };
>   
>   struct vcpu_tdx {
> diff --git a/arch/x86/kvm/vmx/tdx_arch.h b/arch/x86/kvm/vmx/tdx_arch.h
> index d2d7f9cab740..815e74408a34 100644
> --- a/arch/x86/kvm/vmx/tdx_arch.h
> +++ b/arch/x86/kvm/vmx/tdx_arch.h
> @@ -157,4 +157,9 @@ struct td_params {
>   
>   #define MD_FIELD_ID_FEATURES0_TOPOLOGY_ENUM	BIT_ULL(20)
>   
> +/*
> + * TD scope metadata field ID.
> + */
> +#define TD_MD_FIELD_ID_CPUID_VALUES		0x9410000300000000ULL
> +
>   #endif /* __KVM_X86_TDX_ARCH_H */
> diff --git a/arch/x86/kvm/vmx/tdx_errno.h b/arch/x86/kvm/vmx/tdx_errno.h
> index dc3fa2a58c2c..f9dbb3a065cc 100644
> --- a/arch/x86/kvm/vmx/tdx_errno.h
> +++ b/arch/x86/kvm/vmx/tdx_errno.h
> @@ -23,6 +23,7 @@
>   #define TDX_FLUSHVP_NOT_DONE			0x8000082400000000ULL
>   #define TDX_EPT_WALK_FAILED			0xC0000B0000000000ULL
>   #define TDX_EPT_ENTRY_STATE_INCORRECT		0xC0000B0D00000000ULL
> +#define TDX_METADATA_FIELD_NOT_READABLE		0xC0000C0200000000ULL
>   
>   /*
>    * TDX module operand ID, appears in 31:0 part of error code as

Edgecombe, Rick P Sept. 30, 2024, 4:22 p.m. UTC | #13

On Mon, 2024-09-30 at 14:26 +0800, Xiaoyao Li wrote:
> This patch lacks the documentation of KVM_TDX_GET_CPUID to describe what 
> it returns and how the returned data is used or expected to be used by 
> userspace.

Yes, we should add some docs.

> 
> Set aside the filtering of KVM's support CPUID bits, KVM_TDX_GET_CPUID 
> only returns the CPUID leaves that can be readable by TDX module (in 
> tdx_mask_cpuid()). So what about the leaves that aren't readable? e.g., 
> CPUID leaf 6,9,0xc,etc, and leaf 0x40000000 and 0x40000001.

Hmm. The purpose of this is to IOCTL is to read the values that the TDX module
knows about so it can set the values KVM knows about. So I think we should just
let it read straight from the TDX module.

> 
> Should userspace to intercept it as the leaves that aren't covered by 
> KVM_TDX_GET_CPUID are totally controlled by userspace itself and it's 
> userspace's responsibility to set a sane and valid value?

Not sure what you mean by "userspace to intercept it". But letting userspace be
responsible to "set a sane and valid value" seems consistent with general KVM
philosophy, and how we are planning to handle the issues around the TDX fixed
bits.

Can you elaborate? And on the consequences from QEMU's side?

[21/25] KVM: x86: Introduce KVM_TDX_GET_CPUID

Commit Message

Comments

Patch