[v3,10/21] x86/virt/tdx: Add placeholder to coveret all system RAM as TDX memory

Message ID 6230ef28be8c360ab326c8f592acf1964ac065c1.1649219184.git.kai.huang@intel.com (mailing list archive)
State New, archived
Series TDX host kernel support

Commit Message

Huang, Kai April 6, 2022, 4:49 a.m. UTC
TDX provides increased levels of memory confidentiality and integrity.
This requires special hardware support for features like memory
encryption and storage of memory integrity checksums.  Not all memory
satisfies these requirements.

As a result, TDX introduced the concept of a "Convertible Memory Region"
(CMR).  During boot, the firmware builds a list of all of the memory
ranges which can provide the TDX security guarantees.  The list of these
ranges, along with TDX module information, is available to the kernel by
querying the TDX module.

In order to provide crypto protection to TD guests, the TDX architecture
also needs additional metadata to record things like which TD guest
"owns" a given page of memory.  This metadata essentially serves as the
'struct page' for the TDX module.  The space for this metadata is not
reserved by the hardware upfront and must be allocated by the kernel
and given to the TDX module.

Since this metadata consumes space, the VMM can choose whether or not to
allocate it for a given area of convertible memory.  If it chooses not
to, the memory cannot receive TDX protections and can not be used by TDX
guests as private memory.

For every memory region that the VMM wants to use as TDX memory, it sets
up a "TD Memory Region" (TDMR).  Each TDMR represents a physically
contiguous convertible range and must also have its own physically
contiguous metadata table, referred to as a Physical Address Metadata
Table (PAMT), to track status for each page in the TDMR range.

Unlike a CMR, each TDMR requires 1G granularity and alignment.  To
support physical RAM areas that don't meet those strict requirements,
each TDMR permits a number of internal "reserved areas" which can be
placed over memory holes.  If PAMT metadata is placed within a TDMR it
must be covered by one of these reserved areas.

Let's summarize the concepts:

 CMR - Firmware-enumerated physical ranges that support TDX.  CMRs are
       4K aligned.
TDMR - Physical address range which is chosen by the kernel to support
       TDX.  1G granularity and alignment required.  Each TDMR has
       reserved areas where TDX memory holes and overlapping PAMTs can
       be put into.
PAMT - Physically contiguous TDX metadata.  One table for each page size
       per TDMR.  Roughly 1/256th of TDMR in size.  256G TDMR = ~1G
       PAMT.
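
The sizing rules summarized above can be sketched in userspace C.  This is
only an illustration under stated assumptions: 'struct tdmr_span',
tdmr_span_for() and pamt_size_estimate() are hypothetical names that are not
part of this patch, and the PAMT figure uses the rough 1/256 ratio quoted
above rather than the exact formula from the TDX module.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_1G (1ULL << 30)

/* Round an address down or up to a 1G boundary (the TDMR granularity). */
static uint64_t tdmr_align_down(uint64_t addr)
{
	return addr & ~(SZ_1G - 1);
}

static uint64_t tdmr_align_up(uint64_t addr)
{
	return (addr + SZ_1G - 1) & ~(SZ_1G - 1);
}

/* Hypothetical type: a 1G-aligned span that could back one TDMR. */
struct tdmr_span {
	uint64_t base;
	uint64_t size;
};

/* Smallest 1G-aligned span covering the physical range [start, end). */
static struct tdmr_span tdmr_span_for(uint64_t start, uint64_t end)
{
	struct tdmr_span s;

	s.base = tdmr_align_down(start);
	s.size = tdmr_align_up(end) - s.base;
	return s;
}

/* Rough estimate only: the text says PAMT is about 1/256th of the TDMR. */
static uint64_t pamt_size_estimate(uint64_t tdmr_size)
{
	return tdmr_size / 256;
}
```

Any part of the aligned span that is not real convertible memory would then
have to be covered by a reserved area, as described above.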

As one step of initializing the TDX module, the memory regions that the
TDX module can use must be passed to the TDX module as an array of
TDMRs.

Constructing TDMRs to build the TDX memory consists of the following steps:

1) Create TDMRs to cover all memory regions that TDX module can use;
2) Allocate and set up PAMT for each TDMR;
3) Set up reserved areas for each TDMR.

Add a placeholder, right after getting the TDX module and CMR information,
to construct TDMRs following the above steps, in preparation for
configuring the TDX module.  Always free the TDMRs at the end of
initialization (whether it succeeds or not), as TDMRs are only used
during initialization.

Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/virt/vmx/tdx/tdx.c | 47 +++++++++++++++++++++++++++++++++++++
 arch/x86/virt/vmx/tdx/tdx.h | 23 ++++++++++++++++++
 2 files changed, 70 insertions(+)

Comments

Isaku Yamahata April 20, 2022, 8:48 p.m. UTC | #1
> Subject: Re: [PATCH v3 10/21] x86/virt/tdx: Add placeholder to coveret all system RAM as TDX memory

Nitpick: coveret => convert

Thanks,

On Wed, Apr 06, 2022 at 04:49:22PM +1200,
Kai Huang <kai.huang@intel.com> wrote:

> TDX provides increased levels of memory confidentiality and integrity.
> This requires special hardware support for features like memory
> encryption and storage of memory integrity checksums.  Not all memory
> satisfies these requirements.
> 
> As a result, TDX introduced the concept of a "Convertible Memory Region"
> (CMR).  During boot, the firmware builds a list of all of the memory
> ranges which can provide the TDX security guarantees.  The list of these
> ranges, along with TDX module information, is available to the kernel by
> querying the TDX module.
> 
> In order to provide crypto protection to TD guests, the TDX architecture
> also needs additional metadata to record things like which TD guest
> "owns" a given page of memory.  This metadata essentially serves as the
> 'struct page' for the TDX module.  The space for this metadata is not
> reserved by the hardware upfront and must be allocated by the kernel
> and given to the TDX module.
> 
> Since this metadata consumes space, the VMM can choose whether or not to
> allocate it for a given area of convertible memory.  If it chooses not
> to, the memory cannot receive TDX protections and can not be used by TDX
> guests as private memory.
> 
> For every memory region that the VMM wants to use as TDX memory, it sets
> up a "TD Memory Region" (TDMR).  Each TDMR represents a physically
> contiguous convertible range and must also have its own physically
> contiguous metadata table, referred to as a Physical Address Metadata
> Table (PAMT), to track status for each page in the TDMR range.
> 
> Unlike a CMR, each TDMR requires 1G granularity and alignment.  To
> support physical RAM areas that don't meet those strict requirements,
> each TDMR permits a number of internal "reserved areas" which can be
> placed over memory holes.  If PAMT metadata is placed within a TDMR it
> must be covered by one of these reserved areas.
> 
> Let's summarize the concepts:
> 
>  CMR - Firmware-enumerated physical ranges that support TDX.  CMRs are
>        4K aligned.
> TDMR - Physical address range which is chosen by the kernel to support
>        TDX.  1G granularity and alignment required.  Each TDMR has
>        reserved areas where TDX memory holes and overlapping PAMTs can
>        be put into.
> PAMT - Physically contiguous TDX metadata.  One table for each page size
>        per TDMR.  Roughly 1/256th of TDMR in size.  256G TDMR = ~1G
>        PAMT.
> 
> As one step of initializing the TDX module, the memory regions that TDX
> module can use must be configured to the TDX module via an array of
> TDMRs.
> 
> Constructing TDMRs to build the TDX memory consists below steps:
> 
> 1) Create TDMRs to cover all memory regions that TDX module can use;
> 2) Allocate and set up PAMT for each TDMR;
> 3) Set up reserved areas for each TDMR.
> 
> Add a placeholder right after getting TDX module and CMRs information to
> construct TDMRs to do the above steps, as the preparation to configure
> the TDX module.  Always free TDMRs at the end of the initialization (no
> matter successful or not), as TDMRs are only used during the
> initialization.
> 
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> ---
>  arch/x86/virt/vmx/tdx/tdx.c | 47 +++++++++++++++++++++++++++++++++++++
>  arch/x86/virt/vmx/tdx/tdx.h | 23 ++++++++++++++++++
>  2 files changed, 70 insertions(+)
> 
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index 482e6d858181..ec27350d53c1 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -13,6 +13,7 @@
>  #include <linux/cpu.h>
>  #include <linux/smp.h>
>  #include <linux/atomic.h>
> +#include <linux/slab.h>
>  #include <asm/msr-index.h>
>  #include <asm/msr.h>
>  #include <asm/cpufeature.h>
> @@ -594,8 +595,29 @@ static int tdx_get_sysinfo(void)
>  	return sanitize_cmrs(tdx_cmr_array, cmr_num);
>  }
>  
> +static void free_tdmrs(struct tdmr_info **tdmr_array, int tdmr_num)
> +{
> +	int i;
> +
> +	for (i = 0; i < tdmr_num; i++) {
> +		struct tdmr_info *tdmr = tdmr_array[i];
> +
> +		/* kfree() works with NULL */
> +		kfree(tdmr);
> +		tdmr_array[i] = NULL;
> +	}
> +}
> +
> +static int construct_tdmrs(struct tdmr_info **tdmr_array, int *tdmr_num)
> +{
> +	/* Return -EFAULT until constructing TDMRs is done */
> +	return -EFAULT;
> +}
> +
>  static int init_tdx_module(void)
>  {
> +	struct tdmr_info **tdmr_array;
> +	int tdmr_num;
>  	int ret;
>  
>  	/* TDX module global initialization */
> @@ -613,11 +635,36 @@ static int init_tdx_module(void)
>  	if (ret)
>  		goto out;
>  
> +	/*
> +	 * Prepare enough space to hold pointers of TDMRs (TDMR_INFO).
> +	 * TDX requires TDMR_INFO being 512 aligned.  Each TDMR is
> +	 * allocated individually within construct_tdmrs() to meet
> +	 * this requirement.
> +	 */
> +	tdmr_array = kcalloc(tdx_sysinfo.max_tdmrs, sizeof(struct tdmr_info *),
> +			GFP_KERNEL);
> +	if (!tdmr_array) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	/* Construct TDMRs to build TDX memory */
> +	ret = construct_tdmrs(tdmr_array, &tdmr_num);
> +	if (ret)
> +		goto out_free_tdmrs;
> +
>  	/*
>  	 * Return -EFAULT until all steps of TDX module
>  	 * initialization are done.
>  	 */
>  	ret = -EFAULT;
> +out_free_tdmrs:
> +	/*
> +	 * TDMRs are only used during initializing TDX module.  Always
> +	 * free them no matter the initialization was successful or not.
> +	 */
> +	free_tdmrs(tdmr_array, tdmr_num);
> +	kfree(tdmr_array);
>  out:
>  	return ret;
>  }
> diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
> index 2f21c45df6ac..05bf9fe6bd00 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.h
> +++ b/arch/x86/virt/vmx/tdx/tdx.h
> @@ -89,6 +89,29 @@ struct tdsysinfo_struct {
>  	};
>  } __packed __aligned(TDSYSINFO_STRUCT_ALIGNMENT);
>  
> +struct tdmr_reserved_area {
> +	u64 offset;
> +	u64 size;
> +} __packed;
> +
> +#define TDMR_INFO_ALIGNMENT	512
> +
> +struct tdmr_info {
> +	u64 base;
> +	u64 size;
> +	u64 pamt_1g_base;
> +	u64 pamt_1g_size;
> +	u64 pamt_2m_base;
> +	u64 pamt_2m_size;
> +	u64 pamt_4k_base;
> +	u64 pamt_4k_size;
> +	/*
> +	 * Actual number of reserved areas depends on
> +	 * 'struct tdsysinfo_struct'::max_reserved_per_tdmr.
> +	 */
> +	struct tdmr_reserved_area reserved_areas[0];
> +} __packed __aligned(TDMR_INFO_ALIGNMENT);
> +
>  /*
>   * P-SEAMLDR SEAMCALL leaf function
>   */
> -- 
> 2.35.1
>
Huang, Kai April 20, 2022, 10:38 p.m. UTC | #2
On Wed, 2022-04-20 at 13:48 -0700, Isaku Yamahata wrote:
> > Subject: Re: [PATCH v3 10/21] x86/virt/tdx: Add placeholder to coveret all
> > system RAM as TDX memory
> 
> Nitpick: coveret => convert
> 
> Thanks,

Thanks!
Dave Hansen April 27, 2022, 10:24 p.m. UTC | #3
On 4/5/22 21:49, Kai Huang wrote:
> TDX provides increased levels of memory confidentiality and integrity.
> This requires special hardware support for features like memory
> encryption and storage of memory integrity checksums.  Not all memory
> satisfies these requirements.
> 
> As a result, TDX introduced the concept of a "Convertible Memory Region"
> (CMR).  During boot, the firmware builds a list of all of the memory
> ranges which can provide the TDX security guarantees.  The list of these
> ranges, along with TDX module information, is available to the kernel by
> querying the TDX module.
> 
> In order to provide crypto protection to TD guests, the TDX architecture

There's that "crypto protection" thing again.  I'm not really a fan of
the changes made to this changelog since I wrote it. :)

> also needs additional metadata to record things like which TD guest
> "owns" a given page of memory.  This metadata essentially serves as the
> 'struct page' for the TDX module.  The space for this metadata is not
> reserved by the hardware upfront and must be allocated by the kernel

			    ^ "up front"

...
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index 482e6d858181..ec27350d53c1 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -13,6 +13,7 @@
>  #include <linux/cpu.h>
>  #include <linux/smp.h>
>  #include <linux/atomic.h>
> +#include <linux/slab.h>
>  #include <asm/msr-index.h>
>  #include <asm/msr.h>
>  #include <asm/cpufeature.h>
> @@ -594,8 +595,29 @@ static int tdx_get_sysinfo(void)
>  	return sanitize_cmrs(tdx_cmr_array, cmr_num);
>  }
>  
> +static void free_tdmrs(struct tdmr_info **tdmr_array, int tdmr_num)
> +{
> +	int i;
> +
> +	for (i = 0; i < tdmr_num; i++) {
> +		struct tdmr_info *tdmr = tdmr_array[i];
> +
> +		/* kfree() works with NULL */
> +		kfree(tdmr);
> +		tdmr_array[i] = NULL;
> +	}
> +}
> +
> +static int construct_tdmrs(struct tdmr_info **tdmr_array, int *tdmr_num)
> +{
> +	/* Return -EFAULT until constructing TDMRs is done */
> +	return -EFAULT;
> +}
> +
>  static int init_tdx_module(void)
>  {
> +	struct tdmr_info **tdmr_array;
> +	int tdmr_num;
>  	int ret;
>  
>  	/* TDX module global initialization */
> @@ -613,11 +635,36 @@ static int init_tdx_module(void)
>  	if (ret)
>  		goto out;
>  
> +	/*
> +	 * Prepare enough space to hold pointers of TDMRs (TDMR_INFO).
> +	 * TDX requires TDMR_INFO being 512 aligned.  Each TDMR is

					 ^ "512-byte aligned"

Right?

> +	 * allocated individually within construct_tdmrs() to meet
> +	 * this requirement.
> +	 */
> +	tdmr_array = kcalloc(tdx_sysinfo.max_tdmrs, sizeof(struct tdmr_info *),
> +			GFP_KERNEL);

Where, exactly is that alignment provided?  A 'struct tdmr_info *' is 8
bytes so a tdx_sysinfo.max_tdmrs=8 kcalloc() would only guarantee
64-byte alignment.

Also, I'm surprised that this is an array of virtual address pointers.
The previous interactions with the TDX module seemed to all take
physical addresses.  How is it that this hardware structure which has
hardware alignment constraints is holding virtual addresses?

> +	if (!tdmr_array) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	/* Construct TDMRs to build TDX memory */
> +	ret = construct_tdmrs(tdmr_array, &tdmr_num);
> +	if (ret)
> +		goto out_free_tdmrs;
> +
>  	/*
>  	 * Return -EFAULT until all steps of TDX module
>  	 * initialization are done.
>  	 */
>  	ret = -EFAULT;

There's the -EFAULT again.  I'd replace these with a better error code.

> +out_free_tdmrs:
> +	/*
> +	 * TDMRs are only used during initializing TDX module.  Always
> +	 * free them no matter the initialization was successful or not.
> +	 */
> +	free_tdmrs(tdmr_array, tdmr_num);
> +	kfree(tdmr_array);
>  out:
>  	return ret;
>  }
> diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
> index 2f21c45df6ac..05bf9fe6bd00 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.h
> +++ b/arch/x86/virt/vmx/tdx/tdx.h
> @@ -89,6 +89,29 @@ struct tdsysinfo_struct {
>  	};
>  } __packed __aligned(TDSYSINFO_STRUCT_ALIGNMENT);
>  
> +struct tdmr_reserved_area {
> +	u64 offset;
> +	u64 size;
> +} __packed;
> +
> +#define TDMR_INFO_ALIGNMENT	512
> +
> +struct tdmr_info {
> +	u64 base;
> +	u64 size;
> +	u64 pamt_1g_base;
> +	u64 pamt_1g_size;
> +	u64 pamt_2m_base;
> +	u64 pamt_2m_size;
> +	u64 pamt_4k_base;
> +	u64 pamt_4k_size;
> +	/*
> +	 * Actual number of reserved areas depends on
> +	 * 'struct tdsysinfo_struct'::max_reserved_per_tdmr.
> +	 */
> +	struct tdmr_reserved_area reserved_areas[0];
> +} __packed __aligned(TDMR_INFO_ALIGNMENT);
> +
>  /*
>   * P-SEAMLDR SEAMCALL leaf function
>   */
Huang, Kai April 28, 2022, 12:53 a.m. UTC | #4
On Wed, 2022-04-27 at 15:24 -0700, Dave Hansen wrote:
> On 4/5/22 21:49, Kai Huang wrote:
> > TDX provides increased levels of memory confidentiality and integrity.
> > This requires special hardware support for features like memory
> > encryption and storage of memory integrity checksums.  Not all memory
> > satisfies these requirements.
> > 
> > As a result, TDX introduced the concept of a "Convertible Memory Region"
> > (CMR).  During boot, the firmware builds a list of all of the memory
> > ranges which can provide the TDX security guarantees.  The list of these
> > ranges, along with TDX module information, is available to the kernel by
> > querying the TDX module.
> > 
> > In order to provide crypto protection to TD guests, the TDX architecture
> 
> There's that "crypto protection" thing again.  I'm not really a fan of
> the changes made to this changelog since I wrote it. :)

Sorry about that.  I'll remove "In order to provide crypto protection to TD
guests".

> 
> > also needs additional metadata to record things like which TD guest
> > "owns" a given page of memory.  This metadata essentially serves as the
> > 'struct page' for the TDX module.  The space for this metadata is not
> > reserved by the hardware upfront and must be allocated by the kernel
> 
> 			    ^ "up front"

Thanks, will change to "up front".

Btw, the Gmail grammar check gives me a red line if I use "up front", but it
doesn't complain about "upfront".

> 
> ...
> > diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> > index 482e6d858181..ec27350d53c1 100644
> > --- a/arch/x86/virt/vmx/tdx/tdx.c
> > +++ b/arch/x86/virt/vmx/tdx/tdx.c
> > @@ -13,6 +13,7 @@
> >  #include <linux/cpu.h>
> >  #include <linux/smp.h>
> >  #include <linux/atomic.h>
> > +#include <linux/slab.h>
> >  #include <asm/msr-index.h>
> >  #include <asm/msr.h>
> >  #include <asm/cpufeature.h>
> > @@ -594,8 +595,29 @@ static int tdx_get_sysinfo(void)
> >  	return sanitize_cmrs(tdx_cmr_array, cmr_num);
> >  }
> >  
> > +static void free_tdmrs(struct tdmr_info **tdmr_array, int tdmr_num)
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < tdmr_num; i++) {
> > +		struct tdmr_info *tdmr = tdmr_array[i];
> > +
> > +		/* kfree() works with NULL */
> > +		kfree(tdmr);
> > +		tdmr_array[i] = NULL;
> > +	}
> > +}
> > +
> > +static int construct_tdmrs(struct tdmr_info **tdmr_array, int *tdmr_num)
> > +{
> > +	/* Return -EFAULT until constructing TDMRs is done */
> > +	return -EFAULT;
> > +}
> > +
> >  static int init_tdx_module(void)
> >  {
> > +	struct tdmr_info **tdmr_array;
> > +	int tdmr_num;
> >  	int ret;
> >  
> >  	/* TDX module global initialization */
> > @@ -613,11 +635,36 @@ static int init_tdx_module(void)
> >  	if (ret)
> >  		goto out;
> >  
> > +	/*
> > +	 * Prepare enough space to hold pointers of TDMRs (TDMR_INFO).
> > +	 * TDX requires TDMR_INFO being 512 aligned.  Each TDMR is
> 
> 					 ^ "512-byte aligned"
> 
> Right?

Yes.  Will update.

> 
> > +	 * allocated individually within construct_tdmrs() to meet
> > +	 * this requirement.
> > +	 */
> > +	tdmr_array = kcalloc(tdx_sysinfo.max_tdmrs, sizeof(struct tdmr_info *),
> > +			GFP_KERNEL);
> 
> Where, exactly is that alignment provided?  A 'struct tdmr_info *' is 8
> bytes so a tdx_sysinfo.max_tdmrs=8 kcalloc() would only guarantee
> 64-byte alignment.

The entries in the array only contain a pointer to TDMR_INFO.  The actual
TDMR_INFO is allocated separately.  The array itself is never used by TDX
hardware, so its alignment doesn't matter.  We just need to guarantee each
TDMR_INFO is 512-byte aligned.

> 
> Also, I'm surprised that this is an array of virtual address pointers.
> The previous interactions with the TDX module seemed to all take
> physical addresses.  How is it that this hardware structure which has
> hardware alignment constraints is holding virtual addresses?

In later patches, when TDMRs are configured to the TDX module, the input will be
converted to physical addresses, and there will be another array which is used by
the TDX module hardware.  This array is used only by the kernel to construct
TDMRs.

> 
> > +	if (!tdmr_array) {
> > +		ret = -ENOMEM;
> > +		goto out;
> > +	}
> > +
> > +	/* Construct TDMRs to build TDX memory */
> > +	ret = construct_tdmrs(tdmr_array, &tdmr_num);
> > +	if (ret)
> > +		goto out_free_tdmrs;
> > +
> >  	/*
> >  	 * Return -EFAULT until all steps of TDX module
> >  	 * initialization are done.
> >  	 */
> >  	ret = -EFAULT;
> 
> There's the -EFAULT again.  I'd replace these with a better error code.

I couldn't think of a better error code.  -EINVAL doesn't look suitable.  -EAGAIN
also doesn't make sense for now, since we always shut down the TDX module in case
of any error, so the caller should never retry.  I think we need some error code to
say "the job isn't done yet".  Perhaps -EBUSY?
Dave Hansen April 28, 2022, 1:07 a.m. UTC | #5
On 4/27/22 17:53, Kai Huang wrote:
> On Wed, 2022-04-27 at 15:24 -0700, Dave Hansen wrote:
>> On 4/5/22 21:49, Kai Huang wrote:
>>> TDX provides increased levels of memory confidentiality and integrity.
>>> This requires special hardware support for features like memory
>>> encryption and storage of memory integrity checksums.  Not all memory
>>> satisfies these requirements.
>>>
>>> As a result, TDX introduced the concept of a "Convertible Memory Region"
>>> (CMR).  During boot, the firmware builds a list of all of the memory
>>> ranges which can provide the TDX security guarantees.  The list of these
>>> ranges, along with TDX module information, is available to the kernel by
>>> querying the TDX module.
>>>
>>> In order to provide crypto protection to TD guests, the TDX architecture
>>
>> There's that "crypto protection" thing again.  I'm not really a fan of
>> the changes made to this changelog since I wrote it. :)
> 
> Sorry about that.  I'll remove "In order to provide crypto protection to TD
> guests".

Seriously, though.  I took the effort to write these changelogs for you.
 They were fine.  I'm not stoked about needing to proofread them again.

>>> also needs additional metadata to record things like which TD guest
>>> "owns" a given page of memory.  This metadata essentially serves as the
>>> 'struct page' for the TDX module.  The space for this metadata is not
>>> reserved by the hardware upfront and must be allocated by the kernel
>>
>> 			    ^ "up front"
> 
> Thanks will change to "up front".
> 
> Btw, the gmail grammar check gives me a red line if I use "up front", but it
> doesn't complain "upfront".

I'm pretty sure it's wrong.  "up front" is an adverb that applies to
"reserved".  "Upfront" is an adjective and not how you used it in that
sentence.

>>> +	 * allocated individually within construct_tdmrs() to meet
>>> +	 * this requirement.
>>> +	 */
>>> +	tdmr_array = kcalloc(tdx_sysinfo.max_tdmrs, sizeof(struct tdmr_info *),
>>> +			GFP_KERNEL);
>>
>> Where, exactly is that alignment provided?  A 'struct tdmr_info *' is 8
>> bytes so a tdx_sysinfo.max_tdmrs=8 kcalloc() would only guarantee
>> 64-byte alignment.
> 
> The entries in the array only contain a pointer to TDMR_INFO.  The actual
> TDMR_INFO is allocated separately. The array itself is never used by TDX
> hardware so it doesn't matter.  We just need to guarantee each TDMR_INFO is
> 512B-byte aligned.

The comment was clear as mud about this.  If you're going to talk about
alignment, then do it near the allocation that guarantees the alignment,
not in some other function near *ANOTHER* allocation.

Also, considering that you're about to go allocate potentially gigabytes
of physically contiguous memory, it seems laughable that you'd go to any
trouble at all to allocate an array of pointers here.  Why not just

	kcalloc(tdx_sysinfo.max_tdmrs, sizeof(struct tdmr_info), ...);

Or, heck, just vmalloc() the dang thing.  Why even bother with the array
of pointers?


>>> +	if (!tdmr_array) {
>>> +		ret = -ENOMEM;
>>> +		goto out;
>>> +	}
>>> +
>>> +	/* Construct TDMRs to build TDX memory */
>>> +	ret = construct_tdmrs(tdmr_array, &tdmr_num);
>>> +	if (ret)
>>> +		goto out_free_tdmrs;
>>> +
>>>  	/*
>>>  	 * Return -EFAULT until all steps of TDX module
>>>  	 * initialization are done.
>>>  	 */
>>>  	ret = -EFAULT;
>>
>> There's the -EFAULT again.  I'd replace these with a better error code.
> 
> I couldn't think out a better error code.  -EINVAL looks doesn't suit.  -EAGAIN
> also doesn't make sense for now since we always shutdown the TDX module in case
> of any error so caller should never retry.  I think we need some error code to
> tell "the job isn't done yet".  Perhaps -EBUSY?

Is this going to retry if it sees -EFAULT or -EBUSY?
Huang, Kai April 28, 2022, 1:35 a.m. UTC | #6
On Wed, 2022-04-27 at 18:07 -0700, Dave Hansen wrote:
> On 4/27/22 17:53, Kai Huang wrote:
> > On Wed, 2022-04-27 at 15:24 -0700, Dave Hansen wrote:
> > > On 4/5/22 21:49, Kai Huang wrote:
> > > > TDX provides increased levels of memory confidentiality and integrity.
> > > > This requires special hardware support for features like memory
> > > > encryption and storage of memory integrity checksums.  Not all memory
> > > > satisfies these requirements.
> > > > 
> > > > As a result, TDX introduced the concept of a "Convertible Memory Region"
> > > > (CMR).  During boot, the firmware builds a list of all of the memory
> > > > ranges which can provide the TDX security guarantees.  The list of these
> > > > ranges, along with TDX module information, is available to the kernel by
> > > > querying the TDX module.
> > > > 
> > > > In order to provide crypto protection to TD guests, the TDX architecture
> > > 
> > > There's that "crypto protection" thing again.  I'm not really a fan of
> > > the changes made to this changelog since I wrote it. :)
> > 
> > Sorry about that.  I'll remove "In order to provide crypto protection to TD
> > guests".
> 
> Seriously, though.  I took the effort to write these changelogs for you.
>  They were fine.  I'm not stoked about needing to proofread them again.

Yes, that is clear to me now.  Thank you for your time.

It won't happen again.  If something doesn't look right to me, I'll raise it
rather than quietly changing it.

> 
> > > > also needs additional metadata to record things like which TD guest
> > > > "owns" a given page of memory.  This metadata essentially serves as the
> > > > 'struct page' for the TDX module.  The space for this metadata is not
> > > > reserved by the hardware upfront and must be allocated by the kernel
> > > 
> > > 			    ^ "up front"
> > 
> > Thanks will change to "up front".
> > 
> > Btw, the gmail grammar check gives me a red line if I use "up front", but it
> > doesn't complain "upfront".
> 
> I'm pretty sure it's wrong.  "up front" is an adverb that applies to
> "reserved".  "Upfront" is an adjective and not how you used it in that
> sentence.

Thanks for explaining.  Anyway, the Gmail grammar check can have bugs.

> 
> > > > +	 * allocated individually within construct_tdmrs() to meet
> > > > +	 * this requirement.
> > > > +	 */
> > > > +	tdmr_array = kcalloc(tdx_sysinfo.max_tdmrs, sizeof(struct tdmr_info *),
> > > > +			GFP_KERNEL);
> > > 
> > > Where, exactly is that alignment provided?  A 'struct tdmr_info *' is 8
> > > bytes so a tdx_sysinfo.max_tdmrs=8 kcalloc() would only guarantee
> > > 64-byte alignment.
> > 
> > The entries in the array only contain a pointer to TDMR_INFO.  The actual
> > TDMR_INFO is allocated separately. The array itself is never used by TDX
> > hardware so it doesn't matter.  We just need to guarantee each TDMR_INFO is
> > 512B-byte aligned.
> 
> The comment was clear as mud about this.  If you're going to talk about
> alignment, then do it near the allocation that guarantees the alignment,
> not in some other function near *ANOTHER* allocation.
> 
> Also, considering that you're about to go allocate potentially gigabytes
> of physically contiguous memory, it seems laughable that you'd go to any
> trouble at all to allocate an array of pointers here.  Why not just
> 
> 	kcalloc(tdx_sysinfo.max_tdmrs, sizeof(struct tmdr_info), ...);

kmalloc() guarantees size alignment only if the size is a power of two.  TDMR_INFO
(512 bytes) itself is a power of two, but 'max_tdmrs x sizeof(TDMR_INFO)' may
not be.  For instance, when max_tdmrs == 3, the result is not a
power of two.

Or am I wrong?  I am not good at math, though.
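
The arithmetic behind this concern can be checked with a few lines of
userspace C.  This is just a sketch: is_pow2() and array_bytes() are
hypothetical helpers, and 512 bytes is assumed as the size of one TDMR_INFO,
as in the discussion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TDMR_INFO_ALIGNMENT 512

/* True if n is a non-zero power of two. */
static bool is_pow2(size_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Total bytes for max_tdmrs entries of 512 bytes each. */
static size_t array_bytes(size_t max_tdmrs)
{
	return max_tdmrs * TDMR_INFO_ALIGNMENT;
}
```

With max_tdmrs == 4 the total is 2048 bytes, a power of two; with
max_tdmrs == 3 it is 1536 bytes, which is not, so the size-alignment
guarantee mentioned above would not apply.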

> 
> Or, heck, just vmalloc() the dang thing.  Why even bother with the array
> of pointers?
> 
> 
> > > > +	if (!tdmr_array) {
> > > > +		ret = -ENOMEM;
> > > > +		goto out;
> > > > +	}
> > > > +
> > > > +	/* Construct TDMRs to build TDX memory */
> > > > +	ret = construct_tdmrs(tdmr_array, &tdmr_num);
> > > > +	if (ret)
> > > > +		goto out_free_tdmrs;
> > > > +
> > > >  	/*
> > > >  	 * Return -EFAULT until all steps of TDX module
> > > >  	 * initialization are done.
> > > >  	 */
> > > >  	ret = -EFAULT;
> > > 
> > > There's the -EFAULT again.  I'd replace these with a better error code.
> > 
> > I couldn't think out a better error code.  -EINVAL looks doesn't suit.  -EAGAIN
> > also doesn't make sense for now since we always shutdown the TDX module in case
> > of any error so caller should never retry.  I think we need some error code to
> > tell "the job isn't done yet".  Perhaps -EBUSY?
> 
> Is this going to retry if it sees -EFAULT or -EBUSY?

No.  Currently we always shut down the module in case of any error.  The caller
won't be able to retry.

In the future, this can be optimized: don't shut down the module on
*some* errors (e.g. -ENOMEM), but record an internal state when the error happens,
so the caller can retry.  For now, there's no retry.
Dave Hansen April 28, 2022, 3:40 a.m. UTC | #7
On 4/27/22 18:35, Kai Huang wrote:
> On Wed, 2022-04-27 at 18:07 -0700, Dave Hansen wrote:
>> Also, considering that you're about to go allocate potentially gigabytes
>> of physically contiguous memory, it seems laughable that you'd go to any
>> trouble at all to allocate an array of pointers here.  Why not just
>>
>> 	kcalloc(tdx_sysinfo.max_tdmrs, sizeof(struct tmdr_info), ...);
> 
> kmalloc() guarantees the size-alignment if the size is power-of-two.  TDMR_INFO
> (512-bytes) itself is  power of two, but the 'max_tdmrs x sizeof(TDMR_INFO)' may
> not be power of two.  For instance, when max_tdmrs == 3, the result is not
> power-of-two.
> 
> Or am I wrong? I am not good at math though.

No, you're right, the kcalloc() wouldn't work for odd sizes.

But, the point is still that you don't need an array of pointers.  Use
vmalloc().  Use a plain old alloc_pages_exact().  Why bother wasting
the memory and adding the complexity of an array of pointers?

>> Or, heck, just vmalloc() the dang thing.  Why even bother with the array
>> of pointers?
>>
>>
>>>>> +	if (!tdmr_array) {
>>>>> +		ret = -ENOMEM;
>>>>> +		goto out;
>>>>> +	}
>>>>> +
>>>>> +	/* Construct TDMRs to build TDX memory */
>>>>> +	ret = construct_tdmrs(tdmr_array, &tdmr_num);
>>>>> +	if (ret)
>>>>> +		goto out_free_tdmrs;
>>>>> +
>>>>>  	/*
>>>>>  	 * Return -EFAULT until all steps of TDX module
>>>>>  	 * initialization are done.
>>>>>  	 */
>>>>>  	ret = -EFAULT;
>>>>
>>>> There's the -EFAULT again.  I'd replace these with a better error code.
>>>
>>> I couldn't think out a better error code.  -EINVAL looks doesn't suit.  -EAGAIN
>>> also doesn't make sense for now since we always shutdown the TDX module in case
>>> of any error so caller should never retry.  I think we need some error code to
>>> tell "the job isn't done yet".  Perhaps -EBUSY?
>>
>> Is this going to retry if it sees -EFAULT or -EBUSY?
> 
> No.  Currently we always shutdown the module in case of any error.  Caller won't
> be able to retry.
> 
> In the future, this can be optimized.  We don't shutdown the module in case of
> *some* error (i.e. -ENOMEM), but record an internal state when error happened,
> so the caller can retry again.  For now, there's no retry.

Just make the error codes -EINVAL, please.  I don't think anything else
makes sense.
Huang, Kai April 28, 2022, 3:55 a.m. UTC | #8
On Wed, 2022-04-27 at 20:40 -0700, Dave Hansen wrote:
> On 4/27/22 18:35, Kai Huang wrote:
> > On Wed, 2022-04-27 at 18:07 -0700, Dave Hansen wrote:
> > > Also, considering that you're about to go allocate potentially gigabytes
> > > of physically contiguous memory, it seems laughable that you'd go to any
> > > trouble at all to allocate an array of pointers here.  Why not just
> > > 
> > > 	kcalloc(tdx_sysinfo.max_tdmrs, sizeof(struct tmdr_info), ...);
> > 
> > kmalloc() guarantees the size-alignment if the size is power-of-two.  TDMR_INFO
> > (512-bytes) itself is  power of two, but the 'max_tdmrs x sizeof(TDMR_INFO)' may
> > not be power of two.  For instance, when max_tdmrs == 3, the result is not
> > power-of-two.
> > 
> > Or am I wrong? I am not good at math though.
> 
> No, you're right, the kcalloc() wouldn't work for odd sizes.
> 
> But, the point is still that you don't need an array of pointers.  Use
> vmalloc().  Use a plain old alloc_pages_exact().  Why bother wasting
> the memory and adding the complexity of an array of pointers?

OK.  This makes sense.

One thing I didn't say clearly is that TDMR_INFO is 512-byte aligned, but it could
be larger than 512 bytes, because the maximum number of reserved areas in TDMR_INFO
is enumerated via TDSYSINFO_STRUCT.  We can always round up the TDMR_INFO size to a
multiple of 512 bytes, and calculate enough pages to hold the maximum number of
TDMR_INFO entries.  In this case, we can still guarantee each TDMR_INFO is 512-byte
aligned.

I'll change to use alloc_pages_exact(), since we can get the physical address of
TDMR_INFO from it easily.
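
The sizing described above can be sketched in plain userspace C.  This is only
an illustration under stated assumptions: aligned_alloc() stands in for the
kernel's alloc_pages_exact(), a 4096-byte page is assumed, the struct layout
mirrors 'struct tdmr_info' from this patch, and the helpers
tdmr_info_stride(), alloc_tdmr_array() and tdmr_entry() are hypothetical
names, not part of the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TDMR_INFO_ALIGNMENT 512
#define PAGE_SIZE_BYTES     4096	/* assumed page size for this sketch */

struct tdmr_reserved_area {
	uint64_t offset;
	uint64_t size;
};

/* Fixed 64-byte header of TDMR_INFO; reserved areas follow it in memory. */
struct tdmr_info {
	uint64_t base;
	uint64_t size;
	uint64_t pamt_1g_base;
	uint64_t pamt_1g_size;
	uint64_t pamt_2m_base;
	uint64_t pamt_2m_size;
	uint64_t pamt_4k_base;
	uint64_t pamt_4k_size;
	struct tdmr_reserved_area reserved_areas[];
};

/* Round x up to a multiple of a; a must be a power of two. */
static size_t round_up_pow2(size_t x, size_t a)
{
	assert(a && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Size of one TDMR_INFO entry, padded to a multiple of 512 bytes. */
static size_t tdmr_info_stride(unsigned int max_reserved_per_tdmr)
{
	size_t sz = sizeof(struct tdmr_info) +
		    (size_t)max_reserved_per_tdmr *
		    sizeof(struct tdmr_reserved_area);

	return round_up_pow2(sz, TDMR_INFO_ALIGNMENT);
}

/* Allocate one zeroed, page-aligned buffer holding max_tdmrs entries. */
static void *alloc_tdmr_array(unsigned int max_tdmrs,
			      unsigned int max_reserved_per_tdmr,
			      size_t *array_size)
{
	size_t bytes = (size_t)max_tdmrs *
		       tdmr_info_stride(max_reserved_per_tdmr);
	void *buf;

	/* Whole pages, as a page allocator would hand out. */
	bytes = round_up_pow2(bytes, PAGE_SIZE_BYTES);
	buf = aligned_alloc(PAGE_SIZE_BYTES, bytes);
	if (!buf)
		return NULL;

	memset(buf, 0, bytes);
	*array_size = bytes;
	return buf;
}

/* Entry i sits at i * stride from a page-aligned base: 512-byte aligned. */
static struct tdmr_info *tdmr_entry(void *array, unsigned int i,
				    unsigned int max_reserved_per_tdmr)
{
	return (struct tdmr_info *)((char *)array +
			(size_t)i * tdmr_info_stride(max_reserved_per_tdmr));
}
```

Because the base is page aligned and the stride is a multiple of 512, every
entry is 512-byte aligned even when max_tdmrs is odd, which is the case the
power-of-two kmalloc() guarantee does not cover.  The value passed for
max_reserved_per_tdmr would come from TDSYSINFO_STRUCT; any concrete number
used with this sketch is only an example.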

> 
> > > Or, heck, just vmalloc() the dang thing.  Why even bother with the array
> > > of pointers?
> > > 
> > > 
> > > > > > +	if (!tdmr_array) {
> > > > > > +		ret = -ENOMEM;
> > > > > > +		goto out;
> > > > > > +	}
> > > > > > +
> > > > > > +	/* Construct TDMRs to build TDX memory */
> > > > > > +	ret = construct_tdmrs(tdmr_array, &tdmr_num);
> > > > > > +	if (ret)
> > > > > > +		goto out_free_tdmrs;
> > > > > > +
> > > > > >  	/*
> > > > > >  	 * Return -EFAULT until all steps of TDX module
> > > > > >  	 * initialization are done.
> > > > > >  	 */
> > > > > >  	ret = -EFAULT;
> > > > > 
> > > > > There's the -EFAULT again.  I'd replace these with a better error code.
> > > > 
> > > > I couldn't think of a better error code.  -EINVAL doesn't seem to fit.  -EAGAIN
> > > > also doesn't make sense for now, since we always shut down the TDX module in case
> > > > of any error, so the caller should never retry.  I think we need some error code to
> > > > tell "the job isn't done yet".  Perhaps -EBUSY?
> > > 
> > > Is this going to retry if it sees -EFAULT or -EBUSY?
> > 
> > No.  Currently we always shut down the module in case of any error.  The caller
> > won't be able to retry.
> > 
> > In the future, this can be optimized: we could avoid shutting down the module on
> > *some* errors (e.g. -ENOMEM) and instead record an internal state when the error
> > happened, so the caller can retry.  For now, there's no retry.
> 
> Just make the error codes -EINVAL, please.  I don't think anything else
> makes sense.
> 

OK will do.
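
For reference, the direction agreed on in this exchange (a placeholder that
fails with -EINVAL, no retry, cleanup on every path) could look roughly like
the userspace sketch below.  Everything with a _sketch suffix is a
hypothetical name, calloc()/free() stand in for kcalloc()/kfree(), and
initializing tdmr_num to zero is an assumption made here so that the cleanup
loop stays safe when construction fails before setting the count.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Stand-in for the real TDMR_INFO; the layout is irrelevant here. */
struct tdmr_info_sketch {
	unsigned long base;
	unsigned long size;
};

/* Placeholder: constructing TDMRs is not implemented yet. */
static int construct_tdmrs_sketch(struct tdmr_info_sketch **tdmr_array,
				  int *tdmr_num)
{
	(void)tdmr_array;
	(void)tdmr_num;
	return -EINVAL;
}

static void free_tdmrs_sketch(struct tdmr_info_sketch **tdmr_array,
			      int tdmr_num)
{
	for (int i = 0; i < tdmr_num; i++) {
		/* free(NULL) is a no-op, like kfree(NULL) */
		free(tdmr_array[i]);
		tdmr_array[i] = NULL;
	}
}

static int init_module_sketch(int max_tdmrs)
{
	struct tdmr_info_sketch **tdmr_array;
	int tdmr_num = 0;	/* nothing constructed yet */
	int ret;

	assert(max_tdmrs > 0);

	tdmr_array = calloc((size_t)max_tdmrs, sizeof(*tdmr_array));
	if (!tdmr_array)
		return -ENOMEM;

	ret = construct_tdmrs_sketch(tdmr_array, &tdmr_num);
	if (ret)
		goto out_free_tdmrs;

	/* Remaining initialization steps are not implemented yet. */
	ret = -EINVAL;
out_free_tdmrs:
	/* The array is only needed during initialization: always free it. */
	free_tdmrs_sketch(tdmr_array, tdmr_num);
	free(tdmr_array);
	return ret;
}
```

Since the caller never retries, a single error code is enough for every
not-yet-implemented step; only the allocation failure is reported separately
as -ENOMEM.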

Patch

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 482e6d858181..ec27350d53c1 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -13,6 +13,7 @@ 
 #include <linux/cpu.h>
 #include <linux/smp.h>
 #include <linux/atomic.h>
+#include <linux/slab.h>
 #include <asm/msr-index.h>
 #include <asm/msr.h>
 #include <asm/cpufeature.h>
@@ -594,8 +595,29 @@  static int tdx_get_sysinfo(void)
 	return sanitize_cmrs(tdx_cmr_array, cmr_num);
 }
 
+static void free_tdmrs(struct tdmr_info **tdmr_array, int tdmr_num)
+{
+	int i;
+
+	for (i = 0; i < tdmr_num; i++) {
+		struct tdmr_info *tdmr = tdmr_array[i];
+
+		/* kfree() works with NULL */
+		kfree(tdmr);
+		tdmr_array[i] = NULL;
+	}
+}
+
+static int construct_tdmrs(struct tdmr_info **tdmr_array, int *tdmr_num)
+{
+	/* Return -EFAULT until constructing TDMRs is done */
+	return -EFAULT;
+}
+
 static int init_tdx_module(void)
 {
+	struct tdmr_info **tdmr_array;
+	int tdmr_num = 0;
 	int ret;
 
 	/* TDX module global initialization */
@@ -613,11 +635,36 @@  static int init_tdx_module(void)
 	if (ret)
 		goto out;
 
+	/*
+	 * Prepare enough space to hold pointers to TDMRs (TDMR_INFO).
+	 * TDX requires each TDMR_INFO to be 512-byte aligned.  Each TDMR
+	 * is allocated individually within construct_tdmrs() to meet
+	 * this requirement.
+	 */
+	tdmr_array = kcalloc(tdx_sysinfo.max_tdmrs, sizeof(struct tdmr_info *),
+			GFP_KERNEL);
+	if (!tdmr_array) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	/* Construct TDMRs to build TDX memory */
+	ret = construct_tdmrs(tdmr_array, &tdmr_num);
+	if (ret)
+		goto out_free_tdmrs;
+
 	/*
 	 * Return -EFAULT until all steps of TDX module
 	 * initialization are done.
 	 */
 	ret = -EFAULT;
+out_free_tdmrs:
+	/*
+	 * TDMRs are only used while initializing the TDX module.  Always
+	 * free them, whether or not the initialization was successful.
+	 */
+	free_tdmrs(tdmr_array, tdmr_num);
+	kfree(tdmr_array);
 out:
 	return ret;
 }
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 2f21c45df6ac..05bf9fe6bd00 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -89,6 +89,29 @@  struct tdsysinfo_struct {
 	};
 } __packed __aligned(TDSYSINFO_STRUCT_ALIGNMENT);
 
+struct tdmr_reserved_area {
+	u64 offset;
+	u64 size;
+} __packed;
+
+#define TDMR_INFO_ALIGNMENT	512
+
+struct tdmr_info {
+	u64 base;
+	u64 size;
+	u64 pamt_1g_base;
+	u64 pamt_1g_size;
+	u64 pamt_2m_base;
+	u64 pamt_2m_size;
+	u64 pamt_4k_base;
+	u64 pamt_4k_size;
+	/*
+	 * Actual number of reserved areas depends on
+	 * 'struct tdsysinfo_struct'::max_reserved_per_tdmr.
+	 */
+	struct tdmr_reserved_area reserved_areas[0];
+} __packed __aligned(TDMR_INFO_ALIGNMENT);
+
 /*
  * P-SEAMLDR SEAMCALL leaf function
  */