Message ID | 6230ef28be8c360ab326c8f592acf1964ac065c1.1649219184.git.kai.huang@intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | TDX host kernel support |
> Subject: Re: [PATCH v3 10/21] x86/virt/tdx: Add placeholder to coveret all system RAM as TDX memory

Nitpick: coveret => convert

Thanks,

On Wed, Apr 06, 2022 at 04:49:22PM +1200, Kai Huang <kai.huang@intel.com> wrote:

> TDX provides increased levels of memory confidentiality and integrity.
> This requires special hardware support for features like memory
> encryption and storage of memory integrity checksums. Not all memory
> satisfies these requirements.
>
> As a result, TDX introduced the concept of a "Convertible Memory Region"
> (CMR). During boot, the firmware builds a list of all of the memory
> ranges which can provide the TDX security guarantees. The list of these
> ranges, along with TDX module information, is available to the kernel by
> querying the TDX module.
>
> In order to provide crypto protection to TD guests, the TDX architecture
> also needs additional metadata to record things like which TD guest
> "owns" a given page of memory. This metadata essentially serves as the
> 'struct page' for the TDX module. The space for this metadata is not
> reserved by the hardware upfront and must be allocated by the kernel
> and given to the TDX module.
>
> Since this metadata consumes space, the VMM can choose whether or not to
> allocate it for a given area of convertible memory. If it chooses not
> to, the memory cannot receive TDX protections and can not be used by TDX
> guests as private memory.
>
> For every memory region that the VMM wants to use as TDX memory, it sets
> up a "TD Memory Region" (TDMR). Each TDMR represents a physically
> contiguous convertible range and must also have its own physically
> contiguous metadata table, referred to as a Physical Address Metadata
> Table (PAMT), to track status for each page in the TDMR range.
>
> Unlike a CMR, each TDMR requires 1G granularity and alignment. To
> support physical RAM areas that don't meet those strict requirements,
> each TDMR permits a number of internal "reserved areas" which can be
> placed over memory holes. If PAMT metadata is placed within a TDMR it
> must be covered by one of these reserved areas.
>
> Let's summarize the concepts:
>
>  CMR - Firmware-enumerated physical ranges that support TDX. CMRs are
>        4K aligned.
> TDMR - Physical address range which is chosen by the kernel to support
>        TDX. 1G granularity and alignment required. Each TDMR has
>        reserved areas where TDX memory holes and overlapping PAMTs can
>        be put into.
> PAMT - Physically contiguous TDX metadata. One table for each page size
>        per TDMR. Roughly 1/256th of TDMR in size. 256G TDMR = ~1G
>        PAMT.
>
> As one step of initializing the TDX module, the memory regions that TDX
> module can use must be configured to the TDX module via an array of
> TDMRs.
>
> Constructing TDMRs to build the TDX memory consists below steps:
>
> 1) Create TDMRs to cover all memory regions that TDX module can use;
> 2) Allocate and set up PAMT for each TDMR;
> 3) Set up reserved areas for each TDMR.
>
> Add a placeholder right after getting TDX module and CMRs information to
> construct TDMRs to do the above steps, as the preparation to configure
> the TDX module. Always free TDMRs at the end of the initialization (no
> matter successful or not), as TDMRs are only used during the
> initialization.
>
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> ---
>  arch/x86/virt/vmx/tdx/tdx.c | 47 +++++++++++++++++++++++++++++++++++++
>  arch/x86/virt/vmx/tdx/tdx.h | 23 ++++++++++++++++++
>  2 files changed, 70 insertions(+)
>
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index 482e6d858181..ec27350d53c1 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -13,6 +13,7 @@
>  #include <linux/cpu.h>
>  #include <linux/smp.h>
>  #include <linux/atomic.h>
> +#include <linux/slab.h>
>  #include <asm/msr-index.h>
>  #include <asm/msr.h>
>  #include <asm/cpufeature.h>
> @@ -594,8 +595,29 @@ static int tdx_get_sysinfo(void)
>  	return sanitize_cmrs(tdx_cmr_array, cmr_num);
>  }
>
> +static void free_tdmrs(struct tdmr_info **tdmr_array, int tdmr_num)
> +{
> +	int i;
> +
> +	for (i = 0; i < tdmr_num; i++) {
> +		struct tdmr_info *tdmr = tdmr_array[i];
> +
> +		/* kfree() works with NULL */
> +		kfree(tdmr);
> +		tdmr_array[i] = NULL;
> +	}
> +}
> +
> +static int construct_tdmrs(struct tdmr_info **tdmr_array, int *tdmr_num)
> +{
> +	/* Return -EFAULT until constructing TDMRs is done */
> +	return -EFAULT;
> +}
> +
>  static int init_tdx_module(void)
>  {
> +	struct tdmr_info **tdmr_array;
> +	int tdmr_num;
>  	int ret;
>
>  	/* TDX module global initialization */
> @@ -613,11 +635,36 @@ static int init_tdx_module(void)
>  	if (ret)
>  		goto out;
>
> +	/*
> +	 * Prepare enough space to hold pointers of TDMRs (TDMR_INFO).
> +	 * TDX requires TDMR_INFO being 512 aligned. Each TDMR is
> +	 * allocated individually within construct_tdmrs() to meet
> +	 * this requirement.
> +	 */
> +	tdmr_array = kcalloc(tdx_sysinfo.max_tdmrs, sizeof(struct tdmr_info *),
> +			GFP_KERNEL);
> +	if (!tdmr_array) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	/* Construct TDMRs to build TDX memory */
> +	ret = construct_tdmrs(tdmr_array, &tdmr_num);
> +	if (ret)
> +		goto out_free_tdmrs;
> +
>  	/*
>  	 * Return -EFAULT until all steps of TDX module
>  	 * initialization are done.
>  	 */
>  	ret = -EFAULT;
> +out_free_tdmrs:
> +	/*
> +	 * TDMRs are only used during initializing TDX module. Always
> +	 * free them no matter the initialization was successful or not.
> +	 */
> +	free_tdmrs(tdmr_array, tdmr_num);
> +	kfree(tdmr_array);
>  out:
>  	return ret;
>  }
> diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
> index 2f21c45df6ac..05bf9fe6bd00 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.h
> +++ b/arch/x86/virt/vmx/tdx/tdx.h
> @@ -89,6 +89,29 @@ struct tdsysinfo_struct {
>  	};
>  } __packed __aligned(TDSYSINFO_STRUCT_ALIGNMENT);
>
> +struct tdmr_reserved_area {
> +	u64 offset;
> +	u64 size;
> +} __packed;
> +
> +#define TDMR_INFO_ALIGNMENT	512
> +
> +struct tdmr_info {
> +	u64 base;
> +	u64 size;
> +	u64 pamt_1g_base;
> +	u64 pamt_1g_size;
> +	u64 pamt_2m_base;
> +	u64 pamt_2m_size;
> +	u64 pamt_4k_base;
> +	u64 pamt_4k_size;
> +	/*
> +	 * Actual number of reserved areas depends on
> +	 * 'struct tdsysinfo_struct'::max_reserved_per_tdmr.
> +	 */
> +	struct tdmr_reserved_area reserved_areas[0];
> +} __packed __aligned(TDMR_INFO_ALIGNMENT);
> +
>  /*
>   * P-SEAMLDR SEAMCALL leaf function
>   */
> --
> 2.35.1
>
On Wed, 2022-04-20 at 13:48 -0700, Isaku Yamahata wrote:
> > Subject: Re: [PATCH v3 10/21] x86/virt/tdx: Add placeholder to coveret all
> > system RAM as TDX memory
>
> Nitpick: coveret => convert
>
> Thanks,

Thanks!
On 4/5/22 21:49, Kai Huang wrote:
> TDX provides increased levels of memory confidentiality and integrity.
> This requires special hardware support for features like memory
> encryption and storage of memory integrity checksums. Not all memory
> satisfies these requirements.
>
> As a result, TDX introduced the concept of a "Convertible Memory Region"
> (CMR). During boot, the firmware builds a list of all of the memory
> ranges which can provide the TDX security guarantees. The list of these
> ranges, along with TDX module information, is available to the kernel by
> querying the TDX module.
>
> In order to provide crypto protection to TD guests, the TDX architecture

There's that "crypto protection" thing again. I'm not really a fan of
the changes made to this changelog since I wrote it. :)

> also needs additional metadata to record things like which TD guest
> "owns" a given page of memory. This metadata essentially serves as the
> 'struct page' for the TDX module. The space for this metadata is not
> reserved by the hardware upfront and must be allocated by the kernel

                           ^ "up front"

...

> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index 482e6d858181..ec27350d53c1 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -13,6 +13,7 @@
>  #include <linux/cpu.h>
>  #include <linux/smp.h>
>  #include <linux/atomic.h>
> +#include <linux/slab.h>
>  #include <asm/msr-index.h>
>  #include <asm/msr.h>
>  #include <asm/cpufeature.h>
> @@ -594,8 +595,29 @@ static int tdx_get_sysinfo(void)
>  	return sanitize_cmrs(tdx_cmr_array, cmr_num);
>  }
>
> +static void free_tdmrs(struct tdmr_info **tdmr_array, int tdmr_num)
> +{
> +	int i;
> +
> +	for (i = 0; i < tdmr_num; i++) {
> +		struct tdmr_info *tdmr = tdmr_array[i];
> +
> +		/* kfree() works with NULL */
> +		kfree(tdmr);
> +		tdmr_array[i] = NULL;
> +	}
> +}
> +
> +static int construct_tdmrs(struct tdmr_info **tdmr_array, int *tdmr_num)
> +{
> +	/* Return -EFAULT until constructing TDMRs is done */
> +	return -EFAULT;
> +}
> +
>  static int init_tdx_module(void)
>  {
> +	struct tdmr_info **tdmr_array;
> +	int tdmr_num;
>  	int ret;
>
>  	/* TDX module global initialization */
> @@ -613,11 +635,36 @@ static int init_tdx_module(void)
>  	if (ret)
>  		goto out;
>
> +	/*
> +	 * Prepare enough space to hold pointers of TDMRs (TDMR_INFO).
> +	 * TDX requires TDMR_INFO being 512 aligned. Each TDMR is

                                        ^ "512-byte aligned"

Right?

> +	 * allocated individually within construct_tdmrs() to meet
> +	 * this requirement.
> +	 */
> +	tdmr_array = kcalloc(tdx_sysinfo.max_tdmrs, sizeof(struct tdmr_info *),
> +			GFP_KERNEL);

Where, exactly is that alignment provided? A 'struct tdmr_info *' is 8
bytes so a tdx_sysinfo.max_tdmrs=8 kcalloc() would only guarantee
64-byte alignment.

Also, I'm surprised that this is an array of virtual address pointers.
The previous interactions with the TDX module seemed to all take
physical addresses. How is it that this hardware structure which has
hardware alignment constraints is holding virtual addresses?

> +	if (!tdmr_array) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	/* Construct TDMRs to build TDX memory */
> +	ret = construct_tdmrs(tdmr_array, &tdmr_num);
> +	if (ret)
> +		goto out_free_tdmrs;
> +
>  	/*
>  	 * Return -EFAULT until all steps of TDX module
>  	 * initialization are done.
>  	 */
>  	ret = -EFAULT;

There's the -EFAULT again. I'd replace these with a better error code.

> +out_free_tdmrs:
> +	/*
> +	 * TDMRs are only used during initializing TDX module. Always
> +	 * free them no matter the initialization was successful or not.
> +	 */
> +	free_tdmrs(tdmr_array, tdmr_num);
> +	kfree(tdmr_array);
>  out:
>  	return ret;
>  }
> diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
> index 2f21c45df6ac..05bf9fe6bd00 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.h
> +++ b/arch/x86/virt/vmx/tdx/tdx.h
> @@ -89,6 +89,29 @@ struct tdsysinfo_struct {
>  	};
>  } __packed __aligned(TDSYSINFO_STRUCT_ALIGNMENT);
>
> +struct tdmr_reserved_area {
> +	u64 offset;
> +	u64 size;
> +} __packed;
> +
> +#define TDMR_INFO_ALIGNMENT	512
> +
> +struct tdmr_info {
> +	u64 base;
> +	u64 size;
> +	u64 pamt_1g_base;
> +	u64 pamt_1g_size;
> +	u64 pamt_2m_base;
> +	u64 pamt_2m_size;
> +	u64 pamt_4k_base;
> +	u64 pamt_4k_size;
> +	/*
> +	 * Actual number of reserved areas depends on
> +	 * 'struct tdsysinfo_struct'::max_reserved_per_tdmr.
> +	 */
> +	struct tdmr_reserved_area reserved_areas[0];
> +} __packed __aligned(TDMR_INFO_ALIGNMENT);
> +
>  /*
>   * P-SEAMLDR SEAMCALL leaf function
>   */
On Wed, 2022-04-27 at 15:24 -0700, Dave Hansen wrote:
> On 4/5/22 21:49, Kai Huang wrote:
> > TDX provides increased levels of memory confidentiality and integrity.
> > This requires special hardware support for features like memory
> > encryption and storage of memory integrity checksums. Not all memory
> > satisfies these requirements.
> >
> > As a result, TDX introduced the concept of a "Convertible Memory Region"
> > (CMR). During boot, the firmware builds a list of all of the memory
> > ranges which can provide the TDX security guarantees. The list of these
> > ranges, along with TDX module information, is available to the kernel by
> > querying the TDX module.
> >
> > In order to provide crypto protection to TD guests, the TDX architecture
>
> There's that "crypto protection" thing again. I'm not really a fan of
> the changes made to this changelog since I wrote it. :)

Sorry about that. I'll remove "In order to provide crypto protection to TD
guests".

>
> > also needs additional metadata to record things like which TD guest
> > "owns" a given page of memory. This metadata essentially serves as the
> > 'struct page' for the TDX module. The space for this metadata is not
> > reserved by the hardware upfront and must be allocated by the kernel
>
>                            ^ "up front"

Thanks will change to "up front".

Btw, the gmail grammar check gives me a red line if I use "up front", but it
doesn't complain "upfront".

>
> ...
>
> > diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> > index 482e6d858181..ec27350d53c1 100644
> > --- a/arch/x86/virt/vmx/tdx/tdx.c
> > +++ b/arch/x86/virt/vmx/tdx/tdx.c
> > @@ -13,6 +13,7 @@
> >  #include <linux/cpu.h>
> >  #include <linux/smp.h>
> >  #include <linux/atomic.h>
> > +#include <linux/slab.h>
> >  #include <asm/msr-index.h>
> >  #include <asm/msr.h>
> >  #include <asm/cpufeature.h>
> > @@ -594,8 +595,29 @@ static int tdx_get_sysinfo(void)
> >  	return sanitize_cmrs(tdx_cmr_array, cmr_num);
> >  }
> >
> > +static void free_tdmrs(struct tdmr_info **tdmr_array, int tdmr_num)
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < tdmr_num; i++) {
> > +		struct tdmr_info *tdmr = tdmr_array[i];
> > +
> > +		/* kfree() works with NULL */
> > +		kfree(tdmr);
> > +		tdmr_array[i] = NULL;
> > +	}
> > +}
> > +
> > +static int construct_tdmrs(struct tdmr_info **tdmr_array, int *tdmr_num)
> > +{
> > +	/* Return -EFAULT until constructing TDMRs is done */
> > +	return -EFAULT;
> > +}
> > +
> >  static int init_tdx_module(void)
> >  {
> > +	struct tdmr_info **tdmr_array;
> > +	int tdmr_num;
> >  	int ret;
> >
> >  	/* TDX module global initialization */
> > @@ -613,11 +635,36 @@ static int init_tdx_module(void)
> >  	if (ret)
> >  		goto out;
> >
> > +	/*
> > +	 * Prepare enough space to hold pointers of TDMRs (TDMR_INFO).
> > +	 * TDX requires TDMR_INFO being 512 aligned. Each TDMR is
>
>                                         ^ "512-byte aligned"
>
> Right?

Yes. Will update.

>
> > +	 * allocated individually within construct_tdmrs() to meet
> > +	 * this requirement.
> > +	 */
> > +	tdmr_array = kcalloc(tdx_sysinfo.max_tdmrs, sizeof(struct tdmr_info *),
> > +			GFP_KERNEL);
>
> Where, exactly is that alignment provided? A 'struct tdmr_info *' is 8
> bytes so a tdx_sysinfo.max_tdmrs=8 kcalloc() would only guarantee
> 64-byte alignment.

The entries in the array only contain a pointer to TDMR_INFO. The actual
TDMR_INFO is allocated separately. The array itself is never used by TDX
hardware so it doesn't matter. We just need to guarantee each TDMR_INFO is
512-byte aligned.

>
> Also, I'm surprised that this is an array of virtual address pointers.
> The previous interactions with the TDX module seemed to all take
> physical addresses. How is it that this hardware structure which has
> hardware alignment constraints is holding virtual addresses?

In later patches when TDMRs are configured to the TDX module, the input will be
converted to physical address, and there will be another array which is used by
the TDX module hardware. This array is used by the kernel only to construct
TDMRs.

>
> > +	if (!tdmr_array) {
> > +		ret = -ENOMEM;
> > +		goto out;
> > +	}
> > +
> > +	/* Construct TDMRs to build TDX memory */
> > +	ret = construct_tdmrs(tdmr_array, &tdmr_num);
> > +	if (ret)
> > +		goto out_free_tdmrs;
> > +
> >  	/*
> >  	 * Return -EFAULT until all steps of TDX module
> >  	 * initialization are done.
> >  	 */
> >  	ret = -EFAULT;
>
> There's the -EFAULT again. I'd replace these with a better error code.

I couldn't think out a better error code. -EINVAL looks doesn't suit. -EAGAIN
also doesn't make sense for now since we always shutdown the TDX module in case
of any error so caller should never retry. I think we need some error code to
tell "the job isn't done yet". Perhaps -EBUSY?
On 4/27/22 17:53, Kai Huang wrote:
> On Wed, 2022-04-27 at 15:24 -0700, Dave Hansen wrote:
>> On 4/5/22 21:49, Kai Huang wrote:
>>> TDX provides increased levels of memory confidentiality and integrity.
>>> This requires special hardware support for features like memory
>>> encryption and storage of memory integrity checksums. Not all memory
>>> satisfies these requirements.
>>>
>>> As a result, TDX introduced the concept of a "Convertible Memory Region"
>>> (CMR). During boot, the firmware builds a list of all of the memory
>>> ranges which can provide the TDX security guarantees. The list of these
>>> ranges, along with TDX module information, is available to the kernel by
>>> querying the TDX module.
>>>
>>> In order to provide crypto protection to TD guests, the TDX architecture
>>
>> There's that "crypto protection" thing again. I'm not really a fan of
>> the changes made to this changelog since I wrote it. :)
>
> Sorry about that. I'll remove "In order to provide crypto protection to TD
> guests".

Seriously, though. I took the effort to write these changelogs for you.
They were fine. I'm not stoked about needing to proofread them again.

>>> also needs additional metadata to record things like which TD guest
>>> "owns" a given page of memory. This metadata essentially serves as the
>>> 'struct page' for the TDX module. The space for this metadata is not
>>> reserved by the hardware upfront and must be allocated by the kernel
>>
>>                           ^ "up front"
>
> Thanks will change to "up front".
>
> Btw, the gmail grammar check gives me a red line if I use "up front", but it
> doesn't complain "upfront".

I'm pretty sure it's wrong. "up front" is an adverb that applies to
"reserved". "Upfront" is an adjective and not how you used it in that
sentence.

>>> +	 * allocated individually within construct_tdmrs() to meet
>>> +	 * this requirement.
>>> +	 */
>>> +	tdmr_array = kcalloc(tdx_sysinfo.max_tdmrs, sizeof(struct tdmr_info *),
>>> +			GFP_KERNEL);
>>
>> Where, exactly is that alignment provided? A 'struct tdmr_info *' is 8
>> bytes so a tdx_sysinfo.max_tdmrs=8 kcalloc() would only guarantee
>> 64-byte alignment.
>
> The entries in the array only contain a pointer to TDMR_INFO. The actual
> TDMR_INFO is allocated separately. The array itself is never used by TDX
> hardware so it doesn't matter. We just need to guarantee each TDMR_INFO is
> 512-byte aligned.

The comment was clear as mud about this. If you're going to talk about
alignment, then do it near the allocation that guarantees the alignment,
not in some other function near *ANOTHER* allocation.

Also, considering that you're about to go allocate potentially gigabytes
of physically contiguous memory, it seems laughable that you'd go to any
trouble at all to allocate an array of pointers here. Why not just

	kcalloc(tdx_sysinfo.max_tdmrs, sizeof(struct tdmr_info), ...);

Or, heck, just vmalloc() the dang thing. Why even bother with the array
of pointers?

>>> +	if (!tdmr_array) {
>>> +		ret = -ENOMEM;
>>> +		goto out;
>>> +	}
>>> +
>>> +	/* Construct TDMRs to build TDX memory */
>>> +	ret = construct_tdmrs(tdmr_array, &tdmr_num);
>>> +	if (ret)
>>> +		goto out_free_tdmrs;
>>> +
>>>  	/*
>>>  	 * Return -EFAULT until all steps of TDX module
>>>  	 * initialization are done.
>>>  	 */
>>>  	ret = -EFAULT;
>>
>> There's the -EFAULT again. I'd replace these with a better error code.
>
> I couldn't think out a better error code. -EINVAL looks doesn't suit. -EAGAIN
> also doesn't make sense for now since we always shutdown the TDX module in case
> of any error so caller should never retry. I think we need some error code to
> tell "the job isn't done yet". Perhaps -EBUSY?

Is this going to retry if it sees -EFAULT or -EBUSY?
On Wed, 2022-04-27 at 18:07 -0700, Dave Hansen wrote:
> On 4/27/22 17:53, Kai Huang wrote:
> > On Wed, 2022-04-27 at 15:24 -0700, Dave Hansen wrote:
> > > On 4/5/22 21:49, Kai Huang wrote:
> > > > TDX provides increased levels of memory confidentiality and integrity.
> > > > This requires special hardware support for features like memory
> > > > encryption and storage of memory integrity checksums. Not all memory
> > > > satisfies these requirements.
> > > >
> > > > As a result, TDX introduced the concept of a "Convertible Memory Region"
> > > > (CMR). During boot, the firmware builds a list of all of the memory
> > > > ranges which can provide the TDX security guarantees. The list of these
> > > > ranges, along with TDX module information, is available to the kernel by
> > > > querying the TDX module.
> > > >
> > > > In order to provide crypto protection to TD guests, the TDX architecture
> > >
> > > There's that "crypto protection" thing again. I'm not really a fan of
> > > the changes made to this changelog since I wrote it. :)
> >
> > Sorry about that. I'll remove "In order to provide crypto protection to TD
> > guests".
>
> Seriously, though. I took the effort to write these changelogs for you.
> They were fine. I'm not stoked about needing to proofread them again.

Yeah pretty clear to me now. Really thanks for your time.

Won't happen again. If there's something I feel not right, I'll raise but not
slightly change.

>
> > > > also needs additional metadata to record things like which TD guest
> > > > "owns" a given page of memory. This metadata essentially serves as the
> > > > 'struct page' for the TDX module. The space for this metadata is not
> > > > reserved by the hardware upfront and must be allocated by the kernel
> > >
> > >                            ^ "up front"
> >
> > Thanks will change to "up front".
> >
> > Btw, the gmail grammar check gives me a red line if I use "up front", but it
> > doesn't complain "upfront".
>
> I'm pretty sure it's wrong. "up front" is an adverb that applies to
> "reserved". "Upfront" is an adjective and not how you used it in that
> sentence.

Thanks for explaining. Anyway the gmail grammar can have bug.

>
> > > > +	 * allocated individually within construct_tdmrs() to meet
> > > > +	 * this requirement.
> > > > +	 */
> > > > +	tdmr_array = kcalloc(tdx_sysinfo.max_tdmrs, sizeof(struct tdmr_info *),
> > > > +			GFP_KERNEL);
> > >
> > > Where, exactly is that alignment provided? A 'struct tdmr_info *' is 8
> > > bytes so a tdx_sysinfo.max_tdmrs=8 kcalloc() would only guarantee
> > > 64-byte alignment.
> >
> > The entries in the array only contain a pointer to TDMR_INFO. The actual
> > TDMR_INFO is allocated separately. The array itself is never used by TDX
> > hardware so it doesn't matter. We just need to guarantee each TDMR_INFO is
> > 512-byte aligned.
>
> The comment was clear as mud about this. If you're going to talk about
> alignment, then do it near the allocation that guarantees the alignment,
> not in some other function near *ANOTHER* allocation.
>
> Also, considering that you're about to go allocate potentially gigabytes
> of physically contiguous memory, it seems laughable that you'd go to any
> trouble at all to allocate an array of pointers here. Why not just
>
> 	kcalloc(tdx_sysinfo.max_tdmrs, sizeof(struct tdmr_info), ...);

kmalloc() guarantees the size-alignment if the size is power-of-two. TDMR_INFO
(512-bytes) itself is power of two, but the 'max_tdmrs x sizeof(TDMR_INFO)' may
not be power of two. For instance, when max_tdmrs == 3, the result is not
power-of-two.

Or am I wrong? I am not good at math though.

>
> Or, heck, just vmalloc() the dang thing. Why even bother with the array
> of pointers?
>
>
> > > > +	if (!tdmr_array) {
> > > > +		ret = -ENOMEM;
> > > > +		goto out;
> > > > +	}
> > > > +
> > > > +	/* Construct TDMRs to build TDX memory */
> > > > +	ret = construct_tdmrs(tdmr_array, &tdmr_num);
> > > > +	if (ret)
> > > > +		goto out_free_tdmrs;
> > > > +
> > > >  	/*
> > > >  	 * Return -EFAULT until all steps of TDX module
> > > >  	 * initialization are done.
> > > >  	 */
> > > >  	ret = -EFAULT;
> > >
> > > There's the -EFAULT again. I'd replace these with a better error code.
> >
> > I couldn't think out a better error code. -EINVAL looks doesn't suit. -EAGAIN
> > also doesn't make sense for now since we always shutdown the TDX module in case
> > of any error so caller should never retry. I think we need some error code to
> > tell "the job isn't done yet". Perhaps -EBUSY?
>
> Is this going to retry if it sees -EFAULT or -EBUSY?

No. Currently we always shutdown the module in case of any error. Caller won't
be able to retry.

In the future, this can be optimized. We don't shutdown the module in case of
*some* error (i.e. -ENOMEM), but record an internal state when error happened,
so the caller can retry again. For now, there's no retry.
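The size argument in the reply above is plain arithmetic and can be checked outside the kernel. The following is only a userspace sketch: it assumes each TDMR_INFO is exactly 512 bytes, as in the discussion, and `size_is_power_of_two()` and `flat_tdmr_array_size()` are hypothetical helper names, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption for this sketch: one TDMR_INFO occupies exactly 512 bytes. */
#define TDMR_INFO_SIZE	512

/* True if n is a non-zero power of two. */
bool size_is_power_of_two(size_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Bytes requested for one flat array of max_tdmrs TDMR_INFO entries. */
size_t flat_tdmr_array_size(size_t max_tdmrs)
{
	assert(max_tdmrs > 0);
	return max_tdmrs * TDMR_INFO_SIZE;
}
```

With max_tdmrs == 3 the request is 1536 bytes, which is not a power of two, so the size-alignment guarantee described above would not apply to it; with max_tdmrs == 4 the request is 2048 bytes and it would.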
On 4/27/22 18:35, Kai Huang wrote:
> On Wed, 2022-04-27 at 18:07 -0700, Dave Hansen wrote:
>> Also, considering that you're about to go allocate potentially gigabytes
>> of physically contiguous memory, it seems laughable that you'd go to any
>> trouble at all to allocate an array of pointers here. Why not just
>>
>> 	kcalloc(tdx_sysinfo.max_tdmrs, sizeof(struct tdmr_info), ...);
>
> kmalloc() guarantees the size-alignment if the size is power-of-two. TDMR_INFO
> (512-bytes) itself is power of two, but the 'max_tdmrs x sizeof(TDMR_INFO)' may
> not be power of two. For instance, when max_tdmrs == 3, the result is not
> power-of-two.
>
> Or am I wrong? I am not good at math though.

No, you're right, the kcalloc() wouldn't work for odd sizes.

But, the point is still that you don't need an array of pointers. Use
vmalloc(). Use a plain old alloc_pages_exact(). Why bother wasting
the memory and adding the complexity of an array of pointers?

>> Or, heck, just vmalloc() the dang thing. Why even bother with the array
>> of pointers?
>>
>>
>>>>> +	if (!tdmr_array) {
>>>>> +		ret = -ENOMEM;
>>>>> +		goto out;
>>>>> +	}
>>>>> +
>>>>> +	/* Construct TDMRs to build TDX memory */
>>>>> +	ret = construct_tdmrs(tdmr_array, &tdmr_num);
>>>>> +	if (ret)
>>>>> +		goto out_free_tdmrs;
>>>>> +
>>>>>  	/*
>>>>>  	 * Return -EFAULT until all steps of TDX module
>>>>>  	 * initialization are done.
>>>>>  	 */
>>>>>  	ret = -EFAULT;
>>>>
>>>> There's the -EFAULT again. I'd replace these with a better error code.
>>>
>>> I couldn't think out a better error code. -EINVAL looks doesn't suit. -EAGAIN
>>> also doesn't make sense for now since we always shutdown the TDX module in case
>>> of any error so caller should never retry. I think we need some error code to
>>> tell "the job isn't done yet". Perhaps -EBUSY?
>>
>> Is this going to retry if it sees -EFAULT or -EBUSY?
>
> No. Currently we always shutdown the module in case of any error. Caller won't
> be able to retry.
>
> In the future, this can be optimized. We don't shutdown the module in case of
> *some* error (i.e. -ENOMEM), but record an internal state when error happened,
> so the caller can retry again. For now, there's no retry.

Just make the error codes -EINVAL, please. I don't think anything else
makes sense.
On Wed, 2022-04-27 at 20:40 -0700, Dave Hansen wrote:
> On 4/27/22 18:35, Kai Huang wrote:
> > On Wed, 2022-04-27 at 18:07 -0700, Dave Hansen wrote:
> > > Also, considering that you're about to go allocate potentially gigabytes
> > > of physically contiguous memory, it seems laughable that you'd go to any
> > > trouble at all to allocate an array of pointers here. Why not just
> > >
> > > 	kcalloc(tdx_sysinfo.max_tdmrs, sizeof(struct tdmr_info), ...);
> >
> > kmalloc() guarantees the size-alignment if the size is power-of-two. TDMR_INFO
> > (512-bytes) itself is power of two, but the 'max_tdmrs x sizeof(TDMR_INFO)' may
> > not be power of two. For instance, when max_tdmrs == 3, the result is not
> > power-of-two.
> >
> > Or am I wrong? I am not good at math though.
>
> No, you're right, the kcalloc() wouldn't work for odd sizes.
>
> But, the point is still that you don't need an array of pointers. Use
> vmalloc(). Use a plain old alloc_pages_exact(). Why bother wasting
> the memory and adding the complexity of an array of pointers?

OK. This makes sense.

One thing I didn't say clearly is TDMR_INFO is 512-byte aligned, but could be
larger than 512 bytes, and the maximum number of reserved areas in TDMR_INFO is
enumerated via TDSYSINFO_STRUCT. We can always roundup TDMR_INFO size to be
512-byte aligned, and calculate enough pages to hold maximum number of
TDMR_INFO. In this case, we can still guarantee each TDMR_INFO is 512-byte
aligned.

I'll change to use alloc_pages_exact(), since we can get physical address of
TDMR_INFO from it easily.

>
> > > Or, heck, just vmalloc() the dang thing. Why even bother with the array
> > > of pointers?
> > >
> > >
> > > > > > +	if (!tdmr_array) {
> > > > > > +		ret = -ENOMEM;
> > > > > > +		goto out;
> > > > > > +	}
> > > > > > +
> > > > > > +	/* Construct TDMRs to build TDX memory */
> > > > > > +	ret = construct_tdmrs(tdmr_array, &tdmr_num);
> > > > > > +	if (ret)
> > > > > > +		goto out_free_tdmrs;
> > > > > > +
> > > > > >  	/*
> > > > > >  	 * Return -EFAULT until all steps of TDX module
> > > > > >  	 * initialization are done.
> > > > > >  	 */
> > > > > >  	ret = -EFAULT;
> > > > >
> > > > > There's the -EFAULT again. I'd replace these with a better error code.
> > > >
> > > > I couldn't think out a better error code. -EINVAL looks doesn't suit. -EAGAIN
> > > > also doesn't make sense for now since we always shutdown the TDX module in case
> > > > of any error so caller should never retry. I think we need some error code to
> > > > tell "the job isn't done yet". Perhaps -EBUSY?
> > >
> > > Is this going to retry if it sees -EFAULT or -EBUSY?
> >
> > No. Currently we always shutdown the module in case of any error. Caller won't
> > be able to retry.
> >
> > In the future, this can be optimized. We don't shutdown the module in case of
> > *some* error (i.e. -ENOMEM), but record an internal state when error happened,
> > so the caller can retry again. For now, there's no retry.
>
> Just make the error codes -EINVAL, please. I don't think anything else
> makes sense.
>

OK will do.
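The layout agreed on at the end of the thread (pad each TDMR_INFO to a 512-byte multiple and carve all of them out of one page-aligned allocation) can be sketched in userspace as below. This is an illustration under stated assumptions, not the code of any later revision: `aligned_alloc()` stands in for alloc_pages_exact(), a 4K page size is assumed, the values passed for max_reserved_per_tdmr are made up, and every helper name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TDMR_INFO_ALIGNMENT	512
#define SKETCH_PAGE_SIZE	4096	/* assumption: 4K pages */

/* Mirrors the two u64 fields of 'struct tdmr_reserved_area' in the patch. */
struct reserved_area {
	uint64_t offset;
	uint64_t size;
};

/* The eight u64 fields that precede reserved_areas[] in 'struct tdmr_info'. */
#define TDMR_INFO_FIXED_SIZE	(8 * sizeof(uint64_t))

/* Round n up to the next multiple of align (align must be non-zero). */
size_t round_up_to(size_t n, size_t align)
{
	assert(align != 0);
	return (n + align - 1) / align * align;
}

/* Size of one TDMR_INFO slot, padded to a 512-byte multiple. */
size_t tdmr_slot_size(size_t max_reserved_per_tdmr)
{
	size_t raw = TDMR_INFO_FIXED_SIZE +
		     max_reserved_per_tdmr * sizeof(struct reserved_area);

	return round_up_to(raw, TDMR_INFO_ALIGNMENT);
}

/* Whole pages, in bytes, needed to hold max_tdmrs slots back to back. */
size_t tdmr_array_bytes(size_t max_tdmrs, size_t max_reserved_per_tdmr)
{
	return round_up_to(max_tdmrs * tdmr_slot_size(max_reserved_per_tdmr),
			   SKETCH_PAGE_SIZE);
}

/*
 * Userspace stand-in for a page-aligned kernel allocation: this uses
 * aligned_alloc(), not alloc_pages_exact().  Returns zeroed memory or NULL.
 */
void *alloc_tdmr_array(size_t max_tdmrs, size_t max_reserved_per_tdmr)
{
	size_t bytes = tdmr_array_bytes(max_tdmrs, max_reserved_per_tdmr);
	void *array = aligned_alloc(SKETCH_PAGE_SIZE, bytes);

	if (array)
		memset(array, 0, bytes);
	return array;
}

/* Address of slot idx; 512-byte aligned because base and stride both are. */
void *tdmr_slot(void *array, size_t idx, size_t max_reserved_per_tdmr)
{
	return (char *)array + idx * tdmr_slot_size(max_reserved_per_tdmr);
}
```

Because the base is page aligned and the stride is a multiple of 512, every slot is 512-byte aligned for any max_tdmrs, including odd counts such as 3, and no per-entry pointer array is needed.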
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 482e6d858181..ec27350d53c1 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -13,6 +13,7 @@
 #include <linux/cpu.h>
 #include <linux/smp.h>
 #include <linux/atomic.h>
+#include <linux/slab.h>
 #include <asm/msr-index.h>
 #include <asm/msr.h>
 #include <asm/cpufeature.h>
@@ -594,8 +595,29 @@ static int tdx_get_sysinfo(void)
 	return sanitize_cmrs(tdx_cmr_array, cmr_num);
 }

+static void free_tdmrs(struct tdmr_info **tdmr_array, int tdmr_num)
+{
+	int i;
+
+	for (i = 0; i < tdmr_num; i++) {
+		struct tdmr_info *tdmr = tdmr_array[i];
+
+		/* kfree() works with NULL */
+		kfree(tdmr);
+		tdmr_array[i] = NULL;
+	}
+}
+
+static int construct_tdmrs(struct tdmr_info **tdmr_array, int *tdmr_num)
+{
+	/* Return -EFAULT until constructing TDMRs is done */
+	return -EFAULT;
+}
+
 static int init_tdx_module(void)
 {
+	struct tdmr_info **tdmr_array;
+	int tdmr_num;
 	int ret;

 	/* TDX module global initialization */
@@ -613,11 +635,36 @@ static int init_tdx_module(void)
 	if (ret)
 		goto out;

+	/*
+	 * Prepare enough space to hold pointers of TDMRs (TDMR_INFO).
+	 * TDX requires TDMR_INFO being 512 aligned. Each TDMR is
+	 * allocated individually within construct_tdmrs() to meet
+	 * this requirement.
+	 */
+	tdmr_array = kcalloc(tdx_sysinfo.max_tdmrs, sizeof(struct tdmr_info *),
+			GFP_KERNEL);
+	if (!tdmr_array) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	/* Construct TDMRs to build TDX memory */
+	ret = construct_tdmrs(tdmr_array, &tdmr_num);
+	if (ret)
+		goto out_free_tdmrs;
+
 	/*
 	 * Return -EFAULT until all steps of TDX module
 	 * initialization are done.
 	 */
 	ret = -EFAULT;
+out_free_tdmrs:
+	/*
+	 * TDMRs are only used during initializing TDX module. Always
+	 * free them no matter the initialization was successful or not.
+	 */
+	free_tdmrs(tdmr_array, tdmr_num);
+	kfree(tdmr_array);
 out:
 	return ret;
 }
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 2f21c45df6ac..05bf9fe6bd00 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -89,6 +89,29 @@ struct tdsysinfo_struct {
 	};
 } __packed __aligned(TDSYSINFO_STRUCT_ALIGNMENT);

+struct tdmr_reserved_area {
+	u64 offset;
+	u64 size;
+} __packed;
+
+#define TDMR_INFO_ALIGNMENT	512
+
+struct tdmr_info {
+	u64 base;
+	u64 size;
+	u64 pamt_1g_base;
+	u64 pamt_1g_size;
+	u64 pamt_2m_base;
+	u64 pamt_2m_size;
+	u64 pamt_4k_base;
+	u64 pamt_4k_size;
+	/*
+	 * Actual number of reserved areas depends on
+	 * 'struct tdsysinfo_struct'::max_reserved_per_tdmr.
+	 */
+	struct tdmr_reserved_area reserved_areas[0];
+} __packed __aligned(TDMR_INFO_ALIGNMENT);
+
 /*
  * P-SEAMLDR SEAMCALL leaf function
  */
TDX provides increased levels of memory confidentiality and integrity.
This requires special hardware support for features like memory
encryption and storage of memory integrity checksums. Not all memory
satisfies these requirements.

As a result, TDX introduced the concept of a "Convertible Memory Region"
(CMR). During boot, the firmware builds a list of all of the memory
ranges which can provide the TDX security guarantees. The list of these
ranges, along with TDX module information, is available to the kernel by
querying the TDX module.

In order to provide crypto protection to TD guests, the TDX architecture
also needs additional metadata to record things like which TD guest
"owns" a given page of memory. This metadata essentially serves as the
'struct page' for the TDX module. The space for this metadata is not
reserved by the hardware upfront and must be allocated by the kernel
and given to the TDX module.

Since this metadata consumes space, the VMM can choose whether or not to
allocate it for a given area of convertible memory. If it chooses not
to, the memory cannot receive TDX protections and can not be used by TDX
guests as private memory.

For every memory region that the VMM wants to use as TDX memory, it sets
up a "TD Memory Region" (TDMR). Each TDMR represents a physically
contiguous convertible range and must also have its own physically
contiguous metadata table, referred to as a Physical Address Metadata
Table (PAMT), to track status for each page in the TDMR range.

Unlike a CMR, each TDMR requires 1G granularity and alignment. To
support physical RAM areas that don't meet those strict requirements,
each TDMR permits a number of internal "reserved areas" which can be
placed over memory holes. If PAMT metadata is placed within a TDMR it
must be covered by one of these reserved areas.

Let's summarize the concepts:

 CMR - Firmware-enumerated physical ranges that support TDX. CMRs are
       4K aligned.
TDMR - Physical address range which is chosen by the kernel to support
       TDX. 1G granularity and alignment required. Each TDMR has
       reserved areas where TDX memory holes and overlapping PAMTs can
       be put into.
PAMT - Physically contiguous TDX metadata. One table for each page size
       per TDMR. Roughly 1/256th of TDMR in size. 256G TDMR = ~1G
       PAMT.

As one step of initializing the TDX module, the memory regions that TDX
module can use must be configured to the TDX module via an array of
TDMRs.

Constructing TDMRs to build the TDX memory consists below steps:

1) Create TDMRs to cover all memory regions that TDX module can use;
2) Allocate and set up PAMT for each TDMR;
3) Set up reserved areas for each TDMR.

Add a placeholder right after getting TDX module and CMRs information to
construct TDMRs to do the above steps, as the preparation to configure
the TDX module. Always free TDMRs at the end of the initialization (no
matter successful or not), as TDMRs are only used during the
initialization.

Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/virt/vmx/tdx/tdx.c | 47 +++++++++++++++++++++++++++++++++++++
 arch/x86/virt/vmx/tdx/tdx.h | 23 ++++++++++++++++++
 2 files changed, 70 insertions(+)
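The "roughly 1/256th" figure in the concept summary can be reproduced with a short calculation. The sketch below is illustrative only: it assumes a PAMT entry of 16 bytes per tracked page (a value picked here because 16 / 4096 = 1/256; the changelog does not state the entry size), and `pamt_size()` and `pamt_total()` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: bytes of PAMT metadata per tracked page, not from the patch. */
#define PAMT_ENTRY_SIZE	16ULL

#define SKETCH_SZ_4K	(1ULL << 12)
#define SKETCH_SZ_2M	(1ULL << 21)
#define SKETCH_SZ_1G	(1ULL << 30)

/* Bytes of PAMT needed to track tdmr_size bytes at one page size. */
uint64_t pamt_size(uint64_t tdmr_size, uint64_t page_size)
{
	/* TDMRs are 1G aligned, so the division is exact for all three sizes. */
	assert(page_size != 0 && tdmr_size % page_size == 0);
	return tdmr_size / page_size * PAMT_ENTRY_SIZE;
}

/* One table per page size (4K, 2M, 1G), as listed in the summary above. */
uint64_t pamt_total(uint64_t tdmr_size)
{
	return pamt_size(tdmr_size, SKETCH_SZ_4K) +
	       pamt_size(tdmr_size, SKETCH_SZ_2M) +
	       pamt_size(tdmr_size, SKETCH_SZ_1G);
}
```

For a 256G TDMR this gives 1G for the 4K table, 2M for the 2M table and 4K for the 1G table, so the 4K table dominates and the total stays close to the "256G TDMR = ~1G PAMT" estimate.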