diff mbox series

[v11,07/20] x86/virt/tdx: Add skeleton to enable TDX on demand

Message ID 21b3a45cb73b4e1917c1eba75b7769781a15aa14.1685887183.git.kai.huang@intel.com (mailing list archive)
State New, archived
Headers show
Series TDX host kernel support | expand

Commit Message

Huang, Kai June 4, 2023, 2:27 p.m. UTC
To enable TDX the kernel needs to initialize TDX from two perspectives:
1) Do a set of SEAMCALLs to initialize the TDX module to make it ready
to create and run TDX guests; 2) Do the per-cpu initialization SEAMCALL
on one logical cpu before the kernel wants to make any other SEAMCALLs
on that cpu (including those involved during module initialization and
running TDX guests).

The TDX module can be initialized only once in its lifetime.  Instead
of always initializing it at boot time, this implementation chooses an
"on demand" approach to initialize TDX until there is a real need (e.g
when requested by KVM).  This approach has below pros:

1) It avoids consuming the memory that must be allocated by kernel and
given to the TDX module as metadata (~1/256th of the TDX-usable memory),
and also saves the CPU cycles of initializing the TDX module (and the
metadata) when TDX is not used at all.

2) The TDX module design allows it to be updated while the system is
running.  The update procedure shares quite a few steps with this "on
demand" initialization mechanism.  The hope is that much of "on demand"
mechanism can be shared with a future "update" mechanism.  A boot-time
TDX module implementation would not be able to share much code with the
update mechanism.

3) Making SEAMCALL requires VMX to be enabled.  Currently, only the KVM
code mucks with VMX enabling.  If the TDX module were to be initialized
separately from KVM (like at boot), the boot code would need to be
taught how to muck with VMX enabling and KVM would need to be taught how
to cope with that.  Making KVM itself responsible for TDX initialization
lets the rest of the kernel stay blissfully unaware of VMX.

Similar to module initialization, also make the per-cpu initialization
"on demand" as it also depends on VMX being enabled.

Add two functions, tdx_enable() and tdx_cpu_enable(), to enable the TDX
module and enable TDX on local cpu respectively.  For now tdx_enable()
is a placeholder.  The TODO list will be pared down as functionality is
added.

In tdx_enable() use a state machine protected by mutex to make sure the
initialization will only be done once, as tdx_enable() can be called
multiple times (i.e. KVM module can be reloaded) and may be called
concurrently by other kernel components in the future.

The per-cpu initialization on each cpu can only be done once during the
module's life time.  Use a per-cpu variable to track its status to make
sure it is only done once in tdx_cpu_enable().

Also, a SEAMCALL to do TDX module global initialization must be done
once on any logical cpu before any per-cpu initialization SEAMCALL.  Do
it inside tdx_cpu_enable() too (if hasn't been done).

tdx_enable() can potentially invoke SEAMCALLs on any online cpus.  The
per-cpu initialization must be done before those SEAMCALLs are invoked
on some cpu.  To keep things simple, in tdx_cpu_enable(), always do the
per-cpu initialization regardless of whether the TDX module has been
initialized or not.  And in tdx_enable(), don't call tdx_cpu_enable()
but assume the caller has disabled CPU hotplug, done VMXON and
tdx_cpu_enable() on all online cpus before calling tdx_enable().

Signed-off-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
---

v10 -> v11:
 - Return -NODEV instead of -EINVAL when CONFIG_INTEL_TDX_HOST is off.
 - Return the actual error code for tdx_enable() instead of -EINVAL.
 - Added Isaku's Reviewed-by.

v9 -> v10:
 - Merged the patch to handle per-cpu initialization to this patch to
   tell the story better.
 - Changed how to handle the per-cpu initialization to only provide a
   tdx_cpu_enable() function to let the user of TDX to do it when the
   user wants to run TDX code on a certain cpu.
 - Changed tdx_enable() to not call cpus_read_lock() explicitly, but
   call lockdep_assert_cpus_held() to assume the caller has done that.
 - Improved comments around tdx_enable() and tdx_cpu_enable().
 - Improved changelog to tell the story better accordingly.

v8 -> v9:
 - Removed detailed TODO list in the changelog (Dave).
 - Added back steps to do module global initialization and per-cpu
   initialization in the TODO list comment.
 - Moved the 'enum tdx_module_status_t' from tdx.c to local tdx.h

v7 -> v8:
 - Refined changelog (Dave).
 - Removed "all BIOS-enabled cpus" related code (Peter/Thomas/Dave).
 - Add a "TODO list" comment in init_tdx_module() to list all steps of
   initializing the TDX Module to tell the story (Dave).
 - Made tdx_enable() unverisally return -EINVAL, and removed nonsense
   comments (Dave).
 - Simplified __tdx_enable() to only handle success or failure.
 - TDX_MODULE_SHUTDOWN -> TDX_MODULE_ERROR
 - Removed TDX_MODULE_NONE (not loaded) as it is not necessary.
 - Improved comments (Dave).
 - Pointed out 'tdx_module_status' is software thing (Dave).

v6 -> v7:
 - No change.

v5 -> v6:
 - Added code to set status to TDX_MODULE_NONE if TDX module is not
   loaded (Chao)
 - Added Chao's Reviewed-by.
 - Improved comments around cpus_read_lock().

- v3->v5 (no feedback on v4):
 - Removed the check that SEAMRR and TDX KeyID have been detected on
   all present cpus.
 - Removed tdx_detect().
 - Added num_online_cpus() to MADT-enabled CPUs check within the CPU
   hotplug lock and return early with error message.
 - Improved dmesg printing for TDX module detection and initialization.


---
 arch/x86/include/asm/tdx.h  |   4 +
 arch/x86/virt/vmx/tdx/tdx.c | 179 ++++++++++++++++++++++++++++++++++++
 arch/x86/virt/vmx/tdx/tdx.h |  13 +++
 3 files changed, 196 insertions(+)

Comments

Isaku Yamahata June 5, 2023, 9:23 p.m. UTC | #1
On Mon, Jun 05, 2023 at 02:27:20AM +1200,
Kai Huang <kai.huang@intel.com> wrote:

> diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
> index b489b5b9de5d..03f74851608f 100644
> --- a/arch/x86/include/asm/tdx.h
> +++ b/arch/x86/include/asm/tdx.h
> @@ -102,8 +102,12 @@ static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
>  
>  #ifdef CONFIG_INTEL_TDX_HOST
>  bool platform_tdx_enabled(void);
> +int tdx_cpu_enable(void);
> +int tdx_enable(void);
>  #else	/* !CONFIG_INTEL_TDX_HOST */
>  static inline bool platform_tdx_enabled(void) { return false; }
> +static inline int tdx_cpu_enable(void) { return -ENODEV; }
> +static inline int tdx_enable(void)  { return -ENODEV; }
>  #endif	/* CONFIG_INTEL_TDX_HOST */
>  
>  #endif /* !__ASSEMBLY__ */

Please include errno.h.
In the case of !INTEL_TDX_HOST && INTEL_TDX_GUEST,
drivers/virt/coco/tdx-guest/tdx-guest.c results in compile error.

Although there are other files that include asm/tdx.h, they seem to luckily
include errno.h directly or indirectly.
Huang, Kai June 5, 2023, 11:04 p.m. UTC | #2
On Mon, 2023-06-05 at 14:23 -0700, Isaku Yamahata wrote:
> On Mon, Jun 05, 2023 at 02:27:20AM +1200,
> Kai Huang <kai.huang@intel.com> wrote:
> 
> > diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
> > index b489b5b9de5d..03f74851608f 100644
> > --- a/arch/x86/include/asm/tdx.h
> > +++ b/arch/x86/include/asm/tdx.h
> > @@ -102,8 +102,12 @@ static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
> >  
> >  #ifdef CONFIG_INTEL_TDX_HOST
> >  bool platform_tdx_enabled(void);
> > +int tdx_cpu_enable(void);
> > +int tdx_enable(void);
> >  #else	/* !CONFIG_INTEL_TDX_HOST */
> >  static inline bool platform_tdx_enabled(void) { return false; }
> > +static inline int tdx_cpu_enable(void) { return -ENODEV; }
> > +static inline int tdx_enable(void)  { return -ENODEV; }
> >  #endif	/* CONFIG_INTEL_TDX_HOST */
> >  
> >  #endif /* !__ASSEMBLY__ */
> 
> Please include errno.h.
> In the case of !INTEL_TDX_HOST && INTEL_TDX_GUEST,
> drivers/virt/coco/tdx-guest/tdx-guest.c results in compile error.
> 
> Although there are other files that include asm/tdx.h, they seem to luckily
> include errno.h directly or indirectly.
> 
I've changed to -ENODEV which doesn't cause build error in my test.

I agree it's better to include errno.h explicitly but IMHO it's a problem in the
existing upstream code (tdx_kvm_hypercall() returns -ENODEV when INTEL_TDX_GUEST
is off), thus should be addressed in a separate patch but not folded into this
one.

Since now there's no build error, I hesitated to added such patch.  I can do it
separately if it should be addressed.
Dave Hansen June 5, 2023, 11:08 p.m. UTC | #3
On 6/5/23 16:04, Huang, Kai wrote:
> I've changed to -ENODEV which doesn't cause build error in my test.
> 
> I agree it's better to include errno.h explicitly but IMHO it's a problem in the
> existing upstream code (tdx_kvm_hypercall() returns -ENODEV when INTEL_TDX_GUEST
> is off), thus should be addressed in a separate patch but not folded into this
> one.
> 
> Since now there's no build error, I hesitated to added such patch.  I can do it
> separately if it should be addressed.

Yes, it should be addressed.
Huang, Kai June 5, 2023, 11:24 p.m. UTC | #4
On Mon, 2023-06-05 at 16:08 -0700, Hansen, Dave wrote:
> On 6/5/23 16:04, Huang, Kai wrote:
> > I've changed to -ENODEV which doesn't cause build error in my test.
> > 
> > I agree it's better to include errno.h explicitly but IMHO it's a problem in the
> > existing upstream code (tdx_kvm_hypercall() returns -ENODEV when INTEL_TDX_GUEST
> > is off), thus should be addressed in a separate patch but not folded into this
> > one.
> > 
> > Since now there's no build error, I hesitated to added such patch.  I can do it
> > separately if it should be addressed.
> 
> Yes, it should be addressed.
> 

I'll send out a separate patch to address.  Thanks!
Dave Hansen June 7, 2023, 3:22 p.m. UTC | #5
On 6/4/23 07:27, Kai Huang wrote:
...
> +static int try_init_module_global(void)
> +{
> +	unsigned long flags;
> +	int ret;
> +
> +	/*
> +	 * The TDX module global initialization only needs to be done
> +	 * once on any cpu.
> +	 */
> +	raw_spin_lock_irqsave(&tdx_global_init_lock, flags);

Why is this "raw_"?

There's zero mention of it anywhere.

> +	if (tdx_global_init_status & TDX_GLOBAL_INIT_DONE) {
> +		ret = tdx_global_init_status & TDX_GLOBAL_INIT_FAILED ?
> +			-EINVAL : 0;
> +		goto out;
> +	}
> +
> +	/* All '0's are just unused parameters. */
> +	ret = seamcall(TDH_SYS_INIT, 0, 0, 0, 0, NULL, NULL);
> +
> +	tdx_global_init_status = TDX_GLOBAL_INIT_DONE;
> +	if (ret)
> +		tdx_global_init_status |= TDX_GLOBAL_INIT_FAILED;
> +out:
> +	raw_spin_unlock_irqrestore(&tdx_global_init_lock, flags);
> +
> +	return ret;
> +}
> +
> +/**
> + * tdx_cpu_enable - Enable TDX on local cpu
> + *
> + * Do one-time TDX module per-cpu initialization SEAMCALL (and TDX module
> + * global initialization SEAMCALL if not done) on local cpu to make this
> + * cpu be ready to run any other SEAMCALLs.
> + *
> + * Note this function must be called when preemption is not possible
> + * (i.e. via SMP call or in per-cpu thread).  It is not IRQ safe either
> + * (i.e. cannot be called in per-cpu thread and via SMP call from remote
> + * cpu simultaneously).

lockdep_assert_*() are your friends.  Unlike comments, they will
actually tell you if this goes wrong.

> +int tdx_cpu_enable(void)
> +{
> +	unsigned int lp_status;
> +	int ret;
> +
> +	if (!platform_tdx_enabled())
> +		return -EINVAL;
> +
> +	lp_status = __this_cpu_read(tdx_lp_init_status);
> +
> +	/* Already done */
> +	if (lp_status & TDX_LP_INIT_DONE)
> +		return lp_status & TDX_LP_INIT_FAILED ? -EINVAL : 0;
> +
> +	/*
> +	 * The TDX module global initialization is the very first step
> +	 * to enable TDX.  Need to do it first (if hasn't been done)
> +	 * before doing the per-cpu initialization.
> +	 */
> +	ret = try_init_module_global();
> +
> +	/*
> +	 * If the module global initialization failed, there's no point
> +	 * to do the per-cpu initialization.  Just mark it as done but
> +	 * failed.
> +	 */
> +	if (ret)
> +		goto update_status;
> +
> +	/* All '0's are just unused parameters */
> +	ret = seamcall(TDH_SYS_LP_INIT, 0, 0, 0, 0, NULL, NULL);
> +
> +update_status:
> +	lp_status = TDX_LP_INIT_DONE;
> +	if (ret)
> +		lp_status |= TDX_LP_INIT_FAILED;
> +
> +	this_cpu_write(tdx_lp_init_status, lp_status);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(tdx_cpu_enable);

You danced around it in the changelog, but the reason for the exports is
not clear.

> +static int init_tdx_module(void)
> +{
> +	/*
> +	 * TODO:
> +	 *
> +	 *  - Get TDX module information and TDX-capable memory regions.
> +	 *  - Build the list of TDX-usable memory regions.
> +	 *  - Construct a list of "TD Memory Regions" (TDMRs) to cover
> +	 *    all TDX-usable memory regions.
> +	 *  - Configure the TDMRs and the global KeyID to the TDX module.
> +	 *  - Configure the global KeyID on all packages.
> +	 *  - Initialize all TDMRs.
> +	 *
> +	 *  Return error before all steps are done.
> +	 */
> +	return -EINVAL;
> +}
> +
> +static int __tdx_enable(void)
> +{
> +	int ret;
> +
> +	ret = init_tdx_module();
> +	if (ret) {
> +		pr_err("TDX module initialization failed (%d)\n", ret);

Have you actually gone any  looked at how this pr_*()'s look?

Won't they say:

	tdx: TDX module initialized

Isn't that a _bit_ silly?  Why not just say:

	pr_info("module initialized.\n");
Huang, Kai June 8, 2023, 2:10 a.m. UTC | #6
On Wed, 2023-06-07 at 08:22 -0700, Dave Hansen wrote:
> On 6/4/23 07:27, Kai Huang wrote:
> ...
> > +static int try_init_module_global(void)
> > +{
> > +	unsigned long flags;
> > +	int ret;
> > +
> > +	/*
> > +	 * The TDX module global initialization only needs to be done
> > +	 * once on any cpu.
> > +	 */
> > +	raw_spin_lock_irqsave(&tdx_global_init_lock, flags);
> 
> Why is this "raw_"?
> 
> There's zero mention of it anywhere.

Isaku pointed out the normal spinlock_t is converted to sleeping lock for
PREEMPT_RT kernel.  KVM calls this with IRQ disabled, thus requires a non-
sleeping lock.

How about adding below comment here?

	/*
	 * Normal spinlock_t is converted to sleeping lock in PREEMPT_RT
	 * kernel.  Use raw_spinlock_t instead so this function can be called
	 * even when IRQ is disabled in any kernel configuration.
	 */


> 
> > +	if (tdx_global_init_status & TDX_GLOBAL_INIT_DONE) {
> > +		ret = tdx_global_init_status & TDX_GLOBAL_INIT_FAILED ?
> > +			-EINVAL : 0;
> > +		goto out;
> > +	}
> > +
> > +	/* All '0's are just unused parameters. */
> > +	ret = seamcall(TDH_SYS_INIT, 0, 0, 0, 0, NULL, NULL);
> > +
> > +	tdx_global_init_status = TDX_GLOBAL_INIT_DONE;
> > +	if (ret)
> > +		tdx_global_init_status |= TDX_GLOBAL_INIT_FAILED;
> > +out:
> > +	raw_spin_unlock_irqrestore(&tdx_global_init_lock, flags);
> > +
> > +	return ret;
> > +}
> > +
> > +/**
> > + * tdx_cpu_enable - Enable TDX on local cpu
> > + *
> > + * Do one-time TDX module per-cpu initialization SEAMCALL (and TDX module
> > + * global initialization SEAMCALL if not done) on local cpu to make this
> > + * cpu be ready to run any other SEAMCALLs.
> > + *
> > + * Note this function must be called when preemption is not possible
> > + * (i.e. via SMP call or in per-cpu thread).  It is not IRQ safe either
> > + * (i.e. cannot be called in per-cpu thread and via SMP call from remote
> > + * cpu simultaneously).
> 
> lockdep_assert_*() are your friends.  Unlike comments, they will
> actually tell you if this goes wrong.

Yeah.  Will do.  Thanks for reminding.

> 
> > +int tdx_cpu_enable(void)
> > +{
> > +	unsigned int lp_status;
> > +	int ret;
> > +
> > +	if (!platform_tdx_enabled())
> > +		return -EINVAL;
> > +
> > +	lp_status = __this_cpu_read(tdx_lp_init_status);
> > +
> > +	/* Already done */
> > +	if (lp_status & TDX_LP_INIT_DONE)
> > +		return lp_status & TDX_LP_INIT_FAILED ? -EINVAL : 0;
> > +
> > +	/*
> > +	 * The TDX module global initialization is the very first step
> > +	 * to enable TDX.  Need to do it first (if hasn't been done)
> > +	 * before doing the per-cpu initialization.
> > +	 */
> > +	ret = try_init_module_global();
> > +
> > +	/*
> > +	 * If the module global initialization failed, there's no point
> > +	 * to do the per-cpu initialization.  Just mark it as done but
> > +	 * failed.
> > +	 */
> > +	if (ret)
> > +		goto update_status;
> > +
> > +	/* All '0's are just unused parameters */
> > +	ret = seamcall(TDH_SYS_LP_INIT, 0, 0, 0, 0, NULL, NULL);
> > +
> > +update_status:
> > +	lp_status = TDX_LP_INIT_DONE;
> > +	if (ret)
> > +		lp_status |= TDX_LP_INIT_FAILED;
> > +
> > +	this_cpu_write(tdx_lp_init_status, lp_status);
> > +
> > +	return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(tdx_cpu_enable);
> 
> You danced around it in the changelog, but the reason for the exports is
> not clear.

I'll add one sentence to the changelog to explain:

	Export both tdx_cpu_enable() and tdx_enable() as KVM will be the kernel
	component to use TDX.

> 
> > +static int init_tdx_module(void)
> > +{
> > +	/*
> > +	 * TODO:
> > +	 *
> > +	 *  - Get TDX module information and TDX-capable memory regions.
> > +	 *  - Build the list of TDX-usable memory regions.
> > +	 *  - Construct a list of "TD Memory Regions" (TDMRs) to cover
> > +	 *    all TDX-usable memory regions.
> > +	 *  - Configure the TDMRs and the global KeyID to the TDX module.
> > +	 *  - Configure the global KeyID on all packages.
> > +	 *  - Initialize all TDMRs.
> > +	 *
> > +	 *  Return error before all steps are done.
> > +	 */
> > +	return -EINVAL;
> > +}
> > +
> > +static int __tdx_enable(void)
> > +{
> > +	int ret;
> > +
> > +	ret = init_tdx_module();
> > +	if (ret) {
> > +		pr_err("TDX module initialization failed (%d)\n", ret);
> 
> Have you actually gone any  looked at how this pr_*()'s look?
> 
> Won't they say:
> 
> 	tdx: TDX module initialized
> 
> Isn't that a _bit_ silly?  Why not just say:
> 
> 	pr_info("module initialized.\n");

I did.  However I might have a bad taste :)

Will change (and change other pr() if there's similar problem).
Dave Hansen June 8, 2023, 1:43 p.m. UTC | #7
On 6/7/23 19:10, Huang, Kai wrote:
> On Wed, 2023-06-07 at 08:22 -0700, Dave Hansen wrote:
>> On 6/4/23 07:27, Kai Huang wrote:
>> ...
>>> +static int try_init_module_global(void)
>>> +{
>>> +   unsigned long flags;
>>> +   int ret;
>>> +
>>> +   /*
>>> +    * The TDX module global initialization only needs to be done
>>> +    * once on any cpu.
>>> +    */
>>> +   raw_spin_lock_irqsave(&tdx_global_init_lock, flags);
>>
>> Why is this "raw_"?
>>
>> There's zero mention of it anywhere.
> 
> Isaku pointed out the normal spinlock_t is converted to sleeping lock for
> PREEMPT_RT kernel.  KVM calls this with IRQ disabled, thus requires a non-
> sleeping lock.
> 
> How about adding below comment here?
> 
>         /*
>          * Normal spinlock_t is converted to sleeping lock in PREEMPT_RT
>          * kernel.  Use raw_spinlock_t instead so this function can be called
>          * even when IRQ is disabled in any kernel configuration.
>          */

Go look at *EVERY* *OTHER* raw_spinlock_t in the kernel.  Do any of them
say this?

Comment the function, say that it's always called with interrupts and
preempt disabled.  Leaves it at that.  *Maybe* add on that it needs raw
spinlocks because of it.  But don't (try to) explain the background of
the lock type.




>>> +int tdx_cpu_enable(void)
>>> +{
>>> +   unsigned int lp_status;
>>> +   int ret;
>>> +
>>> +   if (!platform_tdx_enabled())
>>> +           return -EINVAL;
>>> +
>>> +   lp_status = __this_cpu_read(tdx_lp_init_status);
>>> +
>>> +   /* Already done */
>>> +   if (lp_status & TDX_LP_INIT_DONE)
>>> +           return lp_status & TDX_LP_INIT_FAILED ? -EINVAL : 0;
>>> +
>>> +   /*
>>> +    * The TDX module global initialization is the very first step
>>> +    * to enable TDX.  Need to do it first (if hasn't been done)
>>> +    * before doing the per-cpu initialization.
>>> +    */
>>> +   ret = try_init_module_global();
>>> +
>>> +   /*
>>> +    * If the module global initialization failed, there's no point
>>> +    * to do the per-cpu initialization.  Just mark it as done but
>>> +    * failed.
>>> +    */
>>> +   if (ret)
>>> +           goto update_status;
>>> +
>>> +   /* All '0's are just unused parameters */
>>> +   ret = seamcall(TDH_SYS_LP_INIT, 0, 0, 0, 0, NULL, NULL);
>>> +
>>> +update_status:
>>> +   lp_status = TDX_LP_INIT_DONE;
>>> +   if (ret)
>>> +           lp_status |= TDX_LP_INIT_FAILED;
>>> +
>>> +   this_cpu_write(tdx_lp_init_status, lp_status);
>>> +
>>> +   return ret;
>>> +}
>>> +EXPORT_SYMBOL_GPL(tdx_cpu_enable);
>>
>> You danced around it in the changelog, but the reason for the exports is
>> not clear.
> 
> I'll add one sentence to the changelog to explain:
> 
>         Export both tdx_cpu_enable() and tdx_enable() as KVM will be the kernel
>         component to use TDX.

Intel doesn't pay me by the word.  Do you get paid that way?  If not,
please just say:

	Export both tdx_cpu_enable() and tdx_enable() for KVM use.
Huang, Kai June 12, 2023, 11:21 a.m. UTC | #8
On Thu, 2023-06-08 at 06:43 -0700, Dave Hansen wrote:
> On 6/7/23 19:10, Huang, Kai wrote:
> > On Wed, 2023-06-07 at 08:22 -0700, Dave Hansen wrote:
> > > On 6/4/23 07:27, Kai Huang wrote:
> > > ...
> > > > +static int try_init_module_global(void)
> > > > +{
> > > > +   unsigned long flags;
> > > > +   int ret;
> > > > +
> > > > +   /*
> > > > +    * The TDX module global initialization only needs to be done
> > > > +    * once on any cpu.
> > > > +    */
> > > > +   raw_spin_lock_irqsave(&tdx_global_init_lock, flags);
> > > 
> > > Why is this "raw_"?
> > > 
> > > There's zero mention of it anywhere.
> > 
> > Isaku pointed out the normal spinlock_t is converted to sleeping lock for
> > PREEMPT_RT kernel.  KVM calls this with IRQ disabled, thus requires a non-
> > sleeping lock.
> > 
> > How about adding below comment here?
> > 
> >         /*
> >          * Normal spinlock_t is converted to sleeping lock in PREEMPT_RT
> >          * kernel.  Use raw_spinlock_t instead so this function can be called
> >          * even when IRQ is disabled in any kernel configuration.
> >          */
> 
> Go look at *EVERY* *OTHER* raw_spinlock_t in the kernel.  Do any of them
> say this?
> 
> Comment the function, say that it's always called with interrupts and
> preempt disabled.  Leaves it at that.  *Maybe* add on that it needs raw
> spinlocks because of it.  But don't (try to) explain the background of
> the lock type.
> 

Thanks.  Will do, with one minor:

I'd like to replace "it's always called with interrupts and preempt disabled"
with "it can be called with interrupts disabled", because in the future non-KVM
code may call this when interrupt is enabled but preemption is disabled.

[...]

> > > > +EXPORT_SYMBOL_GPL(tdx_cpu_enable);
> > > 
> > > You danced around it in the changelog, but the reason for the exports is
> > > not clear.
> > 
> > I'll add one sentence to the changelog to explain:
> > 
> >         Export both tdx_cpu_enable() and tdx_enable() as KVM will be the kernel
> >         component to use TDX.
> 
> Intel doesn't pay me by the word.  Do you get paid that way?  If not,
> please just say:
> 
> 	Export both tdx_cpu_enable() and tdx_enable() for KVM use.

Thanks!
David Hildenbrand June 19, 2023, 1:16 p.m. UTC | #9
On 04.06.23 16:27, Kai Huang wrote:
> To enable TDX the kernel needs to initialize TDX from two perspectives:
> 1) Do a set of SEAMCALLs to initialize the TDX module to make it ready
> to create and run TDX guests; 2) Do the per-cpu initialization SEAMCALL
> on one logical cpu before the kernel wants to make any other SEAMCALLs
> on that cpu (including those involved during module initialization and
> running TDX guests).
> 
> The TDX module can be initialized only once in its lifetime.  Instead
> of always initializing it at boot time, this implementation chooses an
> "on demand" approach to initialize TDX until there is a real need (e.g
> when requested by KVM).  This approach has below pros:
> 
> 1) It avoids consuming the memory that must be allocated by kernel and
> given to the TDX module as metadata (~1/256th of the TDX-usable memory),
> and also saves the CPU cycles of initializing the TDX module (and the
> metadata) when TDX is not used at all.
> 
> 2) The TDX module design allows it to be updated while the system is
> running.  The update procedure shares quite a few steps with this "on
> demand" initialization mechanism.  The hope is that much of "on demand"
> mechanism can be shared with a future "update" mechanism.  A boot-time
> TDX module implementation would not be able to share much code with the
> update mechanism.
> 
> 3) Making SEAMCALL requires VMX to be enabled.  Currently, only the KVM
> code mucks with VMX enabling.  If the TDX module were to be initialized
> separately from KVM (like at boot), the boot code would need to be
> taught how to muck with VMX enabling and KVM would need to be taught how
> to cope with that.  Making KVM itself responsible for TDX initialization
> lets the rest of the kernel stay blissfully unaware of VMX.
> 
> Similar to module initialization, also make the per-cpu initialization
> "on demand" as it also depends on VMX being enabled.
> 
> Add two functions, tdx_enable() and tdx_cpu_enable(), to enable the TDX
> module and enable TDX on local cpu respectively.  For now tdx_enable()
> is a placeholder.  The TODO list will be pared down as functionality is
> added.
> 
> In tdx_enable() use a state machine protected by mutex to make sure the
> initialization will only be done once, as tdx_enable() can be called
> multiple times (i.e. KVM module can be reloaded) and may be called
> concurrently by other kernel components in the future.
> 
> The per-cpu initialization on each cpu can only be done once during the
> module's life time.  Use a per-cpu variable to track its status to make
> sure it is only done once in tdx_cpu_enable().
> 
> Also, a SEAMCALL to do TDX module global initialization must be done
> once on any logical cpu before any per-cpu initialization SEAMCALL.  Do
> it inside tdx_cpu_enable() too (if hasn't been done).
> 
> tdx_enable() can potentially invoke SEAMCALLs on any online cpus.  The
> per-cpu initialization must be done before those SEAMCALLs are invoked
> on some cpu.  To keep things simple, in tdx_cpu_enable(), always do the
> per-cpu initialization regardless of whether the TDX module has been
> initialized or not.  And in tdx_enable(), don't call tdx_cpu_enable()
> but assume the caller has disabled CPU hotplug, done VMXON and
> tdx_cpu_enable() on all online cpus before calling tdx_enable().
> 
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
> ---
> 
> v10 -> v11:
>   - Return -NODEV instead of -EINVAL when CONFIG_INTEL_TDX_HOST is off.
>   - Return the actual error code for tdx_enable() instead of -EINVAL.
>   - Added Isaku's Reviewed-by.
> 
> v9 -> v10:
>   - Merged the patch to handle per-cpu initialization to this patch to
>     tell the story better.
>   - Changed how to handle the per-cpu initialization to only provide a
>     tdx_cpu_enable() function to let the user of TDX to do it when the
>     user wants to run TDX code on a certain cpu.
>   - Changed tdx_enable() to not call cpus_read_lock() explicitly, but
>     call lockdep_assert_cpus_held() to assume the caller has done that.
>   - Improved comments around tdx_enable() and tdx_cpu_enable().
>   - Improved changelog to tell the story better accordingly.
> 
> v8 -> v9:
>   - Removed detailed TODO list in the changelog (Dave).
>   - Added back steps to do module global initialization and per-cpu
>     initialization in the TODO list comment.
>   - Moved the 'enum tdx_module_status_t' from tdx.c to local tdx.h
> 
> v7 -> v8:
>   - Refined changelog (Dave).
>   - Removed "all BIOS-enabled cpus" related code (Peter/Thomas/Dave).
>   - Add a "TODO list" comment in init_tdx_module() to list all steps of
>     initializing the TDX Module to tell the story (Dave).
>   - Made tdx_enable() unverisally return -EINVAL, and removed nonsense
>     comments (Dave).
>   - Simplified __tdx_enable() to only handle success or failure.
>   - TDX_MODULE_SHUTDOWN -> TDX_MODULE_ERROR
>   - Removed TDX_MODULE_NONE (not loaded) as it is not necessary.
>   - Improved comments (Dave).
>   - Pointed out 'tdx_module_status' is software thing (Dave).
> 
> v6 -> v7:
>   - No change.
> 
> v5 -> v6:
>   - Added code to set status to TDX_MODULE_NONE if TDX module is not
>     loaded (Chao)
>   - Added Chao's Reviewed-by.
>   - Improved comments around cpus_read_lock().
> 
> - v3->v5 (no feedback on v4):
>   - Removed the check that SEAMRR and TDX KeyID have been detected on
>     all present cpus.
>   - Removed tdx_detect().
>   - Added num_online_cpus() to MADT-enabled CPUs check within the CPU
>     hotplug lock and return early with error message.
>   - Improved dmesg printing for TDX module detection and initialization.
> 
> 
> ---
>   arch/x86/include/asm/tdx.h  |   4 +
>   arch/x86/virt/vmx/tdx/tdx.c | 179 ++++++++++++++++++++++++++++++++++++
>   arch/x86/virt/vmx/tdx/tdx.h |  13 +++
>   3 files changed, 196 insertions(+)
> 
> diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
> index b489b5b9de5d..03f74851608f 100644
> --- a/arch/x86/include/asm/tdx.h
> +++ b/arch/x86/include/asm/tdx.h
> @@ -102,8 +102,12 @@ static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
>   
>   #ifdef CONFIG_INTEL_TDX_HOST
>   bool platform_tdx_enabled(void);
> +int tdx_cpu_enable(void);
> +int tdx_enable(void);
>   #else	/* !CONFIG_INTEL_TDX_HOST */
>   static inline bool platform_tdx_enabled(void) { return false; }
> +static inline int tdx_cpu_enable(void) { return -ENODEV; }
> +static inline int tdx_enable(void)  { return -ENODEV; }
>   #endif	/* CONFIG_INTEL_TDX_HOST */
>   
>   #endif /* !__ASSEMBLY__ */
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index e62e978eba1b..bcf2b2d15a2e 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -13,6 +13,10 @@
>   #include <linux/errno.h>
>   #include <linux/printk.h>
>   #include <linux/smp.h>
> +#include <linux/cpu.h>
> +#include <linux/spinlock.h>
> +#include <linux/percpu-defs.h>
> +#include <linux/mutex.h>
>   #include <asm/msr-index.h>
>   #include <asm/msr.h>
>   #include <asm/archrandom.h>
> @@ -23,6 +27,18 @@ static u32 tdx_global_keyid __ro_after_init;
>   static u32 tdx_guest_keyid_start __ro_after_init;
>   static u32 tdx_nr_guest_keyids __ro_after_init;
>   
> +static unsigned int tdx_global_init_status;
> +static DEFINE_RAW_SPINLOCK(tdx_global_init_lock);
> +#define TDX_GLOBAL_INIT_DONE	_BITUL(0)
> +#define TDX_GLOBAL_INIT_FAILED	_BITUL(1)
> +
> +static DEFINE_PER_CPU(unsigned int, tdx_lp_init_status);
> +#define TDX_LP_INIT_DONE	_BITUL(0)
> +#define TDX_LP_INIT_FAILED	_BITUL(1)

I'm curious, why do we have to track three states: uninitialized 
(!done), initialized (done + ! failed), permanent error (done + failed).

[besides: why can't you use an enum and share that between global and pcpu?]

Why can't you have a pcpu "bool tdx_lp_initialized" and "bool 
tdx_global_initialized"?

I mean, if there was an error during previous initialization, it's not 
initialized: you'd try initializing again -- and possibly fail again -- 
on the next attempt. I doubt that a "try to cache failed status to keep 
failing fast" is really required.

Is there any other reason (e.g., second init attempt would set your 
computer on fire) why it can't be simpler?


[...]
Huang, Kai June 19, 2023, 11:28 p.m. UTC | #10
On Mon, 2023-06-19 at 15:16 +0200, David Hildenbrand wrote:
> On 04.06.23 16:27, Kai Huang wrote:
> > To enable TDX the kernel needs to initialize TDX from two perspectives:
> > 1) Do a set of SEAMCALLs to initialize the TDX module to make it ready
> > to create and run TDX guests; 2) Do the per-cpu initialization SEAMCALL
> > on one logical cpu before the kernel wants to make any other SEAMCALLs
> > on that cpu (including those involved during module initialization and
> > running TDX guests).
> > 
> > The TDX module can be initialized only once in its lifetime.  Instead
> > of always initializing it at boot time, this implementation chooses an
> > "on demand" approach to initialize TDX until there is a real need (e.g
> > when requested by KVM).  This approach has below pros:
> > 
> > 1) It avoids consuming the memory that must be allocated by kernel and
> > given to the TDX module as metadata (~1/256th of the TDX-usable memory),
> > and also saves the CPU cycles of initializing the TDX module (and the
> > metadata) when TDX is not used at all.
> > 
> > 2) The TDX module design allows it to be updated while the system is
> > running.  The update procedure shares quite a few steps with this "on
> > demand" initialization mechanism.  The hope is that much of "on demand"
> > mechanism can be shared with a future "update" mechanism.  A boot-time
> > TDX module implementation would not be able to share much code with the
> > update mechanism.
> > 
> > 3) Making SEAMCALL requires VMX to be enabled.  Currently, only the KVM
> > code mucks with VMX enabling.  If the TDX module were to be initialized
> > separately from KVM (like at boot), the boot code would need to be
> > taught how to muck with VMX enabling and KVM would need to be taught how
> > to cope with that.  Making KVM itself responsible for TDX initialization
> > lets the rest of the kernel stay blissfully unaware of VMX.
> > 
> > Similar to module initialization, also make the per-cpu initialization
> > "on demand" as it also depends on VMX being enabled.
> > 
> > Add two functions, tdx_enable() and tdx_cpu_enable(), to enable the TDX
> > module and enable TDX on local cpu respectively.  For now tdx_enable()
> > is a placeholder.  The TODO list will be pared down as functionality is
> > added.
> > 
> > In tdx_enable() use a state machine protected by mutex to make sure the
> > initialization will only be done once, as tdx_enable() can be called
> > multiple times (i.e. KVM module can be reloaded) and may be called
> > concurrently by other kernel components in the future.
> > 
> > The per-cpu initialization on each cpu can only be done once during the
> > module's life time.  Use a per-cpu variable to track its status to make
> > sure it is only done once in tdx_cpu_enable().
> > 
> > Also, a SEAMCALL to do TDX module global initialization must be done
> > once on any logical cpu before any per-cpu initialization SEAMCALL.  Do
> > it inside tdx_cpu_enable() too (if hasn't been done).
> > 
> > tdx_enable() can potentially invoke SEAMCALLs on any online cpus.  The
> > per-cpu initialization must be done before those SEAMCALLs are invoked
> > on some cpu.  To keep things simple, in tdx_cpu_enable(), always do the
> > per-cpu initialization regardless of whether the TDX module has been
> > initialized or not.  And in tdx_enable(), don't call tdx_cpu_enable()
> > but assume the caller has disabled CPU hotplug, done VMXON and
> > tdx_cpu_enable() on all online cpus before calling tdx_enable().
> > 
> > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
> > ---
> > 
> > v10 -> v11:
> >   - Return -NODEV instead of -EINVAL when CONFIG_INTEL_TDX_HOST is off.
> >   - Return the actual error code for tdx_enable() instead of -EINVAL.
> >   - Added Isaku's Reviewed-by.
> > 
> > v9 -> v10:
> >   - Merged the patch to handle per-cpu initialization to this patch to
> >     tell the story better.
> >   - Changed how to handle the per-cpu initialization to only provide a
> >     tdx_cpu_enable() function to let the user of TDX to do it when the
> >     user wants to run TDX code on a certain cpu.
> >   - Changed tdx_enable() to not call cpus_read_lock() explicitly, but
> >     call lockdep_assert_cpus_held() to assume the caller has done that.
> >   - Improved comments around tdx_enable() and tdx_cpu_enable().
> >   - Improved changelog to tell the story better accordingly.
> > 
> > v8 -> v9:
> >   - Removed detailed TODO list in the changelog (Dave).
> >   - Added back steps to do module global initialization and per-cpu
> >     initialization in the TODO list comment.
> >   - Moved the 'enum tdx_module_status_t' from tdx.c to local tdx.h
> > 
> > v7 -> v8:
> >   - Refined changelog (Dave).
> >   - Removed "all BIOS-enabled cpus" related code (Peter/Thomas/Dave).
> >   - Add a "TODO list" comment in init_tdx_module() to list all steps of
> >     initializing the TDX Module to tell the story (Dave).
> >   - Made tdx_enable() unverisally return -EINVAL, and removed nonsense
> >     comments (Dave).
> >   - Simplified __tdx_enable() to only handle success or failure.
> >   - TDX_MODULE_SHUTDOWN -> TDX_MODULE_ERROR
> >   - Removed TDX_MODULE_NONE (not loaded) as it is not necessary.
> >   - Improved comments (Dave).
> >   - Pointed out 'tdx_module_status' is software thing (Dave).
> > 
> > v6 -> v7:
> >   - No change.
> > 
> > v5 -> v6:
> >   - Added code to set status to TDX_MODULE_NONE if TDX module is not
> >     loaded (Chao)
> >   - Added Chao's Reviewed-by.
> >   - Improved comments around cpus_read_lock().
> > 
> > - v3->v5 (no feedback on v4):
> >   - Removed the check that SEAMRR and TDX KeyID have been detected on
> >     all present cpus.
> >   - Removed tdx_detect().
> >   - Added num_online_cpus() to MADT-enabled CPUs check within the CPU
> >     hotplug lock and return early with error message.
> >   - Improved dmesg printing for TDX module detection and initialization.
> > 
> > 
> > ---
> >   arch/x86/include/asm/tdx.h  |   4 +
> >   arch/x86/virt/vmx/tdx/tdx.c | 179 ++++++++++++++++++++++++++++++++++++
> >   arch/x86/virt/vmx/tdx/tdx.h |  13 +++
> >   3 files changed, 196 insertions(+)
> > 
> > diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
> > index b489b5b9de5d..03f74851608f 100644
> > --- a/arch/x86/include/asm/tdx.h
> > +++ b/arch/x86/include/asm/tdx.h
> > @@ -102,8 +102,12 @@ static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
> >   
> >   #ifdef CONFIG_INTEL_TDX_HOST
> >   bool platform_tdx_enabled(void);
> > +int tdx_cpu_enable(void);
> > +int tdx_enable(void);
> >   #else	/* !CONFIG_INTEL_TDX_HOST */
> >   static inline bool platform_tdx_enabled(void) { return false; }
> > +static inline int tdx_cpu_enable(void) { return -ENODEV; }
> > +static inline int tdx_enable(void)  { return -ENODEV; }
> >   #endif	/* CONFIG_INTEL_TDX_HOST */
> >   
> >   #endif /* !__ASSEMBLY__ */
> > diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> > index e62e978eba1b..bcf2b2d15a2e 100644
> > --- a/arch/x86/virt/vmx/tdx/tdx.c
> > +++ b/arch/x86/virt/vmx/tdx/tdx.c
> > @@ -13,6 +13,10 @@
> >   #include <linux/errno.h>
> >   #include <linux/printk.h>
> >   #include <linux/smp.h>
> > +#include <linux/cpu.h>
> > +#include <linux/spinlock.h>
> > +#include <linux/percpu-defs.h>
> > +#include <linux/mutex.h>
> >   #include <asm/msr-index.h>
> >   #include <asm/msr.h>
> >   #include <asm/archrandom.h>
> > @@ -23,6 +27,18 @@ static u32 tdx_global_keyid __ro_after_init;
> >   static u32 tdx_guest_keyid_start __ro_after_init;
> >   static u32 tdx_nr_guest_keyids __ro_after_init;
> >   
> > +static unsigned int tdx_global_init_status;
> > +static DEFINE_RAW_SPINLOCK(tdx_global_init_lock);
> > +#define TDX_GLOBAL_INIT_DONE	_BITUL(0)
> > +#define TDX_GLOBAL_INIT_FAILED	_BITUL(1)
> > +
> > +static DEFINE_PER_CPU(unsigned int, tdx_lp_init_status);
> > +#define TDX_LP_INIT_DONE	_BITUL(0)
> > +#define TDX_LP_INIT_FAILED	_BITUL(1)
> 
> I'm curious, why do we have to track three states: uninitialized 
> (!done), initialized (done + ! failed), permanent error (done + failed).
> 
> [besides: why can't you use an enum and share that between global and pcpu?]
> 
> Why can't you have a pcpu "bool tdx_lp_initialized" and "bool 
> tdx_global_initialized"?
> 
> I mean, if there was an error during previous initialization, it's not 
> initialized: you'd try initializing again -- and possibly fail again -- 
> on the next attempt. I doubt that a "try to cache failed status to keep 
> failing fast" is really required.
> 
> Is there any other reason (e.g., second init attempt would set your 
> computer on fire) why it can't be simpler?

No other reasons but only the one that you mentioned above: I didn't want to
retry in case of permanent error.

Yes I agree we can have a pcpu "bool tdx_lp_initialized" and a "bool
tdx_global_initialized" to simplify the logic.

Thanks!
diff mbox series

Patch

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index b489b5b9de5d..03f74851608f 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -102,8 +102,12 @@  static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
 
 #ifdef CONFIG_INTEL_TDX_HOST
 bool platform_tdx_enabled(void);
+int tdx_cpu_enable(void);
+int tdx_enable(void);
 #else	/* !CONFIG_INTEL_TDX_HOST */
 static inline bool platform_tdx_enabled(void) { return false; }
+static inline int tdx_cpu_enable(void) { return -ENODEV; }
+static inline int tdx_enable(void)  { return -ENODEV; }
 #endif	/* CONFIG_INTEL_TDX_HOST */
 
 #endif /* !__ASSEMBLY__ */
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index e62e978eba1b..bcf2b2d15a2e 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -13,6 +13,10 @@ 
 #include <linux/errno.h>
 #include <linux/printk.h>
 #include <linux/smp.h>
+#include <linux/cpu.h>
+#include <linux/spinlock.h>
+#include <linux/percpu-defs.h>
+#include <linux/mutex.h>
 #include <asm/msr-index.h>
 #include <asm/msr.h>
 #include <asm/archrandom.h>
@@ -23,6 +27,18 @@  static u32 tdx_global_keyid __ro_after_init;
 static u32 tdx_guest_keyid_start __ro_after_init;
 static u32 tdx_nr_guest_keyids __ro_after_init;
 
+static unsigned int tdx_global_init_status;
+static DEFINE_RAW_SPINLOCK(tdx_global_init_lock);
+#define TDX_GLOBAL_INIT_DONE	_BITUL(0)
+#define TDX_GLOBAL_INIT_FAILED	_BITUL(1)
+
+static DEFINE_PER_CPU(unsigned int, tdx_lp_init_status);
+#define TDX_LP_INIT_DONE	_BITUL(0)
+#define TDX_LP_INIT_FAILED	_BITUL(1)
+
+static enum tdx_module_status_t tdx_module_status;
+static DEFINE_MUTEX(tdx_module_lock);
+
 /*
  * Wrapper of __seamcall() to convert SEAMCALL leaf function error code
  * to kernel error code.  @seamcall_ret and @out contain the SEAMCALL
@@ -89,6 +105,169 @@  static int __always_unused seamcall(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
 	return ret;
 }
 
+static int try_init_module_global(void)
+{
+	unsigned long flags;
+	int ret;
+
+	/*
+	 * The TDX module global initialization only needs to be done
+	 * once on any cpu.
+	 */
+	raw_spin_lock_irqsave(&tdx_global_init_lock, flags);
+
+	if (tdx_global_init_status & TDX_GLOBAL_INIT_DONE) {
+		ret = tdx_global_init_status & TDX_GLOBAL_INIT_FAILED ?
+			-EINVAL : 0;
+		goto out;
+	}
+
+	/* All '0's are just unused parameters. */
+	ret = seamcall(TDH_SYS_INIT, 0, 0, 0, 0, NULL, NULL);
+
+	tdx_global_init_status = TDX_GLOBAL_INIT_DONE;
+	if (ret)
+		tdx_global_init_status |= TDX_GLOBAL_INIT_FAILED;
+out:
+	raw_spin_unlock_irqrestore(&tdx_global_init_lock, flags);
+
+	return ret;
+}
+
+/**
+ * tdx_cpu_enable - Enable TDX on local cpu
+ *
+ * Do one-time TDX module per-cpu initialization SEAMCALL (and TDX module
+ * global initialization SEAMCALL if not done) on local cpu to make this
+ * cpu be ready to run any other SEAMCALLs.
+ *
+ * Note this function must be called when preemption is not possible
+ * (i.e. via SMP call or in per-cpu thread).  It is not IRQ safe either
+ * (i.e. cannot be called in per-cpu thread and via SMP call from remote
+ * cpu simultaneously).
+ *
+ * Return 0 on success, otherwise errors.
+ */
+int tdx_cpu_enable(void)
+{
+	unsigned int lp_status;
+	int ret;
+
+	if (!platform_tdx_enabled())
+		return -EINVAL;
+
+	lp_status = __this_cpu_read(tdx_lp_init_status);
+
+	/* Already done */
+	if (lp_status & TDX_LP_INIT_DONE)
+		return lp_status & TDX_LP_INIT_FAILED ? -EINVAL : 0;
+
+	/*
+	 * The TDX module global initialization is the very first step
+	 * to enable TDX.  Need to do it first (if hasn't been done)
+	 * before doing the per-cpu initialization.
+	 */
+	ret = try_init_module_global();
+
+	/*
+	 * If the module global initialization failed, there's no point
+	 * to do the per-cpu initialization.  Just mark it as done but
+	 * failed.
+	 */
+	if (ret)
+		goto update_status;
+
+	/* All '0's are just unused parameters */
+	ret = seamcall(TDH_SYS_LP_INIT, 0, 0, 0, 0, NULL, NULL);
+
+update_status:
+	lp_status = TDX_LP_INIT_DONE;
+	if (ret)
+		lp_status |= TDX_LP_INIT_FAILED;
+
+	this_cpu_write(tdx_lp_init_status, lp_status);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(tdx_cpu_enable);
+
+static int init_tdx_module(void)
+{
+	/*
+	 * TODO:
+	 *
+	 *  - Get TDX module information and TDX-capable memory regions.
+	 *  - Build the list of TDX-usable memory regions.
+	 *  - Construct a list of "TD Memory Regions" (TDMRs) to cover
+	 *    all TDX-usable memory regions.
+	 *  - Configure the TDMRs and the global KeyID to the TDX module.
+	 *  - Configure the global KeyID on all packages.
+	 *  - Initialize all TDMRs.
+	 *
+	 *  Return error before all steps are done.
+	 */
+	return -EINVAL;
+}
+
+static int __tdx_enable(void)
+{
+	int ret;
+
+	ret = init_tdx_module();
+	if (ret) {
+		pr_err("TDX module initialization failed (%d)\n", ret);
+		tdx_module_status = TDX_MODULE_ERROR;
+		return ret;
+	}
+
+	pr_info("TDX module initialized.\n");
+	tdx_module_status = TDX_MODULE_INITIALIZED;
+
+	return 0;
+}
+
+/**
+ * tdx_enable - Enable TDX module to make it ready to run TDX guests
+ *
+ * This function assumes the caller has: 1) held read lock of CPU hotplug
+ * lock to prevent any new cpu from becoming online; 2) done both VMXON
+ * and tdx_cpu_enable() on all online cpus.
+ *
+ * This function can be called in parallel by multiple callers.
+ *
+ * Return 0 if TDX is enabled successfully, otherwise error.
+ */
+int tdx_enable(void)
+{
+	int ret;
+
+	if (!platform_tdx_enabled())
+		return -ENODEV;
+
+	lockdep_assert_cpus_held();
+
+	mutex_lock(&tdx_module_lock);
+
+	switch (tdx_module_status) {
+	case TDX_MODULE_UNKNOWN:
+		ret = __tdx_enable();
+		break;
+	case TDX_MODULE_INITIALIZED:
+		/* Already initialized, great, tell the caller. */
+		ret = 0;
+		break;
+	default:
+		/* Failed to initialize in the previous attempts */
+		ret = -EINVAL;
+		break;
+	}
+
+	mutex_unlock(&tdx_module_lock);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(tdx_enable);
+
 static int __init record_keyid_partitioning(u32 *tdx_keyid_start,
 					    u32 *nr_tdx_keyids)
 {
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 55dbb1b8c971..9fb46033c852 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -16,11 +16,24 @@ 
  */
 #define TDX_RND_NO_ENTROPY	0x8000020300000000ULL
 
+/*
+ * TDX module SEAMCALL leaf functions
+ */
+#define TDH_SYS_INIT		33
+#define TDH_SYS_LP_INIT		35
+
 /*
  * Do not put any hardware-defined TDX structure representations below
  * this comment!
  */
 
+/* Kernel defined TDX module status during module initialization. */
+enum tdx_module_status_t {
+	TDX_MODULE_UNKNOWN,
+	TDX_MODULE_INITIALIZED,
+	TDX_MODULE_ERROR
+};
+
 struct tdx_module_output;
 u64 __seamcall(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
 	       struct tdx_module_output *out);