diff mbox series

[v3,04/21] x86/virt/tdx: Add skeleton for detecting and initializing TDX on demand

Message ID 32dcf4c7acc95244a391458d79cd6907125c5c29.1649219184.git.kai.huang@intel.com (mailing list archive)
State New, archived
Headers show
Series TDX host kernel support | expand

Commit Message

Kai Huang April 6, 2022, 4:49 a.m. UTC
The TDX module is essentially a CPU-attested software module running
in the new Secure Arbitration Mode (SEAM) to protect VMs from malicious
host and certain physical attacks.  The TDX module implements the
functions to build, tear down and start execution of the protected VMs
called Trusted Domains (TD).  Before the TDX module can be used to
create and run TD guests, it must be loaded into the SEAM Range Register
(SEAMRR) and properly initialized.  The TDX module is expected to be
loaded by BIOS before booting to the kernel, and the kernel is expected
to detect and initialize it, using the SEAMCALLs defined by TDX
architecture.

The TDX module can be initialized only once in its lifetime.  Instead
of always initializing it at boot time, this implementation chooses an
on-demand approach to initialize TDX until there is a real need (e.g
when requested by KVM).  This avoids consuming the memory that must be
allocated by kernel and given to the TDX module as metadata (~1/256th of
the TDX-usable memory), and also saves the time of initializing the TDX
module (and the metadata) when TDX is not used at all.  Initializing the
TDX module at runtime on-demand also is more flexible to support TDX
module runtime updating in the future (after updating the TDX module, it
needs to be initialized again).

Introduce two placeholders tdx_detect() and tdx_init() to detect and
initialize the TDX module on demand, with a state machine introduced to
orchestrate the entire process (in case of multiple callers).

To start with, tdx_detect() checks SEAMRR and TDX private KeyIDs.  The
TDX module is reported as not loaded if either SEAMRR is not enabled, or
there are no enough TDX private KeyIDs to create any TD guest.  The TDX
module itself requires one global TDX private KeyID to crypto protect
its metadata.

And tdx_init() is currently empty.  The TDX module will be initialized
in multi-steps defined by the TDX architecture:

  1) Global initialization;
  2) Logical-CPU scope initialization;
  3) Enumerate the TDX module capabilities and platform configuration;
  4) Configure the TDX module about usable memory ranges and global
     KeyID information;
  5) Package-scope configuration for the global KeyID;
  6) Initialize usable memory ranges based on 4).

The TDX module can also be shut down at any time during its lifetime.
In case of any error during the initialization process, shut down the
module.  It's pointless to leave the module in any intermediate state
during the initialization.

SEAMCALL requires SEAMRR being enabled and CPU being already in VMX
operation (VMXON has been done), otherwise it generates #UD.  So far
only KVM handles VMXON/VMXOFF.  Choose to not handle VMXON/VMXOFF in
tdx_detect() and tdx_init() but depend on the caller to guarantee that,
since so far KVM is the only user of TDX.  In the long term, more kernel
components are likely to use VMXON/VMXOFF to support TDX (i.e. TDX
module runtime update), so a reference-based approach to do VMXON/VMXOFF
is likely needed.

Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/include/asm/tdx.h  |   4 +
 arch/x86/virt/vmx/tdx/tdx.c | 222 ++++++++++++++++++++++++++++++++++++
 2 files changed, 226 insertions(+)

Comments

Kuppuswamy Sathyanarayanan April 19, 2022, 2:53 p.m. UTC | #1
On 4/5/22 9:49 PM, Kai Huang wrote:
> The TDX module is essentially a CPU-attested software module running
> in the new Secure Arbitration Mode (SEAM) to protect VMs from malicious
> host and certain physical attacks.  The TDX module implements the

/s/host/hosts

> functions to build, tear down and start execution of the protected VMs
> called Trusted Domains (TD).  Before the TDX module can be used to

/s/Trusted/Trust

> create and run TD guests, it must be loaded into the SEAM Range Register
> (SEAMRR) and properly initialized.  The TDX module is expected to be
> loaded by BIOS before booting to the kernel, and the kernel is expected
> to detect and initialize it, using the SEAMCALLs defined by TDX
> architecture.
> 
> The TDX module can be initialized only once in its lifetime.  Instead
> of always initializing it at boot time, this implementation chooses an
> on-demand approach to initialize TDX until there is a real need (e.g
> when requested by KVM).  This avoids consuming the memory that must be
> allocated by kernel and given to the TDX module as metadata (~1/256th of

allocated by the kernel

> the TDX-usable memory), and also saves the time of initializing the TDX
> module (and the metadata) when TDX is not used at all.  Initializing the
> TDX module at runtime on-demand also is more flexible to support TDX
> module runtime updating in the future (after updating the TDX module, it
> needs to be initialized again).
> 
> Introduce two placeholders tdx_detect() and tdx_init() to detect and
> initialize the TDX module on demand, with a state machine introduced to
> orchestrate the entire process (in case of multiple callers).
> 
> To start with, tdx_detect() checks SEAMRR and TDX private KeyIDs.  The
> TDX module is reported as not loaded if either SEAMRR is not enabled, or
> there are no enough TDX private KeyIDs to create any TD guest.  The TDX
> module itself requires one global TDX private KeyID to crypto protect
> its metadata.
> 
> And tdx_init() is currently empty.  The TDX module will be initialized
> in multi-steps defined by the TDX architecture:
> 
>    1) Global initialization;
>    2) Logical-CPU scope initialization;
>    3) Enumerate the TDX module capabilities and platform configuration;
>    4) Configure the TDX module about usable memory ranges and global
>       KeyID information;
>    5) Package-scope configuration for the global KeyID;
>    6) Initialize usable memory ranges based on 4).
> 
> The TDX module can also be shut down at any time during its lifetime.
> In case of any error during the initialization process, shut down the
> module.  It's pointless to leave the module in any intermediate state
> during the initialization.
> 
> SEAMCALL requires SEAMRR being enabled and CPU being already in VMX
> operation (VMXON has been done), otherwise it generates #UD.  So far
> only KVM handles VMXON/VMXOFF.  Choose to not handle VMXON/VMXOFF in
> tdx_detect() and tdx_init() but depend on the caller to guarantee that,
> since so far KVM is the only user of TDX.  In the long term, more kernel
> components are likely to use VMXON/VMXOFF to support TDX (i.e. TDX
> module runtime update), so a reference-based approach to do VMXON/VMXOFF
> is likely needed.
> 
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> ---
>   arch/x86/include/asm/tdx.h  |   4 +
>   arch/x86/virt/vmx/tdx/tdx.c | 222 ++++++++++++++++++++++++++++++++++++
>   2 files changed, 226 insertions(+)
> 
> diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
> index 1f29813b1646..c8af2ba6bb8a 100644
> --- a/arch/x86/include/asm/tdx.h
> +++ b/arch/x86/include/asm/tdx.h
> @@ -92,8 +92,12 @@ static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
>   
>   #ifdef CONFIG_INTEL_TDX_HOST
>   void tdx_detect_cpu(struct cpuinfo_x86 *c);
> +int tdx_detect(void);
> +int tdx_init(void);
>   #else
>   static inline void tdx_detect_cpu(struct cpuinfo_x86 *c) { }
> +static inline int tdx_detect(void) { return -ENODEV; }
> +static inline int tdx_init(void) { return -ENODEV; }
>   #endif /* CONFIG_INTEL_TDX_HOST */
>   
>   #endif /* !__ASSEMBLY__ */
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index ba2210001ea8..53093d4ad458 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -9,6 +9,8 @@
>   
>   #include <linux/types.h>
>   #include <linux/cpumask.h>
> +#include <linux/mutex.h>
> +#include <linux/cpu.h>
>   #include <asm/msr-index.h>
>   #include <asm/msr.h>
>   #include <asm/cpufeature.h>
> @@ -45,12 +47,33 @@
>   		((u32)(((_keyid_part) & 0xffffffffull) + 1))
>   #define TDX_KEYID_NUM(_keyid_part)	((u32)((_keyid_part) >> 32))
>   
> +/*
> + * TDX module status during initialization
> + */
> +enum tdx_module_status_t {
> +	/* TDX module status is unknown */
> +	TDX_MODULE_UNKNOWN,
> +	/* TDX module is not loaded */
> +	TDX_MODULE_NONE,
> +	/* TDX module is loaded, but not initialized */
> +	TDX_MODULE_LOADED,
> +	/* TDX module is fully initialized */
> +	TDX_MODULE_INITIALIZED,
> +	/* TDX module is shutdown due to error during initialization */
> +	TDX_MODULE_SHUTDOWN,
> +};
> +

May be adding these states when you really need will make
more sense. Currently this patch only uses SHUTDOWN and
NONE states. Other state usage is not very clear.

>   /* BIOS must configure SEAMRR registers for all cores consistently */
>   static u64 seamrr_base, seamrr_mask;
>   
>   static u32 tdx_keyid_start;
>   static u32 tdx_keyid_num;
>   
> +static enum tdx_module_status_t tdx_module_status;
> +
> +/* Prevent concurrent attempts on TDX detection and initialization */
> +static DEFINE_MUTEX(tdx_module_lock);

Any possible concurrent usage models?

> +
>   static bool __seamrr_enabled(void)
>   {
>   	return (seamrr_mask & SEAMRR_ENABLED_BITS) == SEAMRR_ENABLED_BITS;
> @@ -172,3 +195,202 @@ void tdx_detect_cpu(struct cpuinfo_x86 *c)
>   	detect_seam(c);
>   	detect_tdx_keyids(c);
>   }
> +
> +static bool seamrr_enabled(void)
> +{
> +	/*
> +	 * To detect any BIOS misconfiguration among cores, all logical
> +	 * cpus must have been brought up at least once.  This is true
> +	 * unless 'maxcpus' kernel command line is used to limit the
> +	 * number of cpus to be brought up during boot time.  However
> +	 * 'maxcpus' is basically an invalid operation mode due to the
> +	 * MCE broadcast problem, and it should not be used on a TDX
> +	 * capable machine.  Just do paranoid check here and do not

a paranoid check

> +	 * report SEAMRR as enabled in this case.
> +	 */
> +	if (!cpumask_equal(&cpus_booted_once_mask,
> +					cpu_present_mask))
> +		return false;
> +
> +	return __seamrr_enabled();
> +}
> +
> +static bool tdx_keyid_sufficient(void)
> +{
> +	if (!cpumask_equal(&cpus_booted_once_mask,
> +					cpu_present_mask))
> +		return false;
> +
> +	/*
> +	 * TDX requires at least two KeyIDs: one global KeyID to
> +	 * protect the metadata of the TDX module and one or more
> +	 * KeyIDs to run TD guests.
> +	 */
> +	return tdx_keyid_num >= 2;
> +}
> +
> +static int __tdx_detect(void)
> +{
> +	/* The TDX module is not loaded if SEAMRR is disabled */
> +	if (!seamrr_enabled()) {
> +		pr_info("SEAMRR not enabled.\n");
> +		goto no_tdx_module;
> +	}
> +
> +	/*
> +	 * Also do not report the TDX module as loaded if there's
> +	 * no enough TDX private KeyIDs to run any TD guests.
> +	 */

You are not returning TDX_MODULE_LOADED under any current
scenarios. So think above comment is not accurate.

> +	if (!tdx_keyid_sufficient()) {
> +		pr_info("Number of TDX private KeyIDs too small: %u.\n",
> +				tdx_keyid_num);
> +		goto no_tdx_module;
> +	}
> +
> +	/* Return -ENODEV until the TDX module is detected */
> +no_tdx_module:
> +	tdx_module_status = TDX_MODULE_NONE;
> +	return -ENODEV;
> +}
> +
> +static int init_tdx_module(void)
> +{
> +	/*
> +	 * Return -EFAULT until all steps of TDX module
> +	 * initialization are done.
> +	 */
> +	return -EFAULT;
> +}
> +
> +static void shutdown_tdx_module(void)
> +{
> +	/* TODO: Shut down the TDX module */
> +	tdx_module_status = TDX_MODULE_SHUTDOWN;
> +}
> +
> +static int __tdx_init(void)
> +{
> +	int ret;
> +
> +	/*
> +	 * Logical-cpu scope initialization requires calling one SEAMCALL
> +	 * on all logical cpus enabled by BIOS.  Shutting down the TDX
> +	 * module also has such requirement.  Further more, configuring

such a requirement

> +	 * the key of the global KeyID requires calling one SEAMCALL for
> +	 * each package.  For simplicity, disable CPU hotplug in the whole
> +	 * initialization process.
> +	 *
> +	 * It's perhaps better to check whether all BIOS-enabled cpus are
> +	 * online before starting initializing, and return early if not.
> +	 * But none of 'possible', 'present' and 'online' CPU masks
> +	 * represents BIOS-enabled cpus.  For example, 'possible' mask is
> +	 * impacted by 'nr_cpus' or 'possible_cpus' kernel command line.
> +	 * Just let the SEAMCALL to fail if not all BIOS-enabled cpus are
> +	 * online.
> +	 */
> +	cpus_read_lock();
> +
> +	ret = init_tdx_module();
> +
> +	/*
> +	 * Shut down the TDX module in case of any error during the
> +	 * initialization process.  It's meaningless to leave the TDX
> +	 * module in any middle state of the initialization process.
> +	 */
> +	if (ret)
> +		shutdown_tdx_module();
> +
> +	cpus_read_unlock();
> +
> +	return ret;
> +}
> +
> +/**
> + * tdx_detect - Detect whether the TDX module has been loaded
> + *
> + * Detect whether the TDX module has been loaded and ready for
> + * initialization.  Only call this function when all cpus are
> + * already in VMX operation.
> + *
> + * This function can be called in parallel by multiple callers.
> + *
> + * Return:
> + *
> + * * -0:	The TDX module has been loaded and ready for
> + *		initialization.
> + * * -ENODEV:	The TDX module is not loaded.
> + * * -EPERM:	CPU is not in VMX operation.
> + * * -EFAULT:	Other internal fatal errors.
> + */
> +int tdx_detect(void)

Will this function be used separately or always along with
tdx_init()?

> +{
> +	int ret;
> +
> +	mutex_lock(&tdx_module_lock);
> +
> +	switch (tdx_module_status) {
> +	case TDX_MODULE_UNKNOWN:
> +		ret = __tdx_detect();
> +		break;
> +	case TDX_MODULE_NONE:
> +		ret = -ENODEV;
> +		break;
> +	case TDX_MODULE_LOADED:
> +	case TDX_MODULE_INITIALIZED:
> +		ret = 0;
> +		break;
> +	case TDX_MODULE_SHUTDOWN:
> +		ret = -EFAULT;
> +		break;
> +	default:
> +		WARN_ON(1);
> +		ret = -EFAULT;
> +	}
> +
> +	mutex_unlock(&tdx_module_lock);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(tdx_detect);
> +
> +/**
> + * tdx_init - Initialize the TDX module

If it for tdx module initialization, why not call it
tdx_module_init()? If not, update the description
appropriately.

> + *
> + * Initialize the TDX module to make it ready to run TD guests.  This
> + * function should be called after tdx_detect() returns successful.
> + * Only call this function when all cpus are online and are in VMX
> + * operation.  CPU hotplug is temporarily disabled internally.
> + *
> + * This function can be called in parallel by multiple callers.
> + *
> + * Return:
> + *
> + * * -0:	The TDX module has been successfully initialized.
> + * * -ENODEV:	The TDX module is not loaded.
> + * * -EPERM:	The CPU which does SEAMCALL is not in VMX operation.
> + * * -EFAULT:	Other internal fatal errors.
> + */

You return differnt error values just for debug prints or there are
other uses for it?

> +int tdx_init(void)
> +{
> +	int ret;
> +
> +	mutex_lock(&tdx_module_lock);
> +
> +	switch (tdx_module_status) {
> +	case TDX_MODULE_NONE:
> +		ret = -ENODEV;
> +		break;
> +	case TDX_MODULE_LOADED:

> +		ret = __tdx_init();
> +		break;
> +	case TDX_MODULE_INITIALIZED:
> +		ret = 0;
> +		break;
> +	default:
> +		ret = -EFAULT;
> +		break;
> +	}
> +	mutex_unlock(&tdx_module_lock);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(tdx_init);
Kai Huang April 20, 2022, 4:37 a.m. UTC | #2
On Tue, 2022-04-19 at 07:53 -0700, Sathyanarayanan Kuppuswamy wrote:
> 
> On 4/5/22 9:49 PM, Kai Huang wrote:
> > The TDX module is essentially a CPU-attested software module running
> > in the new Secure Arbitration Mode (SEAM) to protect VMs from malicious
> > host and certain physical attacks.  The TDX module implements the
> 
> /s/host/hosts

I don't quite get.  Could you explain why there are multiple hosts?

> 
> > functions to build, tear down and start execution of the protected VMs
> > called Trusted Domains (TD).  Before the TDX module can be used to
> 
> /s/Trusted/Trust

Thanks.

> 
> > create and run TD guests, it must be loaded into the SEAM Range Register
> > (SEAMRR) and properly initialized.  The TDX module is expected to be
> > loaded by BIOS before booting to the kernel, and the kernel is expected
> > to detect and initialize it, using the SEAMCALLs defined by TDX
> > architecture.
> > 
> > The TDX module can be initialized only once in its lifetime.  Instead
> > of always initializing it at boot time, this implementation chooses an
> > on-demand approach to initialize TDX until there is a real need (e.g
> > when requested by KVM).  This avoids consuming the memory that must be
> > allocated by kernel and given to the TDX module as metadata (~1/256th of
> 
> allocated by the kernel

Ok.

> 
> > the TDX-usable memory), and also saves the time of initializing the TDX
> > module (and the metadata) when TDX is not used at all.  Initializing the
> > TDX module at runtime on-demand also is more flexible to support TDX
> > module runtime updating in the future (after updating the TDX module, it
> > needs to be initialized again).
> > 
> > Introduce two placeholders tdx_detect() and tdx_init() to detect and
> > initialize the TDX module on demand, with a state machine introduced to
> > orchestrate the entire process (in case of multiple callers).
> > 
> > To start with, tdx_detect() checks SEAMRR and TDX private KeyIDs.  The
> > TDX module is reported as not loaded if either SEAMRR is not enabled, or
> > there are no enough TDX private KeyIDs to create any TD guest.  The TDX
> > module itself requires one global TDX private KeyID to crypto protect
> > its metadata.
> > 
> > And tdx_init() is currently empty.  The TDX module will be initialized
> > in multi-steps defined by the TDX architecture:
> > 
> >    1) Global initialization;
> >    2) Logical-CPU scope initialization;
> >    3) Enumerate the TDX module capabilities and platform configuration;
> >    4) Configure the TDX module about usable memory ranges and global
> >       KeyID information;
> >    5) Package-scope configuration for the global KeyID;
> >    6) Initialize usable memory ranges based on 4).
> > 
> > The TDX module can also be shut down at any time during its lifetime.
> > In case of any error during the initialization process, shut down the
> > module.  It's pointless to leave the module in any intermediate state
> > during the initialization.
> > 
> > SEAMCALL requires SEAMRR being enabled and CPU being already in VMX
> > operation (VMXON has been done), otherwise it generates #UD.  So far
> > only KVM handles VMXON/VMXOFF.  Choose to not handle VMXON/VMXOFF in
> > tdx_detect() and tdx_init() but depend on the caller to guarantee that,
> > since so far KVM is the only user of TDX.  In the long term, more kernel
> > components are likely to use VMXON/VMXOFF to support TDX (i.e. TDX
> > module runtime update), so a reference-based approach to do VMXON/VMXOFF
> > is likely needed.
> > 
> > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > ---
> >   arch/x86/include/asm/tdx.h  |   4 +
> >   arch/x86/virt/vmx/tdx/tdx.c | 222 ++++++++++++++++++++++++++++++++++++
> >   2 files changed, 226 insertions(+)
> > 
> > diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
> > index 1f29813b1646..c8af2ba6bb8a 100644
> > --- a/arch/x86/include/asm/tdx.h
> > +++ b/arch/x86/include/asm/tdx.h
> > @@ -92,8 +92,12 @@ static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
> >   
> >   #ifdef CONFIG_INTEL_TDX_HOST
> >   void tdx_detect_cpu(struct cpuinfo_x86 *c);
> > +int tdx_detect(void);
> > +int tdx_init(void);
> >   #else
> >   static inline void tdx_detect_cpu(struct cpuinfo_x86 *c) { }
> > +static inline int tdx_detect(void) { return -ENODEV; }
> > +static inline int tdx_init(void) { return -ENODEV; }
> >   #endif /* CONFIG_INTEL_TDX_HOST */
> >   
> >   #endif /* !__ASSEMBLY__ */
> > diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> > index ba2210001ea8..53093d4ad458 100644
> > --- a/arch/x86/virt/vmx/tdx/tdx.c
> > +++ b/arch/x86/virt/vmx/tdx/tdx.c
> > @@ -9,6 +9,8 @@
> >   
> >   #include <linux/types.h>
> >   #include <linux/cpumask.h>
> > +#include <linux/mutex.h>
> > +#include <linux/cpu.h>
> >   #include <asm/msr-index.h>
> >   #include <asm/msr.h>
> >   #include <asm/cpufeature.h>
> > @@ -45,12 +47,33 @@
> >   		((u32)(((_keyid_part) & 0xffffffffull) + 1))
> >   #define TDX_KEYID_NUM(_keyid_part)	((u32)((_keyid_part) >> 32))
> >   
> > +/*
> > + * TDX module status during initialization
> > + */
> > +enum tdx_module_status_t {
> > +	/* TDX module status is unknown */
> > +	TDX_MODULE_UNKNOWN,
> > +	/* TDX module is not loaded */
> > +	TDX_MODULE_NONE,
> > +	/* TDX module is loaded, but not initialized */
> > +	TDX_MODULE_LOADED,
> > +	/* TDX module is fully initialized */
> > +	TDX_MODULE_INITIALIZED,
> > +	/* TDX module is shutdown due to error during initialization */
> > +	TDX_MODULE_SHUTDOWN,
> > +};
> > +
> 
> May be adding these states when you really need will make
> more sense. Currently this patch only uses SHUTDOWN and
> NONE states. Other state usage is not very clear.

They are all used in tdx_detect() and tdx_init(), no?

> 
> >   /* BIOS must configure SEAMRR registers for all cores consistently */
> >   static u64 seamrr_base, seamrr_mask;
> >   
> >   static u32 tdx_keyid_start;
> >   static u32 tdx_keyid_num;
> >   
> > +static enum tdx_module_status_t tdx_module_status;
> > +
> > +/* Prevent concurrent attempts on TDX detection and initialization */
> > +static DEFINE_MUTEX(tdx_module_lock);
> 
> Any possible concurrent usage models?

tdx_detect() and tdx_init() are called on demand by callers, so it's possible
multiple callers can call into them concurrently.

> 
> > +
> >   static bool __seamrr_enabled(void)
> >   {
> >   	return (seamrr_mask & SEAMRR_ENABLED_BITS) == SEAMRR_ENABLED_BITS;
> > @@ -172,3 +195,202 @@ void tdx_detect_cpu(struct cpuinfo_x86 *c)
> >   	detect_seam(c);
> >   	detect_tdx_keyids(c);
> >   }
> > +
> > +static bool seamrr_enabled(void)
> > +{
> > +	/*
> > +	 * To detect any BIOS misconfiguration among cores, all logical
> > +	 * cpus must have been brought up at least once.  This is true
> > +	 * unless 'maxcpus' kernel command line is used to limit the
> > +	 * number of cpus to be brought up during boot time.  However
> > +	 * 'maxcpus' is basically an invalid operation mode due to the
> > +	 * MCE broadcast problem, and it should not be used on a TDX
> > +	 * capable machine.  Just do paranoid check here and do not
> 
> a paranoid check

Ok.

> 
> > +	 * report SEAMRR as enabled in this case.
> > +	 */
> > +	if (!cpumask_equal(&cpus_booted_once_mask,
> > +					cpu_present_mask))
> > +		return false;
> > +
> > +	return __seamrr_enabled();
> > +}
> > +
> > +static bool tdx_keyid_sufficient(void)
> > +{
> > +	if (!cpumask_equal(&cpus_booted_once_mask,
> > +					cpu_present_mask))
> > +		return false;
> > +
> > +	/*
> > +	 * TDX requires at least two KeyIDs: one global KeyID to
> > +	 * protect the metadata of the TDX module and one or more
> > +	 * KeyIDs to run TD guests.
> > +	 */
> > +	return tdx_keyid_num >= 2;
> > +}
> > +
> > +static int __tdx_detect(void)
> > +{
> > +	/* The TDX module is not loaded if SEAMRR is disabled */
> > +	if (!seamrr_enabled()) {
> > +		pr_info("SEAMRR not enabled.\n");
> > +		goto no_tdx_module;
> > +	}
> > +
> > +	/*
> > +	 * Also do not report the TDX module as loaded if there's
> > +	 * no enough TDX private KeyIDs to run any TD guests.
> > +	 */
> 
> You are not returning TDX_MODULE_LOADED under any current
> scenarios. So think above comment is not accurate.

This comment is to explain the logic behind of below TDX KeyID check.  I don't
see how is it related to your comments?

This patch is pretty much a placeholder to express the idea of how are
tdx_detect() and tdx_init() going to be implemented.  In below after the
tdx_keyid_sufficient() check, I also have a comment to explain the module hasn't
been detected yet which means there will be code to detect the module here, and
at that time, logically this function will return TDX_MODULE_LOADED.  I don't
see this is hard to understand?

> 
> > +	if (!tdx_keyid_sufficient()) {
> > +		pr_info("Number of TDX private KeyIDs too small: %u.\n",
> > +				tdx_keyid_num);
> > +		goto no_tdx_module;
> > +	}
> > +
> > +	/* Return -ENODEV until the TDX module is detected */
> > +no_tdx_module:
> > +	tdx_module_status = TDX_MODULE_NONE;
> > +	return -ENODEV;
> > +}
> > +
> > +static int init_tdx_module(void)
> > +{
> > +	/*
> > +	 * Return -EFAULT until all steps of TDX module
> > +	 * initialization are done.
> > +	 */
> > +	return -EFAULT;
> > +}
> > +
> > +static void shutdown_tdx_module(void)
> > +{
> > +	/* TODO: Shut down the TDX module */
> > +	tdx_module_status = TDX_MODULE_SHUTDOWN;
> > +}
> > +
> > +static int __tdx_init(void)
> > +{
> > +	int ret;
> > +
> > +	/*
> > +	 * Logical-cpu scope initialization requires calling one SEAMCALL
> > +	 * on all logical cpus enabled by BIOS.  Shutting down the TDX
> > +	 * module also has such requirement.  Further more, configuring
> 
> such a requirement

Thanks.

> 
> > +	 * the key of the global KeyID requires calling one SEAMCALL for
> > +	 * each package.  For simplicity, disable CPU hotplug in the whole
> > +	 * initialization process.
> > +	 *
> > +	 * It's perhaps better to check whether all BIOS-enabled cpus are
> > +	 * online before starting initializing, and return early if not.
> > +	 * But none of 'possible', 'present' and 'online' CPU masks
> > +	 * represents BIOS-enabled cpus.  For example, 'possible' mask is
> > +	 * impacted by 'nr_cpus' or 'possible_cpus' kernel command line.
> > +	 * Just let the SEAMCALL to fail if not all BIOS-enabled cpus are
> > +	 * online.
> > +	 */
> > +	cpus_read_lock();
> > +
> > +	ret = init_tdx_module();
> > +
> > +	/*
> > +	 * Shut down the TDX module in case of any error during the
> > +	 * initialization process.  It's meaningless to leave the TDX
> > +	 * module in any middle state of the initialization process.
> > +	 */
> > +	if (ret)
> > +		shutdown_tdx_module();
> > +
> > +	cpus_read_unlock();
> > +
> > +	return ret;
> > +}
> > +
> > +/**
> > + * tdx_detect - Detect whether the TDX module has been loaded
> > + *
> > + * Detect whether the TDX module has been loaded and ready for
> > + * initialization.  Only call this function when all cpus are
> > + * already in VMX operation.
> > + *
> > + * This function can be called in parallel by multiple callers.
> > + *
> > + * Return:
> > + *
> > + * * -0:	The TDX module has been loaded and ready for
> > + *		initialization.
> > + * * -ENODEV:	The TDX module is not loaded.
> > + * * -EPERM:	CPU is not in VMX operation.
> > + * * -EFAULT:	Other internal fatal errors.
> > + */
> > +int tdx_detect(void)
> 
> Will this function be used separately or always along with
> tdx_init()?

The caller should first use tdx_detect() and then use tdx_init().  If caller
only uses tdx_detect(), then TDX module won't be initialized (unless other
caller does this).  If caller calls tdx_init() before tdx_detect(),  it will get
error.

> 
> > +{
> > +	int ret;
> > +
> > +	mutex_lock(&tdx_module_lock);
> > +
> > +	switch (tdx_module_status) {
> > +	case TDX_MODULE_UNKNOWN:
> > +		ret = __tdx_detect();
> > +		break;
> > +	case TDX_MODULE_NONE:
> > +		ret = -ENODEV;
> > +		break;
> > +	case TDX_MODULE_LOADED:
> > +	case TDX_MODULE_INITIALIZED:
> > +		ret = 0;
> > +		break;
> > +	case TDX_MODULE_SHUTDOWN:
> > +		ret = -EFAULT;
> > +		break;
> > +	default:
> > +		WARN_ON(1);
> > +		ret = -EFAULT;
> > +	}
> > +
> > +	mutex_unlock(&tdx_module_lock);
> > +	return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(tdx_detect);
> > +
> > +/**
> > + * tdx_init - Initialize the TDX module
> 
> If it for tdx module initialization, why not call it
> tdx_module_init()? If not, update the description
> appropriately.

Besides do the actual module initialization, it also has a state machine.

But point taken, and I'll try to refine the description.  Thanks.

> 
> > + *
> > + * Initialize the TDX module to make it ready to run TD guests.  This
> > + * function should be called after tdx_detect() returns successful.
> > + * Only call this function when all cpus are online and are in VMX
> > + * operation.  CPU hotplug is temporarily disabled internally.
> > + *
> > + * This function can be called in parallel by multiple callers.
> > + *
> > + * Return:
> > + *
> > + * * -0:	The TDX module has been successfully initialized.
> > + * * -ENODEV:	The TDX module is not loaded.
> > + * * -EPERM:	The CPU which does SEAMCALL is not in VMX operation.
> > + * * -EFAULT:	Other internal fatal errors.
> > + */
> 
> You return differnt error values just for debug prints or there are
> other uses for it?

Caller can distinguish them and act differently.  Even w/o any purpose, I think
it's better to return different error codes to reflect different error reasons.
Dave Hansen April 20, 2022, 5:21 a.m. UTC | #3
On 4/19/22 21:37, Kai Huang wrote:
> On Tue, 2022-04-19 at 07:53 -0700, Sathyanarayanan Kuppuswamy wrote:
>> On 4/5/22 9:49 PM, Kai Huang wrote:
>>> The TDX module is essentially a CPU-attested software module running
>>> in the new Secure Arbitration Mode (SEAM) to protect VMs from malicious
>>> host and certain physical attacks.  The TDX module implements the
>> /s/host/hosts
> I don't quite get.  Could you explain why there are multiple hosts?

This one is an arbitrary language tweak.  This:

	to protect VMs from malicious host and certain physical attacks.

could also be written:

	to protect VMs from malicious host attacks and certain physical
	attacks.

But, it's somewhat more compact to do what was writen.  I agree that the
language is a bit clumsy and could be cleaned up, but just doing
s/host/hosts/ doesn't really improve anything.
Kuppuswamy Sathyanarayanan April 20, 2022, 2:30 p.m. UTC | #4
On 4/19/22 9:37 PM, Kai Huang wrote:
> On Tue, 2022-04-19 at 07:53 -0700, Sathyanarayanan Kuppuswamy wrote:
>>
>> On 4/5/22 9:49 PM, Kai Huang wrote:
>>> The TDX module is essentially a CPU-attested software module running
>>> in the new Secure Arbitration Mode (SEAM) to protect VMs from malicious
>>> host and certain physical attacks.  The TDX module implements the
>>
>> /s/host/hosts
> 
> I don't quite get.  Could you explain why there are multiple hosts?

Sorry, I misread it. It is correct, so ignore it.

> 
>>

>>> +
>>> +/**
>>> + * tdx_detect - Detect whether the TDX module has been loaded
>>> + *
>>> + * Detect whether the TDX module has been loaded and ready for
>>> + * initialization.  Only call this function when all cpus are
>>> + * already in VMX operation.
>>> + *
>>> + * This function can be called in parallel by multiple callers.
>>> + *
>>> + * Return:
>>> + *
>>> + * * -0:	The TDX module has been loaded and ready for
>>> + *		initialization.
>>> + * * -ENODEV:	The TDX module is not loaded.
>>> + * * -EPERM:	CPU is not in VMX operation.
>>> + * * -EFAULT:	Other internal fatal errors.
>>> + */
>>> +int tdx_detect(void)
>>
>> Will this function be used separately or always along with
>> tdx_init()?
> 
> The caller should first use tdx_detect() and then use tdx_init().  If caller
> only uses tdx_detect(), then TDX module won't be initialized (unless other
> caller does this).  If caller calls tdx_init() before tdx_detect(),  it will get
> error.
> 

I just checked your patch set to understand where you are using
tdx_detect()/tdx_init(). But I did not find any callers. Did I miss it? 
or it is not used in your patch set?
Kai Huang April 20, 2022, 10:35 p.m. UTC | #5
> 
> > > > +
> > > > +/**
> > > > + * tdx_detect - Detect whether the TDX module has been loaded
> > > > + *
> > > > + * Detect whether the TDX module has been loaded and ready for
> > > > + * initialization.  Only call this function when all cpus are
> > > > + * already in VMX operation.
> > > > + *
> > > > + * This function can be called in parallel by multiple callers.
> > > > + *
> > > > + * Return:
> > > > + *
> > > > + * * -0:	The TDX module has been loaded and ready for
> > > > + *		initialization.
> > > > + * * -ENODEV:	The TDX module is not loaded.
> > > > + * * -EPERM:	CPU is not in VMX operation.
> > > > + * * -EFAULT:	Other internal fatal errors.
> > > > + */
> > > > +int tdx_detect(void)
> > > 
> > > Will this function be used separately or always along with
> > > tdx_init()?
> > 
> > The caller should first use tdx_detect() and then use tdx_init().  If caller
> > only uses tdx_detect(), then TDX module won't be initialized (unless other
> > caller does this).  If caller calls tdx_init() before tdx_detect(),  it will get
> > error.
> > 
> 
> I just checked your patch set to understand where you are using
> tdx_detect()/tdx_init(). But I did not find any callers. Did I miss it? 
> or it is not used in your patch set?
> 

No you didn't.  They are not called in this series.  KVM series which is under
upstream process by Isaku will call them.  Dave once said w/o caller is fine as
for this particular case people know KVM is going to use them.  In cover letter
I also mentioned KVM support is under development by another series.  Next
version in cover letter, I'll explicitly call out this series doesn't have
caller of them but depends on KVM to call them.
Dave Hansen April 26, 2022, 8:53 p.m. UTC | #6
On 4/5/22 21:49, Kai Huang wrote:
> The TDX module is essentially a CPU-attested software module running
> in the new Secure Arbitration Mode (SEAM) to protect VMs from malicious
> host and certain physical attacks.  The TDX module implements the
> functions to build, tear down and start execution of the protected VMs
> called Trusted Domains (TD).  Before the TDX module can be used to
> create and run TD guests, it must be loaded into the SEAM Range Register
> (SEAMRR) and properly initialized.

The module isn't loaded into a register, right?

It's loaded into a memory area pointed to *by* the register.

>  The TDX module is expected to be
> loaded by BIOS before booting to the kernel, and the kernel is expected
> to detect and initialize it, using the SEAMCALLs defined by TDX
> architecture.

Wait a sec...  So, what was all this gobleygook about TDX module loading
and SEAMRR's if the kernel just has the TDX module *handed* to it
already loaded?

It looks to me like you wrote all of this before the TDX module was
being loaded by the BIOS and neglected to go and update these changelogs.

> The TDX module can be initialized only once in its lifetime.  Instead
> of always initializing it at boot time, this implementation chooses an
> on-demand approach to initialize TDX until there is a real need (e.g
> when requested by KVM).  This avoids consuming the memory that must be
> allocated by kernel and given to the TDX module as metadata (~1/256th of
> the TDX-usable memory), and also saves the time of initializing the TDX
> module (and the metadata) when TDX is not used at all.  Initializing the
> TDX module at runtime on-demand also is more flexible to support TDX
> module runtime updating in the future (after updating the TDX module, it
> needs to be initialized again).
> 
> Introduce two placeholders tdx_detect() and tdx_init() to detect and
> initialize the TDX module on demand, with a state machine introduced to
> orchestrate the entire process (in case of multiple callers).
> 
> To start with, tdx_detect() checks SEAMRR and TDX private KeyIDs.  The
> TDX module is reported as not loaded if either SEAMRR is not enabled, or
> there are no enough TDX private KeyIDs to create any TD guest.  The TDX
> module itself requires one global TDX private KeyID to crypto protect
> its metadata.

This is stepping over the line into telling me what the code does
instead of why.

> And tdx_init() is currently empty.  The TDX module will be initialized
> in multi-steps defined by the TDX architecture:
> 
>   1) Global initialization;
>   2) Logical-CPU scope initialization;
>   3) Enumerate the TDX module capabilities and platform configuration;
>   4) Configure the TDX module about usable memory ranges and global
>      KeyID information;
>   5) Package-scope configuration for the global KeyID;
>   6) Initialize usable memory ranges based on 4).
> 
> The TDX module can also be shut down at any time during its lifetime.
> In case of any error during the initialization process, shut down the
> module.  It's pointless to leave the module in any intermediate state
> during the initialization.
> 
> SEAMCALL requires SEAMRR being enabled and CPU being already in VMX
> operation (VMXON has been done), otherwise it generates #UD.  So far
> only KVM handles VMXON/VMXOFF.  Choose to not handle VMXON/VMXOFF in
> tdx_detect() and tdx_init() but depend on the caller to guarantee that,
> since so far KVM is the only user of TDX.  In the long term, more kernel
> components are likely to use VMXON/VMXOFF to support TDX (i.e. TDX
> module runtime update), so a reference-based approach to do VMXON/VMXOFF
> is likely needed.
> 
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> ---
>  arch/x86/include/asm/tdx.h  |   4 +
>  arch/x86/virt/vmx/tdx/tdx.c | 222 ++++++++++++++++++++++++++++++++++++
>  2 files changed, 226 insertions(+)
> 
> diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
> index 1f29813b1646..c8af2ba6bb8a 100644
> --- a/arch/x86/include/asm/tdx.h
> +++ b/arch/x86/include/asm/tdx.h
> @@ -92,8 +92,12 @@ static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
>  
>  #ifdef CONFIG_INTEL_TDX_HOST
>  void tdx_detect_cpu(struct cpuinfo_x86 *c);
> +int tdx_detect(void);
> +int tdx_init(void);
>  #else
>  static inline void tdx_detect_cpu(struct cpuinfo_x86 *c) { }
> +static inline int tdx_detect(void) { return -ENODEV; }
> +static inline int tdx_init(void) { return -ENODEV; }
>  #endif /* CONFIG_INTEL_TDX_HOST */
>  
>  #endif /* !__ASSEMBLY__ */
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index ba2210001ea8..53093d4ad458 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -9,6 +9,8 @@
>  
>  #include <linux/types.h>
>  #include <linux/cpumask.h>
> +#include <linux/mutex.h>
> +#include <linux/cpu.h>
>  #include <asm/msr-index.h>
>  #include <asm/msr.h>
>  #include <asm/cpufeature.h>
> @@ -45,12 +47,33 @@
>  		((u32)(((_keyid_part) & 0xffffffffull) + 1))
>  #define TDX_KEYID_NUM(_keyid_part)	((u32)((_keyid_part) >> 32))
>  
> +/*
> + * TDX module status during initialization
> + */
> +enum tdx_module_status_t {
> +	/* TDX module status is unknown */
> +	TDX_MODULE_UNKNOWN,
> +	/* TDX module is not loaded */
> +	TDX_MODULE_NONE,
> +	/* TDX module is loaded, but not initialized */
> +	TDX_MODULE_LOADED,
> +	/* TDX module is fully initialized */
> +	TDX_MODULE_INITIALIZED,
> +	/* TDX module is shutdown due to error during initialization */
> +	TDX_MODULE_SHUTDOWN,
> +};
> +
>  /* BIOS must configure SEAMRR registers for all cores consistently */
>  static u64 seamrr_base, seamrr_mask;
>  
>  static u32 tdx_keyid_start;
>  static u32 tdx_keyid_num;
>  
> +static enum tdx_module_status_t tdx_module_status;
> +
> +/* Prevent concurrent attempts on TDX detection and initialization */
> +static DEFINE_MUTEX(tdx_module_lock);
> +
>  static bool __seamrr_enabled(void)
>  {
>  	return (seamrr_mask & SEAMRR_ENABLED_BITS) == SEAMRR_ENABLED_BITS;
> @@ -172,3 +195,202 @@ void tdx_detect_cpu(struct cpuinfo_x86 *c)
>  	detect_seam(c);
>  	detect_tdx_keyids(c);
>  }
> +
> +static bool seamrr_enabled(void)
> +{
> +	/*
> +	 * To detect any BIOS misconfiguration among cores, all logical
> +	 * cpus must have been brought up at least once.  This is true
> +	 * unless 'maxcpus' kernel command line is used to limit the
> +	 * number of cpus to be brought up during boot time.  However
> +	 * 'maxcpus' is basically an invalid operation mode due to the
> +	 * MCE broadcast problem, and it should not be used on a TDX
> +	 * capable machine.  Just do paranoid check here and do not
> +	 * report SEAMRR as enabled in this case.
> +	 */
> +	if (!cpumask_equal(&cpus_booted_once_mask,
> +					cpu_present_mask))
> +		return false;
> +
> +	return __seamrr_enabled();
> +}
> +
> +static bool tdx_keyid_sufficient(void)
> +{
> +	if (!cpumask_equal(&cpus_booted_once_mask,
> +					cpu_present_mask))
> +		return false;

I'd move this cpumask_equal() to a helper.

> +	/*
> +	 * TDX requires at least two KeyIDs: one global KeyID to
> +	 * protect the metadata of the TDX module and one or more
> +	 * KeyIDs to run TD guests.
> +	 */
> +	return tdx_keyid_num >= 2;
> +}
> +
> +static int __tdx_detect(void)
> +{
> +	/* The TDX module is not loaded if SEAMRR is disabled */
> +	if (!seamrr_enabled()) {
> +		pr_info("SEAMRR not enabled.\n");
> +		goto no_tdx_module;
> +	}

Why even bother with the SEAMRR stuff?  It sounded like you can "ping"
the module with SEAMCALL.  Why not just use that directly?

> +	/*
> +	 * Also do not report the TDX module as loaded if there's
> +	 * no enough TDX private KeyIDs to run any TD guests.
> +	 */
> +	if (!tdx_keyid_sufficient()) {
> +		pr_info("Number of TDX private KeyIDs too small: %u.\n",
> +				tdx_keyid_num);
> +		goto no_tdx_module;
> +	}
> +
> +	/* Return -ENODEV until the TDX module is detected */
> +no_tdx_module:
> +	tdx_module_status = TDX_MODULE_NONE;
> +	return -ENODEV;
> +}
> +
> +static int init_tdx_module(void)
> +{
> +	/*
> +	 * Return -EFAULT until all steps of TDX module
> +	 * initialization are done.
> +	 */
> +	return -EFAULT;
> +}
> +
> +static void shutdown_tdx_module(void)
> +{
> +	/* TODO: Shut down the TDX module */
> +	tdx_module_status = TDX_MODULE_SHUTDOWN;
> +}
> +
> +static int __tdx_init(void)
> +{
> +	int ret;
> +
> +	/*
> +	 * Logical-cpu scope initialization requires calling one SEAMCALL
> +	 * on all logical cpus enabled by BIOS.  Shutting down the TDX
> +	 * module also has such requirement.  Further more, configuring
> +	 * the key of the global KeyID requires calling one SEAMCALL for
> +	 * each package.  For simplicity, disable CPU hotplug in the whole
> +	 * initialization process.
> +	 *
> +	 * It's perhaps better to check whether all BIOS-enabled cpus are
> +	 * online before starting initializing, and return early if not.

But you did some of this cpumask checking above.  Right?

> +	 * But none of 'possible', 'present' and 'online' CPU masks
> +	 * represents BIOS-enabled cpus.  For example, 'possible' mask is
> +	 * impacted by 'nr_cpus' or 'possible_cpus' kernel command line.
> +	 * Just let the SEAMCALL to fail if not all BIOS-enabled cpus are
> +	 * online.
> +	 */
> +	cpus_read_lock();
> +
> +	ret = init_tdx_module();
> +
> +	/*
> +	 * Shut down the TDX module in case of any error during the
> +	 * initialization process.  It's meaningless to leave the TDX
> +	 * module in any middle state of the initialization process.
> +	 */
> +	if (ret)
> +		shutdown_tdx_module();
> +
> +	cpus_read_unlock();
> +
> +	return ret;
> +}
> +
> +/**
> + * tdx_detect - Detect whether the TDX module has been loaded
> + *
> + * Detect whether the TDX module has been loaded and ready for
> + * initialization.  Only call this function when all cpus are
> + * already in VMX operation.
> + *
> + * This function can be called in parallel by multiple callers.
> + *
> + * Return:
> + *
> + * * -0:	The TDX module has been loaded and ready for
> + *		initialization.

"-0", eh?

> + * * -ENODEV:	The TDX module is not loaded.
> + * * -EPERM:	CPU is not in VMX operation.
> + * * -EFAULT:	Other internal fatal errors.
> + */
> +int tdx_detect(void)
> +{
> +	int ret;
> +
> +	mutex_lock(&tdx_module_lock);
> +
> +	switch (tdx_module_status) {
> +	case TDX_MODULE_UNKNOWN:
> +		ret = __tdx_detect();
> +		break;
> +	case TDX_MODULE_NONE:
> +		ret = -ENODEV;
> +		break;
> +	case TDX_MODULE_LOADED:
> +	case TDX_MODULE_INITIALIZED:
> +		ret = 0;
> +		break;
> +	case TDX_MODULE_SHUTDOWN:
> +		ret = -EFAULT;
> +		break;
> +	default:
> +		WARN_ON(1);
> +		ret = -EFAULT;
> +	}
> +
> +	mutex_unlock(&tdx_module_lock);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(tdx_detect);
> +
> +/**
> + * tdx_init - Initialize the TDX module
> + *
> + * Initialize the TDX module to make it ready to run TD guests.  This
> + * function should be called after tdx_detect() returns successful.
> + * Only call this function when all cpus are online and are in VMX
> + * operation.  CPU hotplug is temporarily disabled internally.
> + *
> + * This function can be called in parallel by multiple callers.
> + *
> + * Return:
> + *
> + * * -0:	The TDX module has been successfully initialized.
> + * * -ENODEV:	The TDX module is not loaded.
> + * * -EPERM:	The CPU which does SEAMCALL is not in VMX operation.
> + * * -EFAULT:	Other internal fatal errors.
> + */
> +int tdx_init(void)
> +{
> +	int ret;
> +
> +	mutex_lock(&tdx_module_lock);
> +
> +	switch (tdx_module_status) {
> +	case TDX_MODULE_NONE:
> +		ret = -ENODEV;
> +		break;
> +	case TDX_MODULE_LOADED:
> +		ret = __tdx_init();
> +		break;
> +	case TDX_MODULE_INITIALIZED:
> +		ret = 0;
> +		break;
> +	default:
> +		ret = -EFAULT;
> +		break;
> +	}
> +	mutex_unlock(&tdx_module_lock);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(tdx_init);

Why does this need both a tdx_detect() and a tdx_init()?  Shouldn't the
interface from outside just be "get TDX up and running, please?"
Kai Huang April 27, 2022, 12:43 a.m. UTC | #7
On Tue, 2022-04-26 at 13:53 -0700, Dave Hansen wrote:
> On 4/5/22 21:49, Kai Huang wrote:
> > The TDX module is essentially a CPU-attested software module running
> > in the new Secure Arbitration Mode (SEAM) to protect VMs from malicious
> > host and certain physical attacks.  The TDX module implements the
> > functions to build, tear down and start execution of the protected VMs
> > called Trusted Domains (TD).  Before the TDX module can be used to
> > create and run TD guests, it must be loaded into the SEAM Range Register
> > (SEAMRR) and properly initialized.
> 
> The module isn't loaded into a register, right?
> 
> It's loaded into a memory area pointed to *by* the register.

Yes.  Should be below:

"..., it must be loaded into the memory area pointed to by the SEAM Ranger
Register (SEAMRR) ...".

> 
> >  The TDX module is expected to be
> > loaded by BIOS before booting to the kernel, and the kernel is expected
> > to detect and initialize it, using the SEAMCALLs defined by TDX
> > architecture.
> 
> Wait a sec...  So, what was all this gobleygook about TDX module loading
> and SEAMRR's if the kernel just has the TDX module *handed* to it
> already loaded?
> 
> It looks to me like you wrote all of this before the TDX module was
> being loaded by the BIOS and neglected to go and update these changelogs.

Those were written on purpose after we changed to loading the TDX module in the
BIOS.  In the code, I checks seamrr_enabled() as the first step to detect the
TDX module, so I thought it would be better to talk a little bit about "the TDX
module needs to be loaded to SEAMRR" thing.

> 
> > The TDX module can be initialized only once in its lifetime.  Instead
> > of always initializing it at boot time, this implementation chooses an
> > on-demand approach to initialize TDX until there is a real need (e.g
> > when requested by KVM).  This avoids consuming the memory that must be
> > allocated by kernel and given to the TDX module as metadata (~1/256th of
> > the TDX-usable memory), and also saves the time of initializing the TDX
> > module (and the metadata) when TDX is not used at all.  Initializing the
> > TDX module at runtime on-demand also is more flexible to support TDX
> > module runtime updating in the future (after updating the TDX module, it
> > needs to be initialized again).
> > 
> > Introduce two placeholders tdx_detect() and tdx_init() to detect and
> > initialize the TDX module on demand, with a state machine introduced to
> > orchestrate the entire process (in case of multiple callers).
> > 
> > To start with, tdx_detect() checks SEAMRR and TDX private KeyIDs.  The
> > TDX module is reported as not loaded if either SEAMRR is not enabled, or
> > there are no enough TDX private KeyIDs to create any TD guest.  The TDX
> > module itself requires one global TDX private KeyID to crypto protect
> > its metadata.
> 
> This is stepping over the line into telling me what the code does
> instead of why.

Will remove.

[...]

> > +
> > +static bool seamrr_enabled(void)
> > +{
> > +	/*
> > +	 * To detect any BIOS misconfiguration among cores, all logical
> > +	 * cpus must have been brought up at least once.  This is true
> > +	 * unless 'maxcpus' kernel command line is used to limit the
> > +	 * number of cpus to be brought up during boot time.  However
> > +	 * 'maxcpus' is basically an invalid operation mode due to the
> > +	 * MCE broadcast problem, and it should not be used on a TDX
> > +	 * capable machine.  Just do paranoid check here and do not
> > +	 * report SEAMRR as enabled in this case.
> > +	 */
> > +	if (!cpumask_equal(&cpus_booted_once_mask,
> > +					cpu_present_mask))
> > +		return false;
> > +
> > +	return __seamrr_enabled();
> > +}
> > +
> > +static bool tdx_keyid_sufficient(void)
> > +{
> > +	if (!cpumask_equal(&cpus_booted_once_mask,
> > +					cpu_present_mask))
> > +		return false;
> 
> I'd move this cpumask_equal() to a helper.

Sorry to double confirm, do you want something like:

static bool tdx_detected_on_all_cpus(void)
{
	/*
	 * To detect any BIOS misconfiguration among cores, all logical
	 * cpus must have been brought up at least once.  This is true
	 * unless 'maxcpus' kernel command line is used to limit the
	 * number of cpus to be brought up during boot time.  However
	 * 'maxcpus' is basically an invalid operation mode due to the
	 * MCE broadcast problem, and it should not be used on a TDX
	 * capable machine.  Just do paranoid check here and do not
	 * report SEAMRR as enabled in this case.
	 */
	return cpumask_equal(&cpus_booted_once_mask, cpu_present_mask);
}

static bool seamrr_enabled(void)
{
	if (!tdx_detected_on_all_cpus())
		return false;

	return __seamrr_enabled();
}

static bool tdx_keyid_sufficient()
{
	if (!tdx_detected_on_all_cpus())
		return false;

	...
}

> 
> > +	/*
> > +	 * TDX requires at least two KeyIDs: one global KeyID to
> > +	 * protect the metadata of the TDX module and one or more
> > +	 * KeyIDs to run TD guests.
> > +	 */
> > +	return tdx_keyid_num >= 2;
> > +}
> > +
> > +static int __tdx_detect(void)
> > +{
> > +	/* The TDX module is not loaded if SEAMRR is disabled */
> > +	if (!seamrr_enabled()) {
> > +		pr_info("SEAMRR not enabled.\n");
> > +		goto no_tdx_module;
> > +	}
> 
> Why even bother with the SEAMRR stuff?  It sounded like you can "ping"
> the module with SEAMCALL.  Why not just use that directly?

SEAMCALL will cause #GP if SEAMRR is not enabled.  We should check whether
SEAMRR is enabled before making SEAMCALL.

> 
> > +	/*
> > +	 * Also do not report the TDX module as loaded if there's
> > +	 * no enough TDX private KeyIDs to run any TD guests.
> > +	 */
> > +	if (!tdx_keyid_sufficient()) {
> > +		pr_info("Number of TDX private KeyIDs too small: %u.\n",
> > +				tdx_keyid_num);
> > +		goto no_tdx_module;
> > +	}
> > +
> > +	/* Return -ENODEV until the TDX module is detected */
> > +no_tdx_module:
> > +	tdx_module_status = TDX_MODULE_NONE;
> > +	return -ENODEV;
> > +}
> > +
> > +static int init_tdx_module(void)
> > +{
> > +	/*
> > +	 * Return -EFAULT until all steps of TDX module
> > +	 * initialization are done.
> > +	 */
> > +	return -EFAULT;
> > +}
> > +
> > +static void shutdown_tdx_module(void)
> > +{
> > +	/* TODO: Shut down the TDX module */
> > +	tdx_module_status = TDX_MODULE_SHUTDOWN;
> > +}
> > +
> > +static int __tdx_init(void)
> > +{
> > +	int ret;
> > +
> > +	/*
> > +	 * Logical-cpu scope initialization requires calling one SEAMCALL
> > +	 * on all logical cpus enabled by BIOS.  Shutting down the TDX
> > +	 * module also has such requirement.  Further more, configuring
> > +	 * the key of the global KeyID requires calling one SEAMCALL for
> > +	 * each package.  For simplicity, disable CPU hotplug in the whole
> > +	 * initialization process.
> > +	 *
> > +	 * It's perhaps better to check whether all BIOS-enabled cpus are
> > +	 * online before starting initializing, and return early if not.
> 
> But you did some of this cpumask checking above.  Right?

Above check only guarantees SEAMRR/TDX KeyID has been detected on all presnet
cpus.  the 'present' cpumask doesn't equal to all BIOS-enabled CPUs.

> 
> > +	 * But none of 'possible', 'present' and 'online' CPU masks
> > +	 * represents BIOS-enabled cpus.  For example, 'possible' mask is
> > +	 * impacted by 'nr_cpus' or 'possible_cpus' kernel command line.
> > +	 * Just let the SEAMCALL to fail if not all BIOS-enabled cpus are
> > +	 * online.
> > +	 */
> > +	cpus_read_lock();
> > +
> > +	ret = init_tdx_module();
> > +
> > +	/*
> > +	 * Shut down the TDX module in case of any error during the
> > +	 * initialization process.  It's meaningless to leave the TDX
> > +	 * module in any middle state of the initialization process.
> > +	 */
> > +	if (ret)
> > +		shutdown_tdx_module();
> > +
> > +	cpus_read_unlock();
> > +
> > +	return ret;
> > +}
> > +
> > +/**
> > + * tdx_detect - Detect whether the TDX module has been loaded
> > + *
> > + * Detect whether the TDX module has been loaded and ready for
> > + * initialization.  Only call this function when all cpus are
> > + * already in VMX operation.
> > + *
> > + * This function can be called in parallel by multiple callers.
> > + *
> > + * Return:
> > + *
> > + * * -0:	The TDX module has been loaded and ready for
> > + *		initialization.
> 
> "-0", eh?

Sorry.  Originally I meant to have below:

	- 0:
	- -ENODEV:
	...

I changed to want to have below:

	0:
	-ENODEV:
	...

But forgot to remove the '-' before 0. I'll remove it.

> 
> > + * * -ENODEV:	The TDX module is not loaded.
> > + * * -EPERM:	CPU is not in VMX operation.
> > + * * -EFAULT:	Other internal fatal errors.
> > + */
> > +int tdx_detect(void)
> > +{
> > +	int ret;
> > +
> > +	mutex_lock(&tdx_module_lock);
> > +
> > +	switch (tdx_module_status) {
> > +	case TDX_MODULE_UNKNOWN:
> > +		ret = __tdx_detect();
> > +		break;
> > +	case TDX_MODULE_NONE:
> > +		ret = -ENODEV;
> > +		break;
> > +	case TDX_MODULE_LOADED:
> > +	case TDX_MODULE_INITIALIZED:
> > +		ret = 0;
> > +		break;
> > +	case TDX_MODULE_SHUTDOWN:
> > +		ret = -EFAULT;
> > +		break;
> > +	default:
> > +		WARN_ON(1);
> > +		ret = -EFAULT;
> > +	}
> > +
> > +	mutex_unlock(&tdx_module_lock);
> > +	return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(tdx_detect);
> > +
> > +/**
> > + * tdx_init - Initialize the TDX module
> > + *
> > + * Initialize the TDX module to make it ready to run TD guests.  This
> > + * function should be called after tdx_detect() returns successful.
> > + * Only call this function when all cpus are online and are in VMX
> > + * operation.  CPU hotplug is temporarily disabled internally.
> > + *
> > + * This function can be called in parallel by multiple callers.
> > + *
> > + * Return:
> > + *
> > + * * -0:	The TDX module has been successfully initialized.
> > + * * -ENODEV:	The TDX module is not loaded.
> > + * * -EPERM:	The CPU which does SEAMCALL is not in VMX operation.
> > + * * -EFAULT:	Other internal fatal errors.
> > + */
> > +int tdx_init(void)
> > +{
> > +	int ret;
> > +
> > +	mutex_lock(&tdx_module_lock);
> > +
> > +	switch (tdx_module_status) {
> > +	case TDX_MODULE_NONE:
> > +		ret = -ENODEV;
> > +		break;
> > +	case TDX_MODULE_LOADED:
> > +		ret = __tdx_init();
> > +		break;
> > +	case TDX_MODULE_INITIALIZED:
> > +		ret = 0;
> > +		break;
> > +	default:
> > +		ret = -EFAULT;
> > +		break;
> > +	}
> > +	mutex_unlock(&tdx_module_lock);
> > +
> > +	return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(tdx_init);
> 
> Why does this need both a tdx_detect() and a tdx_init()?  Shouldn't the
> interface from outside just be "get TDX up and running, please?"

We can have a single tdx_init().  However tdx_init() can be heavy, and having a
separate non-heavy tdx_detect() may be useful if caller wants to separate
"detecting the TDX module" and "initializing the TDX module", i.e. to do
something in the middle.

However tdx_detect() basically only detects P-SEAMLDR.  If we move P-SEAMLDR
detection to tdx_init(), or we git rid of P-SEAMLDR completely, then we don't
need tdx_detect() anymore.  We can expose seamrr_enabled() and TDX KeyID
variables or functions so caller can use them to see whether it should do TDX
related staff and then call tdx_init().
Dave Hansen April 27, 2022, 2:49 p.m. UTC | #8
On 4/26/22 17:43, Kai Huang wrote:
> On Tue, 2022-04-26 at 13:53 -0700, Dave Hansen wrote:
>> On 4/5/22 21:49, Kai Huang wrote:
...
>>> +static bool tdx_keyid_sufficient(void)
>>> +{
>>> +	if (!cpumask_equal(&cpus_booted_once_mask,
>>> +					cpu_present_mask))
>>> +		return false;
>>
>> I'd move this cpumask_equal() to a helper.
> 
> Sorry to double confirm, do you want something like:
> 
> static bool tdx_detected_on_all_cpus(void)
> {
> 	/*
> 	 * To detect any BIOS misconfiguration among cores, all logical
> 	 * cpus must have been brought up at least once.  This is true
> 	 * unless 'maxcpus' kernel command line is used to limit the
> 	 * number of cpus to be brought up during boot time.  However
> 	 * 'maxcpus' is basically an invalid operation mode due to the
> 	 * MCE broadcast problem, and it should not be used on a TDX
> 	 * capable machine.  Just do paranoid check here and do not
> 	 * report SEAMRR as enabled in this case.
> 	 */
> 	return cpumask_equal(&cpus_booted_once_mask, cpu_present_mask);
> }

That's logically the right idea, but I hate the name since the actual
test has nothing to do with TDX being detected.  The comment is also
rather verbose and rambling.

It should be named something like:

	all_cpus_booted()

and with a comment like this:

/*
 * To initialize TDX, the kernel needs to run some code on every
 * present CPU.  Detect cases where present CPUs have not been
 * booted, like when maxcpus=N is used.
 */

> static bool seamrr_enabled(void)
> {
> 	if (!tdx_detected_on_all_cpus())
> 		return false;
> 
> 	return __seamrr_enabled();
> }
> 
> static bool tdx_keyid_sufficient()
> {
> 	if (!tdx_detected_on_all_cpus())
> 		return false;
> 
> 	...
> }

Although, looking at those, it's *still* unclear why you need this.  I
assume it's because some later TDX SEAMCALL will fail if you get this
wrong, and you want to be able to provide a better error message.

*BUT* this code doesn't actually provide halfway reasonable error
messages.  If someone uses maxcpus=99, then this code will report:

	pr_info("SEAMRR not enabled.\n");

right?  That's bonkers.

>>> +	/*
>>> +	 * TDX requires at least two KeyIDs: one global KeyID to
>>> +	 * protect the metadata of the TDX module and one or more
>>> +	 * KeyIDs to run TD guests.
>>> +	 */
>>> +	return tdx_keyid_num >= 2;
>>> +}
>>> +
>>> +static int __tdx_detect(void)
>>> +{
>>> +	/* The TDX module is not loaded if SEAMRR is disabled */
>>> +	if (!seamrr_enabled()) {
>>> +		pr_info("SEAMRR not enabled.\n");
>>> +		goto no_tdx_module;
>>> +	}
>>
>> Why even bother with the SEAMRR stuff?  It sounded like you can "ping"
>> the module with SEAMCALL.  Why not just use that directly?
> 
> SEAMCALL will cause #GP if SEAMRR is not enabled.  We should check whether
> SEAMRR is enabled before making SEAMCALL.

So...  You could actually get rid of all this code.  if SEAMCALL #GP's,
then you say, "Whoops, the firmware didn't load the TDX module
correctly, sorry."

Why is all this code here?  What is it for?

>>> +	/*
>>> +	 * Also do not report the TDX module as loaded if there's
>>> +	 * no enough TDX private KeyIDs to run any TD guests.
>>> +	 */
>>> +	if (!tdx_keyid_sufficient()) {
>>> +		pr_info("Number of TDX private KeyIDs too small: %u.\n",
>>> +				tdx_keyid_num);
>>> +		goto no_tdx_module;
>>> +	}
>>> +
>>> +	/* Return -ENODEV until the TDX module is detected */
>>> +no_tdx_module:
>>> +	tdx_module_status = TDX_MODULE_NONE;
>>> +	return -ENODEV;
>>> +}

Again, if someone uses maxcpus=1234 and we get down here, then it
reports to the user:
	
	Number of TDX private KeyIDs too small: ...

????  When the root of the problem has nothing to do with KeyIDs.

>>> +static int init_tdx_module(void)
>>> +{
>>> +	/*
>>> +	 * Return -EFAULT until all steps of TDX module
>>> +	 * initialization are done.
>>> +	 */
>>> +	return -EFAULT;
>>> +}
>>> +
>>> +static void shutdown_tdx_module(void)
>>> +{
>>> +	/* TODO: Shut down the TDX module */
>>> +	tdx_module_status = TDX_MODULE_SHUTDOWN;
>>> +}
>>> +
>>> +static int __tdx_init(void)
>>> +{
>>> +	int ret;
>>> +
>>> +	/*
>>> +	 * Logical-cpu scope initialization requires calling one SEAMCALL
>>> +	 * on all logical cpus enabled by BIOS.  Shutting down the TDX
>>> +	 * module also has such requirement.  Further more, configuring
>>> +	 * the key of the global KeyID requires calling one SEAMCALL for
>>> +	 * each package.  For simplicity, disable CPU hotplug in the whole
>>> +	 * initialization process.
>>> +	 *
>>> +	 * It's perhaps better to check whether all BIOS-enabled cpus are
>>> +	 * online before starting initializing, and return early if not.
>>
>> But you did some of this cpumask checking above.  Right?
> 
> Above check only guarantees SEAMRR/TDX KeyID has been detected on all presnet
> cpus.  the 'present' cpumask doesn't equal to all BIOS-enabled CPUs.

I have no idea what this is saying.  In general, I have no idea what the
comment is saying.  It makes zero sense.  The locking pattern for stuff
like this is:

	cpus_read_lock();

	for_each_online_cpu()
		do_something();

	cpus_read_unlock();

because you need to make sure that you don't miss "do_something()" on a
CPU that comes online during the loop.

But, now that I think about it, all of the checks I've seen so far are
for *booted* CPUs.  While the lock (I assume) would keep new CPUs from
booting, it doesn't do any good really since the "cpus_booted_once_mask"
bits are only set and not cleared.  A CPU doesn't un-become booted once.

Again, we seem to have a long, verbose comment that says very little and
only confuses me.

...
>> Why does this need both a tdx_detect() and a tdx_init()?  Shouldn't the
>> interface from outside just be "get TDX up and running, please?"
> 
> We can have a single tdx_init().  However tdx_init() can be heavy, and having a
> separate non-heavy tdx_detect() may be useful if caller wants to separate
> "detecting the TDX module" and "initializing the TDX module", i.e. to do
> something in the middle.

<Sigh>  So, this "design" went unmentioned, *and* I can't review if the
actual callers of this need the functionality or not because they're not
in this series.

> However tdx_detect() basically only detects P-SEAMLDR.  If we move P-SEAMLDR
> detection to tdx_init(), or we git rid of P-SEAMLDR completely, then we don't
> need tdx_detect() anymore.  We can expose seamrr_enabled() and TDX KeyID
> variables or functions so caller can use them to see whether it should do TDX
> related staff and then call tdx_init().

I don't think you've made a strong case for why P-SEAMLDR detection is
even necessary in this series.
Kai Huang April 28, 2022, midnight UTC | #9
On Wed, 2022-04-27 at 07:49 -0700, Dave Hansen wrote:
> On 4/26/22 17:43, Kai Huang wrote:
> > On Tue, 2022-04-26 at 13:53 -0700, Dave Hansen wrote:
> > > On 4/5/22 21:49, Kai Huang wrote:
> ...
> > > > +static bool tdx_keyid_sufficient(void)
> > > > +{
> > > > +	if (!cpumask_equal(&cpus_booted_once_mask,
> > > > +					cpu_present_mask))
> > > > +		return false;
> > > 
> > > I'd move this cpumask_equal() to a helper.
> > 
> > Sorry to double confirm, do you want something like:
> > 
> > static bool tdx_detected_on_all_cpus(void)
> > {
> > 	/*
> > 	 * To detect any BIOS misconfiguration among cores, all logical
> > 	 * cpus must have been brought up at least once.  This is true
> > 	 * unless 'maxcpus' kernel command line is used to limit the
> > 	 * number of cpus to be brought up during boot time.  However
> > 	 * 'maxcpus' is basically an invalid operation mode due to the
> > 	 * MCE broadcast problem, and it should not be used on a TDX
> > 	 * capable machine.  Just do paranoid check here and do not
> > 	 * report SEAMRR as enabled in this case.
> > 	 */
> > 	return cpumask_equal(&cpus_booted_once_mask, cpu_present_mask);
> > }
> 
> That's logically the right idea, but I hate the name since the actual
> test has nothing to do with TDX being detected.  The comment is also
> rather verbose and rambling.
> 
> It should be named something like:
> 
> 	all_cpus_booted()
> 
> and with a comment like this:
> 
> /*
>  * To initialize TDX, the kernel needs to run some code on every
>  * present CPU.  Detect cases where present CPUs have not been
>  * booted, like when maxcpus=N is used.
>  */

Thank you.

> 
> > static bool seamrr_enabled(void)
> > {
> > 	if (!tdx_detected_on_all_cpus())
> > 		return false;
> > 
> > 	return __seamrr_enabled();
> > }
> > 
> > static bool tdx_keyid_sufficient()
> > {
> > 	if (!tdx_detected_on_all_cpus())
> > 		return false;
> > 
> > 	...
> > }
> 
> Although, looking at those, it's *still* unclear why you need this.  I
> assume it's because some later TDX SEAMCALL will fail if you get this
> wrong, and you want to be able to provide a better error message.
> 
> *BUT* this code doesn't actually provide halfway reasonable error
> messages.  If someone uses maxcpus=99, then this code will report:
> 
> 	pr_info("SEAMRR not enabled.\n");
> 
> right?  That's bonkers.

Right this isn't good.

I think we can use pr_info_once() when all_cpus_booted() returns false, and get
rid of printing "SEAMRR not enabled" in seamrr_enabled().  How about below?

static bool seamrr_enabled(void)
{
	if (!all_cpus_booted())
		pr_info_once("Not all present CPUs have been booted.  Report
SEAMRR as not enabled");

	return __seamrr_enabled();
}

And we don't print "SEAMRR not enabled".

> 
> > > > +	/*
> > > > +	 * TDX requires at least two KeyIDs: one global KeyID to
> > > > +	 * protect the metadata of the TDX module and one or more
> > > > +	 * KeyIDs to run TD guests.
> > > > +	 */
> > > > +	return tdx_keyid_num >= 2;
> > > > +}
> > > > +
> > > > +static int __tdx_detect(void)
> > > > +{
> > > > +	/* The TDX module is not loaded if SEAMRR is disabled */
> > > > +	if (!seamrr_enabled()) {
> > > > +		pr_info("SEAMRR not enabled.\n");
> > > > +		goto no_tdx_module;
> > > > +	}
> > > 
> > > Why even bother with the SEAMRR stuff?  It sounded like you can "ping"
> > > the module with SEAMCALL.  Why not just use that directly?
> > 
> > SEAMCALL will cause #GP if SEAMRR is not enabled.  We should check whether
> > SEAMRR is enabled before making SEAMCALL.
> 
> So...  You could actually get rid of all this code.  if SEAMCALL #GP's,
> then you say, "Whoops, the firmware didn't load the TDX module
> correctly, sorry."

Yes we can just use the first SEAMCALL (TDH.SYS.INIT) to detect whether TDX
module is loaded.  If SEAMCALL is successful, the module is loaded.

One problem is currently the patch to flush cache for kexec() uses
seamrr_enabled() and tdx_keyid_sufficient() to determine whether we need to
flush the cache.  The reason is, similar to SME, the flush is done in
stop_this_cpu(), but the status of TDX module initialization is protected by
mutex, so we cannot use TDX module status in stop_this_cpu() to determine
whether to flush.

If that patch makes sense, I think we still need to detect SEAMRR?

> 
> Why is all this code here?  What is it for?
> 
> > > > +	/*
> > > > +	 * Also do not report the TDX module as loaded if there's
> > > > +	 * no enough TDX private KeyIDs to run any TD guests.
> > > > +	 */
> > > > +	if (!tdx_keyid_sufficient()) {
> > > > +		pr_info("Number of TDX private KeyIDs too small: %u.\n",
> > > > +				tdx_keyid_num);
> > > > +		goto no_tdx_module;
> > > > +	}
> > > > +
> > > > +	/* Return -ENODEV until the TDX module is detected */
> > > > +no_tdx_module:
> > > > +	tdx_module_status = TDX_MODULE_NONE;
> > > > +	return -ENODEV;
> > > > +}
> 
> Again, if someone uses maxcpus=1234 and we get down here, then it
> reports to the user:
> 	
> 	Number of TDX private KeyIDs too small: ...
> 
> ????  When the root of the problem has nothing to do with KeyIDs.

Thanks for catching.  Similar to seamrr_enabled() above.

> 
> > > > +static int init_tdx_module(void)
> > > > +{
> > > > +	/*
> > > > +	 * Return -EFAULT until all steps of TDX module
> > > > +	 * initialization are done.
> > > > +	 */
> > > > +	return -EFAULT;
> > > > +}
> > > > +
> > > > +static void shutdown_tdx_module(void)
> > > > +{
> > > > +	/* TODO: Shut down the TDX module */
> > > > +	tdx_module_status = TDX_MODULE_SHUTDOWN;
> > > > +}
> > > > +
> > > > +static int __tdx_init(void)
> > > > +{
> > > > +	int ret;
> > > > +
> > > > +	/*
> > > > +	 * Logical-cpu scope initialization requires calling one SEAMCALL
> > > > +	 * on all logical cpus enabled by BIOS.  Shutting down the TDX
> > > > +	 * module also has such requirement.  Further more, configuring
> > > > +	 * the key of the global KeyID requires calling one SEAMCALL for
> > > > +	 * each package.  For simplicity, disable CPU hotplug in the whole
> > > > +	 * initialization process.
> > > > +	 *
> > > > +	 * It's perhaps better to check whether all BIOS-enabled cpus are
> > > > +	 * online before starting initializing, and return early if not.
> > > 
> > > But you did some of this cpumask checking above.  Right?
> > 
> > Above check only guarantees SEAMRR/TDX KeyID has been detected on all presnet
> > cpus.  the 'present' cpumask doesn't equal to all BIOS-enabled CPUs.
> 
> I have no idea what this is saying.  In general, I have no idea what the
> comment is saying.  It makes zero sense.  The locking pattern for stuff
> like this is:
> 
> 	cpus_read_lock();
> 
> 	for_each_online_cpu()
> 		do_something();
> 
> 	cpus_read_unlock();
> 
> because you need to make sure that you don't miss "do_something()" on a
> CPU that comes online during the loop.

I don't want any CPU going offline so  "do_something" will be done on all online
CPUs.

> 
> But, now that I think about it, all of the checks I've seen so far are
> for *booted* CPUs.  While the lock (I assume) would keep new CPUs from
> booting, it doesn't do any good really since the "cpus_booted_once_mask"
> bits are only set and not cleared.  A CPU doesn't un-become booted once.
> 
> Again, we seem to have a long, verbose comment that says very little and
> only confuses me.

How about below:

"During initializing the TDX module, one step requires some SEAMCALL must be
done on all logical cpus enabled by BIOS, otherwise a later step will fail. 
Disable CPU hotplug during the initialization process to prevent any CPU going
offline during initializing the TDX module.  Note it is caller's responsibility
to guarantee all BIOS-enabled CPUs are in cpu_present_mask and all present CPUs
are online."


> 
> ...
> > > Why does this need both a tdx_detect() and a tdx_init()?  Shouldn't the
> > > interface from outside just be "get TDX up and running, please?"
> > 
> > We can have a single tdx_init().  However tdx_init() can be heavy, and having a
> > separate non-heavy tdx_detect() may be useful if caller wants to separate
> > "detecting the TDX module" and "initializing the TDX module", i.e. to do
> > something in the middle.
> 
> <Sigh>  So, this "design" went unmentioned, *and* I can't review if the
> actual callers of this need the functionality or not because they're not
> in this series.

I'll remove tdx_detect().  Currently KVM doesn't do anything between
tdx_detect() and tdx_init(). 

https://lore.kernel.org/lkml/cover.1646422845.git.isaku.yamahata@intel.com/T/#mc7d5bb37107131b65ca7142b418b3e17da36a9ca

> 
> > However tdx_detect() basically only detects P-SEAMLDR.  If we move P-SEAMLDR
> > detection to tdx_init(), or we git rid of P-SEAMLDR completely, then we don't
> > need tdx_detect() anymore.  We can expose seamrr_enabled() and TDX KeyID
> > variables or functions so caller can use them to see whether it should do TDX
> > related staff and then call tdx_init().
> 
> I don't think you've made a strong case for why P-SEAMLDR detection is
> even necessary in this series.

Will remove P-SEAMLDR code and tdx_detect().
Dave Hansen April 28, 2022, 2:27 p.m. UTC | #10
On 4/27/22 17:00, Kai Huang wrote:
> On Wed, 2022-04-27 at 07:49 -0700, Dave Hansen wrote:
> I think we can use pr_info_once() when all_cpus_booted() returns false, and get
> rid of printing "SEAMRR not enabled" in seamrr_enabled().  How about below?
> 
> static bool seamrr_enabled(void)
> {
> 	if (!all_cpus_booted())
> 		pr_info_once("Not all present CPUs have been booted.  Report
> SEAMRR as not enabled");
> 
> 	return __seamrr_enabled();
> }
> 
> And we don't print "SEAMRR not enabled".

That's better, but even better than that would be removing all that
SEAMRR gunk in the first place.

>>>>> +	/*
>>>>> +	 * TDX requires at least two KeyIDs: one global KeyID to
>>>>> +	 * protect the metadata of the TDX module and one or more
>>>>> +	 * KeyIDs to run TD guests.
>>>>> +	 */
>>>>> +	return tdx_keyid_num >= 2;
>>>>> +}
>>>>> +
>>>>> +static int __tdx_detect(void)
>>>>> +{
>>>>> +	/* The TDX module is not loaded if SEAMRR is disabled */
>>>>> +	if (!seamrr_enabled()) {
>>>>> +		pr_info("SEAMRR not enabled.\n");
>>>>> +		goto no_tdx_module;
>>>>> +	}
>>>>
>>>> Why even bother with the SEAMRR stuff?  It sounded like you can "ping"
>>>> the module with SEAMCALL.  Why not just use that directly?
>>>
>>> SEAMCALL will cause #GP if SEAMRR is not enabled.  We should check whether
>>> SEAMRR is enabled before making SEAMCALL.
>>
>> So...  You could actually get rid of all this code.  if SEAMCALL #GP's,
>> then you say, "Whoops, the firmware didn't load the TDX module
>> correctly, sorry."
> 
> Yes we can just use the first SEAMCALL (TDH.SYS.INIT) to detect whether TDX
> module is loaded.  If SEAMCALL is successful, the module is loaded.
> 
> One problem is currently the patch to flush cache for kexec() uses
> seamrr_enabled() and tdx_keyid_sufficient() to determine whether we need to
> flush the cache.  The reason is, similar to SME, the flush is done in
> stop_this_cpu(), but the status of TDX module initialization is protected by
> mutex, so we cannot use TDX module status in stop_this_cpu() to determine
> whether to flush.
> 
> If that patch makes sense, I think we still need to detect SEAMRR?

Please go look at stop_this_cpu() closely.  What are the AMD folks doing
for SME exactly?  Do they, for instance, do the WBINVD when the kernel
used SME?  No, they just use a pretty low-level check if the processor
supports SME.

Doing the same kind of thing for TDX is fine.  You could check the MTRR
MSR bits that tell you if SEAMRR is supported and then read the MSR
directly.  You could check the CPUID enumeration for MKTME or
CPUID.B.0.EDX (I'm not even sure what this is but the SEAMCALL spec says
it is part of SEAMCALL operation).

Just like the SME test, it doesn't even need to be precise.  It just
needs to be 100% accurate in that it is *ALWAYS* set for any system that
might have dirtied cache aliases.

I'm not sure why you are so fixated on SEAMRR specifically for this.


...
> "During initializing the TDX module, one step requires some SEAMCALL must be
> done on all logical cpus enabled by BIOS, otherwise a later step will fail. 
> Disable CPU hotplug during the initialization process to prevent any CPU going
> offline during initializing the TDX module.  Note it is caller's responsibility
> to guarantee all BIOS-enabled CPUs are in cpu_present_mask and all present CPUs
> are online."

But, what if a CPU went offline just before this lock was taken?  What
if the caller make sure all present CPUs are online, makes the call,
then a CPU is taken offline.  The lock wouldn't do any good.

What purpose does the lock serve?
Kai Huang April 28, 2022, 11:44 p.m. UTC | #11
On Thu, 2022-04-28 at 07:27 -0700, Dave Hansen wrote:
> On 4/27/22 17:00, Kai Huang wrote:
> > On Wed, 2022-04-27 at 07:49 -0700, Dave Hansen wrote:
> > I think we can use pr_info_once() when all_cpus_booted() returns false, and get
> > rid of printing "SEAMRR not enabled" in seamrr_enabled().  How about below?
> > 
> > static bool seamrr_enabled(void)
> > {
> > 	if (!all_cpus_booted())
> > 		pr_info_once("Not all present CPUs have been booted.  Report
> > SEAMRR as not enabled");
> > 
> > 	return __seamrr_enabled();
> > }
> > 
> > And we don't print "SEAMRR not enabled".
> 
> That's better, but even better than that would be removing all that
> SEAMRR gunk in the first place.

Agreed.

> > > > > > +	/*
> > > > > > +	 * TDX requires at least two KeyIDs: one global KeyID to
> > > > > > +	 * protect the metadata of the TDX module and one or more
> > > > > > +	 * KeyIDs to run TD guests.
> > > > > > +	 */
> > > > > > +	return tdx_keyid_num >= 2;
> > > > > > +}
> > > > > > +
> > > > > > +static int __tdx_detect(void)
> > > > > > +{
> > > > > > +	/* The TDX module is not loaded if SEAMRR is disabled */
> > > > > > +	if (!seamrr_enabled()) {
> > > > > > +		pr_info("SEAMRR not enabled.\n");
> > > > > > +		goto no_tdx_module;
> > > > > > +	}
> > > > > 
> > > > > Why even bother with the SEAMRR stuff?  It sounded like you can "ping"
> > > > > the module with SEAMCALL.  Why not just use that directly?
> > > > 
> > > > SEAMCALL will cause #GP if SEAMRR is not enabled.  We should check whether
> > > > SEAMRR is enabled before making SEAMCALL.
> > > 
> > > So...  You could actually get rid of all this code.  if SEAMCALL #GP's,
> > > then you say, "Whoops, the firmware didn't load the TDX module
> > > correctly, sorry."
> > 
> > Yes we can just use the first SEAMCALL (TDH.SYS.INIT) to detect whether TDX
> > module is loaded.  If SEAMCALL is successful, the module is loaded.
> > 
> > One problem is currently the patch to flush cache for kexec() uses
> > seamrr_enabled() and tdx_keyid_sufficient() to determine whether we need to
> > flush the cache.  The reason is, similar to SME, the flush is done in
> > stop_this_cpu(), but the status of TDX module initialization is protected by
> > mutex, so we cannot use TDX module status in stop_this_cpu() to determine
> > whether to flush.
> > 
> > If that patch makes sense, I think we still need to detect SEAMRR?
> 
> Please go look at stop_this_cpu() closely.  What are the AMD folks doing
> for SME exactly?  Do they, for instance, do the WBINVD when the kernel
> used SME?  No, they just use a pretty low-level check if the processor
> supports SME.
> 
> Doing the same kind of thing for TDX is fine.  You could check the MTRR
> MSR bits that tell you if SEAMRR is supported and then read the MSR
> directly.  You could check the CPUID enumeration for MKTME or
> CPUID.B.0.EDX (I'm not even sure what this is but the SEAMCALL spec says
> it is part of SEAMCALL operation).

I am not sure about this CPUID either.  

> 
> Just like the SME test, it doesn't even need to be precise.  It just
> needs to be 100% accurate in that it is *ALWAYS* set for any system that
> might have dirtied cache aliases.
> 
> I'm not sure why you are so fixated on SEAMRR specifically for this.

I see.  I think I can simply use MTRR.SEAMRR bit check.  If CPU supports SEAMRR,
then basically it supports MKTME.

Is this look good for you?

	
> 
> 
> ...
> > "During initializing the TDX module, one step requires some SEAMCALL must be
> > done on all logical cpus enabled by BIOS, otherwise a later step will fail. 
> > Disable CPU hotplug during the initialization process to prevent any CPU going
> > offline during initializing the TDX module.  Note it is caller's responsibility
> > to guarantee all BIOS-enabled CPUs are in cpu_present_mask and all present CPUs
> > are online."
> 
> But, what if a CPU went offline just before this lock was taken?  What
> if the caller make sure all present CPUs are online, makes the call,
> then a CPU is taken offline.  The lock wouldn't do any good.
> 
> What purpose does the lock serve?

I thought cpus_read_lock() can prevent any CPU from going offline, no?
Dave Hansen April 28, 2022, 11:53 p.m. UTC | #12
On 4/28/22 16:44, Kai Huang wrote:
>> Just like the SME test, it doesn't even need to be precise.  It just
>> needs to be 100% accurate in that it is *ALWAYS* set for any system that
>> might have dirtied cache aliases.
>>
>> I'm not sure why you are so fixated on SEAMRR specifically for this.
> I see.  I think I can simply use MTRR.SEAMRR bit check.  If CPU supports SEAMRR,
> then basically it supports MKTME.
> 
> Is this look good for you?

Sure, fine, as long as it comes with a coherent description that
explains why the check is good enough.

>>> "During initializing the TDX module, one step requires some SEAMCALL must be
>>> done on all logical cpus enabled by BIOS, otherwise a later step will fail. 
>>> Disable CPU hotplug during the initialization process to prevent any CPU going
>>> offline during initializing the TDX module.  Note it is caller's responsibility
>>> to guarantee all BIOS-enabled CPUs are in cpu_present_mask and all present CPUs
>>> are online."
>> But, what if a CPU went offline just before this lock was taken?  What
>> if the caller make sure all present CPUs are online, makes the call,
>> then a CPU is taken offline.  The lock wouldn't do any good.
>>
>> What purpose does the lock serve?
> I thought cpus_read_lock() can prevent any CPU from going offline, no?

It doesn't prevent squat before the lock is taken, though.
Kai Huang April 29, 2022, 12:11 a.m. UTC | #13
On Thu, 2022-04-28 at 16:53 -0700, Dave Hansen wrote:
> On 4/28/22 16:44, Kai Huang wrote:
> > > Just like the SME test, it doesn't even need to be precise.  It just
> > > needs to be 100% accurate in that it is *ALWAYS* set for any system that
> > > might have dirtied cache aliases.
> > > 
> > > I'm not sure why you are so fixated on SEAMRR specifically for this.
> > I see.  I think I can simply use MTRR.SEAMRR bit check.  If CPU supports SEAMRR,
> > then basically it supports MKTME.
> > 
> > Is this look good for you?
> 
> Sure, fine, as long as it comes with a coherent description that
> explains why the check is good enough.
> 
> > > > "During initializing the TDX module, one step requires some SEAMCALL must be
> > > > done on all logical cpus enabled by BIOS, otherwise a later step will fail. 
> > > > Disable CPU hotplug during the initialization process to prevent any CPU going
> > > > offline during initializing the TDX module.  Note it is caller's responsibility
> > > > to guarantee all BIOS-enabled CPUs are in cpu_present_mask and all present CPUs
> > > > are online."
> > > But, what if a CPU went offline just before this lock was taken?  What
> > > if the caller make sure all present CPUs are online, makes the call,
> > > then a CPU is taken offline.  The lock wouldn't do any good.
> > > 
> > > What purpose does the lock serve?
> > I thought cpus_read_lock() can prevent any CPU from going offline, no?
> 
> It doesn't prevent squat before the lock is taken, though.

This is true.  So I think w/o taking the lock is also fine, as the TDX module
initialization is a state machine.  If any cpu goes offline during logical-cpu
level initialization and TDH.SYS.LP.INIT isn't done on that cpu, then later the
TDH.SYS.CONFIG will fail.  Similarly, if any cpu going offline causes
TDH.SYS.KEY.CONFIG is not done for any package, then TDH.SYS.TDMR.INIT will
fail.

A problem (I realized it exists in current implementation too) is shutting down
the TDX module, which requires calling TDH.SYS.LP.SHUTDOWN on all BIOS-enabled
cpus.  Kernel can do this SEAMCALL at most for all present cpus.  However when
any cpu is offline, this SEAMCALL won't be called on it, and it seems we need to
add new CPU hotplug callback to call this SEAMCALL when the cpu is online again.

Any suggestion?  Thanks!
Dave Hansen April 29, 2022, 12:26 a.m. UTC | #14
On 4/28/22 17:11, Kai Huang wrote:
> This is true.  So I think w/o taking the lock is also fine, as the TDX module
> initialization is a state machine.  If any cpu goes offline during logical-cpu
> level initialization and TDH.SYS.LP.INIT isn't done on that cpu, then later the
> TDH.SYS.CONFIG will fail.  Similarly, if any cpu going offline causes
> TDH.SYS.KEY.CONFIG is not done for any package, then TDH.SYS.TDMR.INIT will
> fail.

Right.  The worst-case scenario is someone is mucking around with CPU
hotplug during TDX initialization is that TDX initialization will fail.

We *can* fix some of this at least and provide coherent error messages
with a pattern like this:

	cpus_read_lock();
	// check that all MADT-enumerated CPUs are online
	tdx_init();
	cpus_read_unlock();

That, of course, *does* prevent CPUs from going offline during
tdx_init().  It also provides a nice place for an error message:

	pr_warn("You offlined a CPU then want to use TDX?  Sod off.\n");

> A problem (I realized it exists in current implementation too) is shutting down
> the TDX module, which requires calling TDH.SYS.LP.SHUTDOWN on all BIOS-enabled
> cpus.  Kernel can do this SEAMCALL at most for all present cpus.  However when
> any cpu is offline, this SEAMCALL won't be called on it, and it seems we need to
> add new CPU hotplug callback to call this SEAMCALL when the cpu is online again.

Hold on a sec.  If you call TDH.SYS.LP.SHUTDOWN on any CPU, then TDX
stops working everywhere, right?  But, if someone offlines one CPU, we
don't want TDX to stop working everywhere.
Kai Huang April 29, 2022, 12:59 a.m. UTC | #15
On Thu, 2022-04-28 at 17:26 -0700, Dave Hansen wrote:
> On 4/28/22 17:11, Kai Huang wrote:
> > This is true.  So I think w/o taking the lock is also fine, as the TDX module
> > initialization is a state machine.  If any cpu goes offline during logical-cpu
> > level initialization and TDH.SYS.LP.INIT isn't done on that cpu, then later the
> > TDH.SYS.CONFIG will fail.  Similarly, if any cpu going offline causes
> > TDH.SYS.KEY.CONFIG is not done for any package, then TDH.SYS.TDMR.INIT will
> > fail.
> 
> Right.  The worst-case scenario is someone is mucking around with CPU
> hotplug during TDX initialization is that TDX initialization will fail.
> 
> We *can* fix some of this at least and provide coherent error messages
> with a pattern like this:
> 
> 	cpus_read_lock();
> 	// check that all MADT-enumerated CPUs are online
> 	tdx_init();
> 	cpus_read_unlock();
> 
> That, of course, *does* prevent CPUs from going offline during
> tdx_init().  It also provides a nice place for an error message:
> 
> 	pr_warn("You offlined a CPU then want to use TDX?  Sod off.\n");

Yes this is better.

The problem is how to check MADT-enumerated CPUs are online?

I checked the code, and it seems we can use 'num_processors + disabled_cpus' as
MADT-enumerated CPUs?  In fact, there should be no 'disabled_cpus' for TDX, so I
think:

	if (disabled_cpus || num_processors != num_online_cpus()) {
		pr_err("Initializing the TDX module requires all MADT-
enumerated CPUs being onine.");
		return -EINVAL;
	}

But I may have misunderstanding.

> 
> > A problem (I realized it exists in current implementation too) is shutting down
> > the TDX module, which requires calling TDH.SYS.LP.SHUTDOWN on all BIOS-enabled
> > cpus.  Kernel can do this SEAMCALL at most for all present cpus.  However when
> > any cpu is offline, this SEAMCALL won't be called on it, and it seems we need to
> > add new CPU hotplug callback to call this SEAMCALL when the cpu is online again.
> 
> Hold on a sec.  If you call TDH.SYS.LP.SHUTDOWN on any CPU, then TDX
> stops working everywhere, right?  
> 

Yes.

But tot shut down the TDX module, it's better to call  LP.SHUTDOWN on all 
logical cpus as suggested by spec.

> But, if someone offlines one CPU, we
> don't want TDX to stop working everywhere.

Right.   I am talking about when initializing fails due to any reason (i.e. -
ENOMEM), currently we shutdown the TDX module.  When shutting down the TDX
module, we want to call LP.SHUTDOWN on all logical cpus.  If there's any CPU
being offline when we do the shutdown, then LP.SHUTDOWN won't be called for that
cpu. 

But as you suggested above, if we have an early check whether all MADT-
enumerated CPUs are online and if not we return w/o shutting down the TDX
module, then if we shutdown the module the LP.SHUTDOWN will be called on all
cpus.
diff mbox series

Patch

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 1f29813b1646..c8af2ba6bb8a 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -92,8 +92,12 @@  static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
 
 #ifdef CONFIG_INTEL_TDX_HOST
 void tdx_detect_cpu(struct cpuinfo_x86 *c);
+int tdx_detect(void);
+int tdx_init(void);
 #else
 static inline void tdx_detect_cpu(struct cpuinfo_x86 *c) { }
+static inline int tdx_detect(void) { return -ENODEV; }
+static inline int tdx_init(void) { return -ENODEV; }
 #endif /* CONFIG_INTEL_TDX_HOST */
 
 #endif /* !__ASSEMBLY__ */
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index ba2210001ea8..53093d4ad458 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -9,6 +9,8 @@ 
 
 #include <linux/types.h>
 #include <linux/cpumask.h>
+#include <linux/mutex.h>
+#include <linux/cpu.h>
 #include <asm/msr-index.h>
 #include <asm/msr.h>
 #include <asm/cpufeature.h>
@@ -45,12 +47,33 @@ 
 		((u32)(((_keyid_part) & 0xffffffffull) + 1))
 #define TDX_KEYID_NUM(_keyid_part)	((u32)((_keyid_part) >> 32))
 
+/*
+ * TDX module status during initialization
+ */
+enum tdx_module_status_t {
+	/* TDX module status is unknown */
+	TDX_MODULE_UNKNOWN,
+	/* TDX module is not loaded */
+	TDX_MODULE_NONE,
+	/* TDX module is loaded, but not initialized */
+	TDX_MODULE_LOADED,
+	/* TDX module is fully initialized */
+	TDX_MODULE_INITIALIZED,
+	/* TDX module is shutdown due to error during initialization */
+	TDX_MODULE_SHUTDOWN,
+};
+
 /* BIOS must configure SEAMRR registers for all cores consistently */
 static u64 seamrr_base, seamrr_mask;
 
 static u32 tdx_keyid_start;
 static u32 tdx_keyid_num;
 
+static enum tdx_module_status_t tdx_module_status;
+
+/* Prevent concurrent attempts on TDX detection and initialization */
+static DEFINE_MUTEX(tdx_module_lock);
+
 static bool __seamrr_enabled(void)
 {
 	return (seamrr_mask & SEAMRR_ENABLED_BITS) == SEAMRR_ENABLED_BITS;
@@ -172,3 +195,202 @@  void tdx_detect_cpu(struct cpuinfo_x86 *c)
 	detect_seam(c);
 	detect_tdx_keyids(c);
 }
+
+static bool seamrr_enabled(void)
+{
+	/*
+	 * To detect any BIOS misconfiguration among cores, all logical
+	 * cpus must have been brought up at least once.  This is true
+	 * unless 'maxcpus' kernel command line is used to limit the
+	 * number of cpus to be brought up during boot time.  However
+	 * 'maxcpus' is basically an invalid operation mode due to the
+	 * MCE broadcast problem, and it should not be used on a TDX
+	 * capable machine.  Just do paranoid check here and do not
+	 * report SEAMRR as enabled in this case.
+	 */
+	if (!cpumask_equal(&cpus_booted_once_mask,
+					cpu_present_mask))
+		return false;
+
+	return __seamrr_enabled();
+}
+
+static bool tdx_keyid_sufficient(void)
+{
+	if (!cpumask_equal(&cpus_booted_once_mask,
+					cpu_present_mask))
+		return false;
+
+	/*
+	 * TDX requires at least two KeyIDs: one global KeyID to
+	 * protect the metadata of the TDX module and one or more
+	 * KeyIDs to run TD guests.
+	 */
+	return tdx_keyid_num >= 2;
+}
+
+static int __tdx_detect(void)
+{
+	/* The TDX module is not loaded if SEAMRR is disabled */
+	if (!seamrr_enabled()) {
+		pr_info("SEAMRR not enabled.\n");
+		goto no_tdx_module;
+	}
+
+	/*
+	 * Also do not report the TDX module as loaded if there's
+	 * no enough TDX private KeyIDs to run any TD guests.
+	 */
+	if (!tdx_keyid_sufficient()) {
+		pr_info("Number of TDX private KeyIDs too small: %u.\n",
+				tdx_keyid_num);
+		goto no_tdx_module;
+	}
+
+	/* Return -ENODEV until the TDX module is detected */
+no_tdx_module:
+	tdx_module_status = TDX_MODULE_NONE;
+	return -ENODEV;
+}
+
+static int init_tdx_module(void)
+{
+	/*
+	 * Return -EFAULT until all steps of TDX module
+	 * initialization are done.
+	 */
+	return -EFAULT;
+}
+
+static void shutdown_tdx_module(void)
+{
+	/* TODO: Shut down the TDX module */
+	tdx_module_status = TDX_MODULE_SHUTDOWN;
+}
+
+static int __tdx_init(void)
+{
+	int ret;
+
+	/*
+	 * Logical-cpu scope initialization requires calling one SEAMCALL
+	 * on all logical cpus enabled by BIOS.  Shutting down the TDX
+	 * module also has such requirement.  Further more, configuring
+	 * the key of the global KeyID requires calling one SEAMCALL for
+	 * each package.  For simplicity, disable CPU hotplug in the whole
+	 * initialization process.
+	 *
+	 * It's perhaps better to check whether all BIOS-enabled cpus are
+	 * online before starting initializing, and return early if not.
+	 * But none of 'possible', 'present' and 'online' CPU masks
+	 * represents BIOS-enabled cpus.  For example, 'possible' mask is
+	 * impacted by 'nr_cpus' or 'possible_cpus' kernel command line.
+	 * Just let the SEAMCALL to fail if not all BIOS-enabled cpus are
+	 * online.
+	 */
+	cpus_read_lock();
+
+	ret = init_tdx_module();
+
+	/*
+	 * Shut down the TDX module in case of any error during the
+	 * initialization process.  It's meaningless to leave the TDX
+	 * module in any middle state of the initialization process.
+	 */
+	if (ret)
+		shutdown_tdx_module();
+
+	cpus_read_unlock();
+
+	return ret;
+}
+
+/**
+ * tdx_detect - Detect whether the TDX module has been loaded
+ *
+ * Detect whether the TDX module has been loaded and ready for
+ * initialization.  Only call this function when all cpus are
+ * already in VMX operation.
+ *
+ * This function can be called in parallel by multiple callers.
+ *
+ * Return:
+ *
+ * * -0:	The TDX module has been loaded and ready for
+ *		initialization.
+ * * -ENODEV:	The TDX module is not loaded.
+ * * -EPERM:	CPU is not in VMX operation.
+ * * -EFAULT:	Other internal fatal errors.
+ */
+int tdx_detect(void)
+{
+	int ret;
+
+	mutex_lock(&tdx_module_lock);
+
+	switch (tdx_module_status) {
+	case TDX_MODULE_UNKNOWN:
+		ret = __tdx_detect();
+		break;
+	case TDX_MODULE_NONE:
+		ret = -ENODEV;
+		break;
+	case TDX_MODULE_LOADED:
+	case TDX_MODULE_INITIALIZED:
+		ret = 0;
+		break;
+	case TDX_MODULE_SHUTDOWN:
+		ret = -EFAULT;
+		break;
+	default:
+		WARN_ON(1);
+		ret = -EFAULT;
+	}
+
+	mutex_unlock(&tdx_module_lock);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(tdx_detect);
+
+/**
+ * tdx_init - Initialize the TDX module
+ *
+ * Initialize the TDX module to make it ready to run TD guests.  This
+ * function should be called after tdx_detect() returns successful.
+ * Only call this function when all cpus are online and are in VMX
+ * operation.  CPU hotplug is temporarily disabled internally.
+ *
+ * This function can be called in parallel by multiple callers.
+ *
+ * Return:
+ *
+ * * -0:	The TDX module has been successfully initialized.
+ * * -ENODEV:	The TDX module is not loaded.
+ * * -EPERM:	The CPU which does SEAMCALL is not in VMX operation.
+ * * -EFAULT:	Other internal fatal errors.
+ */
+int tdx_init(void)
+{
+	int ret;
+
+	mutex_lock(&tdx_module_lock);
+
+	switch (tdx_module_status) {
+	case TDX_MODULE_NONE:
+		ret = -ENODEV;
+		break;
+	case TDX_MODULE_LOADED:
+		ret = __tdx_init();
+		break;
+	case TDX_MODULE_INITIALIZED:
+		ret = 0;
+		break;
+	default:
+		ret = -EFAULT;
+		break;
+	}
+	mutex_unlock(&tdx_module_lock);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(tdx_init);