diff mbox series

[v2,04/21] x86/virt/tdx: Add skeleton for detecting and initializing TDX on demand

Message ID 279af00f90a93491d5ec86672506146153909e5c.1647167475.git.kai.huang@intel.com (mailing list archive)
State New, archived
Headers show
Series TDX host kernel support | expand

Commit Message

Huang, Kai March 13, 2022, 10:49 a.m. UTC
The TDX module is essentially a CPU-attested software module running
in the new Secure Arbitration Mode (SEAM) to protect VMs from malicious
host and certain physical attacks.  The TDX module implements the
functions to build, tear down and start execution of the protected VMs
called Trusted Domains (TD).  Before the TDX module can be used to
create and run TD guests, it must be loaded into the SEAM Range Register
(SEAMRR) and properly initialized.  The TDX module is expected to be
loaded by BIOS before booting to the kernel, and the kernel is expected
to detect and initialize it, using the SEAMCALLs defined by TDX
architecture.

The TDX module can be initialized only once in its lifetime.  Instead
of always initializing it at boot time, this implementation chooses an
on-demand approach to initialize TDX until there is a real need (e.g
when requested by KVM).  This avoids consuming the memory that must be
allocated by kernel and given to the TDX module as metadata (~1/256th of
the TDX-usable memory), and also saves the time of initializing the TDX
module (and the metadata) when TDX is not used at all.  Initializing the
TDX module at runtime on-demand also is more flexible to support TDX
module runtime updating in the future (after updating the TDX module, it
needs to be initialized again).

Introduce two placeholders tdx_detect() and tdx_init() to detect and
initialize the TDX module on demand, with a state machine introduced to
orchestrate the entire process (in case of multiple callers).

To start with, tdx_detect() checks SEAMRR and TDX private KeyIDs.  The
TDX module is reported as not loaded if either SEAMRR is not enabled, or
there are no enough TDX private KeyIDs to create any TD guest.  The TDX
module itself requires one global TDX private KeyID to crypto protect
its metadata.

And tdx_init() is currently empty.  The TDX module will be initialized
in multi-steps defined by the TDX architecture:

  1) Global initialization;
  2) Logical-CPU scope initialization;
  3) Enumerate the TDX module capabilities and platform configuration;
  4) Configure the TDX module about usable memory ranges and global
     KeyID information;
  5) Package-scope configuration for the global KeyID;
  6) Initialize usable memory ranges based on 4).

The TDX module can also be shut down at any time during its lifetime.
In case of any error during the initialization process, shut down the
module.  It's pointless to leave the module in any intermediate state
during the initialization.

SEAMCALLs used in the above steps (including shutting down the TDX
module) require SEAMRR being enabled and CPU being already in VMX
operation (VMXON has been done), otherwise it generates #UD.  So far
only KVM handles entering/leaving VMX operation (VMXON/VMXOFF).
Handling of VMXON/VMXOFF could be added to this implementation, for
instance, right around the SEAMCALL.  But in long term, more kernel
components are likely to use VMXON/VMXOFF, so a reference based approach
to enter/leave VMX operation may be needed.  KVM so far is the only user
of TDX, therefore this implementation chooses to not handle VMXON/VMXOFF
in tdx_detect() and tdx_init() but requires caller to guarantee that.

Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/include/asm/tdx.h |   4 +
 arch/x86/virt/vmx/tdx.c    | 222 +++++++++++++++++++++++++++++++++++++
 2 files changed, 226 insertions(+)

Comments

Tian, Kevin March 23, 2022, 6:49 a.m. UTC | #1
> From: Kai Huang <kai.huang@intel.com>
> Sent: Sunday, March 13, 2022 6:50 PM
> +static bool seamrr_enabled(void)
> +{
> +	/*
> +	 * To detect any BIOS misconfiguration among cores, all logical
> +	 * cpus must have been brought up at least once.  This is true
> +	 * unless 'maxcpus' kernel command line is used to limit the
> +	 * number of cpus to be brought up during boot time.  However
> +	 * 'maxcpus' is basically an invalid operation mode due to the
> +	 * MCE broadcast problem, and it should not be used on a TDX
> +	 * capable machine.  Just do paranoid check here and WARN()
> +	 * if not the case.
> +	 */
> +	if (WARN_ON_ONCE(!cpumask_equal(&cpus_booted_once_mask,
> +					cpu_present_mask)))
> +		return false;
> +

cpu_present_mask doesn't always represent BIOS-enabled CPUs as it
can be further restricted by 'nr_cpus' and 'possible_cpus'. From this
angle above check doesn't capture all misconfigured boot options
which is incompatible with TDX. Then is such partial check still useful
or better to just document those restrictions and let TDX module
capture any violation later as what you explained in __init_tdx()?

Thanks
Kevin
Huang, Kai March 28, 2022, 1:57 a.m. UTC | #2
On Wed, 2022-03-23 at 19:49 +1300, Tian, Kevin wrote:
> > From: Kai Huang <kai.huang@intel.com>
> > Sent: Sunday, March 13, 2022 6:50 PM
> > +static bool seamrr_enabled(void)
> > +{
> > +     /*
> > +      * To detect any BIOS misconfiguration among cores, all logical
> > +      * cpus must have been brought up at least once.  This is true
> > +      * unless 'maxcpus' kernel command line is used to limit the
> > +      * number of cpus to be brought up during boot time.  However
> > +      * 'maxcpus' is basically an invalid operation mode due to the
> > +      * MCE broadcast problem, and it should not be used on a TDX
> > +      * capable machine.  Just do paranoid check here and WARN()
> > +      * if not the case.
> > +      */
> > +     if (WARN_ON_ONCE(!cpumask_equal(&cpus_booted_once_mask,
> > +                                     cpu_present_mask)))
> > +             return false;
> > +
> 
> cpu_present_mask doesn't always represent BIOS-enabled CPUs as it
> can be further restricted by 'nr_cpus' and 'possible_cpus'. From this
> angle above check doesn't capture all misconfigured boot options
> which is incompatible with TDX. Then is such partial check still useful
> or better to just document those restrictions and let TDX module
> capture any violation later as what you explained in __init_tdx()?
> 
> Thanks
> Kevin

The purpose of checking cpus_booted_once_mask aganist cpu_present_mask is NOT to
check whether all BIOS-enabled CPUs are brought up at least once.  Instead, the
purpose is to check whether all cpus that kernel can use are brought up at least
once (TDX-capable machine doesn't support ACPI CPU hotplug and all cpus are
marked as enabled in MADT table, therefore cpu_present_mask is used instead of
cpu_possible_mask).  This is used to make sure SEAMRR has been detected on all
cpus that kernel can use.  

Checking whether "all BIOS-enabled cpus are up" is not done here (neither in
this series as we discussed it seems there's no appropriate variable to
represent it).  And we just let TDH.SYS.CONFIG to fail if TDH.SYS.LP.INIT is not
done on all BIOS-enabled CPUs.
Tian, Kevin March 28, 2022, 8:26 a.m. UTC | #3
> From: Huang, Kai <kai.huang@intel.com>
> Sent: Monday, March 28, 2022 9:58 AM
> 
> On Wed, 2022-03-23 at 19:49 +1300, Tian, Kevin wrote:
> > > From: Kai Huang <kai.huang@intel.com>
> > > Sent: Sunday, March 13, 2022 6:50 PM
> > > +static bool seamrr_enabled(void)
> > > +{
> > > +     /*
> > > +      * To detect any BIOS misconfiguration among cores, all logical
> > > +      * cpus must have been brought up at least once.  This is true
> > > +      * unless 'maxcpus' kernel command line is used to limit the
> > > +      * number of cpus to be brought up during boot time.  However
> > > +      * 'maxcpus' is basically an invalid operation mode due to the
> > > +      * MCE broadcast problem, and it should not be used on a TDX
> > > +      * capable machine.  Just do paranoid check here and WARN()
> > > +      * if not the case.
> > > +      */
> > > +     if (WARN_ON_ONCE(!cpumask_equal(&cpus_booted_once_mask,
> > > +                                     cpu_present_mask)))
> > > +             return false;
> > > +
> >
> > cpu_present_mask doesn't always represent BIOS-enabled CPUs as it
> > can be further restricted by 'nr_cpus' and 'possible_cpus'. From this
> > angle above check doesn't capture all misconfigured boot options
> > which is incompatible with TDX. Then is such partial check still useful
> > or better to just document those restrictions and let TDX module
> > capture any violation later as what you explained in __init_tdx()?
> >
> > Thanks
> > Kevin
> 
> The purpose of checking cpus_booted_once_mask aganist
> cpu_present_mask is NOT to
> check whether all BIOS-enabled CPUs are brought up at least once.  Instead,
> the
> purpose is to check whether all cpus that kernel can use are brought up at
> least
> once (TDX-capable machine doesn't support ACPI CPU hotplug and all cpus
> are
> marked as enabled in MADT table, therefore cpu_present_mask is used
> instead of
> cpu_possible_mask).  This is used to make sure SEAMRR has been detected
> on all
> cpus that kernel can use.
> 
> Checking whether "all BIOS-enabled cpus are up" is not done here (neither in
> this series as we discussed it seems there's no appropriate variable to
> represent it).  And we just let TDH.SYS.CONFIG to fail if TDH.SYS.LP.INIT is not
> done on all BIOS-enabled CPUs.
> 

TDX requires all BIOS-enabled CPUs to execute certain SEAMCALLs for TDX
initialization. 

cpu_present_mask does not always represent BIOS-enabled CPUs due
to those boot options. Then why do we care whether CPUs in this mask
(if only representing a subset of BIOS-enabled CPUs) are at least brought
up once? It will fail at TDH.SYS.CONFIG anyway.

btw your comment said that 'maxcpus' is basically an invalid mode
due to MCE broadcase problem. I didn't find any code to block it when
MCE is enabled, thus wonder the rationale behind and whether that
rationale can be brought to this series (i.e. no check against those
conflicting boot options and just let SEAMCALL itself to detect and fail).

@Thomas, any guidance here?

Thanks
Kevin
Huang, Kai March 28, 2022, 9:24 a.m. UTC | #4
> 
> cpu_present_mask does not always represent BIOS-enabled CPUs due
> to those boot options. Then why do we care whether CPUs in this mask
> (if only representing a subset of BIOS-enabled CPUs) are at least brought
> up once? It will fail at TDH.SYS.CONFIG anyway.

As I said, this is used to make sure SEAMRR has been detected on all cpus, so
that any BIOS misconfiguration on SEAMRR has been detected.  Otherwise,
seamrr_enabled() may not be reliable (theoretically).

Alternatively, I think we can also add check to disable TDX when 'maxcpus' has
been specified, but I think the current way is better.

> 
> btw your comment said that 'maxcpus' is basically an invalid mode
> due to MCE broadcase problem. I didn't find any code to block it when
> MCE is enabled, 

Please see below comment in cpu_smt_allowed():

static inline bool cpu_smt_allowed(unsigned int cpu)
{
	...
        /*
         * On x86 it's required to boot all logical CPUs at least once so
         * that the init code can get a chance to set CR4.MCE on each
         * CPU. Otherwise, a broadcasted MCE observing CR4.MCE=0b on any
         * core will shutdown the machine.
         */
	 return !cpumask_test_cpu(cpu, &cpus_booted_once_mask);
}

> thus wonder the rationale behind and whether that
> rationale can be brought to this series (i.e. no check against those
> conflicting boot options and just let SEAMCALL itself to detect and fail).
> 
> @Thomas, any guidance here?
> 
> Thanks
> Kevin
Tian, Kevin March 28, 2022, 11:47 a.m. UTC | #5
> From: Huang, Kai <kai.huang@intel.com>
> Sent: Monday, March 28, 2022 5:24 PM
> >
> > cpu_present_mask does not always represent BIOS-enabled CPUs due
> > to those boot options. Then why do we care whether CPUs in this mask
> > (if only representing a subset of BIOS-enabled CPUs) are at least brought
> > up once? It will fail at TDH.SYS.CONFIG anyway.
> 
> As I said, this is used to make sure SEAMRR has been detected on all cpus, so
> that any BIOS misconfiguration on SEAMRR has been detected.  Otherwise,
> seamrr_enabled() may not be reliable (theoretically).

*all cpus* is questionable. 

Say BIOS enabled 8 CPUs: [0 - 7]

cpu_present_map covers [0 - 5], due to nr_cpus=6

You compared cpus_booted_once_mask to cpu_present_mask so if maxcpus
is set to a number < nr_cpus SEAMRR is considered disabled because you
cannot verify CPUs between [max_cpus, nr_cpus). If following the same
rationale then you also need a proper way to detect the case where nr_cpus
< BIOS enabled number i.e. when you cannot verify SEAMRR on CPUs
between [nr_cpus, 7]. otherwise this check is just incomplete.

But the entire check is actually unnecessary. You just need to verify SEAMRR
and do TDX cpu init on online CPUs. Any gap between online ones and BIOS
enabled ones will be caught by the TDX module at TDH.SYS.CONFIG point.

> 
> Alternatively, I think we can also add check to disable TDX when 'maxcpus'
> has
> been specified, but I think the current way is better.
> 
> >
> > btw your comment said that 'maxcpus' is basically an invalid mode
> > due to MCE broadcase problem. I didn't find any code to block it when
> > MCE is enabled,
> 
> Please see below comment in cpu_smt_allowed():
> 
> static inline bool cpu_smt_allowed(unsigned int cpu)
> {
> 	...
>         /*
>          * On x86 it's required to boot all logical CPUs at least once so
>          * that the init code can get a chance to set CR4.MCE on each
>          * CPU. Otherwise, a broadcasted MCE observing CR4.MCE=0b on any
>          * core will shutdown the machine.
>          */
> 	 return !cpumask_test_cpu(cpu, &cpus_booted_once_mask);
> }

I saw that code. My point is more about your statement that maxcpus
is almost invalid due to above situation then why didn't we do anything
to document such restriction or throw out a warning when it's
misconfigured...

> 
> > thus wonder the rationale behind and whether that
> > rationale can be brought to this series (i.e. no check against those
> > conflicting boot options and just let SEAMCALL itself to detect and fail).
> >
> > @Thomas, any guidance here?
> >
> > Thanks
> > Kevin
Huang, Kai March 28, 2022, 10:55 p.m. UTC | #6
On Tue, 2022-03-29 at 00:47 +1300, Tian, Kevin wrote:
> > From: Huang, Kai <kai.huang@intel.com>
> > Sent: Monday, March 28, 2022 5:24 PM
> > > 
> > > cpu_present_mask does not always represent BIOS-enabled CPUs due
> > > to those boot options. Then why do we care whether CPUs in this mask
> > > (if only representing a subset of BIOS-enabled CPUs) are at least brought
> > > up once? It will fail at TDH.SYS.CONFIG anyway.
> > 
> > As I said, this is used to make sure SEAMRR has been detected on all cpus,
> > so
> > that any BIOS misconfiguration on SEAMRR has been detected.  Otherwise,
> > seamrr_enabled() may not be reliable (theoretically).
> 
> *all cpus* is questionable.
> 
> Say BIOS enabled 8 CPUs: [0 - 7]
> 
> cpu_present_map covers [0 - 5], due to nr_cpus=6
> 
> You compared cpus_booted_once_mask to cpu_present_mask so if maxcpus
> is set to a number < nr_cpus SEAMRR is considered disabled because you
> cannot verify CPUs between [max_cpus, nr_cpus). 

SEAMRR is not considered as disabled in this case, at least in my intention.  My
understanding on the spec is if SEAMRR is configured as enabled on one core (the
SEAMRR MSRs are core-scope), the SEAMCALL instruction can work on that core.  It
is TDX's requirement that some SEAMCALL needs to be done on all BIOS-enabled
CPUs to finish TDX initialization, but not SEAM's.

From this perspective, if we forget TDX at this moment but talk about SEAM
alone, it might make sense to not just treat SEAMRR as disabled if kernel usable
cpus are limited by 'nr_cpus'.  The chance that BIOS misconfigured SEAMRR is
really rare.  If kernel can detect potential BIOS misconfiguration, it should do
it.  Otherwise, perhaps it's more reasonable not to just treat SEAM as disabled.


> If following the same
> rationale then you also need a proper way to detect the case where nr_cpus
> < BIOS enabled number i.e. when you cannot verify SEAMRR on CPUs
> between [nr_cpus, 7]. otherwise this check is just incomplete.
> 
> But the entire check is actually unnecessary. You just need to verify SEAMRR
> and do TDX cpu init on online CPUs. Any gap between online ones and BIOS
> enabled ones will be caught by the TDX module at TDH.SYS.CONFIG point.

This is equivalent to not having the paranoid check in seamrr_enabled(). By
detecting SEAMRR in identify_cpu(), the detection has already been done for any
online cpu.

> 
> > 
> > Alternatively, I think we can also add check to disable TDX when 'maxcpus'
> > has
> > been specified, but I think the current way is better.
> > 
> > > 
> > > btw your comment said that 'maxcpus' is basically an invalid mode
> > > due to MCE broadcase problem. I didn't find any code to block it when
> > > MCE is enabled,
> > 
> > Please see below comment in cpu_smt_allowed():
> > 
> > static inline bool cpu_smt_allowed(unsigned int cpu)
> > {
> >       ...
> >         /*
> >          * On x86 it's required to boot all logical CPUs at least once so
> >          * that the init code can get a chance to set CR4.MCE on each
> >          * CPU. Otherwise, a broadcasted MCE observing CR4.MCE=0b on any
> >          * core will shutdown the machine.
> >          */
> >        return !cpumask_test_cpu(cpu, &cpus_booted_once_mask);
> > }
> 
> I saw that code. My point is more about your statement that maxcpus
> is almost invalid due to above situation then why didn't we do anything
> to document such restriction or throw out a warning when it's
> misconfigured...

The sentence "'maxcpus' is an invalid operation mode due to the MCE broadcast
problem" was from Thomas.  Perhaps I should not just put it into the comment.

Also, Thomas suggested:

"you should have a paranoia check which checks for the maxcpus
command line parameter and if it's there and smaller than the number of
present CPUs then you just refuse to enable TDX.

Alternatively you just have a separate cpumask tdx_availabe_mask and
keep track of the CPUs which have been checked. When TDX is initialized
you then can do:

    if (!cpumask_equal(cpu_present_mask, tdx_available_mask))
    	     return;
"

I found we can just use cpus_booted_once_mask, instead of tdx_available_mask, so
I used the second way.  And instead of putting the check when initializing TDX,
I put to seamrr_enabled() since I guess it's more reasonable to be here as the
logic is to make sure SEAMRR has been detected on all cpus.

Hi Thomas,

If you see this, sorry for quoting your words here.  Just want to have a better
discussion.  And appreciate if you can have some guidance here.

> 
> > 
> > > thus wonder the rationale behind and whether that
> > > rationale can be brought to this series (i.e. no check against those
> > > conflicting boot options and just let SEAMCALL itself to detect and fail).
> > > 
> > > @Thomas, any guidance here?
> > > 
> > > Thanks
> > > Kevin
>
Tian, Kevin March 29, 2022, 2:36 a.m. UTC | #7
> From: Huang, Kai <kai.huang@intel.com>
> Sent: Tuesday, March 29, 2022 6:55 AM
> 
> On Tue, 2022-03-29 at 00:47 +1300, Tian, Kevin wrote:
> > > From: Huang, Kai <kai.huang@intel.com>
> > > Sent: Monday, March 28, 2022 5:24 PM
> > > >
> > > > cpu_present_mask does not always represent BIOS-enabled CPUs due
> > > > to those boot options. Then why do we care whether CPUs in this mask
> > > > (if only representing a subset of BIOS-enabled CPUs) are at least
> brought
> > > > up once? It will fail at TDH.SYS.CONFIG anyway.
> > >
> > > As I said, this is used to make sure SEAMRR has been detected on all cpus,
> > > so
> > > that any BIOS misconfiguration on SEAMRR has been detected.
> Otherwise,
> > > seamrr_enabled() may not be reliable (theoretically).
> >
> > *all cpus* is questionable.
> >
> > Say BIOS enabled 8 CPUs: [0 - 7]
> >
> > cpu_present_map covers [0 - 5], due to nr_cpus=6
> >
> > You compared cpus_booted_once_mask to cpu_present_mask so if
> maxcpus
> > is set to a number < nr_cpus SEAMRR is considered disabled because you
> > cannot verify CPUs between [max_cpus, nr_cpus).
> 
> SEAMRR is not considered as disabled in this case, at least in my intention.

the function is called seamrr_enabled(), and false is returned if above
check is not passed. So it is the intention from the code.

> My
> understanding on the spec is if SEAMRR is configured as enabled on one core
> (the
> SEAMRR MSRs are core-scope), the SEAMCALL instruction can work on that
> core.  It
> is TDX's requirement that some SEAMCALL needs to be done on all BIOS-
> enabled
> CPUs to finish TDX initialization, but not SEAM's.
> 
> From this perspective, if we forget TDX at this moment but talk about SEAM
> alone, it might make sense to not just treat SEAMRR as disabled if kernel
> usable
> cpus are limited by 'nr_cpus'.  The chance that BIOS misconfigured SEAMRR is
> really rare.  If kernel can detect potential BIOS misconfiguration, it should do
> it.  Otherwise, perhaps it's more reasonable not to just treat SEAM as
> disabled.

My problem is just that you didn't adopt consistency policy for CPUs in
[maxcpus, nr_cpus) and CPUs in [nr_cpus, nr_bios_enabled_cpus). This is
the only trouble to me no matter what policy you want to pursue...

> 
> 
> > If following the same
> > rationale then you also need a proper way to detect the case where
> nr_cpus
> > < BIOS enabled number i.e. when you cannot verify SEAMRR on CPUs
> > between [nr_cpus, 7]. otherwise this check is just incomplete.
> >
> > But the entire check is actually unnecessary. You just need to verify
> SEAMRR
> > and do TDX cpu init on online CPUs. Any gap between online ones and BIOS
> > enabled ones will be caught by the TDX module at TDH.SYS.CONFIG point.
> 
> This is equivalent to not having the paranoid check in seamrr_enabled(). By
> detecting SEAMRR in identify_cpu(), the detection has already been done for
> any
> online cpu.
> 
> >
> > >
> > > Alternatively, I think we can also add check to disable TDX when
> 'maxcpus'
> > > has
> > > been specified, but I think the current way is better.
> > >
> > > >
> > > > btw your comment said that 'maxcpus' is basically an invalid mode
> > > > due to MCE broadcase problem. I didn't find any code to block it when
> > > > MCE is enabled,
> > >
> > > Please see below comment in cpu_smt_allowed():
> > >
> > > static inline bool cpu_smt_allowed(unsigned int cpu)
> > > {
> > >       ...
> > >         /*
> > >          * On x86 it's required to boot all logical CPUs at least once so
> > >          * that the init code can get a chance to set CR4.MCE on each
> > >          * CPU. Otherwise, a broadcasted MCE observing CR4.MCE=0b on
> any
> > >          * core will shutdown the machine.
> > >          */
> > >        return !cpumask_test_cpu(cpu, &cpus_booted_once_mask);
> > > }
> >
> > I saw that code. My point is more about your statement that maxcpus
> > is almost invalid due to above situation then why didn't we do anything
> > to document such restriction or throw out a warning when it's
> > misconfigured...
> 
> The sentence "'maxcpus' is an invalid operation mode due to the MCE
> broadcast
> problem" was from Thomas.  Perhaps I should not just put it into the
> comment.
> 
> Also, Thomas suggested:
> 
> "you should have a paranoia check which checks for the maxcpus
> command line parameter and if it's there and smaller than the number of
> present CPUs then you just refuse to enable TDX.
> 
> Alternatively you just have a separate cpumask tdx_availabe_mask and
> keep track of the CPUs which have been checked. When TDX is initialized
> you then can do:
> 
>     if (!cpumask_equal(cpu_present_mask, tdx_available_mask))
>     	     return;
> "
> 
> I found we can just use cpus_booted_once_mask, instead of
> tdx_available_mask, so
> I used the second way.  And instead of putting the check when initializing TDX,
> I put to seamrr_enabled() since I guess it's more reasonable to be here as the
> logic is to make sure SEAMRR has been detected on all cpus.

I'm not sure whether Thomas's comment just takes maxcpus as an example
which should be extended to other boot options like nr_cpus or he really
only cares about maxcpus. 

> 
> Hi Thomas,
> 
> If you see this, sorry for quoting your words here.  Just want to have a better
> discussion.  And appreciate if you can have some guidance here.
> 
> >
> > >
> > > > thus wonder the rationale behind and whether that
> > > > rationale can be brought to this series (i.e. no check against those
> > > > conflicting boot options and just let SEAMCALL itself to detect and fail).
> > > >
> > > > @Thomas, any guidance here?
> > > >
> > > > Thanks
> > > > Kevin
> >
Huang, Kai March 29, 2022, 3:10 a.m. UTC | #8
> > > 
> > > *all cpus* is questionable.
> > > 
> > > Say BIOS enabled 8 CPUs: [0 - 7]
> > > 
> > > cpu_present_map covers [0 - 5], due to nr_cpus=6
> > > 
> > > You compared cpus_booted_once_mask to cpu_present_mask so if
> > maxcpus
> > > is set to a number < nr_cpus SEAMRR is considered disabled because you
> > > cannot verify CPUs between [max_cpus, nr_cpus).

Sorry my bad.  We can verify CPUs between [max_cpus, nr_cpus).  When any cpu
within that range becomes online, the detection code is run.  The paranoid check
in seamrr_enabled() is used to check whether all cpus within [max_cpus, nr_cpus)
(if there's any -- cpus within [0, max_cpus) are up during boot) have been
brought up at least once. 

> > 
> > SEAMRR is not considered as disabled in this case, at least in my intention.
> 
> the function is called seamrr_enabled(), and false is returned if above
> check is not passed. So it is the intention from the code.

The false is returned if something error is discovered among cpus [0 -
present_cpus].  It returns true even if we cannot verify [present_cpus,
bios_enabled_cpus).

> 
> > My
> > understanding on the spec is if SEAMRR is configured as enabled on one core
> > (the
> > SEAMRR MSRs are core-scope), the SEAMCALL instruction can work on that
> > core.  It
> > is TDX's requirement that some SEAMCALL needs to be done on all BIOS-
> > enabled
> > CPUs to finish TDX initialization, but not SEAM's.
> > 
> > From this perspective, if we forget TDX at this moment but talk about SEAM
> > alone, it might make sense to not just treat SEAMRR as disabled if kernel
> > usable
> > cpus are limited by 'nr_cpus'.  The chance that BIOS misconfigured SEAMRR is
> > really rare.  If kernel can detect potential BIOS misconfiguration, it
> > should do
> > it.  Otherwise, perhaps it's more reasonable not to just treat SEAM as
> > disabled.
> 
> My problem is just that you didn't adopt consistency policy for CPUs in
> [maxcpus, nr_cpus) and CPUs in [nr_cpus, nr_bios_enabled_cpus). This is
> the only trouble to me no matter what policy you want to pursue...

Let's separate SEAMRR detection and TDX initialization.  The paranoid check is
only for SEAM detection, but not for TDX initialization.  As I said, it is TDX's
requirement that some SEAMCALL must be run on all bios-enabled cpus, but not
SEAM's.
Huang, Kai March 29, 2022, 3:17 a.m. UTC | #9
On Tue, 2022-03-29 at 16:10 +1300, Kai Huang wrote:
> > > > 
> > > > *all cpus* is questionable.
> > > > 
> > > > Say BIOS enabled 8 CPUs: [0 - 7]
> > > > 
> > > > cpu_present_map covers [0 - 5], due to nr_cpus=6
> > > > 
> > > > You compared cpus_booted_once_mask to cpu_present_mask so if
> > > maxcpus
> > > > is set to a number < nr_cpus SEAMRR is considered disabled because you
> > > > cannot verify CPUs between [max_cpus, nr_cpus).
> 
> Sorry my bad.  We can verify CPUs between [max_cpus, nr_cpus).  When any cpu
> within that range becomes online, the detection code is run.  The paranoid
> check
> in seamrr_enabled() is used to check whether all cpus within [max_cpus,
> nr_cpus)
> (if there's any -- cpus within [0, max_cpus) are up during boot) have been
> brought up at least once. 
> 
> > > 
> > > SEAMRR is not considered as disabled in this case, at least in my
> > > intention.
> > 
> > the function is called seamrr_enabled(), and false is returned if above
> > check is not passed. So it is the intention from the code.
> 
> The false is returned if something error is discovered among cpus [0 -
> present_cpus].  It returns true even if we cannot verify [present_cpus,
> bios_enabled_cpus).

Sorry typo.  Not "something error is discovered among cpus [0 - present_cpus)",
but "when there's any cpu within [0 - present_cpus) hasn't been brought up
once".
 
> 
> > 
> > > My
> > > understanding on the spec is if SEAMRR is configured as enabled on one
> > > core
> > > (the
> > > SEAMRR MSRs are core-scope), the SEAMCALL instruction can work on that
> > > core.  It
> > > is TDX's requirement that some SEAMCALL needs to be done on all BIOS-
> > > enabled
> > > CPUs to finish TDX initialization, but not SEAM's.
> > > 
> > > From this perspective, if we forget TDX at this moment but talk about SEAM
> > > alone, it might make sense to not just treat SEAMRR as disabled if kernel
> > > usable
> > > cpus are limited by 'nr_cpus'.  The chance that BIOS misconfigured SEAMRR
> > > is
> > > really rare.  If kernel can detect potential BIOS misconfiguration, it
> > > should do
> > > it.  Otherwise, perhaps it's more reasonable not to just treat SEAM as
> > > disabled.
> > 
> > My problem is just that you didn't adopt consistency policy for CPUs in
> > [maxcpus, nr_cpus) and CPUs in [nr_cpus, nr_bios_enabled_cpus). This is
> > the only trouble to me no matter what policy you want to pursue...
> 
> Let's separate SEAMRR detection and TDX initialization.  The paranoid check is
> only for SEAM detection, but not for TDX initialization.  As I said, it is
> TDX's
> requirement that some SEAMCALL must be run on all bios-enabled cpus, but not
> SEAM's.
> 
>
diff mbox series

Patch

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 605d87ab580e..b526d41c4bbf 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -83,8 +83,12 @@  static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
 
 #ifdef CONFIG_INTEL_TDX_HOST
 void tdx_detect_cpu(struct cpuinfo_x86 *c);
+int tdx_detect(void);
+int tdx_init(void);
 #else
 static inline void tdx_detect_cpu(struct cpuinfo_x86 *c) { }
+static inline int tdx_detect(void) { return -ENODEV; }
+static inline int tdx_init(void) { return -ENODEV; }
 #endif /* CONFIG_INTEL_TDX_HOST */
 
 #endif /* !__ASSEMBLY__ */
diff --git a/arch/x86/virt/vmx/tdx.c b/arch/x86/virt/vmx/tdx.c
index ba2210001ea8..a3517e221578 100644
--- a/arch/x86/virt/vmx/tdx.c
+++ b/arch/x86/virt/vmx/tdx.c
@@ -9,6 +9,8 @@ 
 
 #include <linux/types.h>
 #include <linux/cpumask.h>
+#include <linux/mutex.h>
+#include <linux/cpu.h>
 #include <asm/msr-index.h>
 #include <asm/msr.h>
 #include <asm/cpufeature.h>
@@ -45,12 +47,33 @@ 
 		((u32)(((_keyid_part) & 0xffffffffull) + 1))
 #define TDX_KEYID_NUM(_keyid_part)	((u32)((_keyid_part) >> 32))
 
+/*
+ * TDX module status during initialization
+ */
+enum tdx_module_status_t {
+	/* TDX module status is unknown */
+	TDX_MODULE_UNKNOWN,
+	/* TDX module is not loaded */
+	TDX_MODULE_NONE,
+	/* TDX module is loaded, but not initialized */
+	TDX_MODULE_LOADED,
+	/* TDX module is fully initialized */
+	TDX_MODULE_INITIALIZED,
+	/* TDX module is shutdown due to error during initialization */
+	TDX_MODULE_SHUTDOWN,
+};
+
 /* BIOS must configure SEAMRR registers for all cores consistently */
 static u64 seamrr_base, seamrr_mask;
 
 static u32 tdx_keyid_start;
 static u32 tdx_keyid_num;
 
+static enum tdx_module_status_t tdx_module_status;
+
+/* Prevent concurrent attempts on TDX detection and initialization */
+static DEFINE_MUTEX(tdx_module_lock);
+
 static bool __seamrr_enabled(void)
 {
 	return (seamrr_mask & SEAMRR_ENABLED_BITS) == SEAMRR_ENABLED_BITS;
@@ -172,3 +195,202 @@  void tdx_detect_cpu(struct cpuinfo_x86 *c)
 	detect_seam(c);
 	detect_tdx_keyids(c);
 }
+
+static bool seamrr_enabled(void)
+{
+	/*
+	 * To detect any BIOS misconfiguration among cores, all logical
+	 * cpus must have been brought up at least once.  This is true
+	 * unless 'maxcpus' kernel command line is used to limit the
+	 * number of cpus to be brought up during boot time.  However
+	 * 'maxcpus' is basically an invalid operation mode due to the
+	 * MCE broadcast problem, and it should not be used on a TDX
+	 * capable machine.  Just do paranoid check here and WARN()
+	 * if not the case.
+	 */
+	if (WARN_ON_ONCE(!cpumask_equal(&cpus_booted_once_mask,
+					cpu_present_mask)))
+		return false;
+
+	return __seamrr_enabled();
+}
+
+static bool tdx_keyid_sufficient(void)
+{
+	if (WARN_ON_ONCE(!cpumask_equal(&cpus_booted_once_mask,
+					cpu_present_mask)))
+		return false;
+
+	/*
+	 * TDX requires at least two KeyIDs: one global KeyID to
+	 * protect the metadata of the TDX module and one or more
+	 * KeyIDs to run TD guests.
+	 */
+	return tdx_keyid_num >= 2;
+}
+
+static int __tdx_detect(void)
+{
+	/* The TDX module is not loaded if SEAMRR is disabled */
+	if (!seamrr_enabled()) {
+		pr_info("SEAMRR not enabled.\n");
+		goto no_tdx_module;
+	}
+
+	/*
+	 * Also do not report the TDX module as loaded if there's
+	 * no enough TDX private KeyIDs to run any TD guests.
+	 */
+	if (!tdx_keyid_sufficient()) {
+		pr_info("Number of TDX private KeyIDs too small: %u.\n",
+				tdx_keyid_num);
+		goto no_tdx_module;
+	}
+
+	/* Return -ENODEV until the TDX module is detected */
+no_tdx_module:
+	tdx_module_status = TDX_MODULE_NONE;
+	return -ENODEV;
+}
+
+static int init_tdx_module(void)
+{
+	/*
+	 * Return -EFAULT until all steps of TDX module
+	 * initialization are done.
+	 */
+	return -EFAULT;
+}
+
+static void shutdown_tdx_module(void)
+{
+	/* TODO: Shut down the TDX module */
+	tdx_module_status = TDX_MODULE_SHUTDOWN;
+}
+
+static int __tdx_init(void)
+{
+	int ret;
+
+	/*
+	 * Logical-cpu scope initialization requires calling one SEAMCALL
+	 * on all logical cpus enabled by BIOS.  Shutting down the TDX
+	 * module also has such requirement.  Further more, configuring
+	 * the key of the global KeyID requires calling one SEAMCALL for
+	 * each package.  For simplicity, disable CPU hotplug in the whole
+	 * initialization process.
+	 *
+	 * It's perhaps better to check whether all BIOS-enabled cpus are
+	 * online before starting initializing, and return early if not.
+	 * But none of 'possible', 'present' and 'online' CPU masks
+	 * represents BIOS-enabled cpus.  For example, 'possible' mask is
+	 * impacted by 'nr_cpus' or 'possible_cpus' kernel command line.
+	 * Just let the SEAMCALL to fail if not all BIOS-enabled cpus are
+	 * online.
+	 */
+	cpus_read_lock();
+
+	ret = init_tdx_module();
+
+	/*
+	 * Shut down the TDX module in case of any error during the
+	 * initialization process.  It's meaningless to leave the TDX
+	 * module in any middle state of the initialization process.
+	 */
+	if (ret)
+		shutdown_tdx_module();
+
+	cpus_read_unlock();
+
+	return ret;
+}
+
+/**
+ * tdx_detect - Detect whether the TDX module has been loaded
+ *
+ * Detect whether the TDX module has been loaded and ready for
+ * initialization.  Only call this function when all cpus are
+ * already in VMX operation.
+ *
+ * This function can be called in parallel by multiple callers.
+ *
+ * Return:
+ *
+ * * -0:	The TDX module has been loaded and ready for
+ *		initialization.
+ * * -ENODEV:	The TDX module is not loaded.
+ * * -EPERM:	CPU is not in VMX operation.
+ * * -EFAULT:	Other internal fatal errors.
+ */
+int tdx_detect(void)
+{
+	int ret;
+
+	mutex_lock(&tdx_module_lock);
+
+	switch (tdx_module_status) {
+	case TDX_MODULE_UNKNOWN:
+		ret = __tdx_detect();
+		break;
+	case TDX_MODULE_NONE:
+		ret = -ENODEV;
+		break;
+	case TDX_MODULE_LOADED:
+	case TDX_MODULE_INITIALIZED:
+		ret = 0;
+		break;
+	case TDX_MODULE_SHUTDOWN:
+		ret = -EFAULT;
+		break;
+	default:
+		WARN_ON(1);
+		ret = -EFAULT;
+	}
+
+	mutex_unlock(&tdx_module_lock);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(tdx_detect);
+
+/**
+ * tdx_init - Initialize the TDX module
+ *
+ * Initialize the TDX module to make it ready to run TD guests.  This
+ * function should be called after tdx_detect() returns successful.
+ * Only call this function when all cpus are online and are in VMX
+ * operation.  CPU hotplug is temporarily disabled internally.
+ *
+ * This function can be called in parallel by multiple callers.
+ *
+ * Return:
+ *
+ * * -0:	The TDX module has been successfully initialized.
+ * * -ENODEV:	The TDX module is not loaded.
+ * * -EPERM:	The CPU which does SEAMCALL is not in VMX operation.
+ * * -EFAULT:	Other internal fatal errors.
+ */
+int tdx_init(void)
+{
+	int ret;
+
+	mutex_lock(&tdx_module_lock);
+
+	switch (tdx_module_status) {
+	case TDX_MODULE_NONE:
+		ret = -ENODEV;
+		break;
+	case TDX_MODULE_LOADED:
+		ret = __tdx_init();
+		break;
+	case TDX_MODULE_INITIALIZED:
+		ret = 0;
+		break;
+	default:
+		ret = -EFAULT;
+		break;
+	}
+	mutex_unlock(&tdx_module_lock);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(tdx_init);