[v10,02/16] x86/virt/tdx: Detect TDX during kernel boot

Message ID 35a2421ca97d9e8dd938dcd744674602f4faa617.1678111292.git.kai.huang@intel.com (mailing list archive)
State New
Series TDX host kernel support

Commit Message

Huang, Kai March 6, 2023, 2:13 p.m. UTC
Intel Trust Domain Extensions (TDX) protects guest VMs from a malicious
host and certain physical attacks.  A CPU-attested software module
called 'the TDX module' runs inside a new isolated memory range as a
trusted hypervisor to manage and run protected VMs.

Pre-TDX Intel hardware has support for a memory encryption architecture
called MKTME.  The memory encryption hardware underpinning MKTME is also
used for Intel TDX.  TDX ends up "stealing" some of the physical address
space from the MKTME architecture to provide crypto-protection to VMs.  The
BIOS is responsible for partitioning the "KeyID" space between legacy
MKTME and TDX.  The KeyIDs reserved for TDX are called 'TDX private
KeyIDs' or 'TDX KeyIDs' for short.

TDX doesn't trust the BIOS.  During machine boot, TDX verifies the TDX
private KeyIDs are consistently and correctly programmed by the BIOS
across all CPU packages before it enables TDX on any CPU core.  A valid
TDX private KeyID range on the BSP indicates TDX has been enabled by the
BIOS; otherwise the BIOS is buggy.

The TDX module is expected to be loaded by the BIOS when it enables TDX,
but the kernel needs to properly initialize it before it can be used to
create and run any TDX guests.  The TDX module will be initialized by
the KVM subsystem when KVM wants to use TDX.

Add a new early_initcall(tdx_init) to detect TDX by detecting TDX
private KeyIDs.  Also add a function to report whether TDX is enabled by
the BIOS.  Similar to AMD SME, kexec() will use it to determine whether
a cache flush is needed.
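
As a rough illustration of the reporting function mentioned above, the
userspace sketch below mirrors the header pattern used in this patch: a
real declaration when CONFIG_INTEL_TDX_HOST is set, and an inline stub
returning false otherwise, so callers need no #ifdef.  The caller
kexec_needs_cache_flush() is a hypothetical name used only for
illustration; the actual kexec hook is not part of this patch.

```c
#include <stdbool.h>

/*
 * Same shape as the <asm/tdx.h> hunk in this patch: callers always see
 * platform_tdx_enabled(), whether or not TDX host support is built in.
 */
#ifdef CONFIG_INTEL_TDX_HOST
bool platform_tdx_enabled(void);
#else	/* !CONFIG_INTEL_TDX_HOST */
static inline bool platform_tdx_enabled(void) { return false; }
#endif	/* CONFIG_INTEL_TDX_HOST */

/*
 * Hypothetical caller (assumption, not from this patch): decide whether
 * a cache flush is needed before jumping to a new kernel.
 */
static bool kexec_needs_cache_flush(void)
{
	return platform_tdx_enabled();
}
```

With the option disabled the stub is a compile-time constant, so the
check should cost nothing in kernels built without TDX host support.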

The TDX module itself requires one TDX KeyID as the 'TDX global KeyID'
to protect its metadata.  Each TDX guest also needs a TDX KeyID for its
own protection.  Just use the first TDX KeyID as the global KeyID and
leave the rest for TDX guests.  If no TDX KeyID is left for TDX guests,
disable TDX as initializing the TDX module alone is useless.
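
The KeyID arithmetic described above can be sketched in plain userspace
C.  This is only an illustration under stated assumptions: msr_val
stands in for the raw 64-bit value of the KeyID partitioning MSR (low
32 bits: number of MKTME KeyIDs, high 32 bits: number of TDX private
KeyIDs, as in the comment in record_keyid_partitioning()), and
struct tdx_keyid_layout and split_tdx_keyids() are hypothetical names
that do not appear in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the result; not a kernel structure. */
struct tdx_keyid_layout {
	uint32_t global_keyid;		/* first TDX private KeyID */
	uint32_t guest_keyid_start;	/* first KeyID left for guests */
	uint32_t nr_guest_keyids;	/* number of KeyIDs left for guests */
};

/*
 * Split the TDX private KeyID range into one global KeyID plus the
 * guest KeyIDs.  Returns false when TDX is unusable: either no TDX
 * KeyIDs were configured, or none would be left for guests.
 */
static bool split_tdx_keyids(uint64_t msr_val, struct tdx_keyid_layout *out)
{
	uint32_t nr_mktme_keyids = (uint32_t)msr_val;
	uint32_t nr_tdx_keyids = (uint32_t)(msr_val >> 32);
	uint32_t tdx_keyid_start;

	/* Need one KeyID for the module and at least one for a guest. */
	if (nr_tdx_keyids < 2)
		return false;

	/* TDX KeyIDs start after the last MKTME KeyID. */
	tdx_keyid_start = nr_mktme_keyids + 1;

	out->global_keyid = tdx_keyid_start;
	out->guest_keyid_start = tdx_keyid_start + 1;
	out->nr_guest_keyids = nr_tdx_keyids - 1;
	return true;
}
```

For example, with 31 MKTME KeyIDs and 32 TDX private KeyIDs, the global
KeyID would be 32 and guests would get the 31 KeyIDs starting at 33.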

To start supporting TDX, create a new arch/x86/virt/vmx/tdx/tdx.c for
TDX host kernel support.  Add a new Kconfig option CONFIG_INTEL_TDX_HOST
to opt in to TDX host kernel support (as distinct from TDX guest kernel
support).  So far only KVM uses TDX, so make the new config option depend
on KVM_INTEL.

Signed-off-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---

v9 -> v10:
 - No change.

v8 -> v9:
 - Moved MSR macro from local tdx.h to <asm/msr-index.h> (Dave).
 - Moved reserving the TDX global KeyID from later patch to here.
 - Changed 'tdx_keyid_start' and 'nr_tdx_keyids' to
   'tdx_guest_keyid_start' and 'tdx_nr_guest_keyids' to represent KeyIDs
   can be used by guest. (Dave)
 - Slight changelog update according to above changes.

v7 -> v8: (address Dave's comments)
 - Improved changelog:
    - "KVM user" -> "The TDX module will be initialized by KVM when ..."
    - Changed "tdx_int" part to "Just say what this patch is doing"
    - Fixed the last sentence of "kexec()" paragraph
 - detect_tdx() -> record_keyid_partitioning()
 - Improved how to calculate tdx_keyid_start.
 - tdx_keyid_num -> nr_tdx_keyids.
 - Improved dmesg printing.
 - Add comment to clear_tdx().

v6 -> v7:
 - No change.

v5 -> v6:
 - Removed SEAMRR detection to make code simpler.
 - Removed the 'default N' in the KVM_TDX_HOST Kconfig (Kirill).
 - Changed to use 'obj-y' in arch/x86/virt/vmx/tdx/Makefile (Kirill).

---
 arch/x86/Kconfig                 |  12 ++++
 arch/x86/Makefile                |   2 +
 arch/x86/include/asm/msr-index.h |   3 +
 arch/x86/include/asm/tdx.h       |   7 +++
 arch/x86/virt/Makefile           |   2 +
 arch/x86/virt/vmx/Makefile       |   2 +
 arch/x86/virt/vmx/tdx/Makefile   |   2 +
 arch/x86/virt/vmx/tdx/tdx.c      | 105 +++++++++++++++++++++++++++++++
 8 files changed, 135 insertions(+)
 create mode 100644 arch/x86/virt/Makefile
 create mode 100644 arch/x86/virt/vmx/Makefile
 create mode 100644 arch/x86/virt/vmx/tdx/Makefile
 create mode 100644 arch/x86/virt/vmx/tdx/tdx.c

Comments

David Hildenbrand March 16, 2023, 12:48 p.m. UTC | #1
On 06.03.23 15:13, Kai Huang wrote:
> Intel Trust Domain Extensions (TDX) protects guest VMs from malicious
> host and certain physical attacks.  A CPU-attested software module
> called 'the TDX module' runs inside a new isolated memory range as a
> trusted hypervisor to manage and run protected VMs.
> 
> Pre-TDX Intel hardware has support for a memory encryption architecture
> called MKTME.  The memory encryption hardware underpinning MKTME is also
> used for Intel TDX.  TDX ends up "stealing" some of the physical address
> space from the MKTME architecture for crypto-protection to VMs.  The
> BIOS is responsible for partitioning the "KeyID" space between legacy
> MKTME and TDX.  The KeyIDs reserved for TDX are called 'TDX private
> KeyIDs' or 'TDX KeyIDs' for short.
> 
> TDX doesn't trust the BIOS.  During machine boot, TDX verifies the TDX
> private KeyIDs are consistently and correctly programmed by the BIOS
> across all CPU packages before it enables TDX on any CPU core.  A valid
> TDX private KeyID range on BSP indicates TDX has been enabled by the
> BIOS, otherwise the BIOS is buggy.

So we don't trust the BIOS, but we do trust it not to hot-remove
physical memory or hotplug physical CPUs (if I understood the cover
letter correctly)? :)

> 
> The TDX module is expected to be loaded by the BIOS when it enables TDX,
> but the kernel needs to properly initialize it before it can be used to
> create and run any TDX guests.  The TDX module will be initialized by
> the KVM subsystem when KVM wants to use TDX.
> 
> Add a new early_initcall(tdx_init) to detect the TDX by detecting TDX
> private KeyIDs.  Also add a function to report whether TDX is enabled by
> the BIOS.  Similar to AMD SME, kexec() will use it to determine whether
> cache flush is needed.
> 
> The TDX module itself requires one TDX KeyID as the 'TDX global KeyID'
> to protect its metadata.  Each TDX guest also needs a TDX KeyID for its
> own protection.  Just use the first TDX KeyID as the global KeyID and
> leave the rest for TDX guests.  If no TDX KeyID is left for TDX guests,
> disable TDX as initializing the TDX module alone is useless.

Does that really happen in practice, such that we need to care about it
at all? It seems weird, more like broken firmware or something like that ...

> 
> To start to support TDX, create a new arch/x86/virt/vmx/tdx/tdx.c for
> TDX host kernel support.  Add a new Kconfig option CONFIG_INTEL_TDX_HOST
> to opt-in TDX host kernel support (to distinguish with TDX guest kernel
> support).  So far only KVM uses TDX.  Make the new config option depend
> on KVM_INTEL.
> 
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>


[...]

> ---
>   arch/x86/Kconfig                 |  12 ++++
>   arch/x86/Makefile                |   2 +
>   arch/x86/include/asm/msr-index.h |   3 +
>   arch/x86/include/asm/tdx.h       |   7 +++
>   arch/x86/virt/Makefile           |   2 +
>   arch/x86/virt/vmx/Makefile       |   2 +
>   arch/x86/virt/vmx/tdx/Makefile   |   2 +
>   arch/x86/virt/vmx/tdx/tdx.c      | 105 +++++++++++++++++++++++++++++++
>   8 files changed, 135 insertions(+)
>   create mode 100644 arch/x86/virt/Makefile
>   create mode 100644 arch/x86/virt/vmx/Makefile
>   create mode 100644 arch/x86/virt/vmx/tdx/Makefile
>   create mode 100644 arch/x86/virt/vmx/tdx/tdx.c
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 3604074a878b..fc010973a6ff 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1952,6 +1952,18 @@ config X86_SGX
>   
>   	  If unsure, say N.
>   
> +config INTEL_TDX_HOST
> +	bool "Intel Trust Domain Extensions (TDX) host support"
> +	depends on CPU_SUP_INTEL
> +	depends on X86_64
> +	depends on KVM_INTEL
> +	help
> +	  Intel Trust Domain Extensions (TDX) protects guest VMs from malicious
> +	  host and certain physical attacks.  This option enables necessary TDX
> +	  support in host kernel to run protected VMs.

s/in host/in the host/ ?

Also, is "protected VMs" the right term to use here? "Encrypted VMs", 
"Confidential VMs" ... ?

> +
> +	  If unsure, say N.
> +
>   config EFI
>   	bool "EFI runtime service support"
>   	depends on ACPI
> diff --git a/arch/x86/Makefile b/arch/x86/Makefile
> index 9cf07322875a..972b5a64ce38 100644
> --- a/arch/x86/Makefile
> +++ b/arch/x86/Makefile
> @@ -252,6 +252,8 @@ archheaders:
>   
>   libs-y  += arch/x86/lib/
>   
> +core-y += arch/x86/virt/
> +
>   # drivers-y are linked after core-y
>   drivers-$(CONFIG_MATH_EMULATION) += arch/x86/math-emu/
>   drivers-$(CONFIG_PCI)            += arch/x86/pci/
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 37ff47552bcb..952374ddb167 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -512,6 +512,9 @@
>   #define MSR_RELOAD_PMC0			0x000014c1
>   #define MSR_RELOAD_FIXED_CTR0		0x00001309
>   
> +/* KeyID partitioning between MKTME and TDX */
> +#define MSR_IA32_MKTME_KEYID_PARTITIONING	0x00000087
> +
>   /*
>    * AMD64 MSRs. Not complete. See the architecture manual for a more
>    * complete list.
> diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
> index 25fd6070dc0b..4dfe2e794411 100644
> --- a/arch/x86/include/asm/tdx.h
> +++ b/arch/x86/include/asm/tdx.h
> @@ -94,5 +94,12 @@ static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
>   	return -ENODEV;
>   }
>   #endif /* CONFIG_INTEL_TDX_GUEST && CONFIG_KVM_GUEST */
> +
> +#ifdef CONFIG_INTEL_TDX_HOST
> +bool platform_tdx_enabled(void);
> +#else	/* !CONFIG_INTEL_TDX_HOST */
> +static inline bool platform_tdx_enabled(void) { return false; }
> +#endif	/* CONFIG_INTEL_TDX_HOST */
> +
>   #endif /* !__ASSEMBLY__ */
>   #endif /* _ASM_X86_TDX_H */
> diff --git a/arch/x86/virt/Makefile b/arch/x86/virt/Makefile
> new file mode 100644
> index 000000000000..1e36502cd738
> --- /dev/null
> +++ b/arch/x86/virt/Makefile
> @@ -0,0 +1,2 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +obj-y	+= vmx/
> diff --git a/arch/x86/virt/vmx/Makefile b/arch/x86/virt/vmx/Makefile
> new file mode 100644
> index 000000000000..feebda21d793
> --- /dev/null
> +++ b/arch/x86/virt/vmx/Makefile
> @@ -0,0 +1,2 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +obj-$(CONFIG_INTEL_TDX_HOST)	+= tdx/
> diff --git a/arch/x86/virt/vmx/tdx/Makefile b/arch/x86/virt/vmx/tdx/Makefile
> new file mode 100644
> index 000000000000..93ca8b73e1f1
> --- /dev/null
> +++ b/arch/x86/virt/vmx/tdx/Makefile
> @@ -0,0 +1,2 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +obj-y += tdx.o
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> new file mode 100644
> index 000000000000..a600b5d0879d
> --- /dev/null
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -0,0 +1,105 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright(c) 2023 Intel Corporation.
> + *
> + * Intel Trust Domain Extensions (TDX) support
> + */
> +
> +#define pr_fmt(fmt)	"tdx: " fmt
> +
> +#include <linux/types.h>
> +#include <linux/cache.h>
> +#include <linux/init.h>
> +#include <linux/errno.h>
> +#include <linux/printk.h>
> +#include <asm/msr-index.h>
> +#include <asm/msr.h>
> +#include <asm/tdx.h>
> +
> +static u32 tdx_global_keyid __ro_after_init;
> +static u32 tdx_guest_keyid_start __ro_after_init;
> +static u32 tdx_nr_guest_keyids __ro_after_init;
> +
> +/*
> + * Use tdx_global_keyid to indicate that TDX is uninitialized.
> + * This is used in TDX initialization error paths to take it from
> + * initialized -> uninitialized.
> + */
> +static void __init clear_tdx(void)
> +{
> +	tdx_global_keyid = 0;
> +}

Why not set "tdx_global_keyid" last, so that you don't have to clear it
when anything goes wrong before that? That seems more straightforward.

> +
> +static int __init record_keyid_partitioning(u32 *tdx_keyid_start,
> +					    u32 *nr_tdx_keyids)
> +{
> +	u32 _nr_mktme_keyids, _tdx_keyid_start, _nr_tdx_keyids;
> +	int ret;
> +
> +	/*
> +	 * IA32_MKTME_KEYID_PARTITIONING:
> +	 *   Bit [31:0]:	Number of MKTME KeyIDs.
> +	 *   Bit [63:32]:	Number of TDX private KeyIDs.
> +	 */
> +	ret = rdmsr_safe(MSR_IA32_MKTME_KEYID_PARTITIONING, &_nr_mktme_keyids,
> +			&_nr_tdx_keyids);
> +	if (ret)
> +		return -ENODEV;
> +
> +	if (!_nr_tdx_keyids)
> +		return -ENODEV;
> +
> +	/* TDX KeyIDs start after the last MKTME KeyID. */
> +	_tdx_keyid_start = _nr_mktme_keyids + 1;
> +
> +	*tdx_keyid_start = _tdx_keyid_start;
> +	*nr_tdx_keyids = _nr_tdx_keyids;
> +
> +	return 0;
> +}
Huang, Kai March 16, 2023, 10:37 p.m. UTC | #2
On Thu, 2023-03-16 at 13:48 +0100, David Hildenbrand wrote:
> On 06.03.23 15:13, Kai Huang wrote:
> > Intel Trust Domain Extensions (TDX) protects guest VMs from malicious
> > host and certain physical attacks.  A CPU-attested software module
> > called 'the TDX module' runs inside a new isolated memory range as a
> > trusted hypervisor to manage and run protected VMs.
> > 
> > Pre-TDX Intel hardware has support for a memory encryption architecture
> > called MKTME.  The memory encryption hardware underpinning MKTME is also
> > used for Intel TDX.  TDX ends up "stealing" some of the physical address
> > space from the MKTME architecture for crypto-protection to VMs.  The
> > BIOS is responsible for partitioning the "KeyID" space between legacy
> > MKTME and TDX.  The KeyIDs reserved for TDX are called 'TDX private
> > KeyIDs' or 'TDX KeyIDs' for short.
> > 
> > TDX doesn't trust the BIOS.  During machine boot, TDX verifies the TDX
> > private KeyIDs are consistently and correctly programmed by the BIOS
> > across all CPU packages before it enables TDX on any CPU core.  A valid
> > TDX private KeyID range on BSP indicates TDX has been enabled by the
> > BIOS, otherwise the BIOS is buggy.
> 
> So we don't trust the BIOS, but trust the BIOS that it won't hot-remove 
> physical memory or hotplug physical CPUS (if I understood the cover 
> letter correctly)? :)

"Trust" in this context is about security, not functionality.  The BIOS still
needs to do the right thing for things to work correctly in terms of
functionality.

For physical memory hotplug or CPU hotplug, we don't have a patch to
_explicitly_ distinguish them (from logical memory hotplug and logical CPU
online/offline), but they are effectively handled anyway:  For memory hotplug,
any hot-added memory is rejected from going online (because it cannot be in
TDX's convertible memory ranges).  For CPU hotplug, we have a function to do
per-CPU initialization (tdx_cpu_enable() in patch 5), and it will return an
error for a hot-added physical CPU.

> 
> > 
> > The TDX module is expected to be loaded by the BIOS when it enables TDX,
> > but the kernel needs to properly initialize it before it can be used to
> > create and run any TDX guests.  The TDX module will be initialized by
> > the KVM subsystem when KVM wants to use TDX.
> > 
> > Add a new early_initcall(tdx_init) to detect the TDX by detecting TDX
> > private KeyIDs.  Also add a function to report whether TDX is enabled by
> > the BIOS.  Similar to AMD SME, kexec() will use it to determine whether
> > cache flush is needed.
> > 
> > The TDX module itself requires one TDX KeyID as the 'TDX global KeyID'
> > to protect its metadata.  Each TDX guest also needs a TDX KeyID for its
> > own protection.  Just use the first TDX KeyID as the global KeyID and
> > leave the rest for TDX guests.  If no TDX KeyID is left for TDX guests,
> > disable TDX as initializing the TDX module alone is useless.
> 
> Does that really happen in practice that we care about that at all? 
> Seems weird and rather like a broken firmware or sth like that ...

No, it doesn't happen in practice, because the BIOS is sane enough.

But since the public spec doesn't explicitly guarantee that this cannot
happen when TDX is enabled, I added this sanity check.

> 
> > 
> > To start to support TDX, create a new arch/x86/virt/vmx/tdx/tdx.c for
> > TDX host kernel support.  Add a new Kconfig option CONFIG_INTEL_TDX_HOST
> > to opt-in TDX host kernel support (to distinguish with TDX guest kernel
> > support).  So far only KVM uses TDX.  Make the new config option depend
> > on KVM_INTEL.
> > 
> > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> 
> 
> [...]
> 
> > ---
> >   arch/x86/Kconfig                 |  12 ++++
> >   arch/x86/Makefile                |   2 +
> >   arch/x86/include/asm/msr-index.h |   3 +
> >   arch/x86/include/asm/tdx.h       |   7 +++
> >   arch/x86/virt/Makefile           |   2 +
> >   arch/x86/virt/vmx/Makefile       |   2 +
> >   arch/x86/virt/vmx/tdx/Makefile   |   2 +
> >   arch/x86/virt/vmx/tdx/tdx.c      | 105 +++++++++++++++++++++++++++++++
> >   8 files changed, 135 insertions(+)
> >   create mode 100644 arch/x86/virt/Makefile
> >   create mode 100644 arch/x86/virt/vmx/Makefile
> >   create mode 100644 arch/x86/virt/vmx/tdx/Makefile
> >   create mode 100644 arch/x86/virt/vmx/tdx/tdx.c
> > 
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index 3604074a878b..fc010973a6ff 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -1952,6 +1952,18 @@ config X86_SGX
> >   
> >   	  If unsure, say N.
> >   
> > +config INTEL_TDX_HOST
> > +	bool "Intel Trust Domain Extensions (TDX) host support"
> > +	depends on CPU_SUP_INTEL
> > +	depends on X86_64
> > +	depends on KVM_INTEL
> > +	help
> > +	  Intel Trust Domain Extensions (TDX) protects guest VMs from malicious
> > +	  host and certain physical attacks.  This option enables necessary TDX
> > +	  support in host kernel to run protected VMs.
> 
> s/in host/in the host/ ?

Sure.

> 
> Also, is "protected VMs" the right term to use here? "Encrypted VMs", 
> "Confidential VMs" ... ?

"Encrypted VM" perhaps is not a good choice, because there are more things than
encryption.  I am also OK with "Confidential VMs", but "protected VMs" is also
used in the KVM series (not upstreamed yet) and, judging by the git log, also
by s390.

So both "protected VM" and "confidential VM" work for me.

Does anyone else want to comment?

> 
[...]

> > +static u32 tdx_global_keyid __ro_after_init;
> > +static u32 tdx_guest_keyid_start __ro_after_init;
> > +static u32 tdx_nr_guest_keyids __ro_after_init;
> > +
> > +/*
> > + * Use tdx_global_keyid to indicate that TDX is uninitialized.
> > + * This is used in TDX initialization error paths to take it from
> > + * initialized -> uninitialized.
> > + */
> > +static void __init clear_tdx(void)
> > +{
> > +	tdx_global_keyid = 0;
> > +}
> 
> Why not set "tdx_global_keyid" last, such that you don't have to clear 
> when anything goes wrong before that? Seems more straight forward.

My thinking was that by reserving the global KeyID and taking it out first, I
can easily check the remaining KeyIDs for TDX guests:


+	if (!nr_tdx_keyids) {
+		pr_info("initialization failed: too few private KeyIDs available.\n");
+		goto no_tdx;
+	}

Otherwise I need to do:

	if (nr_tdx_keyids < 2) {
		...
	}

Also, the later patch that handles memory hotplug will add an additional step
that calls register_memory_notifier(), which can also fail, so I just
introduced clear_tdx() here.

But it's no big deal, and yes, we can set the global KeyID last and remove
clear_tdx().

I'll do what you suggested.

Thanks.
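
For illustration, a minimal userspace sketch of the "set the global
KeyID last" ordering agreed on above.  It is not the code from the next
revision: tdx_init_sketch(), later_init_step() and
platform_tdx_enabled_sketch() are hypothetical names, and
later_init_step() merely stands in for a step that can fail, such as the
register_memory_notifier() call mentioned above.  Treating a non-zero
tdx_global_keyid as "TDX enabled" follows the comment on clear_tdx().

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

static uint32_t tdx_global_keyid;
static uint32_t tdx_guest_keyid_start;
static uint32_t tdx_nr_guest_keyids;

/* Hypothetical stand-in for a later init step that can fail. */
static int later_init_step(bool should_fail)
{
	return should_fail ? -ENOMEM : 0;
}

static int tdx_init_sketch(uint32_t tdx_keyid_start, uint32_t nr_tdx_keyids,
			   bool later_step_fails)
{
	int ret;

	/* One KeyID for the TDX module plus at least one for a guest. */
	if (nr_tdx_keyids < 2)
		return -ENODEV;

	ret = later_init_step(later_step_fails);
	if (ret)
		return ret;

	/*
	 * Record the state only after everything else succeeded, so no
	 * error path has to undo it and no clear_tdx() is needed.
	 */
	tdx_guest_keyid_start = tdx_keyid_start + 1;
	tdx_nr_guest_keyids = nr_tdx_keyids - 1;
	tdx_global_keyid = tdx_keyid_start;

	return 0;
}

/* A non-zero global KeyID means TDX was detected and set up. */
static bool platform_tdx_enabled_sketch(void)
{
	return tdx_global_keyid != 0;
}
```

The trade-off is the one discussed above: the KeyID check becomes
"nr_tdx_keyids < 2" instead of "!nr_tdx_keyids", in exchange for not
having to clear state on any failure path.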

> 
> > +
> > +static int __init record_keyid_partitioning(u32 *tdx_keyid_start,
> > +					    u32 *nr_tdx_keyids)
> > +{
> > +	u32 _nr_mktme_keyids, _tdx_keyid_start, _nr_tdx_keyids;
> > +	int ret;
> > +
> > +	/*
> > +	 * IA32_MKTME_KEYID_PARTITIONING:
> > +	 *   Bit [31:0]:	Number of MKTME KeyIDs.
> > +	 *   Bit [63:32]:	Number of TDX private KeyIDs.
> > +	 */
> > +	ret = rdmsr_safe(MSR_IA32_MKTME_KEYID_PARTITIONING, &_nr_mktme_keyids,
> > +			&_nr_tdx_keyids);
> > +	if (ret)
> > +		return -ENODEV;
> > +
> > +	if (!_nr_tdx_keyids)
> > +		return -ENODEV;
> > +
> > +	/* TDX KeyIDs start after the last MKTME KeyID. */
> > +	_tdx_keyid_start = _nr_mktme_keyids + 1;
> > +
> > +	*tdx_keyid_start = _tdx_keyid_start;
> > +	*nr_tdx_keyids = _nr_tdx_keyids;
> > +
> > +	return 0;
> > +}
>
David Hildenbrand March 23, 2023, 5:02 p.m. UTC | #3
On 16.03.23 23:37, Huang, Kai wrote:
> On Thu, 2023-03-16 at 13:48 +0100, David Hildenbrand wrote:
>> On 06.03.23 15:13, Kai Huang wrote:
>>> Intel Trust Domain Extensions (TDX) protects guest VMs from malicious
>>> host and certain physical attacks.  A CPU-attested software module
>>> called 'the TDX module' runs inside a new isolated memory range as a
>>> trusted hypervisor to manage and run protected VMs.
>>>
>>> Pre-TDX Intel hardware has support for a memory encryption architecture
>>> called MKTME.  The memory encryption hardware underpinning MKTME is also
>>> used for Intel TDX.  TDX ends up "stealing" some of the physical address
>>> space from the MKTME architecture for crypto-protection to VMs.  The
>>> BIOS is responsible for partitioning the "KeyID" space between legacy
>>> MKTME and TDX.  The KeyIDs reserved for TDX are called 'TDX private
>>> KeyIDs' or 'TDX KeyIDs' for short.
>>>
>>> TDX doesn't trust the BIOS.  During machine boot, TDX verifies the TDX
>>> private KeyIDs are consistently and correctly programmed by the BIOS
>>> across all CPU packages before it enables TDX on any CPU core.  A valid
>>> TDX private KeyID range on BSP indicates TDX has been enabled by the
>>> BIOS, otherwise the BIOS is buggy.
>>

Sorry for the late reply!

>> So we don't trust the BIOS, but trust the BIOS that it won't hot-remove
>> physical memory or hotplug physical CPUS (if I understood the cover
>> letter correctly)? :)
> 
> The "trust" in this context means security, but not functionality.  BIOS needs
> to do the right thing in order to make things work correctly in terms of
> functionality.
> 
> For physical memory hotplug or CPU hotplug, we don't have patch to _explicitly_
> distinguish them (from logical memory hotplug and logical cpu online/offline),
> but actually they are kinda also handled:  For memory hotplug, and hot-added
> memory is rejected to go online (because they cannot be in TDX's convertible
> memory ranges).  For CPU hotplug, we have a function to do per-cpu
> initialization (tdx_cpu_enable() in patch 5), and it will return error for hot-
> added physical cpu.

Makes sense, thanks!

> 
>>
>>>
>>> The TDX module is expected to be loaded by the BIOS when it enables TDX,
>>> but the kernel needs to properly initialize it before it can be used to
>>> create and run any TDX guests.  The TDX module will be initialized by
>>> the KVM subsystem when KVM wants to use TDX.
>>>
>>> Add a new early_initcall(tdx_init) to detect the TDX by detecting TDX
>>> private KeyIDs.  Also add a function to report whether TDX is enabled by
>>> the BIOS.  Similar to AMD SME, kexec() will use it to determine whether
>>> cache flush is needed.
>>>
>>> The TDX module itself requires one TDX KeyID as the 'TDX global KeyID'
>>> to protect its metadata.  Each TDX guest also needs a TDX KeyID for its
>>> own protection.  Just use the first TDX KeyID as the global KeyID and
>>> leave the rest for TDX guests.  If no TDX KeyID is left for TDX guests,
>>> disable TDX as initializing the TDX module alone is useless.
>>
>> Does that really happen in practice that we care about that at all?
>> Seems weird and rather like a broken firmware or sth like that ...
> 
> No it doesn't happen in practice, because the BIOS is sane enough.
> 
> But since the public spec doesn't explicitly say it is guaranteed this doesn't
> happen when TDX is enabled, I just added this sanity check.

Okay!

> 
>>
>>>
>>> To start to support TDX, create a new arch/x86/virt/vmx/tdx/tdx.c for
>>> TDX host kernel support.  Add a new Kconfig option CONFIG_INTEL_TDX_HOST
>>> to opt-in TDX host kernel support (to distinguish with TDX guest kernel
>>> support).  So far only KVM uses TDX.  Make the new config option depend
>>> on KVM_INTEL.
>>>
>>> Signed-off-by: Kai Huang <kai.huang@intel.com>
>>> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>>
>>
>> [...]
>>
>>> ---
>>>    arch/x86/Kconfig                 |  12 ++++
>>>    arch/x86/Makefile                |   2 +
>>>    arch/x86/include/asm/msr-index.h |   3 +
>>>    arch/x86/include/asm/tdx.h       |   7 +++
>>>    arch/x86/virt/Makefile           |   2 +
>>>    arch/x86/virt/vmx/Makefile       |   2 +
>>>    arch/x86/virt/vmx/tdx/Makefile   |   2 +
>>>    arch/x86/virt/vmx/tdx/tdx.c      | 105 +++++++++++++++++++++++++++++++
>>>    8 files changed, 135 insertions(+)
>>>    create mode 100644 arch/x86/virt/Makefile
>>>    create mode 100644 arch/x86/virt/vmx/Makefile
>>>    create mode 100644 arch/x86/virt/vmx/tdx/Makefile
>>>    create mode 100644 arch/x86/virt/vmx/tdx/tdx.c
>>>
>>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>>> index 3604074a878b..fc010973a6ff 100644
>>> --- a/arch/x86/Kconfig
>>> +++ b/arch/x86/Kconfig
>>> @@ -1952,6 +1952,18 @@ config X86_SGX
>>>    
>>>    	  If unsure, say N.
>>>    
>>> +config INTEL_TDX_HOST
>>> +	bool "Intel Trust Domain Extensions (TDX) host support"
>>> +	depends on CPU_SUP_INTEL
>>> +	depends on X86_64
>>> +	depends on KVM_INTEL
>>> +	help
>>> +	  Intel Trust Domain Extensions (TDX) protects guest VMs from malicious
>>> +	  host and certain physical attacks.  This option enables necessary TDX
>>> +	  support in host kernel to run protected VMs.
>>
>> s/in host/in the host/ ?
> 
> Sure.
> 
>>
>> Also, is "protected VMs" the right term to use here? "Encrypted VMs",
>> "Confidential VMs" ... ?
> 
> "Encrypted VM" perhaps is not a good choice, because there are more things than
> encryption.  I am also OK with "Confidential VMs", but "protected VMs" is also
> used in the KVM series (not upstreamed yet), and also used by s390 by looking at
> the git log.
> 
> So both "protected VM" and "confidential VM" work for me.
> 
> Not sure anyone else wants to comment?

I'm fine as long as it's used consistently. Of the three alternatives,
"Protected VM" is the one I have heard least frequently.

> 
>>
> [...]
> 
>>> +static u32 tdx_global_keyid __ro_after_init;
>>> +static u32 tdx_guest_keyid_start __ro_after_init;
>>> +static u32 tdx_nr_guest_keyids __ro_after_init;
>>> +
>>> +/*
>>> + * Use tdx_global_keyid to indicate that TDX is uninitialized.
>>> + * This is used in TDX initialization error paths to take it from
>>> + * initialized -> uninitialized.
>>> + */
>>> +static void __init clear_tdx(void)
>>> +{
>>> +	tdx_global_keyid = 0;
>>> +}
>>
>> Why not set "tdx_global_keyid" last, such that you don't have to clear
>> when anything goes wrong before that? Seems more straight forward.
> 
> My thinking was by reserving the global keyid and taking it out first, I can
> check the remaining keyids for TDX guests easily:
> 
> 
> +	if (!nr_tdx_keyids) {
> +		pr_info("initialization failed: too few private KeyIDs available.\n");
> +		goto no_tdx;
> +	}
> 
> Otherwise need to do:
> 
> 	if (nr_tdx_keyids < 2) {
> 		...
> 	}
> 
> Also, in the later patch to handle memory hotplug we will add an additional step
> to register_memory_notifier() which can also fail, so I just introduced
> clear_tdx() here.
> 
> But nothing is big deal, and yes we can set the global keyid at last and remove
> clear_tdx().

Good, that simplifies things, thanks!
Huang, Kai March 23, 2023, 10:15 p.m. UTC | #4
On Thu, 2023-03-23 at 18:02 +0100, David Hildenbrand wrote:
> On 16.03.23 23:37, Huang, Kai wrote:
> > On Thu, 2023-03-16 at 13:48 +0100, David Hildenbrand wrote:
> > > On 06.03.23 15:13, Kai Huang wrote:
> > > > Intel Trust Domain Extensions (TDX) protects guest VMs from malicious
> > > > host and certain physical attacks.  A CPU-attested software module
> > > > called 'the TDX module' runs inside a new isolated memory range as a
> > > > trusted hypervisor to manage and run protected VMs.
> > > > 
> > > > Pre-TDX Intel hardware has support for a memory encryption architecture
> > > > called MKTME.  The memory encryption hardware underpinning MKTME is also
> > > > used for Intel TDX.  TDX ends up "stealing" some of the physical address
> > > > space from the MKTME architecture for crypto-protection to VMs.  The
> > > > BIOS is responsible for partitioning the "KeyID" space between legacy
> > > > MKTME and TDX.  The KeyIDs reserved for TDX are called 'TDX private
> > > > KeyIDs' or 'TDX KeyIDs' for short.
> > > > 
> > > > TDX doesn't trust the BIOS.  During machine boot, TDX verifies the TDX
> > > > private KeyIDs are consistently and correctly programmed by the BIOS
> > > > across all CPU packages before it enables TDX on any CPU core.  A valid
> > > > TDX private KeyID range on BSP indicates TDX has been enabled by the
> > > > BIOS, otherwise the BIOS is buggy.
> > > 
> 
> Sorry for the late reply!

Not late for me :)  Thanks!

[...]


> > > >    
> > > > +config INTEL_TDX_HOST
> > > > +	bool "Intel Trust Domain Extensions (TDX) host support"
> > > > +	depends on CPU_SUP_INTEL
> > > > +	depends on X86_64
> > > > +	depends on KVM_INTEL
> > > > +	help
> > > > +	  Intel Trust Domain Extensions (TDX) protects guest VMs from malicious
> > > > +	  host and certain physical attacks.  This option enables necessary TDX
> > > > +	  support in host kernel to run protected VMs.
> > > 
> > > s/in host/in the host/ ?
> > 
> > Sure.
> > 
> > > 
> > > Also, is "protected VMs" the right term to use here? "Encrypted VMs",
> > > "Confidential VMs" ... ?
> > 
> > "Encrypted VM" perhaps is not a good choice, because there are more things than
> > encryption.  I am also OK with "Confidential VMs", but "protected VMs" is also
> > used in the KVM series (not upstreamed yet), and also used by s390 by looking at
> > the git log.
> > 
> > So both "protected VM" and "confidential VM" work for me.
> > 
> > Not sure anyone else wants to comment?
> 
> I'm fine as long as it's used consistently. "Protected VM" would have 
> been the one out of the 3 alternatives that I have heard least frequently.
> > 

Yes, I'll make sure it is used consistently.  Thanks!

I am also happy to switch to "Confidential VMs" if anyone else believes it is
better.

Patch

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 3604074a878b..fc010973a6ff 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1952,6 +1952,18 @@  config X86_SGX
 
 	  If unsure, say N.
 
+config INTEL_TDX_HOST
+	bool "Intel Trust Domain Extensions (TDX) host support"
+	depends on CPU_SUP_INTEL
+	depends on X86_64
+	depends on KVM_INTEL
+	help
+	  Intel Trust Domain Extensions (TDX) protects guest VMs from malicious
+	  host and certain physical attacks.  This option enables necessary TDX
+	  support in host kernel to run protected VMs.
+
+	  If unsure, say N.
+
 config EFI
 	bool "EFI runtime service support"
 	depends on ACPI
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 9cf07322875a..972b5a64ce38 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -252,6 +252,8 @@  archheaders:
 
 libs-y  += arch/x86/lib/
 
+core-y += arch/x86/virt/
+
 # drivers-y are linked after core-y
 drivers-$(CONFIG_MATH_EMULATION) += arch/x86/math-emu/
 drivers-$(CONFIG_PCI)            += arch/x86/pci/
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 37ff47552bcb..952374ddb167 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -512,6 +512,9 @@ 
 #define MSR_RELOAD_PMC0			0x000014c1
 #define MSR_RELOAD_FIXED_CTR0		0x00001309
 
+/* KeyID partitioning between MKTME and TDX */
+#define MSR_IA32_MKTME_KEYID_PARTITIONING	0x00000087
+
 /*
  * AMD64 MSRs. Not complete. See the architecture manual for a more
  * complete list.
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 25fd6070dc0b..4dfe2e794411 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -94,5 +94,12 @@  static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
 	return -ENODEV;
 }
 #endif /* CONFIG_INTEL_TDX_GUEST && CONFIG_KVM_GUEST */
+
+#ifdef CONFIG_INTEL_TDX_HOST
+bool platform_tdx_enabled(void);
+#else	/* !CONFIG_INTEL_TDX_HOST */
+static inline bool platform_tdx_enabled(void) { return false; }
+#endif	/* CONFIG_INTEL_TDX_HOST */
+
 #endif /* !__ASSEMBLY__ */
 #endif /* _ASM_X86_TDX_H */
diff --git a/arch/x86/virt/Makefile b/arch/x86/virt/Makefile
new file mode 100644
index 000000000000..1e36502cd738
--- /dev/null
+++ b/arch/x86/virt/Makefile
@@ -0,0 +1,2 @@ 
+# SPDX-License-Identifier: GPL-2.0-only
+obj-y	+= vmx/
diff --git a/arch/x86/virt/vmx/Makefile b/arch/x86/virt/vmx/Makefile
new file mode 100644
index 000000000000..feebda21d793
--- /dev/null
+++ b/arch/x86/virt/vmx/Makefile
@@ -0,0 +1,2 @@ 
+# SPDX-License-Identifier: GPL-2.0-only
+obj-$(CONFIG_INTEL_TDX_HOST)	+= tdx/
diff --git a/arch/x86/virt/vmx/tdx/Makefile b/arch/x86/virt/vmx/tdx/Makefile
new file mode 100644
index 000000000000..93ca8b73e1f1
--- /dev/null
+++ b/arch/x86/virt/vmx/tdx/Makefile
@@ -0,0 +1,2 @@ 
+# SPDX-License-Identifier: GPL-2.0-only
+obj-y += tdx.o
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
new file mode 100644
index 000000000000..a600b5d0879d
--- /dev/null
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -0,0 +1,105 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright(c) 2023 Intel Corporation.
+ *
+ * Intel Trust Domain Extensions (TDX) support
+ */
+
+#define pr_fmt(fmt)	"tdx: " fmt
+
+#include <linux/types.h>
+#include <linux/cache.h>
+#include <linux/init.h>
+#include <linux/errno.h>
+#include <linux/printk.h>
+#include <asm/msr-index.h>
+#include <asm/msr.h>
+#include <asm/tdx.h>
+
+static u32 tdx_global_keyid __ro_after_init;
+static u32 tdx_guest_keyid_start __ro_after_init;
+static u32 tdx_nr_guest_keyids __ro_after_init;
+
+/*
+ * A zero tdx_global_keyid indicates that TDX is uninitialized.
+ * TDX initialization error paths call this to go from
+ * initialized -> uninitialized.
+ */
+static void __init clear_tdx(void)
+{
+	tdx_global_keyid = 0;
+}
+
+static int __init record_keyid_partitioning(u32 *tdx_keyid_start,
+					    u32 *nr_tdx_keyids)
+{
+	u32 _nr_mktme_keyids, _tdx_keyid_start, _nr_tdx_keyids;
+	int ret;
+
+	/*
+	 * IA32_MKTME_KEYID_PARTITIONING:
+	 *   Bit [31:0]:	Number of MKTME KeyIDs.
+	 *   Bit [63:32]:	Number of TDX private KeyIDs.
+	 */
+	ret = rdmsr_safe(MSR_IA32_MKTME_KEYID_PARTITIONING, &_nr_mktme_keyids,
+			&_nr_tdx_keyids);
+	if (ret)
+		return -ENODEV;
+
+	if (!_nr_tdx_keyids)
+		return -ENODEV;
+
+	/* TDX KeyIDs start after the last MKTME KeyID. */
+	_tdx_keyid_start = _nr_mktme_keyids + 1;
+
+	*tdx_keyid_start = _tdx_keyid_start;
+	*nr_tdx_keyids = _nr_tdx_keyids;
+
+	return 0;
+}
+
+static int __init tdx_init(void)
+{
+	u32 tdx_keyid_start, nr_tdx_keyids;
+	int err;
+
+	err = record_keyid_partitioning(&tdx_keyid_start, &nr_tdx_keyids);
+	if (err)
+		return err;
+
+	pr_info("BIOS enabled: private KeyID range [%u, %u)\n",
+			tdx_keyid_start, tdx_keyid_start + nr_tdx_keyids);
+
+	/*
+	 * The TDX module itself requires one 'TDX global KeyID' to
+	 * protect its metadata.  Just use the first one.
+	 */
+	tdx_global_keyid = tdx_keyid_start;
+	tdx_keyid_start++;
+	nr_tdx_keyids--;
+
+	/*
+	 * If there's no more TDX KeyID left, KVM won't be able to run
+	 * any TDX guest.  Disable TDX in this case as initializing the
+	 * TDX module alone is meaningless.
+	 */
+	if (!nr_tdx_keyids) {
+		pr_info("initialization failed: too few private KeyIDs available.\n");
+		goto no_tdx;
+	}
+
+	tdx_guest_keyid_start = tdx_keyid_start;
+	tdx_nr_guest_keyids = nr_tdx_keyids;
+
+	return 0;
+no_tdx:
+	clear_tdx();
+	return -ENODEV;
+}
+early_initcall(tdx_init);
+
+/* Return whether the BIOS has enabled TDX */
+bool platform_tdx_enabled(void)
+{
+	return !!tdx_global_keyid;
+}
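
For anyone who wants to check the KeyID arithmetic in tdx_init() without
a TDX-capable machine, below is a minimal userspace sketch of the same
split.  It is not part of the patch: struct keyid_layout and
decode_keyid_partitioning() are hypothetical names made up for this
illustration, and the raw MSR value is passed in as a plain 64-bit
integer instead of being read with rdmsr_safe().  It assumes, as the
patch does, that TDX KeyIDs start right after the last MKTME KeyID, that
the first TDX KeyID is taken as the global KeyID, and that at least one
KeyID must be left over for guests.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the decoded layout; not part of the patch. */
struct keyid_layout {
	uint32_t global_keyid;		/* first TDX private KeyID */
	uint32_t guest_keyid_start;	/* first KeyID usable by guests */
	uint32_t nr_guest_keyids;	/* KeyIDs left for guests */
};

/*
 * Decode a raw IA32_MKTME_KEYID_PARTITIONING value:
 *   bits [31:0]:  number of MKTME KeyIDs
 *   bits [63:32]: number of TDX private KeyIDs
 *
 * Returns false when there is no TDX KeyID at all, or when only one
 * exists (it would be consumed as the global KeyID, leaving none for
 * guests).  Both cases correspond to -ENODEV in the patch.
 */
static bool decode_keyid_partitioning(uint64_t msr, struct keyid_layout *out)
{
	uint32_t nr_mktme_keyids = (uint32_t)msr;
	uint32_t nr_tdx_keyids = (uint32_t)(msr >> 32);
	uint32_t tdx_keyid_start;

	if (nr_tdx_keyids < 2)
		return false;

	/* TDX KeyIDs start after the last MKTME KeyID. */
	tdx_keyid_start = nr_mktme_keyids + 1;

	out->global_keyid = tdx_keyid_start;
	out->guest_keyid_start = tdx_keyid_start + 1;
	out->nr_guest_keyids = nr_tdx_keyids - 1;

	return true;
}
```

For example, a value with 31 MKTME KeyIDs in the low half and 32 TDX
KeyIDs in the high half yields global KeyID 32 and guest KeyIDs
[33, 64), matching what tdx_init() would record for the same input.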