diff mbox series

[08/13] x86/virt/tdx: Add SEAMCALL wrappers to add TD private pages

Message ID 20250101074959.412696-9-pbonzini@redhat.com (mailing list archive)
State New
Headers show
Series x86/virt/tdx: Add SEAMCALL wrappers for KVM | expand

Commit Message

Paolo Bonzini Jan. 1, 2025, 7:49 a.m. UTC
From: Isaku Yamahata <isaku.yamahata@intel.com>

TDX architecture introduces the concept of private GPA vs shared GPA,
depending on the GPA.SHARED bit. The TDX module maintains a Secure EPT
(S-EPT or SEPT) tree per TD to translate TD's private memory accessed
using a private GPA. Wrap the SEAMCALL TDH.MEM.PAGE.ADD with
tdh_mem_page_add() and TDH.MEM.PAGE.AUG with tdh_mem_page_aug() to add TD
private pages and map them to the TD's private GPAs in the SEPT.

Callers of tdh_mem_page_add() and tdh_mem_page_aug() allocate and provide
normal pages to the wrappers, who further pass those pages to the TDX
module. Before passing the pages to the TDX module, tdh_mem_page_add() and
tdh_mem_page_aug() perform a CLFLUSH on the page mapped with keyID 0 to
ensure that any dirty cache lines don't write back later and clobber TD
memory or control structures. Don't worry about the other MK-TME keyIDs
because the kernel doesn't use them. The TDX docs specify that this flush
is not needed unless the TDX module exposes the CLFLUSH_BEFORE_ALLOC
feature bit. Do the CLFLUSH unconditionally for two reasons: make the
solution simpler by having a single path that can handle both
!CLFLUSH_BEFORE_ALLOC and CLFLUSH_BEFORE_ALLOC cases. Avoid wading into any
correctness uncertainty by going with a conservative solution to start.

Call tdh_mem_page_add() to add a private page to a TD during the TD's build
time (i.e., before TDH.MR.FINALIZE). Specify which GPA the 4K private page
will map to. No need to specify level info since TDH.MEM.PAGE.ADD only adds
pages at 4K level. To provide initial contents to TD, provide an additional
source page residing in memory managed by the host kernel itself (encrypted
with a shared keyID). The TDX module will copy the initial contents from
the source page in shared memory into the private page after mapping the
page in the SEPT to the specified private GPA. The TDX module allows the
source page to be the same page as the private page to be added. In that
case, the TDX module converts and encrypts the source page as a TD private
page.

Call tdh_mem_page_aug() to add a private page to a TD during the TD's
runtime (i.e., after TDH.MR.FINALIZE). TDH.MEM.PAGE.AUG supports adding
huge pages. Specify which GPA the private page will map to, along with
level info embedded in the lower bits of the GPA. The TDX module will
recognize the added page as the TD's private page after the TD's acceptance
with TDCALL TDG.MEM.PAGE.ACCEPT.

tdh_mem_page_add() and tdh_mem_page_aug() may fail. Callers can check
function return value and retrieve extended error info from the function
output parameters.

The TDX module has many internal locks. To avoid staying in SEAM mode for
too long, SEAMCALLs returns a BUSY error code to the kernel instead of
spinning on the locks. Depending on the specific SEAMCALL, the caller
may need to handle this error in specific ways (e.g., retry). Therefore,
return the SEAMCALL error code directly to the caller. Don't attempt to
handle it in the core kernel.

[Kai: Switched from generic seamcall export]
[Yan: Re-wrote the changelog]
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Message-ID: <20241112073636.22129-1-yan.y.zhao@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/tdx.h  |  2 ++
 arch/x86/virt/vmx/tdx/tdx.c | 39 +++++++++++++++++++++++++++++++++++++
 arch/x86/virt/vmx/tdx/tdx.h |  2 ++
 3 files changed, 43 insertions(+)

Comments

Edgecombe, Rick P Jan. 2, 2025, 11:32 p.m. UTC | #1
On Wed, 2025-01-01 at 02:49 -0500, Paolo Bonzini wrote:
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index a97a470dda23..f39197d4eafc 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -1491,6 +1491,26 @@ u64 tdh_mng_addcx(struct tdx_td *td, struct page *tdcs_page)
>  }
>  EXPORT_SYMBOL_GPL(tdh_mng_addcx);
>  
> +u64 tdh_mem_page_add(struct tdx_td *td, u64 gpa, u64 hpa, u64 source, u64 *rcx, u64 *rdx)
> +{

u64 gpa could be gfn_t.

hpa and source should be struct pages, per:
https://lore.kernel.org/kvm/d92e5301-9ca4-469a-8ae5-b36426e67356@intel.com/

> +	struct tdx_module_args args = {
> +		.rcx = gpa,

This could potentially also use union tdx_sept_gpa_mapping_info.

> +		.rdx = tdx_tdr_pa(td),
> +		.r8 = hpa,
> +		.r9 = source,
> +	};
> +	u64 ret;
> +
> +	clflush_cache_range(__va(hpa), PAGE_SIZE);
> +	ret = seamcall_ret(TDH_MEM_PAGE_ADD, &args);
> +
> +	*rcx = args.rcx;
> +	*rdx = args.rdx;

Similar to the last patch, these could be extended_err1, extended_err2.

> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(tdh_mem_page_add);
> +
>  u64 tdh_mem_sept_add(struct tdx_td *td, u64 gpa, u64 level, u64 hpa, u64 *rcx, u64 *rdx)
>  {
>  	struct tdx_module_args args = {
> @@ -1522,6 +1542,25 @@ u64 tdh_vp_addcx(struct tdx_vp *vp, struct page *tdcx_page)
>  }
>  EXPORT_SYMBOL_GPL(tdh_vp_addcx);
>  
> +u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, u64 hpa, u64 *rcx, u64 *rdx)
> +{

hpa should be struct page, or as Yan had been ready to propose a folio and idx.
I would have thought a struct page would be sufficient for now. She also planned
to add a level arg, which today should always be 4k, but would be needed for
future huge page support.

I think we should try to keep it as simple as possible for now.

> +	struct tdx_module_args args = {
> +		.rcx = gpa,
> +		.rdx = tdx_tdr_pa(td),
> +		.r8 = hpa,
> +	};
> +	u64 ret;
> +
> +	clflush_cache_range(__va(hpa), PAGE_SIZE);
> +	ret = seamcall_ret(TDH_MEM_PAGE_AUG, &args);
> +
> +	*rcx = args.rcx;
> +	*rdx = args.rdx;

Similar to the others, these could be extended_err1, extended_err2.

> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(tdh_mem_page_aug);
> +
>  u64 tdh_mng_key_config(struct tdx_td *td)
>  {
>  	struct tdx_module_args args = {
> diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
> index 308d3aa565d7..80e6ef006085 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.h
> +++ b/arch/x86/virt/vmx/tdx/tdx.h
> @@ -16,8 +16,10 @@
>   * TDX module SEAMCALL leaf functions
>   */
>  #define TDH_MNG_ADDCX			1
> +#define TDH_MEM_PAGE_ADD		2
>  #define TDH_MEM_SEPT_ADD		3
>  #define TDH_VP_ADDCX			4
> +#define TDH_MEM_PAGE_AUG		6
>  #define TDH_MNG_KEY_CONFIG		8
>  #define TDH_MNG_CREATE			9
>  #define TDH_MNG_RD			11
Yan Zhao Jan. 6, 2025, 5:50 a.m. UTC | #2
On Fri, Jan 03, 2025 at 07:32:46AM +0800, Edgecombe, Rick P wrote:
> > +u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, u64 hpa, u64 *rcx, u64 *rdx)
> > +{
> 
> hpa should be struct page, or as Yan had been ready to propose a folio and idx.
The consideration of folio + idx is to make sure the physical addresses for
"level" is contiguous, while allowing KVM to request mapping of a small page
contained in a huge folio.

> I would have thought a struct page would be sufficient for now. She also planned
> to add a level arg, which today should always be 4k, but would be needed for
> future huge page support.
Yes, in previous version, "level" is embedded in param "gpa" and implicitly set
to 0 by KVM.


The planned changes to tdh_mem_page_aug() are as follows:
- Use struct tdx_td instead of raw TDR u64.
- Use extended_err1/2 instead of rcx/rdx for output.
- Change "u64 gpa" to "gfn_t gfn".
- Use union tdx_sept_gpa_mapping_info to initialize args.rcx.
- Use "struct folio *" + "unsigned long folio_page_idx" instead of raw
  hpa for guest page in tdh_mem_page_aug() and fail if a page (huge
  or not) to aug is not contained in a single folio.
- Add a new param "level".
- Fail the wrapper if "level" is not 4K-1G.
- Call tdx_clflush_page() instead of clflush_cache_range() and loops
  tdx_clflush_page() for each 4k page in a huge page to aug.


+u64 tdh_mem_page_aug(struct tdx_td *td, gfn_t gfn, int level, struct folio *private_folio,
+                    unsigned long folio_page_idx, u64 *extended_err1, u64 *extended_err2)
+{
+       union tdx_sept_gpa_mapping_info gpa_info = { .level = level, .gfn = gfn, };
+       struct tdx_module_args args = {
+               .rcx = gpa_info.full,
+               .rdx = tdx_tdr_pa(td),
+               .r8 = page_to_phys(folio_page(private_folio, folio_page_idx)),
+       };
+       unsigned long nr_pages = 1 << (level * 9);
+       u64 ret;
+
+       if (!(level >= TDX_PS_4K && level < TDX_PS_NR) ||
+           (folio_page_idx + nr_pages > folio_nr_pages(private_folio)))
+               return -EINVAL;
+
+       while (nr_pages--)
+               tdx_clflush_page(folio_page(private_folio, folio_page_idx++));
+
+       ret = seamcall_ret(TDH_MEM_PAGE_AUG, &args);
+
+       *extended_err1 = args.rcx;
+       *extended_err2 = args.rdx;
+
+       return ret;
+}
+EXPORT_SYMBOL_GPL(tdh_mem_page_aug);

The corresponding changes in KVM:

static void tdx_unpin(struct kvm *kvm, kvm_pfn_t pfn)
 {
-       put_page(pfn_to_page(pfn));
+       folio_put(page_folio(pfn_to_page(pfn)));
 }
 
 static int tdx_mem_page_aug(struct kvm *kvm, gfn_t gfn,
                            enum pg_level level, kvm_pfn_t pfn)
 {
        struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
-       hpa_t tdr_pa = page_to_phys(kvm_tdx->td.tdr_page);
-       hpa_t hpa = pfn_to_hpa(pfn);
-       gpa_t gpa = gfn_to_gpa(gfn);
+       int tdx_level = pg_level_to_tdx_sept_level(level);
+       struct page *private_page = pfn_to_page(pfn);
        u64 entry, level_state;
        u64 err;
 
-       err = tdh_mem_page_aug(tdr_pa, gpa, hpa, &entry, &level_state);
+       err = tdh_mem_page_aug(&kvm_tdx->td, gfn, tdx_level, page_folio(private_page),
+                              folio_page_idx(page_folio(private_page), private_page),
+                              &entry, &level_state);
        if (unlikely(err & TDX_OPERAND_BUSY)) {
                tdx_unpin(kvm, pfn);
                return -EBUSY;
@@ -1620,9 +1621,9 @@ int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
         * migration.  Until guest_memfd supports page migration, prevent page
         * migration.
         * TODO: Once guest_memfd introduces callback on page migration,
-        * implement it and remove get_page/put_page().
+        * implement it and remove folio_get/folio_put().
         */
-       get_page(pfn_to_page(pfn));
+       folio_get(page_folio(pfn_to_page(pfn)));

> I think we should try to keep it as simple as possible for now.
Yeah.
So, do you think we need to have tdh_mem_page_aug() to support 4K level page
only and ask for Dave's review again for huge page?

Do we need to add param "level" ?
- if yes, "struct page" looks not fit.
- if not, hardcode it as 0 in the wrapper and convert "pfn" to "struct page"?
Edgecombe, Rick P Jan. 6, 2025, 7:21 p.m. UTC | #3
On Mon, 2025-01-06 at 13:50 +0800, Yan Zhao wrote:
> > I think we should try to keep it as simple as possible for now.
> Yeah.
> So, do you think we need to have tdh_mem_page_aug() to support 4K level page
> only and ask for Dave's review again for huge page?
> 
> Do we need to add param "level" ?
> - if yes, "struct page" looks not fit.
> - if not, hardcode it as 0 in the wrapper and convert "pfn" to "struct page"?

My thoughts would be we should export just what is needed for today to keep
things simple and speedy (skip level arg, support order 0 only), especially if
we can drop all folio checks. The SEAMCALL wrappers will not be set in stone and
it will be easier to review huge page required stuff in the context of already
settled 4k support.
Yan Zhao Jan. 7, 2025, 6:37 a.m. UTC | #4
On Tue, Jan 07, 2025 at 03:21:40AM +0800, Edgecombe, Rick P wrote:
> On Mon, 2025-01-06 at 13:50 +0800, Yan Zhao wrote:
> > > I think we should try to keep it as simple as possible for now.
> > Yeah.
> > So, do you think we need to have tdh_mem_page_aug() to support 4K level page
> > only and ask for Dave's review again for huge page?
> > 
> > Do we need to add param "level" ?
> > - if yes, "struct page" looks not fit.
> > - if not, hardcode it as 0 in the wrapper and convert "pfn" to "struct page"?
> 
> My thoughts would be we should export just what is needed for today to keep
> things simple and speedy (skip level arg, support order 0 only), especially if
> we can drop all folio checks. The SEAMCALL wrappers will not be set in stone and
> it will be easier to review huge page required stuff in the context of already
> settled 4k support.
Ok. Attached the new diff for tdh_mem_page_aug() to support 4K only.
Have compiled and tested in my local env.
(I kept the tdx_level in tdh_mem_range_block() and tdh_mem_page_remove() in
later patches).

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 787c359a5fc9..1db93e4886df 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -144,9 +144,14 @@ struct tdx_vp {
 };

 u64 tdh_mng_addcx(struct tdx_td *td, struct page *tdcs_page);
+u64 tdh_mem_page_add(struct tdx_td *td, gfn_t gfn, struct page *private_page,
+                    struct page *source_page, u64 *extended_err1, u64 *extended_err2);
 u64 tdh_mem_sept_add(struct tdx_td *td, gfn_t gfn, int tdx_level, struct page *sept_page,
                     u64 *extended_err1, u64 *extended_err2);
 u64 tdh_vp_addcx(struct tdx_vp *vp, struct page *tdcx_page);
+struct folio;
+u64 tdh_mem_page_aug(struct tdx_td *td, gfn_t gfn, struct page *private_page,
+                    u64 *extended_err1, u64 *extended_err2);
 u64 tdh_mng_key_config(struct tdx_td *td);
 u64 tdh_mng_create(struct tdx_td *td, u64 hkid);
 u64 tdh_vp_create(struct tdx_td *td, struct tdx_vp *vp);
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index adb2059b6b5f..cfedff43e1e0 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -1583,6 +1583,28 @@ u64 tdh_mng_addcx(struct tdx_td *td, struct page *tdcs_page)
 }
 EXPORT_SYMBOL_GPL(tdh_mng_addcx);

+u64 tdh_mem_page_add(struct tdx_td *td, gfn_t gfn, struct page *private_page,
+                    struct page *source_page, u64 *extended_err1, u64 *extended_err2)
+{
+       union tdx_sept_gpa_mapping_info gpa_info = { .level = 0, .gfn = gfn, };
+       struct tdx_module_args args = {
+               .rcx = gpa_info.full,
+               .rdx = tdx_tdr_pa(td),
+               .r8 = page_to_phys(private_page),
+               .r9 = page_to_phys(source_page),
+       };
+       u64 ret;
+
+       tdx_clflush_page(private_page);
+       ret = seamcall_ret(TDH_MEM_PAGE_ADD, &args);
+
+       *extended_err1 = args.rcx;
+       *extended_err2 = args.rdx;
+
+       return ret;
+}
+EXPORT_SYMBOL_GPL(tdh_mem_page_add);
+
 u64 tdh_mem_sept_add(struct tdx_td *td, gfn_t gfn, int tdx_level, struct page *sept_page,
                     u64 *extended_err1, u64 *extended_err2)
 {
@@ -1616,6 +1638,28 @@ u64 tdh_vp_addcx(struct tdx_vp *vp, struct page *tdcx_page)
 }
 EXPORT_SYMBOL_GPL(tdh_vp_addcx);

+u64 tdh_mem_page_aug(struct tdx_td *td, gfn_t gfn, struct page *private_page,
+                    u64 *extended_err1, u64 *extended_err2)
+{
+       union tdx_sept_gpa_mapping_info gpa_info = { .level = 0, .gfn = gfn, };
+       struct tdx_module_args args = {
+               .rcx = gpa_info.full,
+               .rdx = tdx_tdr_pa(td),
+               .r8 = page_to_phys(private_page),
+       };
+       u64 ret;
+
+       tdx_clflush_page(private_page);
+
+       ret = seamcall_ret(TDH_MEM_PAGE_AUG, &args);
+
+       *extended_err1 = args.rcx;
+       *extended_err2 = args.rdx;
+
+       return ret;
+}
+EXPORT_SYMBOL_GPL(tdh_mem_page_aug);
+
 u64 tdh_mng_key_config(struct tdx_td *td)
 {
        struct tdx_module_args args = {
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 0d1ba0d0ac82..8a56e790f64d 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -18,8 +18,10 @@
  * TDX module SEAMCALL leaf functions
  */
 #define TDH_MNG_ADDCX                  1
+#define TDH_MEM_PAGE_ADD               2
 #define TDH_MEM_SEPT_ADD               3
 #define TDH_VP_ADDCX                   4
+#define TDH_MEM_PAGE_AUG               6
 #define TDH_MNG_KEY_CONFIG             8
 #define TDH_MNG_CREATE                 9
 #define TDH_MNG_RD                     11
diff mbox series

Patch

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 9e0c60a41d5d..32f3981d56c5 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -138,8 +138,10 @@  struct tdx_vp {
 };
 
 u64 tdh_mng_addcx(struct tdx_td *td, struct page *tdcs_page);
+u64 tdh_mem_page_add(struct tdx_td *td, u64 gpa, u64 hpa, u64 source, u64 *rcx, u64 *rdx);
 u64 tdh_mem_sept_add(struct tdx_td *td, u64 gpa, u64 level, u64 hpa, u64 *rcx, u64 *rdx);
 u64 tdh_vp_addcx(struct tdx_vp *vp, struct page *tdcx_page);
+u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, u64 hpa, u64 *rcx, u64 *rdx);
 u64 tdh_mng_key_config(struct tdx_td *td);
 u64 tdh_mng_create(struct tdx_td *td, u64 hkid);
 u64 tdh_vp_create(struct tdx_td *td, struct tdx_vp *vp);
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index a97a470dda23..f39197d4eafc 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -1491,6 +1491,26 @@  u64 tdh_mng_addcx(struct tdx_td *td, struct page *tdcs_page)
 }
 EXPORT_SYMBOL_GPL(tdh_mng_addcx);
 
+u64 tdh_mem_page_add(struct tdx_td *td, u64 gpa, u64 hpa, u64 source, u64 *rcx, u64 *rdx)
+{
+	struct tdx_module_args args = {
+		.rcx = gpa,
+		.rdx = tdx_tdr_pa(td),
+		.r8 = hpa,
+		.r9 = source,
+	};
+	u64 ret;
+
+	clflush_cache_range(__va(hpa), PAGE_SIZE);
+	ret = seamcall_ret(TDH_MEM_PAGE_ADD, &args);
+
+	*rcx = args.rcx;
+	*rdx = args.rdx;
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(tdh_mem_page_add);
+
 u64 tdh_mem_sept_add(struct tdx_td *td, u64 gpa, u64 level, u64 hpa, u64 *rcx, u64 *rdx)
 {
 	struct tdx_module_args args = {
@@ -1522,6 +1542,25 @@  u64 tdh_vp_addcx(struct tdx_vp *vp, struct page *tdcx_page)
 }
 EXPORT_SYMBOL_GPL(tdh_vp_addcx);
 
+u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, u64 hpa, u64 *rcx, u64 *rdx)
+{
+	struct tdx_module_args args = {
+		.rcx = gpa,
+		.rdx = tdx_tdr_pa(td),
+		.r8 = hpa,
+	};
+	u64 ret;
+
+	clflush_cache_range(__va(hpa), PAGE_SIZE);
+	ret = seamcall_ret(TDH_MEM_PAGE_AUG, &args);
+
+	*rcx = args.rcx;
+	*rdx = args.rdx;
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(tdh_mem_page_aug);
+
 u64 tdh_mng_key_config(struct tdx_td *td)
 {
 	struct tdx_module_args args = {
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 308d3aa565d7..80e6ef006085 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -16,8 +16,10 @@ 
  * TDX module SEAMCALL leaf functions
  */
 #define TDH_MNG_ADDCX			1
+#define TDH_MEM_PAGE_ADD		2
 #define TDH_MEM_SEPT_ADD		3
 #define TDH_VP_ADDCX			4
+#define TDH_MEM_PAGE_AUG		6
 #define TDH_MNG_KEY_CONFIG		8
 #define TDH_MNG_CREATE			9
 #define TDH_MNG_RD			11