diff mbox series

[08/13] x86/virt/tdx: Add SEAMCALL wrappers to add TD private pages

Message ID 20250101074959.412696-9-pbonzini@redhat.com (mailing list archive)
State New, archived
Series x86/virt/tdx: Add SEAMCALL wrappers for KVM

Commit Message

Paolo Bonzini Jan. 1, 2025, 7:49 a.m. UTC
From: Isaku Yamahata <isaku.yamahata@intel.com>

The TDX architecture introduces the concept of private vs. shared GPAs,
distinguished by the GPA.SHARED bit. The TDX module maintains a Secure EPT
(S-EPT or SEPT) tree per TD to translate the TD's private memory accessed
using a private GPA. Wrap the SEAMCALL TDH.MEM.PAGE.ADD with
tdh_mem_page_add() and TDH.MEM.PAGE.AUG with tdh_mem_page_aug() to add TD
private pages and map them to the TD's private GPAs in the SEPT.

Callers of tdh_mem_page_add() and tdh_mem_page_aug() allocate and provide
normal pages to the wrappers, which pass those pages on to the TDX
module. Before passing the pages to the TDX module, tdh_mem_page_add() and
tdh_mem_page_aug() perform a CLFLUSH on the page mapped with keyID 0 to
ensure that any dirty cache lines aren't written back later, clobbering TD
memory or control structures. Don't worry about the other MK-TME keyIDs
because the kernel doesn't use them. The TDX docs specify that this flush
is not needed unless the TDX module exposes the CLFLUSH_BEFORE_ALLOC
feature bit. Do the CLFLUSH unconditionally for two reasons: it keeps the
solution simple by having a single path that handles both the
!CLFLUSH_BEFORE_ALLOC and CLFLUSH_BEFORE_ALLOC cases, and it avoids wading
into any correctness uncertainty by starting with a conservative solution.

Call tdh_mem_page_add() to add a private page to a TD during the TD's build
time (i.e., before TDH.MR.FINALIZE). Specify which GPA the 4K private page
will map to. There's no need to specify level info since TDH.MEM.PAGE.ADD
only adds pages at the 4K level. To provide initial contents to the TD,
provide an additional source page residing in memory managed by the host
kernel itself (encrypted with a shared keyID). The TDX module will copy the
initial contents from the source page in shared memory into the private
page after mapping the page in the SEPT to the specified private GPA. The
TDX module allows the source page to be the same page as the private page
to be added. In that case, the TDX module converts and encrypts the source
page as a TD private page.

Call tdh_mem_page_aug() to add a private page to a TD during the TD's
runtime (i.e., after TDH.MR.FINALIZE). TDH.MEM.PAGE.AUG supports adding
huge pages. Specify which GPA the private page will map to, along with
level info embedded in the lower bits of the GPA. The TDX module will
recognize the added page as the TD's private page only after the TD accepts
it with the TDCALL TDG.MEM.PAGE.ACCEPT.

tdh_mem_page_add() and tdh_mem_page_aug() may fail. Callers can check the
function's return value and retrieve extended error info from the function's
output parameters.

The TDX module has many internal locks. To avoid staying in SEAM mode for
too long, SEAMCALLs return a BUSY error code to the kernel instead of
spinning on the locks. Depending on the specific SEAMCALL, the caller
may need to handle this error in specific ways (e.g., retry). Therefore,
return the SEAMCALL error code directly to the caller. Don't attempt to
handle it in the core kernel.

[Kai: Switched from generic seamcall export]
[Yan: Re-wrote the changelog]
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Message-ID: <20241112073636.22129-1-yan.y.zhao@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/tdx.h  |  2 ++
 arch/x86/virt/vmx/tdx/tdx.c | 39 +++++++++++++++++++++++++++++++++++++
 arch/x86/virt/vmx/tdx/tdx.h |  2 ++
 3 files changed, 43 insertions(+)

Comments

Edgecombe, Rick P Jan. 2, 2025, 11:32 p.m. UTC | #1
On Wed, 2025-01-01 at 02:49 -0500, Paolo Bonzini wrote:
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index a97a470dda23..f39197d4eafc 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -1491,6 +1491,26 @@ u64 tdh_mng_addcx(struct tdx_td *td, struct page *tdcs_page)
>  }
>  EXPORT_SYMBOL_GPL(tdh_mng_addcx);
>  
> +u64 tdh_mem_page_add(struct tdx_td *td, u64 gpa, u64 hpa, u64 source, u64 *rcx, u64 *rdx)
> +{

u64 gpa could be gfn_t.

hpa and source should be struct pages, per:
https://lore.kernel.org/kvm/d92e5301-9ca4-469a-8ae5-b36426e67356@intel.com/

> +	struct tdx_module_args args = {
> +		.rcx = gpa,

This could potentially also use union tdx_sept_gpa_mapping_info.

> +		.rdx = tdx_tdr_pa(td),
> +		.r8 = hpa,
> +		.r9 = source,
> +	};
> +	u64 ret;
> +
> +	clflush_cache_range(__va(hpa), PAGE_SIZE);
> +	ret = seamcall_ret(TDH_MEM_PAGE_ADD, &args);
> +
> +	*rcx = args.rcx;
> +	*rdx = args.rdx;

Similar to the last patch, these could be extended_err1, extended_err2.

> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(tdh_mem_page_add);
> +
>  u64 tdh_mem_sept_add(struct tdx_td *td, u64 gpa, u64 level, u64 hpa, u64 *rcx, u64 *rdx)
>  {
>  	struct tdx_module_args args = {
> @@ -1522,6 +1542,25 @@ u64 tdh_vp_addcx(struct tdx_vp *vp, struct page *tdcx_page)
>  }
>  EXPORT_SYMBOL_GPL(tdh_vp_addcx);
>  
> +u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, u64 hpa, u64 *rcx, u64 *rdx)
> +{

hpa should be struct page, or as Yan had been ready to propose a folio and idx.
I would have thought a struct page would be sufficient for now. She also planned
to add a level arg, which today should always be 4k, but would be needed for
future huge page support.

I think we should try to keep it as simple as possible for now.

> +	struct tdx_module_args args = {
> +		.rcx = gpa,
> +		.rdx = tdx_tdr_pa(td),
> +		.r8 = hpa,
> +	};
> +	u64 ret;
> +
> +	clflush_cache_range(__va(hpa), PAGE_SIZE);
> +	ret = seamcall_ret(TDH_MEM_PAGE_AUG, &args);
> +
> +	*rcx = args.rcx;
> +	*rdx = args.rdx;

Similar to the others, these could be extended_err1, extended_err2.

> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(tdh_mem_page_aug);
> +
>  u64 tdh_mng_key_config(struct tdx_td *td)
>  {
>  	struct tdx_module_args args = {
> diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
> index 308d3aa565d7..80e6ef006085 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.h
> +++ b/arch/x86/virt/vmx/tdx/tdx.h
> @@ -16,8 +16,10 @@
>   * TDX module SEAMCALL leaf functions
>   */
>  #define TDH_MNG_ADDCX			1
> +#define TDH_MEM_PAGE_ADD		2
>  #define TDH_MEM_SEPT_ADD		3
>  #define TDH_VP_ADDCX			4
> +#define TDH_MEM_PAGE_AUG		6
>  #define TDH_MNG_KEY_CONFIG		8
>  #define TDH_MNG_CREATE			9
>  #define TDH_MNG_RD			11
Yan Zhao Jan. 6, 2025, 5:50 a.m. UTC | #2
On Fri, Jan 03, 2025 at 07:32:46AM +0800, Edgecombe, Rick P wrote:
> > +u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, u64 hpa, u64 *rcx, u64 *rdx)
> > +{
> 
> hpa should be struct page, or as Yan had been ready to propose a folio and idx.
The consideration of folio + idx is to make sure the physical addresses for
"level" are contiguous, while allowing KVM to request mapping of a small page
contained in a huge folio.

> I would have thought a struct page would be sufficient for now. She also planned
> to add a level arg, which today should always be 4k, but would be needed for
> future huge page support.
Yes, in the previous version, "level" was embedded in the "gpa" param and
implicitly set to 0 by KVM.


The planned changes to tdh_mem_page_aug() are as follows:
- Use struct tdx_td instead of raw TDR u64.
- Use extended_err1/2 instead of rcx/rdx for output.
- Change "u64 gpa" to "gfn_t gfn".
- Use union tdx_sept_gpa_mapping_info to initialize args.rcx.
- Use "struct folio *" + "unsigned long folio_page_idx" instead of raw
  hpa for guest page in tdh_mem_page_aug() and fail if a page (huge
  or not) to aug is not contained in a single folio.
- Add a new param "level".
- Fail the wrapper if "level" is not 4K-1G.
- Call tdx_clflush_page() instead of clflush_cache_range(), looping over
  tdx_clflush_page() for each 4K page in a huge page to aug.


+u64 tdh_mem_page_aug(struct tdx_td *td, gfn_t gfn, int level, struct folio *private_folio,
+                    unsigned long folio_page_idx, u64 *extended_err1, u64 *extended_err2)
+{
+       union tdx_sept_gpa_mapping_info gpa_info = { .level = level, .gfn = gfn, };
+       struct tdx_module_args args = {
+               .rcx = gpa_info.full,
+               .rdx = tdx_tdr_pa(td),
+               .r8 = page_to_phys(folio_page(private_folio, folio_page_idx)),
+       };
+       unsigned long nr_pages = 1 << (level * 9);
+       u64 ret;
+
+       if (!(level >= TDX_PS_4K && level < TDX_PS_NR) ||
+           (folio_page_idx + nr_pages > folio_nr_pages(private_folio)))
+               return -EINVAL;
+
+       while (nr_pages--)
+               tdx_clflush_page(folio_page(private_folio, folio_page_idx++));
+
+       ret = seamcall_ret(TDH_MEM_PAGE_AUG, &args);
+
+       *extended_err1 = args.rcx;
+       *extended_err2 = args.rdx;
+
+       return ret;
+}
+EXPORT_SYMBOL_GPL(tdh_mem_page_aug);

The corresponding changes in KVM:

static void tdx_unpin(struct kvm *kvm, kvm_pfn_t pfn)
 {
-       put_page(pfn_to_page(pfn));
+       folio_put(page_folio(pfn_to_page(pfn)));
 }
 
 static int tdx_mem_page_aug(struct kvm *kvm, gfn_t gfn,
                            enum pg_level level, kvm_pfn_t pfn)
 {
        struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
-       hpa_t tdr_pa = page_to_phys(kvm_tdx->td.tdr_page);
-       hpa_t hpa = pfn_to_hpa(pfn);
-       gpa_t gpa = gfn_to_gpa(gfn);
+       int tdx_level = pg_level_to_tdx_sept_level(level);
+       struct page *private_page = pfn_to_page(pfn);
        u64 entry, level_state;
        u64 err;
 
-       err = tdh_mem_page_aug(tdr_pa, gpa, hpa, &entry, &level_state);
+       err = tdh_mem_page_aug(&kvm_tdx->td, gfn, tdx_level, page_folio(private_page),
+                              folio_page_idx(page_folio(private_page), private_page),
+                              &entry, &level_state);
        if (unlikely(err & TDX_OPERAND_BUSY)) {
                tdx_unpin(kvm, pfn);
                return -EBUSY;
@@ -1620,9 +1621,9 @@ int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
         * migration.  Until guest_memfd supports page migration, prevent page
         * migration.
         * TODO: Once guest_memfd introduces callback on page migration,
-        * implement it and remove get_page/put_page().
+        * implement it and remove folio_get/folio_put().
         */
-       get_page(pfn_to_page(pfn));
+       folio_get(page_folio(pfn_to_page(pfn)));

> I think we should try to keep it as simple as possible for now.
Yeah.
So, do you think we need to have tdh_mem_page_aug() to support 4K level page
only and ask for Dave's review again for huge page?

Do we need to add param "level" ?
- if yes, "struct page" looks not fit.
- if not, hardcode it as 0 in the wrapper and convert "pfn" to "struct page"?
Edgecombe, Rick P Jan. 6, 2025, 7:21 p.m. UTC | #3
On Mon, 2025-01-06 at 13:50 +0800, Yan Zhao wrote:
> > I think we should try to keep it as simple as possible for now.
> Yeah.
> So, do you think we need to have tdh_mem_page_aug() to support 4K level page
> only and ask for Dave's review again for huge page?
> 
> Do we need to add param "level" ?
> - if yes, "struct page" looks not fit.
> - if not, hardcode it as 0 in the wrapper and convert "pfn" to "struct page"?

My thoughts would be we should export just what is needed for today to keep
things simple and speedy (skip level arg, support order 0 only), especially if
we can drop all folio checks. The SEAMCALL wrappers will not be set in stone and
it will be easier to review huge page required stuff in the context of already
settled 4k support.
Yan Zhao Jan. 7, 2025, 6:37 a.m. UTC | #4
On Tue, Jan 07, 2025 at 03:21:40AM +0800, Edgecombe, Rick P wrote:
> On Mon, 2025-01-06 at 13:50 +0800, Yan Zhao wrote:
> > > I think we should try to keep it as simple as possible for now.
> > Yeah.
> > So, do you think we need to have tdh_mem_page_aug() to support 4K level page
> > only and ask for Dave's review again for huge page?
> > 
> > Do we need to add param "level" ?
> > - if yes, "struct page" looks not fit.
> > - if not, hardcode it as 0 in the wrapper and convert "pfn" to "struct page"?
> 
> My thoughts would be we should export just what is needed for today to keep
> things simple and speedy (skip level arg, support order 0 only), especially if
> we can drop all folio checks. The SEAMCALL wrappers will not be set in stone and
> it will be easier to review huge page required stuff in the context of already
> settled 4k support.
Ok. Attached the new diff for tdh_mem_page_aug() to support 4K only.
Have compiled and tested in my local env.
(I kept the tdx_level in tdh_mem_range_block() and tdh_mem_page_remove() in
later patches).

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 787c359a5fc9..1db93e4886df 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -144,9 +144,14 @@ struct tdx_vp {
 };

 u64 tdh_mng_addcx(struct tdx_td *td, struct page *tdcs_page);
+u64 tdh_mem_page_add(struct tdx_td *td, gfn_t gfn, struct page *private_page,
+                    struct page *source_page, u64 *extended_err1, u64 *extended_err2);
 u64 tdh_mem_sept_add(struct tdx_td *td, gfn_t gfn, int tdx_level, struct page *sept_page,
                     u64 *extended_err1, u64 *extended_err2);
 u64 tdh_vp_addcx(struct tdx_vp *vp, struct page *tdcx_page);
+struct folio;
+u64 tdh_mem_page_aug(struct tdx_td *td, gfn_t gfn, struct page *private_page,
+                    u64 *extended_err1, u64 *extended_err2);
 u64 tdh_mng_key_config(struct tdx_td *td);
 u64 tdh_mng_create(struct tdx_td *td, u64 hkid);
 u64 tdh_vp_create(struct tdx_td *td, struct tdx_vp *vp);
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index adb2059b6b5f..cfedff43e1e0 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -1583,6 +1583,28 @@ u64 tdh_mng_addcx(struct tdx_td *td, struct page *tdcs_page)
 }
 EXPORT_SYMBOL_GPL(tdh_mng_addcx);

+u64 tdh_mem_page_add(struct tdx_td *td, gfn_t gfn, struct page *private_page,
+                    struct page *source_page, u64 *extended_err1, u64 *extended_err2)
+{
+       union tdx_sept_gpa_mapping_info gpa_info = { .level = 0, .gfn = gfn, };
+       struct tdx_module_args args = {
+               .rcx = gpa_info.full,
+               .rdx = tdx_tdr_pa(td),
+               .r8 = page_to_phys(private_page),
+               .r9 = page_to_phys(source_page),
+       };
+       u64 ret;
+
+       tdx_clflush_page(private_page);
+       ret = seamcall_ret(TDH_MEM_PAGE_ADD, &args);
+
+       *extended_err1 = args.rcx;
+       *extended_err2 = args.rdx;
+
+       return ret;
+}
+EXPORT_SYMBOL_GPL(tdh_mem_page_add);
+
 u64 tdh_mem_sept_add(struct tdx_td *td, gfn_t gfn, int tdx_level, struct page *sept_page,
                     u64 *extended_err1, u64 *extended_err2)
 {
@@ -1616,6 +1638,28 @@ u64 tdh_vp_addcx(struct tdx_vp *vp, struct page *tdcx_page)
 }
 EXPORT_SYMBOL_GPL(tdh_vp_addcx);

+u64 tdh_mem_page_aug(struct tdx_td *td, gfn_t gfn, struct page *private_page,
+                    u64 *extended_err1, u64 *extended_err2)
+{
+       union tdx_sept_gpa_mapping_info gpa_info = { .level = 0, .gfn = gfn, };
+       struct tdx_module_args args = {
+               .rcx = gpa_info.full,
+               .rdx = tdx_tdr_pa(td),
+               .r8 = page_to_phys(private_page),
+       };
+       u64 ret;
+
+       tdx_clflush_page(private_page);
+
+       ret = seamcall_ret(TDH_MEM_PAGE_AUG, &args);
+
+       *extended_err1 = args.rcx;
+       *extended_err2 = args.rdx;
+
+       return ret;
+}
+EXPORT_SYMBOL_GPL(tdh_mem_page_aug);
+
 u64 tdh_mng_key_config(struct tdx_td *td)
 {
        struct tdx_module_args args = {
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 0d1ba0d0ac82..8a56e790f64d 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -18,8 +18,10 @@
  * TDX module SEAMCALL leaf functions
  */
 #define TDH_MNG_ADDCX                  1
+#define TDH_MEM_PAGE_ADD               2
 #define TDH_MEM_SEPT_ADD               3
 #define TDH_VP_ADDCX                   4
+#define TDH_MEM_PAGE_AUG               6
 #define TDH_MNG_KEY_CONFIG             8
 #define TDH_MNG_CREATE                 9
 #define TDH_MNG_RD                     11
Paolo Bonzini Jan. 14, 2025, 11:32 p.m. UTC | #5
On 1/6/25 06:50, Yan Zhao wrote:
> Yeah.
> So, do you think we need to have tdh_mem_page_aug() to support 4K level page
> only and ask for Dave's review again for huge page?

You're right that TDH.MEM.PAGE.AUG is basically the only case in which a 
struct folio is involved; on the other hand that also means that the 
arch/x86/virt part of large page support will be tiny and I don't think 
it will be a problem to review it again (for either Dave or myself).

> Do we need to add param "level" ?
> - if yes, "struct page" looks not fit.

Maybe, but I think adding folio knowledge now would be a bit too 
hypothetical.

> - if not, hardcode it as 0 in the wrapper and convert "pfn" to "struct page"?

I think it makes sense to add "int level" now everywhere, even if it is 
just to match the SEPT API and to have the same style for computing the 
SEAMCALL arguments.  I'd rather keep the arguments simple with just "gpa 
| level" (i.e. gpa/level instead of gfn/level) as the computation: 
that's because gpa is more obviously a u64.

I've pushed to kvm-coco-queue; if you have some time to double check 
what I did that's great, otherwise if I don't hear from you I'll post 
around noon European time the v3 of this series.

I have also asked Amazon, since they use KVM without struct page, 
whether it is a problem to have struct page pervasively in the API and 
they don't care.

Paolo
Edgecombe, Rick P Jan. 15, 2025, 12:49 a.m. UTC | #6
On Wed, 2025-01-15 at 00:32 +0100, Paolo Bonzini wrote:
> I've pushed to kvm-coco-queue; if you have some time to double check 
> what I did that's great, otherwise if I don't hear from you I'll post 
> around noon European time the v3 of this series.

Level handling and gpa encoding looks fine to me.

I get a build error:

In file included from /home/rpedgeco/repos/linux/include/asm-
generic/memory_model.h:5,
                 from arch/x86/include/asm/page.h:89,
                 from linux/arch/x86/include/asm/processor.h:20,
                 from linux/arch/x86/include/asm/timex.h:5,
                 from linux/include/linux/timex.h:67,
                 from linux/include/linux/clocksource.h:13,
                 from linux/include/linux/clockchips.h:14,
                 from linux/arch/x86/kernel/i8253.c:6:
linux/arch/x86/include/asm/tdx.h: In function ‘mk_keyed_paddr’:
linux/include/asm-generic/memory_model.h:38:58: error: ‘vmemmap’ undeclared
(first use in this function); did you mean ‘mem_map’?
   38 | #define __page_to_pfn(page)     (unsigned long)((page) - vmemmap)
      |                                                          ^~~~~~~

...and needed this:

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 201f2e910411..a8a6fbd7bf71 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -35,6 +35,7 @@
 
 #include <uapi/asm/mce.h>
 #include <asm/tdx_global_metadata.h>
+#include <linux/pgtable.h>
 
 /*
  * Used by the #VE exception handler to gather the #VE exception


Nit, whitespace errors:

+static inline u64 mk_keyed_paddr(u16 hkid, struct page *page)
+{
+       u64 ret;
+
+       ret = page_to_phys(page);
+       /* KeyID bits are just above the physical address bits: */
+       ret |= hkid << boot_cpu_data.x86_phys_bits;
+       
  ^ extra tab here
+       return ret;
+}
+

+static inline int pg_level_to_tdx_sept_level(enum pg_level level)
+{
+        WARN_ON_ONCE(level == PG_LEVEL_NONE);
+        return level - 1;
+ ^ spaces instead of tabs
+
+}

Lastly, TDs are not booting for me, with a QEMU error. Still debugging this.
Edgecombe, Rick P Jan. 15, 2025, 2:02 a.m. UTC | #7
On Tue, 2025-01-14 at 16:49 -0800, Rick Edgecombe wrote:
> Lastly, TDs are not booting for me, with a QEMU error. Still debugging this.

I was able to boot kvm-coco-queue with the below two fixes. I'll glue the TDX
tests back on tomorrow and test further. Regarding the mk_keyed_paddr()
cast/shift issue, Dave had expressed a preference for int over u16 for keyids:
https://lore.kernel.org/kvm/3a32ce4a-b108-4f06-a22d-14e9c2e135f7@intel.com/


diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 201f2e910411..e83f4bac6e9a 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -149,7 +149,7 @@ static inline u64 mk_keyed_paddr(u16 hkid, struct page
*page)
 
        ret = page_to_phys(page);
        /* KeyID bits are just above the physical address bits: */
-       ret |= hkid << boot_cpu_data.x86_phys_bits;
+       ret |= (u64)hkid << boot_cpu_data.x86_phys_bits;
        
        return ret;
 }
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index df6928a62e2d..307b6ee083d0 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -2307,7 +2307,7 @@ static int __tdx_td_init(struct kvm *kvm, struct td_params
*td_params,
        struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
        cpumask_var_t packages;
        struct page **tdcs_pages = NULL;
-       struct page *page, *tdr_page;
+       struct page *tdr_page;
        int ret, i;
        u64 err, rcx;
 
@@ -2333,7 +2333,7 @@ static int __tdx_td_init(struct kvm *kvm, struct td_params
*td_params,
 
        for (i = 0; i < kvm_tdx->td.tdcs_nr_pages; i++) {
                tdcs_pages[i] = alloc_page(GFP_KERNEL);
-               if (!page)
+               if (!tdcs_pages[i])
                        goto free_tdcs;
        }
Yan Zhao Jan. 15, 2025, 5:49 a.m. UTC | #8
On Wed, Jan 15, 2025 at 12:32:24AM +0100, Paolo Bonzini wrote:
> On 1/6/25 06:50, Yan Zhao wrote:
> > Yeah.
> > So, do you think we need to have tdh_mem_page_aug() to support 4K level page
> > only and ask for Dave's review again for huge page?
> 
> You're right that TDH.MEM.PAGE.AUG is basically the only case in which a
> struct folio is involved; on the other hand that also means that the
> arch/x86/virt part of large page support will be tiny and I don't think it
> will be a problem to review it again (for either Dave or myself).
> 
> > Do we need to add param "level" ?
> > - if yes, "struct page" looks not fit.
> 
> Maybe, but I think adding folio knowledge now would be a bit too
> hypothetical.
> 
> > - if not, hardcode it as 0 in the wrapper and convert "pfn" to "struct page"?
> 
> I think it makes sense to add "int level" now everywhere, even if it is just
> to match the SEPT API and to have the same style for computing the SEAMCALL
> arguments.  I'd rather keep the arguments simple with just "gpa | level"
> (i.e. gpa/level instead of gfn/level) as the computation: that's because gpa
> is more obviously a u64.
> 
> I've pushed to kvm-coco-queue; if you have some time to double check what I
> did that's great, otherwise if I don't hear from you I'll post around noon
> European time the v3 of this series.
For tdh_mem_sept_add(), tdh_mem_page_aug(), tdh_mem_page_add()
- Use tdx_clflush_page() instead of invoking clflush_cache_range() directly to
  share the common comment of tdx_clflush_page().
- Prefer page_to_phys() over page_to_pfn()?
  https://lore.kernel.org/kvm/0070a616-5233-4a8d-8797-eb9f182f074d@intel.com/


4d0824a1daba ("x86/virt/tdx: Add SEAMCALL wrappers to add TD private pages")
contains unexpected changes to tdh_mem_sept_add().
u64 tdh_mem_sept_add(struct tdx_td *td, u64 gpa, int level, struct page *page, u64 *ext_err1, u64 *ext_err2)
   {
  +       u64 hpa = page_to_pfn(page) << PAGE_SHIFT;
          struct tdx_module_args args = {
                  .rcx = gpa | level,
                  .rdx = tdx_tdr_pa(td),
  -               .r8 = page_to_pfn(page) << PAGE_SHIFT,
  +               .r8 = hpa,
          };
          u64 ret;

  -       clflush_cache_range(page_to_virt(page), PAGE_SIZE);
  +       clflush_cache_range(__va(hpa), PAGE_SIZE);
          ret = seamcall_ret(TDH_MEM_SEPT_ADD, &args);

          *ext_err1 = args.rcx;
  @@ -1522,6 +1544,26 @@ u64 tdh_vp_addcx(struct tdx_vp *vp, struct page *tdcx_page)
   }
   EXPORT_SYMBOL_GPL(tdh_vp_addcx);

> 
> I have also asked Amazon, since they use KVM without struct page, whether it
> is a problem to have struct page pervasively in the API and they don't care.
> 
> Paolo
>

Patch

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 9e0c60a41d5d..32f3981d56c5 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -138,8 +138,10 @@  struct tdx_vp {
 };
 
 u64 tdh_mng_addcx(struct tdx_td *td, struct page *tdcs_page);
+u64 tdh_mem_page_add(struct tdx_td *td, u64 gpa, u64 hpa, u64 source, u64 *rcx, u64 *rdx);
 u64 tdh_mem_sept_add(struct tdx_td *td, u64 gpa, u64 level, u64 hpa, u64 *rcx, u64 *rdx);
 u64 tdh_vp_addcx(struct tdx_vp *vp, struct page *tdcx_page);
+u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, u64 hpa, u64 *rcx, u64 *rdx);
 u64 tdh_mng_key_config(struct tdx_td *td);
 u64 tdh_mng_create(struct tdx_td *td, u64 hkid);
 u64 tdh_vp_create(struct tdx_td *td, struct tdx_vp *vp);
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index a97a470dda23..f39197d4eafc 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -1491,6 +1491,26 @@  u64 tdh_mng_addcx(struct tdx_td *td, struct page *tdcs_page)
 }
 EXPORT_SYMBOL_GPL(tdh_mng_addcx);
 
+u64 tdh_mem_page_add(struct tdx_td *td, u64 gpa, u64 hpa, u64 source, u64 *rcx, u64 *rdx)
+{
+	struct tdx_module_args args = {
+		.rcx = gpa,
+		.rdx = tdx_tdr_pa(td),
+		.r8 = hpa,
+		.r9 = source,
+	};
+	u64 ret;
+
+	clflush_cache_range(__va(hpa), PAGE_SIZE);
+	ret = seamcall_ret(TDH_MEM_PAGE_ADD, &args);
+
+	*rcx = args.rcx;
+	*rdx = args.rdx;
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(tdh_mem_page_add);
+
 u64 tdh_mem_sept_add(struct tdx_td *td, u64 gpa, u64 level, u64 hpa, u64 *rcx, u64 *rdx)
 {
 	struct tdx_module_args args = {
@@ -1522,6 +1542,25 @@  u64 tdh_vp_addcx(struct tdx_vp *vp, struct page *tdcx_page)
 }
 EXPORT_SYMBOL_GPL(tdh_vp_addcx);
 
+u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, u64 hpa, u64 *rcx, u64 *rdx)
+{
+	struct tdx_module_args args = {
+		.rcx = gpa,
+		.rdx = tdx_tdr_pa(td),
+		.r8 = hpa,
+	};
+	u64 ret;
+
+	clflush_cache_range(__va(hpa), PAGE_SIZE);
+	ret = seamcall_ret(TDH_MEM_PAGE_AUG, &args);
+
+	*rcx = args.rcx;
+	*rdx = args.rdx;
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(tdh_mem_page_aug);
+
 u64 tdh_mng_key_config(struct tdx_td *td)
 {
 	struct tdx_module_args args = {
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 308d3aa565d7..80e6ef006085 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -16,8 +16,10 @@ 
  * TDX module SEAMCALL leaf functions
  */
 #define TDH_MNG_ADDCX			1
+#define TDH_MEM_PAGE_ADD		2
 #define TDH_MEM_SEPT_ADD		3
 #define TDH_VP_ADDCX			4
+#define TDH_MEM_PAGE_AUG		6
 #define TDH_MNG_KEY_CONFIG		8
 #define TDH_MNG_CREATE			9
 #define TDH_MNG_RD			11