
[16/21] KVM: TDX: Premap initial guest memory

Message ID 20240904030751.117579-17-rick.p.edgecombe@intel.com (mailing list archive)
State New, archived
Series TDX MMU Part 2

Commit Message

Edgecombe, Rick P Sept. 4, 2024, 3:07 a.m. UTC
From: Isaku Yamahata <isaku.yamahata@intel.com>

Update TDX's set_external_spte() hook to record the pre-mapping count
instead of doing nothing and returning when the TD is not finalized.

TDX uses the KVM_TDX_INIT_MEM_REGION ioctl to initialize its initial guest
memory. This ioctl calls kvm_gmem_populate() to get guest pages, and in
tdx_gmem_post_populate() it will:
(1) Map page table pages into the KVM mirror page table and the private EPT.
(2) Map guest pages into the KVM mirror page table. In the propagation hook,
    only record the pre-mapping count without mapping the guest page into the
    private EPT.
(3) Map guest pages into the private EPT and decrease the pre-mapping count.

Do not map guest pages into the private EPT directly in step (2), because
TDX requires TDH.MEM.PAGE.ADD() to add a guest page before the TD is
finalized, and TDH.MEM.PAGE.ADD() copies page content from a user-provided
source page into the target guest page being added. That source page is not
available via the common interface kvm_tdp_map_page() used in step (2).

Therefore, only pre-map the guest page into the KVM mirror page table and
record the pre-mapping count in TDX's propagation hook. The pre-mapping
count is decreased in the KVM_TDX_INIT_MEM_REGION ioctl when the guest page
is mapped into the private EPT.
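
The overall flow, sketched roughly in C (hypothetical and simplified; the
function name, the signatures assumed for kvm_tdp_map_page() and
tdh_mem_page_add(), and the error code passed to kvm_tdp_map_page() are
assumptions, and the actual tdx_gmem_post_populate() in this series takes
different arguments and does more error handling):

/*
 * Hypothetical, simplified sketch of the pre-map + add flow described
 * above. Signatures and the error code are assumptions; the actual
 * tdx_gmem_post_populate() differs.
 */
static int tdx_premap_then_add(struct kvm_vcpu *vcpu, gfn_t gfn,
			       kvm_pfn_t pfn, void *src_page)
{
	struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
	u64 entry, level_state, err;
	u8 level = PG_LEVEL_4K;
	int ret;

	/*
	 * Step (2): pre-map the guest page into the KVM mirror page table.
	 * With the TD not yet finalized, the set_external_spte() hook only
	 * increments nr_premapped; no private EPT mapping is installed.
	 */
	ret = kvm_tdp_map_page(vcpu, gfn_to_gpa(gfn), PFERR_WRITE_MASK,
			       &level);
	if (ret)
		return ret;

	/*
	 * Step (3): add the page to the private EPT with TDH.MEM.PAGE.ADD,
	 * which copies the contents of src_page into the guest page, then
	 * drop the pre-mapping count taken in step (2).
	 */
	err = tdh_mem_page_add(kvm_tdx, gfn_to_gpa(gfn), pfn_to_hpa(pfn),
			       __pa(src_page), &entry, &level_state);
	if (err)
		return -EIO;

	atomic64_dec(&kvm_tdx->nr_premapped);
	return 0;
}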

Co-developed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
TDX MMU part 2 v1:
 - Update the code comment and patch log according to the latest gmem update.
   https://lore.kernel.org/kvm/CABgObfa=a3cKcKJHQRrCs-3Ty8ppSRou=dhi6Q+KdZnom0Zegw@mail.gmail.com/
 - Rename tdx_mem_page_add() to tdx_mem_page_record_premap_cnt() to avoid
   confusion.
 - Change the patch title to "KVM: TDX: Premap initial guest memory".
 - Rename KVM_MEMORY_MAPPING => KVM_MAP_MEMORY (Sean)
 - Drop issuing TDH.MEM.PAGE.ADD() on KVM_MAP_MEMORY(), defer it to
   KVM_TDX_INIT_MEM_REGION. (Sean)
 - Added nr_premapped to track the number of premapped pages
 - Drop tdx_post_mmu_map_page().

v19:
 - Switched to use KVM_MEMORY_MAPPING
 - Dropped measurement extension
 - Updated commit message: private_page_add() => set_private_spte()
---
 arch/x86/kvm/vmx/tdx.c | 40 +++++++++++++++++++++++++++++++++-------
 arch/x86/kvm/vmx/tdx.h |  2 +-
 2 files changed, 34 insertions(+), 8 deletions(-)

Comments

Paolo Bonzini Sept. 10, 2024, 10:24 a.m. UTC | #1
On 9/4/24 05:07, Rick Edgecombe wrote:
> +static int tdx_mem_page_record_premap_cnt(struct kvm *kvm, gfn_t gfn,
> +					  enum pg_level level, kvm_pfn_t pfn)
> +{
> +	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> +
> +	/* Returning error here to let TDP MMU bail out early. */
> +	if (KVM_BUG_ON(level != PG_LEVEL_4K, kvm)) {
> +		tdx_unpin(kvm, pfn);
> +		return -EINVAL;
> +	}

Should this "if" already be part of patch 14, and in 
tdx_sept_set_private_spte() rather than tdx_mem_page_record_premap_cnt()?

Thanks,

Paolo
Paolo Bonzini Sept. 10, 2024, 10:49 a.m. UTC | #2
On 9/4/24 05:07, Rick Edgecombe wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
> 
> Update TDX's hook of set_external_spte() to record pre-mapping cnt instead
> of doing nothing and returning when TD is not finalized.
> 
> TDX uses ioctl KVM_TDX_INIT_MEM_REGION to initialize its initial guest
> memory. This ioctl calls kvm_gmem_populate() to get guest pages and in
> tdx_gmem_post_populate(), it will
> (1) Map page table pages into KVM mirror page table and private EPT.
> (2) Map guest pages into KVM mirror page table. In the propagation hook,
>      just record pre-mapping cnt without mapping the guest page into private
>      EPT.
> (3) Map guest pages into private EPT and decrease pre-mapping cnt.
> 
> Do not map guest pages into private EPT directly in step (2), because TDX
> requires TDH.MEM.PAGE.ADD() to add a guest page before TD is finalized,
> which copies page content from a source page from user to target guest page
> to be added. However, source page is not available via common interface
> kvm_tdp_map_page() in step (2).
> 
> Therefore, just pre-map the guest page into KVM mirror page table and
> record the pre-mapping cnt in TDX's propagation hook. The pre-mapping cnt
> would be decreased in ioctl KVM_TDX_INIT_MEM_REGION when the guest page is
> mapped into private EPT.

Stale commit message; squashing all of it into patch 20 is an easy cop 
out...

Paolo

> Co-developed-by: Yan Zhao <yan.y.zhao@intel.com>
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
> TDX MMU part 2 v1:
>   - Update the code comment and patch log according to latest gmem update.
>     https://lore.kernel.org/kvm/CABgObfa=a3cKcKJHQRrCs-3Ty8ppSRou=dhi6Q+KdZnom0Zegw@mail.gmail.com/
>   - Rename tdx_mem_page_add() to tdx_mem_page_record_premap_cnt() to avoid
>     confusion.
>   - Change the patch title to "KVM: TDX: Premap initial guest memory".
>   - Rename KVM_MEMORY_MAPPING => KVM_MAP_MEMORY (Sean)
>   - Drop issueing TDH.MEM.PAGE.ADD() on KVM_MAP_MEMORY(), defer it to
>     KVM_TDX_INIT_MEM_REGION. (Sean)
>   - Added nr_premapped to track the number of premapped pages
>   - Drop tdx_post_mmu_map_page().
> 
> v19:
>   - Switched to use KVM_MEMORY_MAPPING
>   - Dropped measurement extension
>   - updated commit message. private_page_add() => set_private_spte()
> ---
>   arch/x86/kvm/vmx/tdx.c | 40 +++++++++++++++++++++++++++++++++-------
>   arch/x86/kvm/vmx/tdx.h |  2 +-
>   2 files changed, 34 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index 59b627b45475..435112562954 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -488,6 +488,34 @@ static int tdx_mem_page_aug(struct kvm *kvm, gfn_t gfn,
>   	return 0;
>   }
>   
> +/*
> + * KVM_TDX_INIT_MEM_REGION calls kvm_gmem_populate() to get guest pages and
> + * tdx_gmem_post_populate() to premap page table pages into private EPT.
> + * Mapping guest pages into private EPT before TD is finalized should use a
> + * seamcall TDH.MEM.PAGE.ADD(), which copies page content from a source page
> + * from user to target guest pages to be added. This source page is not
> + * available via common interface kvm_tdp_map_page(). So, currently,
> + * kvm_tdp_map_page() only premaps guest pages into KVM mirrored root.
> + * A counter nr_premapped is increased here to record status. The counter will
> + * be decreased after TDH.MEM.PAGE.ADD() is called after the kvm_tdp_map_page()
> + * in tdx_gmem_post_populate().
> + */
> +static int tdx_mem_page_record_premap_cnt(struct kvm *kvm, gfn_t gfn,
> +					  enum pg_level level, kvm_pfn_t pfn)
> +{
> +	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> +
> +	/* Returning error here to let TDP MMU bail out early. */
> +	if (KVM_BUG_ON(level != PG_LEVEL_4K, kvm)) {
> +		tdx_unpin(kvm, pfn);
> +		return -EINVAL;
> +	}
> +
> +	/* nr_premapped will be decreased when tdh_mem_page_add() is called. */
> +	atomic64_inc(&kvm_tdx->nr_premapped);
> +	return 0;
> +}
> +
>   int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
>   			      enum pg_level level, kvm_pfn_t pfn)
>   {
> @@ -510,11 +538,7 @@ int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
>   	if (likely(is_td_finalized(kvm_tdx)))
>   		return tdx_mem_page_aug(kvm, gfn, level, pfn);
>   
> -	/*
> -	 * TODO: KVM_MAP_MEMORY support to populate before finalize comes
> -	 * here for the initial memory.
> -	 */
> -	return 0;
> +	return tdx_mem_page_record_premap_cnt(kvm, gfn, level, pfn);
>   }
>   
>   static int tdx_sept_drop_private_spte(struct kvm *kvm, gfn_t gfn,
> @@ -546,10 +570,12 @@ static int tdx_sept_drop_private_spte(struct kvm *kvm, gfn_t gfn,
>   	if (unlikely(!is_td_finalized(kvm_tdx) &&
>   		     err == (TDX_EPT_WALK_FAILED | TDX_OPERAND_ID_RCX))) {
>   		/*
> -		 * This page was mapped with KVM_MAP_MEMORY, but
> -		 * KVM_TDX_INIT_MEM_REGION is not issued yet.
> +		 * Page is mapped by KVM_TDX_INIT_MEM_REGION, but hasn't called
> +		 * tdh_mem_page_add().
>   		 */
>   		if (!is_last_spte(entry, level) || !(entry & VMX_EPT_RWX_MASK)) {
> +			WARN_ON_ONCE(!atomic64_read(&kvm_tdx->nr_premapped));
> +			atomic64_dec(&kvm_tdx->nr_premapped);
>   			tdx_unpin(kvm, pfn);
>   			return 0;
>   		}
> diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
> index 66540c57ed61..25a4aaede2ba 100644
> --- a/arch/x86/kvm/vmx/tdx.h
> +++ b/arch/x86/kvm/vmx/tdx.h
> @@ -26,7 +26,7 @@ struct kvm_tdx {
>   
>   	u64 tsc_offset;
>   
> -	/* For KVM_MAP_MEMORY and KVM_TDX_INIT_MEM_REGION. */
> +	/* For KVM_TDX_INIT_MEM_REGION. */
>   	atomic64_t nr_premapped;
>   
>   	struct kvm_cpuid2 *cpuid;
Edgecombe, Rick P Sept. 11, 2024, 12:19 a.m. UTC | #3
On Tue, 2024-09-10 at 12:24 +0200, Paolo Bonzini wrote:
> On 9/4/24 05:07, Rick Edgecombe wrote:
> > +static int tdx_mem_page_record_premap_cnt(struct kvm *kvm, gfn_t gfn,
> > +                                         enum pg_level level, kvm_pfn_t
> > pfn)
> > +{
> > +       struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> > +
> > +       /* Returning error here to let TDP MMU bail out early. */
> > +       if (KVM_BUG_ON(level != PG_LEVEL_4K, kvm)) {
> > +               tdx_unpin(kvm, pfn);
> > +               return -EINVAL;
> > +       }
> 
> Should this "if" already be part of patch 14, and in 
> tdx_sept_set_private_spte() rather than tdx_mem_page_record_premap_cnt()?

Hmm, makes sense to me. Thanks.
Edgecombe, Rick P Sept. 11, 2024, 12:30 a.m. UTC | #4
On Tue, 2024-09-10 at 12:49 +0200, Paolo Bonzini wrote:
> On 9/4/24 05:07, Rick Edgecombe wrote:
> > From: Isaku Yamahata <isaku.yamahata@intel.com>
> > 
> > Update TDX's hook of set_external_spte() to record pre-mapping cnt instead
> > of doing nothing and returning when TD is not finalized.
> > 
> > TDX uses ioctl KVM_TDX_INIT_MEM_REGION to initialize its initial guest
> > memory. This ioctl calls kvm_gmem_populate() to get guest pages and in
> > tdx_gmem_post_populate(), it will
> > (1) Map page table pages into KVM mirror page table and private EPT.
> > (2) Map guest pages into KVM mirror page table. In the propagation hook,
> >       just record pre-mapping cnt without mapping the guest page into
> > private
> >       EPT.
> > (3) Map guest pages into private EPT and decrease pre-mapping cnt.
> > 
> > Do not map guest pages into private EPT directly in step (2), because TDX
> > requires TDH.MEM.PAGE.ADD() to add a guest page before TD is finalized,
> > which copies page content from a source page from user to target guest page
> > to be added. However, source page is not available via common interface
> > kvm_tdp_map_page() in step (2).
> > 
> > Therefore, just pre-map the guest page into KVM mirror page table and
> > record the pre-mapping cnt in TDX's propagation hook. The pre-mapping cnt
> > would be decreased in ioctl KVM_TDX_INIT_MEM_REGION when the guest page is
> > mapped into private EPT.
> 
> Stale commit message; squashing all of it into patch 20 is an easy cop 
> out...

Arh, yes this has details that are not relevant to the patch.

Squashing it seems fine, but I wasn't sure about whether we actually needed this
nr_premapped. It was one of the things we decided to punt a decision on in order
to continue our debates on the list. So we need to pick up the debate again.
Paolo Bonzini Sept. 11, 2024, 10:39 a.m. UTC | #5
On Wed, Sep 11, 2024 at 2:30 AM Edgecombe, Rick P
<rick.p.edgecombe@intel.com> wrote:
> Arh, yes this has details that are not relevant to the patch.
>
> Squashing it seems fine, but I wasn't sure about whether we actually needed this
> nr_premapped. It was one of the things we decided to punt a decision on in order
> to continue our debates on the list. So we need to pick up the debate again.

I think keeping nr_premapped is safer.

Paolo
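
For context, the kind of consistency check a counter like nr_premapped
makes possible, purely illustrative and not part of this patch (the helper
name and a check at TD finalization are assumptions):

/*
 * Purely illustrative: neither this helper nor its caller is part of
 * this patch. The idea is that every pre-mapped page should have been
 * followed by a TDH.MEM.PAGE.ADD (which decrements the counter) before
 * the TD is finalized.
 */
static int tdx_check_no_outstanding_premap(struct kvm_tdx *kvm_tdx)
{
	if (atomic64_read(&kvm_tdx->nr_premapped))
		return -EINVAL;
	return 0;
}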
Edgecombe, Rick P Sept. 11, 2024, 4:36 p.m. UTC | #6
On Wed, 2024-09-11 at 12:39 +0200, Paolo Bonzini wrote:
> On Wed, Sep 11, 2024 at 2:30 AM Edgecombe, Rick P
> <rick.p.edgecombe@intel.com> wrote:
> > Arh, yes this has details that are not relevant to the patch.
> > 
> > Squashing it seems fine, but I wasn't sure about whether we actually needed
> > this
> > nr_premapped. It was one of the things we decided to punt a decision on in
> > order
> > to continue our debates on the list. So we need to pick up the debate again.
> 
> I think keeping nr_premapped is safer.

Heh, well it's not hurting anything except adding a small amount of complexity,
so I guess we can cancel the debate. Thanks.
Adrian Hunter Sept. 13, 2024, 1:33 p.m. UTC | #7
On 11/09/24 03:19, Edgecombe, Rick P wrote:
> On Tue, 2024-09-10 at 12:24 +0200, Paolo Bonzini wrote:
>> On 9/4/24 05:07, Rick Edgecombe wrote:
>>> +static int tdx_mem_page_record_premap_cnt(struct kvm *kvm, gfn_t gfn,
>>> +                                         enum pg_level level, kvm_pfn_t
>>> pfn)
>>> +{
>>> +       struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
>>> +
>>> +       /* Returning error here to let TDP MMU bail out early. */
>>> +       if (KVM_BUG_ON(level != PG_LEVEL_4K, kvm)) {
>>> +               tdx_unpin(kvm, pfn);
>>> +               return -EINVAL;
>>> +       }
>>
>> Should this "if" already be part of patch 14, and in 
>> tdx_sept_set_private_spte() rather than tdx_mem_page_record_premap_cnt()?
> 
> Hmm, makes sense to me. Thanks.

It is already in patch 14, so just remove it from this patch
presumably.
Edgecombe, Rick P Sept. 13, 2024, 7:49 p.m. UTC | #8
On Fri, 2024-09-13 at 16:33 +0300, Adrian Hunter wrote:
> It is already in patch 14, so just remove it from this patch
> presumably.

Right, there is an equivalent check in tdx_sept_set_private_spte(), so we can
just drop this check.
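
With the check dropped, the helper could shrink to something like the
following (illustrative only; the actual respin may differ):

/*
 * Illustrative only: what tdx_mem_page_record_premap_cnt() might look
 * like with the level check dropped, relying on the equivalent
 * KVM_BUG_ON() in tdx_sept_set_private_spte() from patch 14.
 */
static int tdx_mem_page_record_premap_cnt(struct kvm *kvm, gfn_t gfn,
					  enum pg_level level, kvm_pfn_t pfn)
{
	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);

	/* nr_premapped will be decreased when tdh_mem_page_add() is called. */
	atomic64_inc(&kvm_tdx->nr_premapped);
	return 0;
}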

Patch

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 59b627b45475..435112562954 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -488,6 +488,34 @@  static int tdx_mem_page_aug(struct kvm *kvm, gfn_t gfn,
 	return 0;
 }
 
+/*
+ * KVM_TDX_INIT_MEM_REGION calls kvm_gmem_populate() to get guest pages and
+ * tdx_gmem_post_populate() to premap page table pages into private EPT.
+ * Mapping guest pages into private EPT before TD is finalized should use a
+ * seamcall TDH.MEM.PAGE.ADD(), which copies page content from a source page
+ * from user to target guest pages to be added. This source page is not
+ * available via common interface kvm_tdp_map_page(). So, currently,
+ * kvm_tdp_map_page() only premaps guest pages into KVM mirrored root.
+ * A counter nr_premapped is increased here to record status. The counter will
+ * be decreased after TDH.MEM.PAGE.ADD() is called after the kvm_tdp_map_page()
+ * in tdx_gmem_post_populate().
+ */
+static int tdx_mem_page_record_premap_cnt(struct kvm *kvm, gfn_t gfn,
+					  enum pg_level level, kvm_pfn_t pfn)
+{
+	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
+
+	/* Returning error here to let TDP MMU bail out early. */
+	if (KVM_BUG_ON(level != PG_LEVEL_4K, kvm)) {
+		tdx_unpin(kvm, pfn);
+		return -EINVAL;
+	}
+
+	/* nr_premapped will be decreased when tdh_mem_page_add() is called. */
+	atomic64_inc(&kvm_tdx->nr_premapped);
+	return 0;
+}
+
 int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
 			      enum pg_level level, kvm_pfn_t pfn)
 {
@@ -510,11 +538,7 @@  int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
 	if (likely(is_td_finalized(kvm_tdx)))
 		return tdx_mem_page_aug(kvm, gfn, level, pfn);
 
-	/*
-	 * TODO: KVM_MAP_MEMORY support to populate before finalize comes
-	 * here for the initial memory.
-	 */
-	return 0;
+	return tdx_mem_page_record_premap_cnt(kvm, gfn, level, pfn);
 }
 
 static int tdx_sept_drop_private_spte(struct kvm *kvm, gfn_t gfn,
@@ -546,10 +570,12 @@  static int tdx_sept_drop_private_spte(struct kvm *kvm, gfn_t gfn,
 	if (unlikely(!is_td_finalized(kvm_tdx) &&
 		     err == (TDX_EPT_WALK_FAILED | TDX_OPERAND_ID_RCX))) {
 		/*
-		 * This page was mapped with KVM_MAP_MEMORY, but
-		 * KVM_TDX_INIT_MEM_REGION is not issued yet.
+		 * Page is mapped by KVM_TDX_INIT_MEM_REGION, but hasn't called
+		 * tdh_mem_page_add().
 		 */
 		if (!is_last_spte(entry, level) || !(entry & VMX_EPT_RWX_MASK)) {
+			WARN_ON_ONCE(!atomic64_read(&kvm_tdx->nr_premapped));
+			atomic64_dec(&kvm_tdx->nr_premapped);
 			tdx_unpin(kvm, pfn);
 			return 0;
 		}
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index 66540c57ed61..25a4aaede2ba 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -26,7 +26,7 @@  struct kvm_tdx {
 
 	u64 tsc_offset;
 
-	/* For KVM_MAP_MEMORY and KVM_TDX_INIT_MEM_REGION. */
+	/* For KVM_TDX_INIT_MEM_REGION. */
 	atomic64_t nr_premapped;
 
 	struct kvm_cpuid2 *cpuid;