[PULL,13/19] KVM: SEV: Implement gmem hook for invalidating private pages

Message ID 20240510211024.556136-14-michael.roth@amd.com (mailing list archive)
State New
Series [PULL,01/19] KVM: MMU: Disable fast path if KVM_EXIT_MEMORY_FAULT is needed

Commit Message

Michael Roth May 10, 2024, 9:10 p.m. UTC
Implement a platform hook to do the work of restoring the direct map
entries of gmem-managed pages and transitioning the corresponding RMP
table entries back to the default shared/hypervisor-owned state.

Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-ID: <20240501085210.2213060-15-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/Kconfig   |  1 +
 arch/x86/kvm/svm/sev.c | 64 ++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c |  1 +
 arch/x86/kvm/svm/svm.h |  2 ++
 4 files changed, 68 insertions(+)

Comments

Sean Christopherson May 15, 2024, 10:32 p.m. UTC | #1
On Fri, May 10, 2024, Michael Roth wrote:
> Implement a platform hook to do the work of restoring the direct map
> entries of gmem-managed pages and transitioning the corresponding RMP
> table entries back to the default shared/hypervisor-owned state.

...

> +void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
> +{
> +	kvm_pfn_t pfn;
> +
> +	pr_debug("%s: PFN start 0x%llx PFN end 0x%llx\n", __func__, start, end);
> +
> +	for (pfn = start; pfn < end;) {
> +		bool use_2m_update = false;
> +		int rc, rmp_level;
> +		bool assigned;
> +
> +		rc = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
> +		if (WARN_ONCE(rc, "SEV: Failed to retrieve RMP entry for PFN 0x%llx error %d\n",
> +			      pfn, rc))
> +			goto next_pfn;

This is comically trivial to hit, as it fires when running guest_memfd_test on a
!SNP host.  Presumably the correct fix is to simply do nothing for !sev_snp_guest(),
but that's easier said than done due to the lack of a @kvm in .gmem_invalidate().

That too is not a big fix, but that's beside the point.  IMO, the fact that I'm
the first person to (completely inadvertently) hit this rather basic bug is a
good hint that we should wait until 6.11 to merge SNP support.

Michael Roth May 16, 2024, 3:11 a.m. UTC | #2
On Wed, May 15, 2024 at 03:32:31PM -0700, Sean Christopherson wrote:
> On Fri, May 10, 2024, Michael Roth wrote:
> > Implement a platform hook to do the work of restoring the direct map
> > entries of gmem-managed pages and transitioning the corresponding RMP
> > table entries back to the default shared/hypervisor-owned state.
> 
> ...
> 
> > +void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
> > +{
> > +	kvm_pfn_t pfn;
> > +
> > +	pr_debug("%s: PFN start 0x%llx PFN end 0x%llx\n", __func__, start, end);
> > +
> > +	for (pfn = start; pfn < end;) {
> > +		bool use_2m_update = false;
> > +		int rc, rmp_level;
> > +		bool assigned;
> > +
> > +		rc = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
> > +		if (WARN_ONCE(rc, "SEV: Failed to retrieve RMP entry for PFN 0x%llx error %d\n",
> > +			      pfn, rc))
> > +			goto next_pfn;
> 
> This is comically trivial to hit, as it fires when running guest_memfd_test on a
> !SNP host.  Presumably the correct fix is to simply do nothing for !sev_snp_guest(),
> but that's easier said than done due to the lack of a @kvm in .gmem_invalidate().

Yah, the code assumes that SNP is the only SVM user that would use gmem
pages. Unfortunately KVM_X86_SW_PROTECTED_VM is the one other situation
where this can be the case. The minimal fix would be to squash the below
into this patch:

  diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
  index 176ba117413a..56b0b59b8263 100644
  --- a/arch/x86/kvm/svm/sev.c
  +++ b/arch/x86/kvm/svm/sev.c
  @@ -4675,6 +4675,9 @@ void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
   {
          kvm_pfn_t pfn;
  
  +       if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP))
  +               return;
  +
          pr_debug("%s: PFN start 0x%llx PFN end 0x%llx\n", __func__, start, end);
  
          for (pfn = start; pfn < end;) {

It's not perfect because the callback will still run for
KVM_X86_SW_PROTECTED_VM if SNP is enabled, but in the context of
KVM_X86_SW_PROTECTED_VM being a stand-in for testing SNP/TDX, that
might not be such a bad thing.

Longer term, if we need something more robust, one option would be to
modify the .free_folio callback path to pass along folio->mapping, or to
switch to something else that provides similar functionality. Another
approach might be to set .free_folio dynamically based on the vm_type of
the gmem user when creating the gmem instance.
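
To make the last idea concrete, here is a minimal userspace sketch of picking
the free callback once, at instance-creation time, based on the VM type. It is
not kernel code: enum vm_type, struct gmem_ops, gmem_pick_ops() and
gmem_free_range() are hypothetical names invented for illustration, and the
real address_space_operations plumbing is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical VM types; stand-ins for the vm_type of the gmem user. */
enum vm_type { VM_TYPE_DEFAULT, VM_TYPE_SW_PROTECTED, VM_TYPE_SNP };

/* Hypothetical per-instance ops table holding only the free callback. */
typedef void (*free_folio_fn)(uint64_t start_pfn, uint64_t end_pfn);

struct gmem_ops {
	free_folio_fn free_folio;
};

/* Counts how often the SNP-specific cleanup ran, for demonstration only. */
static unsigned int snp_invalidate_calls;

static void snp_free_folio(uint64_t start_pfn, uint64_t end_pfn)
{
	(void)start_pfn;
	(void)end_pfn;
	snp_invalidate_calls++;
}

static const struct gmem_ops gmem_ops_plain = { .free_folio = NULL };
static const struct gmem_ops gmem_ops_snp = { .free_folio = snp_free_folio };

/* Chosen once when the instance is created, so the free path needs no @kvm. */
static const struct gmem_ops *gmem_pick_ops(enum vm_type type)
{
	return type == VM_TYPE_SNP ? &gmem_ops_snp : &gmem_ops_plain;
}

/* The free path only calls whatever callback the instance was created with. */
static void gmem_free_range(const struct gmem_ops *ops,
			    uint64_t start_pfn, uint64_t end_pfn)
{
	assert(ops != NULL);
	if (ops->free_folio)
		ops->free_folio(start_pfn, end_pfn);
}
```

With this shape, an instance created for VM_TYPE_SW_PROTECTED never reaches
the SNP cleanup, regardless of whether the host has SNP enabled.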

> 
> That too is not a big fix, but that's beside the point.  IMO, the fact that I'm
> the first person to (completely inadvertently) hit this rather basic bug is a
> good hint that we should wait until 6.11 to merge SNP support.

We do regular testing of normal guests with/without SNP enabled, but
unfortunately we've only been doing KST runs on SNP-enabled hosts.
I've retested with the above fix and everything looks good with
SVM/SEV/SEV-ES/SNP/selftests with and without SNP enabled, but I
understand if we still have reservations after this.

-Mike

Paolo Bonzini May 16, 2024, 12:45 p.m. UTC | #3
On Thu, May 16, 2024 at 12:32 AM Sean Christopherson <seanjc@google.com> wrote:
> > +void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
> > +{
> > +     kvm_pfn_t pfn;
> > +
> > +     pr_debug("%s: PFN start 0x%llx PFN end 0x%llx\n", __func__, start, end);
> > +
> > +     for (pfn = start; pfn < end;) {
> > +             bool use_2m_update = false;
> > +             int rc, rmp_level;
> > +             bool assigned;
> > +
> > +             rc = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
> > +             if (WARN_ONCE(rc, "SEV: Failed to retrieve RMP entry for PFN 0x%llx error %d\n",
> > +                           pfn, rc))
> > +                     goto next_pfn;
>
> This is comically trivial to hit, as it fires when running guest_memfd_test on a
> !SNP host.  Presumably the correct fix is to simply do nothing for !sev_snp_guest(),
> but that's easier said than done due to the lack of a @kvm in .gmem_invalidate().
>
> That too is not a big fix, but that's beside the point.  IMO, the fact that I'm
> the first person to (completely inadvertently) hit this rather basic bug is a
> good hint that we should wait until 6.11 to merge SNP support.

Of course there is an explanation: I usually run all the tests before
pushing anything to kvm/next, but here I did not because 1) I was busy
with the merge window and 2) I wanted to give the code exposure in
linux-next. That was indeed the right call, but it's beside the point.
Between the clang issue and this one, it's clear that even though the
implementation is 99.99% okay (especially considering the size), there
are a few kinks to fix.

I'll fix everything up and re-push to kvm/next, but I agree that we
shouldn't rush it any further. What really matters is that development
on userspace can proceed.

This also confirms that it's important to replace kvm/next with
kvm/queue in linux-next, since linux-next doesn't care that much about
branches that rebase.

Paolo

Paolo Bonzini May 21, 2024, 4:55 p.m. UTC | #4
On Thu, May 16, 2024 at 5:12 AM Michael Roth <michael.roth@amd.com> wrote:
> Longer term, if we need something more robust, one option would be to
> modify the .free_folio callback path to pass along folio->mapping, or to
> switch to something else that provides similar functionality. Another
> approach might be to set .free_folio dynamically based on the vm_type of
> the gmem user when creating the gmem instance.

You need to not warn. Testing CC_ATTR_HOST_SEV_SNP is just an optimization.

Paolo

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index dc00b89404a2..1c57b4535f15 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4676,8 +4676,7 @@ void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
         bool assigned;

         rc = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
-        if (WARN_ONCE(rc, "SEV: Failed to retrieve RMP entry for PFN 0x%llx error %d\n",
-                  pfn, rc))
+        if (rc)
             goto next_pfn;

         if (!assigned)

Paolo
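
The difference between the two fixes can be modelled in userspace. The sketch
below assumes the RMP lookup fails on a host without SNP (the thread says the
warning fires there; the -ENODEV value is an assumption) and shows that
skipping silently is what makes the hook safe, while a host-capability check
would only save the loop. host_has_snp, lookup_rmpentry() and invalidate() are
hypothetical stand-ins, not kernel functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for cc_platform_has(CC_ATTR_HOST_SEV_SNP). */
static bool host_has_snp;

/* Set when the modelled code would have hit its WARN_ONCE(). */
static bool warned;

/* Mock lookup: fails without SNP; otherwise marks odd PFNs as assigned. */
static int lookup_rmpentry(uint64_t pfn, bool *assigned)
{
	if (!host_has_snp)
		return -ENODEV;	/* assumed error value */
	*assigned = (pfn & 1) != 0;
	return 0;
}

/*
 * Walk [start, end) one PFN at a time and return how many pages would be
 * converted back to shared. warn_on_failure selects the original behaviour
 * (warn, then skip) or the fixed behaviour (just skip).
 */
static unsigned int invalidate(uint64_t start, uint64_t end,
			       bool warn_on_failure)
{
	unsigned int converted = 0;
	uint64_t pfn;

	for (pfn = start; pfn < end; pfn++) {
		bool assigned = false;
		int rc = lookup_rmpentry(pfn, &assigned);

		if (rc) {
			if (warn_on_failure)
				warned = true;
			continue;
		}
		if (!assigned)
			continue;
		converted++;
	}
	return converted;
}
```

In this model, both variants convert nothing on a non-SNP host; only the
original one reports a warning for what is an expected condition.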

Patch

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 10768f13b240..2a7f69abcac3 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -138,6 +138,7 @@  config KVM_AMD_SEV
 	select ARCH_HAS_CC_PLATFORM
 	select KVM_GENERIC_PRIVATE_MEM
 	select HAVE_KVM_GMEM_PREPARE
+	select HAVE_KVM_GMEM_INVALIDATE
 	help
 	  Provides support for launching Encrypted VMs (SEV) and Encrypted VMs
 	  with Encrypted State (SEV-ES) on AMD processors.
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 2bc4aa91cd31..379ac6efd74e 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4663,3 +4663,67 @@  int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order)
 
 	return 0;
 }
+
+void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
+{
+	kvm_pfn_t pfn;
+
+	pr_debug("%s: PFN start 0x%llx PFN end 0x%llx\n", __func__, start, end);
+
+	for (pfn = start; pfn < end;) {
+		bool use_2m_update = false;
+		int rc, rmp_level;
+		bool assigned;
+
+		rc = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
+		if (WARN_ONCE(rc, "SEV: Failed to retrieve RMP entry for PFN 0x%llx error %d\n",
+			      pfn, rc))
+			goto next_pfn;
+
+		if (!assigned)
+			goto next_pfn;
+
+		use_2m_update = IS_ALIGNED(pfn, PTRS_PER_PMD) &&
+				end >= (pfn + PTRS_PER_PMD) &&
+				rmp_level > PG_LEVEL_4K;
+
+		/*
+		 * If an unaligned PFN corresponds to a 2M region assigned as a
+		 * large page in the RMP table, PSMASH the region into individual
+		 * 4K RMP entries before attempting to convert a 4K sub-page.
+		 */
+		if (!use_2m_update && rmp_level > PG_LEVEL_4K) {
+			/*
+			 * This shouldn't fail, but if it does, report it, but
+			 * still try to update RMP entry to shared and pray this
+			 * was a spurious error that can be addressed later.
+			 */
+			rc = snp_rmptable_psmash(pfn);
+			WARN_ONCE(rc, "SEV: Failed to PSMASH RMP entry for PFN 0x%llx error %d\n",
+				  pfn, rc);
+		}
+
+		rc = rmp_make_shared(pfn, use_2m_update ? PG_LEVEL_2M : PG_LEVEL_4K);
+		if (WARN_ONCE(rc, "SEV: Failed to update RMP entry for PFN 0x%llx error %d\n",
+			      pfn, rc))
+			goto next_pfn;
+
+		/*
+		 * SEV-ES avoids host/guest cache coherency issues through
+		 * WBINVD hooks issued via MMU notifiers during run-time, and
+		 * KVM's VM destroy path at shutdown. Those MMU notifier events
+		 * don't cover gmem since there is no requirement to map pages
+		 * to a HVA in order to use them for a running guest. While the
+		 * shutdown path would still likely cover things for SNP guests,
+		 * userspace may also free gmem pages during run-time via
+		 * hole-punching operations on the guest_memfd, so flush the
+		 * cache entries for these pages before free'ing them back to
+		 * the host.
+		 */
+		clflush_cache_range(__va(pfn_to_hpa(pfn)),
+				    use_2m_update ? PMD_SIZE : PAGE_SIZE);
+next_pfn:
+		pfn += use_2m_update ? PTRS_PER_PMD : 1;
+		cond_resched();
+	}
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index b9ecc06f8934..653cdb23a7d1 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5083,6 +5083,7 @@  static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.alloc_apic_backing_page = svm_alloc_apic_backing_page,
 
 	.gmem_prepare = sev_gmem_prepare,
+	.gmem_invalidate = sev_gmem_invalidate,
 };
 
 /*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 4203bd9012e9..3cea024a7c18 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -737,6 +737,7 @@  void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
 void sev_vcpu_unblocking(struct kvm_vcpu *vcpu);
 void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
 int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
+void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end);
 #else
 static inline struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu) {
 	return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
@@ -757,6 +758,7 @@  static inline int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, in
 {
 	return 0;
 }
+static inline void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end) {}
 
 #endif
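
For readers following the loop in sev_gmem_invalidate(), the 2M-versus-4K
decision can be exercised in isolation. This is a userspace sketch, not the
kernel code: PFNS_PER_2M stands in for PTRS_PER_PMD (512 on x86-64), the enum
mirrors the ordering of PG_LEVEL_4K and PG_LEVEL_2M, and count_updates()
simplifies by assuming every PFN in the range reports the same RMP level (so
it does not model the level changing after a PSMASH).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PFNS_PER_2M 512u	/* stands in for PTRS_PER_PMD on x86-64 */

/* Mirrors the ordering of PG_LEVEL_4K < PG_LEVEL_2M. */
enum rmp_level { LEVEL_4K = 1, LEVEL_2M = 2 };

/*
 * Same three-part test as the patch: the PFN is 2M-aligned, the whole 2M
 * range fits inside [pfn, end), and the RMP entry is a large one.
 */
static bool use_2m_update(uint64_t pfn, uint64_t end, enum rmp_level level)
{
	return (pfn % PFNS_PER_2M) == 0 &&
	       end >= pfn + PFNS_PER_2M &&
	       level > LEVEL_4K;
}

/* A large RMP entry that cannot be updated as 2M must be split first. */
static bool needs_psmash(uint64_t pfn, uint64_t end, enum rmp_level level)
{
	return !use_2m_update(pfn, end, level) && level > LEVEL_4K;
}

/* Number of loop iterations for [start, end) at a uniform RMP level. */
static uint64_t count_updates(uint64_t start, uint64_t end,
			      enum rmp_level level)
{
	uint64_t pfn = start, n = 0;

	assert(start <= end);
	while (pfn < end) {
		pfn += use_2m_update(pfn, end, level) ? PFNS_PER_2M : 1;
		n++;
	}
	return n;
}
```

For example, an aligned 2M range with a large RMP entry takes a single
iteration, while the same range starting one PFN later falls back to 4K steps
until it reaches the next 2M boundary.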