
[V3,09/12] KVM: X86/MMU: Move the verification of NPT's PDPTE into FNAME(fetch)

Message ID 20220521131700.3661-10-jiangshanlai@gmail.com (mailing list archive)
State New, archived
Series KVM: X86/MMU: Use one-off local shadow page for special roots

Commit Message

Lai Jiangshan May 21, 2022, 1:16 p.m. UTC
From: Lai Jiangshan <jiangshan.ljs@antgroup.com>

FNAME(page_fault) verifies the PDPTE for nested NPT in PAE paging mode
because nested_svm_get_tdp_pdptr() reads the guest NPT's PDPTE from
memory unconditionally on each call.
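
For reference, nested_svm_get_tdp_pdptr() is approximately the
following (a sketch condensed from arch/x86/kvm/svm/nested.c; treat
the details as illustrative rather than verbatim):

	static u64 nested_svm_get_tdp_pdptr(struct kvm_vcpu *vcpu, int index)
	{
		struct vcpu_svm *svm = to_svm(vcpu);
		u64 cr3 = svm->nested.ctl.nested_cr3;
		u64 pdpte;
		int ret;

		/* Re-reads the PDPTE from guest memory on every call. */
		ret = kvm_vcpu_read_guest_page(vcpu, gpa_to_gfn(cr3), &pdpte,
					       offset_in_page(cr3) + index * 8, 8);
		if (ret)
			return 0;
		return pdpte;
	}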

The verification is complicated, and it works only as long as
mmu->pae_root is always used when the guest is using PAE paging.

Move the verification code into FNAME(fetch) and simplify it: the local
shadow page is used there, so it can be walked in FNAME(fetch) and
unlinked from its children via drop_spte().

This also allows mmu->pae_root to go unused when the root is not
required to be put in a 32-bit CR3.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
---
 arch/x86/kvm/mmu/paging_tmpl.h | 72 ++++++++++++++++------------------
 1 file changed, 33 insertions(+), 39 deletions(-)

Comments

Sean Christopherson July 19, 2022, 11:21 p.m. UTC | #1
On Sat, May 21, 2022, Lai Jiangshan wrote:
> From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> 
> FNAME(page_fault) verifies PDPTE for nested NPT in PAE paging mode
> because nested_svm_get_tdp_pdptr() reads the guest NPT's PDPTE from
> memory unconditionally for each call.
> 
> The verifying is complicated and it works only when mmu->pae_root
> is always used when the guest is PAE paging.

Why is this relevant?  It's not _that_ complicated, and even if it were, I don't
see how calling that out helps the reader understand the motivation for this patch.

> Move the verifying code in FNAME(fetch) and simplify it since the local
> shadow page is used and it can be walked in FNAME(fetch) and unlinked
> from children via drop_spte().
> 
> It also allows for mmu->pae_root NOT to be used when it is NOT required

Avoid leading with pronouns, "it" is ambiguous, e.g. at first I thought "it" meant
moving the code, but what "it" really means is using the iterator from the shadow
page walk instead of hardcoding a pae_root lookup.

And changing from pae_root to it.sptep needs to be explicitly called out.  It's
a subtle but important detail.  And if you call that out, then it's more obvious
why this patch is relevant to not having to use pae_root for a 64-bit host with NPT.
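
For context, the two lookups being contrasted, condensed verbatim from
the hunks below:

	/* Old: hardcoded PAE root lookup in FNAME(page_fault). */
	u64 pdpte = vcpu->arch.mmu->pae_root[(fault->addr >> 30) & 3];

	/* New: the shadow walk iterator in FNAME(fetch), root-agnostic. */
	sp = to_shadow_page(*it.sptep & PT64_BASE_ADDR_MASK);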

> to be put in a 32bit CR3.
> 
> Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> ---
>  arch/x86/kvm/mmu/paging_tmpl.h | 72 ++++++++++++++++------------------
>  1 file changed, 33 insertions(+), 39 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index cd6032e1947c..67c419bce1e5 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -659,6 +659,39 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
>  		clear_sp_write_flooding_count(it.sptep);
>  		drop_large_spte(vcpu, it.sptep);
>  
> +		/*
> +		 * When nested NPT is enabled and L1 is using PAE paging,
> +		 * mmu->get_pdptrs(), which is nested_svm_get_tdp_pdptr(),
> +		 * reads the guest NPT's PDPTE from memory unconditionally
> +		 * on each call.
> +		 *
> +		 * The guest PAE root page is not write-protected.
> +		 *
> +		 * The mmu->get_pdptrs() in FNAME(walk_addr_generic) might
> +		 * return a value different from previous calls, or from the
> +		 * value mmu->get_pdptrs() returned in mmu_alloc_shadow_roots().
> +		 *
> +		 * If the value is not verified to be unchanged, the code
> +		 * below may install the spte in a wrong sp or link a sp
> +		 * to a wrong parent, since FNAME(gpte_changed) can't
> +		 * detect this kind of change.
> +		 *
> +		 * Verify the return value of mmu->get_pdptrs() (only the
> +		 * gfn in it needs to be checked) and drop the spte if the
> +		 * gfn doesn't match.
> +		 *
> +		 * Do the verification unconditionally when the guest is
> +		 * using PAE paging, regardless of whether nested NPT is in
> +		 * use, to avoid complicated code.
> +		 */
> +		if (vcpu->arch.mmu->cpu_role.base.level == PT32E_ROOT_LEVEL &&
> +		    it.level == PT32E_ROOT_LEVEL &&
> +		    is_shadow_present_pte(*it.sptep)) {
> +			sp = to_shadow_page(*it.sptep & PT64_BASE_ADDR_MASK);

For this patch, it's probably worth a

			WARN_ON_ONCE(sp->spt != vcpu->arch.mmu->pae_root);

Mostly so that when the future patch stops using pae_root for 64-bit NPT hosts,
there's a code change for this particular logic that is very much relevant to
that change.
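
A sketch of how the suggested assertion might slot into the new check
(the placement is my reading of the suggestion, not part of the posted
patch):

	if (vcpu->arch.mmu->cpu_role.base.level == PT32E_ROOT_LEVEL &&
	    it.level == PT32E_ROOT_LEVEL &&
	    is_shadow_present_pte(*it.sptep)) {
		sp = to_shadow_page(*it.sptep & PT64_BASE_ADDR_MASK);
		/* Per the review: should hold as long as pae_root is used. */
		WARN_ON_ONCE(sp->spt != vcpu->arch.mmu->pae_root);
		if (gw->table_gfn[it.level - 2] != sp->gfn)
			drop_spte(vcpu->kvm, it.sptep);
	}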

> +			if (gw->table_gfn[it.level - 2] != sp->gfn)
> +				drop_spte(vcpu->kvm, it.sptep);
> +		}
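
The net behavioral difference between the two hunks below, side by side
(condensed; see the full diff for context):

	/* Old, in FNAME(page_fault): discard the whole current root and retry. */
	kvm_mmu_free_roots(vcpu->kvm, vcpu->arch.mmu, KVM_MMU_ROOT_CURRENT);

	/* New, in FNAME(fetch): drop only the stale link and keep fetching. */
	drop_spte(vcpu->kvm, it.sptep);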

Patch

diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index cd6032e1947c..67c419bce1e5 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -659,6 +659,39 @@  static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 		clear_sp_write_flooding_count(it.sptep);
 		drop_large_spte(vcpu, it.sptep);
 
+		/*
+		 * When nested NPT is enabled and L1 is using PAE paging,
+		 * mmu->get_pdptrs(), which is nested_svm_get_tdp_pdptr(),
+		 * reads the guest NPT's PDPTE from memory unconditionally
+		 * on each call.
+		 *
+		 * The guest PAE root page is not write-protected.
+		 *
+		 * The mmu->get_pdptrs() in FNAME(walk_addr_generic) might
+		 * return a value different from previous calls, or from the
+		 * value mmu->get_pdptrs() returned in mmu_alloc_shadow_roots().
+		 *
+		 * If the value is not verified to be unchanged, the code
+		 * below may install the spte in a wrong sp or link a sp
+		 * to a wrong parent, since FNAME(gpte_changed) can't
+		 * detect this kind of change.
+		 *
+		 * Verify the return value of mmu->get_pdptrs() (only the
+		 * gfn in it needs to be checked) and drop the spte if the
+		 * gfn doesn't match.
+		 *
+		 * Do the verification unconditionally when the guest is
+		 * using PAE paging, regardless of whether nested NPT is in
+		 * use, to avoid complicated code.
+		 */
+		if (vcpu->arch.mmu->cpu_role.base.level == PT32E_ROOT_LEVEL &&
+		    it.level == PT32E_ROOT_LEVEL &&
+		    is_shadow_present_pte(*it.sptep)) {
+			sp = to_shadow_page(*it.sptep & PT64_BASE_ADDR_MASK);
+			if (gw->table_gfn[it.level - 2] != sp->gfn)
+				drop_spte(vcpu->kvm, it.sptep);
+		}
+
 		sp = NULL;
 		if (!is_shadow_present_pte(*it.sptep)) {
 			table_gfn = gw->table_gfn[it.level - 2];
@@ -886,44 +919,6 @@  static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	if (is_page_fault_stale(vcpu, fault, mmu_seq))
 		goto out_unlock;
 
-	/*
-	 * When nested NPT enabled and L1 is PAE paging, mmu->get_pdptrs()
-	 * which is nested_svm_get_tdp_pdptr() reads the guest NPT's PDPTE
-	 * from memory unconditionally for each call.
-	 *
-	 * The guest PAE root page is not write-protected.
-	 *
-	 * The mmu->get_pdptrs() in FNAME(walk_addr_generic) might get a value
-	 * different from previous calls or different from the return value of
-	 * mmu->get_pdptrs() in mmu_alloc_shadow_roots().
-	 *
-	 * It will cause FNAME(fetch) installs the spte in a wrong sp or links
-	 * a sp to a wrong parent if the return value of mmu->get_pdptrs()
-	 * is not verified unchanged since FNAME(gpte_changed) can't check
-	 * this kind of change.
-	 *
-	 * Verify the return value of mmu->get_pdptrs() (only the gfn in it
-	 * needs to be checked) and do kvm_mmu_free_roots() like load_pdptr()
-	 * if the gfn isn't matched.
-	 *
-	 * Do the verifying unconditionally when the guest is PAE paging no
-	 * matter whether it is nested NPT or not to avoid complicated code.
-	 */
-	if (vcpu->arch.mmu->cpu_role.base.level == PT32E_ROOT_LEVEL) {
-		u64 pdpte = vcpu->arch.mmu->pae_root[(fault->addr >> 30) & 3];
-		struct kvm_mmu_page *sp = NULL;
-
-		if (IS_VALID_PAE_ROOT(pdpte))
-			sp = to_shadow_page(pdpte & PT64_BASE_ADDR_MASK);
-
-		if (!sp || walker.table_gfn[PT32E_ROOT_LEVEL - 2] != sp->gfn) {
-			write_unlock(&vcpu->kvm->mmu_lock);
-			kvm_mmu_free_roots(vcpu->kvm, vcpu->arch.mmu,
-					   KVM_MMU_ROOT_CURRENT);
-			goto release_clean;
-		}
-	}
-
 	r = make_mmu_pages_available(vcpu);
 	if (r)
 		goto out_unlock;
@@ -931,7 +926,6 @@  static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 
 out_unlock:
 	write_unlock(&vcpu->kvm->mmu_lock);
-release_clean:
 	kvm_release_pfn_clean(fault->pfn);
 	return r;
 }