diff mbox series

[v2,03/26] KVM: x86/mmu: Derive shadow MMU page role from parent

Message ID 20220311002528.2230172-4-dmatlack@google.com (mailing list archive)
State Handled Elsewhere
Headers show
Series Extend Eager Page Splitting to the shadow MMU | expand

Commit Message

David Matlack March 11, 2022, 12:25 a.m. UTC
Instead of computing the shadow page role from scratch for every new
page, we can derive most of the information from the parent shadow page.
This avoids redundant calculations and reduces the number of parameters
to kvm_mmu_get_page().

Preemptively split out the role calculation to a separate function for
use in a following commit.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/kvm/mmu/mmu.c         | 91 ++++++++++++++++++++++++----------
 arch/x86/kvm/mmu/paging_tmpl.h |  9 ++--
 2 files changed, 71 insertions(+), 29 deletions(-)

Comments

Peter Xu March 15, 2022, 8:15 a.m. UTC | #1
On Fri, Mar 11, 2022 at 12:25:05AM +0000, David Matlack wrote:
> Instead of computing the shadow page role from scratch for every new
> page, we can derive most of the information from the parent shadow page.
> This avoids redundant calculations and reduces the number of parameters
> to kvm_mmu_get_page().
> 
> Preemptively split out the role calculation to a separate function for
> use in a following commit.
> 
> No functional change intended.
> 
> Signed-off-by: David Matlack <dmatlack@google.com>

Looks right..

Reviewed-by: Peter Xu <peterx@redhat.com>

Two more comments/questions below.

> +static union kvm_mmu_page_role kvm_mmu_child_role(u64 *sptep, bool direct, u32 access)
> +{
> +	struct kvm_mmu_page *parent_sp = sptep_to_sp(sptep);
> +	union kvm_mmu_page_role role;
> +
> +	role = parent_sp->role;
> +	role.level--;
> +	role.access = access;
> +	role.direct = direct;
> +
> +	/*
> +	 * If the guest has 4-byte PTEs then that means it's using 32-bit,
> +	 * 2-level, non-PAE paging. KVM shadows such guests using 4 PAE page
> +	 * directories, each mapping 1/4 of the guest's linear address space
> +	 * (1GiB). The shadow pages for those 4 page directories are
> +	 * pre-allocated and assigned a separate quadrant in their role.
> +	 *
> +	 * Since we are allocating a child shadow page and there are only 2
> +	 * levels, this must be a PG_LEVEL_4K shadow page. Here the quadrant
> +	 * will either be 0 or 1 because it maps 1/2 of the address space mapped
> +	 * by the guest's PG_LEVEL_4K page table (or 4MiB huge page) that it
> +	 * is shadowing. In this case, the quadrant can be derived by the index
> +	 * of the SPTE that points to the new child shadow page in the page
> +	 * directory (parent_sp). Specifically, every 2 SPTEs in parent_sp
> +	 * shadow one half of a guest's page table (or 4MiB huge page) so the
> +	 * quadrant is just the parity of the index of the SPTE.
> +	 */
> +	if (role.has_4_byte_gpte) {
> +		BUG_ON(role.level != PG_LEVEL_4K);
> +		role.quadrant = (sptep - parent_sp->spt) % 2;
> +	}

This made me wonder whether role.quadrant can be dropped, because it seems
it can be calculated out of the box with has_4_byte_gpte, level and spte
offset.  I could have missed something, though..

> +
> +	return role;
> +}
> +
> +static struct kvm_mmu_page *kvm_mmu_get_child_sp(struct kvm_vcpu *vcpu,
> +						 u64 *sptep, gfn_t gfn,
> +						 bool direct, u32 access)
> +{
> +	union kvm_mmu_page_role role;
> +
> +	role = kvm_mmu_child_role(sptep, direct, access);
> +	return kvm_mmu_get_page(vcpu, gfn, role);

Nit: it looks nicer to just drop the temp var?

        return kvm_mmu_get_page(vcpu, gfn,
                                kvm_mmu_child_role(sptep, direct, access));

Thanks,
David Matlack March 22, 2022, 6:30 p.m. UTC | #2
On Tue, Mar 15, 2022 at 1:15 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Fri, Mar 11, 2022 at 12:25:05AM +0000, David Matlack wrote:
> > Instead of computing the shadow page role from scratch for every new
> > page, we can derive most of the information from the parent shadow page.
> > This avoids redundant calculations and reduces the number of parameters
> > to kvm_mmu_get_page().
> >
> > Preemptively split out the role calculation to a separate function for
> > use in a following commit.
> >
> > No functional change intended.
> >
> > Signed-off-by: David Matlack <dmatlack@google.com>
>
> Looks right..
>
> Reviewed-by: Peter Xu <peterx@redhat.com>
>
> Two more comments/questions below.
>
> > +static union kvm_mmu_page_role kvm_mmu_child_role(u64 *sptep, bool direct, u32 access)
> > +{
> > +     struct kvm_mmu_page *parent_sp = sptep_to_sp(sptep);
> > +     union kvm_mmu_page_role role;
> > +
> > +     role = parent_sp->role;
> > +     role.level--;
> > +     role.access = access;
> > +     role.direct = direct;
> > +
> > +     /*
> > +      * If the guest has 4-byte PTEs then that means it's using 32-bit,
> > +      * 2-level, non-PAE paging. KVM shadows such guests using 4 PAE page
> > +      * directories, each mapping 1/4 of the guest's linear address space
> > +      * (1GiB). The shadow pages for those 4 page directories are
> > +      * pre-allocated and assigned a separate quadrant in their role.
> > +      *
> > +      * Since we are allocating a child shadow page and there are only 2
> > +      * levels, this must be a PG_LEVEL_4K shadow page. Here the quadrant
> > +      * will either be 0 or 1 because it maps 1/2 of the address space mapped
> > +      * by the guest's PG_LEVEL_4K page table (or 4MiB huge page) that it
> > +      * is shadowing. In this case, the quadrant can be derived by the index
> > +      * of the SPTE that points to the new child shadow page in the page
> > +      * directory (parent_sp). Specifically, every 2 SPTEs in parent_sp
> > +      * shadow one half of a guest's page table (or 4MiB huge page) so the
> > +      * quadrant is just the parity of the index of the SPTE.
> > +      */
> > +     if (role.has_4_byte_gpte) {
> > +             BUG_ON(role.level != PG_LEVEL_4K);
> > +             role.quadrant = (sptep - parent_sp->spt) % 2;
> > +     }
>
> This made me wonder whether role.quadrant can be dropped, because it seems
> it can be calculated out of the box with has_4_byte_gpte, level and spte
> offset.  I could have missed something, though..

I think you're right that we could compute it on-the-fly. But it'd be
non-trivial to remove since it's currently used to ensure the sp->role
and sp->gfn uniquely identifies each shadow page (e.g. when checking
for collisions in the mmu_page_hash).

>
> > +
> > +     return role;
> > +}
> > +
> > +static struct kvm_mmu_page *kvm_mmu_get_child_sp(struct kvm_vcpu *vcpu,
> > +                                              u64 *sptep, gfn_t gfn,
> > +                                              bool direct, u32 access)
> > +{
> > +     union kvm_mmu_page_role role;
> > +
> > +     role = kvm_mmu_child_role(sptep, direct, access);
> > +     return kvm_mmu_get_page(vcpu, gfn, role);
>
> Nit: it looks nicer to just drop the temp var?
>
>         return kvm_mmu_get_page(vcpu, gfn,
>                                 kvm_mmu_child_role(sptep, direct, access));

Yeah that's simpler. I just have an aversion to line wrapping :)

>
> Thanks,
>
> --
> Peter Xu
>
Peter Xu March 30, 2022, 2:25 p.m. UTC | #3
On Tue, Mar 22, 2022 at 11:30:07AM -0700, David Matlack wrote:
> > > +static union kvm_mmu_page_role kvm_mmu_child_role(u64 *sptep, bool direct, u32 access)
> > > +{
> > > +     struct kvm_mmu_page *parent_sp = sptep_to_sp(sptep);
> > > +     union kvm_mmu_page_role role;
> > > +
> > > +     role = parent_sp->role;
> > > +     role.level--;
> > > +     role.access = access;
> > > +     role.direct = direct;
> > > +
> > > +     /*
> > > +      * If the guest has 4-byte PTEs then that means it's using 32-bit,
> > > +      * 2-level, non-PAE paging. KVM shadows such guests using 4 PAE page
> > > +      * directories, each mapping 1/4 of the guest's linear address space
> > > +      * (1GiB). The shadow pages for those 4 page directories are
> > > +      * pre-allocated and assigned a separate quadrant in their role.
> > > +      *
> > > +      * Since we are allocating a child shadow page and there are only 2
> > > +      * levels, this must be a PG_LEVEL_4K shadow page. Here the quadrant
> > > +      * will either be 0 or 1 because it maps 1/2 of the address space mapped
> > > +      * by the guest's PG_LEVEL_4K page table (or 4MiB huge page) that it
> > > +      * is shadowing. In this case, the quadrant can be derived by the index
> > > +      * of the SPTE that points to the new child shadow page in the page
> > > +      * directory (parent_sp). Specifically, every 2 SPTEs in parent_sp
> > > +      * shadow one half of a guest's page table (or 4MiB huge page) so the
> > > +      * quadrant is just the parity of the index of the SPTE.
> > > +      */
> > > +     if (role.has_4_byte_gpte) {
> > > +             BUG_ON(role.level != PG_LEVEL_4K);
> > > +             role.quadrant = (sptep - parent_sp->spt) % 2;
> > > +     }
> >
> > This made me wonder whether role.quadrant can be dropped, because it seems
> > it can be calculated out of the box with has_4_byte_gpte, level and spte
> > offset.  I could have missed something, though..
> 
> I think you're right that we could compute it on-the-fly. But it'd be
> non-trivial to remove since it's currently used to ensure the sp->role
> and sp->gfn uniquely identifies each shadow page (e.g. when checking
> for collisions in the mmu_page_hash).

Makes sense.
diff mbox series

Patch

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 146df73a982e..23c2004c6435 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2027,30 +2027,14 @@  static void clear_sp_write_flooding_count(u64 *spte)
 	__clear_sp_write_flooding_count(sptep_to_sp(spte));
 }
 
-static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
-					     gfn_t gfn,
-					     gva_t gaddr,
-					     unsigned level,
-					     bool direct,
-					     unsigned int access)
+static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, gfn_t gfn,
+					     union kvm_mmu_page_role role)
 {
-	union kvm_mmu_page_role role;
 	struct hlist_head *sp_list;
-	unsigned quadrant;
 	struct kvm_mmu_page *sp;
 	int collisions = 0;
 	LIST_HEAD(invalid_list);
 
-	role = vcpu->arch.mmu->mmu_role.base;
-	role.level = level;
-	role.direct = direct;
-	role.access = access;
-	if (role.has_4_byte_gpte) {
-		quadrant = gaddr >> (PAGE_SHIFT + (PT64_PT_BITS * level));
-		quadrant &= (1 << ((PT32_PT_BITS - PT64_PT_BITS) * level)) - 1;
-		role.quadrant = quadrant;
-	}
-
 	sp_list = &vcpu->kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)];
 	for_each_valid_sp(vcpu->kvm, sp, sp_list) {
 		if (sp->gfn != gfn) {
@@ -2068,7 +2052,7 @@  static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 			 * Unsync pages must not be left as is, because the new
 			 * upper-level page will be write-protected.
 			 */
-			if (level > PG_LEVEL_4K && sp->unsync)
+			if (role.level > PG_LEVEL_4K && sp->unsync)
 				kvm_mmu_prepare_zap_page(vcpu->kvm, sp,
 							 &invalid_list);
 			continue;
@@ -2107,14 +2091,14 @@  static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 
 	++vcpu->kvm->stat.mmu_cache_miss;
 
-	sp = kvm_mmu_alloc_page(vcpu, direct);
+	sp = kvm_mmu_alloc_page(vcpu, role.direct);
 
 	sp->gfn = gfn;
 	sp->role = role;
 	hlist_add_head(&sp->hash_link, sp_list);
-	if (!direct) {
+	if (!role.direct) {
 		account_shadowed(vcpu->kvm, sp);
-		if (level == PG_LEVEL_4K && kvm_vcpu_write_protect_gfn(vcpu, gfn))
+		if (role.level == PG_LEVEL_4K && kvm_vcpu_write_protect_gfn(vcpu, gfn))
 			kvm_flush_remote_tlbs_with_address(vcpu->kvm, gfn, 1);
 	}
 	trace_kvm_mmu_get_page(sp, true);
@@ -2126,6 +2110,51 @@  static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 	return sp;
 }
 
+static union kvm_mmu_page_role kvm_mmu_child_role(u64 *sptep, bool direct, u32 access)
+{
+	struct kvm_mmu_page *parent_sp = sptep_to_sp(sptep);
+	union kvm_mmu_page_role role;
+
+	role = parent_sp->role;
+	role.level--;
+	role.access = access;
+	role.direct = direct;
+
+	/*
+	 * If the guest has 4-byte PTEs then that means it's using 32-bit,
+	 * 2-level, non-PAE paging. KVM shadows such guests using 4 PAE page
+	 * directories, each mapping 1/4 of the guest's linear address space
+	 * (1GiB). The shadow pages for those 4 page directories are
+	 * pre-allocated and assigned a separate quadrant in their role.
+	 *
+	 * Since we are allocating a child shadow page and there are only 2
+	 * levels, this must be a PG_LEVEL_4K shadow page. Here the quadrant
+	 * will either be 0 or 1 because it maps 1/2 of the address space mapped
+	 * by the guest's PG_LEVEL_4K page table (or 4MiB huge page) that it
+	 * is shadowing. In this case, the quadrant can be derived by the index
+	 * of the SPTE that points to the new child shadow page in the page
+	 * directory (parent_sp). Specifically, every 2 SPTEs in parent_sp
+	 * shadow one half of a guest's page table (or 4MiB huge page) so the
+	 * quadrant is just the parity of the index of the SPTE.
+	 */
+	if (role.has_4_byte_gpte) {
+		BUG_ON(role.level != PG_LEVEL_4K);
+		role.quadrant = (sptep - parent_sp->spt) % 2;
+	}
+
+	return role;
+}
+
+static struct kvm_mmu_page *kvm_mmu_get_child_sp(struct kvm_vcpu *vcpu,
+						 u64 *sptep, gfn_t gfn,
+						 bool direct, u32 access)
+{
+	union kvm_mmu_page_role role;
+
+	role = kvm_mmu_child_role(sptep, direct, access);
+	return kvm_mmu_get_page(vcpu, gfn, role);
+}
+
 static void shadow_walk_init_using_root(struct kvm_shadow_walk_iterator *iterator,
 					struct kvm_vcpu *vcpu, hpa_t root,
 					u64 addr)
@@ -2930,8 +2959,7 @@  static int __direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		if (is_shadow_present_pte(*it.sptep))
 			continue;
 
-		sp = kvm_mmu_get_page(vcpu, base_gfn, it.addr,
-				      it.level - 1, true, ACC_ALL);
+		sp = kvm_mmu_get_child_sp(vcpu, it.sptep, base_gfn, true, ACC_ALL);
 
 		link_shadow_page(vcpu, it.sptep, sp);
 		if (fault->is_tdp && fault->huge_page_disallowed &&
@@ -3316,9 +3344,22 @@  static int mmu_check_root(struct kvm_vcpu *vcpu, gfn_t root_gfn)
 static hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, gva_t gva,
 			    u8 level, bool direct)
 {
+	union kvm_mmu_page_role role;
 	struct kvm_mmu_page *sp;
+	unsigned int quadrant;
+
+	role = vcpu->arch.mmu->mmu_role.base;
+	role.level = level;
+	role.direct = direct;
+	role.access = ACC_ALL;
+
+	if (role.has_4_byte_gpte) {
+		quadrant = gva >> (PAGE_SHIFT + (PT64_PT_BITS * level));
+		quadrant &= (1 << ((PT32_PT_BITS - PT64_PT_BITS) * level)) - 1;
+		role.quadrant = quadrant;
+	}
 
-	sp = kvm_mmu_get_page(vcpu, gfn, gva, level, direct, ACC_ALL);
+	sp = kvm_mmu_get_page(vcpu, gfn, role);
 	++sp->root_count;
 
 	return __pa(sp->spt);
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 252c77805eb9..c3909a07e938 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -683,8 +683,9 @@  static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 		if (!is_shadow_present_pte(*it.sptep)) {
 			table_gfn = gw->table_gfn[it.level - 2];
 			access = gw->pt_access[it.level - 2];
-			sp = kvm_mmu_get_page(vcpu, table_gfn, fault->addr,
-					      it.level-1, false, access);
+			sp = kvm_mmu_get_child_sp(vcpu, it.sptep, table_gfn,
+						  false, access);
+
 			/*
 			 * We must synchronize the pagetable before linking it
 			 * because the guest doesn't need to flush tlb when
@@ -740,8 +741,8 @@  static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 		drop_large_spte(vcpu, it.sptep);
 
 		if (!is_shadow_present_pte(*it.sptep)) {
-			sp = kvm_mmu_get_page(vcpu, base_gfn, fault->addr,
-					      it.level - 1, true, direct_access);
+			sp = kvm_mmu_get_child_sp(vcpu, it.sptep, base_gfn,
+						  true, direct_access);
 			link_shadow_page(vcpu, it.sptep, sp);
 			if (fault->huge_page_disallowed &&
 			    fault->req_level >= it.level)