
[03/23] KVM: x86/mmu: Decompose kvm_mmu_get_page() into separate functions

Message ID 20220203010051.2813563-4-dmatlack@google.com (mailing list archive)
State New, archived
Series Extend Eager Page Splitting to the shadow MMU

Commit Message

David Matlack Feb. 3, 2022, 1 a.m. UTC
Decompose kvm_mmu_get_page() into separate helper functions to increase
readability and prepare for allocating shadow pages without a vcpu
pointer.

Specifically, pull the guts of kvm_mmu_get_page() into 3 helper
functions:

kvm_mmu_get_existing_sp_maybe_unsync() -
  Walks the page hash checking for any existing mmu pages that match the
  given gfn and role. Does not attempt to synchronize the page if it is
  unsync.

kvm_mmu_get_existing_sp() -
  Gets an existing page from the page hash if it exists and guarantees
  the page, if one is returned, is synced.  Implemented as a thin wrapper
  around kvm_mmu_get_existing_sp_maybe_unsync(). Requires access to a vcpu
  pointer in order to sync the page.

kvm_mmu_create_sp()
  Allocates an entirely new kvm_mmu_page. This currently requires a
  vcpu pointer for allocation and looking up the memslot but that will
  be removed in a future commit.

No functional change intended.
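
With those in place, kvm_mmu_get_page() itself reduces to a thin wrapper
around the helpers. Abridged from the diff below (the gotos flattened into
an if for brevity):

static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, gfn_t gfn,
					     union kvm_mmu_page_role role)
{
	struct kvm_mmu_page *sp;
	bool created = false;

	/* Reuse an existing, synced SP for this gfn+role if one exists. */
	sp = kvm_mmu_get_existing_sp(vcpu, gfn, role);
	if (!sp) {
		/* Cache miss: allocate a brand new SP and hash it in. */
		created = true;
		sp = kvm_mmu_create_sp(vcpu, gfn, role);
	}

	trace_kvm_mmu_get_page(sp, created);
	return sp;
}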

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/kvm/mmu/mmu.c         | 132 ++++++++++++++++++++++++---------
 arch/x86/kvm/mmu/paging_tmpl.h |   5 +-
 arch/x86/kvm/mmu/spte.c        |   5 +-
 3 files changed, 101 insertions(+), 41 deletions(-)

Comments

Sean Christopherson Feb. 19, 2022, 1:25 a.m. UTC | #1
On Thu, Feb 03, 2022, David Matlack wrote:
> Decompose kvm_mmu_get_page() into separate helper functions to increase
> readability and prepare for allocating shadow pages without a vcpu
> pointer.
> 
> Specifically, pull the guts of kvm_mmu_get_page() into 3 helper
> functions:
> 
> kvm_mmu_get_existing_sp_maybe_unsync() -

Heh, this ain't Java.  Just add two underscores to whatever its primary caller
ends up being named; that succinctly documents the relationship _and_ suggests
that there's some "danger" in using the inner helper.

>   Walks the page hash checking for any existing mmu pages that match the
>   given gfn and role. Does not attempt to synchronize the page if it is
>   unsync.
> 
> kvm_mmu_get_existing_sp() -

Meh.  We should really be able to distill this down to something like
kvm_mmu_find_sp().  I'm also tempted to say we go with shadow_page instead of
"sp" for these helpers, so long as the line lengths don't get too brutal.  KVM
uses "sp" and "spte" in lots of places, but I suspect it would be helpful to
KVM newbies if the core routines actually spell out shadow_page, a la
to_shadow_page().
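
E.g. something like the below, where the __ prefix marks the "maybe unsync"
flavor (names and signatures here are purely illustrative, not what the
series necessarily ends up with):

  /* Inner helper: finds the SP but makes no attempt to sync it. */
  static struct kvm_mmu_page *__kvm_mmu_find_shadow_page(struct kvm *kvm, gfn_t gfn,
							  union kvm_mmu_page_role role,
							  struct list_head *invalid_list);

  /* Primary caller: what most code uses, returns a synced SP. */
  static struct kvm_mmu_page *kvm_mmu_find_shadow_page(struct kvm_vcpu *vcpu, gfn_t gfn,
						       union kvm_mmu_page_role role);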

>   Gets an existing page from the page hash if it exists and guarantees
>   the page, if one is returned, is synced.  Implemented as a thin wrapper
>   around kvm_mmu_get_existing_sp_maybe_unsync(). Requires access to a vcpu
>   pointer in order to sync the page.
> 
> kvm_mmu_create_sp()

Probably prefer s/create/alloc to match existing terminology for allocating roots.
Though looking through the series, there's going to be a lot of juggling of names.

It probably makes sense to figure out what names we want to end up with and then
work back from there.  I'll be back next week for a proper bikeshed session. :-)
David Matlack Feb. 24, 2022, 6:54 p.m. UTC | #2
On Fri, Feb 18, 2022 at 5:25 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Thu, Feb 03, 2022, David Matlack wrote:
> > Decompose kvm_mmu_get_page() into separate helper functions to increase
> > readability and prepare for allocating shadow pages without a vcpu
> > pointer.
> >
> > Specifically, pull the guts of kvm_mmu_get_page() into 3 helper
> > functions:
> >
> > kvm_mmu_get_existing_sp_maybe_unsync() -
>
> Heh, this ain't Java.  Just add two underscores to whatever its primary caller
> ends up being named; that succinctly documents the relationship _and_ suggests
> that there's some "danger" in using the inner helper.
>
> >   Walks the page hash checking for any existing mmu pages that match the
> >   given gfn and role. Does not attempt to synchronize the page if it is
> >   unsync.
> >
> > kvm_mmu_get_existing_sp() -
>
> Meh.  We should really be able to distill this down to something like
> kvm_mmu_find_sp().  I'm also tempted to say we go with shadow_page instead of
> "sp" for these helpers, so long as the line lengths don't get too brutal.  KVM
> uses "sp" and "spte" in lots of places, but I suspect it would be helpful to
> KVM newbies if the core routines actually spell out shadow_page, a la
> to_shadow_page().

s/get_existing/find/ sounds good to me.

I'll play around with s/sp/shadow_page/ but I suspect it will make the
line lengths quite long. But if I also replace "maybe_unsync" with
double-underscores it might work out.

>
> >   Gets an existing page from the page hash if it exists and guarantees
> >   the page, if one is returned, is synced.  Implemented as a thin wrapper
> >   around kvm_mmu_get_existing_sp_maybe_unsync(). Requires access to a vcpu
> >   pointer in order to sync the page.
> >
> > kvm_mmu_create_sp()
>
> Probably prefer s/create/alloc to match existing terminology for allocating roots.
> Though looking through the series, there's going to be a lot of juggling of names.
>
> It probably makes sense to figure out what names we want to end up with and then
> work back from there.  I'll be back next week for a proper bikeshed session. :-)

kvm_mmu_create_sp() is temporary anyway. It goes away after patch 6
and we just have kvm_mmu_alloc_sp() and kvm_mmu_init_sp().

I'll see what I can do about using kvm_mmu_alloc_sp() as the temporary
name, but the next patch renames kvm_mmu_alloc_page() to
kvm_mmu_alloc_sp() so it will take some juggling for sure.
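
Purely as a sketch (the names and exact signatures below are placeholders,
not the code from the later patches), the shape I'm aiming for is roughly:

	sp = kvm_mmu_find_sp(vcpu, gfn, role);
	if (!sp) {
		created = true;
		/* Allocation split from initialization. */
		sp = kvm_mmu_alloc_sp(vcpu);
		kvm_mmu_init_sp(vcpu, sp, gfn, role);
	}
	trace_kvm_mmu_get_page(sp, created);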

Patch

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index fc9a4d9c0ddd..24b3cf53aa12 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2045,16 +2045,25 @@  static void clear_sp_write_flooding_count(u64 *spte)
 	__clear_sp_write_flooding_count(sptep_to_sp(spte));
 }
 
-static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, gfn_t gfn,
-					     union kvm_mmu_page_role role)
+/*
+ * Looks up an existing SP for the given gfn and role. Makes no attempt to
+ * sync the SP if it is marked unsync.
+ *
+ * If creating an upper-level page table, zaps unsynced pages for the same
+ * gfn and adds them to the invalid_list. It's the caller's responsibility
+ * to call kvm_mmu_commit_zap_page() on invalid_list.
+ */
+static struct kvm_mmu_page *kvm_mmu_get_existing_sp_maybe_unsync(struct kvm *kvm,
+								 gfn_t gfn,
+								 union kvm_mmu_page_role role,
+								 struct list_head *invalid_list)
 {
 	struct hlist_head *sp_list;
 	struct kvm_mmu_page *sp;
 	int collisions = 0;
-	LIST_HEAD(invalid_list);
 
-	sp_list = &vcpu->kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)];
-	for_each_valid_sp(vcpu->kvm, sp, sp_list) {
+	sp_list = &kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)];
+	for_each_valid_sp(kvm, sp, sp_list) {
 		if (sp->gfn != gfn) {
 			collisions++;
 			continue;
@@ -2071,60 +2080,109 @@  static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, gfn_t gfn,
 			 * upper-level page will be write-protected.
 			 */
 			if (role.level > PG_LEVEL_4K && sp->unsync)
-				kvm_mmu_prepare_zap_page(vcpu->kvm, sp,
-							 &invalid_list);
+				kvm_mmu_prepare_zap_page(kvm, sp, invalid_list);
+
 			continue;
 		}
 
-		/* unsync and write-flooding only apply to indirect SPs. */
-		if (sp->role.direct)
-			goto trace_get_page;
+		/* Write-flooding is only tracked for indirect SPs. */
+		if (!sp->role.direct)
+			__clear_sp_write_flooding_count(sp);
 
-		if (sp->unsync) {
-			/*
-			 * The page is good, but is stale.  kvm_sync_page does
-			 * get the latest guest state, but (unlike mmu_unsync_children)
-			 * it doesn't write-protect the page or mark it synchronized!
-			 * This way the validity of the mapping is ensured, but the
-			 * overhead of write protection is not incurred until the
-			 * guest invalidates the TLB mapping.  This allows multiple
-			 * SPs for a single gfn to be unsync.
-			 *
-			 * If the sync fails, the page is zapped.  If so, break
-			 * in order to rebuild it.
-			 */
-			if (!kvm_sync_page(vcpu, sp, &invalid_list))
-				break;
+		goto out;
+	}
 
-			WARN_ON(!list_empty(&invalid_list));
-			kvm_flush_remote_tlbs(vcpu->kvm);
-		}
+	sp = NULL;
 
-		__clear_sp_write_flooding_count(sp);
+out:
+	if (collisions > kvm->stat.max_mmu_page_hash_collisions)
+		kvm->stat.max_mmu_page_hash_collisions = collisions;
+
+	return sp;
+}
 
-trace_get_page:
-		trace_kvm_mmu_get_page(sp, false);
+/*
+ * Looks up an existing SP for the given gfn and role if one exists. The
+ * returned SP is guaranteed to be synced.
+ */
+static struct kvm_mmu_page *kvm_mmu_get_existing_sp(struct kvm_vcpu *vcpu,
+						    gfn_t gfn,
+						    union kvm_mmu_page_role role)
+{
+	struct kvm_mmu_page *sp;
+	LIST_HEAD(invalid_list);
+
+	sp = kvm_mmu_get_existing_sp_maybe_unsync(vcpu->kvm, gfn, role, &invalid_list);
+	if (!sp)
 		goto out;
+
+	if (sp->unsync) {
+		/*
+		 * The page is good, but is stale.  kvm_sync_page does
+		 * get the latest guest state, but (unlike mmu_unsync_children)
+		 * it doesn't write-protect the page or mark it synchronized!
+		 * This way the validity of the mapping is ensured, but the
+		 * overhead of write protection is not incurred until the
+		 * guest invalidates the TLB mapping.  This allows multiple
+		 * SPs for a single gfn to be unsync.
+		 *
+		 * If the sync fails, the page is zapped and added to the
+		 * invalid_list.
+		 */
+		if (!kvm_sync_page(vcpu, sp, &invalid_list)) {
+			sp = NULL;
+			goto out;
+		}
+
+		WARN_ON(!list_empty(&invalid_list));
+		kvm_flush_remote_tlbs(vcpu->kvm);
 	}
 
+out:
+	kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
+	return sp;
+}
+
+static struct kvm_mmu_page *kvm_mmu_create_sp(struct kvm_vcpu *vcpu,
+					      gfn_t gfn,
+					      union kvm_mmu_page_role role)
+{
+	struct kvm_mmu_page *sp;
+	struct hlist_head *sp_list;
+
 	++vcpu->kvm->stat.mmu_cache_miss;
 
 	sp = kvm_mmu_alloc_page(vcpu, role.direct);
-
 	sp->gfn = gfn;
 	sp->role = role;
+
+	sp_list = &vcpu->kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)];
 	hlist_add_head(&sp->hash_link, sp_list);
+
 	if (!role.direct) {
 		account_shadowed(vcpu->kvm, sp);
 		if (role.level == PG_LEVEL_4K && kvm_vcpu_write_protect_gfn(vcpu, gfn))
 			kvm_flush_remote_tlbs_with_address(vcpu->kvm, gfn, 1);
 	}
-	trace_kvm_mmu_get_page(sp, true);
-out:
-	kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
 
-	if (collisions > vcpu->kvm->stat.max_mmu_page_hash_collisions)
-		vcpu->kvm->stat.max_mmu_page_hash_collisions = collisions;
+	return sp;
+}
+
+static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, gfn_t gfn,
+					     union kvm_mmu_page_role role)
+{
+	struct kvm_mmu_page *sp;
+	bool created = false;
+
+	sp = kvm_mmu_get_existing_sp(vcpu, gfn, role);
+	if (sp)
+		goto out;
+
+	created = true;
+	sp = kvm_mmu_create_sp(vcpu, gfn, role);
+
+out:
+	trace_kvm_mmu_get_page(sp, created);
 	return sp;
 }
 
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index f93d4423a067..c533c191925e 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -692,8 +692,9 @@  static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 			 * the gpte is changed from non-present to present.
 			 * Otherwise, the guest may use the wrong mapping.
 			 *
-			 * For PG_LEVEL_4K, kvm_mmu_get_page() has already
-			 * synchronized it transiently via kvm_sync_page().
+			 * For PG_LEVEL_4K, kvm_mmu_get_existing_sp() has
+			 * already synchronized it transiently via
+			 * kvm_sync_page().
 			 *
 			 * For higher level pagetable, we synchronize it via
 			 * the slower mmu_sync_children().  If it needs to
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 8b5309faf5b9..20cf9e0d45dd 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -149,8 +149,9 @@  bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 		/*
 		 * Optimization: for pte sync, if spte was writable the hash
 		 * lookup is unnecessary (and expensive). Write protection
-		 * is responsibility of kvm_mmu_get_page / kvm_mmu_sync_roots.
-		 * Same reasoning can be applied to dirty page accounting.
+		 * is responsibility of kvm_mmu_create_sp() and
+		 * kvm_mmu_sync_roots(). Same reasoning can be applied to dirty
+		 * page accounting.
 		 */
 		if (is_writable_pte(old_spte))
 			goto out;