
[RFC,3/6] KVM: x86/mmu: Pass the memslot around via struct kvm_page_fault

Message ID 20210813203504.2742757-4-dmatlack@google.com (mailing list archive)
State New, archived
Series Pass memslot around during page fault handling

Commit Message

David Matlack Aug. 13, 2021, 8:35 p.m. UTC
The memslot for the faulting gfn is used throughout the page fault
handling code, so capture it in kvm_page_fault as soon as we know the
gfn and use it in the page fault handling code that has direct access
to the kvm_page_fault struct.

This, in combination with the subsequent patch, improves "Populate
memory time" in dirty_log_perf_test by 5% when using the legacy MMU.
There is no discernible improvement to the performance of the TDP MMU.

No functional change intended.

Suggested-by: Ben Gardon <bgardon@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/kvm/mmu.h             |  3 +++
 arch/x86/kvm/mmu/mmu.c         | 27 +++++++++------------------
 arch/x86/kvm/mmu/paging_tmpl.h |  2 ++
 arch/x86/kvm/mmu/tdp_mmu.c     | 10 +++++-----
 4 files changed, 19 insertions(+), 23 deletions(-)
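
In outline (condensed from the full diff at the bottom of this page), the patch
adds a slot field alongside gfn in struct kvm_page_fault and fills it once,
immediately after the gfn is known, so the rest of the fault handling code can
reuse it instead of repeating the memslot lookup:

	/* arch/x86/kvm/mmu.h */
	struct kvm_page_fault {
		...
		gfn_t gfn;

		/* The memslot containing gfn. May be NULL. */
		struct kvm_memory_slot *slot;
		...
	};

	/* direct_page_fault() and FNAME(page_fault)() */
	fault->gfn = fault->addr >> PAGE_SHIFT;	/* walker.gfn in the shadow-paging case */
	fault->slot = kvm_vcpu_gfn_to_memslot(vcpu, fault->gfn);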

Comments

Paolo Bonzini Aug. 17, 2021, 1 p.m. UTC | #1
On 13/08/21 22:35, David Matlack wrote:
> -	if (is_writable_pte(new_spte) && !is_writable_pte(old_spte)) {
> -		/*
> -		 * The gfn of direct spte is stable since it is
> -		 * calculated by sp->gfn.
> -		 */
> -		gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->spt);
> -		kvm_vcpu_mark_page_dirty(vcpu, gfn);
> -	}
> +	if (is_writable_pte(new_spte) && !is_writable_pte(old_spte))
> +		mark_page_dirty_in_slot(vcpu->kvm, fault->slot, fault->gfn);

Oops, this actually needs kvm_vcpu_mark_page_dirty to receive the slot.

Paolo
David Matlack Aug. 17, 2021, 4:13 p.m. UTC | #2
On Tue, Aug 17, 2021 at 6:00 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 13/08/21 22:35, David Matlack wrote:
> > -     if (is_writable_pte(new_spte) && !is_writable_pte(old_spte)) {
> > -             /*
> > -              * The gfn of direct spte is stable since it is
> > -              * calculated by sp->gfn.
> > -              */
> > -             gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->spt);
> > -             kvm_vcpu_mark_page_dirty(vcpu, gfn);
> > -     }
> > +     if (is_writable_pte(new_spte) && !is_writable_pte(old_spte))
> > +             mark_page_dirty_in_slot(vcpu->kvm, fault->slot, fault->gfn);
>
> Oops, this actually needs kvm_vcpu_mark_page_dirty to receive the slot.

What do you mean? kvm_vcpu_mark_page_dirty ultimately just calls
mark_page_dirty_in_slot.
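
For context, kvm_vcpu_mark_page_dirty() in virt/kvm/kvm_main.c is essentially a
gfn-to-memslot lookup in front of mark_page_dirty_in_slot() (simplified sketch,
not the verbatim source):

	void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn)
	{
		struct kvm_memory_slot *memslot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);

		mark_page_dirty_in_slot(vcpu->kvm, memslot, gfn);
	}

So calling mark_page_dirty_in_slot() with fault->slot only skips the redundant
lookup; the actual dirty tracking path is unchanged.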

>
> Paolo
>
Paolo Bonzini Aug. 17, 2021, 5:02 p.m. UTC | #3
On 17/08/21 18:13, David Matlack wrote:
> On Tue, Aug 17, 2021 at 6:00 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> On 13/08/21 22:35, David Matlack wrote:
>>> -     if (is_writable_pte(new_spte) && !is_writable_pte(old_spte)) {
>>> -             /*
>>> -              * The gfn of direct spte is stable since it is
>>> -              * calculated by sp->gfn.
>>> -              */
>>> -             gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->spt);
>>> -             kvm_vcpu_mark_page_dirty(vcpu, gfn);
>>> -     }
>>> +     if (is_writable_pte(new_spte) && !is_writable_pte(old_spte))
>>> +             mark_page_dirty_in_slot(vcpu->kvm, fault->slot, fault->gfn);
>>
>> Oops, this actually needs kvm_vcpu_mark_page_dirty to receive the slot.
> 
> What do you mean? kvm_vcpu_mark_page_dirty ultimately just calls
> mark_page_dirty_in_slot.

Yeah, I was thinking of some very old version of the dirty page ring 
buffer patches.  What I wrote makes no sense. :)

Paolo
Sean Christopherson Aug. 19, 2021, 4:37 p.m. UTC | #4
On Fri, Aug 13, 2021, David Matlack wrote:
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 3352312ab1c9..fb2c95e8df00 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2890,7 +2890,7 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm,
>  
>  void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>  {
> -	struct kvm_memory_slot *slot;
> +	struct kvm_memory_slot *slot = fault->slot;
>  	kvm_pfn_t mask;
>  
>  	fault->huge_page_disallowed = fault->exec && fault->nx_huge_page_workaround_enabled;
> @@ -2901,8 +2901,7 @@ void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
>  	if (is_error_noslot_pfn(fault->pfn) || kvm_is_reserved_pfn(fault->pfn))
>  		return;
>  
> -	slot = gfn_to_memslot_dirty_bitmap(vcpu, fault->gfn, true);
> -	if (!slot)
> +	if (kvm_slot_dirty_track_enabled(slot))

This is unnecessarily obfuscated.  It relies on the is_error_noslot_pfn() to
ensure fault->slot is valid, but the only reason that helper is used is because
it was the most efficient code when slot wasn't available.  IMO, this would be
better:

	if (!slot || kvm_slot_dirty_track_enabled(slot))
		return;

	if (kvm_is_reserved_pfn(fault->pfn))
		return;

On a related topic, a good follow-up to this series would be to pass @fault into
the prefetch helpers, and modify the prefetch logic to re-use fault->slot and
refuse to prefetch across memslot boundaries.  That would eliminate all users of
gfn_to_memslot_dirty_bitmap() and allow us to drop that abomination.
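
A hypothetical sketch of what that follow-up could look like (the signature
change and the bounds check are illustrative, not taken from an actual patch;
the dirty-log handling for writable prefetches is elided):

	static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
					    struct kvm_page_fault *fault,
					    struct kvm_mmu_page *sp,
					    u64 *start, u64 *end)
	{
		struct kvm_memory_slot *slot = fault->slot;
		gfn_t gfn = kvm_mmu_page_get_gfn(sp, start - sp->spt);
		int nr_pages = end - start;

		/* Refuse to prefetch with no slot or across the slot boundary,
		 * instead of re-resolving the memslot per prefetch batch.
		 */
		if (!slot || gfn + nr_pages > slot->base_gfn + slot->npages)
			return -1;

		/* ... gfn_to_page_many_atomic(slot, gfn, pages, nr_pages) and
		 * mmu_set_spte() for each page, as today ...
		 */
		return 0;
	}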
David Matlack Aug. 20, 2021, 10:54 p.m. UTC | #5
On Thu, Aug 19, 2021 at 9:37 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Fri, Aug 13, 2021, David Matlack wrote:
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 3352312ab1c9..fb2c95e8df00 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -2890,7 +2890,7 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm,
> >
> >  void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
> >  {
> > -     struct kvm_memory_slot *slot;
> > +     struct kvm_memory_slot *slot = fault->slot;
> >       kvm_pfn_t mask;
> >
> >       fault->huge_page_disallowed = fault->exec && fault->nx_huge_page_workaround_enabled;
> > @@ -2901,8 +2901,7 @@ void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
> >       if (is_error_noslot_pfn(fault->pfn) || kvm_is_reserved_pfn(fault->pfn))
> >               return;
> >
> > -     slot = gfn_to_memslot_dirty_bitmap(vcpu, fault->gfn, true);
> > -     if (!slot)
> > +     if (kvm_slot_dirty_track_enabled(slot))
>
> This is unnecessarily obfuscated.

Ugh. It's pure luck too. I meant to check if the slot is null here.

> It relies on the is_error_noslot_pfn() to
> ensure fault->slot is valid, but the only reason that helper is used is because
> it was the most efficient code when slot wasn't available.  IMO, this would be
> better:
>
>         if (!slot || kvm_slot_dirty_track_enabled(slot))
>                 return;
>
>         if (kvm_is_reserved_pfn(fault->pfn))
>                 return;

That looks reasonable to me. I can send a patch next week with this change.

>
> On a related topic, a good follow-up to this series would be to pass @fault into
> the prefetch helpers, and modify the prefetch logic to re-use fault->slot and
> refuse to prefetch across memslot boundaries.  That would eliminate all users of
> gfn_to_memslot_dirty_bitmap() and allow us to drop that abomination.
Sean Christopherson Aug. 20, 2021, 11:02 p.m. UTC | #6
On Fri, Aug 20, 2021, David Matlack wrote:
> On Thu, Aug 19, 2021 at 9:37 AM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Fri, Aug 13, 2021, David Matlack wrote:
> > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > index 3352312ab1c9..fb2c95e8df00 100644
> > > --- a/arch/x86/kvm/mmu/mmu.c
> > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > @@ -2890,7 +2890,7 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm,
> > >
> > >  void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
> > >  {
> > > -     struct kvm_memory_slot *slot;
> > > +     struct kvm_memory_slot *slot = fault->slot;
> > >       kvm_pfn_t mask;
> > >
> > >       fault->huge_page_disallowed = fault->exec && fault->nx_huge_page_workaround_enabled;
> > > @@ -2901,8 +2901,7 @@ void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
> > >       if (is_error_noslot_pfn(fault->pfn) || kvm_is_reserved_pfn(fault->pfn))
> > >               return;
> > >
> > > -     slot = gfn_to_memslot_dirty_bitmap(vcpu, fault->gfn, true);
> > > -     if (!slot)
> > > +     if (kvm_slot_dirty_track_enabled(slot))
> >
> > This is unnecessarily obfuscated.
> 
> Ugh. It's pure luck too. I meant to check if the slot is null here.

Ha, better to be lucky than good ;-)

Patch

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 2c726b255fa8..8d13333f0345 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -158,6 +158,9 @@  struct kvm_page_fault {
 	/* Shifted addr, or result of guest page table walk if addr is a gva.  */
 	gfn_t gfn;
 
+	/* The memslot containing gfn. May be NULL. */
+	struct kvm_memory_slot *slot;
+
 	/* Outputs of kvm_faultin_pfn.  */
 	kvm_pfn_t pfn;
 	hva_t hva;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 3352312ab1c9..fb2c95e8df00 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2890,7 +2890,7 @@  int kvm_mmu_max_mapping_level(struct kvm *kvm,
 
 void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
-	struct kvm_memory_slot *slot;
+	struct kvm_memory_slot *slot = fault->slot;
 	kvm_pfn_t mask;
 
 	fault->huge_page_disallowed = fault->exec && fault->nx_huge_page_workaround_enabled;
@@ -2901,8 +2901,7 @@  void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	if (is_error_noslot_pfn(fault->pfn) || kvm_is_reserved_pfn(fault->pfn))
 		return;
 
-	slot = gfn_to_memslot_dirty_bitmap(vcpu, fault->gfn, true);
-	if (!slot)
+	if (kvm_slot_dirty_track_enabled(slot))
 		return;
 
 	/*
@@ -3076,13 +3075,9 @@  static bool page_fault_can_be_fast(struct kvm_page_fault *fault)
  * someone else modified the SPTE from its original value.
  */
 static bool
-fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
+fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 			u64 *sptep, u64 old_spte, u64 new_spte)
 {
-	gfn_t gfn;
-
-	WARN_ON(!sp->role.direct);
-
 	/*
 	 * Theoretically we could also set dirty bit (and flush TLB) here in
 	 * order to eliminate unnecessary PML logging. See comments in
@@ -3098,14 +3093,8 @@  fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	if (cmpxchg64(sptep, old_spte, new_spte) != old_spte)
 		return false;
 
-	if (is_writable_pte(new_spte) && !is_writable_pte(old_spte)) {
-		/*
-		 * The gfn of direct spte is stable since it is
-		 * calculated by sp->gfn.
-		 */
-		gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->spt);
-		kvm_vcpu_mark_page_dirty(vcpu, gfn);
-	}
+	if (is_writable_pte(new_spte) && !is_writable_pte(old_spte))
+		mark_page_dirty_in_slot(vcpu->kvm, fault->slot, fault->gfn);
 
 	return true;
 }
@@ -3233,7 +3222,7 @@  static int fast_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		 * since the gfn is not stable for indirect shadow page. See
 		 * Documentation/virt/kvm/locking.rst to get more detail.
 		 */
-		if (fast_pf_fix_direct_spte(vcpu, sp, sptep, spte, new_spte)) {
+		if (fast_pf_fix_direct_spte(vcpu, fault, sptep, spte, new_spte)) {
 			ret = RET_PF_FIXED;
 			break;
 		}
@@ -3823,7 +3812,7 @@  static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 
 static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, int *r)
 {
-	struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, fault->gfn);
+	struct kvm_memory_slot *slot = fault->slot;
 	bool async;
 
 	/*
@@ -3888,6 +3877,8 @@  static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	int r;
 
 	fault->gfn = fault->addr >> PAGE_SHIFT;
+	fault->slot = kvm_vcpu_gfn_to_memslot(vcpu, fault->gfn);
+
 	if (page_fault_handle_page_track(vcpu, fault))
 		return RET_PF_EMULATE;
 
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index f70afecbf3a2..50ade6450ace 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -847,6 +847,8 @@  static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	}
 
 	fault->gfn = walker.gfn;
+	fault->slot = kvm_vcpu_gfn_to_memslot(vcpu, fault->gfn);
+
 	if (page_fault_handle_page_track(vcpu, fault)) {
 		shadow_page_table_clear_flood(vcpu, fault->addr);
 		return RET_PF_EMULATE;
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 47ec9f968406..6f733a68d750 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -533,6 +533,7 @@  static inline bool tdp_mmu_set_spte_atomic_no_dirty_log(struct kvm *kvm,
  * TDP page fault.
  *
  * @vcpu: The vcpu instance that took the TDP page fault.
+ * @fault: The kvm_page_fault being resolved by this SPTE.
  * @iter: a tdp_iter instance currently on the SPTE that should be set
  * @new_spte: The value the SPTE should be set to
  *
@@ -540,6 +541,7 @@  static inline bool tdp_mmu_set_spte_atomic_no_dirty_log(struct kvm *kvm,
  *	    this function will have no side-effects.
  */
 static inline bool tdp_mmu_map_set_spte_atomic(struct kvm_vcpu *vcpu,
+					       struct kvm_page_fault *fault,
 					       struct tdp_iter *iter,
 					       u64 new_spte)
 {
@@ -553,12 +555,10 @@  static inline bool tdp_mmu_map_set_spte_atomic(struct kvm_vcpu *vcpu,
 	 * handle_changed_spte_dirty_log() to leverage vcpu->last_used_slot.
 	 */
 	if (is_writable_pte(new_spte)) {
-		struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, iter->gfn);
-
-		if (slot && kvm_slot_dirty_track_enabled(slot)) {
+		if (fault->slot && kvm_slot_dirty_track_enabled(fault->slot)) {
 			/* Enforced by kvm_mmu_hugepage_adjust. */
 			WARN_ON_ONCE(iter->level > PG_LEVEL_4K);
-			mark_page_dirty_in_slot(kvm, slot, iter->gfn);
+			mark_page_dirty_in_slot(kvm, fault->slot, iter->gfn);
 		}
 	}
 
@@ -934,7 +934,7 @@  static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
 
 	if (new_spte == iter->old_spte)
 		ret = RET_PF_SPURIOUS;
-	else if (!tdp_mmu_map_set_spte_atomic(vcpu, iter, new_spte))
+	else if (!tdp_mmu_map_set_spte_atomic(vcpu, fault, iter, new_spte))
 		return RET_PF_RETRY;
 
 	/*