[2/3] KVM: x86/mmu: Fix pf_fixed count in tdp_mmu_map_handle_target_level()

Message ID 23b565dd3b3dfa20aea1c13bce01163f9427a237.1620200410.git.kai.huang@intel.com (mailing list archive)
State New, archived
Series TDP MMU: several minor fixes or improvements

Commit Message

Kai Huang May 5, 2021, 9:37 a.m. UTC
Currently pf_fixed is incremented even when the page fault requires emulation
or the fault is spurious.  Fix this by incrementing it only when the return
value is RET_PF_FIXED.

Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/kvm/mmu/tdp_mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Ben Gardon May 5, 2021, 4:11 p.m. UTC | #1
On Wed, May 5, 2021 at 2:38 AM Kai Huang <kai.huang@intel.com> wrote:
>
> Currently pf_fixed is increased even when page fault requires emulation,
> or fault is spurious.  Fix by only increasing it when return value is
> RET_PF_FIXED.

Revisiting __direct_map and mmu_set_spte, there are cases in the
legacy MMU where RET_PF_EMULATE is returned but pf_fixed is still
incremented.
Perhaps it would make more sense to do the increment in the success
case of tdp_mmu_set_spte_atomic as you suggested before. Sorry I
didn't catch that earlier.

It would probably also be worth putting a comment on pf_fixed so that
people in the future know what it's supposed to mean and we don't get
into archeology, reverse engineering the meaning of the stat again.

>
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> ---
>  arch/x86/kvm/mmu/tdp_mmu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 1cad4c9f7c34..debe8c3ec844 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -942,7 +942,7 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, int write,
>                                        rcu_dereference(iter->sptep));
>         }
>
> -       if (!prefault)
> +       if (!prefault && ret == RET_PF_FIXED)
>                 vcpu->stat.pf_fixed++;
>
>         return ret;
> --
> 2.31.1
>
Sean Christopherson May 5, 2021, 4:29 p.m. UTC | #2
On Wed, May 05, 2021, Kai Huang wrote:
> Currently pf_fixed is increased even when page fault requires emulation,
> or fault is spurious.  Fix by only increasing it when return value is
> RET_PF_FIXED.
> 
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> ---
>  arch/x86/kvm/mmu/tdp_mmu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 1cad4c9f7c34..debe8c3ec844 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -942,7 +942,7 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, int write,
>  				       rcu_dereference(iter->sptep));
>  	}
>  
> -	if (!prefault)
> +	if (!prefault && ret == RET_PF_FIXED)
>  		vcpu->stat.pf_fixed++;

Both this patch and the existing code are wrong to check prefault, and they both
deviate from the legacy MMU (both TDP and shadow) for RET_PF_EMULATE.

For prefault==true, KVM should indeed bump pf_fixed since "prefault" really means
"async page fault completed".  In that case, the original page fault from the
guest was morphed into an async page fault and stat.pf_fixed was _not_ incremented
because KVM hasn't actually fixed the fault.  The "prefault" flag is there
purely so that KVM doesn't inject a #PF into the guest in the case where the
guest unmapped the gpa while the async page fault was being handled.

For RET_PF_EMULATE, I could go either way.  On one hand, KVM is installing a
translation that accelerates future emulated MMIO, so it's kinda sorta fixing
the page fault.  On the other hand, future emulated MMIO still page faults, it
just gets handled without going through the full page fault handler.

If we do decide to omit RET_PF_EMULATE, it should be a separate patch and should
be done for all flavors of MMU.  For this patch, the correct code is:

	if (ret != RET_PF_SPURIOUS)
		vcpu->stat.pf_fixed++;

which works because "ret" cannot be RET_PF_RETRY.

Looking through the other code, KVM also fails to bump stat.pf_fixed in the fast
page fault case.  So, if we decide to fix/adjust RET_PF_EMULATE, I think it would
make sense to handle stat.pf_fixed in a common location.

The legacy MMU also prefetches on RET_PF_EMULATE, which isn't technically wrong,
but it's pretty much guaranteed to be a waste of time since prefetching only
installs SPTEs if there is a backing memslot and PFN.

Kai, if it's ok with you, I'll fold the above "ret != RET_PF_SPURIOUS" change
into a separate mini-series to address the other issues I pointed out.  I was
hoping I could paste patches for them inline and let you carry them, but moving
stat.pf_fixed handling to a common location requires additional code shuffling
because of async page faults :-/

Thanks!

>  	return ret;
> -- 
> 2.31.1
>
Sean Christopherson May 5, 2021, 5:16 p.m. UTC | #3
On Wed, May 05, 2021, Sean Christopherson wrote:
> On Wed, May 05, 2021, Kai Huang wrote:
> > Currently pf_fixed is increased even when page fault requires emulation,
> > or fault is spurious.  Fix by only increasing it when return value is
> > RET_PF_FIXED.
> > 
> > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > ---
> >  arch/x86/kvm/mmu/tdp_mmu.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> > index 1cad4c9f7c34..debe8c3ec844 100644
> > --- a/arch/x86/kvm/mmu/tdp_mmu.c
> > +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> > @@ -942,7 +942,7 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, int write,
> >  				       rcu_dereference(iter->sptep));
> >  	}
> >  
> > -	if (!prefault)
> > +	if (!prefault && ret == RET_PF_FIXED)
> >  		vcpu->stat.pf_fixed++;
> For RET_PF_EMULATE, I could go either way.  On one hand, KVM is installing a
> translation that accelerates future emulated MMIO, so it's kinda sorta fixing
> the page fault.  On the other hand, future emulated MMIO still page faults, it
> just gets handled without going through the full page fault handler.

Hrm, the other RET_PF_EMULATE case is when KVM creates a _new_ SPTE to handle a
page fault, but installs a read-only SPTE on a write fault because the page is
marked for write access tracking, e.g. for non-leaf guest page tables.  Bumping
pf_fixed is arguably correct in that case since KVM did fault in a page and from
the guest's perspective the page fault was fixed, it's just that "fixing" the
fault involved a bit of instruction emulation.

> If we do decide to omit RET_PF_EMULATE, it should be a separate patch and should
> be done for all flavors of MMU.  For this patch, the correct code is:
> 
> 	if (ret != RET_PF_SPURIOUS)
> 		vcpu->stat.pf_fixed++;
> 
> which works because "ret" cannot be RET_PF_RETRY.
> 
> Looking through the other code, KVM also fails to bump stat.pf_fixed in the fast
> page fault case.  So, if we decide to fix/adjust RET_PF_EMULATE, I think it would
> make sense to handle stat.pf_fixed in a common location.

Blech.  My original thought was to move the stat.pf_fixed logic all the way out
to kvm_mmu_do_page_fault(), but that would do the wrong thing if pf_fixed is
bumped on RET_PF_EMULATE and page_fault_handle_page_track() returns RET_PF_EMULATE.
That fast path handles the case where the guest gets a !WRITABLE page fault on
a PRESENT SPTE that KVM is write tracking.  *sigh*.

I'm leaning towards making RET_PF_EMULATE a modifier instead of a standalone
type, which would allow more precise pf_fixed adjustments and would also let us
clean up the EMULTYPE_ALLOW_RETRY_PF logic, which has a rather gross check for
detecting MMIO page faults.

> The legacy MMU also prefetches on RET_PF_EMULATE, which isn't technically wrong,
> but it's pretty much guaranteed to be a waste of time since prefetching only
> installs SPTEs if there is a backing memslot and PFN.
> 
> Kai, if it's ok with you, I'll fold the above "ret != RET_PF_SPURIOUS" change
> into a separate mini-series to address the other issues I pointed out.  I was
> hoping I could paste patches for them inline and let you carry them, but moving
> stat.pf_fixed handling to a common location requires additional code shuffling
> because of async page faults :-/

Cancel that idea.  Given the twisty mess of RET_PF_EMULATE, it's probably best for
you to just send a new version of your patch to make the TDP MMU pf_fixed behavior
match the legacy MMU.  It doesn't make sense to hold up a trivial fix just so I
can scratch a refactoring itch :-)
Kai Huang May 6, 2021, 1:51 a.m. UTC | #4
> > > 
> > > -	if (!prefault)
> > > +	if (!prefault && ret == RET_PF_FIXED)
> > >  		vcpu->stat.pf_fixed++;
> > For RET_PF_EMULATE, I could go either way.  On one hand, KVM is installing a
> > translation that accelerates future emulated MMIO, so it's kinda sorta fixing
> > the page fault.  On the other hand, future emulated MMIO still page faults, it
> > just gets handled without going through the full page fault handler.
> 
> Hrm, the other RET_PF_EMULATE case is when KVM creates a _new_ SPTE to handle a
> page fault, but installs a read-only SPTE on a write fault because the page is
> marked for write access tracking, e.g. for non-leaf guest page tables.  Bumping
> pf_fixed is arguably correct in that case since KVM did fault in a page and from
> the guest's perspective the page fault was fixed, it's just that "fixing" the
> fault involved a bit of instruction emulation.

Yes this is exactly the case for video ram :)

> 
> > If we do decide to omit RET_PF_EMULATE, it should be a separate patch and should
> > be done for all flavors of MMU.  For this patch, the correct code is:
> > 
> > 	if (ret != RET_PF_SPURIOUS)
> > 		vcpu->stat.pf_fixed++;
> > 
> > which works because "ret" cannot be RET_PF_RETRY.
> > 
> > Looking through the other code, KVM also fails to bump stat.pf_fixed in the fast
> > page fault case.  So, if we decide to fix/adjust RET_PF_EMULATE, I think it would
> > make sense to handle stat.pf_fixed in a common location.
> 
> Blech.  My original thought was to move the stat.pf_fixed logic all the way out
> to kvm_mmu_do_page_fault(), but that would do the wrong thing if pf_fixed is
> bumped on RET_PF_EMULATE and page_fault_handle_page_track() returns RET_PF_EMULATE.
> That fast path handles the case where the guest gets a !WRITABLE page fault on
> a PRESENT SPTE that KVM is write tracking.  *sigh*.
> 
> I'm leaning towards making RET_PF_EMULATE a modifier instead of a standalone
> type, which would allow more precise pf_fixed adjustments and would also let us
> clean up the EMULTYPE_ALLOW_RETRY_PF logic, which has a rather gross check for
> detecting MMIO page faults.
> 
> > The legacy MMU also prefetches on RET_PF_EMULATE, which isn't technically wrong,
> > but it's pretty much guaranteed to be a waste of time since prefetching only
> > installs SPTEs if there is a backing memslot and PFN.
> > 
> > Kai, if it's ok with you, I'll fold the above "ret != RET_PF_SPURIOUS" change
> > into a separate mini-series to address the other issues I pointed out.  I was
> > hoping I could paste patches for them inline and let you carry them, but moving
> > stat.pf_fixed handling to a common location requires additional code shuffling
> > because of async page faults :-/
> 
> Cancel that idea, given the twisty mess of RET_PF_EMULATE it's probably best for
> you to just send a new version of your patch to make the TDP MMU pf_fixed behavior
> match the legacy MMU.  It doesn't make sense to hold up a trivial fix just so I
> can scratch a refactoring itch :-)

OK. Either way is fine to me. I'll send a new one with your suggestion:

	if (ret != RET_PF_SPURIOUS)
 		vcpu->stat.pf_fixed++;

Thanks!
Kai Huang May 6, 2021, 7:51 a.m. UTC | #5
On Wed, 2021-05-05 at 09:11 -0700, Ben Gardon wrote:
> On Wed, May 5, 2021 at 2:38 AM Kai Huang <kai.huang@intel.com> wrote:
> > 
> > Currently pf_fixed is increased even when page fault requires emulation,
> > or fault is spurious.  Fix by only increasing it when return value is
> > RET_PF_FIXED.
> 
> Revisiting __direct_map and mmu_set_spte, there are cases in the
> legacy MMU where RET_PF_EMULATE is returned but pf_fixed is still
> incremented.
> Perhaps it would make more sense to do the increment in the success
> case of tdp_mmu_set_spte_atomic as you suggested before. Sorry I
> didn't catch that earlier.

If I understand correctly, Sean's suggestion:

        if (ret != RET_PF_SPURIOUS)
                vcpu->stat.pf_fixed++;

handles this correctly.  The spurious fault check in the existing code should work
as is: it detects a spurious fault early, but the result is later overwritten if
emulation is required.  So with the above code, pf_fixed should behave consistently
with the legacy MMU.

Or did I miss anything?

> 
> It would probably also be worth putting a comment on pf_fixed so that
> people in the future know what it's supposed to mean and we don't get
> into archeology, reverse engineering the meaning of the stat again.

It seems the legacy MMU code path is a better place for a comment explaining when
pf_fixed should be increased.  However, I am not sure whether it is necessary for this
patch (and I confess I find it hard to explain why pf_fixed is increased in the case of
emulation :)).  Or perhaps Sean can write a patch to add a comment to the legacy MMU :)

I ended up with the patch below.  It adds a comment in the TDP MMU saying "to make it
consistent with legacy MMU...", and the commit message includes a lore link to this
discussion, since I found Sean's explanation quite useful.  Anyone interested can run
git blame and find the commit message of this change, although that is not as
straightforward as having the comment directly in the code.

Is this OK to you?

And Sean?

------------------------------------------------------------------------

Currently pf_fixed is not increased when prefault is true.  This is not
correct, since prefault here really means "async page fault completed".
In that case, the original page fault from the guest was morphed into an
async page fault and pf_fixed was not increased.  So when prefault
indicates that the async page fault has completed, pf_fixed should be
increased.

Additionally, pf_fixed is currently increased even when the page fault
is spurious, while the legacy MMU increases pf_fixed when the page fault
returns RET_PF_EMULATE or RET_PF_FIXED.

To fix the above two issues, increase pf_fixed whenever the return value
is not RET_PF_SPURIOUS (RET_PF_RETRY has already been ruled out by
reaching here).

More information:
https://lore.kernel.org/kvm/cover.1620200410.git.kai.huang@intel.com/T/#mbb5f8083e58a2cd262231512b9211cbe70fc3bd5

Fixes: bb18842e2111 ("kvm: x86/mmu: Add TDP MMU PF handler")
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/kvm/mmu/tdp_mmu.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 1cad4c9f7c34..5e28fbabcd35 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -942,7 +942,11 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, int write,
                                       rcu_dereference(iter->sptep));
        }

-       if (!prefault)
+       /*
+        * Increase pf_fixed in both RET_PF_EMULATE and RET_PF_FIXED to be
+        * consistent with legacy MMU behavior.
+        */
+       if (ret != RET_PF_SPURIOUS)
                vcpu->stat.pf_fixed++;

        return ret;
Sean Christopherson May 6, 2021, 3:29 p.m. UTC | #6
On Thu, May 06, 2021, Kai Huang wrote:
> On Wed, 2021-05-05 at 09:11 -0700, Ben Gardon wrote:
> > It would probably also be worth putting a comment on pf_fixed so that
> > people in the future know what it's supposed to mean and we don't get
> > into archeology, reverse engineering the meaning of the stat again.
> 
> It seems the legacy MMU code path is a better place to add the comment to explain when
> pf_fixed should be increased.  However I am not sure whether it is necessary for this
> patch (and I confess I found it's hard to explain why to increase pf_fixed in case of
> emulation :)).  Or perhaps Sean can write a patch to add comment to legacy MMU :)

Ya, I think it makes sense to hold off on documenting the existing behavior in
the TDP MMU.  As is often the case in KVM, just because KVM has always done
something one way, doesn't mean it's correct/ideal.  But, bikeshedding over what
faults exactly should count towards pf_fixed is best left to a separate patch.

> I ended up with  below, by adding a comment in TDP MMU saying "to make it consistent with
> legacy MMU...", and in the commit message, I put a lore link of this discussion, since I
> found Sean's explanation is quite useful. When people are interested in, they can do a git
> blame and find the commit msg of this change -- although it is not as straightforward as
> having comment directly.
> 
> Is this OK to you?
> 
> And Sean?

Yep, works for me.
Kai Huang May 6, 2021, 10:21 p.m. UTC | #7
On Thu, 2021-05-06 at 15:29 +0000, Sean Christopherson wrote:
> On Thu, May 06, 2021, Kai Huang wrote:
> > On Wed, 2021-05-05 at 09:11 -0700, Ben Gardon wrote:
> > > It would probably also be worth putting a comment on pf_fixed so that
> > > people in the future know what it's supposed to mean and we don't get
> > > into archeology, reverse engineering the meaning of the stat again.
> > 
> > It seems the legacy MMU code path is a better place to add the comment to explain when
> > pf_fixed should be increased.  However I am not sure whether it is necessary for this
> > patch (and I confess I found it's hard to explain why to increase pf_fixed in case of
> > emulation :)).  Or perhaps Sean can write a patch to add comment to legacy MMU :)
> 
> Ya, I think it makes sense to hold off on documenting the existing behavior in
> the TDP MMU.  As is often the case in KVM, just because KVM has always done
> something one way, doesn't mean it's correct/ideal.  But, bikeshedding over what
> faults exactly should count towards pf_fixed is best left to a separate patch.
> 
> > I ended up with  below, by adding a comment in TDP MMU saying "to make it consistent with
> > legacy MMU...", and in the commit message, I put a lore link of this discussion, since I
> > found Sean's explanation is quite useful. When people are interested in, they can do a git
> > blame and find the commit msg of this change -- although it is not as straightforward as
> > having comment directly.
> > 
> > Is this OK to you?
> > 
> > And Sean?
> 
> Yep, works for me.

Thanks!
Patch

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 1cad4c9f7c34..debe8c3ec844 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -942,7 +942,7 @@  static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, int write,
 				       rcu_dereference(iter->sptep));
 	}
 
-	if (!prefault)
+	if (!prefault && ret == RET_PF_FIXED)
 		vcpu->stat.pf_fixed++;
 
 	return ret;