Message ID | 23b565dd3b3dfa20aea1c13bce01163f9427a237.1620200410.git.kai.huang@intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | TDP MMU: several minor fixes or improvements |
On Wed, May 5, 2021 at 2:38 AM Kai Huang <kai.huang@intel.com> wrote:
>
> Currently pf_fixed is increased even when page fault requires emulation,
> or fault is spurious.  Fix by only increasing it when return value is
> RET_PF_FIXED.

Revisiting __direct_map and mmu_set_spte, there are cases in the
legacy MMU where RET_PF_EMULATE is returned but pf_fixed is still
incremented.
Perhaps it would make more sense to do the increment in the success
case of tdp_mmu_set_spte_atomic as you suggested before. Sorry I
didn't catch that earlier.

It would probably also be worth putting a comment on pf_fixed so that
people in the future know what it's supposed to mean and we don't get
into archeology, reverse engineering the meaning of the stat again.

>
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> ---
>  arch/x86/kvm/mmu/tdp_mmu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 1cad4c9f7c34..debe8c3ec844 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -942,7 +942,7 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, int write,
>  				       rcu_dereference(iter->sptep));
>  	}
>
> -	if (!prefault)
> +	if (!prefault && ret == RET_PF_FIXED)
>  		vcpu->stat.pf_fixed++;
>
>  	return ret;
> --
> 2.31.1
>
On Wed, May 05, 2021, Kai Huang wrote:
> Currently pf_fixed is increased even when page fault requires emulation,
> or fault is spurious.  Fix by only increasing it when return value is
> RET_PF_FIXED.
>
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> ---
>  arch/x86/kvm/mmu/tdp_mmu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 1cad4c9f7c34..debe8c3ec844 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -942,7 +942,7 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, int write,
>  				       rcu_dereference(iter->sptep));
>  	}
>
> -	if (!prefault)
> +	if (!prefault && ret == RET_PF_FIXED)
>  		vcpu->stat.pf_fixed++;

Both this patch and the existing code are wrong to check prefault, and they both
deviate from the legacy MMU (both TDP and shadow) for RET_PF_EMULATE.

For prefault==true, KVM should indeed bump pf_fixed since "prefault" really means
"async page fault completed".  In that case, the original page fault from the
guest was morphed into an async page fault and stat.pf_fixed was _not_ incremented
because KVM hasn't actually fixed the fault.  The "prefault" flag is there
purely so that KVM doesn't inject a #PF into the guest in the case where the
guest unmapped the gpa while the async page fault was being handled.

For RET_PF_EMULATE, I could go either way.  On one hand, KVM is installing a
translation that accelerates future emulated MMIO, so it's kinda sorta fixing
the page fault.  On the other hand, future emulated MMIO still page faults, it
just gets handled without going through the full page fault handler.

If we do decide to omit RET_PF_EMULATE, it should be a separate patch and should
be done for all flavors of MMU.  For this patch, the correct code is:

	if (ret != RET_PF_SPURIOUS)
		vcpu->stat.pf_fixed++;

which works because "ret" cannot be RET_PF_RETRY.

Looking through the other code, KVM also fails to bump stat.pf_fixed in the fast
page fault case.  So, if we decide to fix/adjust RET_PF_EMULATE, I think it would
make sense to handle stat.pf_fixed in a common location.

The legacy MMU also prefetches on RET_PF_EMULATE, which isn't technically wrong,
but it's pretty much guaranteed to be a waste of time since prefetching only
installs SPTEs if there is a backing memslot and PFN.

Kai, if it's ok with you, I'll fold the above "ret != RET_PF_SPURIOUS" change
into a separate mini-series to address the other issues I pointed out.  I was
hoping I could paste patches for them inline and let you carry them, but moving
stat.pf_fixed handling to a common location requires additional code shuffling
because of async page faults :-/

Thanks!

>  	return ret;
> --
> 2.31.1
>
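To make the difference between the three checks discussed so far concrete, here is a minimal standalone sketch. It is not kernel code: the enum is an illustrative stand-in for KVM's RET_PF_* codes (the numeric values are an assumption), and the three count_* helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for KVM's RET_PF_* codes; the values are assumptions. */
enum ret_pf {
	RET_PF_RETRY,
	RET_PF_EMULATE,
	RET_PF_FIXED,
	RET_PF_SPURIOUS,
};

/* Existing TDP MMU check: skips async page fault completions, counts spurious faults. */
static bool count_existing(enum ret_pf ret, bool prefault)
{
	(void)ret;
	return !prefault;
}

/* Check from the patch as posted: still skips prefault, and also drops RET_PF_EMULATE. */
static bool count_v1(enum ret_pf ret, bool prefault)
{
	return !prefault && ret == RET_PF_FIXED;
}

/* Check suggested in the review: ignore prefault, exclude only spurious faults. */
static bool count_suggested(enum ret_pf ret, bool prefault)
{
	(void)prefault;
	/* Per the review, RET_PF_RETRY cannot reach the accounting code. */
	assert(ret != RET_PF_RETRY);
	return ret != RET_PF_SPURIOUS;
}
```

Under these assumptions, count_suggested(RET_PF_FIXED, true) is true where both other checks return false, and count_existing(RET_PF_SPURIOUS, false) is true where count_suggested() returns false.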
On Wed, May 05, 2021, Sean Christopherson wrote:
> On Wed, May 05, 2021, Kai Huang wrote:
> > Currently pf_fixed is increased even when page fault requires emulation,
> > or fault is spurious.  Fix by only increasing it when return value is
> > RET_PF_FIXED.
> >
> > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > ---
> >  arch/x86/kvm/mmu/tdp_mmu.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> > index 1cad4c9f7c34..debe8c3ec844 100644
> > --- a/arch/x86/kvm/mmu/tdp_mmu.c
> > +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> > @@ -942,7 +942,7 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, int write,
> >  				       rcu_dereference(iter->sptep));
> >  	}
> >
> > -	if (!prefault)
> > +	if (!prefault && ret == RET_PF_FIXED)
> >  		vcpu->stat.pf_fixed++;
>
> For RET_PF_EMULATE, I could go either way.  On one hand, KVM is installing a
> translation that accelerates future emulated MMIO, so it's kinda sorta fixing
> the page fault.  On the other hand, future emulated MMIO still page faults, it
> just gets handled without going through the full page fault handler.

Hrm, the other RET_PF_EMULATE case is when KVM creates a _new_ SPTE to handle a
page fault, but installs a read-only SPTE on a write fault because the page is
marked for write access tracking, e.g. for non-leaf guest page tables.  Bumping
pf_fixed is arguably correct in that case since KVM did fault in a page and from
the guest's perspective the page fault was fixed, it's just that "fixing" the
fault involved a bit of instruction emulation.

> If we do decide to omit RET_PF_EMULATE, it should be a separate patch and should
> be done for all flavors of MMU.  For this patch, the correct code is:
>
> 	if (ret != RET_PF_SPURIOUS)
> 		vcpu->stat.pf_fixed++;
>
> which works because "ret" cannot be RET_PF_RETRY.
>
> Looking through the other code, KVM also fails to bump stat.pf_fixed in the fast
> page fault case.  So, if we decide to fix/adjust RET_PF_EMULATE, I think it would
> make sense to handle stat.pf_fixed in a common location.

Blech.  My original thought was to move the stat.pf_fixed logic all the way out
to kvm_mmu_do_page_fault(), but that would do the wrong thing if pf_fixed is
bumped on RET_PF_EMULATE and page_fault_handle_page_track() returns RET_PF_EMULATE.
That fast path handles the case where the guest gets a !WRITABLE page fault on
a PRESENT SPTE that KVM is write tracking.  *sigh*.

I'm leaning towards making RET_PF_EMULATE a modifier instead of a standalone
type, which would allow more precise pf_fixed adjustments and would also let us
clean up the EMULTYPE_ALLOW_RETRY_PF logic, which has a rather gross check for
detecting MMIO page faults.

> The legacy MMU also prefetches on RET_PF_EMULATE, which isn't technically wrong,
> but it's pretty much guaranteed to be a waste of time since prefetching only
> installs SPTEs if there is a backing memslot and PFN.
>
> Kai, if it's ok with you, I'll fold the above "ret != RET_PF_SPURIOUS" change
> into a separate mini-series to address the other issues I pointed out.  I was
> hoping I could paste patches for them inline and let you carry them, but moving
> stat.pf_fixed handling to a common location requires additional code shuffling
> because of async page faults :-/

Cancel that idea, given the twisty mess of RET_PF_EMULATE it's probably best for
you to just send a new version of your patch to make the TDP MMU pf_fixed behavior
match the legacy MMU.  It doesn't make sense to hold up a trivial fix just so I
can scratch a refactoring itch :-)
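The "modifier instead of a standalone type" idea is only floated in this thread, never spelled out. Purely as an illustration of what such an encoding could look like, the sketch below splits a result into a base outcome plus an independent "needs emulation" bit. Every name in it (pf_base, PF_MOD_EMULATE, pf_result and the helpers) is hypothetical and does not correspond to anything in KVM.

```c
#include <stdbool.h>

/* Hypothetical encoding, not KVM's: what happened to the mapping ... */
enum pf_base {
	PF_BASE_NONE,		/* no SPTE was installed or changed */
	PF_BASE_FIXED,		/* an SPTE was installed or updated */
	PF_BASE_SPURIOUS,	/* the SPTE already had the desired value */
};

/* ... plus an independent modifier bit asking the caller to emulate. */
#define PF_MOD_EMULATE	0x100u
#define PF_BASE_MASK	0x0ffu

static unsigned int pf_result(enum pf_base base, bool emulate)
{
	return (unsigned int)base | (emulate ? PF_MOD_EMULATE : 0u);
}

static bool pf_needs_emulation(unsigned int res)
{
	return (res & PF_MOD_EMULATE) != 0;
}

/* The stat depends only on the base outcome, not on the modifier. */
static bool pf_counts_as_fixed(unsigned int res)
{
	return (res & PF_BASE_MASK) == PF_BASE_FIXED;
}
```

With this shape, a write fault that installs a read-only SPTE for a write-tracked page would be pf_result(PF_BASE_FIXED, true), which is both emulated and counted, while a page-track fast path that installs nothing would be pf_result(PF_BASE_NONE, true), which is emulated but not counted. Whether MMIO faults should use PF_BASE_FIXED is exactly the open question raised above.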
> > >
> > > -	if (!prefault)
> > > +	if (!prefault && ret == RET_PF_FIXED)
> > >  		vcpu->stat.pf_fixed++;
> > For RET_PF_EMULATE, I could go either way.  On one hand, KVM is installing a
> > translation that accelerates future emulated MMIO, so it's kinda sorta fixing
> > the page fault.  On the other hand, future emulated MMIO still page faults, it
> > just gets handled without going through the full page fault handler.
>
> Hrm, the other RET_PF_EMULATE case is when KVM creates a _new_ SPTE to handle a
> page fault, but installs a read-only SPTE on a write fault because the page is
> marked for write access tracking, e.g. for non-leaf guest page tables.  Bumping
> pf_fixed is arguably correct in that case since KVM did fault in a page and from
> the guest's perspective the page fault was fixed, it's just that "fixing" the
> fault involved a bit of instruction emulation.

Yes this is exactly the case for video ram :)

>
> > If we do decide to omit RET_PF_EMULATE, it should be a separate patch and should
> > be done for all flavors of MMU.  For this patch, the correct code is:
> >
> > 	if (ret != RET_PF_SPURIOUS)
> > 		vcpu->stat.pf_fixed++;
> >
> > which works because "ret" cannot be RET_PF_RETRY.
> >
> > Looking through the other code, KVM also fails to bump stat.pf_fixed in the fast
> > page fault case.  So, if we decide to fix/adjust RET_PF_EMULATE, I think it would
> > make sense to handle stat.pf_fixed in a common location.
>
> Blech.  My original thought was to move the stat.pf_fixed logic all the way out
> to kvm_mmu_do_page_fault(), but that would do the wrong thing if pf_fixed is
> bumped on RET_PF_EMULATE and page_fault_handle_page_track() returns RET_PF_EMULATE.
> That fast path handles the case where the guest gets a !WRITABLE page fault on
> a PRESENT SPTE that KVM is write tracking.  *sigh*.
>
> I'm leaning towards making RET_PF_EMULATE a modifier instead of a standalone
> type, which would allow more precise pf_fixed adjustments and would also let us
> clean up the EMULTYPE_ALLOW_RETRY_PF logic, which has a rather gross check for
> detecting MMIO page faults.
>
> > The legacy MMU also prefetches on RET_PF_EMULATE, which isn't technically wrong,
> > but it's pretty much guaranteed to be a waste of time since prefetching only
> > installs SPTEs if there is a backing memslot and PFN.
> >
> > Kai, if it's ok with you, I'll fold the above "ret != RET_PF_SPURIOUS" change
> > into a separate mini-series to address the other issues I pointed out.  I was
> > hoping I could paste patches for them inline and let you carry them, but moving
> > stat.pf_fixed handling to a common location requires additional code shuffling
> > because of async page faults :-/
>
> Cancel that idea, given the twisty mess of RET_PF_EMULATE it's probably best for
> you to just send a new version of your patch to make the TDP MMU pf_fixed behavior
> match the legacy MMU.  It doesn't make sense to hold up a trivial fix just so I
> can scratch a refactoring itch :-)

OK. Either way is fine to me. I'll send a new one with your suggestion:

	if (ret != RET_PF_SPURIOUS)
		vcpu->stat.pf_fixed++;

Thanks!
On Wed, 2021-05-05 at 09:11 -0700, Ben Gardon wrote:
> On Wed, May 5, 2021 at 2:38 AM Kai Huang <kai.huang@intel.com> wrote:
> >
> > Currently pf_fixed is increased even when page fault requires emulation,
> > or fault is spurious.  Fix by only increasing it when return value is
> > RET_PF_FIXED.
>
> Revisiting __direct_map and mmu_set_spte, there are cases in the
> legacy MMU where RET_PF_EMULATE is returned but pf_fixed is still
> incremented.
> Perhaps it would make more sense to do the increment in the success
> case of tdp_mmu_set_spte_atomic as you suggested before. Sorry I
> didn't catch that earlier.

If I understand correctly, Sean's suggestion:

	if (ret != RET_PF_SPURIOUS)
		vcpu->stat.pf_fixed++;

can handle things correctly. The spurious fault check in existing code should work
correctly -- it detects spurious fault early, but later it overwrites if emulation is
required. So with above code, it should work consistently with legacy MMU behavior.

Or did I miss anything?

>
> It would probably also be worth putting a comment on pf_fixed so that
> people in the future know what it's supposed to mean and we don't get
> into archeology, reverse engineering the meaning of the stat again.

It seems the legacy MMU code path is a better place to add the comment to explain when
pf_fixed should be increased. However I am not sure whether it is necessary for this
patch (and I confess I found it's hard to explain why to increase pf_fixed in case of
emulation :)). Or perhaps Sean can write a patch to add comment to legacy MMU :)

I ended up with below, by adding a comment in TDP MMU saying "to make it consistent with
legacy MMU...", and in the commit message, I put a lore link of this discussion, since I
found Sean's explanation is quite useful. When people are interested in, they can do a git
blame and find the commit msg of this change -- although it is not as straightforward as
having comment directly.

Is this OK to you?

And Sean?
------------------------------------------------------------------------

Currently pf_fixed is not increased when prefault is true.  This is not
correct, since prefault here really means "async page fault completed".
In that case, the original page fault from the guest was morphed into an
async page fault and pf_fixed was not increased.  So when prefault
indicates async page fault is completed, pf_fixed should be increased.

Additionally, currently pf_fixed is also increased even when page fault
is spurious, while legacy MMU increases pf_fixed when page fault returns
RET_PF_EMULATE or RET_PF_FIXED.

To fix above two issues, change to increase pf_fixed when return value
is not RET_PF_SPURIOUS (RET_PF_RETRY has already been ruled out by
reaching here).

More information:
https://lore.kernel.org/kvm/cover.1620200410.git.kai.huang@intel.com/T/#mbb5f8083e58a2cd262231512b9211cbe70fc3bd5

Fixes: bb18842e2111 ("kvm: x86/mmu: Add TDP MMU PF handler")
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/kvm/mmu/tdp_mmu.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 1cad4c9f7c34..5e28fbabcd35 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -942,7 +942,11 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, int write,
 				       rcu_dereference(iter->sptep));
 	}

-	if (!prefault)
+	/*
+	 * Increase pf_fixed in both RET_PF_EMULATE and RET_PF_FIXED to be
+	 * consistent with legacy MMU behavior.
+	 */
+	if (ret != RET_PF_SPURIOUS)
 		vcpu->stat.pf_fixed++;

 	return ret;
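
The ordering this relies on (spurious is detected first, emulation may overwrite it later, and RET_PF_RETRY returns before any accounting) can be modelled in a few lines. This is a simplified sketch based only on the description in this thread, not the kernel function: the boolean inputs and both helper names are hypothetical stand-ins for checks the real code performs on SPTEs, and the enum values are assumptions.

```c
#include <stdbool.h>

/* Illustrative stand-ins for KVM's RET_PF_* codes; the values are assumptions. */
enum ret_pf {
	RET_PF_RETRY,
	RET_PF_EMULATE,
	RET_PF_FIXED,
	RET_PF_SPURIOUS,
};

/* Model of how the return value is settled before the accounting runs. */
static enum ret_pf resolve_ret(bool spte_unchanged, bool set_spte_ok,
			       bool write_fault_on_wp_page, bool mmio_spte)
{
	enum ret_pf ret = RET_PF_FIXED;

	if (spte_unchanged)
		ret = RET_PF_SPURIOUS;	/* detected early */
	else if (!set_spte_ok)
		return RET_PF_RETRY;	/* leaves before any accounting */

	/* Emulation overwrites the earlier result, including "spurious". */
	if (write_fault_on_wp_page || mmio_spte)
		ret = RET_PF_EMULATE;

	return ret;
}

/* Accounting from the revised patch: everything except spurious is counted. */
static unsigned long account_pf_fixed(enum ret_pf ret, unsigned long pf_fixed)
{
	if (ret != RET_PF_SPURIOUS)
		pf_fixed++;
	return pf_fixed;
}
```

In this model a fault that is both unchanged and needs emulation ends up as RET_PF_EMULATE and is counted, which matches the legacy MMU behavior described above.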
On Thu, May 06, 2021, Kai Huang wrote:
> On Wed, 2021-05-05 at 09:11 -0700, Ben Gardon wrote:
> > It would probably also be worth putting a comment on pf_fixed so that
> > people in the future know what it's supposed to mean and we don't get
> > into archeology, reverse engineering the meaning of the stat again.
>
> It seems the legacy MMU code path is a better place to add the comment to explain when
> pf_fixed should be increased. However I am not sure whether it is necessary for this
> patch (and I confess I found it's hard to explain why to increase pf_fixed in case of
> emulation :)). Or perhaps Sean can write a patch to add comment to legacy MMU :)

Ya, I think it makes sense to hold off on documenting the existing behavior in
the TDP MMU.  As is often the case in KVM, just because KVM has always done
something one way, doesn't mean it's correct/ideal.  But, bikeshedding over what
faults exactly should count towards pf_fixed is best left to a separate patch.

> I ended up with below, by adding a comment in TDP MMU saying "to make it consistent with
> legacy MMU...", and in the commit message, I put a lore link of this discussion, since I
> found Sean's explanation is quite useful. When people are interested in, they can do a git
> blame and find the commit msg of this change -- although it is not as straightforward as
> having comment directly.
>
> Is this OK to you?
>
> And Sean?

Yep, works for me.
On Thu, 2021-05-06 at 15:29 +0000, Sean Christopherson wrote:
> On Thu, May 06, 2021, Kai Huang wrote:
> > On Wed, 2021-05-05 at 09:11 -0700, Ben Gardon wrote:
> > > It would probably also be worth putting a comment on pf_fixed so that
> > > people in the future know what it's supposed to mean and we don't get
> > > into archeology, reverse engineering the meaning of the stat again.
> >
> > It seems the legacy MMU code path is a better place to add the comment to explain when
> > pf_fixed should be increased. However I am not sure whether it is necessary for this
> > patch (and I confess I found it's hard to explain why to increase pf_fixed in case of
> > emulation :)). Or perhaps Sean can write a patch to add comment to legacy MMU :)
>
> Ya, I think it makes sense to hold off on documenting the existing behavior in
> the TDP MMU.  As is often the case in KVM, just because KVM has always done
> something one way, doesn't mean it's correct/ideal.  But, bikeshedding over what
> faults exactly should count towards pf_fixed is best left to a separate patch.
>
> > I ended up with below, by adding a comment in TDP MMU saying "to make it consistent with
> > legacy MMU...", and in the commit message, I put a lore link of this discussion, since I
> > found Sean's explanation is quite useful. When people are interested in, they can do a git
> > blame and find the commit msg of this change -- although it is not as straightforward as
> > having comment directly.
> >
> > Is this OK to you?
> >
> > And Sean?
>
> Yep, works for me.

Thanks!
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 1cad4c9f7c34..debe8c3ec844 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -942,7 +942,7 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, int write,
 				       rcu_dereference(iter->sptep));
 	}

-	if (!prefault)
+	if (!prefault && ret == RET_PF_FIXED)
 		vcpu->stat.pf_fixed++;

 	return ret;
Currently pf_fixed is increased even when page fault requires emulation,
or fault is spurious.  Fix by only increasing it when return value is
RET_PF_FIXED.

Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/kvm/mmu/tdp_mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)