
[10/22] kvm: mmu: Add TDP MMU PF handler

Message ID: 20200925212302.3979661-11-bgardon@google.com
State: New, archived
Series: Introduce the TDP MMU

Commit Message

Ben Gardon Sept. 25, 2020, 9:22 p.m. UTC
Add functions to handle page faults in the TDP MMU. These page faults
are currently handled in much the same way as in the x86 shadow paging
based MMU; however, the ordering of some operations is slightly
different. Future patches will add eager NX splitting, a fast page fault
handler, and parallel page faults.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538

Signed-off-by: Ben Gardon <bgardon@google.com>
---
 arch/x86/kvm/mmu/mmu.c          |  66 ++++++-----------
 arch/x86/kvm/mmu/mmu_internal.h |  45 +++++++++++
 arch/x86/kvm/mmu/tdp_mmu.c      | 127 ++++++++++++++++++++++++++++++++
 arch/x86/kvm/mmu/tdp_mmu.h      |   4 +
 4 files changed, 200 insertions(+), 42 deletions(-)

Comments

Paolo Bonzini Sept. 26, 2020, 12:24 a.m. UTC | #1
On 25/09/20 23:22, Ben Gardon wrote:
>  
> -static bool is_nx_huge_page_enabled(void)
> +bool is_nx_huge_page_enabled(void)
>  {
>  	return READ_ONCE(nx_huge_pages);
>  }
> @@ -381,7 +361,7 @@ static inline u64 spte_shadow_dirty_mask(u64 spte)
>  	return spte_ad_enabled(spte) ? shadow_dirty_mask : 0;
>  }
>  
> -static inline bool is_access_track_spte(u64 spte)
> +inline bool is_access_track_spte(u64 spte)
>  {
>  	return !spte_ad_enabled(spte) && (spte & shadow_acc_track_mask) == 0;
>  }
> @@ -433,7 +413,7 @@ static u64 get_mmio_spte_generation(u64 spte)
>  	return gen;
>  }
>  
> -static u64 make_mmio_spte(struct kvm_vcpu *vcpu, u64 gfn, unsigned int access)
> +u64 make_mmio_spte(struct kvm_vcpu *vcpu, u64 gfn, unsigned int access)
>  {
>  
>  	u64 gen = kvm_vcpu_memslots(vcpu)->generation & MMIO_SPTE_GEN_MASK;
> @@ -613,7 +593,7 @@ int is_shadow_present_pte(u64 pte)
>  	return (pte != 0) && !is_mmio_spte(pte);
>  }
>  
> -static int is_large_pte(u64 pte)
> +int is_large_pte(u64 pte)
>  {
>  	return pte & PT_PAGE_SIZE_MASK;
>  }

All candidates for inlining too

(Also probably we'll create a common.c file for stuff that is common to
the shadow and TDP MMU, but that can come later).

Paolo
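
For reference, a minimal sketch of what the inlining suggestion could look like
if these small SPTE predicates lived in mmu_internal.h (or the future common.c
split) as static inlines rather than being exported as non-static functions.
This is illustrative only; it assumes the masks and module parameters they read
(shadow_acc_track_mask, nx_huge_pages, spte_ad_enabled(), etc.) are made
visible to the header, which the patch as posted does not do:

/*
 * Illustrative only: SPTE predicates as static inlines in a shared header,
 * assuming the masks/params they use are declared for the header's users.
 */
static inline int is_large_pte(u64 pte)
{
	return pte & PT_PAGE_SIZE_MASK;
}

static inline bool is_access_track_spte(u64 spte)
{
	return !spte_ad_enabled(spte) && (spte & shadow_acc_track_mask) == 0;
}

static inline bool is_nx_huge_page_enabled(void)
{
	return READ_ONCE(nx_huge_pages);
}
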
Sean Christopherson Sept. 30, 2020, 4:37 p.m. UTC | #2
On Fri, Sep 25, 2020 at 02:22:50PM -0700, Ben Gardon wrote:
> @@ -4113,8 +4088,9 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
>  	if (page_fault_handle_page_track(vcpu, error_code, gfn))
>  		return RET_PF_EMULATE;
>  
> -	if (fast_page_fault(vcpu, gpa, error_code))
> -		return RET_PF_RETRY;
> +	if (!is_tdp_mmu_root(vcpu->kvm, vcpu->arch.mmu->root_hpa))
> +		if (fast_page_fault(vcpu, gpa, error_code))
> +			return RET_PF_RETRY;

It'll probably be easier to handle is_tdp_mmu() in fast_page_fault().

>  
>  	r = mmu_topup_memory_caches(vcpu, false);
>  	if (r)
> @@ -4139,8 +4115,14 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
>  	r = make_mmu_pages_available(vcpu);
>  	if (r)
>  		goto out_unlock;
> -	r = __direct_map(vcpu, gpa, write, map_writable, max_level, pfn,
> -			 prefault, is_tdp && lpage_disallowed);
> +
> +	if (is_tdp_mmu_root(vcpu->kvm, vcpu->arch.mmu->root_hpa))
> +		r = kvm_tdp_mmu_page_fault(vcpu, write, map_writable, max_level,
> +					   gpa, pfn, prefault,
> +					   is_tdp && lpage_disallowed);
> +	else
> +		r = __direct_map(vcpu, gpa, write, map_writable, max_level, pfn,
> +				 prefault, is_tdp && lpage_disallowed);
>  
>  out_unlock:
>  	spin_unlock(&vcpu->kvm->mmu_lock);

...

> +/*
> + * Handle a TDP page fault (NPT/EPT violation/misconfiguration) by installing
> + * page tables and SPTEs to translate the faulting guest physical address.
> + */
> +int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu, int write, int map_writable,
> +			   int max_level, gpa_t gpa, kvm_pfn_t pfn,
> +			   bool prefault, bool account_disallowed_nx_lpage)
> +{
> +	struct tdp_iter iter;
> +	struct kvm_mmu_memory_cache *pf_pt_cache =
> +			&vcpu->arch.mmu_shadow_page_cache;
> +	u64 *child_pt;
> +	u64 new_spte;
> +	int ret;
> +	int as_id = kvm_arch_vcpu_memslots_id(vcpu);
> +	gfn_t gfn = gpa >> PAGE_SHIFT;
> +	int level;
> +
> +	if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa)))
> +		return RET_PF_RETRY;

I feel like we should kill off these silly WARNs in the existing code instead
of adding more.  If they actually fired, I'm pretty sure that they would
continue firing and spamming the kernel log until the VM is killed as I don't
see how restarting the guest will magically fix anything.

> +
> +	if (WARN_ON(!is_tdp_mmu_root(vcpu->kvm, vcpu->arch.mmu->root_hpa)))
> +		return RET_PF_RETRY;

This seems especially gratuitous; this has exactly one caller that explicitly
checks is_tdp_mmu_root().  Again, if this fires it will spam the kernel log
into submission.

> +
> +	level = kvm_mmu_hugepage_adjust(vcpu, gfn, max_level, &pfn);
> +
> +	for_each_tdp_pte_vcpu(iter, vcpu, gfn, gfn + 1) {
> +		disallowed_hugepage_adjust(iter.old_spte, gfn, iter.level,
> +					   &pfn, &level);
> +
> +		if (iter.level == level)
> +			break;
> +
> +		/*
> +		 * If there is an SPTE mapping a large page at a higher level
> +		 * than the target, that SPTE must be cleared and replaced
> +		 * with a non-leaf SPTE.
> +		 */
> +		if (is_shadow_present_pte(iter.old_spte) &&
> +		    is_large_pte(iter.old_spte)) {
> +			*iter.sptep = 0;
> +			handle_changed_spte(vcpu->kvm, as_id, iter.gfn,
> +					    iter.old_spte, 0, iter.level);
> +			kvm_flush_remote_tlbs_with_address(vcpu->kvm, iter.gfn,
> +					KVM_PAGES_PER_HPAGE(iter.level));
> +
> +			/*
> +			 * The iter must explicitly re-read the spte here
> +			 * because the new is needed before the next iteration
> +			 * of the loop.
> +			 */

I think it'd be better to explicitly, and simply, call out that iter.old_spte
is consumed below.  It's subtle enough to warrant a comment, but the comment
didn't actually help me.  Maybe something like:

			/*
			 * Refresh iter.old_spte, it will trigger the !present
			 * path below.
			 */

> +			iter.old_spte = READ_ONCE(*iter.sptep);
> +		}
> +
> +		if (!is_shadow_present_pte(iter.old_spte)) {
> +			child_pt = kvm_mmu_memory_cache_alloc(pf_pt_cache);
> +			clear_page(child_pt);
> +			new_spte = make_nonleaf_spte(child_pt,
> +						     !shadow_accessed_mask);
> +
> +			*iter.sptep = new_spte;
> +			handle_changed_spte(vcpu->kvm, as_id, iter.gfn,
> +					    iter.old_spte, new_spte,
> +					    iter.level);
> +		}
> +	}
> +
> +	if (WARN_ON(iter.level != level))
> +		return RET_PF_RETRY;

This also seems unnecessary.  Or maybe these are all good candidates for
KVM_BUG_ON...

> +
> +	ret = page_fault_handle_target_level(vcpu, write, map_writable,
> +					     as_id, &iter, pfn, prefault);
> +
> +	/* If emulating, flush this vcpu's TLB. */

Why?  It's obvious _what_ the code is doing, the comment should explain _why_.

> +	if (ret == RET_PF_EMULATE)
> +		kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
> +
> +	return ret;
> +}
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
> index cb86f9fe69017..abf23dc0ab7ad 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.h
> +++ b/arch/x86/kvm/mmu/tdp_mmu.h
> @@ -14,4 +14,8 @@ void kvm_tdp_mmu_put_root_hpa(struct kvm *kvm, hpa_t root_hpa);
>  
>  bool kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, gfn_t start, gfn_t end);
>  void kvm_tdp_mmu_zap_all(struct kvm *kvm);
> +
> +int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu, int write, int map_writable,
> +			   int level, gpa_t gpa, kvm_pfn_t pfn, bool prefault,
> +			   bool lpage_disallowed);
>  #endif /* __KVM_X86_MMU_TDP_MMU_H */
> -- 
> 2.28.0.709.gb0816b6eb0-goog
>
Paolo Bonzini Sept. 30, 2020, 4:55 p.m. UTC | #3
On 30/09/20 18:37, Sean Christopherson wrote:
>> +
>> +	if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa)))
>> +		return RET_PF_RETRY;
> I feel like we should kill off these silly WARNs in the existing code instead
> of adding more.  If they actually fired, I'm pretty sure that they would
> continue firing and spamming the kernel log until the VM is killed as I don't
> see how restarting the guest will magically fix anything.

This is true, but I think it's better to be defensive.  They're
certainly all candidates for KVM_BUG_ON.

Paolo
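
For illustration, a sketch of what such a KVM_BUG_ON-based check could look
like at the top of kvm_tdp_mmu_page_fault(). KVM_BUG_ON(cond, kvm) does not
exist at the time of this thread; the form assumed here is a macro that WARNs
once, marks the VM as bugged so it gets torn down, and evaluates to the
condition:

	/*
	 * Illustrative only: assumes a KVM_BUG_ON(cond, kvm) helper that
	 * WARNs once, flags the VM as bugged and returns the condition,
	 * so a broken root terminates the VM instead of spamming the log.
	 */
	if (KVM_BUG_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa), vcpu->kvm) ||
	    KVM_BUG_ON(!is_tdp_mmu_root(vcpu->kvm, vcpu->arch.mmu->root_hpa),
		       vcpu->kvm))
		return RET_PF_RETRY;
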
Paolo Bonzini Sept. 30, 2020, 5:37 p.m. UTC | #4
On 30/09/20 18:37, Sean Christopherson wrote:
>> +	ret = page_fault_handle_target_level(vcpu, write, map_writable,
>> +					     as_id, &iter, pfn, prefault);
>> +
>> +	/* If emulating, flush this vcpu's TLB. */
> Why?  It's obvious _what_ the code is doing, the comment should explain _why_.
> 
>> +	if (ret == RET_PF_EMULATE)
>> +		kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
>> +
>> +	return ret;
>> +}

In particular it seems to be only needed in this case...

+	/*
+	 * If the page fault was caused by a write but the page is write
+	 * protected, emulation is needed. If the emulation was skipped,
+	 * the vCPU would have the same fault again.
+	 */
+	if ((make_spte_ret & SET_SPTE_WRITE_PROTECTED_PT) && write)
+		ret = RET_PF_EMULATE;
+

... corresponding to this code in mmu.c

        if (set_spte_ret & SET_SPTE_WRITE_PROTECTED_PT) {
                if (write_fault)
                        ret = RET_PF_EMULATE;
                kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
        }

So it should indeed be better to make the code in
page_fault_handle_target_level look the same as mmu/mmu.c.

Paolo
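
A sketch of how page_fault_handle_target_level() could mirror that mmu.c
logic, tying the flush to the write-protection case instead of flushing on
every RET_PF_EMULATE in the caller (KVM_REQ_TLB_FLUSH_CURRENT is taken from
the mmu.c snippet above; whether the TDP MMU path wants the exact same
request is an assumption here):

	/*
	 * Illustrative sketch: follow mmu.c and request the flush where the
	 * write-protected page table case is detected, rather than on every
	 * RET_PF_EMULATE in the caller.
	 */
	if (make_spte_ret & SET_SPTE_WRITE_PROTECTED_PT) {
		if (write)
			ret = RET_PF_EMULATE;
		kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
	}

	if (unlikely(is_mmio_spte(new_spte)))
		ret = RET_PF_EMULATE;
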
Ben Gardon Oct. 6, 2020, 10:33 p.m. UTC | #5
On Wed, Sep 30, 2020 at 9:37 AM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> On Fri, Sep 25, 2020 at 02:22:50PM -0700, Ben Gardon wrote:
> > @@ -4113,8 +4088,9 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
> >       if (page_fault_handle_page_track(vcpu, error_code, gfn))
> >               return RET_PF_EMULATE;
> >
> > -     if (fast_page_fault(vcpu, gpa, error_code))
> > -             return RET_PF_RETRY;
> > +     if (!is_tdp_mmu_root(vcpu->kvm, vcpu->arch.mmu->root_hpa))
> > +             if (fast_page_fault(vcpu, gpa, error_code))
> > +                     return RET_PF_RETRY;
>
> It'll probably be easier to handle is_tdp_mmu() in fast_page_fault().

I'd prefer to keep this check here because then, in the fast page fault
path, we can just handle the case where we do have a TDP MMU root with
the TDP MMU fast PF handler, and it'll mirror the split below between
__direct_map and the TDP MMU PF handler.

>
> >
> >       r = mmu_topup_memory_caches(vcpu, false);
> >       if (r)
> > @@ -4139,8 +4115,14 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
> >       r = make_mmu_pages_available(vcpu);
> >       if (r)
> >               goto out_unlock;
> > -     r = __direct_map(vcpu, gpa, write, map_writable, max_level, pfn,
> > -                      prefault, is_tdp && lpage_disallowed);
> > +
> > +     if (is_tdp_mmu_root(vcpu->kvm, vcpu->arch.mmu->root_hpa))
> > +             r = kvm_tdp_mmu_page_fault(vcpu, write, map_writable, max_level,
> > +                                        gpa, pfn, prefault,
> > +                                        is_tdp && lpage_disallowed);
> > +     else
> > +             r = __direct_map(vcpu, gpa, write, map_writable, max_level, pfn,
> > +                              prefault, is_tdp && lpage_disallowed);
> >
> >  out_unlock:
> >       spin_unlock(&vcpu->kvm->mmu_lock);
>
> ...
>
> > +/*
> > + * Handle a TDP page fault (NPT/EPT violation/misconfiguration) by installing
> > + * page tables and SPTEs to translate the faulting guest physical address.
> > + */
> > +int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu, int write, int map_writable,
> > +                        int max_level, gpa_t gpa, kvm_pfn_t pfn,
> > +                        bool prefault, bool account_disallowed_nx_lpage)
> > +{
> > +     struct tdp_iter iter;
> > +     struct kvm_mmu_memory_cache *pf_pt_cache =
> > +                     &vcpu->arch.mmu_shadow_page_cache;
> > +     u64 *child_pt;
> > +     u64 new_spte;
> > +     int ret;
> > +     int as_id = kvm_arch_vcpu_memslots_id(vcpu);
> > +     gfn_t gfn = gpa >> PAGE_SHIFT;
> > +     int level;
> > +
> > +     if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa)))
> > +             return RET_PF_RETRY;
>
> I feel like we should kill off these silly WARNs in the existing code instead
> of adding more.  If they actually fired, I'm pretty sure that they would
> continue firing and spamming the kernel log until the VM is killed as I don't
> see how restarting the guest will magically fix anything.
>
> > +
> > +     if (WARN_ON(!is_tdp_mmu_root(vcpu->kvm, vcpu->arch.mmu->root_hpa)))
> > +             return RET_PF_RETRY;
>
> This seems especially gratuitous; this has exactly one caller that explicitly
> checks is_tdp_mmu_root().  Again, if this fires it will spam the kernel log
> into submission.
>
> > +
> > +     level = kvm_mmu_hugepage_adjust(vcpu, gfn, max_level, &pfn);
> > +
> > +     for_each_tdp_pte_vcpu(iter, vcpu, gfn, gfn + 1) {
> > +             disallowed_hugepage_adjust(iter.old_spte, gfn, iter.level,
> > +                                        &pfn, &level);
> > +
> > +             if (iter.level == level)
> > +                     break;
> > +
> > +             /*
> > +              * If there is an SPTE mapping a large page at a higher level
> > +              * than the target, that SPTE must be cleared and replaced
> > +              * with a non-leaf SPTE.
> > +              */
> > +             if (is_shadow_present_pte(iter.old_spte) &&
> > +                 is_large_pte(iter.old_spte)) {
> > +                     *iter.sptep = 0;
> > +                     handle_changed_spte(vcpu->kvm, as_id, iter.gfn,
> > +                                         iter.old_spte, 0, iter.level);
> > +                     kvm_flush_remote_tlbs_with_address(vcpu->kvm, iter.gfn,
> > +                                     KVM_PAGES_PER_HPAGE(iter.level));
> > +
> > +                     /*
> > +                      * The iter must explicitly re-read the spte here
> > +                      * because the new is needed before the next iteration
> > +                      * of the loop.
> > +                      */
>
> I think it'd be better to explicitly, and simply, call out that iter.old_spte
> is consumed below.  It's subtle enough to warrant a comment, but the comment
> didn't actually help me.  Maybe something like:
>
>                         /*
>                          * Refresh iter.old_spte, it will trigger the !present
>                          * path below.
>                          */
>

That's a good point, and calling out the relation to the present check
below is much clearer.


> > +                     iter.old_spte = READ_ONCE(*iter.sptep);
> > +             }
> > +
> > +             if (!is_shadow_present_pte(iter.old_spte)) {
> > +                     child_pt = kvm_mmu_memory_cache_alloc(pf_pt_cache);
> > +                     clear_page(child_pt);
> > +                     new_spte = make_nonleaf_spte(child_pt,
> > +                                                  !shadow_accessed_mask);
> > +
> > +                     *iter.sptep = new_spte;
> > +                     handle_changed_spte(vcpu->kvm, as_id, iter.gfn,
> > +                                         iter.old_spte, new_spte,
> > +                                         iter.level);
> > +             }
> > +     }
> > +
> > +     if (WARN_ON(iter.level != level))
> > +             return RET_PF_RETRY;
>
> This also seems unnecessary.  Or maybe these are all good candidates for
> KVM_BUG_ON...
>

I've replaced all these warnings with KVM_BUG_ONs.

> > +
> > +     ret = page_fault_handle_target_level(vcpu, write, map_writable,
> > +                                          as_id, &iter, pfn, prefault);
> > +
> > +     /* If emulating, flush this vcpu's TLB. */
>
> Why?  It's obvious _what_ the code is doing, the comment should explain _why_.
>
> > +     if (ret == RET_PF_EMULATE)
> > +             kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
> > +
> > +     return ret;
> > +}
> > diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
> > index cb86f9fe69017..abf23dc0ab7ad 100644
> > --- a/arch/x86/kvm/mmu/tdp_mmu.h
> > +++ b/arch/x86/kvm/mmu/tdp_mmu.h
> > @@ -14,4 +14,8 @@ void kvm_tdp_mmu_put_root_hpa(struct kvm *kvm, hpa_t root_hpa);
> >
> >  bool kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, gfn_t start, gfn_t end);
> >  void kvm_tdp_mmu_zap_all(struct kvm *kvm);
> > +
> > +int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu, int write, int map_writable,
> > +                        int level, gpa_t gpa, kvm_pfn_t pfn, bool prefault,
> > +                        bool lpage_disallowed);
> >  #endif /* __KVM_X86_MMU_TDP_MMU_H */
> > --
> > 2.28.0.709.gb0816b6eb0-goog
> >
Ben Gardon Oct. 6, 2020, 10:35 p.m. UTC | #6
On Wed, Sep 30, 2020 at 10:38 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 30/09/20 18:37, Sean Christopherson wrote:
> >> +    ret = page_fault_handle_target_level(vcpu, write, map_writable,
> >> +                                         as_id, &iter, pfn, prefault);
> >> +
> >> +    /* If emulating, flush this vcpu's TLB. */
> > Why?  It's obvious _what_ the code is doing, the comment should explain _why_.
> >
> >> +    if (ret == RET_PF_EMULATE)
> >> +            kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
> >> +
> >> +    return ret;
> >> +}
>
> In particular it seems to be only needed in this case...
>
> +       /*
> +        * If the page fault was caused by a write but the page is write
> +        * protected, emulation is needed. If the emulation was skipped,
> +        * the vCPU would have the same fault again.
> +        */
> +       if ((make_spte_ret & SET_SPTE_WRITE_PROTECTED_PT) && write)
> +               ret = RET_PF_EMULATE;
> +
>
> ... corresponding to this code in mmu.c
>
>         if (set_spte_ret & SET_SPTE_WRITE_PROTECTED_PT) {
>                 if (write_fault)
>                         ret = RET_PF_EMULATE;
>                 kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
>         }
>
> So it should indeed be better to make the code in
> page_fault_handle_target_level look the same as mmu/mmu.c.

That's an excellent point. I've made an effort to make them more
similar. I think this difference arose from the synchronization
changes I was working back from, but this will be much more elegant in
either case.

>
> Paolo
>
Sean Christopherson Oct. 7, 2020, 8:55 p.m. UTC | #7
On Tue, Oct 06, 2020 at 03:33:21PM -0700, Ben Gardon wrote:
> On Wed, Sep 30, 2020 at 9:37 AM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> >
> > On Fri, Sep 25, 2020 at 02:22:50PM -0700, Ben Gardon wrote:
> > > @@ -4113,8 +4088,9 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
> > >       if (page_fault_handle_page_track(vcpu, error_code, gfn))
> > >               return RET_PF_EMULATE;
> > >
> > > -     if (fast_page_fault(vcpu, gpa, error_code))
> > > -             return RET_PF_RETRY;
> > > +     if (!is_tdp_mmu_root(vcpu->kvm, vcpu->arch.mmu->root_hpa))
> > > +             if (fast_page_fault(vcpu, gpa, error_code))
> > > +                     return RET_PF_RETRY;
> >
> > It'll probably be easier to handle is_tdp_mmu() in fast_page_fault().
> 
> I'd prefer to keep this check here because then, in the fast page fault
> path, we can just handle the case where we do have a TDP MMU root with
> the TDP MMU fast PF handler, and it'll mirror the split below between
> __direct_map and the TDP MMU PF handler.

Hmm, what about adding wrappers for these few cases where TDP MMU splits
cleanly from the existing paths?  The thought being that it would keep the
control flow somewhat straightforward, and might also help us keep the two
paths aligned (more below).

> > >
> > >       r = mmu_topup_memory_caches(vcpu, false);
> > >       if (r)
> > > @@ -4139,8 +4115,14 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
> > >       r = make_mmu_pages_available(vcpu);
> > >       if (r)
> > >               goto out_unlock;
> > > -     r = __direct_map(vcpu, gpa, write, map_writable, max_level, pfn,
> > > -                      prefault, is_tdp && lpage_disallowed);
> > > +
> > > +     if (is_tdp_mmu_root(vcpu->kvm, vcpu->arch.mmu->root_hpa))
> > > +             r = kvm_tdp_mmu_page_fault(vcpu, write, map_writable, max_level,
> > > +                                        gpa, pfn, prefault,
> > > +                                        is_tdp && lpage_disallowed);
> > > +     else
> > > +             r = __direct_map(vcpu, gpa, write, map_writable, max_level, pfn,
> > > +                              prefault, is_tdp && lpage_disallowed);

Somewhat tangentially related to the above, it feels like the TDP MMU helper
here would be better named tdp_mmu_map() or so.  KVM has already done the
"fault" part, in that it has faulted in the page (if relevant) and obtained
a pfn.  What's left is the actual insertion into the TDP page tables.

And again related to the helper, ideally tdp_mmu_map() and __direct_map()
would have identical prototypes.  Ditto for the fast page fault paths.  In
theory, that would allow the compiler to generate identical preamble, with
only the final check being different.  And if the compiler isn't smart enough
to do that on its own, we might even make the wrapper non-inline, with an
"unlikely" annotation to coerce the compiler to generate a tail call for the
preferred path.

> > >
> > >  out_unlock:
> > >       spin_unlock(&vcpu->kvm->mmu_lock);
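
A rough sketch of the kind of wrapper being suggested for the map step; the
name kvm_tdp_mmu_map() and the exact prototype are illustrative rather than
taken from the series, and the same pattern could front fast_page_fault():

/*
 * Illustrative wrapper only: give direct_page_fault() a single entry point
 * and keep the TDP MMU / legacy split in one place.  Assumes the TDP MMU
 * helper is renamed kvm_tdp_mmu_map() and takes __direct_map()'s arguments.
 */
static int direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, int write,
		      int map_writable, int max_level, kvm_pfn_t pfn,
		      bool prefault, bool account_disallowed_nx_lpage)
{
	if (is_tdp_mmu_root(vcpu->kvm, vcpu->arch.mmu->root_hpa))
		return kvm_tdp_mmu_map(vcpu, gpa, write, map_writable,
				       max_level, pfn, prefault,
				       account_disallowed_nx_lpage);

	return __direct_map(vcpu, gpa, write, map_writable, max_level, pfn,
			    prefault, account_disallowed_nx_lpage);
}
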

Patch

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index f6e6fc9959c04..52d661a758585 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -153,12 +153,6 @@  module_param(dbg, bool, 0644);
 #else
 #define PT64_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1))
 #endif
-#define PT64_LVL_ADDR_MASK(level) \
-	(PT64_BASE_ADDR_MASK & ~((1ULL << (PAGE_SHIFT + (((level) - 1) \
-						* PT64_LEVEL_BITS))) - 1))
-#define PT64_LVL_OFFSET_MASK(level) \
-	(PT64_BASE_ADDR_MASK & ((1ULL << (PAGE_SHIFT + (((level) - 1) \
-						* PT64_LEVEL_BITS))) - 1))
 
 #define PT32_BASE_ADDR_MASK PAGE_MASK
 #define PT32_DIR_BASE_ADDR_MASK \
@@ -182,20 +176,6 @@  module_param(dbg, bool, 0644);
 /* make pte_list_desc fit well in cache line */
 #define PTE_LIST_EXT 3
 
-/*
- * Return values of handle_mmio_page_fault and mmu.page_fault:
- * RET_PF_RETRY: let CPU fault again on the address.
- * RET_PF_EMULATE: mmio page fault, emulate the instruction directly.
- *
- * For handle_mmio_page_fault only:
- * RET_PF_INVALID: the spte is invalid, let the real page fault path update it.
- */
-enum {
-	RET_PF_RETRY = 0,
-	RET_PF_EMULATE = 1,
-	RET_PF_INVALID = 2,
-};
-
 struct pte_list_desc {
 	u64 *sptes[PTE_LIST_EXT];
 	struct pte_list_desc *more;
@@ -233,7 +213,7 @@  static struct percpu_counter kvm_total_used_mmu_pages;
 static u64 __read_mostly shadow_nx_mask;
 static u64 __read_mostly shadow_x_mask;	/* mutual exclusive with nx_mask */
 static u64 __read_mostly shadow_user_mask;
-static u64 __read_mostly shadow_accessed_mask;
+u64 __read_mostly shadow_accessed_mask;
 static u64 __read_mostly shadow_dirty_mask;
 static u64 __read_mostly shadow_mmio_value;
 static u64 __read_mostly shadow_mmio_access_mask;
@@ -364,7 +344,7 @@  static inline bool spte_ad_need_write_protect(u64 spte)
 	return (spte & SPTE_SPECIAL_MASK) != SPTE_AD_ENABLED_MASK;
 }
 
-static bool is_nx_huge_page_enabled(void)
+bool is_nx_huge_page_enabled(void)
 {
 	return READ_ONCE(nx_huge_pages);
 }
@@ -381,7 +361,7 @@  static inline u64 spte_shadow_dirty_mask(u64 spte)
 	return spte_ad_enabled(spte) ? shadow_dirty_mask : 0;
 }
 
-static inline bool is_access_track_spte(u64 spte)
+inline bool is_access_track_spte(u64 spte)
 {
 	return !spte_ad_enabled(spte) && (spte & shadow_acc_track_mask) == 0;
 }
@@ -433,7 +413,7 @@  static u64 get_mmio_spte_generation(u64 spte)
 	return gen;
 }
 
-static u64 make_mmio_spte(struct kvm_vcpu *vcpu, u64 gfn, unsigned int access)
+u64 make_mmio_spte(struct kvm_vcpu *vcpu, u64 gfn, unsigned int access)
 {
 
 	u64 gen = kvm_vcpu_memslots(vcpu)->generation & MMIO_SPTE_GEN_MASK;
@@ -613,7 +593,7 @@  int is_shadow_present_pte(u64 pte)
 	return (pte != 0) && !is_mmio_spte(pte);
 }
 
-static int is_large_pte(u64 pte)
+int is_large_pte(u64 pte)
 {
 	return pte & PT_PAGE_SIZE_MASK;
 }
@@ -2555,7 +2535,7 @@  static void shadow_walk_next(struct kvm_shadow_walk_iterator *iterator)
 	__shadow_walk_next(iterator, *iterator->sptep);
 }
 
-static u64 make_nonleaf_spte(u64 *child_pt, bool ad_disabled)
+u64 make_nonleaf_spte(u64 *child_pt, bool ad_disabled)
 {
 	u64 spte;
 
@@ -2961,14 +2941,9 @@  static bool kvm_is_mmio_pfn(kvm_pfn_t pfn)
 				     E820_TYPE_RAM);
 }
 
-/* Bits which may be returned by set_spte() */
-#define SET_SPTE_WRITE_PROTECTED_PT	BIT(0)
-#define SET_SPTE_NEED_REMOTE_TLB_FLUSH	BIT(1)
-
-static u64 make_spte(struct kvm_vcpu *vcpu, unsigned int pte_access, int level,
-		     gfn_t gfn, kvm_pfn_t pfn, u64 old_spte, bool speculative,
-		     bool can_unsync, bool host_writable, bool ad_disabled,
-		     int *ret)
+u64 make_spte(struct kvm_vcpu *vcpu, unsigned int pte_access, int level,
+	      gfn_t gfn, kvm_pfn_t pfn, u64 old_spte, bool speculative,
+	      bool can_unsync, bool host_writable, bool ad_disabled, int *ret)
 {
 	u64 spte = 0;
 
@@ -3249,8 +3224,8 @@  static int host_pfn_mapping_level(struct kvm_vcpu *vcpu, gfn_t gfn,
 	return level;
 }
 
-static int kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, gfn_t gfn,
-				   int max_level, kvm_pfn_t *pfnp)
+int kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, gfn_t gfn,
+			    int max_level, kvm_pfn_t *pfnp)
 {
 	struct kvm_memory_slot *slot;
 	struct kvm_lpage_info *linfo;
@@ -3295,8 +3270,8 @@  static int kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, gfn_t gfn,
 	return level;
 }
 
-static void disallowed_hugepage_adjust(u64 spte, gfn_t gfn, int cur_level,
-					kvm_pfn_t *pfnp, int *goal_levelp)
+void disallowed_hugepage_adjust(u64 spte, gfn_t gfn, int cur_level,
+				kvm_pfn_t *pfnp, int *goal_levelp)
 {
 	int goal_level = *goal_levelp;
 
@@ -4113,8 +4088,9 @@  static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 	if (page_fault_handle_page_track(vcpu, error_code, gfn))
 		return RET_PF_EMULATE;
 
-	if (fast_page_fault(vcpu, gpa, error_code))
-		return RET_PF_RETRY;
+	if (!is_tdp_mmu_root(vcpu->kvm, vcpu->arch.mmu->root_hpa))
+		if (fast_page_fault(vcpu, gpa, error_code))
+			return RET_PF_RETRY;
 
 	r = mmu_topup_memory_caches(vcpu, false);
 	if (r)
@@ -4139,8 +4115,14 @@  static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 	r = make_mmu_pages_available(vcpu);
 	if (r)
 		goto out_unlock;
-	r = __direct_map(vcpu, gpa, write, map_writable, max_level, pfn,
-			 prefault, is_tdp && lpage_disallowed);
+
+	if (is_tdp_mmu_root(vcpu->kvm, vcpu->arch.mmu->root_hpa))
+		r = kvm_tdp_mmu_page_fault(vcpu, write, map_writable, max_level,
+					   gpa, pfn, prefault,
+					   is_tdp && lpage_disallowed);
+	else
+		r = __direct_map(vcpu, gpa, write, map_writable, max_level, pfn,
+				 prefault, is_tdp && lpage_disallowed);
 
 out_unlock:
 	spin_unlock(&vcpu->kvm->mmu_lock);
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index ff1fe0e04fba5..4cef9da051847 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -73,6 +73,15 @@  bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
 	(((address) >> PT64_LEVEL_SHIFT(level)) & ((1 << PT64_LEVEL_BITS) - 1))
 #define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level)
 
+#define PT64_LVL_ADDR_MASK(level) \
+	(PT64_BASE_ADDR_MASK & ~((1ULL << (PAGE_SHIFT + (((level) - 1) \
+						* PT64_LEVEL_BITS))) - 1))
+#define PT64_LVL_OFFSET_MASK(level) \
+	(PT64_BASE_ADDR_MASK & ((1ULL << (PAGE_SHIFT + (((level) - 1) \
+						* PT64_LEVEL_BITS))) - 1))
+
+extern u64 shadow_accessed_mask;
+
 #define ACC_EXEC_MASK    1
 #define ACC_WRITE_MASK   PT_WRITABLE_MASK
 #define ACC_USER_MASK    PT_USER_MASK
@@ -84,7 +93,43 @@  bool is_mmio_spte(u64 spte);
 int is_shadow_present_pte(u64 pte);
 int is_last_spte(u64 pte, int level);
 bool is_dirty_spte(u64 spte);
+int is_large_pte(u64 pte);
+bool is_access_track_spte(u64 spte);
 
 void kvm_flush_remote_tlbs_with_address(struct kvm *kvm, u64 start_gfn,
 					u64 pages);
+
+/*
+ * Return values of handle_mmio_page_fault and mmu.page_fault:
+ * RET_PF_RETRY: let CPU fault again on the address.
+ * RET_PF_EMULATE: mmio page fault, emulate the instruction directly.
+ *
+ * For handle_mmio_page_fault only:
+ * RET_PF_INVALID: the spte is invalid, let the real page fault path update it.
+ */
+enum {
+	RET_PF_RETRY = 0,
+	RET_PF_EMULATE = 1,
+	RET_PF_INVALID = 2,
+};
+
+/* Bits which may be returned by set_spte() */
+#define SET_SPTE_WRITE_PROTECTED_PT	BIT(0)
+#define SET_SPTE_NEED_REMOTE_TLB_FLUSH	BIT(1)
+
+u64 make_spte(struct kvm_vcpu *vcpu, unsigned int pte_access, int level,
+	      gfn_t gfn, kvm_pfn_t pfn, u64 old_spte, bool speculative,
+	      bool can_unsync, bool host_writable, bool ad_disabled, int *ret);
+u64 make_mmio_spte(struct kvm_vcpu *vcpu, u64 gfn, unsigned int access);
+u64 make_nonleaf_spte(u64 *child_pt, bool ad_disabled);
+
+int kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, gfn_t gfn,
+			    int max_level, kvm_pfn_t *pfnp);
+void disallowed_hugepage_adjust(u64 spte, gfn_t gfn, int cur_level,
+				kvm_pfn_t *pfnp, int *goal_levelp);
+
+bool is_nx_huge_page_enabled(void);
+
+void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
+
 #endif /* __KVM_X86_MMU_INTERNAL_H */
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index d96fc182c8497..37bdebc2592ea 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -311,6 +311,10 @@  static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
 #define for_each_tdp_pte_root(_iter, _root, _start, _end) \
 	for_each_tdp_pte(_iter, _root->spt, _root->role.level, _start, _end)
 
+#define for_each_tdp_pte_vcpu(_iter, _vcpu, _start, _end)		   \
+	for_each_tdp_pte(_iter, __va(_vcpu->arch.mmu->root_hpa),	   \
+			 _vcpu->arch.mmu->shadow_root_level, _start, _end)
+
 /*
  * If the MMU lock is contended or this thread needs to yield, flushes
  * the TLBs, releases, the MMU lock, yields, reacquires the MMU lock,
@@ -400,3 +404,126 @@  void kvm_tdp_mmu_zap_all(struct kvm *kvm)
 	if (flush)
 		kvm_flush_remote_tlbs(kvm);
 }
+
+/*
+ * Installs a last-level SPTE to handle a TDP page fault.
+ * (NPT/EPT violation/misconfiguration)
+ */
+static int page_fault_handle_target_level(struct kvm_vcpu *vcpu, int write,
+					  int map_writable, int as_id,
+					  struct tdp_iter *iter,
+					  kvm_pfn_t pfn, bool prefault)
+{
+	u64 new_spte;
+	int ret = 0;
+	int make_spte_ret = 0;
+
+	if (unlikely(is_noslot_pfn(pfn)))
+		new_spte = make_mmio_spte(vcpu, iter->gfn, ACC_ALL);
+	else
+		new_spte = make_spte(vcpu, ACC_ALL, iter->level, iter->gfn,
+				     pfn, iter->old_spte, prefault, true,
+				     map_writable, !shadow_accessed_mask,
+				     &make_spte_ret);
+
+	/*
+	 * If the page fault was caused by a write but the page is write
+	 * protected, emulation is needed. If the emulation was skipped,
+	 * the vCPU would have the same fault again.
+	 */
+	if ((make_spte_ret & SET_SPTE_WRITE_PROTECTED_PT) && write)
+		ret = RET_PF_EMULATE;
+
+	/* If a MMIO SPTE is installed, the MMIO will need to be emulated. */
+	if (unlikely(is_mmio_spte(new_spte)))
+		ret = RET_PF_EMULATE;
+
+	*iter->sptep = new_spte;
+	handle_changed_spte(vcpu->kvm, as_id, iter->gfn, iter->old_spte,
+			    new_spte, iter->level);
+
+	if (!prefault)
+		vcpu->stat.pf_fixed++;
+
+	return ret;
+}
+
+/*
+ * Handle a TDP page fault (NPT/EPT violation/misconfiguration) by installing
+ * page tables and SPTEs to translate the faulting guest physical address.
+ */
+int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu, int write, int map_writable,
+			   int max_level, gpa_t gpa, kvm_pfn_t pfn,
+			   bool prefault, bool account_disallowed_nx_lpage)
+{
+	struct tdp_iter iter;
+	struct kvm_mmu_memory_cache *pf_pt_cache =
+			&vcpu->arch.mmu_shadow_page_cache;
+	u64 *child_pt;
+	u64 new_spte;
+	int ret;
+	int as_id = kvm_arch_vcpu_memslots_id(vcpu);
+	gfn_t gfn = gpa >> PAGE_SHIFT;
+	int level;
+
+	if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa)))
+		return RET_PF_RETRY;
+
+	if (WARN_ON(!is_tdp_mmu_root(vcpu->kvm, vcpu->arch.mmu->root_hpa)))
+		return RET_PF_RETRY;
+
+	level = kvm_mmu_hugepage_adjust(vcpu, gfn, max_level, &pfn);
+
+	for_each_tdp_pte_vcpu(iter, vcpu, gfn, gfn + 1) {
+		disallowed_hugepage_adjust(iter.old_spte, gfn, iter.level,
+					   &pfn, &level);
+
+		if (iter.level == level)
+			break;
+
+		/*
+		 * If there is an SPTE mapping a large page at a higher level
+		 * than the target, that SPTE must be cleared and replaced
+		 * with a non-leaf SPTE.
+		 */
+		if (is_shadow_present_pte(iter.old_spte) &&
+		    is_large_pte(iter.old_spte)) {
+			*iter.sptep = 0;
+			handle_changed_spte(vcpu->kvm, as_id, iter.gfn,
+					    iter.old_spte, 0, iter.level);
+			kvm_flush_remote_tlbs_with_address(vcpu->kvm, iter.gfn,
+					KVM_PAGES_PER_HPAGE(iter.level));
+
+			/*
+			 * The iter must explicitly re-read the spte here
+			 * because the new is needed before the next iteration
+			 * of the loop.
+			 */
+			iter.old_spte = READ_ONCE(*iter.sptep);
+		}
+
+		if (!is_shadow_present_pte(iter.old_spte)) {
+			child_pt = kvm_mmu_memory_cache_alloc(pf_pt_cache);
+			clear_page(child_pt);
+			new_spte = make_nonleaf_spte(child_pt,
+						     !shadow_accessed_mask);
+
+			*iter.sptep = new_spte;
+			handle_changed_spte(vcpu->kvm, as_id, iter.gfn,
+					    iter.old_spte, new_spte,
+					    iter.level);
+		}
+	}
+
+	if (WARN_ON(iter.level != level))
+		return RET_PF_RETRY;
+
+	ret = page_fault_handle_target_level(vcpu, write, map_writable,
+					     as_id, &iter, pfn, prefault);
+
+	/* If emulating, flush this vcpu's TLB. */
+	if (ret == RET_PF_EMULATE)
+		kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
+
+	return ret;
+}
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index cb86f9fe69017..abf23dc0ab7ad 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -14,4 +14,8 @@  void kvm_tdp_mmu_put_root_hpa(struct kvm *kvm, hpa_t root_hpa);
 
 bool kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, gfn_t start, gfn_t end);
 void kvm_tdp_mmu_zap_all(struct kvm *kvm);
+
+int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu, int write, int map_writable,
+			   int level, gpa_t gpa, kvm_pfn_t pfn, bool prefault,
+			   bool lpage_disallowed);
 #endif /* __KVM_X86_MMU_TDP_MMU_H */