diff mbox series

[v5,7/7] KVM: x86/mmu: explicitly check nx_hugepage in disallowed_hugepage_adjust()

Message ID 20220930234854.1739690-8-seanjc@google.com (mailing list archive)
State New, archived
Headers show
Series KVM: x86: Apply NX mitigation more precisely | expand

Commit Message

Sean Christopherson Sept. 30, 2022, 11:48 p.m. UTC
From: Mingwei Zhang <mizhang@google.com>

Explicitly check if a NX huge page is disallowed when determining if a
page fault needs to be forced to use a smaller sized page.  KVM currently
assumes that the NX huge page mitigation is the only scenario where KVM
will force a shadow page instead of a huge page, and so unnecessarily
keeps an existing shadow page instead of replacing it with a huge page.

Any scenario that causes KVM to zap leaf SPTEs may result in having a SP
that can be made huge without violating the NX huge page mitigation.
E.g. prior to commit 5ba7c4c6d1c7 ("KVM: x86/MMU: Zap non-leaf SPTEs when
disabling dirty logging"), KVM would keep shadow pages after disabling
dirty logging due to a live migration being canceled, resulting in
degraded performance due to running with 4kb pages instead of huge pages.

Although the dirty logging case is "fixed", that fix is coincidental,
i.e. is an implementation detail, and there are other scenarios where KVM
will zap leaf SPTEs.  E.g. zapping leaf SPTEs in response to a host page
migration (mmu_notifier invalidation) to create a huge page would yield a
similar result; KVM would see the shadow-present non-leaf SPTE and assume
a huge page is disallowed.

Fixes: b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation")
Reviewed-by: Ben Gardon <bgardon@google.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
[sean: use spte_to_child_sp(), massage changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Comments

Yan Zhao Oct. 10, 2022, 2:07 a.m. UTC | #1
On Fri, Sep 30, 2022 at 11:48:54PM +0000, Sean Christopherson wrote:
> From: Mingwei Zhang <mizhang@google.com>
> 
> Explicitly check if a NX huge page is disallowed when determining if a
> page fault needs to be forced to use a smaller sized page.  KVM currently
> assumes that the NX huge page mitigation is the only scenario where KVM
> will force a shadow page instead of a huge page, and so unnecessarily
> keeps an existing shadow page instead of replacing it with a huge page.
> 
> Any scenario that causes KVM to zap leaf SPTEs may result in having a SP
> that can be made huge without violating the NX huge page mitigation.
> E.g. prior to commit 5ba7c4c6d1c7 ("KVM: x86/MMU: Zap non-leaf SPTEs when
> disabling dirty logging"), KVM would keep shadow pages after disabling
> dirty logging due to a live migration being canceled, resulting in
> degraded performance due to running with 4kb pages instead of huge pages.
> 
> Although the dirty logging case is "fixed", that fix is coincidental,
> i.e. is an implementation detail, and there are other scenarios where KVM
> will zap leaf SPTEs.  E.g. zapping leaf SPTEs in response to a host page
> migration (mmu_notifier invalidation) to create a huge page would yield a
> similar result; KVM would see the shadow-present non-leaf SPTE and assume
> a huge page is disallowed.
> 
> Fixes: b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation")
> Reviewed-by: Ben Gardon <bgardon@google.com>
> Reviewed-by: David Matlack <dmatlack@google.com>
> Signed-off-by: Mingwei Zhang <mizhang@google.com>
> [sean: use spte_to_child_sp(), massage changelog]
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 54005b7f1499..688beed3a41e 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3110,6 +3110,11 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
>  	    cur_level == fault->goal_level &&
>  	    is_shadow_present_pte(spte) &&
>  	    !is_large_pte(spte)) {
> +		u64 page_mask;
> +
> +		if (!spte_to_child_sp(spte)->nx_huge_page_disallowed)
> +			return;
Merge this "if" with the upper level "if" ?

Thanks
Yan
> +
>  		/*
>  		 * A small SPTE exists for this pfn, but FNAME(fetch)
>  		 * and __direct_map would like to create a large PTE
> @@ -3117,8 +3122,8 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
>  		 * patching back for them into pfn the next 9 bits of
>  		 * the address.
>  		 */
> -		u64 page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
> -				KVM_PAGES_PER_HPAGE(cur_level - 1);
> +		page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
> +			    KVM_PAGES_PER_HPAGE(cur_level - 1);
>  		fault->pfn |= fault->gfn & page_mask;
>  		fault->goal_level--;
>  	}
> -- 
> 2.38.0.rc1.362.ged0d419d3c-goog
>
Sean Christopherson Oct. 10, 2022, 4:31 p.m. UTC | #2
On Mon, Oct 10, 2022, Yan Zhao wrote:
> On Fri, Sep 30, 2022 at 11:48:54PM +0000, Sean Christopherson wrote:
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 54005b7f1499..688beed3a41e 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -3110,6 +3110,11 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
> >  	    cur_level == fault->goal_level &&
> >  	    is_shadow_present_pte(spte) &&
> >  	    !is_large_pte(spte)) {
> > +		u64 page_mask;
> > +
> > +		if (!spte_to_child_sp(spte)->nx_huge_page_disallowed)
> > +			return;
> Merge this "if" with the upper level "if" ?

Ya, good call.  The separate check is no longer necessary now that the superfluous
smp_rmb() is gone.  That'll minimize the diff to.
diff mbox series

Patch

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 54005b7f1499..688beed3a41e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3110,6 +3110,11 @@  void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
 	    cur_level == fault->goal_level &&
 	    is_shadow_present_pte(spte) &&
 	    !is_large_pte(spte)) {
+		u64 page_mask;
+
+		if (!spte_to_child_sp(spte)->nx_huge_page_disallowed)
+			return;
+
 		/*
 		 * A small SPTE exists for this pfn, but FNAME(fetch)
 		 * and __direct_map would like to create a large PTE
@@ -3117,8 +3122,8 @@  void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
 		 * patching back for them into pfn the next 9 bits of
 		 * the address.
 		 */
-		u64 page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
-				KVM_PAGES_PER_HPAGE(cur_level - 1);
+		page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
+			    KVM_PAGES_PER_HPAGE(cur_level - 1);
 		fault->pfn |= fault->gfn & page_mask;
 		fault->goal_level--;
 	}