diff mbox series

[RFC,07/19] KVM: x86/mmu: Factor wrprot for nested PML out of make_spte

Message ID 20211110223010.1392399-8-bgardon@google.com (mailing list archive)
State New, archived
Headers show
Series KVM: x86/mmu: Optimize disabling dirty logging | expand

Commit Message

Ben Gardon Nov. 10, 2021, 10:29 p.m. UTC
When running a nested VM, KVM write protects SPTEs in the EPT/NPT02
instead of using PML for dirty tracking. This avoids expensive
translation later, when emptying the Page Modification Log. In service
of removing the vCPU pointer from make_spte, factor the check for nested
PML out of the function.


Signed-off-by: Ben Gardon <bgardon@google.com>
---
 arch/x86/kvm/mmu/spte.c | 10 +++++++---
 arch/x86/kvm/mmu/spte.h |  3 ++-
 2 files changed, 9 insertions(+), 4 deletions(-)

Comments

Sean Christopherson Nov. 18, 2021, 2:12 a.m. UTC | #1
On Wed, Nov 10, 2021, Ben Gardon wrote:
> When running a nested VM, KVM write protects SPTEs in the EPT/NPT02
> instead of using PML for dirty tracking. This avoids expensive
> translation later, when emptying the Page Modification Log. In service
> of removing the vCPU pointer from make_spte, factor the check for nested
> PML out of the function.

Aha!  The dependency on @vcpu can be avoided without having to take a flag from
the caller.  The shadow page has everything we need.  The check is really "is this
a page for L2 EPT".  The kvm_x86_ops.cpu_dirty_log_size gets us the EPT part, and
kvm_mmu_page.guest_mode gets us the L2 part.

Compile tested only...

From 773414e4fd7010c38ac89221d16089f3dcc57467 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc@google.com>
Date: Wed, 17 Nov 2021 18:08:42 -0800
Subject: [PATCH] KVM: x86/mmu: Use shadow page role to detect PML-unfriendly
 pages for L2

Rework make_spte() to query the shadow page's role, specifically whether
or not it's a guest_mode page, a.k.a. a page for L2, when determining if
the SPTE is compatible with PML.  This eliminates a dependency on @vcpu,
with a future goal of being able to create SPTEs without a specific vCPU.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu_internal.h | 7 +++----
 arch/x86/kvm/mmu/spte.c         | 2 +-
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 8ede43a826af..03882b2624c8 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -109,7 +109,7 @@ static inline int kvm_mmu_page_as_id(struct kvm_mmu_page *sp)
 	return kvm_mmu_role_as_id(sp->role);
 }

-static inline bool kvm_vcpu_ad_need_write_protect(struct kvm_vcpu *vcpu)
+static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm_mmu_page *sp)
 {
 	/*
 	 * When using the EPT page-modification log, the GPAs in the CPU dirty
@@ -117,10 +117,9 @@ static inline bool kvm_vcpu_ad_need_write_protect(struct kvm_vcpu *vcpu)
 	 * on write protection to record dirty pages, which bypasses PML, since
 	 * writes now result in a vmexit.  Note, the check on CPU dirty logging
 	 * being enabled is mandatory as the bits used to denote WP-only SPTEs
-	 * are reserved for NPT w/ PAE (32-bit KVM).
+	 * are reserved for PAE paging (32-bit KVM).
 	 */
-	return vcpu->arch.mmu == &vcpu->arch.guest_mmu &&
-	       kvm_x86_ops.cpu_dirty_log_size;
+	return kvm_x86_ops.cpu_dirty_log_size && sp->role.guest_mode;
 }

 int mmu_try_to_unsync_pages(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot,
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 0c76c45fdb68..84e64dbdd89e 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -101,7 +101,7 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,

 	if (sp->role.ad_disabled)
 		spte |= SPTE_TDP_AD_DISABLED_MASK;
-	else if (kvm_vcpu_ad_need_write_protect(vcpu))
+	else if (kvm_mmu_page_ad_need_write_protect(sp))
 		spte |= SPTE_TDP_AD_WRPROT_ONLY_MASK;

 	/*
--
Ben Gardon Nov. 18, 2021, 5:43 p.m. UTC | #2
On Wed, Nov 17, 2021 at 6:12 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Wed, Nov 10, 2021, Ben Gardon wrote:
> > When running a nested VM, KVM write protects SPTEs in the EPT/NPT02
> > instead of using PML for dirty tracking. This avoids expensive
> > translation later, when emptying the Page Modification Log. In service
> > of removing the vCPU pointer from make_spte, factor the check for nested
> > PML out of the function.
>
> Aha!  The dependency on @vcpu can be avoided without having to take a flag from
> the caller.  The shadow page has everything we need.  The check is really "is this
> a page for L2 EPT".  The kvm_x86_ops.cpu_dirty_log_size gets us the EPT part, and
> kvm_mmu_page.guest_mode gets us the L2 part.

Haha that's way cleaner than what I was doing! Seems like an obvious
solution in retrospect. I'll include this in the next version of the
series I send out unless Paolo beats me and just merges it directly.
Happy to give this my reviewed-by.

>
> Compile tested only...
>
> From 773414e4fd7010c38ac89221d16089f3dcc57467 Mon Sep 17 00:00:00 2001
> From: Sean Christopherson <seanjc@google.com>
> Date: Wed, 17 Nov 2021 18:08:42 -0800
> Subject: [PATCH] KVM: x86/mmu: Use shadow page role to detect PML-unfriendly
>  pages for L2
>
> Rework make_spte() to query the shadow page's role, specifically whether
> or not it's a guest_mode page, a.k.a. a page for L2, when determining if
> the SPTE is compatible with PML.  This eliminates a dependency on @vcpu,
> with a future goal of being able to create SPTEs without a specific vCPU.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Ben Gardon <bgardon@google.com>

> ---
>  arch/x86/kvm/mmu/mmu_internal.h | 7 +++----
>  arch/x86/kvm/mmu/spte.c         | 2 +-
>  2 files changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index 8ede43a826af..03882b2624c8 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -109,7 +109,7 @@ static inline int kvm_mmu_page_as_id(struct kvm_mmu_page *sp)
>         return kvm_mmu_role_as_id(sp->role);
>  }
>
> -static inline bool kvm_vcpu_ad_need_write_protect(struct kvm_vcpu *vcpu)
> +static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm_mmu_page *sp)
>  {
>         /*
>          * When using the EPT page-modification log, the GPAs in the CPU dirty
> @@ -117,10 +117,9 @@ static inline bool kvm_vcpu_ad_need_write_protect(struct kvm_vcpu *vcpu)
>          * on write protection to record dirty pages, which bypasses PML, since
>          * writes now result in a vmexit.  Note, the check on CPU dirty logging
>          * being enabled is mandatory as the bits used to denote WP-only SPTEs
> -        * are reserved for NPT w/ PAE (32-bit KVM).
> +        * are reserved for PAE paging (32-bit KVM).
>          */
> -       return vcpu->arch.mmu == &vcpu->arch.guest_mmu &&
> -              kvm_x86_ops.cpu_dirty_log_size;
> +       return kvm_x86_ops.cpu_dirty_log_size && sp->role.guest_mode;
>  }
>
>  int mmu_try_to_unsync_pages(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot,
> diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
> index 0c76c45fdb68..84e64dbdd89e 100644
> --- a/arch/x86/kvm/mmu/spte.c
> +++ b/arch/x86/kvm/mmu/spte.c
> @@ -101,7 +101,7 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
>
>         if (sp->role.ad_disabled)
>                 spte |= SPTE_TDP_AD_DISABLED_MASK;
> -       else if (kvm_vcpu_ad_need_write_protect(vcpu))
> +       else if (kvm_mmu_page_ad_need_write_protect(sp))
>                 spte |= SPTE_TDP_AD_WRPROT_ONLY_MASK;
>
>         /*
> --
Paolo Bonzini Nov. 18, 2021, 6:04 p.m. UTC | #3
On 11/18/21 18:43, Ben Gardon wrote:
>> Aha!  The dependency on @vcpu can be avoided without having to take a flag from
>> the caller.  The shadow page has everything we need.  The check is really "is this
>> a page for L2 EPT".  The kvm_x86_ops.cpu_dirty_log_size gets us the EPT part, and
>> kvm_mmu_page.guest_mode gets us the L2 part.
>
> Haha that's way cleaner than what I was doing! Seems like an obvious
> solution in retrospect. I'll include this in the next version of the
> series I send out unless Paolo beats me and just merges it directly.
> Happy to give this my reviewed-by.

Yeah, I am including the early cleanup parts because it makes no sense
to hold off on them; and Sean's patch qualifies as well.

I can't blame you for not remembering role.guest_mode.  Jim added it for
a decidedly niche reason:

     commit 1313cc2bd8f6568dd8801feef446afbe43e6d313
     Author: Jim Mattson <jmattson@google.com>
     Date:   Wed May 9 17:02:04 2018 -0400

     kvm: mmu: Add guest_mode to kvm_mmu_page_role
     
     L1 and L2 need to have disjoint mappings, so that L1's APIC access
     page (under VMX) can be omitted from L2's mappings.
     
     Signed-off-by: Jim Mattson <jmattson@google.com>
     Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

though it's actually gotten a lot more important than just that:

     commit 992edeaefed682511bd173dabd2f54b1ce5387df
     Author: Liran Alon <liran.alon@oracle.com>
     Date:   Wed Nov 20 14:24:52 2019 +0200

     KVM: nVMX: Assume TLB entries of L1 and L2 are tagged differently if L0 use EPT
     
     Since commit 1313cc2bd8f6 ("kvm: mmu: Add guest_mode to kvm_mmu_page_role"),
     guest_mode was added to mmu-role and therefore if L0 use EPT, it will
     always run L1 and L2 with different EPTP. i.e. EPTP01!=EPTP02.
     
     Because TLB entries are tagged with EP4TA, KVM can assume
     TLB entries populated while running L2 are tagged differently
     than TLB entries populated while running L1.

Paolo
diff mbox series

Patch

diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 04d26e913941..3cf08a534a16 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -92,7 +92,8 @@  static bool kvm_is_mmio_pfn(kvm_pfn_t pfn)
 bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	       struct kvm_memory_slot *slot, unsigned int pte_access,
 	       gfn_t gfn, kvm_pfn_t pfn, u64 old_spte, bool prefetch,
-	       bool can_unsync, bool host_writable, u64 *new_spte)
+	       bool can_unsync, bool host_writable, bool ad_need_write_protect,
+	       u64 *new_spte)
 {
 	int level = sp->role.level;
 	u64 spte = SPTE_MMU_PRESENT_MASK;
@@ -100,7 +101,7 @@  bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 
 	if (sp->role.ad_disabled)
 		spte |= SPTE_TDP_AD_DISABLED_MASK;
-	else if (kvm_vcpu_ad_need_write_protect(vcpu))
+	else if (ad_need_write_protect)
 		spte |= SPTE_TDP_AD_WRPROT_ONLY_MASK;
 
 	/*
@@ -195,8 +196,11 @@  bool vcpu_make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 		    gfn_t gfn, kvm_pfn_t pfn, u64 old_spte, bool prefetch,
 		    bool can_unsync, bool host_writable, u64 *new_spte)
 {
+	bool ad_need_write_protect = kvm_vcpu_ad_need_write_protect(vcpu);
+
 	return make_spte(vcpu, sp, slot, pte_access, gfn, pfn, old_spte,
-			 prefetch, can_unsync, host_writable, new_spte);
+			 prefetch, can_unsync, host_writable,
+			 ad_need_write_protect, new_spte);
 
 }
 
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 14f18082d505..bcf58602f224 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -332,7 +332,8 @@  static inline u64 get_mmio_spte_generation(u64 spte)
 bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	       struct kvm_memory_slot *slot, unsigned int pte_access,
 	       gfn_t gfn, kvm_pfn_t pfn, u64 old_spte, bool prefetch,
-	       bool can_unsync, bool host_writable, u64 *new_spte);
+	       bool can_unsync, bool host_writable, bool ad_need_write_protect,
+	       u64 *new_spte);
 bool vcpu_make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 		    struct kvm_memory_slot *slot,
 		    unsigned int pte_access, gfn_t gfn, kvm_pfn_t pfn,