Message ID | 20211110223010.1392399-8-bgardon@google.com (mailing list archive) |
---|---|
State | New, archived |
Series | KVM: x86/mmu: Optimize disabling dirty logging |
On Wed, Nov 10, 2021, Ben Gardon wrote:
> When running a nested VM, KVM write protects SPTEs in the EPT/NPT02
> instead of using PML for dirty tracking. This avoids expensive
> translation later, when emptying the Page Modification Log. In service
> of removing the vCPU pointer from make_spte, factor the check for nested
> PML out of the function.

Aha! The dependency on @vcpu can be avoided without having to take a flag from
the caller. The shadow page has everything we need. The check is really "is this
a page for L2 EPT". The kvm_x86_ops.cpu_dirty_log_size gets us the EPT part, and
kvm_mmu_page.guest_mode gets us the L2 part.

Compile tested only...

From 773414e4fd7010c38ac89221d16089f3dcc57467 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc@google.com>
Date: Wed, 17 Nov 2021 18:08:42 -0800
Subject: [PATCH] KVM: x86/mmu: Use shadow page role to detect PML-unfriendly
 pages for L2

Rework make_spte() to query the shadow page's role, specifically whether
or not it's a guest_mode page, a.k.a. a page for L2, when determining if
the SPTE is compatible with PML. This eliminates a dependency on @vcpu,
with a future goal of being able to create SPTEs without a specific vCPU.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu_internal.h | 7 +++----
 arch/x86/kvm/mmu/spte.c         | 2 +-
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 8ede43a826af..03882b2624c8 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -109,7 +109,7 @@ static inline int kvm_mmu_page_as_id(struct kvm_mmu_page *sp)
 	return kvm_mmu_role_as_id(sp->role);
 }
 
-static inline bool kvm_vcpu_ad_need_write_protect(struct kvm_vcpu *vcpu)
+static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm_mmu_page *sp)
 {
 	/*
 	 * When using the EPT page-modification log, the GPAs in the CPU dirty
@@ -117,10 +117,9 @@ static inline bool kvm_vcpu_ad_need_write_protect(struct kvm_vcpu *vcpu)
 	 * on write protection to record dirty pages, which bypasses PML, since
 	 * writes now result in a vmexit. Note, the check on CPU dirty logging
 	 * being enabled is mandatory as the bits used to denote WP-only SPTEs
-	 * are reserved for NPT w/ PAE (32-bit KVM).
+	 * are reserved for PAE paging (32-bit KVM).
 	 */
-	return vcpu->arch.mmu == &vcpu->arch.guest_mmu &&
-	       kvm_x86_ops.cpu_dirty_log_size;
+	return kvm_x86_ops.cpu_dirty_log_size && sp->role.guest_mode;
 }
 
 int mmu_try_to_unsync_pages(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot,
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 0c76c45fdb68..84e64dbdd89e 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -101,7 +101,7 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 
 	if (sp->role.ad_disabled)
 		spte |= SPTE_TDP_AD_DISABLED_MASK;
-	else if (kvm_vcpu_ad_need_write_protect(vcpu))
+	else if (kvm_mmu_page_ad_need_write_protect(sp))
 		spte |= SPTE_TDP_AD_WRPROT_ONLY_MASK;
 
 	/*
--
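To make the end goal concrete: once the PML check keys off the shadow page's role, a bulk path with no vCPU in hand can make the same decision. A rough sketch of what such a caller could look like (hypothetical function, most details elided; make_spte() itself still takes @vcpu at this point in the series):

/*
 * Hypothetical sketch, not part of the patch: build the AD/PML bits of
 * an SPTE from the shadow page alone, with no vCPU available.
 */
static u64 make_spte_ad_bits_sketch(struct kvm_mmu_page *sp)
{
	u64 spte = SPTE_MMU_PRESENT_MASK;

	/* The write-protect-vs-PML decision needs only the page role. */
	if (sp->role.ad_disabled)
		spte |= SPTE_TDP_AD_DISABLED_MASK;
	else if (kvm_mmu_page_ad_need_write_protect(sp))
		spte |= SPTE_TDP_AD_WRPROT_ONLY_MASK;

	/* ... access bits, pfn, memtype, etc. elided ... */
	return spte;
}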
On Wed, Nov 17, 2021 at 6:12 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Wed, Nov 10, 2021, Ben Gardon wrote:
> > When running a nested VM, KVM write protects SPTEs in the EPT/NPT02
> > instead of using PML for dirty tracking. This avoids expensive
> > translation later, when emptying the Page Modification Log. In service
> > of removing the vCPU pointer from make_spte, factor the check for nested
> > PML out of the function.
>
> Aha! The dependency on @vcpu can be avoided without having to take a flag from
> the caller. The shadow page has everything we need. The check is really "is this
> a page for L2 EPT". The kvm_x86_ops.cpu_dirty_log_size gets us the EPT part, and
> kvm_mmu_page.guest_mode gets us the L2 part.

Haha, that's way cleaner than what I was doing! Seems like an obvious
solution in retrospect. I'll include this in the next version of the
series I send out unless Paolo beats me to it and just merges it directly.
Happy to give this my Reviewed-by.

> Compile tested only...
>
> From 773414e4fd7010c38ac89221d16089f3dcc57467 Mon Sep 17 00:00:00 2001
> From: Sean Christopherson <seanjc@google.com>
> Date: Wed, 17 Nov 2021 18:08:42 -0800
> Subject: [PATCH] KVM: x86/mmu: Use shadow page role to detect PML-unfriendly
>  pages for L2
>
> Rework make_spte() to query the shadow page's role, specifically whether
> or not it's a guest_mode page, a.k.a. a page for L2, when determining if
> the SPTE is compatible with PML. This eliminates a dependency on @vcpu,
> with a future goal of being able to create SPTEs without a specific vCPU.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Ben Gardon <bgardon@google.com>

> [...]
On 11/18/21 18:43, Ben Gardon wrote:
>> Aha! The dependency on @vcpu can be avoided without having to take a flag from
>> the caller. The shadow page has everything we need. The check is really "is this
>> a page for L2 EPT". The kvm_x86_ops.cpu_dirty_log_size gets us the EPT part, and
>> kvm_mmu_page.guest_mode gets us the L2 part.
>
> Haha, that's way cleaner than what I was doing! Seems like an obvious
> solution in retrospect. I'll include this in the next version of the
> series I send out unless Paolo beats me to it and just merges it directly.
> Happy to give this my Reviewed-by.

Yeah, I am including the early cleanup parts because it makes no sense to
hold off on them, and Sean's patch qualifies as well.

I can't blame you for not remembering role.guest_mode. Jim added it for a
decidedly niche reason:

    commit 1313cc2bd8f6568dd8801feef446afbe43e6d313
    Author: Jim Mattson <jmattson@google.com>
    Date:   Wed May 9 17:02:04 2018 -0400

        kvm: mmu: Add guest_mode to kvm_mmu_page_role

        L1 and L2 need to have disjoint mappings, so that L1's APIC access
        page (under VMX) can be omitted from L2's mappings.

        Signed-off-by: Jim Mattson <jmattson@google.com>
        Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
        Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

though it's actually gotten a lot more important than just that:

    commit 992edeaefed682511bd173dabd2f54b1ce5387df
    Author: Liran Alon <liran.alon@oracle.com>
    Date:   Wed Nov 20 14:24:52 2019 +0200

        KVM: nVMX: Assume TLB entries of L1 and L2 are tagged differently if L0 use EPT

        Since commit 1313cc2bd8f6 ("kvm: mmu: Add guest_mode to kvm_mmu_page_role"),
        guest_mode was added to mmu-role and therefore if L0 use EPT, it will
        always run L1 and L2 with different EPTP. i.e. EPTP01!=EPTP02.

        Because TLB entries are tagged with EP4TA, KVM can assume TLB entries
        populated while running L2 are tagged differently than TLB entries
        populated while running L1.

Paolo
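For reference, guest_mode is a single bit in the page role carried by every shadow page, which is also what forces L1 and L2 onto disjoint shadow pages in the first place. An abridged sketch of the union from arch/x86/include/asm/kvm_host.h (neighboring fields and exact layout trimmed; consult the actual header for the full definition):

union kvm_mmu_page_role {
	u32 word;
	struct {
		unsigned level:4;
		unsigned direct:1;
		unsigned access:3;
		unsigned invalid:1;
		unsigned ad_disabled:1;
		unsigned guest_mode:1;	/* page maps memory for L2, i.e. nested EPT/NPT */
		/* ... remaining fields elided ... */
	};
};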
When running a nested VM, KVM write protects SPTEs in the EPT/NPT02
instead of using PML for dirty tracking. This avoids expensive
translation later, when emptying the Page Modification Log. In service
of removing the vCPU pointer from make_spte, factor the check for nested
PML out of the function.

Signed-off-by: Ben Gardon <bgardon@google.com>
---
 arch/x86/kvm/mmu/spte.c | 10 +++++++---
 arch/x86/kvm/mmu/spte.h |  3 ++-
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 04d26e913941..3cf08a534a16 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -92,7 +92,8 @@ static bool kvm_is_mmio_pfn(kvm_pfn_t pfn)
 bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	       struct kvm_memory_slot *slot, unsigned int pte_access,
 	       gfn_t gfn, kvm_pfn_t pfn, u64 old_spte, bool prefetch,
-	       bool can_unsync, bool host_writable, u64 *new_spte)
+	       bool can_unsync, bool host_writable, bool ad_need_write_protect,
+	       u64 *new_spte)
 {
 	int level = sp->role.level;
 	u64 spte = SPTE_MMU_PRESENT_MASK;
@@ -100,7 +101,7 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 
 	if (sp->role.ad_disabled)
 		spte |= SPTE_TDP_AD_DISABLED_MASK;
-	else if (kvm_vcpu_ad_need_write_protect(vcpu))
+	else if (ad_need_write_protect)
 		spte |= SPTE_TDP_AD_WRPROT_ONLY_MASK;
 
 	/*
@@ -195,8 +196,11 @@ bool vcpu_make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 		    struct kvm_memory_slot *slot, unsigned int pte_access,
 		    gfn_t gfn, kvm_pfn_t pfn, u64 old_spte, bool prefetch,
 		    bool can_unsync, bool host_writable, u64 *new_spte)
 {
+	bool ad_need_write_protect = kvm_vcpu_ad_need_write_protect(vcpu);
+
 	return make_spte(vcpu, sp, slot, pte_access, gfn, pfn, old_spte,
-			 prefetch, can_unsync, host_writable, new_spte);
+			 prefetch, can_unsync, host_writable,
+			 ad_need_write_protect, new_spte);
 }
 
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 14f18082d505..bcf58602f224 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -332,7 +332,8 @@ static inline u64 get_mmio_spte_generation(u64 spte)
 bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	       struct kvm_memory_slot *slot, unsigned int pte_access,
 	       gfn_t gfn, kvm_pfn_t pfn, u64 old_spte, bool prefetch,
-	       bool can_unsync, bool host_writable, u64 *new_spte);
+	       bool can_unsync, bool host_writable, bool ad_need_write_protect,
+	       u64 *new_spte);
 bool vcpu_make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 		    struct kvm_memory_slot *slot,
 		    unsigned int pte_access, gfn_t gfn, kvm_pfn_t pfn,
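For illustration, the caller-side split this version creates looks roughly as follows; this is a sketch, not code from the series, and the flag value in the second call is hypothetical (note that @vcpu is still a parameter of make_spte() at this stage):

/* With a vCPU in hand: the wrapper derives the PML decision internally. */
wrprot = vcpu_make_spte(vcpu, sp, slot, pte_access, gfn, pfn, old_spte,
			prefetch, can_unsync, host_writable, &new_spte);

/*
 * Without relying on the vCPU for the PML decision: the caller computes
 * the flag itself, e.g. false for a page that is not L2 EPT.
 */
wrprot = make_spte(vcpu, sp, slot, pte_access, gfn, pfn, old_spte,
		   prefetch, can_unsync, host_writable,
		   false /* ad_need_write_protect */, &new_spte);

Sean's role-based approach above supersedes this flag plumbing: rather than having every caller compute and pass the decision, the shadow page already carries everything needed to make it.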