Message ID: 20200925212302.3979661-19-bgardon@google.com (mailing list archive)
State: New, archived
Series: Introduce the TDP MMU
On 25/09/20 23:22, Ben Gardon wrote:
> +	for_each_tdp_pte_root(iter, root, start, end) {
> +		if (!is_shadow_present_pte(iter.old_spte) ||
> +		    is_last_spte(iter.old_spte, iter.level))
> +			continue;
> +

I'm starting to wonder if another iterator like
for_each_tdp_leaf_pte_root would be clearer, since this idiom repeats
itself quite often. The tdp_iter_next_leaf function would be easily
implemented as

	while (likely(iter->valid) &&
	       (!is_shadow_present_pte(iter.old_spte) ||
		is_last_spte(iter.old_spte, iter.level)))
		tdp_iter_next(iter);

Paolo
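[A minimal sketch of the helper Paolo describes above, assuming the struct tdp_iter fields (valid, old_spte, level) and tdp_iter_next() from the posted series; the function name and stopping condition are his proposal, not code from the series:]

	/*
	 * Sketch only: advance the iterator past entries the caller does
	 * not want to visit.  With this condition, iteration stops on
	 * present, non-leaf SPTEs -- the entries that might be collapsible.
	 */
	static void tdp_iter_next_leaf(struct tdp_iter *iter)
	{
		do {
			tdp_iter_next(iter);
		} while (likely(iter->valid) &&
			 (!is_shadow_present_pte(iter->old_spte) ||
			  is_last_spte(iter->old_spte, iter->level)));
	}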
On Fri, Sep 25, 2020 at 6:09 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 25/09/20 23:22, Ben Gardon wrote:
> > +	for_each_tdp_pte_root(iter, root, start, end) {
> > +		if (!is_shadow_present_pte(iter.old_spte) ||
> > +		    is_last_spte(iter.old_spte, iter.level))
> > +			continue;
> > +
>
> I'm starting to wonder if another iterator like
> for_each_tdp_leaf_pte_root would be clearer, since this idiom repeats
> itself quite often. The tdp_iter_next_leaf function would be easily
> implemented as
>
>	while (likely(iter->valid) &&
>	       (!is_shadow_present_pte(iter.old_spte) ||
>		is_last_spte(iter.old_spte, iter.level)))
>		tdp_iter_next(iter);

Do you see a substantial efficiency difference between adding a
tdp_iter_next_leaf and building on for_each_tdp_pte_using_root with
something like:

#define for_each_tdp_leaf_pte_using_root(_iter, _root, _start, _end) \
	for_each_tdp_pte_using_root(_iter, _root, _start, _end)	      \
		if (!is_shadow_present_pte(_iter.old_spte) ||	      \
		    !is_last_spte(_iter.old_spte, _iter.level))	      \
			continue;				      \
		else

I agree that putting those checks in a wrapper makes the code more concise.

>
> Paolo
>
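[The trailing "else" in Ben's macro is the standard trick for writing a filtering iteration macro: the caller's loop body becomes the else branch of the if, so entries failing the filter take the "continue" path, and the macro composes correctly with or without braces. A self-contained illustration of the idiom, with hypothetical names:]

	#include <stdio.h>

	/*
	 * Visit only even values in [0, n): the caller's statement binds
	 * to the trailing "else", so odd values hit the "continue".
	 */
	#define for_each_even(_i, _n)			\
		for (int _i = 0; _i < (_n); _i++)	\
			if ((_i) & 1)			\
				continue;		\
			else

	int main(void)
	{
		for_each_even(i, 10)
			printf("%d\n", i);	/* prints 0 2 4 6 8 */
		return 0;
	}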
On 07/10/20 18:30, Ben Gardon wrote:
>> I'm starting to wonder if another iterator like
>> for_each_tdp_leaf_pte_root would be clearer, since this idiom repeats
>> itself quite often. The tdp_iter_next_leaf function would be easily
>> implemented as
>>
>>	while (likely(iter->valid) &&
>>	       (!is_shadow_present_pte(iter.old_spte) ||
>>		is_last_spte(iter.old_spte, iter.level)))
>>		tdp_iter_next(iter);
> Do you see a substantial efficiency difference between adding a
> tdp_iter_next_leaf and building on for_each_tdp_pte_using_root with
> something like:
>
> #define for_each_tdp_leaf_pte_using_root(_iter, _root, _start, _end) \
>	for_each_tdp_pte_using_root(_iter, _root, _start, _end)	\
>		if (!is_shadow_present_pte(_iter.old_spte) ||	\
>		    !is_last_spte(_iter.old_spte, _iter.level))	\
>			continue;				\
>		else
>
> I agree that putting those checks in a wrapper makes the code more concise.

No, that would be just another way to write the same thing. That said,
making the iteration API more complicated also has disadvantages because
we get a Cartesian explosion of changes.

Regarding the naming, I'm leaning towards

	tdp_root_for_each_pte
	tdp_vcpu_for_each_pte

which is shorter than the version with "using" and still clarifies that
"root" and "vcpu" are the thing that the iteration works on.

Paolo
On Wed, Oct 7, 2020 at 10:21 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 07/10/20 18:30, Ben Gardon wrote:
> >> I'm starting to wonder if another iterator like
> >> for_each_tdp_leaf_pte_root would be clearer, since this idiom repeats
> >> itself quite often. The tdp_iter_next_leaf function would be easily
> >> implemented as
> >>
> >>	while (likely(iter->valid) &&
> >>	       (!is_shadow_present_pte(iter.old_spte) ||
> >>		is_last_spte(iter.old_spte, iter.level)))
> >>		tdp_iter_next(iter);
> > Do you see a substantial efficiency difference between adding a
> > tdp_iter_next_leaf and building on for_each_tdp_pte_using_root with
> > something like:
> >
> > #define for_each_tdp_leaf_pte_using_root(_iter, _root, _start, _end) \
> >	for_each_tdp_pte_using_root(_iter, _root, _start, _end)	\
> >		if (!is_shadow_present_pte(_iter.old_spte) ||	\
> >		    !is_last_spte(_iter.old_spte, _iter.level))	\
> >			continue;				\
> >		else
> >
> > I agree that putting those checks in a wrapper makes the code more concise.
>
> No, that would be just another way to write the same thing. That said,
> making the iteration API more complicated also has disadvantages because
> we get a Cartesian explosion of changes.

I wouldn't be too worried about that. The only things I ever found
worth making an iterator case for were:

	Every SPTE
	Every present SPTE
	Every present leaf SPTE

And really there aren't many cases that use the middle one.

> Regarding the naming, I'm leaning towards
>
>	tdp_root_for_each_pte
>	tdp_vcpu_for_each_pte
>
> which is shorter than the version with "using" and still clarifies that
> "root" and "vcpu" are the thing that the iteration works on.

That sounds good to me. I agree it's similarly clear.

>
> Paolo
>
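[Putting the two subthreads together, the three-member iterator family Ben lists, under the naming scheme just agreed on, might look like the following sketch; only tdp_root_for_each_pte is named in the discussion, and the "present" and "leaf" variant names are extrapolations:]

	/* Every SPTE in [start, end) under the root. */
	#define tdp_root_for_each_pte(_iter, _root, _start, _end)	\
		for_each_tdp_pte_using_root(_iter, _root, _start, _end)

	/* Only shadow-present SPTEs. */
	#define tdp_root_for_each_present_pte(_iter, _root, _start, _end) \
		tdp_root_for_each_pte(_iter, _root, _start, _end)	  \
			if (!is_shadow_present_pte(_iter.old_spte))	  \
				continue;				  \
			else

	/* Only shadow-present leaf SPTEs. */
	#define tdp_root_for_each_leaf_pte(_iter, _root, _start, _end)	\
		tdp_root_for_each_pte(_iter, _root, _start, _end)	\
			if (!is_shadow_present_pte(_iter.old_spte) ||	\
			    !is_last_spte(_iter.old_spte, _iter.level))	\
				continue;				\
			else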
On 07/10/20 19:28, Ben Gardon wrote:
>> No, that would be just another way to write the same thing. That said,
>> making the iteration API more complicated also has disadvantages because
>> we get a Cartesian explosion of changes.
> I wouldn't be too worried about that. The only things I ever found
> worth making an iterator case for were:
>	Every SPTE
>	Every present SPTE
>	Every present leaf SPTE

* (vcpu, root)
* (all levels, large only)

We only need a small subset of these, but the naming would be more
complex at least.

Paolo
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index b9074603f9df1..12892fc4f146d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6025,6 +6025,9 @@ void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
 	spin_lock(&kvm->mmu_lock);
 	slot_handle_leaf(kvm, (struct kvm_memory_slot *)memslot,
 			 kvm_mmu_zap_collapsible_spte, true);
+
+	if (kvm->arch.tdp_mmu_enabled)
+		kvm_tdp_mmu_zap_collapsible_sptes(kvm, memslot);
 	spin_unlock(&kvm->mmu_lock);
 }
 
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index e5cb7f0ec23e8..a2895119655ac 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1099,3 +1099,65 @@ bool kvm_tdp_mmu_slot_set_dirty(struct kvm *kvm, struct kvm_memory_slot *slot)
 
 	return spte_set;
 }
+/*
+ * Clear non-leaf entries (and free associated page tables) which could
+ * be replaced by large mappings, for GFNs within the slot.
+ */
+static void zap_collapsible_spte_range(struct kvm *kvm,
+				       struct kvm_mmu_page *root,
+				       gfn_t start, gfn_t end)
+{
+	struct tdp_iter iter;
+	kvm_pfn_t pfn;
+	bool spte_set = false;
+	int as_id = kvm_mmu_page_as_id(root);
+
+	for_each_tdp_pte_root(iter, root, start, end) {
+		if (!is_shadow_present_pte(iter.old_spte) ||
+		    is_last_spte(iter.old_spte, iter.level))
+			continue;
+
+		pfn = spte_to_pfn(iter.old_spte);
+		if (kvm_is_reserved_pfn(pfn) ||
+		    !PageTransCompoundMap(pfn_to_page(pfn)))
+			continue;
+
+		*iter.sptep = 0;
+		handle_changed_spte(kvm, as_id, iter.gfn, iter.old_spte,
+				    0, iter.level);
+		spte_set = true;
+
+		spte_set = !tdp_mmu_iter_cond_resched(kvm, &iter);
+	}
+
+	if (spte_set)
+		kvm_flush_remote_tlbs(kvm);
+}
+
+/*
+ * Clear non-leaf entries (and free associated page tables) which could
+ * be replaced by large mappings, for GFNs within the slot.
+ */
+void kvm_tdp_mmu_zap_collapsible_sptes(struct kvm *kvm,
+				       const struct kvm_memory_slot *slot)
+{
+	struct kvm_mmu_page *root;
+	int root_as_id;
+
+	for_each_tdp_mmu_root(kvm, root) {
+		root_as_id = kvm_mmu_page_as_id(root);
+		if (root_as_id != slot->as_id)
+			continue;
+
+		/*
+		 * Take a reference on the root so that it cannot be freed if
+		 * this thread releases the MMU lock and yields in this loop.
+		 */
+		get_tdp_mmu_root(kvm, root);
+
+		zap_collapsible_spte_range(kvm, root, slot->base_gfn,
+					   slot->base_gfn + slot->npages);
+
+		put_tdp_mmu_root(kvm, root);
+	}
+}
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index 2c9322ba3462b..10e70699c5372 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -38,4 +38,6 @@ void kvm_tdp_mmu_clear_dirty_pt_masked(struct kvm *kvm,
 				       gfn_t gfn, unsigned long mask,
 				       bool wrprot);
 bool kvm_tdp_mmu_slot_set_dirty(struct kvm *kvm, struct kvm_memory_slot *slot);
+void kvm_tdp_mmu_zap_collapsible_sptes(struct kvm *kvm,
+				       const struct kvm_memory_slot *slot);
 #endif /* __KVM_X86_MMU_TDP_MMU_H */
Dirty logging ultimately breaks down MMU mappings to 4k granularity. When
dirty logging is no longer needed, these granular mappings represent a
useless performance penalty. When dirty logging is disabled, search the
paging structure for mappings that could be re-constituted into a large
page mapping. Zap those mappings so that they can be faulted in again at
a higher mapping level.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538

Signed-off-by: Ben Gardon <bgardon@google.com>
---
 arch/x86/kvm/mmu/mmu.c     |  3 ++
 arch/x86/kvm/mmu/tdp_mmu.c | 62 ++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/mmu/tdp_mmu.h |  2 ++
 3 files changed, 67 insertions(+)
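[For context on when this path runs: userspace disables dirty logging by re-registering the memslot without KVM_MEM_LOG_DIRTY_PAGES, and kvm_mmu_zap_collapsible_sptes() is called from that memslot-update path. A minimal userspace sketch, where vm_fd, guest_mem, and MEM_SIZE are assumed to come from earlier VM setup:]

	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	/* Re-register slot 0 with the dirty-logging flag cleared. */
	static int disable_dirty_logging(int vm_fd, void *guest_mem)
	{
		struct kvm_userspace_memory_region region = {
			.slot = 0,
			.flags = 0,	/* was KVM_MEM_LOG_DIRTY_PAGES */
			.guest_phys_addr = 0,
			.memory_size = MEM_SIZE,
			.userspace_addr = (unsigned long)guest_mem,
		};

		/*
		 * With this patch applied, KVM zaps collapsible SPTEs in
		 * the slot so the guest can re-fault large mappings.
		 */
		return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
	}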