Message ID | 20220503221357.943536-1-vipinsh@google.com (mailing list archive) |
---|---|
State | New, archived |
Series | [RFC] KVM: x86/mmu: Remove KVM memory shrinker |
On Tue, May 03, 2022, Vipin Sharma wrote:
> KVM memory shrinker is only used in the shadow paging. Most of the L1
> guests are backed by TDP (Two Dimensional Paging) which do not use the
> shrinker, only L2 guests are backed by shadow paging.

Nit, the TDP MMU doesn't use the shrinker, but the legacy MMU, which also
supports TDP, does.

> KVM memory shrinker can cause guests performance to degrade if any other
> process (VM or non-VM) in the same or different cgroup in kernel causes
> memory shrinker to run.

Ah, but digging into this aspect reveals that per-memcg shrinkers were added
in 2015 by commit cb731d6c62bb ("vmscan: per memory cgroup slab shrinkers").
I haven't dug too deep, but presumably it wouldn't be all that difficult to
support SHRINKER_MEMCG_AWARE in KVM (rough, untested sketch of the idea at
the bottom of this mail).

That's quite tempting to support as it would/could guard against an
unintentional DoS of sorts against L1 from L2, e.g. if L1 doesn't cap the
number of TDP SPTEs it creates (*cough* TDP MMU *cough*) and ends up
creating a large number of SPTEs for L2.  IIUC, that sort of scenario was
the primary motivation for commit 2de4085cccea ("KVM: x86/MMU: Recursively
zap nested TDP SPs when zapping last/only parent").

> The KVM memory shrinker was introduced in 2008,
> commit 3ee16c814511 ("KVM: MMU: allow the vm to shrink the kvm mmu
> shadow caches"), each invocation of shrinker only released 1 shadow page
> in 1 VM. This behavior was not effective until the batch zapping commit
> was added in 2020, commit ebdb292dac79 ("KVM: x86/mmu: Batch zap MMU
> pages when shrinking the slab"), which zaps multiple pages but still in
> 1 VM for each shrink invocation. Overall, this feature existed for many
> years without providing meaningful benefit.
>
> Signed-off-by: Vipin Sharma <vipinsh@google.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 99 ++----------------------------------------
>  1 file changed, 3 insertions(+), 96 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 4e8d546431eb..80618c847ce2 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -178,7 +178,6 @@ struct kvm_shadow_walk_iterator {
>
>  static struct kmem_cache *pte_list_desc_cache;
>  struct kmem_cache *mmu_page_header_cache;
> -static struct percpu_counter kvm_total_used_mmu_pages;
>
>  static void mmu_spte_set(u64 *sptep, u64 spte);
>
> @@ -1658,16 +1657,9 @@ static int is_empty_shadow_page(u64 *spt)
>  }
>  #endif
>
> -/*
> - * This value is the sum of all of the kvm instances's
> - * kvm->arch.n_used_mmu_pages values. We need a global,
> - * aggregate version in order to make the slab shrinker
> - * faster
> - */
> -static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, long nr)
> +static inline void kvm_used_mmu_pages(struct kvm *kvm, long nr)

This rename is unnecessary, and I much prefer the existing name;
kvm_used_mmu_pages() sounds like an accessor that returns the number of
used pages.

>  {
>  	kvm->arch.n_used_mmu_pages += nr;
> -	percpu_counter_add(&kvm_total_used_mmu_pages, nr);
>  }
>
>  static void kvm_mmu_free_page(struct kvm_mmu_page *sp)

...

> -static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm)
> -{
> -	return unlikely(!list_empty_careful(&kvm->arch.zapped_obsolete_pages));
> -}

zapped_obsolete_pages can be dropped; its sole purpose was to expose those
pages to the shrinker.
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c59fea4bdb6e..15b71de6f6fe 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1065,7 +1065,6 @@ struct kvm_arch {
 	u8 mmu_valid_gen;
 	struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
 	struct list_head active_mmu_pages;
-	struct list_head zapped_obsolete_pages;
 	struct list_head lpage_disallowed_mmu_pages;
 	struct kvm_page_track_notifier_node mmu_sp_tracker;
 	struct kvm_page_track_notifier_head track_notifier_head;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 51443a8e779a..299c9297418e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5657,6 +5657,7 @@ static void kvm_zap_obsolete_pages(struct kvm *kvm)
 {
 	struct kvm_mmu_page *sp, *node;
 	int nr_zapped, batch = 0;
+	LIST_HEAD(zapped_pages);
 
 restart:
 	list_for_each_entry_safe_reverse(sp, node,
@@ -5688,8 +5689,7 @@ static void kvm_zap_obsolete_pages(struct kvm *kvm)
 			goto restart;
 		}
 
-		if (__kvm_mmu_prepare_zap_page(kvm, sp,
-				&kvm->arch.zapped_obsolete_pages, &nr_zapped)) {
+		if (__kvm_mmu_prepare_zap_page(kvm, sp, &zapped_pages, &nr_zapped)) {
 			batch += nr_zapped;
 			goto restart;
 		}
@@ -5704,7 +5704,7 @@ static void kvm_zap_obsolete_pages(struct kvm *kvm)
 	 * kvm_mmu_load()), and the reload in the caller ensure no vCPUs are
 	 * running with an obsolete MMU.
 	 */
-	kvm_mmu_commit_zap_page(kvm, &kvm->arch.zapped_obsolete_pages);
+	kvm_mmu_commit_zap_page(kvm, &zapped_pages);
 }
 
 /*
@@ -5780,7 +5780,6 @@ int kvm_mmu_init_vm(struct kvm *kvm)
 	int r;
 
 	INIT_LIST_HEAD(&kvm->arch.active_mmu_pages);
-	INIT_LIST_HEAD(&kvm->arch.zapped_obsolete_pages);
 	INIT_LIST_HEAD(&kvm->arch.lpage_disallowed_mmu_pages);
 	spin_lock_init(&kvm->arch.mmu_unsync_pages_lock);
 
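Back to the SHRINKER_MEMCG_AWARE idea above, something along these lines is
what I have in mind.  Completely untested sketch, not a proper
implementation: matching a VM to the memcg under pressure via kvm->mm and
get_mem_cgroup_from_mm() is an assumption on my part, mmu_shrink_count()
would also need to become memcg aware instead of reporting the global percpu
counter, and a real version would likely need per-memcg tracking so that
reclaim actually invokes the shrinker for a given memcg (the per-memcg
shrinker bitmap has to get set somewhere).

static bool kvm_mmu_vm_in_memcg(struct kvm *kvm, struct mem_cgroup *memcg)
{
	struct mem_cgroup *vm_memcg;
	bool match;

	/* No memcg in the shrink_control means global reclaim, scan everything. */
	if (!memcg)
		return true;

	vm_memcg = get_mem_cgroup_from_mm(kvm->mm);
	match = mem_cgroup_is_descendant(vm_memcg, memcg);
	mem_cgroup_put(vm_memcg);
	return match;
}

static unsigned long
mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
{
	unsigned long freed = 0;
	struct kvm *kvm;

	mutex_lock(&kvm_lock);
	list_for_each_entry(kvm, &vm_list, vm_list) {
		int idx;

		/* Only zap pages in VMs charged to the memcg under pressure. */
		if (!kvm_mmu_vm_in_memcg(kvm, sc->memcg))
			continue;

		idx = srcu_read_lock(&kvm->srcu);
		write_lock(&kvm->mmu_lock);
		freed += kvm_mmu_zap_oldest_mmu_pages(kvm, sc->nr_to_scan - freed);
		write_unlock(&kvm->mmu_lock);
		srcu_read_unlock(&kvm->srcu, idx);

		if (freed >= sc->nr_to_scan)
			break;
	}
	mutex_unlock(&kvm_lock);

	return freed;
}

static struct shrinker mmu_shrinker = {
	.count_objects = mmu_shrink_count,
	.scan_objects = mmu_shrink_scan,
	.seeks = DEFAULT_SEEKS * 10,
	.flags = SHRINKER_MEMCG_AWARE,
};

Walking vm_list on every invocation obviously doesn't scale, which is yet
another argument for doing the accounting per memcg from the start, but the
above hopefully illustrates the targeted-reclaim behavior I'm after.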
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4e8d546431eb..80618c847ce2 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -178,7 +178,6 @@ struct kvm_shadow_walk_iterator {
 
 static struct kmem_cache *pte_list_desc_cache;
 struct kmem_cache *mmu_page_header_cache;
-static struct percpu_counter kvm_total_used_mmu_pages;
 
 static void mmu_spte_set(u64 *sptep, u64 spte);
 
@@ -1658,16 +1657,9 @@ static int is_empty_shadow_page(u64 *spt)
 }
 #endif
 
-/*
- * This value is the sum of all of the kvm instances's
- * kvm->arch.n_used_mmu_pages values. We need a global,
- * aggregate version in order to make the slab shrinker
- * faster
- */
-static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, long nr)
+static inline void kvm_used_mmu_pages(struct kvm *kvm, long nr)
 {
 	kvm->arch.n_used_mmu_pages += nr;
-	percpu_counter_add(&kvm_total_used_mmu_pages, nr);
 }
 
 static void kvm_mmu_free_page(struct kvm_mmu_page *sp)
@@ -1725,7 +1717,7 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, int direct
 	 */
 	sp->mmu_valid_gen = vcpu->kvm->arch.mmu_valid_gen;
 	list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
-	kvm_mod_used_mmu_pages(vcpu->kvm, +1);
+	kvm_used_mmu_pages(vcpu->kvm, 1);
 
 	return sp;
 }
@@ -2341,7 +2333,7 @@ static bool __kvm_mmu_prepare_zap_page(struct kvm *kvm,
 			list_add(&sp->link, invalid_list);
 		else
 			list_move(&sp->link, invalid_list);
-		kvm_mod_used_mmu_pages(kvm, -1);
+		kvm_used_mmu_pages(kvm, -1);
 	} else {
 		/*
 		 * Remove the active root from the active page list, the root
@@ -5801,11 +5793,6 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm)
 	kvm_tdp_mmu_zap_invalidated_roots(kvm);
 }
 
-static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm)
-{
-	return unlikely(!list_empty_careful(&kvm->arch.zapped_obsolete_pages));
-}
-
 static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm,
 			struct kvm_memory_slot *slot,
 			struct kvm_page_track_notifier_node *node)
@@ -6159,77 +6146,6 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen)
 	}
 }
 
-static unsigned long
-mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
-{
-	struct kvm *kvm;
-	int nr_to_scan = sc->nr_to_scan;
-	unsigned long freed = 0;
-
-	mutex_lock(&kvm_lock);
-
-	list_for_each_entry(kvm, &vm_list, vm_list) {
-		int idx;
-		LIST_HEAD(invalid_list);
-
-		/*
-		 * Never scan more than sc->nr_to_scan VM instances.
-		 * Will not hit this condition practically since we do not try
-		 * to shrink more than one VM and it is very unlikely to see
-		 * !n_used_mmu_pages so many times.
-		 */
-		if (!nr_to_scan--)
-			break;
-		/*
-		 * n_used_mmu_pages is accessed without holding kvm->mmu_lock
-		 * here. We may skip a VM instance errorneosly, but we do not
-		 * want to shrink a VM that only started to populate its MMU
-		 * anyway.
-		 */
-		if (!kvm->arch.n_used_mmu_pages &&
-		    !kvm_has_zapped_obsolete_pages(kvm))
-			continue;
-
-		idx = srcu_read_lock(&kvm->srcu);
-		write_lock(&kvm->mmu_lock);
-
-		if (kvm_has_zapped_obsolete_pages(kvm)) {
-			kvm_mmu_commit_zap_page(kvm,
-			      &kvm->arch.zapped_obsolete_pages);
-			goto unlock;
-		}
-
-		freed = kvm_mmu_zap_oldest_mmu_pages(kvm, sc->nr_to_scan);
-
-unlock:
-		write_unlock(&kvm->mmu_lock);
-		srcu_read_unlock(&kvm->srcu, idx);
-
-		/*
-		 * unfair on small ones
-		 * per-vm shrinkers cry out
-		 * sadness comes quickly
-		 */
-		list_move_tail(&kvm->vm_list, &vm_list);
-		break;
-	}
-
-	mutex_unlock(&kvm_lock);
-	return freed;
-}
-
-static unsigned long
-mmu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
-{
-	return percpu_counter_read_positive(&kvm_total_used_mmu_pages);
-}
-
-static struct shrinker mmu_shrinker = {
-	.count_objects = mmu_shrink_count,
-	.scan_objects = mmu_shrink_scan,
-	.seeks = DEFAULT_SEEKS * 10,
-};
-
 static void mmu_destroy_caches(void)
 {
 	kmem_cache_destroy(pte_list_desc_cache);
@@ -6325,13 +6241,6 @@ int kvm_mmu_vendor_module_init(void)
 	if (!mmu_page_header_cache)
 		goto out;
 
-	if (percpu_counter_init(&kvm_total_used_mmu_pages, 0, GFP_KERNEL))
-		goto out;
-
-	ret = register_shrinker(&mmu_shrinker);
-	if (ret)
-		goto out;
-
 	return 0;
 
 out:
@@ -6350,8 +6259,6 @@ void kvm_mmu_destroy(struct kvm_vcpu *vcpu)
 void kvm_mmu_vendor_module_exit(void)
 {
 	mmu_destroy_caches();
-	percpu_counter_destroy(&kvm_total_used_mmu_pages);
-	unregister_shrinker(&mmu_shrinker);
 }
 
 /*
KVM memory shrinker is only used in the shadow paging. Most of the L1
guests are backed by TDP (Two Dimensional Paging) which do not use the
shrinker, only L2 guests are backed by shadow paging.

KVM memory shrinker can cause guests performance to degrade if any other
process (VM or non-VM) in the same or different cgroup in kernel causes
memory shrinker to run.

The KVM memory shrinker was introduced in 2008,
commit 3ee16c814511 ("KVM: MMU: allow the vm to shrink the kvm mmu
shadow caches"), each invocation of shrinker only released 1 shadow page
in 1 VM. This behavior was not effective until the batch zapping commit
was added in 2020, commit ebdb292dac79 ("KVM: x86/mmu: Batch zap MMU
pages when shrinking the slab"), which zaps multiple pages but still in
1 VM for each shrink invocation. Overall, this feature existed for many
years without providing meaningful benefit.

Signed-off-by: Vipin Sharma <vipinsh@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 99 ++----------------------------------------
 1 file changed, 3 insertions(+), 96 deletions(-)