From patchwork Thu Dec 22 02:34:49 2022
X-Patchwork-Submitter: Vipin Sharma
X-Patchwork-Id: 13079335
Date: Wed, 21 Dec 2022 18:34:49 -0800
In-Reply-To: <20221222023457.1764-1-vipinsh@google.com>
References: <20221222023457.1764-1-vipinsh@google.com>
Message-ID: <20221222023457.1764-2-vipinsh@google.com>
Subject: [Patch v3 1/9] KVM: x86/mmu: Repurpose KVM MMU shrinker to purge shadow page caches
From: Vipin Sharma
To: seanjc@google.com, pbonzini@redhat.com, bgardon@google.com, dmatlack@google.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Vipin Sharma

mmu_shrink_scan() is very disruptive to VMs.
It picks the first VM in the vm_list, zaps the oldest page which is most likely an upper level SPTEs and most like to be reused. Prior to TDP MMU, this is even more disruptive in nested VMs case, considering L1 SPTEs will be the oldest even though most of the entries are for L2 SPTEs. As discussed in https://lore.kernel.org/lkml/Y45dldZnI6OIf+a5@google.com/ shrinker logic has not be very useful in actually keeping VMs performant and reducing memory usage. Change mmu_shrink_scan() to free pages from the vCPU's shadow page cache. Freeing pages from cache doesn't cause vCPU exits, therefore, a VM's performance should not be affected. This also allows to change cache capacities without worrying too much about high memory usage in cache. Tested this change by running dirty_log_perf_test while dropping cache via "echo 2 > /proc/sys/vm/drop_caches" at 1 second interval continuously. There were WARN_ON(!mc->nobjs) messages printed in kernel logs from kvm_mmu_memory_cache_alloc(), which is expected. Suggested-by: Sean Christopherson Signed-off-by: Vipin Sharma --- arch/x86/include/asm/kvm_host.h | 5 + arch/x86/kvm/mmu/mmu.c | 163 +++++++++++++++++++------------- arch/x86/kvm/mmu/mmu_internal.h | 2 + arch/x86/kvm/mmu/tdp_mmu.c | 3 +- include/linux/kvm_host.h | 1 + virt/kvm/kvm_main.c | 11 ++- 6 files changed, 114 insertions(+), 71 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index aa4eb8cfcd7e..89cc809e4a00 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -786,6 +786,11 @@ struct kvm_vcpu_arch { struct kvm_mmu_memory_cache mmu_shadowed_info_cache; struct kvm_mmu_memory_cache mmu_page_header_cache; + /* + * Protects change in size of mmu_shadow_page_cache cache. + */ + spinlock_t mmu_shadow_page_cache_lock; + /* * QEMU userspace and the guest each have their own FPU state. * In vcpu_run, we switch between the user and guest FPU contexts. diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 254bc46234e0..157417e1cb6e 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -164,7 +164,10 @@ struct kvm_shadow_walk_iterator { static struct kmem_cache *pte_list_desc_cache; struct kmem_cache *mmu_page_header_cache; -static struct percpu_counter kvm_total_used_mmu_pages; +/* + * Total number of unused pages in MMU shadow page cache. 
+ */ +static struct percpu_counter kvm_total_unused_mmu_pages; static void mmu_spte_set(u64 *sptep, u64 spte); @@ -655,6 +658,22 @@ static void walk_shadow_page_lockless_end(struct kvm_vcpu *vcpu) } } +static int mmu_topup_sp_memory_cache(struct kvm_mmu_memory_cache *cache, + spinlock_t *cache_lock) +{ + int orig_nobjs; + int r; + + spin_lock(cache_lock); + orig_nobjs = cache->nobjs; + r = kvm_mmu_topup_memory_cache(cache, PT64_ROOT_MAX_LEVEL); + if (orig_nobjs != cache->nobjs) + percpu_counter_add(&kvm_total_unused_mmu_pages, + (cache->nobjs - orig_nobjs)); + spin_unlock(cache_lock); + return r; +} + static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect) { int r; @@ -664,8 +683,8 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect) 1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM); if (r) return r; - r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_shadow_page_cache, - PT64_ROOT_MAX_LEVEL); + r = mmu_topup_sp_memory_cache(&vcpu->arch.mmu_shadow_page_cache, + &vcpu->arch.mmu_shadow_page_cache_lock); if (r) return r; if (maybe_indirect) { @@ -678,10 +697,25 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect) PT64_ROOT_MAX_LEVEL); } +static void mmu_free_sp_memory_cache(struct kvm_mmu_memory_cache *cache, + spinlock_t *cache_lock) +{ + int orig_nobjs; + + spin_lock(cache_lock); + orig_nobjs = cache->nobjs; + kvm_mmu_free_memory_cache(cache); + if (orig_nobjs) + percpu_counter_sub(&kvm_total_unused_mmu_pages, orig_nobjs); + + spin_unlock(cache_lock); +} + static void mmu_free_memory_caches(struct kvm_vcpu *vcpu) { kvm_mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache); - kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadow_page_cache); + mmu_free_sp_memory_cache(&vcpu->arch.mmu_shadow_page_cache, + &vcpu->arch.mmu_shadow_page_cache_lock); kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadowed_info_cache); kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache); } @@ -1693,27 +1727,15 @@ static int is_empty_shadow_page(u64 *spt) } #endif -/* - * This value is the sum of all of the kvm instances's - * kvm->arch.n_used_mmu_pages values. We need a global, - * aggregate version in order to make the slab shrinker - * faster - */ -static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, long nr) -{ - kvm->arch.n_used_mmu_pages += nr; - percpu_counter_add(&kvm_total_used_mmu_pages, nr); -} - static void kvm_account_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp) { - kvm_mod_used_mmu_pages(kvm, +1); + kvm->arch.n_used_mmu_pages++; kvm_account_pgtable_pages((void *)sp->spt, +1); } static void kvm_unaccount_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp) { - kvm_mod_used_mmu_pages(kvm, -1); + kvm->arch.n_used_mmu_pages--; kvm_account_pgtable_pages((void *)sp->spt, -1); } @@ -2150,8 +2172,31 @@ struct shadow_page_caches { struct kvm_mmu_memory_cache *page_header_cache; struct kvm_mmu_memory_cache *shadow_page_cache; struct kvm_mmu_memory_cache *shadowed_info_cache; + /* + * Protects change in size of shadow_page_cache cache. 
+ */ + spinlock_t *shadow_page_cache_lock; }; +void *kvm_mmu_sp_memory_cache_alloc(struct kvm_mmu_memory_cache *shadow_page_cache, + spinlock_t *cache_lock) +{ + int orig_nobjs; + void *page; + + if (!cache_lock) { + spin_lock(cache_lock); + orig_nobjs = shadow_page_cache->nobjs; + } + page = kvm_mmu_memory_cache_alloc(shadow_page_cache); + if (!cache_lock) { + if (orig_nobjs) + percpu_counter_dec(&kvm_total_unused_mmu_pages); + spin_unlock(cache_lock); + } + return page; +} + static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm, struct shadow_page_caches *caches, gfn_t gfn, @@ -2161,7 +2206,8 @@ static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm, struct kvm_mmu_page *sp; sp = kvm_mmu_memory_cache_alloc(caches->page_header_cache); - sp->spt = kvm_mmu_memory_cache_alloc(caches->shadow_page_cache); + sp->spt = kvm_mmu_sp_memory_cache_alloc(caches->shadow_page_cache, + caches->shadow_page_cache_lock); if (!role.direct) sp->shadowed_translation = kvm_mmu_memory_cache_alloc(caches->shadowed_info_cache); @@ -2218,6 +2264,7 @@ static struct kvm_mmu_page *kvm_mmu_get_shadow_page(struct kvm_vcpu *vcpu, .page_header_cache = &vcpu->arch.mmu_page_header_cache, .shadow_page_cache = &vcpu->arch.mmu_shadow_page_cache, .shadowed_info_cache = &vcpu->arch.mmu_shadowed_info_cache, + .shadow_page_cache_lock = &vcpu->arch.mmu_shadow_page_cache_lock }; return __kvm_mmu_get_shadow_page(vcpu->kvm, vcpu, &caches, gfn, role); @@ -5916,6 +5963,7 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu) vcpu->arch.mmu_page_header_cache.gfp_zero = __GFP_ZERO; vcpu->arch.mmu_shadow_page_cache.gfp_zero = __GFP_ZERO; + spin_lock_init(&vcpu->arch.mmu_shadow_page_cache_lock); vcpu->arch.mmu = &vcpu->arch.root_mmu; vcpu->arch.walk_mmu = &vcpu->arch.root_mmu; @@ -6051,11 +6099,6 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm) kvm_tdp_mmu_zap_invalidated_roots(kvm); } -static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm) -{ - return unlikely(!list_empty_careful(&kvm->arch.zapped_obsolete_pages)); -} - static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm, struct kvm_memory_slot *slot, struct kvm_page_track_notifier_node *node) @@ -6277,6 +6320,7 @@ static struct kvm_mmu_page *shadow_mmu_get_sp_for_split(struct kvm *kvm, u64 *hu /* Direct SPs do not require a shadowed_info_cache. */ caches.page_header_cache = &kvm->arch.split_page_header_cache; caches.shadow_page_cache = &kvm->arch.split_shadow_page_cache; + caches.shadow_page_cache_lock = NULL; /* Safe to pass NULL for vCPU since requesting a direct SP. */ return __kvm_mmu_get_shadow_page(kvm, NULL, &caches, gfn, role); @@ -6646,66 +6690,49 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen) static unsigned long mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc) { - struct kvm *kvm; - int nr_to_scan = sc->nr_to_scan; + struct kvm_mmu_memory_cache *cache; + struct kvm *kvm, *first_kvm = NULL; unsigned long freed = 0; + /* spinlock for memory cache */ + spinlock_t *cache_lock; + struct kvm_vcpu *vcpu; + unsigned long i; mutex_lock(&kvm_lock); list_for_each_entry(kvm, &vm_list, vm_list) { - int idx; - LIST_HEAD(invalid_list); - - /* - * Never scan more than sc->nr_to_scan VM instances. - * Will not hit this condition practically since we do not try - * to shrink more than one VM and it is very unlikely to see - * !n_used_mmu_pages so many times. - */ - if (!nr_to_scan--) + if (first_kvm == kvm) break; - /* - * n_used_mmu_pages is accessed without holding kvm->mmu_lock - * here. 
We may skip a VM instance errorneosly, but we do not - * want to shrink a VM that only started to populate its MMU - * anyway. - */ - if (!kvm->arch.n_used_mmu_pages && - !kvm_has_zapped_obsolete_pages(kvm)) - continue; + if (!first_kvm) + first_kvm = kvm; + list_move_tail(&kvm->vm_list, &vm_list); - idx = srcu_read_lock(&kvm->srcu); - write_lock(&kvm->mmu_lock); + kvm_for_each_vcpu(i, vcpu, kvm) { + cache = &vcpu->arch.mmu_shadow_page_cache; + cache_lock = &vcpu->arch.mmu_shadow_page_cache_lock; + if (READ_ONCE(cache->nobjs)) { + spin_lock(cache_lock); + freed += kvm_mmu_empty_memory_cache(cache); + spin_unlock(cache_lock); + } - if (kvm_has_zapped_obsolete_pages(kvm)) { - kvm_mmu_commit_zap_page(kvm, - &kvm->arch.zapped_obsolete_pages); - goto unlock; } - freed = kvm_mmu_zap_oldest_mmu_pages(kvm, sc->nr_to_scan); - -unlock: - write_unlock(&kvm->mmu_lock); - srcu_read_unlock(&kvm->srcu, idx); - - /* - * unfair on small ones - * per-vm shrinkers cry out - * sadness comes quickly - */ - list_move_tail(&kvm->vm_list, &vm_list); - break; + if (freed >= sc->nr_to_scan) + break; } + if (freed) + percpu_counter_sub(&kvm_total_unused_mmu_pages, freed); mutex_unlock(&kvm_lock); + percpu_counter_sync(&kvm_total_unused_mmu_pages); return freed; } static unsigned long mmu_shrink_count(struct shrinker *shrink, struct shrink_control *sc) { - return percpu_counter_read_positive(&kvm_total_used_mmu_pages); + return percpu_counter_sum_positive(&kvm_total_unused_mmu_pages); } static struct shrinker mmu_shrinker = { @@ -6820,7 +6847,7 @@ int kvm_mmu_vendor_module_init(void) if (!mmu_page_header_cache) goto out; - if (percpu_counter_init(&kvm_total_used_mmu_pages, 0, GFP_KERNEL)) + if (percpu_counter_init(&kvm_total_unused_mmu_pages, 0, GFP_KERNEL)) goto out; ret = register_shrinker(&mmu_shrinker, "x86-mmu"); @@ -6830,7 +6857,7 @@ int kvm_mmu_vendor_module_init(void) return 0; out_shrinker: - percpu_counter_destroy(&kvm_total_used_mmu_pages); + percpu_counter_destroy(&kvm_total_unused_mmu_pages); out: mmu_destroy_caches(); return ret; @@ -6847,7 +6874,7 @@ void kvm_mmu_destroy(struct kvm_vcpu *vcpu) void kvm_mmu_vendor_module_exit(void) { mmu_destroy_caches(); - percpu_counter_destroy(&kvm_total_used_mmu_pages); + percpu_counter_destroy(&kvm_total_unused_mmu_pages); unregister_shrinker(&mmu_shrinker); } diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h index ac00bfbf32f6..c2a342028b6a 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -325,4 +325,6 @@ void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc); void track_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp); void untrack_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp); +void *kvm_mmu_sp_memory_cache_alloc(struct kvm_mmu_memory_cache *shadow_page_cache, + spinlock_t *cache_lock); #endif /* __KVM_X86_MMU_INTERNAL_H */ diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 764f7c87286f..4974fa96deff 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -264,7 +264,8 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu) struct kvm_mmu_page *sp; sp = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache); - sp->spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache); + sp->spt = kvm_mmu_sp_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache, + &vcpu->arch.mmu_shadow_page_cache_lock); return sp; } diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 
01aad8b74162..efd9b38ea9a2 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1362,6 +1362,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm); int kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min); int __kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int capacity, int min); int kvm_mmu_memory_cache_nr_free_objects(struct kvm_mmu_memory_cache *mc); +int kvm_mmu_empty_memory_cache(struct kvm_mmu_memory_cache *mc); void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc); void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc); #endif diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 13e88297f999..f2d762878b97 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -438,8 +438,10 @@ int kvm_mmu_memory_cache_nr_free_objects(struct kvm_mmu_memory_cache *mc) return mc->nobjs; } -void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc) +int kvm_mmu_empty_memory_cache(struct kvm_mmu_memory_cache *mc) { + int freed = mc->nobjs; + while (mc->nobjs) { if (mc->kmem_cache) kmem_cache_free(mc->kmem_cache, mc->objects[--mc->nobjs]); @@ -447,8 +449,13 @@ void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc) free_page((unsigned long)mc->objects[--mc->nobjs]); } - kvfree(mc->objects); + return freed; +} +void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc) +{ + kvm_mmu_empty_memory_cache(mc); + kvfree(mc->objects); mc->objects = NULL; mc->capacity = 0; } From patchwork Thu Dec 22 02:34:50 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vipin Sharma X-Patchwork-Id: 13079336 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2CD1CC10F1B for ; Thu, 22 Dec 2022 02:35:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234961AbiLVCfK (ORCPT ); Wed, 21 Dec 2022 21:35:10 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46298 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234957AbiLVCfE (ORCPT ); Wed, 21 Dec 2022 21:35:04 -0500 Received: from mail-pf1-x44a.google.com (mail-pf1-x44a.google.com [IPv6:2607:f8b0:4864:20::44a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CBF2220BF2 for ; Wed, 21 Dec 2022 18:35:03 -0800 (PST) Received: by mail-pf1-x44a.google.com with SMTP id p17-20020a056a0026d100b005769067d113so310195pfw.3 for ; Wed, 21 Dec 2022 18:35:03 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=lvgCZzGuCZEbUtwnqfq9owkASUBDttKfTawhWhWdLK8=; b=Lj/SLu+ZPDBJ2sGiuCeMbA8Eib+ys2Vq5L2kcNArJJ27StQ16ma1UJb42Mlbw+MX2y +vrXP50O/g9dt+8oLnnQPMY/pVvzZccMhn37yhkkMKRf4OlgdSQXrJ+ZWJgL43dFFYR+ tdwnZfiM6qXzjCF0pB4Ud8WosLspziQJwWE3TyKPGONvwBr+KawoDzl0ACL4bYB7f0ka +9u/KzDX7Y/WSElPNKVuB24p/2FtsBTEOpi1zH9/xrJSOu0EbYYUmkBsdvah4qZZp3Yj 04qEKwX3b7Ue+pkFRhxsJ6kCgSSMeS3TCIzw3hYmLO78q2Llvjee+NN6cvHnX+T+1k7v btLg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=lvgCZzGuCZEbUtwnqfq9owkASUBDttKfTawhWhWdLK8=; 
b=4Ao4ikI1AuJ+T8Em+kUzlaPoJbG1cwC8EEDaBOMPtJMZE+T2ZDDokVU4AvWRyKyBgD DsGWrH61/NHXNzsRgTuL2cKhkmtSzNZmFsI5FtPgBNOMl6LlL3iC2yrjBEvFiPVbk3+A ACMo5GcfTxIBQONcx2CSNVC9hGYgU1cxLHo7FzVVv5QSu4e30gNJWv+hiON0G0fSNmEn h/CJO6mWAmpGxk1fhiOkM72D+DuDZyiU1XhckbocSCfReLTPFaXBy4FuyViDyoVNS+Sw yXuDXQzgnbHINEH+pm2AdaBZ2FjxeWmfnVBFAFovlX93SaQzcsJi0EeEd7Vh/KT0logO iONA== X-Gm-Message-State: AFqh2kqsVh+d/OvWEhch0FzCqvLcLd0/UCyTgnsGLm5xANUSpMP6VS1H SZ159bdTJlsnEYHCUJ0TaZNq5k0edcdn X-Google-Smtp-Source: AMrXdXvpyhuXUSV2ZgsVbb3u/vup+sTPI608UoPHhlRzsUzMCiUiErRLse5yLEL+556xYOprZ+VJNxunZl78 X-Received: from vipin.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:479f]) (user=vipinsh job=sendgmr) by 2002:a17:90b:d8a:b0:223:f336:1519 with SMTP id bg10-20020a17090b0d8a00b00223f3361519mr359433pjb.198.1671676503393; Wed, 21 Dec 2022 18:35:03 -0800 (PST) Date: Wed, 21 Dec 2022 18:34:50 -0800 In-Reply-To: <20221222023457.1764-1-vipinsh@google.com> Mime-Version: 1.0 References: <20221222023457.1764-1-vipinsh@google.com> X-Mailer: git-send-email 2.39.0.314.g84b9a713c41-goog Message-ID: <20221222023457.1764-3-vipinsh@google.com> Subject: [Patch v3 2/9] KVM: x86/mmu: Remove zapped_obsolete_pages from struct kvm_arch{} From: Vipin Sharma To: seanjc@google.com, pbonzini@redhat.com, bgardon@google.com, dmatlack@google.com Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Vipin Sharma Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org zapped_obsolete_pages list was used in struct kvm_arch{} to provide pages for KVM MMU shrinker. This is not needed now as KVM MMU shrinker has been repurposed to free shadow page caches and not zapped_obsolete_pages. Remove zapped_obsolete_pages from struct kvm_arch{} and use local list in kvm_zap_obsolete_pages(). Signed-off-by: Vipin Sharma Reviewed-by: David Matlack --- arch/x86/include/asm/kvm_host.h | 1 - arch/x86/kvm/mmu/mmu.c | 8 ++++---- 2 files changed, 4 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 89cc809e4a00..f89f02e18080 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1215,7 +1215,6 @@ struct kvm_arch { u8 mmu_valid_gen; struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES]; struct list_head active_mmu_pages; - struct list_head zapped_obsolete_pages; /* * A list of kvm_mmu_page structs that, if zapped, could possibly be * replaced by an NX huge page. A shadow page is on this list if its diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 157417e1cb6e..3364760a1695 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -5987,6 +5987,7 @@ static void kvm_zap_obsolete_pages(struct kvm *kvm) { struct kvm_mmu_page *sp, *node; int nr_zapped, batch = 0; + LIST_HEAD(zapped_pages); bool unstable; restart: @@ -6019,8 +6020,8 @@ static void kvm_zap_obsolete_pages(struct kvm *kvm) goto restart; } - unstable = __kvm_mmu_prepare_zap_page(kvm, sp, - &kvm->arch.zapped_obsolete_pages, &nr_zapped); + unstable = __kvm_mmu_prepare_zap_page(kvm, sp, &zapped_pages, + &nr_zapped); batch += nr_zapped; if (unstable) @@ -6036,7 +6037,7 @@ static void kvm_zap_obsolete_pages(struct kvm *kvm) * kvm_mmu_load()), and the reload in the caller ensure no vCPUs are * running with an obsolete MMU. 
*/ - kvm_mmu_commit_zap_page(kvm, &kvm->arch.zapped_obsolete_pages); + kvm_mmu_commit_zap_page(kvm, &zapped_pages); } /* @@ -6112,7 +6113,6 @@ int kvm_mmu_init_vm(struct kvm *kvm) int r; INIT_LIST_HEAD(&kvm->arch.active_mmu_pages); - INIT_LIST_HEAD(&kvm->arch.zapped_obsolete_pages); INIT_LIST_HEAD(&kvm->arch.possible_nx_huge_pages); spin_lock_init(&kvm->arch.mmu_unsync_pages_lock); From patchwork Thu Dec 22 02:34:51 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vipin Sharma X-Patchwork-Id: 13079337 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AE078C4332F for ; Thu, 22 Dec 2022 02:35:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234984AbiLVCfM (ORCPT ); Wed, 21 Dec 2022 21:35:12 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46328 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234967AbiLVCfG (ORCPT ); Wed, 21 Dec 2022 21:35:06 -0500 Received: from mail-pg1-x549.google.com (mail-pg1-x549.google.com [IPv6:2607:f8b0:4864:20::549]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 980A8248D3 for ; Wed, 21 Dec 2022 18:35:05 -0800 (PST) Received: by mail-pg1-x549.google.com with SMTP id l63-20020a639142000000b0047942953738so407484pge.15 for ; Wed, 21 Dec 2022 18:35:05 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=+chyPC2pzC5mKivdZn2tXKKp7i5hdJsSiGnXjGpd+qo=; b=nHCHJ906sv9cxX8DnY/jDErP4w5InUk/yQ735EsKsvPVAmpXxi0y6/6oFa0Ure2H/v hzYxnyKvYvrKkdb7uMSqsnWwhBbO0EhadDcgtpLc22KCypMQx7uCwCa782RsWJA/djXh VToUto+d6Ob3eNyF0TG05M2icTNN9rOnPtRuHqrAeoZZawgG/HO/2quETm+y7BdXSAPK 3XuARlSbS0ji1U90zg9ATDqpNhQy4Ra4Dar/1GHhR8l812S8MSprSWgnsTFSq6tDseSY XQXRLkfGe7gPLZJHtk0190ocrakiM/5KtJaA73UsDTTKrqmvqSCLZ8UryxcP8WaMHZlQ Lq+A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=+chyPC2pzC5mKivdZn2tXKKp7i5hdJsSiGnXjGpd+qo=; b=5gEqjG2uBsnxT2oA5m3amtEWySM47UlxXvT5huc6KYaWQj2AESNxbBZkaTC3fpSE6P UHiRKz7F91HYKQEAMlhGsLAPzt5kKCVUBHQQi26D8opKthZqoz+gHE1EMYAzjpdJU1BA /TI+RtobYjbGSlkpAc3wIKlaueL4fcmWWxKXmkyxVh25+cHb9esNhVA7ckrkbuznRrML kWyOdfjAsXW4PYLFR9JJ4EZqK0pJAm7MFLaoX8Mndlxfv+cSIrOMLT+Zhvzh+lOYQR2V vDT2jKc45NCbGsgkUV5rjeLGb1zLCjLhLeHRRu4BSjqyIFSSEO16cQoSe9aWSQTltCV5 pFyQ== X-Gm-Message-State: AFqh2koahT7ccfu9dqV38VvtAQs9qO7qVnK0fN/28uUEw9DtO4pRKrlU mbe4T0bjBtrVlXXs8Dd6LImDgxatqCXS X-Google-Smtp-Source: AMrXdXsothIyZZv/0F4TXnv6H6IlW73v2BSr7V+Bexsj6gh/v1vAiynOCYEIOrsxLEPfYDTbw3BlGUXc6gE7 X-Received: from vipin.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:479f]) (user=vipinsh job=sendgmr) by 2002:a63:8c48:0:b0:479:46cd:e2dc with SMTP id q8-20020a638c48000000b0047946cde2dcmr163712pgn.547.1671676505112; Wed, 21 Dec 2022 18:35:05 -0800 (PST) Date: Wed, 21 Dec 2022 18:34:51 -0800 In-Reply-To: <20221222023457.1764-1-vipinsh@google.com> Mime-Version: 1.0 References: <20221222023457.1764-1-vipinsh@google.com> X-Mailer: git-send-email 2.39.0.314.g84b9a713c41-goog Message-ID: <20221222023457.1764-4-vipinsh@google.com> Subject: [Patch 
v3 3/9] KVM: x86/mmu: Shrink split_shadow_page_cache via KVM MMU shrinker From: Vipin Sharma To: seanjc@google.com, pbonzini@redhat.com, bgardon@google.com, dmatlack@google.com Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Vipin Sharma Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org split_shadow_page_cache is not used after dirty log is disabled. It is a good candidate to free memory in case of mmu_shrink_scan kicks in. Account for split_shadow_page_cache via kvm_total_unused_mmu_pages and use it in mmu_shrink_scan. Signed-off-by: Vipin Sharma --- arch/x86/include/asm/kvm_host.h | 5 +++ arch/x86/kvm/mmu/mmu.c | 63 +++++++++++++++++++-------------- 2 files changed, 42 insertions(+), 26 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index f89f02e18080..293994fabae3 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1413,6 +1413,11 @@ struct kvm_arch { struct kvm_mmu_memory_cache split_shadow_page_cache; struct kvm_mmu_memory_cache split_page_header_cache; + /* + * Protects change in size of split_shadow_page_cache cache. + */ + spinlock_t split_shadow_page_cache_lock; + /* * Memory cache used to allocate pte_list_desc structs while splitting * huge pages. In the worst case, to split one huge page, 512 diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 3364760a1695..6f6a10d7a871 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -659,14 +659,15 @@ static void walk_shadow_page_lockless_end(struct kvm_vcpu *vcpu) } static int mmu_topup_sp_memory_cache(struct kvm_mmu_memory_cache *cache, - spinlock_t *cache_lock) + spinlock_t *cache_lock, + int min) { int orig_nobjs; int r; spin_lock(cache_lock); orig_nobjs = cache->nobjs; - r = kvm_mmu_topup_memory_cache(cache, PT64_ROOT_MAX_LEVEL); + r = kvm_mmu_topup_memory_cache(cache, min); if (orig_nobjs != cache->nobjs) percpu_counter_add(&kvm_total_unused_mmu_pages, (cache->nobjs - orig_nobjs)); @@ -684,7 +685,8 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect) if (r) return r; r = mmu_topup_sp_memory_cache(&vcpu->arch.mmu_shadow_page_cache, - &vcpu->arch.mmu_shadow_page_cache_lock); + &vcpu->arch.mmu_shadow_page_cache_lock, + PT64_ROOT_MAX_LEVEL); if (r) return r; if (maybe_indirect) { @@ -2184,16 +2186,12 @@ void *kvm_mmu_sp_memory_cache_alloc(struct kvm_mmu_memory_cache *shadow_page_cac int orig_nobjs; void *page; - if (!cache_lock) { - spin_lock(cache_lock); - orig_nobjs = shadow_page_cache->nobjs; - } + spin_lock(cache_lock); + orig_nobjs = shadow_page_cache->nobjs; page = kvm_mmu_memory_cache_alloc(shadow_page_cache); - if (!cache_lock) { - if (orig_nobjs) - percpu_counter_dec(&kvm_total_unused_mmu_pages); - spin_unlock(cache_lock); - } + if (orig_nobjs) + percpu_counter_dec(&kvm_total_unused_mmu_pages); + spin_unlock(cache_lock); return page; } @@ -6130,6 +6128,7 @@ int kvm_mmu_init_vm(struct kvm *kvm) kvm->arch.split_page_header_cache.gfp_zero = __GFP_ZERO; kvm->arch.split_shadow_page_cache.gfp_zero = __GFP_ZERO; + spin_lock_init(&kvm->arch.split_shadow_page_cache_lock); kvm->arch.split_desc_cache.kmem_cache = pte_list_desc_cache; kvm->arch.split_desc_cache.gfp_zero = __GFP_ZERO; @@ -6141,7 +6140,8 @@ static void mmu_free_vm_memory_caches(struct kvm *kvm) { kvm_mmu_free_memory_cache(&kvm->arch.split_desc_cache); kvm_mmu_free_memory_cache(&kvm->arch.split_page_header_cache); - kvm_mmu_free_memory_cache(&kvm->arch.split_shadow_page_cache); + 
mmu_free_sp_memory_cache(&kvm->arch.split_shadow_page_cache, + &kvm->arch.split_shadow_page_cache_lock); } void kvm_mmu_uninit_vm(struct kvm *kvm) @@ -6295,7 +6295,9 @@ static int topup_split_caches(struct kvm *kvm) if (r) return r; - return kvm_mmu_topup_memory_cache(&kvm->arch.split_shadow_page_cache, 1); + return mmu_topup_sp_memory_cache(&kvm->arch.split_shadow_page_cache, + &kvm->arch.split_shadow_page_cache_lock, + 1); } static struct kvm_mmu_page *shadow_mmu_get_sp_for_split(struct kvm *kvm, u64 *huge_sptep) @@ -6320,7 +6322,7 @@ static struct kvm_mmu_page *shadow_mmu_get_sp_for_split(struct kvm *kvm, u64 *hu /* Direct SPs do not require a shadowed_info_cache. */ caches.page_header_cache = &kvm->arch.split_page_header_cache; caches.shadow_page_cache = &kvm->arch.split_shadow_page_cache; - caches.shadow_page_cache_lock = NULL; + caches.shadow_page_cache_lock = &kvm->arch.split_shadow_page_cache_lock; /* Safe to pass NULL for vCPU since requesting a direct SP. */ return __kvm_mmu_get_shadow_page(kvm, NULL, &caches, gfn, role); @@ -6687,14 +6689,23 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen) } } +static unsigned long mmu_shrink_cache(struct kvm_mmu_memory_cache *cache, + spinlock_t *cache_lock) +{ + unsigned long freed = 0; + + spin_lock(cache_lock); + if (cache->nobjs) + freed = kvm_mmu_empty_memory_cache(cache); + spin_unlock(cache_lock); + return freed; +} + static unsigned long mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc) { - struct kvm_mmu_memory_cache *cache; struct kvm *kvm, *first_kvm = NULL; unsigned long freed = 0; - /* spinlock for memory cache */ - spinlock_t *cache_lock; struct kvm_vcpu *vcpu; unsigned long i; @@ -6707,15 +6718,15 @@ mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc) first_kvm = kvm; list_move_tail(&kvm->vm_list, &vm_list); - kvm_for_each_vcpu(i, vcpu, kvm) { - cache = &vcpu->arch.mmu_shadow_page_cache; - cache_lock = &vcpu->arch.mmu_shadow_page_cache_lock; - if (READ_ONCE(cache->nobjs)) { - spin_lock(cache_lock); - freed += kvm_mmu_empty_memory_cache(cache); - spin_unlock(cache_lock); - } + freed += mmu_shrink_cache(&kvm->arch.split_shadow_page_cache, + &kvm->arch.split_shadow_page_cache_lock); + if (freed >= sc->nr_to_scan) + break; + + kvm_for_each_vcpu(i, vcpu, kvm) { + freed += mmu_shrink_cache(&vcpu->arch.mmu_shadow_page_cache, + &vcpu->arch.mmu_shadow_page_cache_lock); } if (freed >= sc->nr_to_scan) From patchwork Thu Dec 22 02:34:52 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vipin Sharma X-Patchwork-Id: 13079338 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2E8FFC4332F for ; Thu, 22 Dec 2022 02:35:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234978AbiLVCfQ (ORCPT ); Wed, 21 Dec 2022 21:35:16 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46368 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234972AbiLVCfI (ORCPT ); Wed, 21 Dec 2022 21:35:08 -0500 Received: from mail-pj1-x1049.google.com (mail-pj1-x1049.google.com [IPv6:2607:f8b0:4864:20::1049]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4C655248D3 for ; Wed, 21 Dec 2022 18:35:07 -0800 (PST) Received: by mail-pj1-x1049.google.com with SMTP id 
Date: Wed, 21 Dec 2022 18:34:52 -0800
In-Reply-To: <20221222023457.1764-1-vipinsh@google.com>
References: <20221222023457.1764-1-vipinsh@google.com>
Message-ID: <20221222023457.1764-5-vipinsh@google.com>
Subject: [Patch v3 4/9] KVM: Add module param to make page tables NUMA aware
From: Vipin Sharma
To: seanjc@google.com, pbonzini@redhat.com, bgardon@google.com, dmatlack@google.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Vipin Sharma

Add a numa_aware_pagetable module param to make page tables NUMA aware.
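
As an illustration (not part of the patch), the intended use of the new
helper looks roughly like the sketch below; example_alloc_spt() is a made-up
caller, while kvm_mmu_get_free_page() is the helper added in
virt/kvm/kvm_main.c further down. When numa_aware_pagetable is enabled (the
default) and the kernel is built with CONFIG_NUMA, the page table page is
allocated on the requested node; otherwise the helper behaves like a plain
__get_free_page(). Since the parameter is registered with mode 0644, it
should also be tunable at runtime, presumably via
/sys/module/kvm/parameters/numa_aware_pagetable.

/*
 * Illustrative sketch only; example_alloc_spt() is not part of this
 * patch. The real callers are wired up in later patches of the series.
 */
static void *example_alloc_spt(int nid)
{
	/*
	 * Preferentially allocate the page on NUMA node "nid"; fall back
	 * to a node-agnostic __get_free_page() if NUMA awareness is
	 * disabled or the node-local allocation fails.
	 */
	return kvm_mmu_get_free_page(nid, GFP_KERNEL_ACCOUNT | __GFP_ZERO);
}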
Signed-off-by: Vipin Sharma --- include/linux/kvm_host.h | 2 ++ virt/kvm/kvm_main.c | 22 ++++++++++++++++++++++ 2 files changed, 24 insertions(+) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index efd9b38ea9a2..d48064503b88 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1358,6 +1358,8 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu, bool usermode_vcpu_not_eligible); void kvm_flush_remote_tlbs(struct kvm *kvm); +void *kvm_mmu_get_free_page(int nid, gfp_t gfp); + #ifdef KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE int kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min); int __kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int capacity, int min); diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index f2d762878b97..d96c8146e9ba 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -93,6 +93,13 @@ unsigned int halt_poll_ns_shrink; module_param(halt_poll_ns_shrink, uint, 0644); EXPORT_SYMBOL_GPL(halt_poll_ns_shrink); +/* + * If possible, allocate page table's pages on the same node the underlying + * physical page is pointing to. + */ +static bool __read_mostly numa_aware_pagetable = true; +module_param_named(numa_aware_pagetable, numa_aware_pagetable, bool, 0644); + /* * Ordering of locks: * @@ -384,6 +391,21 @@ static void kvm_flush_shadow_all(struct kvm *kvm) kvm_arch_guest_memory_reclaimed(kvm); } +void *kvm_mmu_get_free_page(int nid, gfp_t gfp) +{ + #ifdef CONFIG_NUMA + struct page *spt_page; + + if (numa_aware_pagetable) { + spt_page = alloc_pages_node(nid, gfp, 0); + if (spt_page) + return page_address(spt_page); + } + #endif // CONFIG_NUMA + + return (void *)__get_free_page(gfp); +} + #ifdef KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE static inline void *mmu_memory_cache_alloc_obj(struct kvm_mmu_memory_cache *mc, gfp_t gfp_flags) From patchwork Thu Dec 22 02:34:53 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vipin Sharma X-Patchwork-Id: 13079339 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BBECFC4167B for ; Thu, 22 Dec 2022 02:35:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234991AbiLVCfT (ORCPT ); Wed, 21 Dec 2022 21:35:19 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46562 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234935AbiLVCfO (ORCPT ); Wed, 21 Dec 2022 21:35:14 -0500 Received: from mail-pf1-x449.google.com (mail-pf1-x449.google.com [IPv6:2607:f8b0:4864:20::449]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E446725E85 for ; Wed, 21 Dec 2022 18:35:08 -0800 (PST) Received: by mail-pf1-x449.google.com with SMTP id u3-20020a056a00124300b0056d4ab0c7cbso302264pfi.7 for ; Wed, 21 Dec 2022 18:35:08 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=Uy5Q2JyltsFU76SmhFR5SIPjgCJm3U7CWH+6Z0DXsQs=; b=Sxl6bgj9EJ9iuihYZp83JeZx6IvPl8UZ1m5+6Cm43HO2NjgEx1Ek/IP3Bjg0bBhZW0 8ZsoHARIkAabw+eZ9yX3vNLtLVrs4JXFtNBZ9ix0k6roaGbDS77OSl0l9+SaY1BxFrgj WyVFHhz44Nzpglk0v7PrLxxRSm/7PGwcY/LPhH5DfSLZIQd9fxWJKRGLBAUCpaB23ERE haqSqyKBtrgEucB247ShrNEzUOq8CKDN0XA8iJ7Koj/0KmQ9/BF5rYF7mYrXvQ555RLf 
Date: Wed, 21 Dec 2022 18:34:53 -0800
In-Reply-To: <20221222023457.1764-1-vipinsh@google.com>
References: <20221222023457.1764-1-vipinsh@google.com>
Message-ID: <20221222023457.1764-6-vipinsh@google.com>
Subject: [Patch v3 5/9] KVM: x86/mmu: Allocate TDP page table's page on correct NUMA node on split
From: Vipin Sharma
To: seanjc@google.com, pbonzini@redhat.com, bgardon@google.com, dmatlack@google.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Vipin Sharma

When dirty log is enabled, huge pages are split. The page table's pages for
the split are allocated based on the current thread's NUMA node or mempolicy,
which causes inefficient page table accesses if the underlying page is on a
different NUMA node.

Allocate the page table's pages on the same NUMA node as the underlying huge
page when dirty log is enabled and huge pages are split. The performance gain
during the pre-copy phase of live migration of a VM with 416 vCPUs and 11 TiB
of memory on an 8-node host was in the range of 130% to 150%.
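
The flow added here is roughly the sketch below (illustrative only;
example_alloc_split_spt() is a made-up name and the GFP_NOWAIT /
GFP_KERNEL_ACCOUNT retry logic of tdp_mmu_alloc_sp_for_split() is omitted).
The NUMA node is taken from the pfn that the huge SPTE already maps, and the
page holding the new lower-level page table is allocated on that node:

/*
 * Illustrative sketch, not part of the patch: simplified version of the
 * allocation path used when splitting a huge page for dirty logging.
 */
static u64 *example_alloc_split_spt(struct tdp_iter *iter)
{
	/* Physical frame currently mapped by the huge SPTE being split. */
	kvm_pfn_t pfn = spte_to_pfn(iter->old_spte);

	/*
	 * kvm_pfn_to_page_table_nid(), added by this patch, returns the
	 * node of the backing page, or the current CPU's nearest memory
	 * node if the pfn is not backed by a refcounted page.
	 */
	int nid = kvm_pfn_to_page_table_nid(pfn);

	/* Place the new page table page on the same node as the data. */
	return kvm_mmu_get_free_page(nid, GFP_KERNEL_ACCOUNT | __GFP_ZERO);
}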
Suggested-by: David Matlack Signed-off-by: Vipin Sharma --- arch/x86/kvm/mmu/tdp_mmu.c | 12 ++++++++---- include/linux/kvm_host.h | 18 ++++++++++++++++++ 2 files changed, 26 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 4974fa96deff..376b8dceb3f9 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -1403,7 +1403,7 @@ bool kvm_tdp_mmu_wrprot_slot(struct kvm *kvm, return spte_set; } -static struct kvm_mmu_page *__tdp_mmu_alloc_sp_for_split(gfp_t gfp) +static struct kvm_mmu_page *__tdp_mmu_alloc_sp_for_split(int nid, gfp_t gfp) { struct kvm_mmu_page *sp; @@ -1413,7 +1413,8 @@ static struct kvm_mmu_page *__tdp_mmu_alloc_sp_for_split(gfp_t gfp) if (!sp) return NULL; - sp->spt = (void *)__get_free_page(gfp); + sp->spt = kvm_mmu_get_free_page(nid, gfp); + if (!sp->spt) { kmem_cache_free(mmu_page_header_cache, sp); return NULL; @@ -1427,6 +1428,9 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_split(struct kvm *kvm, bool shared) { struct kvm_mmu_page *sp; + int nid; + + nid = kvm_pfn_to_page_table_nid(spte_to_pfn(iter->old_spte)); /* * Since we are allocating while under the MMU lock we have to be @@ -1437,7 +1441,7 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_split(struct kvm *kvm, * If this allocation fails we drop the lock and retry with reclaim * allowed. */ - sp = __tdp_mmu_alloc_sp_for_split(GFP_NOWAIT | __GFP_ACCOUNT); + sp = __tdp_mmu_alloc_sp_for_split(nid, GFP_NOWAIT | __GFP_ACCOUNT); if (sp) return sp; @@ -1449,7 +1453,7 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_split(struct kvm *kvm, write_unlock(&kvm->mmu_lock); iter->yielded = true; - sp = __tdp_mmu_alloc_sp_for_split(GFP_KERNEL_ACCOUNT); + sp = __tdp_mmu_alloc_sp_for_split(nid, GFP_KERNEL_ACCOUNT); if (shared) read_lock(&kvm->mmu_lock); diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index d48064503b88..a262e15ebd19 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1583,6 +1583,24 @@ void kvm_arch_sync_events(struct kvm *kvm); int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu); struct page *kvm_pfn_to_refcounted_page(kvm_pfn_t pfn); + +/* + * Tells the appropriate NUMA node location of the page table's page based on + * pfn it will point to. + * + * Return the nid of the page if pfn is valid and backed by a refcounted page, + * otherwise, return the nearest memory node for the current CPU. 
+ */
+static inline int kvm_pfn_to_page_table_nid(kvm_pfn_t pfn)
+{
+	struct page *page = kvm_pfn_to_refcounted_page(pfn);
+
+	if (page)
+		return page_to_nid(page);
+	else
+		return numa_mem_id();
+}
+
 bool kvm_is_zone_device_page(struct page *page);
 
 struct kvm_irq_ack_notifier {

From patchwork Thu Dec 22 02:34:54 2022
X-Patchwork-Submitter: Vipin Sharma
X-Patchwork-Id: 13079340
Date: Wed, 21 Dec 2022 18:34:54 -0800
In-Reply-To: <20221222023457.1764-1-vipinsh@google.com>
References: <20221222023457.1764-1-vipinsh@google.com>
Message-ID: <20221222023457.1764-7-vipinsh@google.com>
Subject: [Patch v3 6/9] KVM: Provide NUMA node support to kvm_mmu_memory_cache{}
From: Vipin Sharma
To:
seanjc@google.com, pbonzini@redhat.com, bgardon@google.com, dmatlack@google.com Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Vipin Sharma Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Add 'node' variable in kvm_mmu_memory_cache{} to denote which NUMA node this cache should allocate memory from. Default initialize to NUMA_NO_NODE in all architectures. Signed-off-by: Vipin Sharma --- arch/arm64/kvm/arm.c | 2 +- arch/arm64/kvm/mmu.c | 4 +++- arch/mips/kvm/mips.c | 2 ++ arch/riscv/kvm/mmu.c | 2 +- arch/riscv/kvm/vcpu.c | 2 +- arch/x86/kvm/mmu/mmu.c | 22 ++++++++++++---------- include/linux/kvm_host.h | 6 ++++++ include/linux/kvm_types.h | 2 ++ 8 files changed, 28 insertions(+), 14 deletions(-) diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index 9c5573bc4614..52a41f4532e2 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -340,7 +340,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu) vcpu->arch.target = -1; bitmap_zero(vcpu->arch.features, KVM_VCPU_MAX_FEATURES); - vcpu->arch.mmu_page_cache.gfp_zero = __GFP_ZERO; + INIT_KVM_MMU_MEMORY_CACHE(&vcpu->arch.mmu_page_cache, NULL, NUMA_NO_NODE); /* * Default value for the FP state, will be overloaded at load diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c index 31d7fa4c7c14..bd07155e17fa 100644 --- a/arch/arm64/kvm/mmu.c +++ b/arch/arm64/kvm/mmu.c @@ -894,12 +894,14 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa, { phys_addr_t addr; int ret = 0; - struct kvm_mmu_memory_cache cache = { .gfp_zero = __GFP_ZERO }; + struct kvm_mmu_memory_cache cache; struct kvm_pgtable *pgt = kvm->arch.mmu.pgt; enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_DEVICE | KVM_PGTABLE_PROT_R | (writable ? KVM_PGTABLE_PROT_W : 0); + INIT_KVM_MMU_MEMORY_CACHE(&cache, NULL, NUMA_NO_NODE); + if (is_protected_kvm_enabled()) return -EPERM; diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c index a25e0b73ee70..b017c29a9340 100644 --- a/arch/mips/kvm/mips.c +++ b/arch/mips/kvm/mips.c @@ -304,6 +304,8 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu) HRTIMER_MODE_REL); vcpu->arch.comparecount_timer.function = kvm_mips_comparecount_wakeup; + vcpu->arch.mmu_page_cache.node = NUMA_NO_NODE; + /* * Allocate space for host mode exception handlers that handle * guest mode exits diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c index 34b57e0be2ef..119de4520cc6 100644 --- a/arch/riscv/kvm/mmu.c +++ b/arch/riscv/kvm/mmu.c @@ -353,9 +353,9 @@ int kvm_riscv_gstage_ioremap(struct kvm *kvm, gpa_t gpa, phys_addr_t addr, end; struct kvm_mmu_memory_cache pcache = { .gfp_custom = (in_atomic) ? 
GFP_ATOMIC | __GFP_ACCOUNT : 0, - .gfp_zero = __GFP_ZERO, }; + INIT_KVM_MMU_MEMORY_CACHE(&pcache, NULL, NUMA_NO_NODE); end = (gpa + size + PAGE_SIZE - 1) & PAGE_MASK; pfn = __phys_to_pfn(hpa); diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c index 7c08567097f0..189b14feb365 100644 --- a/arch/riscv/kvm/vcpu.c +++ b/arch/riscv/kvm/vcpu.c @@ -161,7 +161,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu) /* Mark this VCPU never ran */ vcpu->arch.ran_atleast_once = false; - vcpu->arch.mmu_page_cache.gfp_zero = __GFP_ZERO; + INIT_KVM_MMU_MEMORY_CACHE(&vcpu->arch.mmu_page_cache, NULL, NUMA_NO_NODE); bitmap_zero(vcpu->arch.isa, RISCV_ISA_EXT_MAX); /* Setup ISA features available to VCPU */ diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 6f6a10d7a871..23a3b82b2384 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -5954,13 +5954,14 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu) { int ret; - vcpu->arch.mmu_pte_list_desc_cache.kmem_cache = pte_list_desc_cache; - vcpu->arch.mmu_pte_list_desc_cache.gfp_zero = __GFP_ZERO; + INIT_KVM_MMU_MEMORY_CACHE(&vcpu->arch.mmu_pte_list_desc_cache, + pte_list_desc_cache, NUMA_NO_NODE); - vcpu->arch.mmu_page_header_cache.kmem_cache = mmu_page_header_cache; - vcpu->arch.mmu_page_header_cache.gfp_zero = __GFP_ZERO; + INIT_KVM_MMU_MEMORY_CACHE(&vcpu->arch.mmu_page_header_cache, + mmu_page_header_cache, NUMA_NO_NODE); - vcpu->arch.mmu_shadow_page_cache.gfp_zero = __GFP_ZERO; + INIT_KVM_MMU_MEMORY_CACHE(&vcpu->arch.mmu_shadow_page_cache, + NULL, NUMA_NO_NODE); spin_lock_init(&vcpu->arch.mmu_shadow_page_cache_lock); vcpu->arch.mmu = &vcpu->arch.root_mmu; @@ -6124,14 +6125,15 @@ int kvm_mmu_init_vm(struct kvm *kvm) node->track_flush_slot = kvm_mmu_invalidate_zap_pages_in_memslot; kvm_page_track_register_notifier(kvm, node); - kvm->arch.split_page_header_cache.kmem_cache = mmu_page_header_cache; - kvm->arch.split_page_header_cache.gfp_zero = __GFP_ZERO; + INIT_KVM_MMU_MEMORY_CACHE(&kvm->arch.split_page_header_cache, + mmu_page_header_cache, NUMA_NO_NODE); - kvm->arch.split_shadow_page_cache.gfp_zero = __GFP_ZERO; + INIT_KVM_MMU_MEMORY_CACHE(&kvm->arch.split_shadow_page_cache, + NULL, NUMA_NO_NODE); spin_lock_init(&kvm->arch.split_shadow_page_cache_lock); - kvm->arch.split_desc_cache.kmem_cache = pte_list_desc_cache; - kvm->arch.split_desc_cache.gfp_zero = __GFP_ZERO; + INIT_KVM_MMU_MEMORY_CACHE(&kvm->arch.split_desc_cache, + pte_list_desc_cache, NUMA_NO_NODE); return 0; } diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index a262e15ebd19..719687a37ef7 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -2302,4 +2302,10 @@ static inline void kvm_account_pgtable_pages(void *virt, int nr) /* Max number of entries allowed for each kvm dirty ring */ #define KVM_DIRTY_RING_MAX_ENTRIES 65536 +#define INIT_KVM_MMU_MEMORY_CACHE(_cache, _kmem_cache, _node) ({ \ + (_cache)->kmem_cache = _kmem_cache; \ + (_cache)->gfp_zero = __GFP_ZERO; \ + (_cache)->node = _node; \ +}) + #endif diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h index 76de36e56cdf..9c70ce95e51f 100644 --- a/include/linux/kvm_types.h +++ b/include/linux/kvm_types.h @@ -97,6 +97,8 @@ struct kvm_mmu_memory_cache { struct kmem_cache *kmem_cache; int capacity; void **objects; + /* Node on which memory should be allocated by default */ + int node; }; #endif From patchwork Thu Dec 22 02:34:55 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vipin Sharma 
X-Patchwork-Id: 13079341
Date: Wed, 21 Dec 2022 18:34:55 -0800
In-Reply-To: <20221222023457.1764-1-vipinsh@google.com>
References: <20221222023457.1764-1-vipinsh@google.com>
Message-ID: <20221222023457.1764-8-vipinsh@google.com>
Subject: [Patch v3 7/9] KVM: x86/mmu: Allocate page table's pages on NUMA node of the underlying pages
From: Vipin Sharma
To: seanjc@google.com, pbonzini@redhat.com, bgardon@google.com, dmatlack@google.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Vipin Sharma

Page table pages of a VM are currently allocated based on the current task's
NUMA node or its mempolicy.
This can cause suboptimal remote accesses by the vCPU if it is accessing physical pages local to its NUMA node, but the page table pages mapping those physical pages were created by some other vCPU that was running on a different NUMA node or had a different mempolicy. Allocate page table pages on the same NUMA node where the underlying physical page exists. Page tables at levels 5, 4, and 3 might not end up on the same NUMA node as they can span multiple NUMA nodes.
Signed-off-by: Vipin Sharma --- arch/x86/include/asm/kvm_host.h | 2 +- arch/x86/kvm/mmu/mmu.c | 63 ++++++++++++++++++++++----------- arch/x86/kvm/mmu/paging_tmpl.h | 4 +-- arch/x86/kvm/mmu/tdp_mmu.c | 11 +++--- virt/kvm/kvm_main.c | 2 +- 5 files changed, 53 insertions(+), 29 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 293994fabae3..b1f319ad6f89 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -782,7 +782,7 @@ struct kvm_vcpu_arch { struct kvm_mmu *walk_mmu; struct kvm_mmu_memory_cache mmu_pte_list_desc_cache; - struct kvm_mmu_memory_cache mmu_shadow_page_cache; + struct kvm_mmu_memory_cache mmu_shadow_page_cache[MAX_NUMNODES]; struct kvm_mmu_memory_cache mmu_shadowed_info_cache; struct kvm_mmu_memory_cache mmu_page_header_cache;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 23a3b82b2384..511c6ef265ee 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -677,24 +677,29 @@ static int mmu_topup_sp_memory_cache(struct kvm_mmu_memory_cache *cache, static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect) { - int r; + int r, nid; /* 1 rmap, 1 parent PTE per level, and the prefetched rmaps. */ r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache, 1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM); if (r) return r; - r = mmu_topup_sp_memory_cache(&vcpu->arch.mmu_shadow_page_cache, - &vcpu->arch.mmu_shadow_page_cache_lock, - PT64_ROOT_MAX_LEVEL); - if (r) - return r; + + for_each_online_node(nid) { + r = mmu_topup_sp_memory_cache(&vcpu->arch.mmu_shadow_page_cache[nid], + &vcpu->arch.mmu_shadow_page_cache_lock, + PT64_ROOT_MAX_LEVEL); + if (r) + return r; + } + if (maybe_indirect) { r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_shadowed_info_cache, PT64_ROOT_MAX_LEVEL); if (r) return r; } + return kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_page_header_cache, PT64_ROOT_MAX_LEVEL); } @@ -715,9 +720,14 @@ static void mmu_free_sp_memory_cache(struct kvm_mmu_memory_cache *cache, static void mmu_free_memory_caches(struct kvm_vcpu *vcpu) { + int nid; + kvm_mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache); - mmu_free_sp_memory_cache(&vcpu->arch.mmu_shadow_page_cache, - &vcpu->arch.mmu_shadow_page_cache_lock); + + for_each_node(nid) + mmu_free_sp_memory_cache(&vcpu->arch.mmu_shadow_page_cache[nid], + &vcpu->arch.mmu_shadow_page_cache_lock); + kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadowed_info_cache); kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache); } @@ -2256,11 +2266,12 @@ static struct kvm_mmu_page *__kvm_mmu_get_shadow_page(struct kvm *kvm, static struct kvm_mmu_page *kvm_mmu_get_shadow_page(struct kvm_vcpu *vcpu, gfn_t gfn, - union kvm_mmu_page_role role) + union kvm_mmu_page_role role, + int nid) { struct shadow_page_caches caches = { .page_header_cache = &vcpu->arch.mmu_page_header_cache, - .shadow_page_cache = &vcpu->arch.mmu_shadow_page_cache, + .shadow_page_cache = &vcpu->arch.mmu_shadow_page_cache[nid], .shadowed_info_cache = &vcpu->arch.mmu_shadowed_info_cache,
.shadow_page_cache_lock = &vcpu->arch.mmu_shadow_page_cache_lock }; @@ -2316,15 +2327,19 @@ static union kvm_mmu_page_role kvm_mmu_child_role(u64 *sptep, bool direct, static struct kvm_mmu_page *kvm_mmu_get_child_sp(struct kvm_vcpu *vcpu, u64 *sptep, gfn_t gfn, - bool direct, unsigned int access) + bool direct, unsigned int access, + kvm_pfn_t pfn) { union kvm_mmu_page_role role; + int nid; if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep)) return ERR_PTR(-EEXIST); role = kvm_mmu_child_role(sptep, direct, access); - return kvm_mmu_get_shadow_page(vcpu, gfn, role); + nid = kvm_pfn_to_page_table_nid(pfn); + + return kvm_mmu_get_shadow_page(vcpu, gfn, role, nid); } static void shadow_walk_init_using_root(struct kvm_shadow_walk_iterator *iterator, @@ -3208,7 +3223,8 @@ static int direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) if (it.level == fault->goal_level) break; - sp = kvm_mmu_get_child_sp(vcpu, it.sptep, base_gfn, true, ACC_ALL); + sp = kvm_mmu_get_child_sp(vcpu, it.sptep, base_gfn, true, + ACC_ALL, fault->pfn); if (sp == ERR_PTR(-EEXIST)) continue; @@ -3636,7 +3652,7 @@ static hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, int quadrant, WARN_ON_ONCE(quadrant && !role.has_4_byte_gpte); WARN_ON_ONCE(role.direct && role.has_4_byte_gpte); - sp = kvm_mmu_get_shadow_page(vcpu, gfn, role); + sp = kvm_mmu_get_shadow_page(vcpu, gfn, role, numa_mem_id()); ++sp->root_count; return __pa(sp->spt); @@ -5952,7 +5968,7 @@ static int __kvm_mmu_create(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu) int kvm_mmu_create(struct kvm_vcpu *vcpu) { - int ret; + int ret, nid; INIT_KVM_MMU_MEMORY_CACHE(&vcpu->arch.mmu_pte_list_desc_cache, pte_list_desc_cache, NUMA_NO_NODE); @@ -5960,8 +5976,9 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu) INIT_KVM_MMU_MEMORY_CACHE(&vcpu->arch.mmu_page_header_cache, mmu_page_header_cache, NUMA_NO_NODE); - INIT_KVM_MMU_MEMORY_CACHE(&vcpu->arch.mmu_shadow_page_cache, - NULL, NUMA_NO_NODE); + for_each_node(nid) + INIT_KVM_MMU_MEMORY_CACHE(&vcpu->arch.mmu_shadow_page_cache[nid], + NULL, nid); spin_lock_init(&vcpu->arch.mmu_shadow_page_cache_lock); vcpu->arch.mmu = &vcpu->arch.root_mmu; @@ -6692,13 +6709,17 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen) } static unsigned long mmu_shrink_cache(struct kvm_mmu_memory_cache *cache, + int cache_count, spinlock_t *cache_lock) { unsigned long freed = 0; + int nid; spin_lock(cache_lock); - if (cache->nobjs) - freed = kvm_mmu_empty_memory_cache(cache); + for (nid = 0; nid < cache_count; nid++) { + if (node_online(nid) && cache[nid].nobjs) + freed += kvm_mmu_empty_memory_cache(&cache[nid]); + } spin_unlock(cache_lock); return freed; } @@ -6721,13 +6742,15 @@ mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc) list_move_tail(&kvm->vm_list, &vm_list); freed += mmu_shrink_cache(&kvm->arch.split_shadow_page_cache, + 1, &kvm->arch.split_shadow_page_cache_lock); if (freed >= sc->nr_to_scan) break; kvm_for_each_vcpu(i, vcpu, kvm) { - freed += mmu_shrink_cache(&vcpu->arch.mmu_shadow_page_cache, + freed += mmu_shrink_cache(vcpu->arch.mmu_shadow_page_cache, + MAX_NUMNODES, &vcpu->arch.mmu_shadow_page_cache_lock); } diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index e5662dbd519c..1ceca62ec4cf 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -652,7 +652,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, table_gfn = gw->table_gfn[it.level - 2]; access = gw->pt_access[it.level - 2]; sp = 
kvm_mmu_get_child_sp(vcpu, it.sptep, table_gfn, - false, access); + false, access, fault->pfn); if (sp != ERR_PTR(-EEXIST)) { /* @@ -708,7 +708,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, validate_direct_spte(vcpu, it.sptep, direct_access); sp = kvm_mmu_get_child_sp(vcpu, it.sptep, base_gfn, - true, direct_access); + true, direct_access, fault->pfn); if (sp == ERR_PTR(-EEXIST)) continue; diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 376b8dceb3f9..b5abae2366dd 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -259,12 +259,12 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm, kvm_mmu_page_as_id(_root) != _as_id) { \ } else -static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu) +static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu, int nid) { struct kvm_mmu_page *sp; sp = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache); - sp->spt = kvm_mmu_sp_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache, + sp->spt = kvm_mmu_sp_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache[nid], &vcpu->arch.mmu_shadow_page_cache_lock); return sp; @@ -317,7 +317,7 @@ hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu) goto out; } - root = tdp_mmu_alloc_sp(vcpu); + root = tdp_mmu_alloc_sp(vcpu, numa_mem_id()); tdp_mmu_init_sp(root, NULL, 0, role); refcount_set(&root->tdp_mmu_root_count, 1); @@ -1149,7 +1149,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) struct kvm *kvm = vcpu->kvm; struct tdp_iter iter; struct kvm_mmu_page *sp; - int ret = RET_PF_RETRY; + int ret = RET_PF_RETRY, nid; kvm_mmu_hugepage_adjust(vcpu, fault); @@ -1178,11 +1178,12 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) !is_large_pte(iter.old_spte)) continue; + nid = kvm_pfn_to_page_table_nid(fault->pfn); /* * The SPTE is either non-present or points to a huge page that * needs to be split. 
*/ - sp = tdp_mmu_alloc_sp(vcpu); + sp = tdp_mmu_alloc_sp(vcpu, nid); tdp_mmu_init_child_sp(sp, &iter); sp->nx_huge_page_disallowed = fault->huge_page_disallowed;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index d96c8146e9ba..4f3db7ffeba8 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -415,7 +415,7 @@ static inline void *mmu_memory_cache_alloc_obj(struct kvm_mmu_memory_cache *mc, if (mc->kmem_cache) return kmem_cache_alloc(mc->kmem_cache, gfp_flags); else - return (void *)__get_free_page(gfp_flags); + return kvm_mmu_get_free_page(mc->node, gfp_flags); } int __kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int capacity, int min)
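The node lookup helper kvm_pfn_to_page_table_nid() used in this patch is introduced elsewhere in the series and is not shown here. As a rough sketch of the assumed behavior: a pfn backed by a struct page reports the node of that page, anything else (e.g. MMIO) falls back to the node of the current CPU.

/* Sketch only; the real helper may differ in detail. */
static int sketch_pfn_to_page_table_nid(kvm_pfn_t pfn)
{
	if (pfn_valid(pfn))
		return page_to_nid(pfn_to_page(pfn));	/* node of the backing page */

	return numa_mem_id();				/* fallback: local node */
}

With such a helper, the fault paths above simply index the per-node mmu_shadow_page_cache[] with the node of fault->pfn, so a new page table page lands on the same NUMA node as the memory it maps.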
From patchwork Thu Dec 22 02:34:56 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vipin Sharma X-Patchwork-Id: 13079342
Date: Wed, 21 Dec 2022 18:34:56 -0800 In-Reply-To: <20221222023457.1764-1-vipinsh@google.com> Mime-Version: 1.0 References: <20221222023457.1764-1-vipinsh@google.com> X-Mailer: git-send-email 2.39.0.314.g84b9a713c41-goog Message-ID: <20221222023457.1764-9-vipinsh@google.com> Subject: [Patch v3 8/9] KVM: x86/mmu: Make split_shadow_page_cache NUMA aware From: Vipin Sharma To: seanjc@google.com, pbonzini@redhat.com, bgardon@google.com, dmatlack@google.com Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Vipin Sharma Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org
Make split_shadow_page_cache NUMA aware and allocate page table pages during the huge page split based on the NUMA node of the underlying physical page.
Signed-off-by: Vipin Sharma --- arch/x86/include/asm/kvm_host.h | 2 +- arch/x86/kvm/mmu/mmu.c | 50 ++++++++++++++++++--------------- 2 files changed, 29 insertions(+), 23 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index b1f319ad6f89..7b3f36ae37a4 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1410,7 +1410,7 @@ struct kvm_arch { * * Protected by kvm->slots_lock. */ - struct kvm_mmu_memory_cache split_shadow_page_cache; + struct kvm_mmu_memory_cache split_shadow_page_cache[MAX_NUMNODES]; struct kvm_mmu_memory_cache split_page_header_cache; /*
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 511c6ef265ee..7454bfc49a51 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -6126,7 +6126,7 @@ static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm, int kvm_mmu_init_vm(struct kvm *kvm) { struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker; - int r; + int r, nid; INIT_LIST_HEAD(&kvm->arch.active_mmu_pages); INIT_LIST_HEAD(&kvm->arch.possible_nx_huge_pages); @@ -6145,8 +6145,9 @@ int kvm_mmu_init_vm(struct kvm *kvm) INIT_KVM_MMU_MEMORY_CACHE(&kvm->arch.split_page_header_cache, mmu_page_header_cache, NUMA_NO_NODE); - INIT_KVM_MMU_MEMORY_CACHE(&kvm->arch.split_shadow_page_cache, - NULL, NUMA_NO_NODE); + for_each_node(nid) + INIT_KVM_MMU_MEMORY_CACHE(&kvm->arch.split_shadow_page_cache[nid], + NULL, nid); spin_lock_init(&kvm->arch.split_shadow_page_cache_lock); INIT_KVM_MMU_MEMORY_CACHE(&kvm->arch.split_desc_cache, @@ -6157,10 +6158,13 @@ int kvm_mmu_init_vm(struct kvm *kvm) static void mmu_free_vm_memory_caches(struct kvm *kvm) { + int nid; + kvm_mmu_free_memory_cache(&kvm->arch.split_desc_cache); kvm_mmu_free_memory_cache(&kvm->arch.split_page_header_cache); - mmu_free_sp_memory_cache(&kvm->arch.split_shadow_page_cache, - &kvm->arch.split_shadow_page_cache_lock); + for_each_node(nid) + mmu_free_sp_memory_cache(&kvm->arch.split_shadow_page_cache[nid], + &kvm->arch.split_shadow_page_cache_lock); } void kvm_mmu_uninit_vm(struct kvm *kvm) @@ -6269,7 +6273,7 @@ static inline bool need_topup(struct kvm_mmu_memory_cache *cache, int min) return kvm_mmu_memory_cache_nr_free_objects(cache) < min; } -static bool need_topup_split_caches_or_resched(struct kvm *kvm) +static bool need_topup_split_caches_or_resched(struct kvm *kvm, int nid) { if (need_resched() || rwlock_needbreak(&kvm->mmu_lock)) return true; @@ -6281,10 +6285,10 @@ static bool need_topup_split_caches_or_resched(struct kvm *kvm) */ return need_topup(&kvm->arch.split_desc_cache, SPLIT_DESC_CACHE_MIN_NR_OBJECTS) || need_topup(&kvm->arch.split_page_header_cache, 1) || - need_topup(&kvm->arch.split_shadow_page_cache, 1); + need_topup(&kvm->arch.split_shadow_page_cache[nid], 1); }
-static int topup_split_caches(struct kvm *kvm) +static int topup_split_caches(struct kvm *kvm, int nid) { /* * Allocating rmap list entries when splitting huge pages for nested @@ -6314,18 +6318,21 @@ static int topup_split_caches(struct kvm *kvm) if (r) return r; - return mmu_topup_sp_memory_cache(&kvm->arch.split_shadow_page_cache, + return mmu_topup_sp_memory_cache(&kvm->arch.split_shadow_page_cache[nid], &kvm->arch.split_shadow_page_cache_lock, 1); } -static struct kvm_mmu_page *shadow_mmu_get_sp_for_split(struct kvm *kvm, u64 *huge_sptep) +static struct kvm_mmu_page *shadow_mmu_get_sp_for_split(struct kvm *kvm, + u64 *huge_sptep, + u64 huge_spte) { struct kvm_mmu_page *huge_sp = sptep_to_sp(huge_sptep); struct shadow_page_caches caches = {}; union kvm_mmu_page_role role; unsigned int access; gfn_t gfn; + int nid; gfn = kvm_mmu_page_get_gfn(huge_sp, spte_index(huge_sptep)); access = kvm_mmu_page_get_access(huge_sp, spte_index(huge_sptep)); @@ -6338,9 +6345,11 @@ static struct kvm_mmu_page *shadow_mmu_get_sp_for_split(struct kvm *kvm, u64 *hu */ role = kvm_mmu_child_role(huge_sptep, /*direct=*/true, access); + nid = kvm_pfn_to_page_table_nid(spte_to_pfn(huge_spte)); + /* Direct SPs do not require a shadowed_info_cache. */ caches.page_header_cache = &kvm->arch.split_page_header_cache; - caches.shadow_page_cache = &kvm->arch.split_shadow_page_cache; + caches.shadow_page_cache = &kvm->arch.split_shadow_page_cache[nid]; caches.shadow_page_cache_lock = &kvm->arch.split_shadow_page_cache_lock; /* Safe to pass NULL for vCPU since requesting a direct SP. */ @@ -6360,7 +6369,7 @@ static void shadow_mmu_split_huge_page(struct kvm *kvm, gfn_t gfn; int index; - sp = shadow_mmu_get_sp_for_split(kvm, huge_sptep); + sp = shadow_mmu_get_sp_for_split(kvm, huge_sptep, huge_spte); for (index = 0; index < SPTE_ENT_PER_PAGE; index++) { sptep = &sp->spt[index]; @@ -6398,7 +6407,7 @@ static int shadow_mmu_try_split_huge_page(struct kvm *kvm, u64 *huge_sptep) { struct kvm_mmu_page *huge_sp = sptep_to_sp(huge_sptep); - int level, r = 0; + int level, r = 0, nid; gfn_t gfn; u64 spte; @@ -6406,13 +6415,14 @@ static int shadow_mmu_try_split_huge_page(struct kvm *kvm, gfn = kvm_mmu_page_get_gfn(huge_sp, spte_index(huge_sptep)); level = huge_sp->role.level; spte = *huge_sptep; + nid = kvm_pfn_to_page_table_nid(spte_to_pfn(spte)); if (kvm_mmu_available_pages(kvm) <= KVM_MIN_FREE_MMU_PAGES) { r = -ENOSPC; goto out; } - if (need_topup_split_caches_or_resched(kvm)) { + if (need_topup_split_caches_or_resched(kvm, nid)) { write_unlock(&kvm->mmu_lock); cond_resched(); /* @@ -6420,7 +6430,7 @@ static int shadow_mmu_try_split_huge_page(struct kvm *kvm, * rmap iterator should be restarted because the MMU lock was * dropped. 
*/ - r = topup_split_caches(kvm) ?: -EAGAIN; + r = topup_split_caches(kvm, nid) ?: -EAGAIN; write_lock(&kvm->mmu_lock); goto out; } @@ -6709,17 +6719,15 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen) } static unsigned long mmu_shrink_cache(struct kvm_mmu_memory_cache *cache, - int cache_count, spinlock_t *cache_lock) { unsigned long freed = 0; int nid; spin_lock(cache_lock); - for (nid = 0; nid < cache_count; nid++) { - if (node_online(nid) && cache[nid].nobjs) + for_each_online_node(nid) + if (cache[nid].nobjs) freed += kvm_mmu_empty_memory_cache(&cache[nid]); - } spin_unlock(cache_lock); return freed; } @@ -6741,8 +6749,7 @@ mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc) first_kvm = kvm; list_move_tail(&kvm->vm_list, &vm_list); - freed += mmu_shrink_cache(&kvm->arch.split_shadow_page_cache, - 1, + freed += mmu_shrink_cache(kvm->arch.split_shadow_page_cache, &kvm->arch.split_shadow_page_cache_lock); if (freed >= sc->nr_to_scan) @@ -6750,7 +6757,6 @@ mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc) kvm_for_each_vcpu(i, vcpu, kvm) { freed += mmu_shrink_cache(vcpu->arch.mmu_shadow_page_cache, - MAX_NUMNODES, &vcpu->arch.mmu_shadow_page_cache_lock); }
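Condensing the hunks above into one flow, and assuming the helpers added earlier in the series (spte_to_pfn(), kvm_pfn_to_page_table_nid(), mmu_topup_sp_memory_cache()), eager page splitting now picks the split cache of the node that backs the huge page being split:

/* Simplified sketch of the post-patch flow, not verbatim kernel code. */
static void sketch_split_huge_page(struct kvm *kvm, u64 huge_spte)
{
	int nid = kvm_pfn_to_page_table_nid(spte_to_pfn(huge_spte));

	/* Top up only the cache for the node backing the huge page. */
	mmu_topup_sp_memory_cache(&kvm->arch.split_shadow_page_cache[nid],
				  &kvm->arch.split_shadow_page_cache_lock, 1);

	/* The child shadow page is then allocated from split_shadow_page_cache[nid]. */
}

The shrinker side mirrors this: mmu_shrink_cache() now walks every online node's cache under the cache lock instead of freeing a single cache.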
From patchwork Thu Dec 22 02:34:57 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vipin Sharma X-Patchwork-Id: 13079343
Date: Wed, 21 Dec 2022 18:34:57 -0800 In-Reply-To: <20221222023457.1764-1-vipinsh@google.com> Mime-Version: 1.0 References: <20221222023457.1764-1-vipinsh@google.com> X-Mailer: git-send-email 2.39.0.314.g84b9a713c41-goog Message-ID: <20221222023457.1764-10-vipinsh@google.com> Subject: [Patch v3 9/9] KVM: x86/mmu: Reduce default cache size in KVM from 40 to PT64_ROOT_MAX_LEVEL From: Vipin Sharma To: seanjc@google.com, pbonzini@redhat.com, bgardon@google.com, dmatlack@google.com Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Vipin Sharma Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org
KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE is set to 40 with no documented rationale. Reduce the default size to PT64_ROOT_MAX_LEVEL, which is currently 5. Change the mmu_pte_list_desc_cache size to what is actually needed, as it requires more than 5 objects but far fewer than 40.
Tested by running dirty_log_perf_test on both the TDP and shadow MMU with 48 vCPUs and 2GB/vCPU on a 2 NUMA node machine. No performance impact was noticed. Running perf on dirty_log_perf_test showed kvm_mmu_get_free_page() calls reduced by ~3300, which is close to 48 (vCPUs) * 2 (nodes) * 35 (reduction in cache size).
Signed-off-by: Vipin Sharma --- arch/x86/include/asm/kvm_types.h | 2 +- arch/x86/kvm/mmu/mmu.c | 7 ++++--- 2 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/kvm_types.h b/arch/x86/include/asm/kvm_types.h index 08f1b57d3b62..752dab218a62 100644 --- a/arch/x86/include/asm/kvm_types.h +++ b/arch/x86/include/asm/kvm_types.h @@ -2,6 +2,6 @@ #ifndef _ASM_X86_KVM_TYPES_H #define _ASM_X86_KVM_TYPES_H -#define KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE 40 +#define KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE PT64_ROOT_MAX_LEVEL #endif /* _ASM_X86_KVM_TYPES_H */
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 7454bfc49a51..f89d933ff380 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -677,11 +677,12 @@ static int mmu_topup_sp_memory_cache(struct kvm_mmu_memory_cache *cache, static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect) { - int r, nid; + int r, nid, desc_capacity; /* 1 rmap, 1 parent PTE per level, and the prefetched rmaps. */ - r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache, - 1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM); + desc_capacity = 1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM; + r = __kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache, + desc_capacity, desc_capacity); if (r) return r;