From patchwork Fri Sep 25 21:22:41 2020
X-Patchwork-Submitter: Ben Gardon
X-Patchwork-Id: 11800781
Date: Fri, 25 Sep 2020 14:22:41 -0700
In-Reply-To: <20200925212302.3979661-1-bgardon@google.com>
Message-Id: <20200925212302.3979661-2-bgardon@google.com>
References: <20200925212302.3979661-1-bgardon@google.com>
Subject: [PATCH 01/22] kvm: mmu: Separate making SPTEs from set_spte
From: Ben Gardon
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Cannon Matthews , Paolo Bonzini , Peter Xu , Sean Christopherson , Peter Shier , Peter Feiner , Junaid Shahid , Jim Mattson , Yulei Zhang ,
Wanpeng Li , Vitaly Kuznetsov , Xiao Guangrong , Ben Gardon Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Separate the functions for generating leaf page table entries from the function that inserts them into the paging structure. This refactoring will facilitate changes to the MMU sychronization model to use atomic compare / exchanges (which are not guaranteed to succeed) instead of a monolithic MMU lock. No functional change expected. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This commit introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon Reviewed-by: Peter Shier --- arch/x86/kvm/mmu/mmu.c | 52 +++++++++++++++++++++++++++--------------- 1 file changed, 34 insertions(+), 18 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 71aa3da2a0b7b..81240b558d67f 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -2971,20 +2971,14 @@ static bool kvm_is_mmio_pfn(kvm_pfn_t pfn) #define SET_SPTE_WRITE_PROTECTED_PT BIT(0) #define SET_SPTE_NEED_REMOTE_TLB_FLUSH BIT(1) -static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep, - unsigned int pte_access, int level, - gfn_t gfn, kvm_pfn_t pfn, bool speculative, - bool can_unsync, bool host_writable) +static u64 make_spte(struct kvm_vcpu *vcpu, unsigned int pte_access, int level, + gfn_t gfn, kvm_pfn_t pfn, u64 old_spte, bool speculative, + bool can_unsync, bool host_writable, bool ad_disabled, + int *ret) { u64 spte = 0; - int ret = 0; - struct kvm_mmu_page *sp; - - if (set_mmio_spte(vcpu, sptep, gfn, pfn, pte_access)) - return 0; - sp = sptep_to_sp(sptep); - if (sp_ad_disabled(sp)) + if (ad_disabled) spte |= SPTE_AD_DISABLED_MASK; else if (kvm_vcpu_ad_need_write_protect(vcpu)) spte |= SPTE_AD_WRPROT_ONLY_MASK; @@ -3037,27 +3031,49 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep, * is responsibility of mmu_get_page / kvm_sync_page. * Same reasoning can be applied to dirty page accounting. 
*/ - if (!can_unsync && is_writable_pte(*sptep)) - goto set_pte; + if (!can_unsync && is_writable_pte(old_spte)) + return spte; if (mmu_need_write_protect(vcpu, gfn, can_unsync)) { pgprintk("%s: found shadow page for %llx, marking ro\n", __func__, gfn); - ret |= SET_SPTE_WRITE_PROTECTED_PT; + *ret |= SET_SPTE_WRITE_PROTECTED_PT; pte_access &= ~ACC_WRITE_MASK; spte &= ~(PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE); } } - if (pte_access & ACC_WRITE_MASK) { - kvm_vcpu_mark_page_dirty(vcpu, gfn); + if (pte_access & ACC_WRITE_MASK) spte |= spte_shadow_dirty_mask(spte); - } if (speculative) spte = mark_spte_for_access_track(spte); -set_pte: + return spte; +} + +static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep, + unsigned int pte_access, int level, + gfn_t gfn, kvm_pfn_t pfn, bool speculative, + bool can_unsync, bool host_writable) +{ + u64 spte = 0; + struct kvm_mmu_page *sp; + int ret = 0; + + if (set_mmio_spte(vcpu, sptep, gfn, pfn, pte_access)) + return 0; + + sp = sptep_to_sp(sptep); + + spte = make_spte(vcpu, pte_access, level, gfn, pfn, *sptep, speculative, + can_unsync, host_writable, sp_ad_disabled(sp), &ret); + if (!spte) + return 0; + + if (spte & PT_WRITABLE_MASK) + kvm_vcpu_mark_page_dirty(vcpu, gfn); + if (mmu_spte_update(sptep, spte)) ret |= SET_SPTE_NEED_REMOTE_TLB_FLUSH; return ret; From patchwork Fri Sep 25 21:22:42 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11800785 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4E1EA6CA for ; Fri, 25 Sep 2020 21:23:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1DF1821D7F for ; Fri, 25 Sep 2020 21:23:16 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="pYiyHMEJ" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728472AbgIYVXP (ORCPT ); Fri, 25 Sep 2020 17:23:15 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33542 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728336AbgIYVXN (ORCPT ); Fri, 25 Sep 2020 17:23:13 -0400 Received: from mail-qv1-xf49.google.com (mail-qv1-xf49.google.com [IPv6:2607:f8b0:4864:20::f49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BF00EC0613D5 for ; Fri, 25 Sep 2020 14:23:13 -0700 (PDT) Received: by mail-qv1-xf49.google.com with SMTP id w8so2615829qvt.18 for ; Fri, 25 Sep 2020 14:23:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=sender:date:in-reply-to:message-id:mime-version:references:subject :from:to:cc; bh=ghcbda5Gcwx3UnIAaqAaGm8LTVhquolw1KyMWFRxb/k=; b=pYiyHMEJi022dQb6mD1VjA3IbQ9/CGZ5EktjBaX+sqJovC+hXW50n6az8iXbQ4xGWp AOznuY9+Dz/NVxY3mPOsTHCgxELYa2uigXZXZRASH0u979wdpWzdX9gvPYepZFsQvfu7 lT52i1fzeS7O/bfrwEhr9XamVg4SrVJxK9M+L4chLwtqUBprSUnSZDdf3f1UFsiMiteH G1WsuQr3ijF7EV3dbu0Mbr4RRpWIJBJFn9FMOJtceZlzfX+PicjWDruT4btApxCY+dNM 4iTyN1eCEPvC1PTbLRyOBr9M5KaSO24T1ZMjyD6Rq9AkRk9irIumW3RLskFTWrqtZl1s 0rsg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=ghcbda5Gcwx3UnIAaqAaGm8LTVhquolw1KyMWFRxb/k=; b=VYHuvDIdLk9E5GdvQzFb3GmTi40/A6y1vahGRKFAJgMNf+LtfGCAf7+R7BEg5sk3Qy 
BJeCByrFvyetdCg0lR3SKb521zoXWRT4P6niio5YzlWlMpDmPPoBH7Y+SMDZsjLNlfE7 s3KIYK+XhyWGFnJxca269X7eEwB5lT6IX/EAUYWpiCH8FQreJWy7WqSsKPxK6h9nPOcc m/9srPChdPN06zce4qOUGoRtU8I4SUoy1tX8gyLkDsG67xaqN2hCjUzqGKIiqlo3XVmP FMiO+KRAd9P5psqV9ln4wJXRm3lFrImwHzU/ySUwr+yuGzsU8wGT2unb1YO+roPwe20n bb+w== X-Gm-Message-State: AOAM5333pXEbzZsZzGVH6CKrZEyz/9Ihu6TFeLG2py8fzhEhIAImM2kR 0f6iNHXL+sTycp6eW/KbDx4HOVV3L7ir X-Google-Smtp-Source: ABdhPJyJCL3aA3scPQXV3XIYl1a3sjvFmRNoITGvHEFC33SHo2AWp3SAL5U/D220afkAixxVLjAOgCBV87IV Sender: "bgardon via sendgmr" X-Received: from bgardon.sea.corp.google.com ([2620:15c:100:202:f693:9fff:fef4:a293]) (user=bgardon job=sendgmr) by 2002:ad4:4d87:: with SMTP id cv7mr642891qvb.49.1601068991820; Fri, 25 Sep 2020 14:23:11 -0700 (PDT) Date: Fri, 25 Sep 2020 14:22:42 -0700 In-Reply-To: <20200925212302.3979661-1-bgardon@google.com> Message-Id: <20200925212302.3979661-3-bgardon@google.com> Mime-Version: 1.0 References: <20200925212302.3979661-1-bgardon@google.com> X-Mailer: git-send-email 2.28.0.709.gb0816b6eb0-goog Subject: [PATCH 02/22] kvm: mmu: Introduce tdp_iter From: Ben Gardon To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: Cannon Matthews , Paolo Bonzini , Peter Xu , Sean Christopherson , Peter Shier , Peter Feiner , Junaid Shahid , Jim Mattson , Yulei Zhang , Wanpeng Li , Vitaly Kuznetsov , Xiao Guangrong , Ben Gardon Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org The TDP iterator implements a pre-order traversal of a TDP paging structure. This iterator will be used in future patches to create an efficient implementation of the KVM MMU for the TDP case. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon --- arch/x86/kvm/Makefile | 3 +- arch/x86/kvm/mmu/mmu.c | 19 +--- arch/x86/kvm/mmu/mmu_internal.h | 15 +++ arch/x86/kvm/mmu/tdp_iter.c | 163 ++++++++++++++++++++++++++++++++ arch/x86/kvm/mmu/tdp_iter.h | 53 +++++++++++ 5 files changed, 237 insertions(+), 16 deletions(-) create mode 100644 arch/x86/kvm/mmu/tdp_iter.c create mode 100644 arch/x86/kvm/mmu/tdp_iter.h diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile index 4a3081e9f4b5d..cf6a9947955f7 100644 --- a/arch/x86/kvm/Makefile +++ b/arch/x86/kvm/Makefile @@ -15,7 +15,8 @@ kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o kvm-y += x86.o emulate.o i8259.o irq.o lapic.o \ i8254.o ioapic.o irq_comm.o cpuid.o pmu.o mtrr.o \ - hyperv.o debugfs.o mmu/mmu.o mmu/page_track.o + hyperv.o debugfs.o mmu/mmu.o mmu/page_track.o \ + mmu/tdp_iter.o kvm-intel-y += vmx/vmx.o vmx/vmenter.o vmx/pmu_intel.o vmx/vmcs12.o vmx/evmcs.o vmx/nested.o kvm-amd-y += svm/svm.o svm/vmenter.o svm/pmu.o svm/nested.o svm/avic.o svm/sev.o diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 81240b558d67f..b48b00c8cde65 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -134,15 +134,6 @@ module_param(dbg, bool, 0644); #define SPTE_AD_WRPROT_ONLY_MASK (2ULL << 52) #define SPTE_MMIO_MASK (3ULL << 52) -#define PT64_LEVEL_BITS 9 - -#define PT64_LEVEL_SHIFT(level) \ - (PAGE_SHIFT + (level - 1) * PT64_LEVEL_BITS) - -#define PT64_INDEX(address, level)\ - (((address) >> PT64_LEVEL_SHIFT(level)) & ((1 << PT64_LEVEL_BITS) - 1)) - - #define PT32_LEVEL_BITS 10 #define PT32_LEVEL_SHIFT(level) \ @@ -192,8 +183,6 @@ module_param(dbg, bool, 0644); #define SPTE_HOST_WRITEABLE (1ULL << PT_FIRST_AVAIL_BITS_SHIFT) #define 
SPTE_MMU_WRITEABLE (1ULL << (PT_FIRST_AVAIL_BITS_SHIFT + 1)) -#define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level) - /* make pte_list_desc fit well in cache line */ #define PTE_LIST_EXT 3 @@ -346,7 +335,7 @@ void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 access_mask) } EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_mask); -static bool is_mmio_spte(u64 spte) +bool is_mmio_spte(u64 spte) { return (spte & SPTE_SPECIAL_MASK) == SPTE_MMIO_MASK; } @@ -623,7 +612,7 @@ static int is_nx(struct kvm_vcpu *vcpu) return vcpu->arch.efer & EFER_NX; } -static int is_shadow_present_pte(u64 pte) +int is_shadow_present_pte(u64 pte) { return (pte != 0) && !is_mmio_spte(pte); } @@ -633,7 +622,7 @@ static int is_large_pte(u64 pte) return pte & PT_PAGE_SIZE_MASK; } -static int is_last_spte(u64 pte, int level) +int is_last_spte(u64 pte, int level) { if (level == PG_LEVEL_4K) return 1; @@ -647,7 +636,7 @@ static bool is_executable_pte(u64 spte) return (spte & (shadow_x_mask | shadow_nx_mask)) == shadow_x_mask; } -static kvm_pfn_t spte_to_pfn(u64 pte) +kvm_pfn_t spte_to_pfn(u64 pte) { return (pte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT; } diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h index 3acf3b8eb469d..65bb110847858 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -60,4 +60,19 @@ void kvm_mmu_gfn_allow_lpage(struct kvm_memory_slot *slot, gfn_t gfn); bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm, struct kvm_memory_slot *slot, u64 gfn); +#define PT64_LEVEL_BITS 9 + +#define PT64_LEVEL_SHIFT(level) \ + (PAGE_SHIFT + (level - 1) * PT64_LEVEL_BITS) + +#define PT64_INDEX(address, level)\ + (((address) >> PT64_LEVEL_SHIFT(level)) & ((1 << PT64_LEVEL_BITS) - 1)) +#define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level) + +/* Functions for interpreting SPTEs */ +kvm_pfn_t spte_to_pfn(u64 pte); +bool is_mmio_spte(u64 spte); +int is_shadow_present_pte(u64 pte); +int is_last_spte(u64 pte, int level); + #endif /* __KVM_X86_MMU_INTERNAL_H */ diff --git a/arch/x86/kvm/mmu/tdp_iter.c b/arch/x86/kvm/mmu/tdp_iter.c new file mode 100644 index 0000000000000..ee90d62d2a9b1 --- /dev/null +++ b/arch/x86/kvm/mmu/tdp_iter.c @@ -0,0 +1,163 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#include "mmu_internal.h" +#include "tdp_iter.h" + +/* + * Recalculates the pointer to the SPTE for the current GFN and level and + * reread the SPTE. + */ +static void tdp_iter_refresh_sptep(struct tdp_iter *iter) +{ + iter->sptep = iter->pt_path[iter->level - 1] + + SHADOW_PT_INDEX(iter->gfn << PAGE_SHIFT, iter->level); + iter->old_spte = READ_ONCE(*iter->sptep); +} + +/* + * Sets a TDP iterator to walk a pre-order traversal of the paging structure + * rooted at root_pt, starting with the walk to translate goal_gfn. + */ +void tdp_iter_start(struct tdp_iter *iter, u64 *root_pt, int root_level, + gfn_t goal_gfn) +{ + WARN_ON(root_level < 1); + WARN_ON(root_level > PT64_ROOT_MAX_LEVEL); + + iter->goal_gfn = goal_gfn; + iter->root_level = root_level; + iter->level = root_level; + iter->pt_path[iter->level - 1] = root_pt; + + iter->gfn = iter->goal_gfn - + (iter->goal_gfn % KVM_PAGES_PER_HPAGE(iter->level)); + tdp_iter_refresh_sptep(iter); + + iter->valid = true; +} + +/* + * Given an SPTE and its level, returns a pointer containing the host virtual + * address of the child page table referenced by the SPTE. Returns null if + * there is no such entry. 
+ */ +u64 *spte_to_child_pt(u64 spte, int level) +{ + u64 *pt; + /* There's no child entry if this entry isn't present */ + if (!is_shadow_present_pte(spte)) + return NULL; + + /* There is no child page table if this is a leaf entry. */ + if (is_last_spte(spte, level)) + return NULL; + + pt = (u64 *)__va(spte_to_pfn(spte) << PAGE_SHIFT); + return pt; +} + +/* + * Steps down one level in the paging structure towards the goal GFN. Returns + * true if the iterator was able to step down a level, false otherwise. + */ +static bool try_step_down(struct tdp_iter *iter) +{ + u64 *child_pt; + + if (iter->level == PG_LEVEL_4K) + return false; + + /* + * Reread the SPTE before stepping down to avoid traversing into page + * tables that are no longer linked from this entry. + */ + iter->old_spte = READ_ONCE(*iter->sptep); + + child_pt = spte_to_child_pt(iter->old_spte, iter->level); + if (!child_pt) + return false; + + iter->level--; + iter->pt_path[iter->level - 1] = child_pt; + iter->gfn = iter->goal_gfn - + (iter->goal_gfn % KVM_PAGES_PER_HPAGE(iter->level)); + tdp_iter_refresh_sptep(iter); + + return true; +} + +/* + * Steps to the next entry in the current page table, at the current page table + * level. The next entry could point to a page backing guest memory or another + * page table, or it could be non-present. Returns true if the iterator was + * able to step to the next entry in the page table, false if the iterator was + * already at the end of the current page table. + */ +static bool try_step_side(struct tdp_iter *iter) +{ + /* + * Check if the iterator is already at the end of the current page + * table. + */ + if (!((iter->gfn + KVM_PAGES_PER_HPAGE(iter->level)) % + KVM_PAGES_PER_HPAGE(iter->level + 1))) + return false; + + iter->gfn += KVM_PAGES_PER_HPAGE(iter->level); + iter->goal_gfn = iter->gfn; + iter->sptep++; + iter->old_spte = READ_ONCE(*iter->sptep); + + return true; +} + +/* + * Tries to traverse back up a level in the paging structure so that the walk + * can continue from the next entry in the parent page table. Returns true on a + * successful step up, false if already in the root page. + */ +static bool try_step_up(struct tdp_iter *iter) +{ + if (iter->level == iter->root_level) + return false; + + iter->level++; + iter->gfn = iter->gfn - (iter->gfn % KVM_PAGES_PER_HPAGE(iter->level)); + tdp_iter_refresh_sptep(iter); + + return true; +} + +/* + * Step to the next SPTE in a pre-order traversal of the paging structure. + * To get to the next SPTE, the iterator either steps down towards the goal + * GFN, if at a present, non-last-level SPTE, or over to a SPTE mapping a + * highter GFN. + * + * The basic algorithm is as follows: + * 1. If the current SPTE is a non-last-level SPTE, step down into the page + * table it points to. + * 2. If the iterator cannot step down, it will try to step to the next SPTE + * in the current page of the paging structure. + * 3. If the iterator cannot step to the next entry in the current page, it will + * try to step up to the parent paging structure page. In this case, that + * SPTE will have already been visited, and so the iterator must also step + * to the side again. 
+ */ +void tdp_iter_next(struct tdp_iter *iter) +{ + bool done; + + done = try_step_down(iter); + if (done) + return; + + done = try_step_side(iter); + while (!done) { + if (!try_step_up(iter)) { + iter->valid = false; + break; + } + done = try_step_side(iter); + } +} diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h new file mode 100644 index 0000000000000..b102109778eac --- /dev/null +++ b/arch/x86/kvm/mmu/tdp_iter.h @@ -0,0 +1,53 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef __KVM_X86_MMU_TDP_ITER_H +#define __KVM_X86_MMU_TDP_ITER_H + +#include + +#include "mmu.h" + +/* + * A TDP iterator performs a pre-order walk over a TDP paging structure. + */ +struct tdp_iter { + /* + * The iterator will traverse the paging structure towards the mapping + * for this GFN. + */ + gfn_t goal_gfn; + /* Pointers to the page tables traversed to reach the current SPTE */ + u64 *pt_path[PT64_ROOT_MAX_LEVEL]; + /* A pointer to the current SPTE */ + u64 *sptep; + /* The lowest GFN mapped by the current SPTE */ + gfn_t gfn; + /* The level of the root page given to the iterator */ + int root_level; + /* The iterator's current level within the paging structure */ + int level; + /* A snapshot of the value at sptep */ + u64 old_spte; + /* + * Whether the iterator has a valid state. This will be false if the + * iterator walks off the end of the paging structure. + */ + bool valid; +}; + +/* + * Iterates over every SPTE mapping the GFN range [start, end) in a + * preorder traversal. + */ +#define for_each_tdp_pte(iter, root, root_level, start, end) \ + for (tdp_iter_start(&iter, root, root_level, start); \ + iter.valid && iter.gfn < end; \ + tdp_iter_next(&iter)) + +u64 *spte_to_child_pt(u64 pte, int level); + +void tdp_iter_start(struct tdp_iter *iter, u64 *root_pt, int root_level, + gfn_t goal_gfn); +void tdp_iter_next(struct tdp_iter *iter); + +#endif /* __KVM_X86_MMU_TDP_ITER_H */ From patchwork Fri Sep 25 21:22:43 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11800825 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 96E81112E for ; Fri, 25 Sep 2020 21:25:06 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 71DA721D7F for ; Fri, 25 Sep 2020 21:25:06 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="DLMitrGx" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728710AbgIYVXP (ORCPT ); Fri, 25 Sep 2020 17:23:15 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33544 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728410AbgIYVXO (ORCPT ); Fri, 25 Sep 2020 17:23:14 -0400 Received: from mail-pj1-x104a.google.com (mail-pj1-x104a.google.com [IPv6:2607:f8b0:4864:20::104a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2759EC0613D6 for ; Fri, 25 Sep 2020 14:23:14 -0700 (PDT) Received: by mail-pj1-x104a.google.com with SMTP id z22so246240pjr.8 for ; Fri, 25 Sep 2020 14:23:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=sender:date:in-reply-to:message-id:mime-version:references:subject :from:to:cc; bh=+zqEz38TI9y5rZ41moN9G5sylTl3ehc4d8gzYBHtRvo=; 
Date: Fri, 25 Sep 2020 14:22:43 -0700
In-Reply-To: <20200925212302.3979661-1-bgardon@google.com>
Message-Id: <20200925212302.3979661-4-bgardon@google.com>
References: <20200925212302.3979661-1-bgardon@google.com>
Subject: [PATCH 03/22] kvm: mmu: Init / Uninit the TDP MMU
From: Ben Gardon
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Cannon Matthews , Paolo Bonzini , Peter Xu , Sean Christopherson , Peter Shier , Peter Feiner , Junaid Shahid , Jim Mattson , Yulei Zhang , Wanpeng Li , Vitaly Kuznetsov , Xiao Guangrong , Ben Gardon

The TDP MMU offers an alternative mode of operation to the x86 shadow paging based MMU, optimized for running an L1 guest with TDP. The TDP MMU will require new fields that need to be initialized and torn down. Add hooks into the existing KVM MMU initialization process to do that initialization / cleanup. Currently the initialization and cleanup functions do not do very much; more operations will be added in future patches. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures.
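To illustrate the intent of the per-VM snapshot (a rough sketch, not code from this patch; example_mmu_op() is a placeholder for handlers wired up later in the series), MMU operations are expected to dispatch on the flag captured at VM creation rather than re-read the module parameter:

        /*
         * Sketch only: dispatch on the per-VM snapshot, not on the tdp_mmu
         * module parameter, so toggling the parameter at runtime cannot
         * change the MMU mode of an already-running VM.
         */
        static void example_mmu_op(struct kvm *kvm)
        {
                if (kvm->arch.tdp_mmu_enabled) {
                        /* TDP MMU handler, added by later patches. */
                        return;
                }

                /* Fall through to the existing shadow-paging path. */
        }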
This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon --- arch/x86/include/asm/kvm_host.h | 9 +++++++++ arch/x86/kvm/Makefile | 2 +- arch/x86/kvm/mmu/mmu.c | 5 +++++ arch/x86/kvm/mmu/tdp_mmu.c | 34 +++++++++++++++++++++++++++++++++ arch/x86/kvm/mmu/tdp_mmu.h | 10 ++++++++++ 5 files changed, 59 insertions(+), 1 deletion(-) create mode 100644 arch/x86/kvm/mmu/tdp_mmu.c create mode 100644 arch/x86/kvm/mmu/tdp_mmu.h diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 5303dbc5c9bce..35107819f48ae 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -963,6 +963,15 @@ struct kvm_arch { struct kvm_pmu_event_filter *pmu_event_filter; struct task_struct *nx_lpage_recovery_thread; + + /* + * Whether the TDP MMU is enabled for this VM. This contains a + * snapshot of the TDP MMU module parameter from when the VM was + * created and remains unchanged for the life of the VM. If this is + * true, TDP MMU handler functions will run for various MMU + * operations. + */ + bool tdp_mmu_enabled; }; struct kvm_vm_stat { diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile index cf6a9947955f7..e5b33938f86ed 100644 --- a/arch/x86/kvm/Makefile +++ b/arch/x86/kvm/Makefile @@ -16,7 +16,7 @@ kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o kvm-y += x86.o emulate.o i8259.o irq.o lapic.o \ i8254.o ioapic.o irq_comm.o cpuid.o pmu.o mtrr.o \ hyperv.o debugfs.o mmu/mmu.o mmu/page_track.o \ - mmu/tdp_iter.o + mmu/tdp_iter.o mmu/tdp_mmu.o kvm-intel-y += vmx/vmx.o vmx/vmenter.o vmx/pmu_intel.o vmx/vmcs12.o vmx/evmcs.o vmx/nested.o kvm-amd-y += svm/svm.o svm/vmenter.o svm/pmu.o svm/nested.o svm/avic.o svm/sev.o diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index b48b00c8cde65..0cb0c26939dfc 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -19,6 +19,7 @@ #include "ioapic.h" #include "mmu.h" #include "mmu_internal.h" +#include "tdp_mmu.h" #include "x86.h" #include "kvm_cache_regs.h" #include "kvm_emulate.h" @@ -5865,6 +5866,8 @@ void kvm_mmu_init_vm(struct kvm *kvm) { struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker; + kvm_mmu_init_tdp_mmu(kvm); + node->track_write = kvm_mmu_pte_write; node->track_flush_slot = kvm_mmu_invalidate_zap_pages_in_memslot; kvm_page_track_register_notifier(kvm, node); @@ -5875,6 +5878,8 @@ void kvm_mmu_uninit_vm(struct kvm *kvm) struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker; kvm_page_track_unregister_notifier(kvm, node); + + kvm_mmu_uninit_tdp_mmu(kvm); } void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end) diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c new file mode 100644 index 0000000000000..8241e18c111e6 --- /dev/null +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -0,0 +1,34 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#include "tdp_mmu.h" + +static bool __read_mostly tdp_mmu_enabled = true; +module_param_named(tdp_mmu, tdp_mmu_enabled, bool, 0644); + +static bool is_tdp_mmu_enabled(void) +{ + if (!READ_ONCE(tdp_mmu_enabled)) + return false; + + if (WARN_ONCE(!tdp_enabled, + "Creating a VM with TDP MMU enabled requires TDP.")) + return false; + + return true; +} + +/* Initializes the TDP MMU for the VM, if enabled. */ +void kvm_mmu_init_tdp_mmu(struct kvm *kvm) +{ + if (!is_tdp_mmu_enabled()) + return; + + /* This should not be changed for the lifetime of the VM. 
*/ + kvm->arch.tdp_mmu_enabled = true; +} + +void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm) +{ + if (!kvm->arch.tdp_mmu_enabled) + return; +} diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h new file mode 100644 index 0000000000000..dd3764f5a9aa3 --- /dev/null +++ b/arch/x86/kvm/mmu/tdp_mmu.h @@ -0,0 +1,10 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef __KVM_X86_MMU_TDP_MMU_H +#define __KVM_X86_MMU_TDP_MMU_H + +#include + +void kvm_mmu_init_tdp_mmu(struct kvm *kvm); +void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm); +#endif /* __KVM_X86_MMU_TDP_MMU_H */ From patchwork Fri Sep 25 21:22:44 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11800823 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C16686CA for ; Fri, 25 Sep 2020 21:25:00 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8A237221EC for ; Fri, 25 Sep 2020 21:25:00 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="p3PKaxwP" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729588AbgIYVY5 (ORCPT ); Fri, 25 Sep 2020 17:24:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33554 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728905AbgIYVXP (ORCPT ); Fri, 25 Sep 2020 17:23:15 -0400 Received: from mail-pf1-x44a.google.com (mail-pf1-x44a.google.com [IPv6:2607:f8b0:4864:20::44a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DC70CC0613CE for ; Fri, 25 Sep 2020 14:23:15 -0700 (PDT) Received: by mail-pf1-x44a.google.com with SMTP id t201so3429141pfc.13 for ; Fri, 25 Sep 2020 14:23:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=sender:date:in-reply-to:message-id:mime-version:references:subject :from:to:cc; bh=kk3ueGPq7p+gjFz0RIiwD9x1NXd/FGzFe2YuGg/svSQ=; b=p3PKaxwPe58qwJA1frrDUL5FTykZ3ePCYzl8HIzwpu10qvGAeZbS8HF3j1Dp/DQ8vY ctQ5wfy9DQcIEbGGesaY98CHjKsFosVz7nORkVEeA90QLw4hIwoANUG+/w2idD0CpXIm 18IfwCuyVdm70xewH1IvjGePQdMRjmXyYAOP3fPmLVWsnSD+cXvSlznWPvvrnyrKAI4s OFqfrZdnrOZv6C8rB23iTapbKlTxj6IaWymWwpsrCte64I1pIPvKwHOXA/05mSAY44lx qDhvD7zfTNw9C1Jd4InJQYLFLP9fUdzHCdg76/Al9+aAMJ6Sc0w6SWw4L20FfpRHcROP 3w/Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=kk3ueGPq7p+gjFz0RIiwD9x1NXd/FGzFe2YuGg/svSQ=; b=mjfHplTcm1tKMOt6oGHWdUIlX4N0YHMCyrWOYFWtZp0EDwmUqnOTsq5j1J9es7o43P U+B4+0IGGT0aX8LAtCrNhtd0jdpTFTU+T5ZJ/mkTYaa1RjBSdH9UldsSRMdE/cKAm9+7 WFJlHWoZPTRavZLqlMBYqbihQnQjrFXCYfO8nADVxiKDAh4SEExQibRLkJ24c5URIOom gr3lm9kanLAJkY+wYWrEvuwF4Mdeo4cXGcbfazVp8u94mSxzpIuh8MTremezT1zNFfgZ KqUCzbkGC6ByBSm52T/WSfJeR31gL7rWX728w3p4PfB5Wr4/NM/SICm/Zs76c6XOk2WW FXfQ== X-Gm-Message-State: AOAM532cGYLm+qzqoF5rR5269S27E/eXI2PmJ3b2+mYu/eScUVLLdrNp Ccor5NFEtv8VSLFTLWwDDKJHDfUbWhVZ X-Google-Smtp-Source: ABdhPJwR9dvM8nVvdFvs01UJ1cEgLQklxvw7b6eLPvsCl4tWZ3srPZzLqrc4rXZdvjRACkxMwbXKZr5PMMvm Sender: "bgardon via sendgmr" X-Received: from bgardon.sea.corp.google.com ([2620:15c:100:202:f693:9fff:fef4:a293]) (user=bgardon job=sendgmr) by 2002:a17:902:6bc7:b029:d2:6aa:e177 with SMTP id 
m7-20020a1709026bc7b02900d206aae177mr1291393plt.52.1601068995232; Fri, 25 Sep 2020 14:23:15 -0700 (PDT) Date: Fri, 25 Sep 2020 14:22:44 -0700 In-Reply-To: <20200925212302.3979661-1-bgardon@google.com> Message-Id: <20200925212302.3979661-5-bgardon@google.com> Mime-Version: 1.0 References: <20200925212302.3979661-1-bgardon@google.com> X-Mailer: git-send-email 2.28.0.709.gb0816b6eb0-goog Subject: [PATCH 04/22] kvm: mmu: Allocate and free TDP MMU roots From: Ben Gardon To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: Cannon Matthews , Paolo Bonzini , Peter Xu , Sean Christopherson , Peter Shier , Peter Feiner , Junaid Shahid , Jim Mattson , Yulei Zhang , Wanpeng Li , Vitaly Kuznetsov , Xiao Guangrong , Ben Gardon Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org The TDP MMU must be able to allocate paging structure root pages and track the usage of those pages. Implement a similar, but separate system for root page allocation to that of the x86 shadow paging implementation. When future patches add synchronization model changes to allow for parallel page faults, these pages will need to be handled differently from the x86 shadow paging based MMU's root pages. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon --- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/mmu/mmu.c | 27 +++--- arch/x86/kvm/mmu/mmu_internal.h | 9 ++ arch/x86/kvm/mmu/tdp_mmu.c | 157 ++++++++++++++++++++++++++++++++ arch/x86/kvm/mmu/tdp_mmu.h | 5 + 5 files changed, 188 insertions(+), 11 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 35107819f48ae..9ce6b35ecb33a 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -972,6 +972,7 @@ struct kvm_arch { * operations. 
*/ bool tdp_mmu_enabled; + struct list_head tdp_mmu_roots; }; struct kvm_vm_stat { diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 0cb0c26939dfc..0f871e36394da 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -170,11 +170,6 @@ module_param(dbg, bool, 0644); #define PT64_PERM_MASK (PT_PRESENT_MASK | PT_WRITABLE_MASK | shadow_user_mask \ | shadow_x_mask | shadow_nx_mask | shadow_me_mask) -#define ACC_EXEC_MASK 1 -#define ACC_WRITE_MASK PT_WRITABLE_MASK -#define ACC_USER_MASK PT_USER_MASK -#define ACC_ALL (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK) - /* The mask for the R/X bits in EPT PTEs */ #define PT64_EPT_READABLE_MASK 0x1ull #define PT64_EPT_EXECUTABLE_MASK 0x4ull @@ -232,7 +227,7 @@ struct kvm_shadow_walk_iterator { __shadow_walk_next(&(_walker), spte)) static struct kmem_cache *pte_list_desc_cache; -static struct kmem_cache *mmu_page_header_cache; +struct kmem_cache *mmu_page_header_cache; static struct percpu_counter kvm_total_used_mmu_pages; static u64 __read_mostly shadow_nx_mask; @@ -3597,10 +3592,14 @@ static void mmu_free_root_page(struct kvm *kvm, hpa_t *root_hpa, if (!VALID_PAGE(*root_hpa)) return; - sp = to_shadow_page(*root_hpa & PT64_BASE_ADDR_MASK); - --sp->root_count; - if (!sp->root_count && sp->role.invalid) - kvm_mmu_prepare_zap_page(kvm, sp, invalid_list); + if (is_tdp_mmu_root(kvm, *root_hpa)) { + kvm_tdp_mmu_put_root_hpa(kvm, *root_hpa); + } else { + sp = to_shadow_page(*root_hpa & PT64_BASE_ADDR_MASK); + --sp->root_count; + if (!sp->root_count && sp->role.invalid) + kvm_mmu_prepare_zap_page(kvm, sp, invalid_list); + } *root_hpa = INVALID_PAGE; } @@ -3691,7 +3690,13 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu) unsigned i; if (shadow_root_level >= PT64_ROOT_4LEVEL) { - root = mmu_alloc_root(vcpu, 0, 0, shadow_root_level, true); + if (vcpu->kvm->arch.tdp_mmu_enabled) { + root = kvm_tdp_mmu_get_vcpu_root_hpa(vcpu); + } else { + root = mmu_alloc_root(vcpu, 0, 0, shadow_root_level, + true); + } + if (!VALID_PAGE(root)) return -ENOSPC; vcpu->arch.mmu->root_hpa = root; diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h index 65bb110847858..530b7d893c7b3 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -41,8 +41,12 @@ struct kvm_mmu_page { /* Number of writes since the last time traversal visited this page. 
*/ atomic_t write_flooding_count; + + bool tdp_mmu_page; }; +extern struct kmem_cache *mmu_page_header_cache; + static inline struct kvm_mmu_page *to_shadow_page(hpa_t shadow_page) { struct page *page = pfn_to_page(shadow_page >> PAGE_SHIFT); @@ -69,6 +73,11 @@ bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm, (((address) >> PT64_LEVEL_SHIFT(level)) & ((1 << PT64_LEVEL_BITS) - 1)) #define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level) +#define ACC_EXEC_MASK 1 +#define ACC_WRITE_MASK PT_WRITABLE_MASK +#define ACC_USER_MASK PT_USER_MASK +#define ACC_ALL (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK) + /* Functions for interpreting SPTEs */ kvm_pfn_t spte_to_pfn(u64 pte); bool is_mmio_spte(u64 spte); diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 8241e18c111e6..cdca829e42040 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -1,5 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 */ +#include "mmu.h" +#include "mmu_internal.h" #include "tdp_mmu.h" static bool __read_mostly tdp_mmu_enabled = true; @@ -25,10 +27,165 @@ void kvm_mmu_init_tdp_mmu(struct kvm *kvm) /* This should not be changed for the lifetime of the VM. */ kvm->arch.tdp_mmu_enabled = true; + + INIT_LIST_HEAD(&kvm->arch.tdp_mmu_roots); } void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm) { if (!kvm->arch.tdp_mmu_enabled) return; + + WARN_ON(!list_empty(&kvm->arch.tdp_mmu_roots)); +} + +#define for_each_tdp_mmu_root(_kvm, _root) \ + list_for_each_entry(_root, &_kvm->arch.tdp_mmu_roots, link) + +bool is_tdp_mmu_root(struct kvm *kvm, hpa_t hpa) +{ + struct kvm_mmu_page *root; + + if (!kvm->arch.tdp_mmu_enabled) + return false; + + root = to_shadow_page(hpa); + + if (WARN_ON(!root)) + return false; + + return root->tdp_mmu_page; +} + +static void free_tdp_mmu_root(struct kvm *kvm, struct kvm_mmu_page *root) +{ + lockdep_assert_held(&kvm->mmu_lock); + + WARN_ON(root->root_count); + WARN_ON(!root->tdp_mmu_page); + + list_del(&root->link); + + free_page((unsigned long)root->spt); + kmem_cache_free(mmu_page_header_cache, root); +} + +static void put_tdp_mmu_root(struct kvm *kvm, struct kvm_mmu_page *root) +{ + lockdep_assert_held(&kvm->mmu_lock); + + root->root_count--; + if (!root->root_count) + free_tdp_mmu_root(kvm, root); +} + +static void get_tdp_mmu_root(struct kvm *kvm, struct kvm_mmu_page *root) +{ + lockdep_assert_held(&kvm->mmu_lock); + WARN_ON(!root->root_count); + + root->root_count++; +} + +void kvm_tdp_mmu_put_root_hpa(struct kvm *kvm, hpa_t root_hpa) +{ + struct kvm_mmu_page *root; + + root = to_shadow_page(root_hpa); + + if (WARN_ON(!root)) + return; + + put_tdp_mmu_root(kvm, root); +} + +static struct kvm_mmu_page *find_tdp_mmu_root_with_role( + struct kvm *kvm, union kvm_mmu_page_role role) +{ + struct kvm_mmu_page *root; + + lockdep_assert_held(&kvm->mmu_lock); + for_each_tdp_mmu_root(kvm, root) { + WARN_ON(!root->root_count); + + if (root->role.word == role.word) + return root; + } + + return NULL; +} + +static struct kvm_mmu_page *alloc_tdp_mmu_root(struct kvm_vcpu *vcpu, + union kvm_mmu_page_role role) +{ + struct kvm_mmu_page *new_root; + struct kvm_mmu_page *root; + + new_root = kvm_mmu_memory_cache_alloc( + &vcpu->arch.mmu_page_header_cache); + new_root->spt = kvm_mmu_memory_cache_alloc( + &vcpu->arch.mmu_shadow_page_cache); + set_page_private(virt_to_page(new_root->spt), (unsigned long)new_root); + + new_root->role.word = role.word; + new_root->root_count = 1; + new_root->gfn = 0; + new_root->tdp_mmu_page = true; + + spin_lock(&vcpu->kvm->mmu_lock); + + /* Check 
that no matching root exists before adding this one. */ + root = find_tdp_mmu_root_with_role(vcpu->kvm, role); + if (root) { + get_tdp_mmu_root(vcpu->kvm, root); + spin_unlock(&vcpu->kvm->mmu_lock); + free_page((unsigned long)new_root->spt); + kmem_cache_free(mmu_page_header_cache, new_root); + return root; + } + + list_add(&new_root->link, &vcpu->kvm->arch.tdp_mmu_roots); + spin_unlock(&vcpu->kvm->mmu_lock); + + return new_root; +} + +static struct kvm_mmu_page *get_tdp_mmu_vcpu_root(struct kvm_vcpu *vcpu) +{ + struct kvm_mmu_page *root; + union kvm_mmu_page_role role; + + role = vcpu->arch.mmu->mmu_role.base; + role.level = vcpu->arch.mmu->shadow_root_level; + role.direct = true; + role.gpte_is_8_bytes = true; + role.access = ACC_ALL; + + spin_lock(&vcpu->kvm->mmu_lock); + + /* Search for an already allocated root with the same role. */ + root = find_tdp_mmu_root_with_role(vcpu->kvm, role); + if (root) { + get_tdp_mmu_root(vcpu->kvm, root); + spin_unlock(&vcpu->kvm->mmu_lock); + return root; + } + + spin_unlock(&vcpu->kvm->mmu_lock); + + /* If there is no appropriate root, allocate one. */ + root = alloc_tdp_mmu_root(vcpu, role); + + return root; +} + +hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu) +{ + struct kvm_mmu_page *root; + + root = get_tdp_mmu_vcpu_root(vcpu); + if (!root) + return INVALID_PAGE; + + return __pa(root->spt); } diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h index dd3764f5a9aa3..9274debffeaa1 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.h +++ b/arch/x86/kvm/mmu/tdp_mmu.h @@ -7,4 +7,9 @@ void kvm_mmu_init_tdp_mmu(struct kvm *kvm); void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm); + +bool is_tdp_mmu_root(struct kvm *kvm, hpa_t root); +hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu); +void kvm_tdp_mmu_put_root_hpa(struct kvm *kvm, hpa_t root_hpa); + #endif /* __KVM_X86_MMU_TDP_MMU_H */ From patchwork Fri Sep 25 21:22:45 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11800787 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DFE0A6CA for ; Fri, 25 Sep 2020 21:23:20 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id BB9BA221EC for ; Fri, 25 Sep 2020 21:23:20 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="nlElI6Hj" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729003AbgIYVXT (ORCPT ); Fri, 25 Sep 2020 17:23:19 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33566 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728964AbgIYVXR (ORCPT ); Fri, 25 Sep 2020 17:23:17 -0400 Received: from mail-pg1-x54a.google.com (mail-pg1-x54a.google.com [IPv6:2607:f8b0:4864:20::54a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B2972C0613CE for ; Fri, 25 Sep 2020 14:23:17 -0700 (PDT) Received: by mail-pg1-x54a.google.com with SMTP id s2so3259665pgm.18 for ; Fri, 25 Sep 2020 14:23:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=sender:date:in-reply-to:message-id:mime-version:references:subject :from:to:cc; bh=67kS+AlksdlFy819QhjV7LatCvAcvKBj03i0x5Ti9mo=; b=nlElI6Hj4N5hLTPXKvZzwMiIkg/iRt2P6oTdO3b8Ss1Pa1sBWFjPH4djskRIS5JWWb 
Date: Fri, 25 Sep 2020 14:22:45 -0700
In-Reply-To: <20200925212302.3979661-1-bgardon@google.com>
Message-Id: <20200925212302.3979661-6-bgardon@google.com>
References: <20200925212302.3979661-1-bgardon@google.com>
Subject: [PATCH 05/22] kvm: mmu: Add functions to handle changed TDP SPTEs
From: Ben Gardon
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Cannon Matthews , Paolo Bonzini , Peter Xu , Sean Christopherson , Peter Shier , Peter Feiner , Junaid Shahid , Jim Mattson , Yulei Zhang , Wanpeng Li , Vitaly Kuznetsov , Xiao Guangrong , Ben Gardon

The existing bookkeeping done by KVM when a PTE is changed is spread around several functions. This makes it difficult to remember all the stats, bitmaps, and other subsystems that need to be updated whenever a PTE is modified. When a non-leaf PTE is marked non-present or becomes a leaf PTE, page table memory must also be freed. To simplify the MMU and facilitate the use of atomic operations on SPTEs in future patches, create functions to handle some of the bookkeeping required as a result of a change. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures.
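As a rough sketch of how such a helper is meant to be used (illustrative only; example_tdp_mmu_set_spte() is hypothetical and not part of this patch, only handle_changed_spte() in the diff below is), every raw SPTE write is paired with one bookkeeping call:

        /*
         * Illustrative only: pair each SPTE write with the bookkeeping
         * helper so dirty tracking, child page table freeing, and TLB
         * flush decisions stay consistent. The wrapper name is
         * hypothetical.
         */
        static void example_tdp_mmu_set_spte(struct kvm *kvm, int as_id,
                                             gfn_t gfn, u64 *sptep,
                                             u64 new_spte, int level)
        {
                u64 old_spte = READ_ONCE(*sptep);

                WRITE_ONCE(*sptep, new_spte);

                /* Update stats, free removed subtrees, mark pages dirty. */
                handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte,
                                    level);
        }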
This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon --- arch/x86/kvm/mmu/mmu.c | 6 +- arch/x86/kvm/mmu/mmu_internal.h | 3 + arch/x86/kvm/mmu/tdp_mmu.c | 105 ++++++++++++++++++++++++++++++++ 3 files changed, 111 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 0f871e36394da..f09081f9137b0 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -310,8 +310,8 @@ static void kvm_flush_remote_tlbs_with_range(struct kvm *kvm, kvm_flush_remote_tlbs(kvm); } -static void kvm_flush_remote_tlbs_with_address(struct kvm *kvm, - u64 start_gfn, u64 pages) +void kvm_flush_remote_tlbs_with_address(struct kvm *kvm, u64 start_gfn, + u64 pages) { struct kvm_tlb_range range; @@ -819,7 +819,7 @@ static bool is_accessed_spte(u64 spte) : !is_access_track_spte(spte); } -static bool is_dirty_spte(u64 spte) +bool is_dirty_spte(u64 spte) { u64 dirty_mask = spte_shadow_dirty_mask(spte); diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h index 530b7d893c7b3..ff1fe0e04fba5 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -83,5 +83,8 @@ kvm_pfn_t spte_to_pfn(u64 pte); bool is_mmio_spte(u64 spte); int is_shadow_present_pte(u64 pte); int is_last_spte(u64 pte, int level); +bool is_dirty_spte(u64 spte); +void kvm_flush_remote_tlbs_with_address(struct kvm *kvm, u64 start_gfn, + u64 pages); #endif /* __KVM_X86_MMU_INTERNAL_H */ diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index cdca829e42040..653507773b42c 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -189,3 +189,108 @@ hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu) return __pa(root->spt); } + +static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn, + u64 old_spte, u64 new_spte, int level); + +/** + * handle_changed_spte - handle bookkeeping associated with an SPTE change + * @kvm: kvm instance + * @as_id: the address space of the paging structure the SPTE was a part of + * @gfn: the base GFN that was mapped by the SPTE + * @old_spte: The value of the SPTE before the change + * @new_spte: The value of the SPTE after the change + * @level: the level of the PT the SPTE is part of in the paging structure + * + * Handle bookkeeping that might result from the modification of a SPTE. + * This function must be called for all TDP SPTE modifications. + */ +static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn, + u64 old_spte, u64 new_spte, int level) +{ + bool was_present = is_shadow_present_pte(old_spte); + bool is_present = is_shadow_present_pte(new_spte); + bool was_leaf = was_present && is_last_spte(old_spte, level); + bool is_leaf = is_present && is_last_spte(new_spte, level); + bool pfn_changed = spte_to_pfn(old_spte) != spte_to_pfn(new_spte); + u64 *pt; + u64 old_child_spte; + int i; + + WARN_ON(level > PT64_ROOT_MAX_LEVEL); + WARN_ON(level < PG_LEVEL_4K); + WARN_ON(gfn % KVM_PAGES_PER_HPAGE(level)); + + /* + * If this warning were to trigger it would indicate that there was a + * missing MMU notifier or a race with some notifier handler. + * A present, leaf SPTE should never be directly replaced with another + * present leaf SPTE pointing to a differnt PFN. A notifier handler + * should be zapping the SPTE before the main MM's page table is + * changed, or the SPTE should be zeroed, and the TLBs flushed by the + * thread before replacement. 
+ */ + if (was_leaf && is_leaf && pfn_changed) { + pr_err("Invalid SPTE change: cannot replace a present leaf\n" + "SPTE with another present leaf SPTE mapping a\n" + "different PFN!\n" + "as_id: %d gfn: %llx old_spte: %llx new_spte: %llx level: %d", + as_id, gfn, old_spte, new_spte, level); + + /* + * Crash the host to prevent error propagation and guest data + * courruption. + */ + BUG(); + } + + if (old_spte == new_spte) + return; + + /* + * The only times a SPTE should be changed from a non-present to + * non-present state is when an MMIO entry is installed/modified/ + * removed. In that case, there is nothing to do here. + */ + if (!was_present && !is_present) { + /* + * If this change does not involve a MMIO SPTE, it is + * unexpected. Log the change, though it should not impact the + * guest since both the former and current SPTEs are nonpresent. + */ + if (WARN_ON(!is_mmio_spte(old_spte) && !is_mmio_spte(new_spte))) + pr_err("Unexpected SPTE change! Nonpresent SPTEs\n" + "should not be replaced with another,\n" + "different nonpresent SPTE, unless one or both\n" + "are MMIO SPTEs.\n" + "as_id: %d gfn: %llx old_spte: %llx new_spte: %llx level: %d", + as_id, gfn, old_spte, new_spte, level); + return; + } + + + if (was_leaf && is_dirty_spte(old_spte) && + (!is_dirty_spte(new_spte) || pfn_changed)) + kvm_set_pfn_dirty(spte_to_pfn(old_spte)); + + /* + * Recursively handle child PTs if the change removed a subtree from + * the paging structure. + */ + if (was_present && !was_leaf && (pfn_changed || !is_present)) { + pt = spte_to_child_pt(old_spte, level); + + for (i = 0; i < PT64_ENT_PER_PAGE; i++) { + old_child_spte = *(pt + i); + *(pt + i) = 0; + handle_changed_spte(kvm, as_id, + gfn + (i * KVM_PAGES_PER_HPAGE(level - 1)), + old_child_spte, 0, level - 1); + } + + kvm_flush_remote_tlbs_with_address(kvm, gfn, + KVM_PAGES_PER_HPAGE(level)); + + free_page((unsigned long)pt); + } +} From patchwork Fri Sep 25 21:22:46 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11800789 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1ABDC6CA for ; Fri, 25 Sep 2020 21:23:25 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id F105D21D42 for ; Fri, 25 Sep 2020 21:23:24 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="r5uHfrvW" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729047AbgIYVXX (ORCPT ); Fri, 25 Sep 2020 17:23:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33576 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729008AbgIYVXU (ORCPT ); Fri, 25 Sep 2020 17:23:20 -0400 Received: from mail-yb1-xb49.google.com (mail-yb1-xb49.google.com [IPv6:2607:f8b0:4864:20::b49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CCC01C0613CE for ; Fri, 25 Sep 2020 14:23:19 -0700 (PDT) Received: by mail-yb1-xb49.google.com with SMTP id q2so3831452ybo.5 for ; Fri, 25 Sep 2020 14:23:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=sender:date:in-reply-to:message-id:mime-version:references:subject :from:to:cc; bh=wImq4f56uZGGumWCDNNBjGMOp7yniJ1YSEUaSsZ8S1E=; 
Date: Fri, 25 Sep 2020 14:22:46 -0700
In-Reply-To: <20200925212302.3979661-1-bgardon@google.com>
Message-Id: <20200925212302.3979661-7-bgardon@google.com>
References: <20200925212302.3979661-1-bgardon@google.com>
Subject: [PATCH 06/22] kvm: mmu: Make address space ID a property of memslots
From: Ben Gardon
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Cannon Matthews , Paolo Bonzini , Peter Xu , Sean Christopherson , Peter Shier , Peter Feiner , Junaid Shahid , Jim Mattson , Yulei Zhang , Wanpeng Li , Vitaly Kuznetsov , Xiao Guangrong , Ben Gardon

Save address space ID as a field in each memslot so that functions that do not use rmaps (which implicitly encode the id) can handle multiple address spaces correctly. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures.
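As a small sketch of the kind of consumer this enables (the helper below is hypothetical; the patch itself only adds and populates the as_id field), code that holds only a memslot can now select the matching address space directly:

        /*
         * Hypothetical consumer: resolve the memslot's address space
         * without relying on the rmap-based call path to carry that
         * information.
         */
        static struct kvm_memslots *example_slot_memslots(struct kvm *kvm,
                                                struct kvm_memory_slot *slot)
        {
                return __kvm_memslots(kvm, slot->as_id);
        }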
This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon --- include/linux/kvm_host.h | 1 + virt/kvm/kvm_main.c | 1 + 2 files changed, 2 insertions(+) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 05e3c2fb3ef78..a460bc712a81c 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -345,6 +345,7 @@ struct kvm_memory_slot { struct kvm_arch_memory_slot arch; unsigned long userspace_addr; u32 flags; + int as_id; short id; }; diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index cf88233b819a0..f9c80351c9efd 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1318,6 +1318,7 @@ int __kvm_set_memory_region(struct kvm *kvm, new.npages = mem->memory_size >> PAGE_SHIFT; new.flags = mem->flags; new.userspace_addr = mem->userspace_addr; + new.as_id = as_id; if (new.npages > KVM_MEM_MAX_NR_PAGES) return -EINVAL; From patchwork Fri Sep 25 21:22:47 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11800821 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8AAA56CA for ; Fri, 25 Sep 2020 21:24:55 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 641C621D7F for ; Fri, 25 Sep 2020 21:24:55 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="FojbvIec" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728974AbgIYVYw (ORCPT ); Fri, 25 Sep 2020 17:24:52 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33584 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729018AbgIYVXV (ORCPT ); Fri, 25 Sep 2020 17:23:21 -0400 Received: from mail-pj1-x1049.google.com (mail-pj1-x1049.google.com [IPv6:2607:f8b0:4864:20::1049]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4C5EAC0613D7 for ; Fri, 25 Sep 2020 14:23:21 -0700 (PDT) Received: by mail-pj1-x1049.google.com with SMTP id y7so284011pjt.1 for ; Fri, 25 Sep 2020 14:23:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=sender:date:in-reply-to:message-id:mime-version:references:subject :from:to:cc; bh=Ef4AIuZeb4s5cy019ZEPLkLaZqhCMC47Todu9wHoSxM=; b=FojbvIec/7pxWbOTa2bolog20/DFD50uqk2zpDc237fvki3tjTgMOPwmWU0ogx9E8u dJnYbU93k/nYnLjDSJuApQrffWNitNCBap0NRMMZ8ZzO6xLd9ReQIuQjMbu7f6E5BeUJ +Cdd16WUgsWOqJHy4qTsBczm+ACXL0bi5LcdQm5m0KExZmFEIPibPw2/qBEtqmuGrIsk EUIHjjA+H2DIvXYujIV4mMxJjfVWbDywL59qhl0YSos/Q8K/utJJfINqwWV7zNM55upX 9d4tFZPqgElsybxx9Snz/Pzoz8TkM5+me78i2YhE2AuyAAL+C4D3BJBpAbCRarbvJGjv bcJw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=Ef4AIuZeb4s5cy019ZEPLkLaZqhCMC47Todu9wHoSxM=; b=iUB0AfI5GbGnhXs6c68id80QVz5UGwR2Bo2g4MCvRYr2uoyaF3UPWPBhnb1O5/9tqc pVrlzCQ+AeaH3SwW7c66q/AUbdePqdH1gIVSgOUthiAvD2UNJCW7ycr/vOG0Xt+Y8eGs 5RciZ5TT3701dFuXUN8p8NSj6xb87Id9NNhP7G4GSpDYJ9NlYEH1cxzUXPAzryFhiF52 vuQB7eBfjbEPcPiTEJGqHORT+n4FlHGAtQTG7Uf/hCLmGBjvmQ9FgnxcUEAP8A+dK5jH 1XBSNkY2b17y782dxue3oq385E3bdeq/kMosCXY1JC2RLrRo25dMsU6hsJfZZxX9KLdC ceoQ== X-Gm-Message-State: AOAM531JwtnrS4xngCuP44T7oko3v6WeCKQhARe0HkgGTVevXpR8uiAE 
B4VDcCZrm/ExmJOoHaqlTvIUBRBlr3jY X-Google-Smtp-Source: ABdhPJxsYfhwAkNKqclJ88sk/OUcrg9MNbwQXHLfdFz7XI+Krkf1m/AZcFQ+krwqDaoc2peP3f5vtsok44uG Sender: "bgardon via sendgmr" X-Received: from bgardon.sea.corp.google.com ([2620:15c:100:202:f693:9fff:fef4:a293]) (user=bgardon job=sendgmr) by 2002:a17:902:b713:b029:d2:6153:fb62 with SMTP id d19-20020a170902b713b02900d26153fb62mr1280963pls.28.1601069000829; Fri, 25 Sep 2020 14:23:20 -0700 (PDT) Date: Fri, 25 Sep 2020 14:22:47 -0700 In-Reply-To: <20200925212302.3979661-1-bgardon@google.com> Message-Id: <20200925212302.3979661-8-bgardon@google.com> Mime-Version: 1.0 References: <20200925212302.3979661-1-bgardon@google.com> X-Mailer: git-send-email 2.28.0.709.gb0816b6eb0-goog Subject: [PATCH 07/22] kvm: mmu: Support zapping SPTEs in the TDP MMU From: Ben Gardon To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: Cannon Matthews , Paolo Bonzini , Peter Xu , Sean Christopherson , Peter Shier , Peter Feiner , Junaid Shahid , Jim Mattson , Yulei Zhang , Wanpeng Li , Vitaly Kuznetsov , Xiao Guangrong , Ben Gardon Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Add functions to zap SPTEs to the TDP MMU. These are needed to tear down TDP MMU roots properly and implement other MMU functions which require tearing down mappings. Future patches will add functions to populate the page tables, but as for this patch there will not be any work for these functions to do. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon --- arch/x86/kvm/mmu/mmu.c | 15 +++++ arch/x86/kvm/mmu/tdp_iter.c | 17 ++++++ arch/x86/kvm/mmu/tdp_iter.h | 1 + arch/x86/kvm/mmu/tdp_mmu.c | 106 ++++++++++++++++++++++++++++++++++++ arch/x86/kvm/mmu/tdp_mmu.h | 2 + 5 files changed, 141 insertions(+) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index f09081f9137b0..7a17cca19b0c1 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -5852,6 +5852,10 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm) kvm_reload_remote_mmus(kvm); kvm_zap_obsolete_pages(kvm); + + if (kvm->arch.tdp_mmu_enabled) + kvm_tdp_mmu_zap_all(kvm); + spin_unlock(&kvm->mmu_lock); } @@ -5892,6 +5896,7 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end) struct kvm_memslots *slots; struct kvm_memory_slot *memslot; int i; + bool flush; spin_lock(&kvm->mmu_lock); for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) { @@ -5911,6 +5916,12 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end) } } + if (kvm->arch.tdp_mmu_enabled) { + flush = kvm_tdp_mmu_zap_gfn_range(kvm, gfn_start, gfn_end); + if (flush) + kvm_flush_remote_tlbs(kvm); + } + spin_unlock(&kvm->mmu_lock); } @@ -6077,6 +6088,10 @@ void kvm_mmu_zap_all(struct kvm *kvm) } kvm_mmu_commit_zap_page(kvm, &invalid_list); + + if (kvm->arch.tdp_mmu_enabled) + kvm_tdp_mmu_zap_all(kvm); + spin_unlock(&kvm->mmu_lock); } diff --git a/arch/x86/kvm/mmu/tdp_iter.c b/arch/x86/kvm/mmu/tdp_iter.c index ee90d62d2a9b1..6c1a38429c81a 100644 --- a/arch/x86/kvm/mmu/tdp_iter.c +++ b/arch/x86/kvm/mmu/tdp_iter.c @@ -161,3 +161,20 @@ void tdp_iter_next(struct tdp_iter *iter) done = try_step_side(iter); } } + +/* + * Restart the walk over the paging structure from the root, starting from the + * highest gfn the iterator had previously reached. 
Assumes that the entire + * paging structure, except the root page, may have been completely torn down + * and rebuilt. + */ +void tdp_iter_refresh_walk(struct tdp_iter *iter) +{ + gfn_t goal_gfn = iter->goal_gfn; + + if (iter->gfn > goal_gfn) + goal_gfn = iter->gfn; + + tdp_iter_start(iter, iter->pt_path[iter->root_level - 1], + iter->root_level, goal_gfn); +} diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h index b102109778eac..34da3bdada436 100644 --- a/arch/x86/kvm/mmu/tdp_iter.h +++ b/arch/x86/kvm/mmu/tdp_iter.h @@ -49,5 +49,6 @@ u64 *spte_to_child_pt(u64 pte, int level); void tdp_iter_start(struct tdp_iter *iter, u64 *root_pt, int root_level, gfn_t goal_gfn); void tdp_iter_next(struct tdp_iter *iter); +void tdp_iter_refresh_walk(struct tdp_iter *iter); #endif /* __KVM_X86_MMU_TDP_ITER_H */ diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 653507773b42c..d96fc182c8497 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -2,6 +2,7 @@ #include "mmu.h" #include "mmu_internal.h" +#include "tdp_iter.h" #include "tdp_mmu.h" static bool __read_mostly tdp_mmu_enabled = true; @@ -57,8 +58,13 @@ bool is_tdp_mmu_root(struct kvm *kvm, hpa_t hpa) return root->tdp_mmu_page; } +static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root, + gfn_t start, gfn_t end); + static void free_tdp_mmu_root(struct kvm *kvm, struct kvm_mmu_page *root) { + gfn_t max_gfn = 1ULL << (boot_cpu_data.x86_phys_bits - PAGE_SHIFT); + lockdep_assert_held(&kvm->mmu_lock); WARN_ON(root->root_count); @@ -66,6 +72,8 @@ static void free_tdp_mmu_root(struct kvm *kvm, struct kvm_mmu_page *root) list_del(&root->link); + zap_gfn_range(kvm, root, 0, max_gfn); + free_page((unsigned long)root->spt); kmem_cache_free(mmu_page_header_cache, root); } @@ -193,6 +201,11 @@ hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu) static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn, u64 old_spte, u64 new_spte, int level); +static int kvm_mmu_page_as_id(struct kvm_mmu_page *sp) +{ + return sp->role.smm ? 1 : 0; +} + /** * handle_changed_spte - handle bookkeeping associated with an SPTE change * @kvm: kvm instance @@ -294,3 +307,96 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn, free_page((unsigned long)pt); } } + +#define for_each_tdp_pte_root(_iter, _root, _start, _end) \ + for_each_tdp_pte(_iter, _root->spt, _root->role.level, _start, _end) + +/* + * If the MMU lock is contended or this thread needs to yield, flushes + * the TLBs, releases, the MMU lock, yields, reacquires the MMU lock, + * restarts the tdp_iter's walk from the root, and returns true. + * If no yield is needed, returns false. + */ +static bool tdp_mmu_iter_cond_resched(struct kvm *kvm, struct tdp_iter *iter) +{ + if (need_resched() || spin_needbreak(&kvm->mmu_lock)) { + kvm_flush_remote_tlbs(kvm); + cond_resched_lock(&kvm->mmu_lock); + tdp_iter_refresh_walk(iter); + return true; + } else { + return false; + } +} + +/* + * Tears down the mappings for the range of gfns, [start, end), and frees the + * non-root pages mapping GFNs strictly within that range. Returns true if + * SPTEs have been cleared and a TLB flush is needed before releasing the + * MMU lock. 
+ */ +static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root, + gfn_t start, gfn_t end) +{ + struct tdp_iter iter; + bool flush_needed = false; + int as_id = kvm_mmu_page_as_id(root); + + for_each_tdp_pte_root(iter, root, start, end) { + if (!is_shadow_present_pte(iter.old_spte)) + continue; + + /* + * If this is a non-last-level SPTE that covers a larger range + * than should be zapped, continue, and zap the mappings at a + * lower level. + */ + if ((iter.gfn < start || + iter.gfn + KVM_PAGES_PER_HPAGE(iter.level) > end) && + !is_last_spte(iter.old_spte, iter.level)) + continue; + + *iter.sptep = 0; + handle_changed_spte(kvm, as_id, iter.gfn, iter.old_spte, 0, + iter.level); + + flush_needed = !tdp_mmu_iter_cond_resched(kvm, &iter); + } + return flush_needed; +} + +/* + * Tears down the mappings for the range of gfns, [start, end), and frees the + * non-root pages mapping GFNs strictly within that range. Returns true if + * SPTEs have been cleared and a TLB flush is needed before releasing the + * MMU lock. + */ +bool kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, gfn_t start, gfn_t end) +{ + struct kvm_mmu_page *root; + bool flush = false; + + for_each_tdp_mmu_root(kvm, root) { + /* + * Take a reference on the root so that it cannot be freed if + * this thread releases the MMU lock and yields in this loop. + */ + get_tdp_mmu_root(kvm, root); + + flush = zap_gfn_range(kvm, root, start, end) || flush; + + put_tdp_mmu_root(kvm, root); + } + + return flush; +} + +void kvm_tdp_mmu_zap_all(struct kvm *kvm) +{ + gfn_t max_gfn = 1ULL << (boot_cpu_data.x86_phys_bits - PAGE_SHIFT); + bool flush; + + flush = kvm_tdp_mmu_zap_gfn_range(kvm, 0, max_gfn); + if (flush) + kvm_flush_remote_tlbs(kvm); +} diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h index 9274debffeaa1..cb86f9fe69017 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.h +++ b/arch/x86/kvm/mmu/tdp_mmu.h @@ -12,4 +12,6 @@ bool is_tdp_mmu_root(struct kvm *kvm, hpa_t root); hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu); void kvm_tdp_mmu_put_root_hpa(struct kvm *kvm, hpa_t root_hpa); +bool kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, gfn_t start, gfn_t end); +void kvm_tdp_mmu_zap_all(struct kvm *kvm); #endif /* __KVM_X86_MMU_TDP_MMU_H */ From patchwork Fri Sep 25 21:22:48 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11800815 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C2EB36CA for ; Fri, 25 Sep 2020 21:24:30 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A54B721D42 for ; Fri, 25 Sep 2020 21:24:30 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="uRtx0Sr9" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729186AbgIYVXf (ORCPT ); Fri, 25 Sep 2020 17:23:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33576 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728892AbgIYVXX (ORCPT ); Fri, 25 Sep 2020 17:23:23 -0400 Received: from mail-pf1-x449.google.com (mail-pf1-x449.google.com [IPv6:2607:f8b0:4864:20::449]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 40E90C0613D3 for ; Fri, 25 Sep 2020 14:23:23 -0700 (PDT) Received: by mail-pf1-x449.google.com with SMTP 
Date: Fri, 25 Sep 2020 14:22:48 -0700
In-Reply-To: <20200925212302.3979661-1-bgardon@google.com>
Message-Id: <20200925212302.3979661-9-bgardon@google.com>
Subject: [PATCH 08/22] kvm: mmu: Separate making non-leaf sptes from link_shadow_page
From: Ben Gardon
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Cannon Matthews , Paolo Bonzini , Peter Xu , Sean Christopherson , Peter Shier , Peter Feiner , Junaid Shahid , Jim Mattson , Yulei Zhang , Wanpeng Li , Vitaly Kuznetsov , Xiao Guangrong , Ben Gardon

The TDP MMU page fault handler will need to be able to create non-leaf
SPTEs to build up the paging structures. Rather than re-implementing the
function, factor the SPTE creation out of link_shadow_page.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.
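A minimal user-space sketch of the refactoring pattern (not the KVM code;
the toy_* names and bit layout are invented here): the value computation
is split into a pure helper so a second caller can reuse it without
duplicating the bit layout, which is what the change below does by
pulling make_nonleaf_spte out of link_shadow_page.

#include <stdint.h>
#include <stdio.h>

#define TOY_PRESENT	(1ull << 0)
#define TOY_WRITABLE	(1ull << 1)
#define TOY_ACCESSED	(1ull << 5)

/* Pure helper: build a non-leaf entry from the child table's address. */
static uint64_t toy_make_nonleaf_entry(uint64_t child_pa, int ad_disabled)
{
	uint64_t e = child_pa | TOY_PRESENT | TOY_WRITABLE;

	if (!ad_disabled)
		e |= TOY_ACCESSED;
	return e;
}

/* Original-style caller: builds the entry and installs it in the parent. */
static void toy_link_table(uint64_t *parent_entry, uint64_t child_pa)
{
	*parent_entry = toy_make_nonleaf_entry(child_pa, 0);
}

int main(void)
{
	uint64_t parent = 0;

	toy_link_table(&parent, 0x5000);
	printf("installed entry:  0x%llx\n", (unsigned long long)parent);

	/*
	 * A second fault-handling path can now call the builder directly
	 * without duplicating the bit layout.
	 */
	printf("standalone entry: 0x%llx\n",
	       (unsigned long long)toy_make_nonleaf_entry(0x6000, 1));
	return 0;
}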
This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon --- arch/x86/kvm/mmu/mmu.c | 21 +++++++++++++++------ 1 file changed, 15 insertions(+), 6 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 7a17cca19b0c1..6344e7863a0f5 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -2555,21 +2555,30 @@ static void shadow_walk_next(struct kvm_shadow_walk_iterator *iterator) __shadow_walk_next(iterator, *iterator->sptep); } -static void link_shadow_page(struct kvm_vcpu *vcpu, u64 *sptep, - struct kvm_mmu_page *sp) +static u64 make_nonleaf_spte(u64 *child_pt, bool ad_disabled) { u64 spte; - BUILD_BUG_ON(VMX_EPT_WRITABLE_MASK != PT_WRITABLE_MASK); - - spte = __pa(sp->spt) | shadow_present_mask | PT_WRITABLE_MASK | + spte = __pa(child_pt) | shadow_present_mask | PT_WRITABLE_MASK | shadow_user_mask | shadow_x_mask | shadow_me_mask; - if (sp_ad_disabled(sp)) + if (ad_disabled) spte |= SPTE_AD_DISABLED_MASK; else spte |= shadow_accessed_mask; + return spte; +} + +static void link_shadow_page(struct kvm_vcpu *vcpu, u64 *sptep, + struct kvm_mmu_page *sp) +{ + u64 spte; + + BUILD_BUG_ON(VMX_EPT_WRITABLE_MASK != PT_WRITABLE_MASK); + + spte = make_nonleaf_spte(sp->spt, sp_ad_disabled(sp)); + mmu_spte_set(sptep, spte); mmu_page_add_parent_pte(vcpu, sp, sptep); From patchwork Fri Sep 25 21:22:49 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11800791 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C1B1E112E for ; Fri, 25 Sep 2020 21:23:28 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A401021D42 for ; Fri, 25 Sep 2020 21:23:28 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="hUkuJGcG" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729106AbgIYVX0 (ORCPT ); Fri, 25 Sep 2020 17:23:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33604 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729072AbgIYVXZ (ORCPT ); Fri, 25 Sep 2020 17:23:25 -0400 Received: from mail-pf1-x44a.google.com (mail-pf1-x44a.google.com [IPv6:2607:f8b0:4864:20::44a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 003A0C0613DA for ; Fri, 25 Sep 2020 14:23:24 -0700 (PDT) Received: by mail-pf1-x44a.google.com with SMTP id 135so3437240pfu.9 for ; Fri, 25 Sep 2020 14:23:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=sender:date:in-reply-to:message-id:mime-version:references:subject :from:to:cc; bh=jlWnK0UatliDnGlUAK2d+RzZSNRxAzvcnq5xv9zblSM=; b=hUkuJGcGGOWXCuh3cgUdpzlmwWsVSOt/fpbdCKnS3R+E2rmbHpsFzpSNxlPgdkIv8q saKo1DDwM6Gw+SxtWWqGIo+9kwLa/eJc/yR4DKBw0tsuwriRlUz7OPgJO++uvlQtEDsG Lam4xfAZf6CqFaEsTaGSAmqVMgAuPmEr6kl4aYUIsqtnPM7cn6zDxKu6S1z/fJ7zgQPI QbpGj0sef5b15xuSxFDmID4GG4W7J9XbYip+GMGZGrEyG9kb2pFqWA50eSPeqziO8iD0 /4hHeaUrcUi7lYesG6oPInO4ne9seHVstyE4gBV+SlJKAtoj+oEoEm36KETaZ2G3HvwY wzRw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=jlWnK0UatliDnGlUAK2d+RzZSNRxAzvcnq5xv9zblSM=; 
b=YpLyWLG2v1A22wHxMWOmAyCxa/oaWpVIhIfgOGwAbxTBQygZNYEszvERSbmuQBpgiJ 0IRId1ZK/xs21jJhFkL6U5ynPbbl4w+TIsFrSZb1BcbOtE6iaKtVNrWhjMJFIc7S6oDw U3jZ3qevvR7Yqu9RlYhqF51K1QMQ7OPn8EcXUiZI1QxZHGGYVqHyudZcCvK+ybxy7oEl kwDZMhNB8UcIfm1/SHBtVZnxog41ZK/sgf+SYcEvVaj86FL4RnQGf+XIFhhC9j56c+aU 8mg0JVna6A313lUubbNuy5igcE62wDA91D4G6YRpnlqpPnWD6XTaQL0JyPnH2gAb8LBT 4RJg== X-Gm-Message-State: AOAM531hRmZCSb9MSUZUMZJhnQTFhfCrIEZxXMkxeAe0xjZP9ja8gVjX xhhbelbFMjfkf/8+PjsswZg80dNP5JOH X-Google-Smtp-Source: ABdhPJxfPYyWGh6qxdLaDoqeK4fs5yqHKNRSUhi7+OavHqU4nQQy1q4RoM9BrE2CIUO80pEUH4i8c63S349B Sender: "bgardon via sendgmr" X-Received: from bgardon.sea.corp.google.com ([2620:15c:100:202:f693:9fff:fef4:a293]) (user=bgardon job=sendgmr) by 2002:a17:902:bc8a:b029:d2:2a0b:f09e with SMTP id bb10-20020a170902bc8ab02900d22a0bf09emr1305261plb.33.1601069004450; Fri, 25 Sep 2020 14:23:24 -0700 (PDT) Date: Fri, 25 Sep 2020 14:22:49 -0700 In-Reply-To: <20200925212302.3979661-1-bgardon@google.com> Message-Id: <20200925212302.3979661-10-bgardon@google.com> Mime-Version: 1.0 References: <20200925212302.3979661-1-bgardon@google.com> X-Mailer: git-send-email 2.28.0.709.gb0816b6eb0-goog Subject: [PATCH 09/22] kvm: mmu: Remove disallowed_hugepage_adjust shadow_walk_iterator arg From: Ben Gardon To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: Cannon Matthews , Paolo Bonzini , Peter Xu , Sean Christopherson , Peter Shier , Peter Feiner , Junaid Shahid , Jim Mattson , Yulei Zhang , Wanpeng Li , Vitaly Kuznetsov , Xiao Guangrong , Ben Gardon Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org In order to avoid creating executable hugepages in the TDP MMU PF handler, remove the dependency between disallowed_hugepage_adjust and the shadow_walk_iterator. This will open the function up to being used by the TDP MMU PF handler in a future patch. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon --- arch/x86/kvm/mmu/mmu.c | 17 +++++++++-------- arch/x86/kvm/mmu/paging_tmpl.h | 3 ++- 2 files changed, 11 insertions(+), 9 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 6344e7863a0f5..f6e6fc9959c04 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3295,13 +3295,12 @@ static int kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, gfn_t gfn, return level; } -static void disallowed_hugepage_adjust(struct kvm_shadow_walk_iterator it, - gfn_t gfn, kvm_pfn_t *pfnp, int *levelp) +static void disallowed_hugepage_adjust(u64 spte, gfn_t gfn, int cur_level, + kvm_pfn_t *pfnp, int *goal_levelp) { - int level = *levelp; - u64 spte = *it.sptep; + int goal_level = *goal_levelp; - if (it.level == level && level > PG_LEVEL_4K && + if (cur_level == goal_level && goal_level > PG_LEVEL_4K && is_nx_huge_page_enabled() && is_shadow_present_pte(spte) && !is_large_pte(spte)) { @@ -3312,9 +3311,10 @@ static void disallowed_hugepage_adjust(struct kvm_shadow_walk_iterator it, * patching back for them into pfn the next 9 bits of * the address. 
*/ - u64 page_mask = KVM_PAGES_PER_HPAGE(level) - KVM_PAGES_PER_HPAGE(level - 1); + u64 page_mask = KVM_PAGES_PER_HPAGE(goal_level) - + KVM_PAGES_PER_HPAGE(goal_level - 1); *pfnp |= gfn & page_mask; - (*levelp)--; + (*goal_levelp)--; } } @@ -3339,7 +3339,8 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, int write, * We cannot overwrite existing page tables with an NX * large page, as the leaf could be executable. */ - disallowed_hugepage_adjust(it, gfn, &pfn, &level); + disallowed_hugepage_adjust(*it.sptep, gfn, it.level, + &pfn, &level); base_gfn = gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1); if (it.level == level) diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index 4dd6b1e5b8cf7..6a8666cb0d24b 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -690,7 +690,8 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr, * We cannot overwrite existing page tables with an NX * large page, as the leaf could be executable. */ - disallowed_hugepage_adjust(it, gw->gfn, &pfn, &hlevel); + disallowed_hugepage_adjust(*it.sptep, gw->gfn, it.level, + &pfn, &hlevel); base_gfn = gw->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1); if (it.level == hlevel) From patchwork Fri Sep 25 21:22:50 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11800819 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6C3FB6CA for ; Fri, 25 Sep 2020 21:24:43 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2978E21D7F for ; Fri, 25 Sep 2020 21:24:43 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="mLl9v8SR" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729217AbgIYVYm (ORCPT ); Fri, 25 Sep 2020 17:24:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33616 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729123AbgIYVX0 (ORCPT ); Fri, 25 Sep 2020 17:23:26 -0400 Received: from mail-pj1-x1049.google.com (mail-pj1-x1049.google.com [IPv6:2607:f8b0:4864:20::1049]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E1B4AC0613D3 for ; Fri, 25 Sep 2020 14:23:26 -0700 (PDT) Received: by mail-pj1-x1049.google.com with SMTP id r1so258082pjp.5 for ; Fri, 25 Sep 2020 14:23:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=sender:date:in-reply-to:message-id:mime-version:references:subject :from:to:cc; bh=UXulVPbiF44P5UU9x1p1VstHMCrG+98/OCotusC7CPo=; b=mLl9v8SRq2DChfgcA/HugGpbD/RpzoGh095opA24xNizabwnO7XZxuPQUb96ngIDes uCLvAPFmKY4dnj4gUGpn5Ny6Gn8ZCIq/8LEH8IZ8tpTFscy5SFT7oMBxEIPsuRPZWWZN NxddiUF/vKAI1UmvKQBawE43/4Y1yvH6GqOECBfvSy9CWOJ2jlNHL/NnOBycBNg9pbPk 09KqPZRVJf8f3SZQtK0xAIp0EZKjU9ZaGJmsjauG2yB573DCkPivLB2xrhEV9/LtX1ro 7mOZATKugtdMEZAYQrQnXE9ByOX5L8+lyvothlF0w6WgCKV9lSPIBOwuoilb1T85vOj2 NOTA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=UXulVPbiF44P5UU9x1p1VstHMCrG+98/OCotusC7CPo=; b=NHqJBk/wSnTPHED9HY6koYBu77DV9LYKa/nfXcPkPBj64qYOqE4FaarO5iBK0ehOgS JHzUxQBPDWECiCIv/tEz9XyhbeVQwnvrO7f1M4kvH5eMlOH8aFNoSOBslzYzCQXj7qoe 
gpHOkxAOaD2U32qSnS7iZaA7MXU569fndGzXAVzMy57iM1cINc7pRL6XyMTWpOpIool4 1AcVAocRP+JkPqcu6vDcvhtsG0ba44F71lzRAhGFO1MblgpIzAuXD284JwQ17HiBL+zp sFeJpp2nIOg7erjOdNuWWwr79+D76yYV0webPGW+NEutQ0URUsHlrOcuq4tHXXIiqn9X Guog== X-Gm-Message-State: AOAM530e3cmlRFOh7uHEsVrFlDDY0/TOyIYGX0qjGCvbjA+GO1wbCy6r j0L4QobwqHIyEbd2Tu62KcRNqW5/7Oux X-Google-Smtp-Source: ABdhPJy5xZyaRSfJpW0pxmioNMPR5q8KeqZ1vKi4+AZXIJg5PolRY8PK+UyD00xi7mhxJ0lspvBTzQumbu2X Sender: "bgardon via sendgmr" X-Received: from bgardon.sea.corp.google.com ([2620:15c:100:202:f693:9fff:fef4:a293]) (user=bgardon job=sendgmr) by 2002:a63:d449:: with SMTP id i9mr648654pgj.83.1601069006244; Fri, 25 Sep 2020 14:23:26 -0700 (PDT) Date: Fri, 25 Sep 2020 14:22:50 -0700 In-Reply-To: <20200925212302.3979661-1-bgardon@google.com> Message-Id: <20200925212302.3979661-11-bgardon@google.com> Mime-Version: 1.0 References: <20200925212302.3979661-1-bgardon@google.com> X-Mailer: git-send-email 2.28.0.709.gb0816b6eb0-goog Subject: [PATCH 10/22] kvm: mmu: Add TDP MMU PF handler From: Ben Gardon To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: Cannon Matthews , Paolo Bonzini , Peter Xu , Sean Christopherson , Peter Shier , Peter Feiner , Junaid Shahid , Jim Mattson , Yulei Zhang , Wanpeng Li , Vitaly Kuznetsov , Xiao Guangrong , Ben Gardon Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Add functions to handle page faults in the TDP MMU. These page faults are currently handled in much the same way as the x86 shadow paging based MMU, however the ordering of some operations is slightly different. Future patches will add eager NX splitting, a fast page fault handler, and parallel page faults. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon --- arch/x86/kvm/mmu/mmu.c | 66 ++++++----------- arch/x86/kvm/mmu/mmu_internal.h | 45 +++++++++++ arch/x86/kvm/mmu/tdp_mmu.c | 127 ++++++++++++++++++++++++++++++++ arch/x86/kvm/mmu/tdp_mmu.h | 4 + 4 files changed, 200 insertions(+), 42 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index f6e6fc9959c04..52d661a758585 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -153,12 +153,6 @@ module_param(dbg, bool, 0644); #else #define PT64_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1)) #endif -#define PT64_LVL_ADDR_MASK(level) \ - (PT64_BASE_ADDR_MASK & ~((1ULL << (PAGE_SHIFT + (((level) - 1) \ - * PT64_LEVEL_BITS))) - 1)) -#define PT64_LVL_OFFSET_MASK(level) \ - (PT64_BASE_ADDR_MASK & ((1ULL << (PAGE_SHIFT + (((level) - 1) \ - * PT64_LEVEL_BITS))) - 1)) #define PT32_BASE_ADDR_MASK PAGE_MASK #define PT32_DIR_BASE_ADDR_MASK \ @@ -182,20 +176,6 @@ module_param(dbg, bool, 0644); /* make pte_list_desc fit well in cache line */ #define PTE_LIST_EXT 3 -/* - * Return values of handle_mmio_page_fault and mmu.page_fault: - * RET_PF_RETRY: let CPU fault again on the address. - * RET_PF_EMULATE: mmio page fault, emulate the instruction directly. - * - * For handle_mmio_page_fault only: - * RET_PF_INVALID: the spte is invalid, let the real page fault path update it. 
- */ -enum { - RET_PF_RETRY = 0, - RET_PF_EMULATE = 1, - RET_PF_INVALID = 2, -}; - struct pte_list_desc { u64 *sptes[PTE_LIST_EXT]; struct pte_list_desc *more; @@ -233,7 +213,7 @@ static struct percpu_counter kvm_total_used_mmu_pages; static u64 __read_mostly shadow_nx_mask; static u64 __read_mostly shadow_x_mask; /* mutual exclusive with nx_mask */ static u64 __read_mostly shadow_user_mask; -static u64 __read_mostly shadow_accessed_mask; +u64 __read_mostly shadow_accessed_mask; static u64 __read_mostly shadow_dirty_mask; static u64 __read_mostly shadow_mmio_value; static u64 __read_mostly shadow_mmio_access_mask; @@ -364,7 +344,7 @@ static inline bool spte_ad_need_write_protect(u64 spte) return (spte & SPTE_SPECIAL_MASK) != SPTE_AD_ENABLED_MASK; } -static bool is_nx_huge_page_enabled(void) +bool is_nx_huge_page_enabled(void) { return READ_ONCE(nx_huge_pages); } @@ -381,7 +361,7 @@ static inline u64 spte_shadow_dirty_mask(u64 spte) return spte_ad_enabled(spte) ? shadow_dirty_mask : 0; } -static inline bool is_access_track_spte(u64 spte) +inline bool is_access_track_spte(u64 spte) { return !spte_ad_enabled(spte) && (spte & shadow_acc_track_mask) == 0; } @@ -433,7 +413,7 @@ static u64 get_mmio_spte_generation(u64 spte) return gen; } -static u64 make_mmio_spte(struct kvm_vcpu *vcpu, u64 gfn, unsigned int access) +u64 make_mmio_spte(struct kvm_vcpu *vcpu, u64 gfn, unsigned int access) { u64 gen = kvm_vcpu_memslots(vcpu)->generation & MMIO_SPTE_GEN_MASK; @@ -613,7 +593,7 @@ int is_shadow_present_pte(u64 pte) return (pte != 0) && !is_mmio_spte(pte); } -static int is_large_pte(u64 pte) +int is_large_pte(u64 pte) { return pte & PT_PAGE_SIZE_MASK; } @@ -2555,7 +2535,7 @@ static void shadow_walk_next(struct kvm_shadow_walk_iterator *iterator) __shadow_walk_next(iterator, *iterator->sptep); } -static u64 make_nonleaf_spte(u64 *child_pt, bool ad_disabled) +u64 make_nonleaf_spte(u64 *child_pt, bool ad_disabled) { u64 spte; @@ -2961,14 +2941,9 @@ static bool kvm_is_mmio_pfn(kvm_pfn_t pfn) E820_TYPE_RAM); } -/* Bits which may be returned by set_spte() */ -#define SET_SPTE_WRITE_PROTECTED_PT BIT(0) -#define SET_SPTE_NEED_REMOTE_TLB_FLUSH BIT(1) - -static u64 make_spte(struct kvm_vcpu *vcpu, unsigned int pte_access, int level, - gfn_t gfn, kvm_pfn_t pfn, u64 old_spte, bool speculative, - bool can_unsync, bool host_writable, bool ad_disabled, - int *ret) +u64 make_spte(struct kvm_vcpu *vcpu, unsigned int pte_access, int level, + gfn_t gfn, kvm_pfn_t pfn, u64 old_spte, bool speculative, + bool can_unsync, bool host_writable, bool ad_disabled, int *ret) { u64 spte = 0; @@ -3249,8 +3224,8 @@ static int host_pfn_mapping_level(struct kvm_vcpu *vcpu, gfn_t gfn, return level; } -static int kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, gfn_t gfn, - int max_level, kvm_pfn_t *pfnp) +int kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, gfn_t gfn, + int max_level, kvm_pfn_t *pfnp) { struct kvm_memory_slot *slot; struct kvm_lpage_info *linfo; @@ -3295,8 +3270,8 @@ static int kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, gfn_t gfn, return level; } -static void disallowed_hugepage_adjust(u64 spte, gfn_t gfn, int cur_level, - kvm_pfn_t *pfnp, int *goal_levelp) +void disallowed_hugepage_adjust(u64 spte, gfn_t gfn, int cur_level, + kvm_pfn_t *pfnp, int *goal_levelp) { int goal_level = *goal_levelp; @@ -4113,8 +4088,9 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, if (page_fault_handle_page_track(vcpu, error_code, gfn)) return RET_PF_EMULATE; - if (fast_page_fault(vcpu, gpa, error_code)) - 
return RET_PF_RETRY; + if (!is_tdp_mmu_root(vcpu->kvm, vcpu->arch.mmu->root_hpa)) + if (fast_page_fault(vcpu, gpa, error_code)) + return RET_PF_RETRY; r = mmu_topup_memory_caches(vcpu, false); if (r) @@ -4139,8 +4115,14 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, r = make_mmu_pages_available(vcpu); if (r) goto out_unlock; - r = __direct_map(vcpu, gpa, write, map_writable, max_level, pfn, - prefault, is_tdp && lpage_disallowed); + + if (is_tdp_mmu_root(vcpu->kvm, vcpu->arch.mmu->root_hpa)) + r = kvm_tdp_mmu_page_fault(vcpu, write, map_writable, max_level, + gpa, pfn, prefault, + is_tdp && lpage_disallowed); + else + r = __direct_map(vcpu, gpa, write, map_writable, max_level, pfn, + prefault, is_tdp && lpage_disallowed); out_unlock: spin_unlock(&vcpu->kvm->mmu_lock); diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h index ff1fe0e04fba5..4cef9da051847 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -73,6 +73,15 @@ bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm, (((address) >> PT64_LEVEL_SHIFT(level)) & ((1 << PT64_LEVEL_BITS) - 1)) #define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level) +#define PT64_LVL_ADDR_MASK(level) \ + (PT64_BASE_ADDR_MASK & ~((1ULL << (PAGE_SHIFT + (((level) - 1) \ + * PT64_LEVEL_BITS))) - 1)) +#define PT64_LVL_OFFSET_MASK(level) \ + (PT64_BASE_ADDR_MASK & ((1ULL << (PAGE_SHIFT + (((level) - 1) \ + * PT64_LEVEL_BITS))) - 1)) + +extern u64 shadow_accessed_mask; + #define ACC_EXEC_MASK 1 #define ACC_WRITE_MASK PT_WRITABLE_MASK #define ACC_USER_MASK PT_USER_MASK @@ -84,7 +93,43 @@ bool is_mmio_spte(u64 spte); int is_shadow_present_pte(u64 pte); int is_last_spte(u64 pte, int level); bool is_dirty_spte(u64 spte); +int is_large_pte(u64 pte); +bool is_access_track_spte(u64 spte); void kvm_flush_remote_tlbs_with_address(struct kvm *kvm, u64 start_gfn, u64 pages); + +/* + * Return values of handle_mmio_page_fault and mmu.page_fault: + * RET_PF_RETRY: let CPU fault again on the address. + * RET_PF_EMULATE: mmio page fault, emulate the instruction directly. + * + * For handle_mmio_page_fault only: + * RET_PF_INVALID: the spte is invalid, let the real page fault path update it. 
+ */ +enum { + RET_PF_RETRY = 0, + RET_PF_EMULATE = 1, + RET_PF_INVALID = 2, +}; + +/* Bits which may be returned by set_spte() */ +#define SET_SPTE_WRITE_PROTECTED_PT BIT(0) +#define SET_SPTE_NEED_REMOTE_TLB_FLUSH BIT(1) + +u64 make_spte(struct kvm_vcpu *vcpu, unsigned int pte_access, int level, + gfn_t gfn, kvm_pfn_t pfn, u64 old_spte, bool speculative, + bool can_unsync, bool host_writable, bool ad_disabled, int *ret); +u64 make_mmio_spte(struct kvm_vcpu *vcpu, u64 gfn, unsigned int access); +u64 make_nonleaf_spte(u64 *child_pt, bool ad_disabled); + +int kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, gfn_t gfn, + int max_level, kvm_pfn_t *pfnp); +void disallowed_hugepage_adjust(u64 spte, gfn_t gfn, int cur_level, + kvm_pfn_t *pfnp, int *goal_levelp); + +bool is_nx_huge_page_enabled(void); + +void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc); + #endif /* __KVM_X86_MMU_INTERNAL_H */ diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index d96fc182c8497..37bdebc2592ea 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -311,6 +311,10 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn, #define for_each_tdp_pte_root(_iter, _root, _start, _end) \ for_each_tdp_pte(_iter, _root->spt, _root->role.level, _start, _end) +#define for_each_tdp_pte_vcpu(_iter, _vcpu, _start, _end) \ + for_each_tdp_pte(_iter, __va(_vcpu->arch.mmu->root_hpa), \ + _vcpu->arch.mmu->shadow_root_level, _start, _end) + /* * If the MMU lock is contended or this thread needs to yield, flushes * the TLBs, releases, the MMU lock, yields, reacquires the MMU lock, @@ -400,3 +404,126 @@ void kvm_tdp_mmu_zap_all(struct kvm *kvm) if (flush) kvm_flush_remote_tlbs(kvm); } + +/* + * Installs a last-level SPTE to handle a TDP page fault. + * (NPT/EPT violation/misconfiguration) + */ +static int page_fault_handle_target_level(struct kvm_vcpu *vcpu, int write, + int map_writable, int as_id, + struct tdp_iter *iter, + kvm_pfn_t pfn, bool prefault) +{ + u64 new_spte; + int ret = 0; + int make_spte_ret = 0; + + if (unlikely(is_noslot_pfn(pfn))) + new_spte = make_mmio_spte(vcpu, iter->gfn, ACC_ALL); + else + new_spte = make_spte(vcpu, ACC_ALL, iter->level, iter->gfn, + pfn, iter->old_spte, prefault, true, + map_writable, !shadow_accessed_mask, + &make_spte_ret); + + /* + * If the page fault was caused by a write but the page is write + * protected, emulation is needed. If the emulation was skipped, + * the vCPU would have the same fault again. + */ + if ((make_spte_ret & SET_SPTE_WRITE_PROTECTED_PT) && write) + ret = RET_PF_EMULATE; + + /* If a MMIO SPTE is installed, the MMIO will need to be emulated. */ + if (unlikely(is_mmio_spte(new_spte))) + ret = RET_PF_EMULATE; + + *iter->sptep = new_spte; + handle_changed_spte(vcpu->kvm, as_id, iter->gfn, iter->old_spte, + new_spte, iter->level); + + if (!prefault) + vcpu->stat.pf_fixed++; + + return ret; +} + +/* + * Handle a TDP page fault (NPT/EPT violation/misconfiguration) by installing + * page tables and SPTEs to translate the faulting guest physical address. 
+ */ +int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu, int write, int map_writable, + int max_level, gpa_t gpa, kvm_pfn_t pfn, + bool prefault, bool account_disallowed_nx_lpage) +{ + struct tdp_iter iter; + struct kvm_mmu_memory_cache *pf_pt_cache = + &vcpu->arch.mmu_shadow_page_cache; + u64 *child_pt; + u64 new_spte; + int ret; + int as_id = kvm_arch_vcpu_memslots_id(vcpu); + gfn_t gfn = gpa >> PAGE_SHIFT; + int level; + + if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa))) + return RET_PF_RETRY; + + if (WARN_ON(!is_tdp_mmu_root(vcpu->kvm, vcpu->arch.mmu->root_hpa))) + return RET_PF_RETRY; + + level = kvm_mmu_hugepage_adjust(vcpu, gfn, max_level, &pfn); + + for_each_tdp_pte_vcpu(iter, vcpu, gfn, gfn + 1) { + disallowed_hugepage_adjust(iter.old_spte, gfn, iter.level, + &pfn, &level); + + if (iter.level == level) + break; + + /* + * If there is an SPTE mapping a large page at a higher level + * than the target, that SPTE must be cleared and replaced + * with a non-leaf SPTE. + */ + if (is_shadow_present_pte(iter.old_spte) && + is_large_pte(iter.old_spte)) { + *iter.sptep = 0; + handle_changed_spte(vcpu->kvm, as_id, iter.gfn, + iter.old_spte, 0, iter.level); + kvm_flush_remote_tlbs_with_address(vcpu->kvm, iter.gfn, + KVM_PAGES_PER_HPAGE(iter.level)); + + /* + * The iter must explicitly re-read the spte here + * because the new is needed before the next iteration + * of the loop. + */ + iter.old_spte = READ_ONCE(*iter.sptep); + } + + if (!is_shadow_present_pte(iter.old_spte)) { + child_pt = kvm_mmu_memory_cache_alloc(pf_pt_cache); + clear_page(child_pt); + new_spte = make_nonleaf_spte(child_pt, + !shadow_accessed_mask); + + *iter.sptep = new_spte; + handle_changed_spte(vcpu->kvm, as_id, iter.gfn, + iter.old_spte, new_spte, + iter.level); + } + } + + if (WARN_ON(iter.level != level)) + return RET_PF_RETRY; + + ret = page_fault_handle_target_level(vcpu, write, map_writable, + as_id, &iter, pfn, prefault); + + /* If emulating, flush this vcpu's TLB. 
*/ + if (ret == RET_PF_EMULATE) + kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu); + + return ret; +} diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h index cb86f9fe69017..abf23dc0ab7ad 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.h +++ b/arch/x86/kvm/mmu/tdp_mmu.h @@ -14,4 +14,8 @@ void kvm_tdp_mmu_put_root_hpa(struct kvm *kvm, hpa_t root_hpa); bool kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, gfn_t start, gfn_t end); void kvm_tdp_mmu_zap_all(struct kvm *kvm); + +int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu, int write, int map_writable, + int level, gpa_t gpa, kvm_pfn_t pfn, bool prefault, + bool lpage_disallowed); #endif /* __KVM_X86_MMU_TDP_MMU_H */ From patchwork Fri Sep 25 21:22:51 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11800817 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 096C06CA for ; Fri, 25 Sep 2020 21:24:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id DCEA421D42 for ; Fri, 25 Sep 2020 21:24:39 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="n75SNZmX" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729524AbgIYVYg (ORCPT ); Fri, 25 Sep 2020 17:24:36 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33624 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728909AbgIYVX2 (ORCPT ); Fri, 25 Sep 2020 17:23:28 -0400 Received: from mail-pg1-x54a.google.com (mail-pg1-x54a.google.com [IPv6:2607:f8b0:4864:20::54a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AF016C0613D6 for ; Fri, 25 Sep 2020 14:23:28 -0700 (PDT) Received: by mail-pg1-x54a.google.com with SMTP id t128so134946pgb.23 for ; Fri, 25 Sep 2020 14:23:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=sender:date:in-reply-to:message-id:mime-version:references:subject :from:to:cc; bh=qdut5f7Gn8gtFAtWyzdbgJG1DGusgtNnyf9bk1NcfOU=; b=n75SNZmXp+ycob3LZ9+327WCtH9N9zGtSq+mKW0Q3r75hy61EtZRYo6aZvdyEz1o+d dvJmZNWRlk+ad9Sxq1VUpAeP4FPlR4sdTYyd0ExIzO3tKxyASj3UubfDR75DF7Aph+Og SEdioV9D1Wk+dAQH2y54qCXohE4CIHysUtcsBTJ8KFYJ/WZNGTkyVhaCzGXcFOVuYsHv iaASSwpGPExonjwlRSGDzLG36gReqtihq4j2Nps9JC7BKLM000waKIiAlud/Y/+dvFgR IhHXF+cR3J21Vf7RWjoGfatlnVnpGoCl/bWF2mCiwuQMBYH77iujutnGJ1iNqFPY4f/i 1cug== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=qdut5f7Gn8gtFAtWyzdbgJG1DGusgtNnyf9bk1NcfOU=; b=ivxD6oFVRPJ0hPBaNhc/zWdaIC+7lNZKnJnWidutqegS+TNXdCfHgfAybYVCV9SWHa QEho98rVNhBGZ1QafHrMM02URbGZl+/EoIrGfv4NJhKUeSUhAqmllRRfOxECdk4rWf/x StJeEpyLoHjjlzrRkq2kAl6gsaXTS7FsndaY05QqQeUPcOcgdOVR17vXRt2kxIe0QREH RmLoLOR+xtgi7Sbh78612cMkf0NL+PqZe2J5TmiPTqMa76rc2TQ3iVxcRXq7Hf1G9s+j hKdy+A5vbaw15KO66pxH16d0AUj1Fh/VVwuCGiKLFK9CKgQ0BcK9mmoyG8NfRrRcQmlx eC8g== X-Gm-Message-State: AOAM533/NOhgbAHU6wLcBsRdaPd06Hq6Q1v1FxDT8mqFC7jSgFnotm6U NpXOup+MlHsWQ7mvB8KwBAFGsj1rsl8H X-Google-Smtp-Source: ABdhPJyEegOo4xGdxZTPvHQvTOHItCy4eaaDeJsRP6IK53GOW1QTD5m2g4papmcyOQKHh1LuccKZxxgHTQiC Sender: "bgardon via sendgmr" X-Received: from bgardon.sea.corp.google.com ([2620:15c:100:202:f693:9fff:fef4:a293]) (user=bgardon job=sendgmr) by 
2002:a62:cd49:0:b029:150:7742:c6c8 with SMTP id o70-20020a62cd490000b02901507742c6c8mr1001560pfg.61.1601069008209; Fri, 25 Sep 2020 14:23:28 -0700 (PDT) Date: Fri, 25 Sep 2020 14:22:51 -0700 In-Reply-To: <20200925212302.3979661-1-bgardon@google.com> Message-Id: <20200925212302.3979661-12-bgardon@google.com> Mime-Version: 1.0 References: <20200925212302.3979661-1-bgardon@google.com> X-Mailer: git-send-email 2.28.0.709.gb0816b6eb0-goog Subject: [PATCH 11/22] kvm: mmu: Factor out allocating a new tdp_mmu_page From: Ben Gardon To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: Cannon Matthews , Paolo Bonzini , Peter Xu , Sean Christopherson , Peter Shier , Peter Feiner , Junaid Shahid , Jim Mattson , Yulei Zhang , Wanpeng Li , Vitaly Kuznetsov , Xiao Guangrong , Ben Gardon Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Move the code to allocate a struct kvm_mmu_page for the TDP MMU out of the root allocation code to support allocating a struct kvm_mmu_page for every page of page table memory used by the TDP MMU, in the next commit. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon --- arch/x86/kvm/mmu/tdp_mmu.c | 59 ++++++++++++++++++++++++-------------- 1 file changed, 38 insertions(+), 21 deletions(-) diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 37bdebc2592ea..a3bcee6bf30e8 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -123,27 +123,50 @@ static struct kvm_mmu_page *find_tdp_mmu_root_with_role( return NULL; } -static struct kvm_mmu_page *alloc_tdp_mmu_root(struct kvm_vcpu *vcpu, - union kvm_mmu_page_role role) +static union kvm_mmu_page_role page_role_for_level(struct kvm_vcpu *vcpu, + int level) +{ + union kvm_mmu_page_role role; + + role = vcpu->arch.mmu->mmu_role.base; + role.level = vcpu->arch.mmu->shadow_root_level; + role.direct = true; + role.gpte_is_8_bytes = true; + role.access = ACC_ALL; + + return role; +} + +static struct kvm_mmu_page *alloc_tdp_mmu_page(struct kvm_vcpu *vcpu, gfn_t gfn, + int level) +{ + struct kvm_mmu_page *sp; + + sp = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache); + sp->spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache); + set_page_private(virt_to_page(sp->spt), (unsigned long)sp); + + sp->role.word = page_role_for_level(vcpu, level).word; + sp->gfn = gfn; + sp->tdp_mmu_page = true; + + return sp; +} + +static struct kvm_mmu_page *alloc_tdp_mmu_root(struct kvm_vcpu *vcpu) { struct kvm_mmu_page *new_root; struct kvm_mmu_page *root; - new_root = kvm_mmu_memory_cache_alloc( - &vcpu->arch.mmu_page_header_cache); - new_root->spt = kvm_mmu_memory_cache_alloc( - &vcpu->arch.mmu_shadow_page_cache); - set_page_private(virt_to_page(new_root->spt), (unsigned long)new_root); - - new_root->role.word = role.word; + new_root = alloc_tdp_mmu_page(vcpu, 0, + vcpu->arch.mmu->shadow_root_level); new_root->root_count = 1; - new_root->gfn = 0; - new_root->tdp_mmu_page = true; spin_lock(&vcpu->kvm->mmu_lock); /* Check that no matching root exists before adding this one. 
*/ - root = find_tdp_mmu_root_with_role(vcpu->kvm, role); + root = find_tdp_mmu_root_with_role(vcpu->kvm, + page_role_for_level(vcpu, vcpu->arch.mmu->shadow_root_level)); if (root) { get_tdp_mmu_root(vcpu->kvm, root); spin_unlock(&vcpu->kvm->mmu_lock); @@ -161,18 +184,12 @@ static struct kvm_mmu_page *alloc_tdp_mmu_root(struct kvm_vcpu *vcpu, static struct kvm_mmu_page *get_tdp_mmu_vcpu_root(struct kvm_vcpu *vcpu) { struct kvm_mmu_page *root; - union kvm_mmu_page_role role; - - role = vcpu->arch.mmu->mmu_role.base; - role.level = vcpu->arch.mmu->shadow_root_level; - role.direct = true; - role.gpte_is_8_bytes = true; - role.access = ACC_ALL; spin_lock(&vcpu->kvm->mmu_lock); /* Search for an already allocated root with the same role. */ - root = find_tdp_mmu_root_with_role(vcpu->kvm, role); + root = find_tdp_mmu_root_with_role(vcpu->kvm, + page_role_for_level(vcpu, vcpu->arch.mmu->shadow_root_level)); if (root) { get_tdp_mmu_root(vcpu->kvm, root); spin_unlock(&vcpu->kvm->mmu_lock); @@ -182,7 +199,7 @@ static struct kvm_mmu_page *get_tdp_mmu_vcpu_root(struct kvm_vcpu *vcpu) spin_unlock(&vcpu->kvm->mmu_lock); /* If there is no appropriate root, allocate one. */ - root = alloc_tdp_mmu_root(vcpu, role); + root = alloc_tdp_mmu_root(vcpu); return root; } From patchwork Fri Sep 25 21:22:52 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11800803 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A10006CA for ; Fri, 25 Sep 2020 21:24:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 86D9220738 for ; Fri, 25 Sep 2020 21:24:16 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="aRYjgaix" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729204AbgIYVXo (ORCPT ); Fri, 25 Sep 2020 17:23:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33644 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729208AbgIYVXf (ORCPT ); Fri, 25 Sep 2020 17:23:35 -0400 Received: from mail-qv1-xf49.google.com (mail-qv1-xf49.google.com [IPv6:2607:f8b0:4864:20::f49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CF503C0613D8 for ; Fri, 25 Sep 2020 14:23:30 -0700 (PDT) Received: by mail-qv1-xf49.google.com with SMTP id y2so2648445qvs.14 for ; Fri, 25 Sep 2020 14:23:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=sender:date:in-reply-to:message-id:mime-version:references:subject :from:to:cc; bh=pNPbyzdy4uZV3DP217Bt0WHUXD3oX1/wytImNmZHn8U=; b=aRYjgaixoPJBp/KStmOjkU+cTijMIae2uSaZvbtEcGrrJXOnqFY8LtgjbIYmvcm1GI a2lak5FEqo2KPG/1mb+F0p0QHduPz3PSm9tiGg9v8rs4I2vGoZC3TpXzgFk0LdpKbtQC sonw79QOkuGwHUC9oxeCXTWqlGT50Uwys8Pt5AZqGT0iH1wxF90SOOjXAre2NPKfivqo l9R/zEOJhUb/qibl5hidseTqQvjaIKQKiEXLBNj3E9qNjXngyvmLbomZyB1nQz9VCnhn I4LNeRWxDDxgIZWvY6udyvSgd5QOrh9g6bL1N2cd81jH4oXN/Io1S7DO+iV9IvPe4Qni 2ylQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=pNPbyzdy4uZV3DP217Bt0WHUXD3oX1/wytImNmZHn8U=; b=gjYl5g3uWzybwpfQHJJKAbNIR/22WWwEicsYSf5lvoSKimEaIQ55Q6KHEU1iB+OoAB 
Date: Fri, 25 Sep 2020 14:22:52 -0700
In-Reply-To: <20200925212302.3979661-1-bgardon@google.com>
Message-Id: <20200925212302.3979661-13-bgardon@google.com>
Subject: [PATCH 12/22] kvm: mmu: Allocate struct kvm_mmu_pages for all pages in TDP MMU
From: Ben Gardon
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Cannon Matthews , Paolo Bonzini , Peter Xu , Sean Christopherson , Peter Shier , Peter Feiner , Junaid Shahid , Jim Mattson , Yulei Zhang , Wanpeng Li , Vitaly Kuznetsov , Xiao Guangrong , Ben Gardon

Attach struct kvm_mmu_pages to every page in the TDP MMU to track
metadata, facilitate NX reclaim, and enable improved parallelism of MMU
operations in future patches.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538

Signed-off-by: Ben Gardon
---
 arch/x86/include/asm/kvm_host.h |  4 ++++
 arch/x86/kvm/mmu/tdp_mmu.c      | 13 ++++++++++---
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9ce6b35ecb33a..a76bcb51d43d8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -972,7 +972,11 @@ struct kvm_arch {
	 * operations.
*/ bool tdp_mmu_enabled; + + /* List of struct tdp_mmu_pages being used as roots */ struct list_head tdp_mmu_roots; + /* List of struct tdp_mmu_pages not being used as roots */ + struct list_head tdp_mmu_pages; }; struct kvm_vm_stat { diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index a3bcee6bf30e8..557e780bdf9f9 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -30,6 +30,7 @@ void kvm_mmu_init_tdp_mmu(struct kvm *kvm) kvm->arch.tdp_mmu_enabled = true; INIT_LIST_HEAD(&kvm->arch.tdp_mmu_roots); + INIT_LIST_HEAD(&kvm->arch.tdp_mmu_pages); } void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm) @@ -244,6 +245,7 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn, bool is_leaf = is_present && is_last_spte(new_spte, level); bool pfn_changed = spte_to_pfn(old_spte) != spte_to_pfn(new_spte); u64 *pt; + struct kvm_mmu_page *sp; u64 old_child_spte; int i; @@ -309,6 +311,9 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn, */ if (was_present && !was_leaf && (pfn_changed || !is_present)) { pt = spte_to_child_pt(old_spte, level); + sp = sptep_to_sp(pt); + + list_del(&sp->link); for (i = 0; i < PT64_ENT_PER_PAGE; i++) { old_child_spte = *(pt + i); @@ -322,6 +327,7 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn, KVM_PAGES_PER_HPAGE(level)); free_page((unsigned long)pt); + kmem_cache_free(mmu_page_header_cache, sp); } } @@ -474,8 +480,7 @@ int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu, int write, int map_writable, bool prefault, bool account_disallowed_nx_lpage) { struct tdp_iter iter; - struct kvm_mmu_memory_cache *pf_pt_cache = - &vcpu->arch.mmu_shadow_page_cache; + struct kvm_mmu_page *sp; u64 *child_pt; u64 new_spte; int ret; @@ -520,7 +525,9 @@ int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu, int write, int map_writable, } if (!is_shadow_present_pte(iter.old_spte)) { - child_pt = kvm_mmu_memory_cache_alloc(pf_pt_cache); + sp = alloc_tdp_mmu_page(vcpu, iter.gfn, iter.level); + list_add(&sp->link, &vcpu->kvm->arch.tdp_mmu_pages); + child_pt = sp->spt; clear_page(child_pt); new_spte = make_nonleaf_spte(child_pt, !shadow_accessed_mask); From patchwork Fri Sep 25 21:22:53 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11800809 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B391C6CA for ; Fri, 25 Sep 2020 21:24:21 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 923F221D7F for ; Fri, 25 Sep 2020 21:24:21 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="G9grlhxj" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729178AbgIYVXm (ORCPT ); Fri, 25 Sep 2020 17:23:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33648 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729196AbgIYVXf (ORCPT ); Fri, 25 Sep 2020 17:23:35 -0400 Received: from mail-qk1-x74a.google.com (mail-qk1-x74a.google.com [IPv6:2607:f8b0:4864:20::74a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7EAF6C0613DC for ; Fri, 25 Sep 2020 14:23:32 -0700 (PDT) Received: by mail-qk1-x74a.google.com with SMTP id r184so3018175qka.21 for ; Fri, 25 Sep 2020 14:23:32 -0700 (PDT) 
Date: Fri, 25 Sep 2020 14:22:53 -0700
In-Reply-To: <20200925212302.3979661-1-bgardon@google.com>
Message-Id: <20200925212302.3979661-14-bgardon@google.com>
Subject: [PATCH 13/22] kvm: mmu: Support invalidate range MMU notifier for TDP MMU
From: Ben Gardon
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Cannon Matthews , Paolo Bonzini , Peter Xu , Sean Christopherson , Peter Shier , Peter Feiner , Junaid Shahid , Jim Mattson , Yulei Zhang , Wanpeng Li , Vitaly Kuznetsov , Xiao Guangrong , Ben Gardon

In order to interoperate correctly with the rest of KVM and other Linux
subsystems, the TDP MMU must correctly handle various MMU notifiers. Add
hooks to handle the invalidate range family of MMU notifiers.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.
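A minimal user-space sketch of the range-clamping math used by the new
notifier hook (not the KVM code; the toy_* names are invented here): the
invalidated host-VA range is clamped to one slot and rounded out to a gfn
range so that every page intersecting the range is covered, mirroring the
hva_start/hva_end/gfn_end computation in the patch below.

#include <stdio.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE  (1ull << TOY_PAGE_SHIFT)

struct toy_slot {
	unsigned long long userspace_addr;	/* host VA where the slot starts */
	unsigned long long npages;
	unsigned long long base_gfn;
};

/*
 * Clamp the invalidated host-VA range [start, end) to this slot and
 * convert it to a gfn range covering every intersecting page.
 * Returns 0 if the range misses the slot entirely.
 */
static int toy_hva_range_to_gfn_range(const struct toy_slot *slot,
				      unsigned long long start,
				      unsigned long long end,
				      unsigned long long *gfn_start,
				      unsigned long long *gfn_end)
{
	unsigned long long slot_end = slot->userspace_addr +
				      (slot->npages << TOY_PAGE_SHIFT);
	unsigned long long hva_start = start > slot->userspace_addr ?
				       start : slot->userspace_addr;
	unsigned long long hva_end = end < slot_end ? end : slot_end;

	if (hva_start >= hva_end)
		return 0;

	*gfn_start = slot->base_gfn +
		     ((hva_start - slot->userspace_addr) >> TOY_PAGE_SHIFT);
	/* Round up so a partial last page is still zapped. */
	*gfn_end = slot->base_gfn +
		   ((hva_end + TOY_PAGE_SIZE - 1 - slot->userspace_addr) >>
		    TOY_PAGE_SHIFT);
	return 1;
}

int main(void)
{
	struct toy_slot slot = { 0x700000000000ull, 1024, 0x1000 };
	unsigned long long gs, ge;

	if (toy_hva_range_to_gfn_range(&slot, 0x700000003800ull,
				       0x700000008000ull, &gs, &ge))
		printf("zap gfns [0x%llx, 0x%llx)\n", gs, ge);
	return 0;
}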
This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon --- arch/x86/kvm/mmu/mmu.c | 9 ++++- arch/x86/kvm/mmu/tdp_mmu.c | 80 +++++++++++++++++++++++++++++++++++--- arch/x86/kvm/mmu/tdp_mmu.h | 3 ++ 3 files changed, 86 insertions(+), 6 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 52d661a758585..0ddfdab942554 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -1884,7 +1884,14 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva, int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start, unsigned long end, unsigned flags) { - return kvm_handle_hva_range(kvm, start, end, 0, kvm_unmap_rmapp); + int r; + + r = kvm_handle_hva_range(kvm, start, end, 0, kvm_unmap_rmapp); + + if (kvm->arch.tdp_mmu_enabled) + r |= kvm_tdp_mmu_zap_hva_range(kvm, start, end); + + return r; } int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte) diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 557e780bdf9f9..1cea58db78a13 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -60,7 +60,7 @@ bool is_tdp_mmu_root(struct kvm *kvm, hpa_t hpa) } static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root, - gfn_t start, gfn_t end); + gfn_t start, gfn_t end, bool can_yield); static void free_tdp_mmu_root(struct kvm *kvm, struct kvm_mmu_page *root) { @@ -73,7 +73,7 @@ static void free_tdp_mmu_root(struct kvm *kvm, struct kvm_mmu_page *root) list_del(&root->link); - zap_gfn_range(kvm, root, 0, max_gfn); + zap_gfn_range(kvm, root, 0, max_gfn, false); free_page((unsigned long)root->spt); kmem_cache_free(mmu_page_header_cache, root); @@ -361,9 +361,14 @@ static bool tdp_mmu_iter_cond_resched(struct kvm *kvm, struct tdp_iter *iter) * non-root pages mapping GFNs strictly within that range. Returns true if * SPTEs have been cleared and a TLB flush is needed before releasing the * MMU lock. + * If can_yield is true, will release the MMU lock and reschedule if the + * scheduler needs the CPU or there is contention on the MMU lock. If this + * function cannot yield, it will not release the MMU lock or reschedule and + * the caller must ensure it does not supply too large a GFN range, or the + * operation can cause a soft lockup. 
*/ static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root, - gfn_t start, gfn_t end) + gfn_t start, gfn_t end, bool can_yield) { struct tdp_iter iter; bool flush_needed = false; @@ -387,7 +392,10 @@ static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root, handle_changed_spte(kvm, as_id, iter.gfn, iter.old_spte, 0, iter.level); - flush_needed = !tdp_mmu_iter_cond_resched(kvm, &iter); + if (can_yield) + flush_needed = !tdp_mmu_iter_cond_resched(kvm, &iter); + else + flush_needed = true; } return flush_needed; } @@ -410,7 +418,7 @@ bool kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, gfn_t start, gfn_t end) */ get_tdp_mmu_root(kvm, root); - flush = zap_gfn_range(kvm, root, start, end) || flush; + flush = zap_gfn_range(kvm, root, start, end, true) || flush; put_tdp_mmu_root(kvm, root); } @@ -551,3 +559,65 @@ int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu, int write, int map_writable, return ret; } + +static int kvm_tdp_mmu_handle_hva_range(struct kvm *kvm, unsigned long start, + unsigned long end, unsigned long data, + int (*handler)(struct kvm *kvm, struct kvm_memory_slot *slot, + struct kvm_mmu_page *root, gfn_t start, + gfn_t end, unsigned long data)) +{ + struct kvm_memslots *slots; + struct kvm_memory_slot *memslot; + struct kvm_mmu_page *root; + int ret = 0; + int as_id; + + for_each_tdp_mmu_root(kvm, root) { + /* + * Take a reference on the root so that it cannot be freed if + * this thread releases the MMU lock and yields in this loop. + */ + get_tdp_mmu_root(kvm, root); + + as_id = kvm_mmu_page_as_id(root); + slots = __kvm_memslots(kvm, as_id); + kvm_for_each_memslot(memslot, slots) { + unsigned long hva_start, hva_end; + gfn_t gfn_start, gfn_end; + + hva_start = max(start, memslot->userspace_addr); + hva_end = min(end, memslot->userspace_addr + + (memslot->npages << PAGE_SHIFT)); + if (hva_start >= hva_end) + continue; + /* + * {gfn(page) | page intersects with [hva_start, hva_end)} = + * {gfn_start, gfn_start+1, ..., gfn_end-1}. 
+ */ + gfn_start = hva_to_gfn_memslot(hva_start, memslot); + gfn_end = hva_to_gfn_memslot(hva_end + PAGE_SIZE - 1, memslot); + + ret |= handler(kvm, memslot, root, gfn_start, + gfn_end, data); + } + + put_tdp_mmu_root(kvm, root); + } + + return ret; +} + +static int zap_gfn_range_hva_wrapper(struct kvm *kvm, + struct kvm_memory_slot *slot, + struct kvm_mmu_page *root, gfn_t start, + gfn_t end, unsigned long unused) +{ + return zap_gfn_range(kvm, root, start, end, false); +} + +int kvm_tdp_mmu_zap_hva_range(struct kvm *kvm, unsigned long start, + unsigned long end) +{ + return kvm_tdp_mmu_handle_hva_range(kvm, start, end, 0, + zap_gfn_range_hva_wrapper); +} diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h index abf23dc0ab7ad..ce804a97bfa1d 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.h +++ b/arch/x86/kvm/mmu/tdp_mmu.h @@ -18,4 +18,7 @@ void kvm_tdp_mmu_zap_all(struct kvm *kvm); int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu, int write, int map_writable, int level, gpa_t gpa, kvm_pfn_t pfn, bool prefault, bool lpage_disallowed); + +int kvm_tdp_mmu_zap_hva_range(struct kvm *kvm, unsigned long start, + unsigned long end); #endif /* __KVM_X86_MMU_TDP_MMU_H */ From patchwork Fri Sep 25 21:22:54 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11800807 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 668EC112E for ; Fri, 25 Sep 2020 21:24:19 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3EF3221D7F for ; Fri, 25 Sep 2020 21:24:19 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="ijNRLRNm" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728967AbgIYVXn (ORCPT ); Fri, 25 Sep 2020 17:23:43 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33658 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729203AbgIYVXf (ORCPT ); Fri, 25 Sep 2020 17:23:35 -0400 Received: from mail-pj1-x1049.google.com (mail-pj1-x1049.google.com [IPv6:2607:f8b0:4864:20::1049]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 03A7CC0613DE for ; Fri, 25 Sep 2020 14:23:33 -0700 (PDT) Received: by mail-pj1-x1049.google.com with SMTP id p11so278827pjv.2 for ; Fri, 25 Sep 2020 14:23:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=sender:date:in-reply-to:message-id:mime-version:references:subject :from:to:cc; bh=V/5gyg3b1e2ud3cg8TI5rpqY4AhLEdY7X2TDge3jAek=; b=ijNRLRNmK8bNQMiKB51n8oMQdPgAncOf8UKcf7TPO/35Qp7hv/fwsayZJTr8T7viay tUGLUo8vbV04OgAZNtShTIwuaPdUXN7nOe5d1enViWhGiDqya4Vm39ToNUxCeCQkqkjq jU6NnvnQPuchegsnhjWOl94QGfD0oRkYQkjXVGM0HlwH+o7r0SukFZIOlOAreLujy9/G ajdqhKMRg8wl6qhjrH9KYgCGXzbMA+W4hEB3sFPE31+0D6HrwtuXrpeqVrltXXQfuTkJ apVe0gD7Uf+ajpH32IO6IE4/7KKj9SE44RSJG3CFpQpeFpkCjP3s/fvBCdTX/bS9IfMM P8NQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=V/5gyg3b1e2ud3cg8TI5rpqY4AhLEdY7X2TDge3jAek=; b=Z14licQCbQ8Zgx0fKaHd2qs5ZXYCJEmI8s85dZ/GbSDVh3I9zMfdZolF67PvAzEdJD zCtc5VUTtyg2g3V/wYGMLoyUR468DPP52Ws/fed3WagrAxFZIVjefelf8QJeKlUSDbzp 
6DhggGhW5nZCCdFa4HK5MXz+ZjJTDGPod9BdL7dh8+HqQy3QQcC7+Ae45l7Ptfkg9NnT 1xG1tSRJ4z+OO4z84o+lB7dk6KL/zECIgAg6mAqhlyUv/N6ZaLVpZ+/kBXF5dCU12HXZ irzzjfXNUyMsa8MCqC3wJPanjt8uEG9YafgaH1lHoOI60ZztUb736kGvkyZvRAM3UYH2 /Jnw== X-Gm-Message-State: AOAM5324y/0hQkv/KNRpABR/mD0GpHqQWz8EuZnWDlQffA0hlbXZz7Nm fFoWHRNANCfb/8xiRhgALY7fXwoUMgiR X-Google-Smtp-Source: ABdhPJyAaseQrHpUJntVt4+GThwAkECWq7yLGgUxzMprjqzyHSSxPzY5gPa5abbqexeNVzqcEZcoJdBqAOFn Sender: "bgardon via sendgmr" X-Received: from bgardon.sea.corp.google.com ([2620:15c:100:202:f693:9fff:fef4:a293]) (user=bgardon job=sendgmr) by 2002:a17:902:fe83:b029:d2:2359:e64b with SMTP id x3-20020a170902fe83b02900d22359e64bmr1313762plm.7.1601069013480; Fri, 25 Sep 2020 14:23:33 -0700 (PDT) Date: Fri, 25 Sep 2020 14:22:54 -0700 In-Reply-To: <20200925212302.3979661-1-bgardon@google.com> Message-Id: <20200925212302.3979661-15-bgardon@google.com> Mime-Version: 1.0 References: <20200925212302.3979661-1-bgardon@google.com> X-Mailer: git-send-email 2.28.0.709.gb0816b6eb0-goog Subject: [PATCH 14/22] kvm: mmu: Add access tracking for tdp_mmu From: Ben Gardon To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: Cannon Matthews , Paolo Bonzini , Peter Xu , Sean Christopherson , Peter Shier , Peter Feiner , Junaid Shahid , Jim Mattson , Yulei Zhang , Wanpeng Li , Vitaly Kuznetsov , Xiao Guangrong , Ben Gardon Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org In order to interoperate correctly with the rest of KVM and other Linux subsystems, the TDP MMU must correctly handle various MMU notifiers. The main Linux MM uses the access tracking MMU notifiers for swap and other features. Add hooks to handle the test/flush HVA (range) family of MMU notifiers. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. 
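The per-SPTE aging decision added below can be summarized by this simplified sketch. It is not part of the patch and example_age_spte() is a made-up name, but the helpers it calls are the existing SPTE helpers this change exposes to the TDP MMU.

/*
 * Illustrative helper (not in the patch): what age_gfn_range() below does
 * to each present leaf SPTE. With A/D bits enabled the accessed bit is
 * simply cleared; without them the SPTE is converted to an access-tracked
 * SPTE, after transferring its dirty state to the backing pfn so that
 * state is not lost.
 */
static u64 example_age_spte(u64 spte)
{
        if (spte_ad_enabled(spte))
                return spte & ~shadow_accessed_mask;

        if (is_writable_pte(spte))
                kvm_set_pfn_dirty(spte_to_pfn(spte));

        return mark_spte_for_access_track(spte);
}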
This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon --- arch/x86/kvm/mmu/mmu.c | 29 ++++++--- arch/x86/kvm/mmu/mmu_internal.h | 7 +++ arch/x86/kvm/mmu/tdp_mmu.c | 103 +++++++++++++++++++++++++++++++- arch/x86/kvm/mmu/tdp_mmu.h | 4 ++ 4 files changed, 133 insertions(+), 10 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 0ddfdab942554..8c1e806b3d53f 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -212,12 +212,12 @@ static struct percpu_counter kvm_total_used_mmu_pages; static u64 __read_mostly shadow_nx_mask; static u64 __read_mostly shadow_x_mask; /* mutual exclusive with nx_mask */ -static u64 __read_mostly shadow_user_mask; +u64 __read_mostly shadow_user_mask; u64 __read_mostly shadow_accessed_mask; static u64 __read_mostly shadow_dirty_mask; static u64 __read_mostly shadow_mmio_value; static u64 __read_mostly shadow_mmio_access_mask; -static u64 __read_mostly shadow_present_mask; +u64 __read_mostly shadow_present_mask; static u64 __read_mostly shadow_me_mask; /* @@ -265,7 +265,6 @@ static u64 __read_mostly shadow_nonpresent_or_rsvd_lower_gfn_mask; static u8 __read_mostly shadow_phys_bits; static void mmu_spte_set(u64 *sptep, u64 spte); -static bool is_executable_pte(u64 spte); static union kvm_mmu_page_role kvm_mmu_calc_root_page_role(struct kvm_vcpu *vcpu); @@ -332,7 +331,7 @@ static inline bool kvm_vcpu_ad_need_write_protect(struct kvm_vcpu *vcpu) return vcpu->arch.mmu == &vcpu->arch.guest_mmu; } -static inline bool spte_ad_enabled(u64 spte) +inline bool spte_ad_enabled(u64 spte) { MMU_WARN_ON(is_mmio_spte(spte)); return (spte & SPTE_SPECIAL_MASK) != SPTE_AD_DISABLED_MASK; @@ -607,7 +606,7 @@ int is_last_spte(u64 pte, int level) return 0; } -static bool is_executable_pte(u64 spte) +bool is_executable_pte(u64 spte) { return (spte & (shadow_x_mask | shadow_nx_mask)) == shadow_x_mask; } @@ -791,7 +790,7 @@ static bool spte_has_volatile_bits(u64 spte) return false; } -static bool is_accessed_spte(u64 spte) +bool is_accessed_spte(u64 spte) { u64 accessed_mask = spte_shadow_accessed_mask(spte); @@ -941,7 +940,7 @@ static u64 mmu_spte_get_lockless(u64 *sptep) return __get_spte_lockless(sptep); } -static u64 mark_spte_for_access_track(u64 spte) +u64 mark_spte_for_access_track(u64 spte) { if (spte_ad_enabled(spte)) return spte & ~shadow_accessed_mask; @@ -1945,12 +1944,24 @@ static void rmap_recycle(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn) int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end) { - return kvm_handle_hva_range(kvm, start, end, 0, kvm_age_rmapp); + int young = false; + + young = kvm_handle_hva_range(kvm, start, end, 0, kvm_age_rmapp); + if (kvm->arch.tdp_mmu_enabled) + young |= kvm_tdp_mmu_age_hva_range(kvm, start, end); + + return young; } int kvm_test_age_hva(struct kvm *kvm, unsigned long hva) { - return kvm_handle_hva(kvm, hva, 0, kvm_test_age_rmapp); + int young = false; + + young = kvm_handle_hva(kvm, hva, 0, kvm_test_age_rmapp); + if (kvm->arch.tdp_mmu_enabled) + young |= kvm_tdp_mmu_test_age_hva(kvm, hva); + + return young; } #ifdef MMU_DEBUG diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h index 4cef9da051847..228bda0885552 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -80,7 +80,9 @@ bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm, (PT64_BASE_ADDR_MASK & ((1ULL << (PAGE_SHIFT + (((level) - 1) \ * PT64_LEVEL_BITS))) - 1)) +extern u64 shadow_user_mask; 
extern u64 shadow_accessed_mask; +extern u64 shadow_present_mask; #define ACC_EXEC_MASK 1 #define ACC_WRITE_MASK PT_WRITABLE_MASK @@ -95,6 +97,9 @@ int is_last_spte(u64 pte, int level); bool is_dirty_spte(u64 spte); int is_large_pte(u64 pte); bool is_access_track_spte(u64 spte); +bool is_accessed_spte(u64 spte); +bool spte_ad_enabled(u64 spte); +bool is_executable_pte(u64 spte); void kvm_flush_remote_tlbs_with_address(struct kvm *kvm, u64 start_gfn, u64 pages); @@ -132,4 +137,6 @@ bool is_nx_huge_page_enabled(void); void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc); +u64 mark_spte_for_access_track(u64 spte); + #endif /* __KVM_X86_MMU_INTERNAL_H */ diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 1cea58db78a13..0a4b98669b3ef 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -224,6 +224,18 @@ static int kvm_mmu_page_as_id(struct kvm_mmu_page *sp) return sp->role.smm ? 1 : 0; } +static void handle_changed_spte_acc_track(u64 old_spte, u64 new_spte, int level) +{ + bool pfn_changed = spte_to_pfn(old_spte) != spte_to_pfn(new_spte); + + if (!is_shadow_present_pte(old_spte) || !is_last_spte(old_spte, level)) + return; + + if (is_accessed_spte(old_spte) && + (!is_accessed_spte(new_spte) || pfn_changed)) + kvm_set_pfn_accessed(spte_to_pfn(old_spte)); +} + /** * handle_changed_spte - handle bookkeeping associated with an SPTE change * @kvm: kvm instance @@ -236,7 +248,7 @@ static int kvm_mmu_page_as_id(struct kvm_mmu_page *sp) * Handle bookkeeping that might result from the modification of a SPTE. * This function must be called for all TDP SPTE modifications. */ -static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn, +static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn, u64 old_spte, u64 new_spte, int level) { bool was_present = is_shadow_present_pte(old_spte); @@ -331,6 +343,13 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn, } } +static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn, + u64 old_spte, u64 new_spte, int level) +{ + __handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level); + handle_changed_spte_acc_track(old_spte, new_spte, level); +} + #define for_each_tdp_pte_root(_iter, _root, _start, _end) \ for_each_tdp_pte(_iter, _root->spt, _root->role.level, _start, _end) @@ -621,3 +640,85 @@ int kvm_tdp_mmu_zap_hva_range(struct kvm *kvm, unsigned long start, return kvm_tdp_mmu_handle_hva_range(kvm, start, end, 0, zap_gfn_range_hva_wrapper); } + +/* + * Mark the SPTEs range of GFNs [start, end) unaccessed and return non-zero + * if any of the GFNs in the range have been accessed. + */ +static int age_gfn_range(struct kvm *kvm, struct kvm_memory_slot *slot, + struct kvm_mmu_page *root, gfn_t start, gfn_t end, + unsigned long unused) +{ + struct tdp_iter iter; + int young = 0; + u64 new_spte = 0; + int as_id = kvm_mmu_page_as_id(root); + + for_each_tdp_pte_root(iter, root, start, end) { + if (!is_shadow_present_pte(iter.old_spte) || + !is_last_spte(iter.old_spte, iter.level)) + continue; + + /* + * If we have a non-accessed entry we don't need to change the + * pte. + */ + if (!is_accessed_spte(iter.old_spte)) + continue; + + new_spte = iter.old_spte; + + if (spte_ad_enabled(new_spte)) { + clear_bit((ffs(shadow_accessed_mask) - 1), + (unsigned long *)&new_spte); + } else { + /* + * Capture the dirty status of the page, so that it doesn't get + * lost when the SPTE is marked for access tracking. 
+ */ + if (is_writable_pte(new_spte)) + kvm_set_pfn_dirty(spte_to_pfn(new_spte)); + + new_spte = mark_spte_for_access_track(new_spte); + } + + *iter.sptep = new_spte; + __handle_changed_spte(kvm, as_id, iter.gfn, iter.old_spte, + new_spte, iter.level); + young = true; + } + + return young; +} + +int kvm_tdp_mmu_age_hva_range(struct kvm *kvm, unsigned long start, + unsigned long end) +{ + return kvm_tdp_mmu_handle_hva_range(kvm, start, end, 0, + age_gfn_range); +} + +static int test_age_gfn(struct kvm *kvm, struct kvm_memory_slot *slot, + struct kvm_mmu_page *root, gfn_t gfn, gfn_t unused, + unsigned long unused2) +{ + struct tdp_iter iter; + int young = 0; + + for_each_tdp_pte_root(iter, root, gfn, gfn + 1) { + if (!is_shadow_present_pte(iter.old_spte) || + !is_last_spte(iter.old_spte, iter.level)) + continue; + + if (is_accessed_spte(iter.old_spte)) + young = true; + } + + return young; +} + +int kvm_tdp_mmu_test_age_hva(struct kvm *kvm, unsigned long hva) +{ + return kvm_tdp_mmu_handle_hva_range(kvm, hva, hva + 1, 0, + test_age_gfn); +} diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h index ce804a97bfa1d..f316773b7b5a8 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.h +++ b/arch/x86/kvm/mmu/tdp_mmu.h @@ -21,4 +21,8 @@ int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu, int write, int map_writable, int kvm_tdp_mmu_zap_hva_range(struct kvm *kvm, unsigned long start, unsigned long end); + +int kvm_tdp_mmu_age_hva_range(struct kvm *kvm, unsigned long start, + unsigned long end); +int kvm_tdp_mmu_test_age_hva(struct kvm *kvm, unsigned long hva); #endif /* __KVM_X86_MMU_TDP_MMU_H */ From patchwork Fri Sep 25 21:22:55 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11800793 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 37BFF6CA for ; Fri, 25 Sep 2020 21:23:41 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 08EFF21D7F for ; Fri, 25 Sep 2020 21:23:41 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="rcO6+on5" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729265AbgIYVXj (ORCPT ); Fri, 25 Sep 2020 17:23:39 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33664 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729231AbgIYVXg (ORCPT ); Fri, 25 Sep 2020 17:23:36 -0400 Received: from mail-pf1-x449.google.com (mail-pf1-x449.google.com [IPv6:2607:f8b0:4864:20::449]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 73CA4C0613CE for ; Fri, 25 Sep 2020 14:23:36 -0700 (PDT) Received: by mail-pf1-x449.google.com with SMTP id m13so3419885pfk.19 for ; Fri, 25 Sep 2020 14:23:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=sender:date:in-reply-to:message-id:mime-version:references:subject :from:to:cc; bh=2ZW97evDdGZ1vZX3rJE1h1hcVRAtHA0th2WLnrerXFE=; b=rcO6+on5bvD26jOOTJ3pOOYuGTkqfwYFC2emDtlAgUI2igLZjBi49pSvzd/SNLCrhx bGwrsMwhm1nJI2YUtnl6TLktq1Br9eg6iw9fHvWwXLo6kGabTddTZv1Lfwi1HEDSkqJd dJtkevhjDUvyMh0C8odOucJVXLzeK7tkVVph+T/mIvo+ezLqm3pC5NTOKUnH4I06HQeJ gHF9i5plGAyiYFI0X5igAAN5NbR6QeFJWLjF+G/MmNWqW9cnpSRC96NkCUsYVtfHGEAL 2DQmye1AZ2SKfmuHvJXxXiveWNZQQ+jJvDaqvNG/3jO5XaumXd1LNvO/13jy1KzMGtTd bmZA== 
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=2ZW97evDdGZ1vZX3rJE1h1hcVRAtHA0th2WLnrerXFE=; b=Re19UmL7gfr3cfYAfyIjevdF9GHJdkcOnAGRDUNPvCwYMfz6JiGlBn4HDjjHKpGchV 9B0yj+Pr9yNVJ1e+3syxHV1jX5yaPqv2Wfg5YNO3BBCX72l2h9jGmFPlCZGE+0UHo5Oh zbwdVxbryRNDSZacwfquXirm/tMjei6w0ofQlSdSbuFVDyAtYGXCb7th5P9251IetdFF Mc+GdLUthgRppzHvv0p8w6SocAmQipCFhNTuraeSf4iPg3PWhS5SuDnOSZqYceloqyaR r3ImS2+/nHGZclOrlrctW/snOxu1a/lXA6zx0Cbb6yExHBY2hmDxFKtc/8QC8KyTxaF1 dDpQ== X-Gm-Message-State: AOAM53390K4xAgS2vbbFk4yxTCImtqdJ+gGFgFnhxfpgtEDEFFAZE2aR lfiU/gNE+x9ERnbyCiIVDkr5P8FO+vBm X-Google-Smtp-Source: ABdhPJxOnXFh3ugXDvH7iih5//M0RhAFbKOi44WgqPGt00bT5oGQJZK6p+S9qNOEz7cHOrDCuqRQESiiEZQk Sender: "bgardon via sendgmr" X-Received: from bgardon.sea.corp.google.com ([2620:15c:100:202:f693:9fff:fef4:a293]) (user=bgardon job=sendgmr) by 2002:a17:90b:941:: with SMTP id dw1mr29935pjb.1.1601069015590; Fri, 25 Sep 2020 14:23:35 -0700 (PDT) Date: Fri, 25 Sep 2020 14:22:55 -0700 In-Reply-To: <20200925212302.3979661-1-bgardon@google.com> Message-Id: <20200925212302.3979661-16-bgardon@google.com> Mime-Version: 1.0 References: <20200925212302.3979661-1-bgardon@google.com> X-Mailer: git-send-email 2.28.0.709.gb0816b6eb0-goog Subject: [PATCH 15/22] kvm: mmu: Support changed pte notifier in tdp MMU From: Ben Gardon To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: Cannon Matthews , Paolo Bonzini , Peter Xu , Sean Christopherson , Peter Shier , Peter Feiner , Junaid Shahid , Jim Mattson , Yulei Zhang , Wanpeng Li , Vitaly Kuznetsov , Xiao Guangrong , Ben Gardon Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org In order to interoperate correctly with the rest of KVM and other Linux subsystems, the TDP MMU must correctly handle various MMU notifiers. Add a hook and handle the change_pte MMU notifier. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon --- arch/x86/kvm/mmu/mmu.c | 46 +++++++++++++------------ arch/x86/kvm/mmu/mmu_internal.h | 13 +++++++ arch/x86/kvm/mmu/tdp_mmu.c | 61 +++++++++++++++++++++++++++++++++ arch/x86/kvm/mmu/tdp_mmu.h | 3 ++ 4 files changed, 102 insertions(+), 21 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 8c1e806b3d53f..0d80abe82ca93 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -122,9 +122,6 @@ module_param(dbg, bool, 0644); #define PTE_PREFETCH_NUM 8 -#define PT_FIRST_AVAIL_BITS_SHIFT 10 -#define PT64_SECOND_AVAIL_BITS_SHIFT 54 - /* * The mask used to denote special SPTEs, which can be either MMIO SPTEs or * Access Tracking SPTEs. 
@@ -147,13 +144,6 @@ module_param(dbg, bool, 0644); #define PT32_INDEX(address, level)\ (((address) >> PT32_LEVEL_SHIFT(level)) & ((1 << PT32_LEVEL_BITS) - 1)) - -#ifdef CONFIG_DYNAMIC_PHYSICAL_MASK -#define PT64_BASE_ADDR_MASK (physical_mask & ~(u64)(PAGE_SIZE-1)) -#else -#define PT64_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1)) -#endif - #define PT32_BASE_ADDR_MASK PAGE_MASK #define PT32_DIR_BASE_ADDR_MASK \ (PAGE_MASK & ~((1ULL << (PAGE_SHIFT + PT32_LEVEL_BITS)) - 1)) @@ -170,9 +160,6 @@ module_param(dbg, bool, 0644); #include -#define SPTE_HOST_WRITEABLE (1ULL << PT_FIRST_AVAIL_BITS_SHIFT) -#define SPTE_MMU_WRITEABLE (1ULL << (PT_FIRST_AVAIL_BITS_SHIFT + 1)) - /* make pte_list_desc fit well in cache line */ #define PTE_LIST_EXT 3 @@ -1708,6 +1695,21 @@ static int kvm_unmap_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head, return kvm_zap_rmapp(kvm, rmap_head); } +u64 kvm_mmu_changed_pte_notifier_make_spte(u64 old_spte, kvm_pfn_t new_pfn) +{ + u64 new_spte; + + new_spte = old_spte & ~PT64_BASE_ADDR_MASK; + new_spte |= (u64)new_pfn << PAGE_SHIFT; + + new_spte &= ~PT_WRITABLE_MASK; + new_spte &= ~SPTE_HOST_WRITEABLE; + + new_spte = mark_spte_for_access_track(new_spte); + + return new_spte; +} + static int kvm_set_pte_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head, struct kvm_memory_slot *slot, gfn_t gfn, int level, unsigned long data) @@ -1733,13 +1735,8 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head, pte_list_remove(rmap_head, sptep); goto restart; } else { - new_spte = *sptep & ~PT64_BASE_ADDR_MASK; - new_spte |= (u64)new_pfn << PAGE_SHIFT; - - new_spte &= ~PT_WRITABLE_MASK; - new_spte &= ~SPTE_HOST_WRITEABLE; - - new_spte = mark_spte_for_access_track(new_spte); + new_spte = kvm_mmu_changed_pte_notifier_make_spte( + *sptep, new_pfn); mmu_spte_clear_track_bits(sptep); mmu_spte_set(sptep, new_spte); @@ -1895,7 +1892,14 @@ int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start, unsigned long end, int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte) { - return kvm_handle_hva(kvm, hva, (unsigned long)&pte, kvm_set_pte_rmapp); + int r; + + r = kvm_handle_hva(kvm, hva, (unsigned long)&pte, kvm_set_pte_rmapp); + + if (kvm->arch.tdp_mmu_enabled) + r |= kvm_tdp_mmu_set_spte_hva(kvm, hva, &pte); + + return r; } static int kvm_age_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head, diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h index 228bda0885552..8eaa6e4764bce 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -80,6 +80,12 @@ bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm, (PT64_BASE_ADDR_MASK & ((1ULL << (PAGE_SHIFT + (((level) - 1) \ * PT64_LEVEL_BITS))) - 1)) +#ifdef CONFIG_DYNAMIC_PHYSICAL_MASK +#define PT64_BASE_ADDR_MASK (physical_mask & ~(u64)(PAGE_SIZE-1)) +#else +#define PT64_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1)) +#endif + extern u64 shadow_user_mask; extern u64 shadow_accessed_mask; extern u64 shadow_present_mask; @@ -89,6 +95,12 @@ extern u64 shadow_present_mask; #define ACC_USER_MASK PT_USER_MASK #define ACC_ALL (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK) +#define PT_FIRST_AVAIL_BITS_SHIFT 10 +#define PT64_SECOND_AVAIL_BITS_SHIFT 54 + +#define SPTE_HOST_WRITEABLE (1ULL << PT_FIRST_AVAIL_BITS_SHIFT) +#define SPTE_MMU_WRITEABLE (1ULL << (PT_FIRST_AVAIL_BITS_SHIFT + 1)) + /* Functions for interpreting SPTEs */ kvm_pfn_t spte_to_pfn(u64 pte); bool is_mmio_spte(u64 spte); @@ -138,5 +150,6 @@ bool 
is_nx_huge_page_enabled(void); void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc); u64 mark_spte_for_access_track(u64 spte); +u64 kvm_mmu_changed_pte_notifier_make_spte(u64 old_spte, kvm_pfn_t new_pfn); #endif /* __KVM_X86_MMU_INTERNAL_H */ diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 0a4b98669b3ef..3119583409131 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -722,3 +722,64 @@ int kvm_tdp_mmu_test_age_hva(struct kvm *kvm, unsigned long hva) return kvm_tdp_mmu_handle_hva_range(kvm, hva, hva + 1, 0, test_age_gfn); } + +/* + * Handle the changed_pte MMU notifier for the TDP MMU. + * data is a pointer to the new pte_t mapping the HVA specified by the MMU + * notifier. + * Returns non-zero if a flush is needed before releasing the MMU lock. + */ +static int set_tdp_spte(struct kvm *kvm, struct kvm_memory_slot *slot, + struct kvm_mmu_page *root, gfn_t gfn, gfn_t unused, + unsigned long data) +{ + struct tdp_iter iter; + pte_t *ptep = (pte_t *)data; + kvm_pfn_t new_pfn; + u64 new_spte; + int need_flush = 0; + int as_id = kvm_mmu_page_as_id(root); + + WARN_ON(pte_huge(*ptep)); + + new_pfn = pte_pfn(*ptep); + + for_each_tdp_pte_root(iter, root, gfn, gfn + 1) { + if (iter.level != PG_LEVEL_4K) + continue; + + if (!is_shadow_present_pte(iter.old_spte)) + break; + + *iter.sptep = 0; + handle_changed_spte(kvm, as_id, iter.gfn, iter.old_spte, + new_spte, iter.level); + + kvm_flush_remote_tlbs_with_address(kvm, iter.gfn, 1); + + if (!pte_write(*ptep)) { + new_spte = kvm_mmu_changed_pte_notifier_make_spte( + iter.old_spte, new_pfn); + + *iter.sptep = new_spte; + handle_changed_spte(kvm, as_id, iter.gfn, iter.old_spte, + new_spte, iter.level); + } + + need_flush = 1; + } + + if (need_flush) + kvm_flush_remote_tlbs_with_address(kvm, gfn, 1); + + return 0; +} + +int kvm_tdp_mmu_set_spte_hva(struct kvm *kvm, unsigned long address, + pte_t *host_ptep) +{ + return kvm_tdp_mmu_handle_hva_range(kvm, address, address + 1, + (unsigned long)host_ptep, + set_tdp_spte); +} + diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h index f316773b7b5a8..5a399aa60b8d8 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.h +++ b/arch/x86/kvm/mmu/tdp_mmu.h @@ -25,4 +25,7 @@ int kvm_tdp_mmu_zap_hva_range(struct kvm *kvm, unsigned long start, int kvm_tdp_mmu_age_hva_range(struct kvm *kvm, unsigned long start, unsigned long end); int kvm_tdp_mmu_test_age_hva(struct kvm *kvm, unsigned long hva); + +int kvm_tdp_mmu_set_spte_hva(struct kvm *kvm, unsigned long address, + pte_t *host_ptep); #endif /* __KVM_X86_MMU_TDP_MMU_H */ From patchwork Fri Sep 25 21:22:56 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11800813 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 115866CA for ; Fri, 25 Sep 2020 21:24:26 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E2DC9221EC for ; Fri, 25 Sep 2020 21:24:25 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="dfKIlNg1" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729451AbgIYVYX (ORCPT ); Fri, 25 Sep 2020 17:24:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33676 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by 
vger.kernel.org with ESMTP id S1729258AbgIYVXi (ORCPT ); Fri, 25 Sep 2020 17:23:38 -0400 Received: from mail-qt1-x84a.google.com (mail-qt1-x84a.google.com [IPv6:2607:f8b0:4864:20::84a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 770F2C0613D5 for ; Fri, 25 Sep 2020 14:23:38 -0700 (PDT) Received: by mail-qt1-x84a.google.com with SMTP id c5so3230309qtd.12 for ; Fri, 25 Sep 2020 14:23:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=sender:date:in-reply-to:message-id:mime-version:references:subject :from:to:cc; bh=jRVbtRnVunaUbsRIJ3qqaIgFOVHoYWyu7xL6dJMWEVU=; b=dfKIlNg1y7ZqiIyhdvXSMvj8pVfsBfNZKnyiXx9VE5GfVSl8WjprUfryl5rIymrH7i XJ5s8PI7wf36Feoycno1aFCYIdLW1nOpheLf6Ol5H7/wydSBAcGTZrv3yOD6n1ixHN7g TFN8MFXDwstrtpdbz3Bdy/ZmPNuf5BKCWp51ogfcYtoQlKNiODD/+gQTSHj0VUvhme+E bkbmyzK6iDSvJO6R9T1UVGbCqIewIbrTGl5UbW5LhlUfSabrFEF5NAy1jgQY6BDYbK+6 DSoaRk9Y3Z5QYdeDtnAeLl3NLxVPSfu+P6M+YUz3iYag3LHoHwrwwPuebKGPw53HoJUb itpg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=jRVbtRnVunaUbsRIJ3qqaIgFOVHoYWyu7xL6dJMWEVU=; b=ZddU+VRbXhBVRyXve+XDAsi6KJlS6p69yRLQyTrHcxepeugcgNckGmqI+lvRlpbP+n l07IDek5IEPxtDNj+BSljqTqNHTM2VhsJo98PC7owpvTpHf3kh8kgnDemCPbaKoCCCmY rmPKzROvSNEzCf1HJbjbxI+5SkkGgZTTjlzKLnHh7oowWkUfFDb1J2iOp1Z8XWY/7fJg Jk9Qvx1eEbLfyqeQuLNSwAqH2fIW2yb3qLXenRdimsCRDQWJHiYbAZdBntKrd9SOtAuj jQo9FBeQb7V9gNy0QTgziEy6aOJmUccYGwlbrq/GehYVUkiqDrqvgVb7P1vCAsDsmy5O gI8g== X-Gm-Message-State: AOAM533JT0jRhNCBM6ZNxNPn3N4E7mzA1L0/1+ziYRDBQp2ce5JPNTX9 V0RbOYO+3OXgJb2pPJTSLubLJ4r9vMrz X-Google-Smtp-Source: ABdhPJw17NhL/7qCi5kl9UW4U6EP5dY6yG9Ua1eYsaf+4istvoy/1+8ZHu8SQUzB7xuFrOmapRmOE+NxZVtZ Sender: "bgardon via sendgmr" X-Received: from bgardon.sea.corp.google.com ([2620:15c:100:202:f693:9fff:fef4:a293]) (user=bgardon job=sendgmr) by 2002:ad4:5745:: with SMTP id q5mr636253qvx.29.1601069017635; Fri, 25 Sep 2020 14:23:37 -0700 (PDT) Date: Fri, 25 Sep 2020 14:22:56 -0700 In-Reply-To: <20200925212302.3979661-1-bgardon@google.com> Message-Id: <20200925212302.3979661-17-bgardon@google.com> Mime-Version: 1.0 References: <20200925212302.3979661-1-bgardon@google.com> X-Mailer: git-send-email 2.28.0.709.gb0816b6eb0-goog Subject: [PATCH 16/22] kvm: mmu: Add dirty logging handler for changed sptes From: Ben Gardon To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: Cannon Matthews , Paolo Bonzini , Peter Xu , Sean Christopherson , Peter Shier , Peter Feiner , Junaid Shahid , Jim Mattson , Yulei Zhang , Wanpeng Li , Vitaly Kuznetsov , Xiao Guangrong , Ben Gardon Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Add a function to handle the dirty logging bookkeeping associated with SPTE changes. This will be important for future commits which will allow the TDP MMU to log dirty pages the same way the x86 shadow paging based MMU does. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. 
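The transition the new handler treats as "the guest may now dirty this page" can be sketched as follows; this is illustrative only and example_spte_made_writable() is a made-up name.

/*
 * Illustrative predicate (not in the patch): handle_changed_spte_dlog()
 * below marks the page dirty when a 4K SPTE that was not writable, or that
 * mapped a different pfn, is writable after the change.
 */
static bool example_spte_made_writable(u64 old_spte, u64 new_spte)
{
        bool pfn_changed = spte_to_pfn(old_spte) != spte_to_pfn(new_spte);

        return (!is_writable_pte(old_spte) || pfn_changed) &&
               is_writable_pte(new_spte);
}

When this holds, the GFN's memslot dirty bitmap is updated through mark_page_dirty_in_slot(), which is why that helper is exported from kvm_main.c in this patch.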
This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon --- arch/x86/kvm/mmu/tdp_mmu.c | 21 +++++++++++++++++++++ include/linux/kvm_host.h | 1 + virt/kvm/kvm_main.c | 6 ++---- 3 files changed, 24 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 3119583409131..bbe973d3f8084 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -236,6 +236,24 @@ static void handle_changed_spte_acc_track(u64 old_spte, u64 new_spte, int level) kvm_set_pfn_accessed(spte_to_pfn(old_spte)); } +static void handle_changed_spte_dlog(struct kvm *kvm, int as_id, gfn_t gfn, + u64 old_spte, u64 new_spte, int level) +{ + bool pfn_changed; + struct kvm_memory_slot *slot; + + if (level > PG_LEVEL_4K) + return; + + pfn_changed = spte_to_pfn(old_spte) != spte_to_pfn(new_spte); + + if ((!is_writable_pte(old_spte) || pfn_changed) && + is_writable_pte(new_spte)) { + slot = __gfn_to_memslot(__kvm_memslots(kvm, as_id), gfn); + mark_page_dirty_in_slot(slot, gfn); + } +} + /** * handle_changed_spte - handle bookkeeping associated with an SPTE change * @kvm: kvm instance @@ -348,6 +366,7 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn, { __handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level); handle_changed_spte_acc_track(old_spte, new_spte, level); + handle_changed_spte_dlog(kvm, as_id, gfn, old_spte, new_spte, level); } #define for_each_tdp_pte_root(_iter, _root, _start, _end) \ @@ -685,6 +704,8 @@ static int age_gfn_range(struct kvm *kvm, struct kvm_memory_slot *slot, *iter.sptep = new_spte; __handle_changed_spte(kvm, as_id, iter.gfn, iter.old_spte, new_spte, iter.level); + handle_changed_spte_dlog(kvm, as_id, iter.gfn, iter.old_spte, + new_spte, iter.level); young = true; } diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index a460bc712a81c..2f8c3f644d809 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -798,6 +798,7 @@ struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn); bool kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn); bool kvm_vcpu_is_visible_gfn(struct kvm_vcpu *vcpu, gfn_t gfn); unsigned long kvm_host_page_size(struct kvm_vcpu *vcpu, gfn_t gfn); +void mark_page_dirty_in_slot(struct kvm_memory_slot *memslot, gfn_t gfn); void mark_page_dirty(struct kvm *kvm, gfn_t gfn); struct kvm_memslots *kvm_vcpu_memslots(struct kvm_vcpu *vcpu); diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index f9c80351c9efd..b5082ce60a33f 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -143,8 +143,6 @@ static void hardware_disable_all(void); static void kvm_io_bus_destroy(struct kvm_io_bus *bus); -static void mark_page_dirty_in_slot(struct kvm_memory_slot *memslot, gfn_t gfn); - __visible bool kvm_rebooting; EXPORT_SYMBOL_GPL(kvm_rebooting); @@ -2640,8 +2638,7 @@ int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len) } EXPORT_SYMBOL_GPL(kvm_clear_guest); -static void mark_page_dirty_in_slot(struct kvm_memory_slot *memslot, - gfn_t gfn) +void mark_page_dirty_in_slot(struct kvm_memory_slot *memslot, gfn_t gfn) { if (memslot && memslot->dirty_bitmap) { unsigned long rel_gfn = gfn - memslot->base_gfn; @@ -2649,6 +2646,7 @@ static void mark_page_dirty_in_slot(struct kvm_memory_slot *memslot, set_bit_le(rel_gfn, memslot->dirty_bitmap); } } +EXPORT_SYMBOL_GPL(mark_page_dirty_in_slot); void mark_page_dirty(struct kvm *kvm, gfn_t gfn) { From patchwork Fri Sep 25 21:22:57 2020 Content-Type: 
text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11800811 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BBAF5112E for ; Fri, 25 Sep 2020 21:24:23 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 870C621D42 for ; Fri, 25 Sep 2020 21:24:23 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="P8zn92rx" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728983AbgIYVYX (ORCPT ); Fri, 25 Sep 2020 17:24:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33680 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729268AbgIYVXj (ORCPT ); Fri, 25 Sep 2020 17:23:39 -0400 Received: from mail-pg1-x54a.google.com (mail-pg1-x54a.google.com [IPv6:2607:f8b0:4864:20::54a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B7D29C0613D7 for ; Fri, 25 Sep 2020 14:23:39 -0700 (PDT) Received: by mail-pg1-x54a.google.com with SMTP id c3so3297577pgj.5 for ; Fri, 25 Sep 2020 14:23:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=sender:date:in-reply-to:message-id:mime-version:references:subject :from:to:cc; bh=++T5s3YUay3rT5B+q2dH+DuGWd7bTsv0ett1YKhKSfo=; b=P8zn92rxX46Uh7YPJ9dvtX8aZ/bn5kQC0uN5VafSE/U4zgBDmTglpK36HInd8FIPSL jbFD35EAO/P+CUhsyxRkn2SNHcyYwilbEuKzJmtFLyjCVif2y1iE0TlC3xZRgV/Y8dS6 lK54XvJ2kWYSMTwPvCfS5jHysPNbD2xxaRj7Uzuc7izRCVqvaCLCQPyb7/wB6RqqGSbu SgBxo1NwPZnd021EJcUu1b61T79Prv4Naa3ewQ7ntygpbAtSoQ/c/BZlFuTpi9YvUP1w RW1jH3p7KwXVcrz0tTrguF3amRTXV+T3NOooFeJ4PhSrBLA0wbaZhM6l4hfZIx19z0O5 7JHg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=++T5s3YUay3rT5B+q2dH+DuGWd7bTsv0ett1YKhKSfo=; b=JXp5R3bZvLyRJseGU9s2Y75hh/UvFTxZuQYesSJ/P10S7z8uZ+hmdP1/XrUaslmuMa r2JWjffv0pKkfkVNvPV/UWiSsk3ZeEUiPc68e2IYurblYBEzf3ViUpUVOXmZRlqc5Oc3 YWVyJQWWrNwsuPRyX/shKLt9b+wJ4xpPh+SlEiE4UljljL1n4q95HG0FV0QD+qMxC5bZ CXXXXry3OnsY4w4/TX8duzCYf7y4COhdfRU2tpIBa49hcmHbO5uQBaPMbj1rBofznMJK C1cnDz/yAnqQKXtUGCfigIQlUnox8tLIxKgxGrYa0wXPcnp8QVqVm0cS9g/oC1eHYPJ5 rpwg== X-Gm-Message-State: AOAM531zjwO09nNpbykxX5VJUQ3+3V83dDoNP5I1Nhs7/lZSq0R8p+g3 G8AhQ+OBcuk2M06qEWVoCsAPesyycuIc X-Google-Smtp-Source: ABdhPJx2ipmX5d0mSlnu2hDv6zA5hEE8DlZs0VHesABKTm87rPEAHVRB+/erIsPyvFo4Es+t6yjRJSb0wh69 Sender: "bgardon via sendgmr" X-Received: from bgardon.sea.corp.google.com ([2620:15c:100:202:f693:9fff:fef4:a293]) (user=bgardon job=sendgmr) by 2002:a17:90a:ea0c:: with SMTP id w12mr431434pjy.65.1601069019270; Fri, 25 Sep 2020 14:23:39 -0700 (PDT) Date: Fri, 25 Sep 2020 14:22:57 -0700 In-Reply-To: <20200925212302.3979661-1-bgardon@google.com> Message-Id: <20200925212302.3979661-18-bgardon@google.com> Mime-Version: 1.0 References: <20200925212302.3979661-1-bgardon@google.com> X-Mailer: git-send-email 2.28.0.709.gb0816b6eb0-goog Subject: [PATCH 17/22] kvm: mmu: Support dirty logging for the TDP MMU From: Ben Gardon To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: Cannon Matthews , Paolo Bonzini , Peter Xu , Sean Christopherson , Peter Shier , Peter Feiner , Junaid Shahid , Jim Mattson , Yulei Zhang , Wanpeng Li , Vitaly Kuznetsov , Xiao Guangrong , 
Ben Gardon Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Dirty logging is a key feature of the KVM MMU and must be supported by the TDP MMU. Add support for both the write protection and PML dirty logging modes. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon --- arch/x86/kvm/mmu/mmu.c | 19 +- arch/x86/kvm/mmu/mmu_internal.h | 2 + arch/x86/kvm/mmu/tdp_iter.c | 18 ++ arch/x86/kvm/mmu/tdp_iter.h | 1 + arch/x86/kvm/mmu/tdp_mmu.c | 295 ++++++++++++++++++++++++++++++++ arch/x86/kvm/mmu/tdp_mmu.h | 10 ++ 6 files changed, 343 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 0d80abe82ca93..b9074603f9df1 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -201,7 +201,7 @@ static u64 __read_mostly shadow_nx_mask; static u64 __read_mostly shadow_x_mask; /* mutual exclusive with nx_mask */ u64 __read_mostly shadow_user_mask; u64 __read_mostly shadow_accessed_mask; -static u64 __read_mostly shadow_dirty_mask; +u64 __read_mostly shadow_dirty_mask; static u64 __read_mostly shadow_mmio_value; static u64 __read_mostly shadow_mmio_access_mask; u64 __read_mostly shadow_present_mask; @@ -324,7 +324,7 @@ inline bool spte_ad_enabled(u64 spte) return (spte & SPTE_SPECIAL_MASK) != SPTE_AD_DISABLED_MASK; } -static inline bool spte_ad_need_write_protect(u64 spte) +inline bool spte_ad_need_write_protect(u64 spte) { MMU_WARN_ON(is_mmio_spte(spte)); return (spte & SPTE_SPECIAL_MASK) != SPTE_AD_ENABLED_MASK; @@ -1591,6 +1591,9 @@ static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm, { struct kvm_rmap_head *rmap_head; + if (kvm->arch.tdp_mmu_enabled) + kvm_tdp_mmu_clear_dirty_pt_masked(kvm, slot, + slot->base_gfn + gfn_offset, mask, true); while (mask) { rmap_head = __gfn_to_rmap(slot->base_gfn + gfn_offset + __ffs(mask), PG_LEVEL_4K, slot); @@ -1617,6 +1620,9 @@ void kvm_mmu_clear_dirty_pt_masked(struct kvm *kvm, { struct kvm_rmap_head *rmap_head; + if (kvm->arch.tdp_mmu_enabled) + kvm_tdp_mmu_clear_dirty_pt_masked(kvm, slot, + slot->base_gfn + gfn_offset, mask, false); while (mask) { rmap_head = __gfn_to_rmap(slot->base_gfn + gfn_offset + __ffs(mask), PG_LEVEL_4K, slot); @@ -5954,6 +5960,8 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, spin_lock(&kvm->mmu_lock); flush = slot_handle_level(kvm, memslot, slot_rmap_write_protect, start_level, KVM_MAX_HUGEPAGE_LEVEL, false); + if (kvm->arch.tdp_mmu_enabled) + flush = kvm_tdp_mmu_wrprot_slot(kvm, memslot, false) || flush; spin_unlock(&kvm->mmu_lock); /* @@ -6034,6 +6042,7 @@ void kvm_arch_flush_remote_tlbs_memslot(struct kvm *kvm, kvm_flush_remote_tlbs_with_address(kvm, memslot->base_gfn, memslot->npages); } +EXPORT_SYMBOL_GPL(kvm_arch_flush_remote_tlbs_memslot); void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm, struct kvm_memory_slot *memslot) @@ -6042,6 +6051,8 @@ void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm, spin_lock(&kvm->mmu_lock); flush = slot_handle_leaf(kvm, memslot, __rmap_clear_dirty, false); + if (kvm->arch.tdp_mmu_enabled) + flush = kvm_tdp_mmu_clear_dirty_slot(kvm, memslot) || flush; spin_unlock(&kvm->mmu_lock); /* @@ -6063,6 +6074,8 @@ void kvm_mmu_slot_largepage_remove_write_access(struct kvm *kvm, spin_lock(&kvm->mmu_lock); flush = slot_handle_large_level(kvm, memslot, slot_rmap_write_protect, false); + if (kvm->arch.tdp_mmu_enabled) + flush = 
kvm_tdp_mmu_wrprot_slot(kvm, memslot, true) || flush; spin_unlock(&kvm->mmu_lock); if (flush) @@ -6077,6 +6090,8 @@ void kvm_mmu_slot_set_dirty(struct kvm *kvm, spin_lock(&kvm->mmu_lock); flush = slot_handle_all_level(kvm, memslot, __rmap_set_dirty, false); + if (kvm->arch.tdp_mmu_enabled) + flush = kvm_tdp_mmu_slot_set_dirty(kvm, memslot) || flush; spin_unlock(&kvm->mmu_lock); if (flush) diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h index 8eaa6e4764bce..1a777ccfde44e 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -89,6 +89,7 @@ bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm, extern u64 shadow_user_mask; extern u64 shadow_accessed_mask; extern u64 shadow_present_mask; +extern u64 shadow_dirty_mask; #define ACC_EXEC_MASK 1 #define ACC_WRITE_MASK PT_WRITABLE_MASK @@ -112,6 +113,7 @@ bool is_access_track_spte(u64 spte); bool is_accessed_spte(u64 spte); bool spte_ad_enabled(u64 spte); bool is_executable_pte(u64 spte); +bool spte_ad_need_write_protect(u64 spte); void kvm_flush_remote_tlbs_with_address(struct kvm *kvm, u64 start_gfn, u64 pages); diff --git a/arch/x86/kvm/mmu/tdp_iter.c b/arch/x86/kvm/mmu/tdp_iter.c index 6c1a38429c81a..132e286150856 100644 --- a/arch/x86/kvm/mmu/tdp_iter.c +++ b/arch/x86/kvm/mmu/tdp_iter.c @@ -178,3 +178,21 @@ void tdp_iter_refresh_walk(struct tdp_iter *iter) tdp_iter_start(iter, iter->pt_path[iter->root_level - 1], iter->root_level, goal_gfn); } + +/* + * Move on to the next SPTE, but do not move down into a child page table even + * if the current SPTE leads to one. + */ +void tdp_iter_next_no_step_down(struct tdp_iter *iter) +{ + bool done; + + done = try_step_side(iter); + while (!done) { + if (!try_step_up(iter)) { + iter->valid = false; + break; + } + done = try_step_side(iter); + } +} diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h index 34da3bdada436..d0e65a62ea7d9 100644 --- a/arch/x86/kvm/mmu/tdp_iter.h +++ b/arch/x86/kvm/mmu/tdp_iter.h @@ -50,5 +50,6 @@ void tdp_iter_start(struct tdp_iter *iter, u64 *root_pt, int root_level, gfn_t goal_gfn); void tdp_iter_next(struct tdp_iter *iter); void tdp_iter_refresh_walk(struct tdp_iter *iter); +void tdp_iter_next_no_step_down(struct tdp_iter *iter); #endif /* __KVM_X86_MMU_TDP_ITER_H */ diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index bbe973d3f8084..e5cb7f0ec23e8 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -700,6 +700,7 @@ static int age_gfn_range(struct kvm *kvm, struct kvm_memory_slot *slot, new_spte = mark_spte_for_access_track(new_spte); } + new_spte &= ~shadow_dirty_mask; *iter.sptep = new_spte; __handle_changed_spte(kvm, as_id, iter.gfn, iter.old_spte, @@ -804,3 +805,297 @@ int kvm_tdp_mmu_set_spte_hva(struct kvm *kvm, unsigned long address, set_tdp_spte); } +/* + * Remove write access from all the SPTEs mapping GFNs [start, end). If + * skip_4k is set, SPTEs that map 4k pages, will not be write-protected. + * Returns true if an SPTE has been changed and the TLBs need to be flushed. + */ +static bool wrprot_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root, + gfn_t start, gfn_t end, bool skip_4k) +{ + struct tdp_iter iter; + u64 new_spte; + bool spte_set = false; + int as_id = kvm_mmu_page_as_id(root); + + for_each_tdp_pte_root(iter, root, start, end) { +iteration_start: + if (!is_shadow_present_pte(iter.old_spte)) + continue; + + /* + * If this entry points to a page of 4K entries, and 4k entries + * should be skipped, skip the whole page. 
If the non-leaf + * entry is at a higher level, move on to the next, + * (lower level) entry. + */ + if (!is_last_spte(iter.old_spte, iter.level)) { + if (skip_4k && iter.level == PG_LEVEL_2M) { + tdp_iter_next_no_step_down(&iter); + if (iter.valid && iter.gfn >= end) + goto iteration_start; + else + break; + } else { + continue; + } + } + + WARN_ON(skip_4k && iter.level == PG_LEVEL_4K); + + new_spte = iter.old_spte & ~PT_WRITABLE_MASK; + + *iter.sptep = new_spte; + __handle_changed_spte(kvm, as_id, iter.gfn, iter.old_spte, + new_spte, iter.level); + handle_changed_spte_acc_track(iter.old_spte, new_spte, + iter.level); + spte_set = true; + + tdp_mmu_iter_cond_resched(kvm, &iter); + } + return spte_set; +} + +/* + * Remove write access from all the SPTEs mapping GFNs in the memslot. If + * skip_4k is set, SPTEs that map 4k pages, will not be write-protected. + * Returns true if an SPTE has been changed and the TLBs need to be flushed. + */ +bool kvm_tdp_mmu_wrprot_slot(struct kvm *kvm, struct kvm_memory_slot *slot, + bool skip_4k) +{ + struct kvm_mmu_page *root; + int root_as_id; + bool spte_set = false; + + for_each_tdp_mmu_root(kvm, root) { + root_as_id = kvm_mmu_page_as_id(root); + if (root_as_id != slot->as_id) + continue; + + /* + * Take a reference on the root so that it cannot be freed if + * this thread releases the MMU lock and yields in this loop. + */ + get_tdp_mmu_root(kvm, root); + + spte_set = wrprot_gfn_range(kvm, root, slot->base_gfn, + slot->base_gfn + slot->npages, skip_4k) || + spte_set; + + put_tdp_mmu_root(kvm, root); + } + + return spte_set; +} + +/* + * Clear the dirty status of all the SPTEs mapping GFNs in the memslot. If + * AD bits are enabled, this will involve clearing the dirty bit on each SPTE. + * If AD bits are not enabled, this will require clearing the writable bit on + * each SPTE. Returns true if an SPTE has been changed and the TLBs need to + * be flushed. + */ +static bool clear_dirty_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root, + gfn_t start, gfn_t end) +{ + struct tdp_iter iter; + u64 new_spte; + bool spte_set = false; + int as_id = kvm_mmu_page_as_id(root); + + for_each_tdp_pte_root(iter, root, start, end) { + if (!is_shadow_present_pte(iter.old_spte) || + !is_last_spte(iter.old_spte, iter.level)) + continue; + + if (spte_ad_need_write_protect(iter.old_spte)) { + if (is_writable_pte(iter.old_spte)) + new_spte = iter.old_spte & ~PT_WRITABLE_MASK; + else + continue; + } else { + if (iter.old_spte & shadow_dirty_mask) + new_spte = iter.old_spte & ~shadow_dirty_mask; + else + continue; + } + + *iter.sptep = new_spte; + __handle_changed_spte(kvm, as_id, iter.gfn, iter.old_spte, + new_spte, iter.level); + handle_changed_spte_acc_track(iter.old_spte, new_spte, + iter.level); + spte_set = true; + + tdp_mmu_iter_cond_resched(kvm, &iter); + } + return spte_set; +} + +/* + * Clear the dirty status of all the SPTEs mapping GFNs in the memslot. If + * AD bits are enabled, this will involve clearing the dirty bit on each SPTE. + * If AD bits are not enabled, this will require clearing the writable bit on + * each SPTE. Returns true if an SPTE has been changed and the TLBs need to + * be flushed. 
+ */ +bool kvm_tdp_mmu_clear_dirty_slot(struct kvm *kvm, struct kvm_memory_slot *slot) +{ + struct kvm_mmu_page *root; + int root_as_id; + bool spte_set = false; + + for_each_tdp_mmu_root(kvm, root) { + root_as_id = kvm_mmu_page_as_id(root); + if (root_as_id != slot->as_id) + continue; + + /* + * Take a reference on the root so that it cannot be freed if + * this thread releases the MMU lock and yields in this loop. + */ + get_tdp_mmu_root(kvm, root); + + spte_set = clear_dirty_gfn_range(kvm, root, slot->base_gfn, + slot->base_gfn + slot->npages) || spte_set; + + put_tdp_mmu_root(kvm, root); + } + + return spte_set; +} + +/* + * Clears the dirty status of all the 4k SPTEs mapping GFNs for which a bit is + * set in mask, starting at gfn. The given memslot is expected to contain all + * the GFNs represented by set bits in the mask. If AD bits are enabled, + * clearing the dirty status will involve clearing the dirty bit on each SPTE + * or, if AD bits are not enabled, clearing the writable bit on each SPTE. + */ +static void clear_dirty_pt_masked(struct kvm *kvm, struct kvm_mmu_page *root, + gfn_t gfn, unsigned long mask, bool wrprot) +{ + struct tdp_iter iter; + u64 new_spte; + int as_id = kvm_mmu_page_as_id(root); + + for_each_tdp_pte_root(iter, root, gfn + __ffs(mask), + gfn + BITS_PER_LONG) { + if (!mask) + break; + + if (!is_shadow_present_pte(iter.old_spte) || + !is_last_spte(iter.old_spte, iter.level) || + iter.level > PG_LEVEL_4K || + !(mask & (1UL << (iter.gfn - gfn)))) + continue; + + if (wrprot || spte_ad_need_write_protect(iter.old_spte)) { + if (is_writable_pte(iter.old_spte)) + new_spte = iter.old_spte & ~PT_WRITABLE_MASK; + else + continue; + } else { + if (iter.old_spte & shadow_dirty_mask) + new_spte = iter.old_spte & ~shadow_dirty_mask; + else + continue; + } + + *iter.sptep = new_spte; + __handle_changed_spte(kvm, as_id, iter.gfn, iter.old_spte, + new_spte, iter.level); + handle_changed_spte_acc_track(iter.old_spte, new_spte, + iter.level); + + mask &= ~(1UL << (iter.gfn - gfn)); + } +} + +/* + * Clears the dirty status of all the 4k SPTEs mapping GFNs for which a bit is + * set in mask, starting at gfn. The given memslot is expected to contain all + * the GFNs represented by set bits in the mask. If AD bits are enabled, + * clearing the dirty status will involve clearing the dirty bit on each SPTE + * or, if AD bits are not enabled, clearing the writable bit on each SPTE. + */ +void kvm_tdp_mmu_clear_dirty_pt_masked(struct kvm *kvm, + struct kvm_memory_slot *slot, + gfn_t gfn, unsigned long mask, + bool wrprot) +{ + struct kvm_mmu_page *root; + int root_as_id; + + lockdep_assert_held(&kvm->mmu_lock); + for_each_tdp_mmu_root(kvm, root) { + root_as_id = kvm_mmu_page_as_id(root); + if (root_as_id != slot->as_id) + continue; + + clear_dirty_pt_masked(kvm, root, gfn, mask, wrprot); + } +} + +/* + * Set the dirty status of all the SPTEs mapping GFNs in the memslot. This is + * only used for PML, and so will involve setting the dirty bit on each SPTE. + * Returns true if an SPTE has been changed and the TLBs need to be flushed. 
+ */ +static bool set_dirty_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root, + gfn_t start, gfn_t end) +{ + struct tdp_iter iter; + u64 new_spte; + bool spte_set = false; + int as_id = kvm_mmu_page_as_id(root); + + for_each_tdp_pte_root(iter, root, start, end) { + if (!is_shadow_present_pte(iter.old_spte)) + continue; + + new_spte = iter.old_spte | shadow_dirty_mask; + + *iter.sptep = new_spte; + handle_changed_spte(kvm, as_id, iter.gfn, iter.old_spte, + new_spte, iter.level); + spte_set = true; + + tdp_mmu_iter_cond_resched(kvm, &iter); + } + + return spte_set; +} + +/* + * Set the dirty status of all the SPTEs mapping GFNs in the memslot. This is + * only used for PML, and so will involve setting the dirty bit on each SPTE. + * Returns true if an SPTE has been changed and the TLBs need to be flushed. + */ +bool kvm_tdp_mmu_slot_set_dirty(struct kvm *kvm, struct kvm_memory_slot *slot) +{ + struct kvm_mmu_page *root; + int root_as_id; + bool spte_set = false; + + for_each_tdp_mmu_root(kvm, root) { + root_as_id = kvm_mmu_page_as_id(root); + if (root_as_id != slot->as_id) + continue; + + /* + * Take a reference on the root so that it cannot be freed if + * this thread releases the MMU lock and yields in this loop. + */ + get_tdp_mmu_root(kvm, root); + + spte_set = set_dirty_gfn_range(kvm, root, slot->base_gfn, + slot->base_gfn + slot->npages) || spte_set; + + put_tdp_mmu_root(kvm, root); + } + return spte_set; +} + diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h index 5a399aa60b8d8..2c9322ba3462b 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.h +++ b/arch/x86/kvm/mmu/tdp_mmu.h @@ -28,4 +28,14 @@ int kvm_tdp_mmu_test_age_hva(struct kvm *kvm, unsigned long hva); int kvm_tdp_mmu_set_spte_hva(struct kvm *kvm, unsigned long address, pte_t *host_ptep); + +bool kvm_tdp_mmu_wrprot_slot(struct kvm *kvm, struct kvm_memory_slot *slot, + bool skip_4k); +bool kvm_tdp_mmu_clear_dirty_slot(struct kvm *kvm, + struct kvm_memory_slot *slot); +void kvm_tdp_mmu_clear_dirty_pt_masked(struct kvm *kvm, + struct kvm_memory_slot *slot, + gfn_t gfn, unsigned long mask, + bool wrprot); +bool kvm_tdp_mmu_slot_set_dirty(struct kvm *kvm, struct kvm_memory_slot *slot); #endif /* __KVM_X86_MMU_TDP_MMU_H */ From patchwork Fri Sep 25 21:22:58 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11800795 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DB53C112E for ; Fri, 25 Sep 2020 21:23:46 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C019421D7F for ; Fri, 25 Sep 2020 21:23:46 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="rvIMmShd" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729323AbgIYVXp (ORCPT ); Fri, 25 Sep 2020 17:23:45 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33694 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729301AbgIYVXn (ORCPT ); Fri, 25 Sep 2020 17:23:43 -0400 Received: from mail-pf1-x449.google.com (mail-pf1-x449.google.com [IPv6:2607:f8b0:4864:20::449]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7F56CC0613D9 for ; Fri, 25 Sep 2020 14:23:41 -0700 (PDT) Received: by mail-pf1-x449.google.com with SMTP id a19so3413782pff.12 
for ; Fri, 25 Sep 2020 14:23:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=sender:date:in-reply-to:message-id:mime-version:references:subject :from:to:cc; bh=oWnifn6HrnRKD0oI/QDUODqRYBCf1M68/btQOk7Cqz8=; b=rvIMmShdKhLZvzrzWVKzDjmRMLaSmqgHKDAcOshq1z1TsJHeaPaayVTUl96VJrRjch ONn+brqlySmwvRi138DzYPa78U3WRp6qUhWVbslnGaLX1Tu7bgLfq/2vjNdL048W0+Ye 6t1bnW57iHch5oSQmMbHgN/uKSfRlrBXxmsRXh3pDRQwRh+rJxoUYo/YrLFZ8vw5bmNU quu8MlbvrwATqKo0IzwKeJ0UjUPKZOtLe6TBRoT5NaEuJZXGUIH//9h8DI7OmXK0YM4Q RHvqGd8DNgcuQzu1UryR5NjI7ntkr0S5+qXFClB4yxR9VPKatzHnvWquXBaaIN54jlO6 HSig== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=oWnifn6HrnRKD0oI/QDUODqRYBCf1M68/btQOk7Cqz8=; b=K7GDbfJITSiiw5iyeF6xi4Ty+OPGuTzqnucMZ+AGNFmpJyPllYBpbo4HNDz5cGqpf6 NaYlhBNolsC/dyi3CATQVTWpjLpCeyX6q3D4W/aGjelE4TXxySJ5X8AkFq7kwZNcttgL QnUcsAR4ahiLmHbfP5NPRMzW4NIV7o7PlvKrcWvKNoDWy8FGdLYx51Licch98kai/cnb /+o1QNutbathnJGFC5FdV25KNvje83Am1ZnQuLFFu54NcxVp5yVLRgfwCyTrEF15qO7J 4QPmE3ydpPUiewVQFpCeNri2Eq/mJkb4rMVVZeeCQBK9ilSjUOMB7ZSyVkvKLRkj9A59 5I6Q== X-Gm-Message-State: AOAM531aJ4s+EK5obbeHxuDq/zoqagABZbkV/OGcRcotogZY/q8H0MBL YnKDeIXpp3Wzq+0aA4URFgrarq8YssI+ X-Google-Smtp-Source: ABdhPJwFGjQjPll9oi2B2DEy6aojVRSl1c7w0vVXPBRsNk+LzosTN6FQuDjgxauAdjxuM7ukB2CZtkD5ZZWM Sender: "bgardon via sendgmr" X-Received: from bgardon.sea.corp.google.com ([2620:15c:100:202:f693:9fff:fef4:a293]) (user=bgardon job=sendgmr) by 2002:a17:902:d3d1:b029:d2:635f:6616 with SMTP id w17-20020a170902d3d1b02900d2635f6616mr1291657plb.28.1601069020961; Fri, 25 Sep 2020 14:23:40 -0700 (PDT) Date: Fri, 25 Sep 2020 14:22:58 -0700 In-Reply-To: <20200925212302.3979661-1-bgardon@google.com> Message-Id: <20200925212302.3979661-19-bgardon@google.com> Mime-Version: 1.0 References: <20200925212302.3979661-1-bgardon@google.com> X-Mailer: git-send-email 2.28.0.709.gb0816b6eb0-goog Subject: [PATCH 18/22] kvm: mmu: Support disabling dirty logging for the tdp MMU From: Ben Gardon To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: Cannon Matthews , Paolo Bonzini , Peter Xu , Sean Christopherson , Peter Shier , Peter Feiner , Junaid Shahid , Jim Mattson , Yulei Zhang , Wanpeng Li , Vitaly Kuznetsov , Xiao Guangrong , Ben Gardon Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Dirty logging ultimately breaks down MMU mappings to 4k granularity. When dirty logging is no longer needed, these granaular mappings represent a useless performance penalty. When dirty logging is disabled, search the paging structure for mappings that could be re-constituted into a large page mapping. Zap those mappings so that they can be faulted in again at a higher mapping level. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. 
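For context, a simplified sketch of the path that reaches this zapping is shown below. It is not part of the patch; example_dirty_logging_disabled() is a made-up name, and it assumes the usual x86 memslot-commit path that invokes kvm_mmu_zap_collapsible_sptes() once the KVM_MEM_LOG_DIRTY_PAGES flag is cleared on a slot.

/*
 * Illustrative only: when userspace stops dirty logging on a memslot, the
 * x86 commit path calls kvm_mmu_zap_collapsible_sptes(). With this patch
 * that call also walks the TDP MMU, so the zapped ranges can be re-faulted
 * as large mappings instead of staying at 4K granularity.
 */
static void example_dirty_logging_disabled(struct kvm *kvm,
                                           struct kvm_memory_slot *slot)
{
        kvm_mmu_zap_collapsible_sptes(kvm, slot);
}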
This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon --- arch/x86/kvm/mmu/mmu.c | 3 ++ arch/x86/kvm/mmu/tdp_mmu.c | 62 ++++++++++++++++++++++++++++++++++++++ arch/x86/kvm/mmu/tdp_mmu.h | 2 ++ 3 files changed, 67 insertions(+) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index b9074603f9df1..12892fc4f146d 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -6025,6 +6025,9 @@ void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm, spin_lock(&kvm->mmu_lock); slot_handle_leaf(kvm, (struct kvm_memory_slot *)memslot, kvm_mmu_zap_collapsible_spte, true); + + if (kvm->arch.tdp_mmu_enabled) + kvm_tdp_mmu_zap_collapsible_sptes(kvm, memslot); spin_unlock(&kvm->mmu_lock); } diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index e5cb7f0ec23e8..a2895119655ac 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -1099,3 +1099,65 @@ bool kvm_tdp_mmu_slot_set_dirty(struct kvm *kvm, struct kvm_memory_slot *slot) return spte_set; } +/* + * Clear non-leaf entries (and free associated page tables) which could + * be replaced by large mappings, for GFNs within the slot. + */ +static void zap_collapsible_spte_range(struct kvm *kvm, + struct kvm_mmu_page *root, + gfn_t start, gfn_t end) +{ + struct tdp_iter iter; + kvm_pfn_t pfn; + bool spte_set = false; + int as_id = kvm_mmu_page_as_id(root); + + for_each_tdp_pte_root(iter, root, start, end) { + if (!is_shadow_present_pte(iter.old_spte) || + is_last_spte(iter.old_spte, iter.level)) + continue; + + pfn = spte_to_pfn(iter.old_spte); + if (kvm_is_reserved_pfn(pfn) || + !PageTransCompoundMap(pfn_to_page(pfn))) + continue; + + *iter.sptep = 0; + handle_changed_spte(kvm, as_id, iter.gfn, iter.old_spte, + 0, iter.level); + spte_set = true; + + spte_set = !tdp_mmu_iter_cond_resched(kvm, &iter); + } + + if (spte_set) + kvm_flush_remote_tlbs(kvm); +} + +/* + * Clear non-leaf entries (and free associated page tables) which could + * be replaced by large mappings, for GFNs within the slot. + */ +void kvm_tdp_mmu_zap_collapsible_sptes(struct kvm *kvm, + const struct kvm_memory_slot *slot) +{ + struct kvm_mmu_page *root; + int root_as_id; + + for_each_tdp_mmu_root(kvm, root) { + root_as_id = kvm_mmu_page_as_id(root); + if (root_as_id != slot->as_id) + continue; + + /* + * Take a reference on the root so that it cannot be freed if + * this thread releases the MMU lock and yields in this loop. 
+ */ + get_tdp_mmu_root(kvm, root); + + zap_collapsible_spte_range(kvm, root, slot->base_gfn, + slot->base_gfn + slot->npages); + + put_tdp_mmu_root(kvm, root); + } +} diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h index 2c9322ba3462b..10e70699c5372 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.h +++ b/arch/x86/kvm/mmu/tdp_mmu.h @@ -38,4 +38,6 @@ void kvm_tdp_mmu_clear_dirty_pt_masked(struct kvm *kvm, gfn_t gfn, unsigned long mask, bool wrprot); bool kvm_tdp_mmu_slot_set_dirty(struct kvm *kvm, struct kvm_memory_slot *slot); +void kvm_tdp_mmu_zap_collapsible_sptes(struct kvm *kvm, + const struct kvm_memory_slot *slot); #endif /* __KVM_X86_MMU_TDP_MMU_H */ From patchwork Fri Sep 25 21:22:59 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11800805 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 36A35112E for ; Fri, 25 Sep 2020 21:24:18 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1743A20738 for ; Fri, 25 Sep 2020 21:24:18 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="tAJy0i4V" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729172AbgIYVYQ (ORCPT ); Fri, 25 Sep 2020 17:24:16 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33704 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729312AbgIYVXo (ORCPT ); Fri, 25 Sep 2020 17:23:44 -0400 Received: from mail-qt1-x84a.google.com (mail-qt1-x84a.google.com [IPv6:2607:f8b0:4864:20::84a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BC019C0613CE for ; Fri, 25 Sep 2020 14:23:44 -0700 (PDT) Received: by mail-qt1-x84a.google.com with SMTP id b39so3275706qta.0 for ; Fri, 25 Sep 2020 14:23:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=sender:date:in-reply-to:message-id:mime-version:references:subject :from:to:cc; bh=wGRaKZqepjfqHu8EAcgDimD+fkRsdcPXOcl2oXcLVLw=; b=tAJy0i4VTSVjyS7q2KkOmQHelTLbYCm/7l3p0QD0QKcCbL/jJKy3KcSb0AiykDcVPg q+Ir25YzqwwJR8isTg9v6pVwrLyYA6IGTLKlOjfBt6hu4KGuO3eyMgMinGyLOzik/jRL cqGHpN2Y9V3DsjS8kEQK7VHIDA4y7qfZDSlXNm7TqGoomvY5l7cNR75hyIsQA0yhV5NB Q62xqjjwH8gq1bhPyJHSuaydyRFNGcAlaCKnXJitGDSCdRjOMA+nizYntreGmKR1N4Kc 6OLtsV/mj8Ow+UX8WBLM6j22Q/kuyoU6QMj6Gran4+mEwWcxQ8HBOxZUMn2o085UXsUb Oc8Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=wGRaKZqepjfqHu8EAcgDimD+fkRsdcPXOcl2oXcLVLw=; b=oUuHvWdYOkEefJv2g8G/GJLnK0eMDCPP1U5WHtasl5LV+QEN8cccc+URshSOvugcMJ zDBh3EAK9jeJRuoa4ipnprrvzcYDwzkIdWcY4peMIX3WtP4t/1UxQNZFx9epH5EmnFGe JprWMqs5L3ov+AYwsy4rVXMzafccWv5wq7GiOmCrYNF0i+/8JLDULxtkStLw9IAnG/st qAdSU0RhjF05gOq1quRM1r9VtytgudKdCAezpubmkyXFHNxTHB4MD4PcNcxjnvO7tPbn bY+AV7/lWmKPqBSGke7nwx10d2er39CZPo/KphJs9UDzYpoEwLkv8579f4HDCqkX4WAw 2P9w== X-Gm-Message-State: AOAM531y+fnmB8/Kcp2XEGI5gHMpuUuy4vn7BHdm2vrA9lsrl+QJqRKv V7ptyw7X6cvKZjf86bvKg7wYPUixuiKg X-Google-Smtp-Source: ABdhPJx4y/if82KtvkXENISsfx8NKmDmAhvA8WY15xvjNvUppkQAPg+tX2rELyCptq23jwCVN1Mw1E/1wkrr Sender: "bgardon via sendgmr" X-Received: from bgardon.sea.corp.google.com ([2620:15c:100:202:f693:9fff:fef4:a293]) (user=bgardon 
job=sendgmr) by 2002:a0c:e04e:: with SMTP id y14mr674676qvk.38.1601069022744; Fri, 25 Sep 2020 14:23:42 -0700 (PDT) Date: Fri, 25 Sep 2020 14:22:59 -0700 In-Reply-To: <20200925212302.3979661-1-bgardon@google.com> Message-Id: <20200925212302.3979661-20-bgardon@google.com> Mime-Version: 1.0 References: <20200925212302.3979661-1-bgardon@google.com> X-Mailer: git-send-email 2.28.0.709.gb0816b6eb0-goog Subject: [PATCH 19/22] kvm: mmu: Support write protection for nesting in tdp MMU From: Ben Gardon To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: Cannon Matthews , Paolo Bonzini , Peter Xu , Sean Christopherson , Peter Shier , Peter Feiner , Junaid Shahid , Jim Mattson , Yulei Zhang , Wanpeng Li , Vitaly Kuznetsov , Xiao Guangrong , Ben Gardon Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org To support nested virtualization, KVM will sometimes need to write protect pages which are part of a shadowed paging structure or are not writable in the shadowed paging structure. Add a function to write protect GFN mappings for this purpose. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon --- arch/x86/kvm/mmu/mmu.c | 5 ++++ arch/x86/kvm/mmu/tdp_mmu.c | 57 ++++++++++++++++++++++++++++++++++++++ arch/x86/kvm/mmu/tdp_mmu.h | 3 ++ 3 files changed, 65 insertions(+) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 12892fc4f146d..e6f5093ba8f6f 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -1667,6 +1667,11 @@ bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm, write_protected |= __rmap_write_protect(kvm, rmap_head, true); } + if (kvm->arch.tdp_mmu_enabled) + write_protected = + kvm_tdp_mmu_write_protect_gfn(kvm, slot, gfn) || + write_protected; + return write_protected; } diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index a2895119655ac..931cb469b1f2f 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -1161,3 +1161,60 @@ void kvm_tdp_mmu_zap_collapsible_sptes(struct kvm *kvm, put_tdp_mmu_root(kvm, root); } } + +/* + * Removes write access on the last level SPTE mapping this GFN and unsets the + * SPTE_MMU_WRITABLE bit to ensure future writes continue to be intercepted. + * Returns true if an SPTE was set and a TLB flush is needed. + */ +static bool write_protect_gfn(struct kvm *kvm, struct kvm_mmu_page *root, + gfn_t gfn) +{ + struct tdp_iter iter; + u64 new_spte; + bool spte_set = false; + int as_id = kvm_mmu_page_as_id(root); + + for_each_tdp_pte_root(iter, root, gfn, gfn + 1) { + if (!is_shadow_present_pte(iter.old_spte) || + !is_last_spte(iter.old_spte, iter.level)) + continue; + + if (!is_writable_pte(iter.old_spte)) + break; + + new_spte = iter.old_spte & + ~(PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE); + + *iter.sptep = new_spte; + handle_changed_spte(kvm, as_id, iter.gfn, iter.old_spte, + new_spte, iter.level); + spte_set = true; + } + + return spte_set; +} + +/* + * Removes write access on the last level SPTE mapping this GFN and unsets the + * SPTE_MMU_WRITABLE bit to ensure future writes continue to be intercepted. + * Returns true if an SPTE was set and a TLB flush is needed. 
+ */ +bool kvm_tdp_mmu_write_protect_gfn(struct kvm *kvm, + struct kvm_memory_slot *slot, gfn_t gfn) +{ + struct kvm_mmu_page *root; + int root_as_id; + bool spte_set = false; + + lockdep_assert_held(&kvm->mmu_lock); + for_each_tdp_mmu_root(kvm, root) { + root_as_id = kvm_mmu_page_as_id(root); + if (root_as_id != slot->as_id) + continue; + + spte_set = write_protect_gfn(kvm, root, gfn) || spte_set; + } + return spte_set; +} + diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h index 10e70699c5372..2ecb047211a6d 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.h +++ b/arch/x86/kvm/mmu/tdp_mmu.h @@ -40,4 +40,7 @@ void kvm_tdp_mmu_clear_dirty_pt_masked(struct kvm *kvm, bool kvm_tdp_mmu_slot_set_dirty(struct kvm *kvm, struct kvm_memory_slot *slot); void kvm_tdp_mmu_zap_collapsible_sptes(struct kvm *kvm, const struct kvm_memory_slot *slot); + +bool kvm_tdp_mmu_write_protect_gfn(struct kvm *kvm, + struct kvm_memory_slot *slot, gfn_t gfn); #endif /* __KVM_X86_MMU_TDP_MMU_H */ From patchwork Fri Sep 25 21:23:00 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11800801 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 84DC7112E for ; Fri, 25 Sep 2020 21:24:10 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6407C21D7F for ; Fri, 25 Sep 2020 21:24:10 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="hcU3avKq" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729426AbgIYVYJ (ORCPT ); Fri, 25 Sep 2020 17:24:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33710 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728966AbgIYVXq (ORCPT ); Fri, 25 Sep 2020 17:23:46 -0400 Received: from mail-pg1-x54a.google.com (mail-pg1-x54a.google.com [IPv6:2607:f8b0:4864:20::54a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3E1E0C0613CE for ; Fri, 25 Sep 2020 14:23:46 -0700 (PDT) Received: by mail-pg1-x54a.google.com with SMTP id t128so135390pgb.23 for ; Fri, 25 Sep 2020 14:23:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=sender:date:in-reply-to:message-id:mime-version:references:subject :from:to:cc; bh=ZtTFMXOwk8C5vsuTGnAkh2xh9duLo4nsCMaZH0pUTdA=; b=hcU3avKqIN+6PuxfNZwEg93X/KJeNgORAEkZ/PriVAe96Tx0XcZJzCzEPE2Zk308c3 8Uy6YHa9/adr0EJ8vOVKvxuRu1op+nXP6GcW4G3DIjRhF5FQGu7wECiKEeghKf/Q73Uc BI7q6rvz7OIWKygNEQOfxjXeyQ/G4xPzw4+gVvJlGFXaYwTID+XoGKH212jmKLu6KPML fVHnYqUcQsKfJQdbvTDoJe34+0+5xrK+LRiAyTyXvaskPoN5Oc3ei5qNCeY4a/n8ugUj JZ4VfiP6tKAR8GwiSWbzQa8V/Evl8WNdosJvh1xDb1UFQuECEjfwk9DBVdrJPXABiJKV zKFA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=ZtTFMXOwk8C5vsuTGnAkh2xh9duLo4nsCMaZH0pUTdA=; b=BeEAklqjdlZH7qJKR8rL+WcKoj3HS5Jo6zu0yzk0wg7P/0IFkMsSGnLv/WMBRYcg8G EufpkuPADmmlc7qd/DeCUKDWK3R7okWYS/p65xLJmN0Lw4V/PAPMu8vYL6KBeioiyXi0 hF1ZHShPsWB4Rf9Gg5HZC/vp/ni2HWTvqlvulOC+GtS6cmsdAKs6nG95hXel2bfGOzla v6xesOxKOvlcRI25WVPc3GxNIRLb2v+QCiYjK0pq4KDkg3Y8DzZTWgZcBl45/QdAGhkP DIFnBeoCp+MjmDBVrASa9aPDxJRWk3smz4wX3pknYvyaNUAjk8hXyIrrPVc3ge4xDlR5 bluw== X-Gm-Message-State: 
AOAM533z9mtWCdlLbwQHZLt/OrBeN8uuX9Z+FIDb+XXDW6RTdmAR+LFz MerAHRene4I4lBp929VEEjRoxVJdhzkD X-Google-Smtp-Source: ABdhPJzwN0bL90aFnxlyc8pnXr3MMR7bFhEqLqgmUIw+smVYRoFCKS8N7SAcu35L/W6sN7D0XNyaKt+OWDzL Sender: "bgardon via sendgmr" X-Received: from bgardon.sea.corp.google.com ([2620:15c:100:202:f693:9fff:fef4:a293]) (user=bgardon job=sendgmr) by 2002:aa7:9edb:0:b029:13e:d13d:a059 with SMTP id r27-20020aa79edb0000b029013ed13da059mr1088084pfq.31.1601069025715; Fri, 25 Sep 2020 14:23:45 -0700 (PDT) Date: Fri, 25 Sep 2020 14:23:00 -0700 In-Reply-To: <20200925212302.3979661-1-bgardon@google.com> Message-Id: <20200925212302.3979661-21-bgardon@google.com> Mime-Version: 1.0 References: <20200925212302.3979661-1-bgardon@google.com> X-Mailer: git-send-email 2.28.0.709.gb0816b6eb0-goog Subject: [PATCH 20/22] kvm: mmu: NX largepage recovery for TDP MMU From: Ben Gardon To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: Cannon Matthews , Paolo Bonzini , Peter Xu , Sean Christopherson , Peter Shier , Peter Feiner , Junaid Shahid , Jim Mattson , Yulei Zhang , Wanpeng Li , Vitaly Kuznetsov , Xiao Guangrong , Ben Gardon Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org When KVM maps a largepage backed region at a lower level in order to make it executable (i.e. NX large page shattering), it reduces the TLB performance of that region. In order to avoid making this degradation permanent, KVM must periodically reclaim shattered NX largepages by zapping them and allowing them to be rebuilt in the page fault handler. With this patch, the TDP MMU does not respect KVM's rate limiting on reclaim. It traverses the entire TDP structure every time. This will be addressed in a future patch. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon --- arch/x86/include/asm/kvm_host.h | 3 ++ arch/x86/kvm/mmu/mmu.c | 27 +++++++++++--- arch/x86/kvm/mmu/mmu_internal.h | 4 ++ arch/x86/kvm/mmu/tdp_mmu.c | 66 +++++++++++++++++++++++++++++++++ arch/x86/kvm/mmu/tdp_mmu.h | 2 + 5 files changed, 97 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index a76bcb51d43d8..cf00b1c837708 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -963,6 +963,7 @@ struct kvm_arch { struct kvm_pmu_event_filter *pmu_event_filter; struct task_struct *nx_lpage_recovery_thread; + struct task_struct *nx_lpage_tdp_mmu_recovery_thread; /* * Whether the TDP MMU is enabled for this VM. This contains a @@ -977,6 +978,8 @@ struct kvm_arch { struct list_head tdp_mmu_roots; /* List of struct tdp_mmu_pages not being used as roots */ struct list_head tdp_mmu_pages; + struct list_head tdp_mmu_lpage_disallowed_pages; + u64 tdp_mmu_lpage_disallowed_page_count; }; struct kvm_vm_stat { diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index e6f5093ba8f6f..6101c696e92d3 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -54,12 +54,12 @@ extern bool itlb_multihit_kvm_mitigation; -static int __read_mostly nx_huge_pages = -1; +int __read_mostly nx_huge_pages = -1; #ifdef CONFIG_PREEMPT_RT /* Recovery can cause latency spikes, disable it for PREEMPT_RT. 
*/ -static uint __read_mostly nx_huge_pages_recovery_ratio = 0; +uint __read_mostly nx_huge_pages_recovery_ratio = 0; #else -static uint __read_mostly nx_huge_pages_recovery_ratio = 60; +uint __read_mostly nx_huge_pages_recovery_ratio = 60; #endif static int set_nx_huge_pages(const char *val, const struct kernel_param *kp); @@ -6455,7 +6455,7 @@ static long get_nx_lpage_recovery_timeout(u64 start_time) : MAX_SCHEDULE_TIMEOUT; } -static int kvm_nx_lpage_recovery_worker(struct kvm *kvm, uintptr_t data) +static int kvm_nx_lpage_recovery_worker(struct kvm *kvm, uintptr_t tdp_mmu) { u64 start_time; long remaining_time; @@ -6476,7 +6476,10 @@ static int kvm_nx_lpage_recovery_worker(struct kvm *kvm, uintptr_t data) if (kthread_should_stop()) return 0; - kvm_recover_nx_lpages(kvm); + if (tdp_mmu) + kvm_tdp_mmu_recover_nx_lpages(kvm); + else + kvm_recover_nx_lpages(kvm); } } @@ -6489,6 +6492,17 @@ int kvm_mmu_post_init_vm(struct kvm *kvm) &kvm->arch.nx_lpage_recovery_thread); if (!err) kthread_unpark(kvm->arch.nx_lpage_recovery_thread); + else + return err; + + if (!kvm->arch.tdp_mmu_enabled) + return err; + + err = kvm_vm_create_worker_thread(kvm, kvm_nx_lpage_recovery_worker, 1, + "kvm-nx-lpage-tdp-mmu-recovery", + &kvm->arch.nx_lpage_tdp_mmu_recovery_thread); + if (!err) + kthread_unpark(kvm->arch.nx_lpage_tdp_mmu_recovery_thread); return err; } @@ -6497,4 +6511,7 @@ void kvm_mmu_pre_destroy_vm(struct kvm *kvm) { if (kvm->arch.nx_lpage_recovery_thread) kthread_stop(kvm->arch.nx_lpage_recovery_thread); + + if (kvm->arch.nx_lpage_tdp_mmu_recovery_thread) + kthread_stop(kvm->arch.nx_lpage_tdp_mmu_recovery_thread); } diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h index 1a777ccfde44e..567e119da424f 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -43,6 +43,7 @@ struct kvm_mmu_page { atomic_t write_flooding_count; bool tdp_mmu_page; + u64 *parent_sptep; }; extern struct kmem_cache *mmu_page_header_cache; @@ -154,4 +155,7 @@ void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc); u64 mark_spte_for_access_track(u64 spte); u64 kvm_mmu_changed_pte_notifier_make_spte(u64 old_spte, kvm_pfn_t new_pfn); +extern int nx_huge_pages; +extern uint nx_huge_pages_recovery_ratio; + #endif /* __KVM_X86_MMU_INTERNAL_H */ diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 931cb469b1f2f..b83c18e29f9c6 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -578,10 +578,18 @@ int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu, int write, int map_writable, new_spte = make_nonleaf_spte(child_pt, !shadow_accessed_mask); + if (iter.level <= max_level && + account_disallowed_nx_lpage) { + list_add(&sp->lpage_disallowed_link, + &vcpu->kvm->arch.tdp_mmu_lpage_disallowed_pages); + vcpu->kvm->arch.tdp_mmu_lpage_disallowed_page_count++; + } + *iter.sptep = new_spte; handle_changed_spte(vcpu->kvm, as_id, iter.gfn, iter.old_spte, new_spte, iter.level); + sp->parent_sptep = iter.sptep; } } @@ -1218,3 +1226,61 @@ bool kvm_tdp_mmu_write_protect_gfn(struct kvm *kvm, return spte_set; } +/* + * Clear non-leaf SPTEs and free the page tables they point to, if those SPTEs + * exist in order to allow execute access on a region that would otherwise be + * mapped as a large page. 
+ */ +void kvm_tdp_mmu_recover_nx_lpages(struct kvm *kvm) +{ + struct kvm_mmu_page *sp; + bool flush; + int rcu_idx; + unsigned int ratio; + ulong to_zap; + u64 old_spte; + + rcu_idx = srcu_read_lock(&kvm->srcu); + spin_lock(&kvm->mmu_lock); + + ratio = READ_ONCE(nx_huge_pages_recovery_ratio); + to_zap = ratio ? DIV_ROUND_UP(kvm->stat.nx_lpage_splits, ratio) : 0; + + while (to_zap && + !list_empty(&kvm->arch.tdp_mmu_lpage_disallowed_pages)) { + /* + * We use a separate list instead of just using active_mmu_pages + * because the number of lpage_disallowed pages is expected to + * be relatively small compared to the total. + */ + sp = list_first_entry(&kvm->arch.tdp_mmu_lpage_disallowed_pages, + struct kvm_mmu_page, + lpage_disallowed_link); + + old_spte = *sp->parent_sptep; + *sp->parent_sptep = 0; + + list_del(&sp->lpage_disallowed_link); + kvm->arch.tdp_mmu_lpage_disallowed_page_count--; + + handle_changed_spte(kvm, kvm_mmu_page_as_id(sp), sp->gfn, + old_spte, 0, sp->role.level + 1); + + flush = true; + + if (!--to_zap || need_resched() || + spin_needbreak(&kvm->mmu_lock)) { + flush = false; + kvm_flush_remote_tlbs(kvm); + if (to_zap) + cond_resched_lock(&kvm->mmu_lock); + } + } + + if (flush) + kvm_flush_remote_tlbs(kvm); + + spin_unlock(&kvm->mmu_lock); + srcu_read_unlock(&kvm->srcu, rcu_idx); +} + diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h index 2ecb047211a6d..45ea2d44545db 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.h +++ b/arch/x86/kvm/mmu/tdp_mmu.h @@ -43,4 +43,6 @@ void kvm_tdp_mmu_zap_collapsible_sptes(struct kvm *kvm, bool kvm_tdp_mmu_write_protect_gfn(struct kvm *kvm, struct kvm_memory_slot *slot, gfn_t gfn); + +void kvm_tdp_mmu_recover_nx_lpages(struct kvm *kvm); #endif /* __KVM_X86_MMU_TDP_MMU_H */ From patchwork Fri Sep 25 21:23:01 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11800799 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6B4186CA for ; Fri, 25 Sep 2020 21:24:09 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 48A4F21D42 for ; Fri, 25 Sep 2020 21:24:09 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="nvM6eY4K" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729344AbgIYVYI (ORCPT ); Fri, 25 Sep 2020 17:24:08 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33720 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729332AbgIYVXs (ORCPT ); Fri, 25 Sep 2020 17:23:48 -0400 Received: from mail-qt1-x84a.google.com (mail-qt1-x84a.google.com [IPv6:2607:f8b0:4864:20::84a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 305CBC0613CE for ; Fri, 25 Sep 2020 14:23:48 -0700 (PDT) Received: by mail-qt1-x84a.google.com with SMTP id b54so3214933qtk.17 for ; Fri, 25 Sep 2020 14:23:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=sender:date:in-reply-to:message-id:mime-version:references:subject :from:to:cc; bh=2W93x0wFn2B/eR702ln689L1xVXpVcuv86awhcY5u5k=; b=nvM6eY4KZEPy35TPI8xHdc7ZzfhV6Kb46revIGLZhhGe9x5mRrUVR7yZ6jj6o2KjH4 NCjnvx5lB0E3i2qbJXNY3sepzWBiNhwDJ8belpU0xTLK7AKrUAt30z1pUAEYqaCJE5Uz eTvV/43+HCt4c75LPF0+W88mAICuVNgNulf4YiAY6DVEJkRB9WhiHghQMT9LXLTxp+XB 
rhP7T+KcaUtVFUhNvutqmSYX5MR0aPKWVrsKtpaS9tD0IZIHB7UJMLElmIFjUgne+imK pViwBVMUq1JD0Zol8LW6J/Jqf6G8gmm+/1IDvAnzd5OXOZpDQhtV7QJ7X308Gg/i/tfN H0Lg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=2W93x0wFn2B/eR702ln689L1xVXpVcuv86awhcY5u5k=; b=lXeTrMY/5jA0y7sbrhRUtQA0PHPbJQkmk3xF+2U2rNnKIk3O6z3B/Ojiy1T1fjxwmg N+D4OtOzxJYY44+sTN06udmCTbEbhaHApMYTlvLTpzsIPhKdlIWJ2lmh32d99EQGxadF HbyOaoVtJu+xTDIjKQ3pJ+33B0cFusB+vwdoVzOjt4jA6EUgWlelpuDmC0WaYXoO90fl 9p9V6VHfMH9Jxej9Iyr/iGqJ51XWUBy/86Mk48FFZ9elT5QiiP1b//8/2H/5/FmMIhC/ +4CSyJAu5nAlGrAT99RKzGD9NMw/hE8xM67mshFimFsIajjgOZagWj8su2IS3YCMKB+O +Lpg== X-Gm-Message-State: AOAM532PI0cpQDQB3JxGUTv7GC0LARQc71AjFVW56Tjownx+KnpPNmv6 tbQMIHJ/QUsxZEDt4adYAQkMrgLV4ATS X-Google-Smtp-Source: ABdhPJyPxktLMKlNwgUfxRN5VMY8IyHJsoD87PwWyG6SLsXUtbeLR58Fk2FQL55dVOR4gMQji3fS54cG+zfF Sender: "bgardon via sendgmr" X-Received: from bgardon.sea.corp.google.com ([2620:15c:100:202:f693:9fff:fef4:a293]) (user=bgardon job=sendgmr) by 2002:a0c:a4c5:: with SMTP id x63mr655836qvx.58.1601069027271; Fri, 25 Sep 2020 14:23:47 -0700 (PDT) Date: Fri, 25 Sep 2020 14:23:01 -0700 In-Reply-To: <20200925212302.3979661-1-bgardon@google.com> Message-Id: <20200925212302.3979661-22-bgardon@google.com> Mime-Version: 1.0 References: <20200925212302.3979661-1-bgardon@google.com> X-Mailer: git-send-email 2.28.0.709.gb0816b6eb0-goog Subject: [PATCH 21/22] kvm: mmu: Support MMIO in the TDP MMU From: Ben Gardon To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: Cannon Matthews , Paolo Bonzini , Peter Xu , Sean Christopherson , Peter Shier , Peter Feiner , Junaid Shahid , Jim Mattson , Yulei Zhang , Wanpeng Li , Vitaly Kuznetsov , Xiao Guangrong , Ben Gardon Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org In order to support MMIO, KVM must be able to walk the TDP paging structures to find mappings for a given GFN. Support this walk for the TDP MMU. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon --- arch/x86/kvm/mmu/mmu.c | 70 ++++++++++++++++++++++++++------------ arch/x86/kvm/mmu/tdp_mmu.c | 17 +++++++++ arch/x86/kvm/mmu/tdp_mmu.h | 2 ++ 3 files changed, 68 insertions(+), 21 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 6101c696e92d3..0ce7720a72d4e 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3939,54 +3939,82 @@ static bool mmio_info_in_cache(struct kvm_vcpu *vcpu, u64 addr, bool direct) return vcpu_match_mmio_gva(vcpu, addr); } -/* return true if reserved bit is detected on spte. */ -static bool -walk_shadow_page_get_mmio_spte(struct kvm_vcpu *vcpu, u64 addr, u64 *sptep) +/* + * Return the level of the lowest level SPTE added to sptes. + * That SPTE may be non-present. 
+ */ +static int get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes) { struct kvm_shadow_walk_iterator iterator; - u64 sptes[PT64_ROOT_MAX_LEVEL], spte = 0ull; - struct rsvd_bits_validate *rsvd_check; - int root, leaf; - bool reserved = false; + int leaf = vcpu->arch.mmu->root_level; + u64 spte; - rsvd_check = &vcpu->arch.mmu->shadow_zero_check; walk_shadow_page_lockless_begin(vcpu); - for (shadow_walk_init(&iterator, vcpu, addr), - leaf = root = iterator.level; + for (shadow_walk_init(&iterator, vcpu, addr); shadow_walk_okay(&iterator); __shadow_walk_next(&iterator, spte)) { + leaf = iterator.level; spte = mmu_spte_get_lockless(iterator.sptep); sptes[leaf - 1] = spte; - leaf--; if (!is_shadow_present_pte(spte)) break; + } + + walk_shadow_page_lockless_end(vcpu); + + return leaf; +} + +/* return true if reserved bit is detected on spte. */ +static bool get_mmio_spte(struct kvm_vcpu *vcpu, u64 addr, u64 *sptep) +{ + u64 sptes[PT64_ROOT_MAX_LEVEL]; + struct rsvd_bits_validate *rsvd_check; + int root; + int leaf; + int level; + bool reserved = false; + + if (!VALID_PAGE(vcpu->arch.mmu->root_hpa)) { + *sptep = 0ull; + return reserved; + } + + if (is_tdp_mmu_root(vcpu->kvm, vcpu->arch.mmu->root_hpa)) + leaf = kvm_tdp_mmu_get_walk(vcpu, addr, sptes); + else + leaf = get_walk(vcpu, addr, sptes); + + rsvd_check = &vcpu->arch.mmu->shadow_zero_check; + + for (level = root; level >= leaf; level--) { + if (!is_shadow_present_pte(sptes[level - 1])) + break; /* * Use a bitwise-OR instead of a logical-OR to aggregate the * reserved bit and EPT's invalid memtype/XWR checks to avoid * adding a Jcc in the loop. */ - reserved |= __is_bad_mt_xwr(rsvd_check, spte) | - __is_rsvd_bits_set(rsvd_check, spte, iterator.level); + reserved |= __is_bad_mt_xwr(rsvd_check, sptes[level - 1]) | + __is_rsvd_bits_set(rsvd_check, sptes[level - 1], + level); } - walk_shadow_page_lockless_end(vcpu); - if (reserved) { pr_err("%s: detect reserved bits on spte, addr 0x%llx, dump hierarchy:\n", __func__, addr); - while (root > leaf) { + for (level = root; level >= leaf; level--) pr_err("------ spte 0x%llx level %d.\n", - sptes[root - 1], root); - root--; - } + sptes[level - 1], level); } - *sptep = spte; + *sptep = sptes[leaf - 1]; + return reserved; } @@ -3998,7 +4026,7 @@ static int handle_mmio_page_fault(struct kvm_vcpu *vcpu, u64 addr, bool direct) if (mmio_info_in_cache(vcpu, addr, direct)) return RET_PF_EMULATE; - reserved = walk_shadow_page_get_mmio_spte(vcpu, addr, &spte); + reserved = get_mmio_spte(vcpu, addr, &spte); if (WARN_ON(reserved)) return -EINVAL; diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index b83c18e29f9c6..42dde27decd75 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -1284,3 +1284,20 @@ void kvm_tdp_mmu_recover_nx_lpages(struct kvm *kvm) srcu_read_unlock(&kvm->srcu, rcu_idx); } +/* + * Return the level of the lowest level SPTE added to sptes. + * That SPTE may be non-present. 
+ */ +int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes) +{ + struct tdp_iter iter; + int leaf = vcpu->arch.mmu->shadow_root_level; + gfn_t gfn = addr >> PAGE_SHIFT; + + for_each_tdp_pte_vcpu(iter, vcpu, gfn, gfn + 1) { + leaf = iter.level; + sptes[leaf - 1] = iter.old_spte; + } + + return leaf; +} diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h index 45ea2d44545db..cc0b7241975aa 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.h +++ b/arch/x86/kvm/mmu/tdp_mmu.h @@ -45,4 +45,6 @@ bool kvm_tdp_mmu_write_protect_gfn(struct kvm *kvm, struct kvm_memory_slot *slot, gfn_t gfn); void kvm_tdp_mmu_recover_nx_lpages(struct kvm *kvm); + +int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes); #endif /* __KVM_X86_MMU_TDP_MMU_H */ From patchwork Fri Sep 25 21:23:02 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11800797 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D9C486CA for ; Fri, 25 Sep 2020 21:24:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B767F21D7F for ; Fri, 25 Sep 2020 21:24:04 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="rXGM+YU5" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729443AbgIYVYB (ORCPT ); Fri, 25 Sep 2020 17:24:01 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33730 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729352AbgIYVXt (ORCPT ); Fri, 25 Sep 2020 17:23:49 -0400 Received: from mail-pj1-x104a.google.com (mail-pj1-x104a.google.com [IPv6:2607:f8b0:4864:20::104a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DA599C0613D6 for ; Fri, 25 Sep 2020 14:23:49 -0700 (PDT) Received: by mail-pj1-x104a.google.com with SMTP id ic18so267423pjb.3 for ; Fri, 25 Sep 2020 14:23:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=sender:date:in-reply-to:message-id:mime-version:references:subject :from:to:cc; bh=Qb8go8HycpjH5yveF+PfyYHMqnWZLnSFt7U7sjuOnrE=; b=rXGM+YU5BRbaFP3RvtKtUPBu576cAytErIUb3kUMO/E1cSI+ZFLxB0tnLxPI1LI6t0 z2M2xLrk9a1hsIaIkdFcuZoa2QppeVZYyTjWqrxoJdyL07HBV1G9CFxpMxIlQgEfeURM mv/Vpgo2dvH5I+ii4RdBDnR35BNsJ7LpSyJHBqQG19B7wDFRlXgwtOWiFIkp5dpjCC3z ImSDfNzIiZthK3CUmuUO/+X4Za4emZUkyOan3n6KgL+ckGJzS1q00jhEL3BNtfjXNpu+ DaA7K+S7LmrcGnFRaU16qRHRwpjfP3UOFgEP8W07uwVAD90VlyqEbrJIakBGZabbvJqa 70GA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=Qb8go8HycpjH5yveF+PfyYHMqnWZLnSFt7U7sjuOnrE=; b=PjrpexAJ4bQyNh8TxeXlftvJlj5W45Rw/NZgNFpQYfHo34DQVaPEd0GORiml/9Xcxz QIzwwcVSgMDarMMYHuPmgdOVdnjq5X7Xi8r5h+h7bJmdyXjliynxY8vHK1EKwFb3eLGj NsFIMDC9SD0yKqrpn6TYsO/AsyL/I9y7IxBpXDxOKvl02i7am3QwoV3RAx42ycx+2tqb /+SbM6LSc1q1qEfL1Bd+wRAS08rzhjm4vOQVioIWOOT0YGFWemJgV2A9C/iFcBVVItv6 LlNpID3+Zuw3z7uyVKpJqRy39vvNHedsLTEWgq5meWlB13rrtGm75bHmaKk/DxpNZ14g WPdQ== X-Gm-Message-State: AOAM533Jo1c1fpPGCwWqxMk3O2tgFvF7pyVPuTuXvw3FIkMZbR2jdtLC 1za6wqD/jhqydGFmKLmS5Xxc89WGnA4x X-Google-Smtp-Source: ABdhPJysWah3jm8O7Wa66GBxjE9TeS1ZTdGpfHZiqr2pWyjnRuqDAwGoFqvCzWelnOy1dlxCz88T8C8/a2zk Sender: "bgardon via sendgmr" X-Received: 
from bgardon.sea.corp.google.com ([2620:15c:100:202:f693:9fff:fef4:a293]) (user=bgardon job=sendgmr) by 2002:a17:90b:15c6:: with SMTP id lh6mr30176pjb.0.1601069029061; Fri, 25 Sep 2020 14:23:49 -0700 (PDT) Date: Fri, 25 Sep 2020 14:23:02 -0700 In-Reply-To: <20200925212302.3979661-1-bgardon@google.com> Message-Id: <20200925212302.3979661-23-bgardon@google.com> Mime-Version: 1.0 References: <20200925212302.3979661-1-bgardon@google.com> X-Mailer: git-send-email 2.28.0.709.gb0816b6eb0-goog Subject: [PATCH 22/22] kvm: mmu: Don't clear write flooding count for direct roots From: Ben Gardon To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: Cannon Matthews , Paolo Bonzini , Peter Xu , Sean Christopherson , Peter Shier , Peter Feiner , Junaid Shahid , Jim Mattson , Yulei Zhang , Wanpeng Li , Vitaly Kuznetsov , Xiao Guangrong , Ben Gardon Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Direct roots don't have a write flooding count because the guest can't affect that paging structure. Thus there's no need to clear the write flooding count on a fast CR3 switch for direct roots. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon --- arch/x86/kvm/mmu/mmu.c | 15 +++++++++++---- arch/x86/kvm/mmu/tdp_mmu.c | 12 ++++++++++++ arch/x86/kvm/mmu/tdp_mmu.h | 2 ++ 3 files changed, 25 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 0ce7720a72d4e..345c934fabf4c 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -4267,7 +4267,8 @@ static void nonpaging_init_context(struct kvm_vcpu *vcpu, context->nx = false; } -static inline bool is_root_usable(struct kvm_mmu_root_info *root, gpa_t pgd, +static inline bool is_root_usable(struct kvm *kvm, + struct kvm_mmu_root_info *root, gpa_t pgd, union kvm_mmu_page_role role) { return (role.direct || pgd == root->pgd) && @@ -4293,13 +4294,13 @@ static bool cached_root_available(struct kvm_vcpu *vcpu, gpa_t new_pgd, root.pgd = mmu->root_pgd; root.hpa = mmu->root_hpa; - if (is_root_usable(&root, new_pgd, new_role)) + if (is_root_usable(vcpu->kvm, &root, new_pgd, new_role)) return true; for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) { swap(root, mmu->prev_roots[i]); - if (is_root_usable(&root, new_pgd, new_role)) + if (is_root_usable(vcpu->kvm, &root, new_pgd, new_role)) break; } @@ -4356,7 +4357,13 @@ static void __kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd, */ vcpu_clear_mmio_info(vcpu, MMIO_GVA_ANY); - __clear_sp_write_flooding_count(to_shadow_page(vcpu->arch.mmu->root_hpa)); + /* + * If this is a direct root page, it doesn't have a write flooding + * count. Otherwise, clear the write flooding count. 
+ */ + if (!new_role.direct) + __clear_sp_write_flooding_count( + to_shadow_page(vcpu->arch.mmu->root_hpa)); } void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd, bool skip_tlb_flush, diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 42dde27decd75..c07831b0c73e1 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -124,6 +124,18 @@ static struct kvm_mmu_page *find_tdp_mmu_root_with_role( return NULL; } +hpa_t kvm_tdp_mmu_root_hpa_for_role(struct kvm *kvm, + union kvm_mmu_page_role role) +{ + struct kvm_mmu_page *root; + + root = find_tdp_mmu_root_with_role(kvm, role); + if (root) + return __pa(root->spt); + + return INVALID_PAGE; +} + static union kvm_mmu_page_role page_role_for_level(struct kvm_vcpu *vcpu, int level) { diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h index cc0b7241975aa..2395ffa71bb05 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.h +++ b/arch/x86/kvm/mmu/tdp_mmu.h @@ -9,6 +9,8 @@ void kvm_mmu_init_tdp_mmu(struct kvm *kvm); void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm); bool is_tdp_mmu_root(struct kvm *kvm, hpa_t root); +hpa_t kvm_tdp_mmu_root_hpa_for_role(struct kvm *kvm, + union kvm_mmu_page_role role); hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu); void kvm_tdp_mmu_put_root_hpa(struct kvm *kvm, hpa_t root_hpa);