From patchwork Fri May 26 23:44:26 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu Zhao X-Patchwork-Id: 13257471 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 8B0D3C77B7A for ; Fri, 26 May 2023 23:45:18 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:Cc:To:From:Subject:References: Mime-Version:Message-Id:In-Reply-To:Date:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Owner; bh=bAe6IPod9NBP5h9jglIq8NdQRd3vRBrcNLnDSQBPp7g=; b=CBvBe+RTRt4RB31mUfpiEUG5+T E6VpvVNd6mSSaCGyRPdlhaaACIp1FmT1mU4unpKmgom8l5YCGESgFjExRLHGsKlmAO87k1ON/bBJs +AySiiCOmZ1L21+EewILgSwBgINXSz2W8ETEvnX5QRe4DIoqtI/jdlKK1XHGWb9hmn5ZqTHeJb0Cl 9ZJZoL2+5vpPHvEe/NsR9Uo9zNJerqog8CNmWJmarVnsNLJc0oGJp4x43u9DVG076agEsCVxCs02G k9F18v2cCaSf2Mn8jIS/1MY5isMtl7mEIgTXmkRjFZpbSwN4AdNUPouGBNi8Gmlsr7m8FcTf8fLpz pNAMBjGg==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.96 #2 (Red Hat Linux)) id 1q2h79-004KWL-0R; Fri, 26 May 2023 23:44:55 +0000 Received: from mail-yb1-xb4a.google.com ([2607:f8b0:4864:20::b4a]) by bombadil.infradead.org with esmtps (Exim 4.96 #2 (Red Hat Linux)) id 1q2h6x-004KNT-0S for linux-arm-kernel@lists.infradead.org; Fri, 26 May 2023 23:44:45 +0000 Received: by mail-yb1-xb4a.google.com with SMTP id 3f1490d57ef6-bac6a453dd5so1730065276.2 for ; Fri, 26 May 2023 16:44:42 -0700 (PDT) DKIM-Signature: v=1; 
a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1685144681; x=1687736681; h=cc:to:from:subject:references:mime-version:message-id:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=kGsSI0lPP8ERxd6p+DSa9eEPYGyuTkZRYkJGde6ThZo=; b=hvb8rxkERCbIh1bZc0q4NRTJ+rJW/WU+U0AR8hQmLag3Ue+qSWO6cakjY3a2+aFQIE x+dhqQS7EQpNz5Tcg/879WfdtsT2g9m9IEV/ar3WsH6hpivs+AgK+n3cV9xivqmDDPOm KKmHGpOcNrwi3l5Yw+fAvneY5R+j+Mh2hjIBHhqkVzBeZTwwqArwApBbzG0OmYJUYGc6 DRb2e1iDRkrr5VQNcGKChaSGvbQt6NfDiW4utZo1zEzUWIhsS0vIBmPt/jCICfYU7AGj 2EXBMubac0epYeTWkMmnp5dg4awgt7i7k/gPemJKnVVjNCrCZqeAJZJODgbRp4LsgzpY rjvA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1685144681; x=1687736681; h=cc:to:from:subject:references:mime-version:message-id:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=kGsSI0lPP8ERxd6p+DSa9eEPYGyuTkZRYkJGde6ThZo=; b=iOfuehMZcYC5nuQ5lKhwyS1CuxKCfZ3clwaUt0ksh1oxyzG8jxXjJ397ZyWf4tRPvx dy1LwyiQfehKztH3qp9EiQ8ovqywNLzQpDiaixV0I5TcqnbUrqMM8D8hRaZ3R3Fwzct2 Lj2c8Ecy125oc/jw4jCYIZUBM1BCcKEcRU/K0S14pHgvwr89G4FzSVnvlr+qT4/Vi5JL O2taymFymK/YJmpdmhe42ClSSOBPZEFUIGvM77CPxpMlviGVVvfo30kj0/fdJQgDmqR7 hNNYvQh4Fc1ilRWkSPaBlf3is8c8U0igvNhCfOzgTMCFNeLxmLqsshpVNjy7tPYT5HVc R3Og== X-Gm-Message-State: AC+VfDxvACTSGxezBCM8NmKQa833a9nmuFQw31vwC8vG8/0yNC1iRKdk 98y20rfiA6mknXIBkFLG+VZZN9lUsw0= X-Google-Smtp-Source: ACHHUZ6fi8C05HO0+KMDQENWGjZ4c7vAIwn8WK8PM0ILwAQbwBWVWIEplGkrZ1gydvcndVbPZA/WyjMMkr8= X-Received: from yuzhao.bld.corp.google.com ([2620:15c:183:200:910f:8a15:592b:2087]) (user=yuzhao job=sendgmr) by 2002:a05:6902:50d:b0:ba8:3e2d:58f8 with SMTP id x13-20020a056902050d00b00ba83e2d58f8mr1858255ybs.5.1685144681575; Fri, 26 May 2023 16:44:41 -0700 (PDT) Date: Fri, 26 May 2023 17:44:26 -0600 In-Reply-To: <20230526234435.662652-1-yuzhao@google.com> Message-Id: <20230526234435.662652-2-yuzhao@google.com> Mime-Version: 1.0 References: <20230526234435.662652-1-yuzhao@google.com> X-Mailer: git-send-email 
2.41.0.rc0.172.g3f132b7071-goog Subject: [PATCH mm-unstable v2 01/10] mm/kvm: add mmu_notifier_ops->test_clear_young() From: Yu Zhao To: Andrew Morton , Paolo Bonzini Cc: Alistair Popple , Anup Patel , Ben Gardon , Borislav Petkov , Catalin Marinas , Chao Peng , Christophe Leroy , Dave Hansen , Fabiano Rosas , Gaosheng Cui , Gavin Shan , "H. Peter Anvin" , Ingo Molnar , James Morse , "Jason A. Donenfeld" , Jason Gunthorpe , Jonathan Corbet , Marc Zyngier , Masami Hiramatsu , Michael Ellerman , Michael Larabel , Mike Rapoport , Nicholas Piggin , Oliver Upton , Paul Mackerras , Peter Xu , Sean Christopherson , Steven Rostedt , Suzuki K Poulose , Thomas Gleixner , Thomas Huth , Will Deacon , Zenghui Yu , kvmarm@lists.linux.dev, kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org, linux-trace-kernel@vger.kernel.org, x86@kernel.org, linux-mm@google.com, Yu Zhao X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20230526_164443_180789_8F35842F X-CRM114-Status: GOOD ( 29.21 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org

Add mmu_notifier_ops->test_clear_young() to supersede test_young() and clear_young(). test_clear_young() has a fast path which, if supported, allows its callers to safely clear the accessed bit without taking kvm->mmu_lock. The fast path requires arch-specific code that generally relies on RCU and CAS: the former protects KVM page tables from being freed, while the latter clears the accessed bit atomically against both the hardware and other software page table walkers.
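
For illustration only, the CAS half of the fast path described above can be sketched in userspace C11. This is not code from the series: the names sketch_test_clear_young() and SKETCH_PTE_ACCESSED, and the choice of bit 10 for the accessed flag, are assumptions made up for the example, and C11 atomics stand in for the kernel's cmpxchg(). The RCU half, which keeps the page table from being freed during the walk, is not modeled here.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical accessed-bit position, chosen only for this sketch. */
#define SKETCH_PTE_ACCESSED (UINT64_C(1) << 10)

/*
 * Test, and optionally clear, the accessed bit of one PTE without a lock.
 *
 * Returns true if the PTE was young when it was read. When clear is set,
 * a single compare-and-swap tries to write back the same PTE with the
 * accessed bit removed. If some other walker changed the PTE in between,
 * the CAS fails and the PTE is left untouched; the entry is still
 * reported as young, since it was young when observed.
 *
 * In kernel code the caller would hold an RCU read-side critical section
 * around the walk; that part is an assumption left out of this sketch.
 */
bool sketch_test_clear_young(_Atomic uint64_t *ptep, bool clear)
{
	uint64_t old = atomic_load_explicit(ptep, memory_order_relaxed);

	if (!(old & SKETCH_PTE_ACCESSED))
		return false;

	if (clear)
		atomic_compare_exchange_strong_explicit(ptep, &old,
				old & ~SKETCH_PTE_ACCESSED,
				memory_order_relaxed, memory_order_relaxed);

	return true;
}
```

A single CAS attempt without a retry loop is a deliberate choice in this sketch: losing the race only means the accessed bit survives until the next aging pass, which is harmless for an aging heuristic.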
If the fast path is unsupported, test_clear_young() falls back to the existing slow path where kvm->mmu_lock is then taken. test_clear_young() can also operate on a range of KVM PTEs individually according to a bitmap, if the caller provides it. Signed-off-by: Yu Zhao --- include/linux/kvm_host.h | 22 +++++++++++ include/linux/mmu_notifier.h | 52 ++++++++++++++++++++++++ mm/mmu_notifier.c | 24 ++++++++++++ virt/kvm/kvm_main.c | 76 +++++++++++++++++++++++++++++++++++- 4 files changed, 173 insertions(+), 1 deletion(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 0e571e973bc2..374262545f96 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -258,6 +258,7 @@ int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu); #ifdef KVM_ARCH_WANT_MMU_NOTIFIER struct kvm_gfn_range { struct kvm_memory_slot *slot; + void *args; gfn_t start; gfn_t end; pte_t pte; @@ -267,6 +268,27 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range); bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range); bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range); bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range); +bool kvm_should_clear_young(struct kvm_gfn_range *range, gfn_t gfn); +bool kvm_arch_test_clear_young(struct kvm *kvm, struct kvm_gfn_range *range); +#endif + +/* + * Architectures that implement kvm_arch_test_clear_young() should override + * kvm_arch_has_test_clear_young(). + * + * kvm_arch_has_test_clear_young() is allowed to return false positive, i.e., it + * can return true if kvm_arch_test_clear_young() is supported but disabled due + * to some runtime constraint. In this case, kvm_arch_test_clear_young() should + * return true; otherwise, it should return false. + * + * For each young KVM PTE, kvm_arch_test_clear_young() should call + * kvm_should_clear_young() to decide whether to clear the accessed bit. 
+ */ +#ifndef kvm_arch_has_test_clear_young +static inline bool kvm_arch_has_test_clear_young(void) +{ + return false; +} #endif enum { diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h index 64a3e051c3c4..dfdbb370682d 100644 --- a/include/linux/mmu_notifier.h +++ b/include/linux/mmu_notifier.h @@ -60,6 +60,8 @@ enum mmu_notifier_event { }; #define MMU_NOTIFIER_RANGE_BLOCKABLE (1 << 0) +#define MMU_NOTIFIER_RANGE_LOCKLESS (1 << 1) +#define MMU_NOTIFIER_RANGE_YOUNG (1 << 2) struct mmu_notifier_ops { /* @@ -122,6 +124,10 @@ struct mmu_notifier_ops { struct mm_struct *mm, unsigned long address); + int (*test_clear_young)(struct mmu_notifier *mn, struct mm_struct *mm, + unsigned long start, unsigned long end, + bool clear, unsigned long *bitmap); + /* * change_pte is called in cases that pte mapping to page is changed: * for example, when ksm remaps pte to point to a new shared page. @@ -392,6 +398,9 @@ extern int __mmu_notifier_clear_young(struct mm_struct *mm, unsigned long end); extern int __mmu_notifier_test_young(struct mm_struct *mm, unsigned long address); +extern int __mmu_notifier_test_clear_young(struct mm_struct *mm, + unsigned long start, unsigned long end, + bool clear, unsigned long *bitmap); extern void __mmu_notifier_change_pte(struct mm_struct *mm, unsigned long address, pte_t pte); extern int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range *r); @@ -440,6 +449,35 @@ static inline int mmu_notifier_test_young(struct mm_struct *mm, return 0; } +/* + * mmu_notifier_test_clear_young() returns nonzero if any of the KVM PTEs within + * a given range was young. Specifically, it returns MMU_NOTIFIER_RANGE_LOCKLESS + * if the fast path was successful, MMU_NOTIFIER_RANGE_YOUNG otherwise. + * + * The last parameter to the function is a bitmap and only the fast path + * supports it: if it is NULL, the function falls back to the slow path if the + * fast path was unsuccessful; otherwise, the function bails out. 
+ * + * The bitmap has the following specifications: + * 1. The number of bits should be at least (end-start)/PAGE_SIZE. + * 2. The offset of each bit should be relative to the end, i.e., the offset + * corresponding to addr should be (end-addr)/PAGE_SIZE-1. This is convenient + * for batching while forward looping. + * + * When testing, this function sets the corresponding bit in the bitmap for each + * young KVM PTE. When clearing, this function clears the accessed bit for each + * young KVM PTE whose corresponding bit in the bitmap is set. + */ +static inline int mmu_notifier_test_clear_young(struct mm_struct *mm, + unsigned long start, unsigned long end, + bool clear, unsigned long *bitmap) +{ + if (mm_has_notifiers(mm)) + return __mmu_notifier_test_clear_young(mm, start, end, clear, bitmap); + + return 0; +} + static inline void mmu_notifier_change_pte(struct mm_struct *mm, unsigned long address, pte_t pte) { @@ -684,12 +722,26 @@ static inline int mmu_notifier_clear_flush_young(struct mm_struct *mm, return 0; } +static inline int mmu_notifier_clear_young(struct mm_struct *mm, + unsigned long start, + unsigned long end) +{ + return 0; +} + static inline int mmu_notifier_test_young(struct mm_struct *mm, unsigned long address) { return 0; } +static inline int mmu_notifier_test_clear_young(struct mm_struct *mm, + unsigned long start, unsigned long end, + bool clear, unsigned long *bitmap) +{ + return 0; +} + static inline void mmu_notifier_change_pte(struct mm_struct *mm, unsigned long address, pte_t pte) { diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c index 50c0dde1354f..7e6aba4bddcb 100644 --- a/mm/mmu_notifier.c +++ b/mm/mmu_notifier.c @@ -424,6 +424,30 @@ int __mmu_notifier_test_young(struct mm_struct *mm, return young; } +int __mmu_notifier_test_clear_young(struct mm_struct *mm, + unsigned long start, unsigned long end, + bool clear, unsigned long *bitmap) +{ + int idx; + struct mmu_notifier *mn; + int young = 0; + + idx = srcu_read_lock(&srcu); + + 
hlist_for_each_entry_srcu(mn, &mm->notifier_subscriptions->list, hlist, + srcu_read_lock_held(&srcu)) { + if (mn->ops->test_clear_young) + young |= mn->ops->test_clear_young(mn, mm, start, end, clear, bitmap); + + if (young && !clear) + break; + } + + srcu_read_unlock(&srcu, idx); + + return young; +} + void __mmu_notifier_change_pte(struct mm_struct *mm, unsigned long address, pte_t pte) { diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 51e4882d0873..31ee58754b19 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -541,6 +541,7 @@ typedef void (*on_lock_fn_t)(struct kvm *kvm, unsigned long start, typedef void (*on_unlock_fn_t)(struct kvm *kvm); struct kvm_hva_range { + void *args; unsigned long start; unsigned long end; pte_t pte; @@ -549,6 +550,7 @@ struct kvm_hva_range { on_unlock_fn_t on_unlock; bool flush_on_ret; bool may_block; + bool lockless; }; /* @@ -602,6 +604,8 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm, hva_end = min(range->end, slot->userspace_addr + (slot->npages << PAGE_SHIFT)); + gfn_range.args = range->args; + /* * To optimize for the likely case where the address * range is covered by zero or one memslots, don't @@ -619,7 +623,7 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm, gfn_range.end = hva_to_gfn_memslot(hva_end + PAGE_SIZE - 1, slot); gfn_range.slot = slot; - if (!locked) { + if (!range->lockless && !locked) { locked = true; KVM_MMU_LOCK(kvm); if (!IS_KVM_NULL_FN(range->on_lock)) @@ -628,6 +632,9 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm, break; } ret |= range->handler(kvm, &gfn_range); + + if (range->lockless && ret) + break; } } @@ -880,6 +887,72 @@ static int kvm_mmu_notifier_test_young(struct mmu_notifier *mn, kvm_test_age_gfn); } +struct test_clear_young_args { + unsigned long *bitmap; + unsigned long end; + bool clear; + bool young; +}; + +bool kvm_should_clear_young(struct kvm_gfn_range *range, gfn_t gfn) +{ + struct 
test_clear_young_args *args = range->args; + + VM_WARN_ON_ONCE(gfn < range->start || gfn >= range->end); + + args->young = true; + + if (args->bitmap) { + int offset = hva_to_gfn_memslot(args->end - 1, range->slot) - gfn; + + if (args->clear) + return test_bit(offset, args->bitmap); + + __set_bit(offset, args->bitmap); + } + + return args->clear; +} + +static int kvm_mmu_notifier_test_clear_young(struct mmu_notifier *mn, struct mm_struct *mm, + unsigned long start, unsigned long end, + bool clear, unsigned long *bitmap) +{ + struct kvm *kvm = mmu_notifier_to_kvm(mn); + struct kvm_hva_range range = { + .start = start, + .end = end, + .on_lock = (void *)kvm_null_fn, + .on_unlock = (void *)kvm_null_fn, + }; + + trace_kvm_age_hva(start, end); + + if (kvm_arch_has_test_clear_young()) { + struct test_clear_young_args args = { + .bitmap = bitmap, + .end = end, + .clear = clear, + }; + + range.args = &args; + range.lockless = true; + range.handler = kvm_arch_test_clear_young; + + if (!__kvm_handle_hva_range(kvm, &range)) + return args.young ? MMU_NOTIFIER_RANGE_LOCKLESS : 0; + } + + if (bitmap) + return 0; + + range.args = NULL; + range.lockless = false; + range.handler = clear ? kvm_age_gfn : kvm_test_age_gfn; + + return __kvm_handle_hva_range(kvm, &range) ? 
MMU_NOTIFIER_RANGE_YOUNG : 0; +} + static void kvm_mmu_notifier_release(struct mmu_notifier *mn, struct mm_struct *mm) { @@ -898,6 +971,7 @@ static const struct mmu_notifier_ops kvm_mmu_notifier_ops = { .clear_flush_young = kvm_mmu_notifier_clear_flush_young, .clear_young = kvm_mmu_notifier_clear_young, .test_young = kvm_mmu_notifier_test_young, + .test_clear_young = kvm_mmu_notifier_test_clear_young, .change_pte = kvm_mmu_notifier_change_pte, .release = kvm_mmu_notifier_release, }; From patchwork Fri May 26 23:44:27 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu Zhao X-Patchwork-Id: 13257474 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id B300EC7EE2C for ; Fri, 26 May 2023 23:45:23 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:Cc:To:From:Subject:References: Mime-Version:Message-Id:In-Reply-To:Date:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Owner; bh=4/b+cIjIlYeoEEasavawfB8VhuWia5nNT8j7TXklxes=; b=jxxFYY5jSmVRTxij9L/ZrwBk+j F09LWZYU2J142Vz23tyjBWTYKgKrWUOCL1DquLQSPrslUqwAcmiIXa7/UZEO+Hjd6CqBpyHQfsGKF ltoYWZ342keCoZ/GmEn6MAFJYlyxgsyog+ly/ze+oB8xoHFJfeudvnPD2sxdz6prLNQMHeMRpnzA6 S53fScssSft6wOSLJm4mLVBt/9YawJmpL8ao21kj4ZbRFaBF+QOjxux6O0AzHanmZcoiNHPowpU9S Z5nd4Z2aoQ4dSZNFHZ+bfflhfRBcM3m25MGXcmzz30+WjuIUpCsFxCHJzIOzEmzYAPMERZBLp+D0h JyIGItew==; Received: from localhost ([::1] helo=bombadil.infradead.org) by 
bombadil.infradead.org with esmtp (Exim 4.96 #2 (Red Hat Linux)) id 1q2h79-004KXH-2S; Fri, 26 May 2023 23:44:55 +0000 X-Received: 
from yuzhao.bld.corp.google.com ([2620:15c:183:200:910f:8a15:592b:2087]) (user=yuzhao job=sendgmr) by 2002:a81:e608:0:b0:561:c4ef:1def with SMTP id u8-20020a81e608000000b00561c4ef1defmr2027484ywl.0.1685144683190; Fri, 26 May 2023 16:44:43 -0700 (PDT) Date: Fri, 26 May 2023 17:44:27 -0600 In-Reply-To: <20230526234435.662652-1-yuzhao@google.com> Message-Id: <20230526234435.662652-3-yuzhao@google.com> Mime-Version: 1.0 References: <20230526234435.662652-1-yuzhao@google.com> X-Mailer: git-send-email 2.41.0.rc0.172.g3f132b7071-goog Subject: [PATCH mm-unstable v2 02/10] mm/kvm: use mmu_notifier_ops->test_clear_young() From: Yu Zhao To: Andrew Morton , Paolo Bonzini Cc: Alistair Popple , Anup Patel , Ben Gardon , Borislav Petkov , Catalin Marinas , Chao Peng , Christophe Leroy , Dave Hansen , Fabiano Rosas , Gaosheng Cui , Gavin Shan , "H. Peter Anvin" , Ingo Molnar , James Morse , "Jason A. Donenfeld" , Jason Gunthorpe , Jonathan Corbet , Marc Zyngier , Masami Hiramatsu , Michael Ellerman , Michael Larabel , Mike Rapoport , Nicholas Piggin , Oliver Upton , Paul Mackerras , Peter Xu , Sean Christopherson , Steven Rostedt , Suzuki K Poulose , Thomas Gleixner , Thomas Huth , Will Deacon , Zenghui Yu , kvmarm@lists.linux.dev, kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org, linux-trace-kernel@vger.kernel.org, x86@kernel.org, linux-mm@google.com, Yu Zhao

Replace test_young() and clear_young() with 
test_clear_young(). Signed-off-by: Yu Zhao --- include/linux/mmu_notifier.h | 29 ++----------------- include/trace/events/kvm.h | 15 ---------- mm/mmu_notifier.c | 42 ---------------------------- virt/kvm/kvm_main.c | 54 ------------------------------------ 4 files changed, 2 insertions(+), 138 deletions(-) diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h index dfdbb370682d..c8f35fc08703 100644 --- a/include/linux/mmu_notifier.h +++ b/include/linux/mmu_notifier.h @@ -104,26 +104,6 @@ struct mmu_notifier_ops { unsigned long start, unsigned long end); - /* - * clear_young is a lightweight version of clear_flush_young. Like the - * latter, it is supposed to test-and-clear the young/accessed bitflag - * in the secondary pte, but it may omit flushing the secondary tlb. - */ - int (*clear_young)(struct mmu_notifier *subscription, - struct mm_struct *mm, - unsigned long start, - unsigned long end); - - /* - * test_young is called to check the young/accessed bitflag in - * the secondary pte. This is used to know if the page is - * frequently used without actually clearing the flag or tearing - * down the secondary mapping on the page. 
- */ - int (*test_young)(struct mmu_notifier *subscription, - struct mm_struct *mm, - unsigned long address); - int (*test_clear_young)(struct mmu_notifier *mn, struct mm_struct *mm, unsigned long start, unsigned long end, bool clear, unsigned long *bitmap); @@ -393,11 +373,6 @@ extern void __mmu_notifier_release(struct mm_struct *mm); extern int __mmu_notifier_clear_flush_young(struct mm_struct *mm, unsigned long start, unsigned long end); -extern int __mmu_notifier_clear_young(struct mm_struct *mm, - unsigned long start, - unsigned long end); -extern int __mmu_notifier_test_young(struct mm_struct *mm, - unsigned long address); extern int __mmu_notifier_test_clear_young(struct mm_struct *mm, unsigned long start, unsigned long end, bool clear, unsigned long *bitmap); @@ -437,7 +412,7 @@ static inline int mmu_notifier_clear_young(struct mm_struct *mm, unsigned long end) { if (mm_has_notifiers(mm)) - return __mmu_notifier_clear_young(mm, start, end); + return __mmu_notifier_test_clear_young(mm, start, end, true, NULL); return 0; } @@ -445,7 +420,7 @@ static inline int mmu_notifier_test_young(struct mm_struct *mm, unsigned long address) { if (mm_has_notifiers(mm)) - return __mmu_notifier_test_young(mm, address); + return __mmu_notifier_test_clear_young(mm, address, address + 1, false, NULL); return 0; } diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h index 3bd31ea23fee..46c347e56e60 100644 --- a/include/trace/events/kvm.h +++ b/include/trace/events/kvm.h @@ -489,21 +489,6 @@ TRACE_EVENT(kvm_age_hva, __entry->start, __entry->end) ); -TRACE_EVENT(kvm_test_age_hva, - TP_PROTO(unsigned long hva), - TP_ARGS(hva), - - TP_STRUCT__entry( - __field( unsigned long, hva ) - ), - - TP_fast_assign( - __entry->hva = hva; - ), - - TP_printk("mmu notifier test age hva: %#016lx", __entry->hva) -); - #endif /* _TRACE_KVM_MAIN_H */ /* This part must be outside protection */ diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c index 7e6aba4bddcb..c7e9747c9920 100644 
--- a/mm/mmu_notifier.c +++ b/mm/mmu_notifier.c @@ -382,48 +382,6 @@ int __mmu_notifier_clear_flush_young(struct mm_struct *mm, return young; } -int __mmu_notifier_clear_young(struct mm_struct *mm, - unsigned long start, - unsigned long end) -{ - struct mmu_notifier *subscription; - int young = 0, id; - - id = srcu_read_lock(&srcu); - hlist_for_each_entry_rcu(subscription, - &mm->notifier_subscriptions->list, hlist, - srcu_read_lock_held(&srcu)) { - if (subscription->ops->clear_young) - young |= subscription->ops->clear_young(subscription, - mm, start, end); - } - srcu_read_unlock(&srcu, id); - - return young; -} - -int __mmu_notifier_test_young(struct mm_struct *mm, - unsigned long address) -{ - struct mmu_notifier *subscription; - int young = 0, id; - - id = srcu_read_lock(&srcu); - hlist_for_each_entry_rcu(subscription, - &mm->notifier_subscriptions->list, hlist, - srcu_read_lock_held(&srcu)) { - if (subscription->ops->test_young) { - young = subscription->ops->test_young(subscription, mm, - address); - if (young) - break; - } - } - srcu_read_unlock(&srcu, id); - - return young; -} - int __mmu_notifier_test_clear_young(struct mm_struct *mm, unsigned long start, unsigned long end, bool clear, unsigned long *bitmap) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 31ee58754b19..977baaf1b248 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -674,25 +674,6 @@ static __always_inline int kvm_handle_hva_range(struct mmu_notifier *mn, return __kvm_handle_hva_range(kvm, &range); } -static __always_inline int kvm_handle_hva_range_no_flush(struct mmu_notifier *mn, - unsigned long start, - unsigned long end, - hva_handler_t handler) -{ - struct kvm *kvm = mmu_notifier_to_kvm(mn); - const struct kvm_hva_range range = { - .start = start, - .end = end, - .pte = __pte(0), - .handler = handler, - .on_lock = (void *)kvm_null_fn, - .on_unlock = (void *)kvm_null_fn, - .flush_on_ret = false, - .may_block = false, - }; - - return __kvm_handle_hva_range(kvm, 
&range); -} static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn, struct mm_struct *mm, unsigned long address, @@ -854,39 +835,6 @@ static int kvm_mmu_notifier_clear_flush_young(struct mmu_notifier *mn, return kvm_handle_hva_range(mn, start, end, __pte(0), kvm_age_gfn); } -static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn, - struct mm_struct *mm, - unsigned long start, - unsigned long end) -{ - trace_kvm_age_hva(start, end); - - /* - * Even though we do not flush TLB, this will still adversely - * affect performance on pre-Haswell Intel EPT, where there is - * no EPT Access Bit to clear so that we have to tear down EPT - * tables instead. If we find this unacceptable, we can always - * add a parameter to kvm_age_hva so that it effectively doesn't - * do anything on clear_young. - * - * Also note that currently we never issue secondary TLB flushes - * from clear_young, leaving this job up to the regular system - * cadence. If we find this inaccurate, we might come up with a - * more sophisticated heuristic later. 
- */ - return kvm_handle_hva_range_no_flush(mn, start, end, kvm_age_gfn); -} - -static int kvm_mmu_notifier_test_young(struct mmu_notifier *mn, - struct mm_struct *mm, - unsigned long address) -{ - trace_kvm_test_age_hva(address); - - return kvm_handle_hva_range_no_flush(mn, address, address + 1, - kvm_test_age_gfn); -} - struct test_clear_young_args { unsigned long *bitmap; unsigned long end; @@ -969,8 +917,6 @@ static const struct mmu_notifier_ops kvm_mmu_notifier_ops = { .invalidate_range_start = kvm_mmu_notifier_invalidate_range_start, .invalidate_range_end = kvm_mmu_notifier_invalidate_range_end, .clear_flush_young = kvm_mmu_notifier_clear_flush_young, - .clear_young = kvm_mmu_notifier_clear_young, - .test_young = kvm_mmu_notifier_test_young, .test_clear_young = kvm_mmu_notifier_test_clear_young, .change_pte = kvm_mmu_notifier_change_pte, .release = kvm_mmu_notifier_release, From patchwork Fri May 26 23:44:28 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu Zhao X-Patchwork-Id: 13257472 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 1E6BEC77B7C for ; Fri, 26 May 2023 23:45:20 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:Cc:To:From:Subject:References: Mime-Version:Message-Id:In-Reply-To:Date:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Owner; bh=qar4cx4/mr2qYr+qJX7ocsle4/BV+ou8AS5VmQOjA8U=; 
b=LxMBgsWPBcV5LAJdTKPG4ywfb5 dyhfNonUzqvMgIDJEGBbA0wmuTDK1gkdgu3ePDSCN+EvgTKUGt7BTyuaCYr6regHAKQXxHrpS8gqg FkTlPSTiLelUoKxPwOpKslbneoAIOvdsA0AxldJDlokpSXnXPFDMauH5dwjpruPinTTREPK04l5Pr 16a2dxmgCaoCph8SISSptws8Noq2GzFKgPCwVFqzwe3e1ppw1zHPKrhioFvq9CY6lLfhbjuxzhTYJ 1yfE8KIG47q2gA0zuQw02rtQ9g7GB98Ji0FLYe2XzW+p/7Gbky0J73B2KrtiLjHKx6e7e9rW5iE5s SVJ7jbCw==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.96 #2 (Red Hat Linux)) id 1q2h7A-004KY2-1c; Fri, 26 May 2023 23:44:56 +0000 X-Received: from yuzhao.bld.corp.google.com ([2620:15c:183:200:910f:8a15:592b:2087]) (user=yuzhao job=sendgmr) by 2002:a81:4421:0:b0:565:9f59:664f with SMTP id r33-20020a814421000000b005659f59664fmr2006806ywa.6.1685144684605; Fri, 26 May 2023 16:44:44 -0700 (PDT) Date: Fri, 26 May 2023 17:44:28 -0600 In-Reply-To: <20230526234435.662652-1-yuzhao@google.com> Message-Id: <20230526234435.662652-4-yuzhao@google.com> Mime-Version: 1.0 References: <20230526234435.662652-1-yuzhao@google.com> X-Mailer: git-send-email 2.41.0.rc0.172.g3f132b7071-goog Subject: [PATCH mm-unstable v2 03/10] kvm/arm64: export stage2_try_set_pte() and macros From: Yu Zhao To: Andrew Morton , Paolo Bonzini Cc: Alistair Popple , Anup Patel , Ben Gardon , Borislav Petkov , Catalin Marinas , Chao Peng , Christophe Leroy , Dave Hansen , Fabiano Rosas , Gaosheng Cui , Gavin Shan , "H. Peter Anvin" , Ingo Molnar , James Morse , "Jason A. 
Donenfeld" , Jason Gunthorpe , Jonathan Corbet , Marc Zyngier , Masami Hiramatsu , Michael Ellerman , Michael Larabel , Mike Rapoport , Nicholas Piggin , Oliver Upton , Paul Mackerras , Peter Xu , Sean Christopherson , Steven Rostedt , Suzuki K Poulose , Thomas Gleixner , Thomas Huth , Will Deacon , Zenghui Yu , kvmarm@lists.linux.dev, kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org, linux-trace-kernel@vger.kernel.org, x86@kernel.org, linux-mm@google.com, Yu Zhao X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20230526_164446_284030_2B3F2608 X-CRM114-Status: GOOD ( 14.21 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org stage2_try_set_pte() and KVM_PTE_LEAF_ATTR_LO_S2_AF are needed to implement kvm_arch_test_clear_young(). 
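For readers unfamiliar with the helper being exported, here is a minimal userspace sketch of its two update modes, written with C11 atomics instead of the kernel's WRITE_ONCE() and cmpxchg(). The names try_set_pte, struct visit_ctx and its shared field are hypothetical stand-ins for stage2_try_set_pte(), struct kvm_pgtable_visit_ctx and the KVM_PGTABLE_WALK_SHARED flag; this is an illustration under those assumptions, not the kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t pte_t;

/* Hypothetical stand-in for struct kvm_pgtable_visit_ctx. */
struct visit_ctx {
	_Atomic pte_t *ptep;	/* slot being visited */
	pte_t old;		/* value seen when the walker reached the slot */
	bool shared;		/* true when other walkers may run concurrently */
};

/*
 * Install @new in the visited slot. An exclusive walker stores
 * unconditionally. A shared walker succeeds only if the slot still holds
 * the value it observed earlier, so a concurrent update is never lost.
 */
static bool try_set_pte(const struct visit_ctx *ctx, pte_t new)
{
	pte_t expected = ctx->old;

	assert(ctx->ptep != NULL);

	if (!ctx->shared) {
		atomic_store_explicit(ctx->ptep, new, memory_order_relaxed);
		return true;
	}

	return atomic_compare_exchange_strong(ctx->ptep, &expected, new);
}
```

A shared walker whose snapshot has gone stale gets false back and leaves the slot untouched, which is what allows a lockless caller to skip the entry rather than overwrite a concurrent change.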
Signed-off-by: Yu Zhao
---
 arch/arm64/include/asm/kvm_pgtable.h | 53 ++++++++++++++++++++++++++++
 arch/arm64/kvm/hyp/pgtable.c         | 53 ----------------------------
 2 files changed, 53 insertions(+), 53 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index dc3c072e862f..ff520598b62c 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -44,6 +44,49 @@ typedef u64 kvm_pte_t;
 
 #define KVM_PHYS_INVALID		(-1ULL)
 
+#define KVM_PTE_TYPE			BIT(1)
+#define KVM_PTE_TYPE_BLOCK		0
+#define KVM_PTE_TYPE_PAGE		1
+#define KVM_PTE_TYPE_TABLE		1
+
+#define KVM_PTE_LEAF_ATTR_LO		GENMASK(11, 2)
+
+#define KVM_PTE_LEAF_ATTR_LO_S1_ATTRIDX	GENMASK(4, 2)
+#define KVM_PTE_LEAF_ATTR_LO_S1_AP	GENMASK(7, 6)
+#define KVM_PTE_LEAF_ATTR_LO_S1_AP_RO	3
+#define KVM_PTE_LEAF_ATTR_LO_S1_AP_RW	1
+#define KVM_PTE_LEAF_ATTR_LO_S1_SH	GENMASK(9, 8)
+#define KVM_PTE_LEAF_ATTR_LO_S1_SH_IS	3
+#define KVM_PTE_LEAF_ATTR_LO_S1_AF	BIT(10)
+
+#define KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR	GENMASK(5, 2)
+#define KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R	BIT(6)
+#define KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W	BIT(7)
+#define KVM_PTE_LEAF_ATTR_LO_S2_SH	GENMASK(9, 8)
+#define KVM_PTE_LEAF_ATTR_LO_S2_SH_IS	3
+#define KVM_PTE_LEAF_ATTR_LO_S2_AF	BIT(10)
+
+#define KVM_PTE_LEAF_ATTR_HI		GENMASK(63, 51)
+
+#define KVM_PTE_LEAF_ATTR_HI_SW		GENMASK(58, 55)
+
+#define KVM_PTE_LEAF_ATTR_HI_S1_XN	BIT(54)
+
+#define KVM_PTE_LEAF_ATTR_HI_S2_XN	BIT(54)
+
+#define KVM_PTE_LEAF_ATTR_S2_PERMS	(KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R | \
+					 KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W | \
+					 KVM_PTE_LEAF_ATTR_HI_S2_XN)
+
+#define KVM_INVALID_PTE_OWNER_MASK	GENMASK(9, 2)
+#define KVM_MAX_OWNER_ID		1
+
+/*
+ * Used to indicate a pte for which a 'break-before-make' sequence is in
+ * progress.
+ */
+#define KVM_INVALID_PTE_LOCKED		BIT(10)
+
 static inline bool kvm_pte_valid(kvm_pte_t pte)
 {
 	return pte & KVM_PTE_VALID;
@@ -224,6 +267,16 @@ static inline bool kvm_pgtable_walk_shared(const struct kvm_pgtable_visit_ctx *c
 	return ctx->flags & KVM_PGTABLE_WALK_SHARED;
 }
 
+static inline bool stage2_try_set_pte(const struct kvm_pgtable_visit_ctx *ctx, kvm_pte_t new)
+{
+	if (!kvm_pgtable_walk_shared(ctx)) {
+		WRITE_ONCE(*ctx->ptep, new);
+		return true;
+	}
+
+	return cmpxchg(ctx->ptep, ctx->old, new) == ctx->old;
+}
+
 /**
  * struct kvm_pgtable_walker - Hook into a page-table walk.
  * @cb:		Callback function to invoke during the walk.
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 5282cb9ca4cf..24678ccba76a 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -12,49 +12,6 @@
 #include
 
 
-#define KVM_PTE_TYPE			BIT(1)
-#define KVM_PTE_TYPE_BLOCK		0
-#define KVM_PTE_TYPE_PAGE		1
-#define KVM_PTE_TYPE_TABLE		1
-
-#define KVM_PTE_LEAF_ATTR_LO		GENMASK(11, 2)
-
-#define KVM_PTE_LEAF_ATTR_LO_S1_ATTRIDX	GENMASK(4, 2)
-#define KVM_PTE_LEAF_ATTR_LO_S1_AP	GENMASK(7, 6)
-#define KVM_PTE_LEAF_ATTR_LO_S1_AP_RO	3
-#define KVM_PTE_LEAF_ATTR_LO_S1_AP_RW	1
-#define KVM_PTE_LEAF_ATTR_LO_S1_SH	GENMASK(9, 8)
-#define KVM_PTE_LEAF_ATTR_LO_S1_SH_IS	3
-#define KVM_PTE_LEAF_ATTR_LO_S1_AF	BIT(10)
-
-#define KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR	GENMASK(5, 2)
-#define KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R	BIT(6)
-#define KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W	BIT(7)
-#define KVM_PTE_LEAF_ATTR_LO_S2_SH	GENMASK(9, 8)
-#define KVM_PTE_LEAF_ATTR_LO_S2_SH_IS	3
-#define KVM_PTE_LEAF_ATTR_LO_S2_AF	BIT(10)
-
-#define KVM_PTE_LEAF_ATTR_HI		GENMASK(63, 51)
-
-#define KVM_PTE_LEAF_ATTR_HI_SW		GENMASK(58, 55)
-
-#define KVM_PTE_LEAF_ATTR_HI_S1_XN	BIT(54)
-
-#define KVM_PTE_LEAF_ATTR_HI_S2_XN	BIT(54)
-
-#define KVM_PTE_LEAF_ATTR_S2_PERMS	(KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R | \
-					 KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W | \
-					 KVM_PTE_LEAF_ATTR_HI_S2_XN)
-
-#define KVM_INVALID_PTE_OWNER_MASK	GENMASK(9, 2)
-#define KVM_MAX_OWNER_ID		1
-
-/*
- * Used to indicate a pte for which a 'break-before-make' sequence is in
- * progress.
- */
-#define KVM_INVALID_PTE_LOCKED		BIT(10)
-
 struct kvm_pgtable_walk_data {
 	struct kvm_pgtable_walker	*walker;
 
@@ -702,16 +659,6 @@ static bool stage2_pte_is_locked(kvm_pte_t pte)
 	return !kvm_pte_valid(pte) && (pte & KVM_INVALID_PTE_LOCKED);
 }
 
-static bool stage2_try_set_pte(const struct kvm_pgtable_visit_ctx *ctx, kvm_pte_t new)
-{
-	if (!kvm_pgtable_walk_shared(ctx)) {
-		WRITE_ONCE(*ctx->ptep, new);
-		return true;
-	}
-
-	return cmpxchg(ctx->ptep, ctx->old, new) == ctx->old;
-}
-
 /**
  * stage2_try_break_pte() - Invalidates a pte according to the
  *			    'break-before-make' requirements of the

From patchwork Fri May 26 23:44:29 2023
X-Patchwork-Submitter: Yu Zhao
X-Patchwork-Id: 13257473
Date: Fri, 26 May 2023 17:44:29 -0600
In-Reply-To: <20230526234435.662652-1-yuzhao@google.com>
Message-Id: <20230526234435.662652-5-yuzhao@google.com>
References: <20230526234435.662652-1-yuzhao@google.com>
Subject: [PATCH mm-unstable v2 04/10] kvm/arm64: make stage2 page tables RCU safe
From: Yu Zhao
To: Andrew Morton, Paolo Bonzini

Stage2 page tables are currently not RCU safe against unmapping or VM
destruction. The previous mmu_notifier_ops members rely on
kvm->mmu_lock to synchronize with those operations.

However, the new mmu_notifier_ops member test_clear_young() provides
a fast path that does not take kvm->mmu_lock. To implement
kvm_arch_test_clear_young() for that path, unmapped page tables need
to be freed by RCU and kvm_free_stage2_pgd() needs to be after
mmu_notifier_unregister().

Remapping, specifically stage2_free_removed_table(), is already RCU
safe.
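The deferred put described above can be pictured with a small single-threaded model. Everything prefixed toy_ below is hypothetical and exists only for this illustration: toy_call_rcu() merely queues a callback, and toy_grace_period_end() stands in for the point at which all pre-existing readers are known to have finished. Real RCU determines that point itself; this sketch only shows the ordering of the two halves.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, single-threaded stand-in for struct rcu_head. */
struct toy_rcu_head {
	struct toy_rcu_head *next;
	void (*func)(struct toy_rcu_head *head);
};

static struct toy_rcu_head *toy_pending;

/* Queue @func to run once the simulated grace period has ended. */
static void toy_call_rcu(struct toy_rcu_head *head,
			 void (*func)(struct toy_rcu_head *head))
{
	head->func = func;
	head->next = toy_pending;
	toy_pending = head;
}

/* Pretend every reader has finished, then run all queued callbacks. */
static void toy_grace_period_end(void)
{
	while (toy_pending) {
		struct toy_rcu_head *head = toy_pending;

		toy_pending = head->next;
		head->func(head);
	}
}

/* Toy page-table page: a refcount plus an embedded callback head. */
struct toy_page {
	int refcount;
	struct toy_rcu_head rcu;
};

#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Deferred half: drop the reference only after the grace period. */
static void toy_rcu_put_page(struct toy_rcu_head *head)
{
	struct toy_page *page = toy_container_of(head, struct toy_page, rcu);

	assert(page->refcount > 0);
	page->refcount--;
}

/* Immediate half: the table is already unlinked, so just queue the put. */
static void toy_put_page_rcu(struct toy_page *page)
{
	toy_call_rcu(&page->rcu, toy_rcu_put_page);
}
```

Between toy_put_page_rcu() and toy_grace_period_end() the page keeps its reference, so a lockless walker that found the table just before it was unlinked can still dereference it safely.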
Signed-off-by: Yu Zhao
---
 arch/arm64/include/asm/kvm_pgtable.h |  2 ++
 arch/arm64/kvm/arm.c                 |  1 +
 arch/arm64/kvm/hyp/pgtable.c         |  8 ++++++--
 arch/arm64/kvm/mmu.c                 | 17 ++++++++++++++++-
 4 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index ff520598b62c..5cab52e3a35f 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -153,6 +153,7 @@ static inline bool kvm_level_supports_block_mapping(u32 level)
  * @put_page:			Decrement the refcount on a page. When the
  *				refcount reaches 0 the page is automatically
  *				freed.
+ * @put_page_rcu:		RCU variant of the above.
  * @page_count:			Return the refcount of a page.
  * @phys_to_virt:		Convert a physical address into a virtual
  *				address mapped in the current context.
@@ -170,6 +171,7 @@ struct kvm_pgtable_mm_ops {
 	void		(*free_removed_table)(void *addr, u32 level);
 	void		(*get_page)(void *addr);
 	void		(*put_page)(void *addr);
+	void		(*put_page_rcu)(void *addr);
 	int		(*page_count)(void *addr);
 	void*		(*phys_to_virt)(phys_addr_t phys);
 	phys_addr_t	(*virt_to_phys)(void *addr);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 14391826241c..ee93271035d9 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -191,6 +191,7 @@ vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
  */
 void kvm_arch_destroy_vm(struct kvm *kvm)
 {
+	kvm_free_stage2_pgd(&kvm->arch.mmu);
 	bitmap_free(kvm->arch.pmu_filter);
 	free_cpumask_var(kvm->arch.supported_cpus);
 
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 24678ccba76a..dbace4c6a841 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -988,8 +988,12 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
 		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(ctx->old, mm_ops),
 					       kvm_granule_size(ctx->level));
 
-	if (childp)
-		mm_ops->put_page(childp);
+	if (childp) {
+		if (mm_ops->put_page_rcu)
+			mm_ops->put_page_rcu(childp);
+		else
+			mm_ops->put_page(childp);
+	}
 
 	return 0;
 }
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 3b9d4d24c361..c3b3e2afe26f 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -172,6 +172,21 @@ static int kvm_host_page_count(void *addr)
 	return page_count(virt_to_page(addr));
 }
 
+static void kvm_s2_rcu_put_page(struct rcu_head *head)
+{
+	put_page(container_of(head, struct page, rcu_head));
+}
+
+static void kvm_s2_put_page_rcu(void *addr)
+{
+	struct page *page = virt_to_page(addr);
+
+	if (kvm_host_page_count(addr) == 1)
+		kvm_account_pgtable_pages(addr, -1);
+
+	call_rcu(&page->rcu_head, kvm_s2_rcu_put_page);
+}
+
 static phys_addr_t kvm_host_pa(void *addr)
 {
 	return __pa(addr);
@@ -704,6 +719,7 @@ static struct kvm_pgtable_mm_ops kvm_s2_mm_ops = {
 	.free_removed_table	= stage2_free_removed_table,
 	.get_page		= kvm_host_get_page,
 	.put_page		= kvm_s2_put_page,
+	.put_page_rcu		= kvm_s2_put_page_rcu,
 	.page_count		= kvm_host_page_count,
 	.phys_to_virt		= kvm_host_va,
 	.virt_to_phys		= kvm_host_pa,
@@ -1877,7 +1893,6 @@ void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen)
 
 void kvm_arch_flush_shadow_all(struct kvm *kvm)
 {
-	kvm_free_stage2_pgd(&kvm->arch.mmu);
 }
 
 void kvm_arch_flush_shadow_memslot(struct kvm *kvm,

From patchwork Fri May 26 23:44:30 2023
X-Patchwork-Submitter: Yu Zhao
X-Patchwork-Id: 13257477
Date: Fri, 26 May 2023 17:44:30 -0600
In-Reply-To: <20230526234435.662652-1-yuzhao@google.com>
Message-Id: <20230526234435.662652-6-yuzhao@google.com>
References: <20230526234435.662652-1-yuzhao@google.com>
Subject: [PATCH mm-unstable v2 05/10] kvm/arm64: add kvm_arch_test_clear_young()
From: Yu Zhao
To: Andrew Morton, Paolo Bonzini

Implement kvm_arch_test_clear_young() to support the fast path in
mmu_notifier_ops->test_clear_young().

It focuses on a simple case, i.e., hardware sets the accessed bit in
KVM PTEs and VMs are not protected, where it can rely on RCU and
cmpxchg to safely clear the accessed bit without taking
kvm->mmu_lock. Complex cases fall back to the existing slow path
where kvm->mmu_lock is then taken.
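The lockless clearing step can be sketched in userspace with C11 atomics. This is a simplified model under stated assumptions, not the walker added by this patch: the page table is a flat array, the valid flag is assumed to be bit 0, the accessed flag is bit 10 as in KVM_PTE_LEAF_ATTR_LO_S2_AF, and the hypothetical helpers return a count of young entries, whereas the patch reports them through kvm_should_clear_young().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t pte_t;

#define PTE_VALID	(UINT64_C(1) << 0)	/* assumed position of the valid flag */
#define PTE_AF		(UINT64_C(1) << 10)	/* accessed flag, bit 10 */

/*
 * Clear the accessed flag in one slot without taking a lock. Returns true
 * if the entry was valid and young. A lost race is not retried: whoever
 * changed the entry concurrently wins and the slot is left alone.
 */
static bool test_clear_young_one(_Atomic pte_t *ptep)
{
	pte_t old = atomic_load_explicit(ptep, memory_order_relaxed);
	pte_t new = old & ~PTE_AF;

	if (!(old & PTE_VALID) || new == old)
		return false;

	atomic_compare_exchange_strong(ptep, &old, new);
	return true;
}

/* Visit @nr slots and count how many were young. */
static size_t test_clear_young_range(_Atomic pte_t *table, size_t nr)
{
	size_t young = 0;

	assert(table != NULL || nr == 0);

	for (size_t i = 0; i < nr; i++)
		young += test_clear_young_one(&table[i]);

	return young;
}
```

Invalid entries and entries whose accessed flag is already clear are skipped before any atomic write is attempted, mirroring the two early returns in the walker callback.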
Signed-off-by: Yu Zhao
---
 arch/arm64/include/asm/kvm_host.h |  6 ++++++
 arch/arm64/kvm/mmu.c              | 36 +++++++++++++++++++++++++++++++
 2 files changed, 42 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 7e7e19ef6993..da32b0890716 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -1113,4 +1113,10 @@ static inline void kvm_hyp_reserve(void) { }
 void kvm_arm_vcpu_power_off(struct kvm_vcpu *vcpu);
 bool kvm_arm_vcpu_stopped(struct kvm_vcpu *vcpu);
 
+#define kvm_arch_has_test_clear_young kvm_arch_has_test_clear_young
+static inline bool kvm_arch_has_test_clear_young(void)
+{
+	return cpu_has_hw_af() && !is_protected_kvm_enabled();
+}
+
 #endif /* __ARM64_KVM_HOST_H__ */
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index c3b3e2afe26f..26a8d955b49c 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1678,6 +1678,42 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 				   range->start << PAGE_SHIFT);
 }
 
+static int stage2_test_clear_young(const struct kvm_pgtable_visit_ctx *ctx,
+				   enum kvm_pgtable_walk_flags flags)
+{
+	kvm_pte_t new = ctx->old & ~KVM_PTE_LEAF_ATTR_LO_S2_AF;
+
+	VM_WARN_ON_ONCE(!page_count(virt_to_page(ctx->ptep)));
+
+	if (!kvm_pte_valid(new))
+		return 0;
+
+	if (new == ctx->old)
+		return 0;
+
+	if (kvm_should_clear_young(ctx->arg, ctx->addr / PAGE_SIZE))
+		stage2_try_set_pte(ctx, new);
+
+	return 0;
+}
+
+bool kvm_arch_test_clear_young(struct kvm *kvm, struct kvm_gfn_range *range)
+{
+	u64 start = range->start * PAGE_SIZE;
+	u64 end = range->end * PAGE_SIZE;
+	struct kvm_pgtable_walker walker = {
+		.cb		= stage2_test_clear_young,
+		.arg		= range,
+		.flags		= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_SHARED,
+	};
+
+	BUILD_BUG_ON(is_hyp_code());
+
+	kvm_pgtable_walk(kvm->arch.mmu.pgt, start, end - start, &walker);
+
+	return false;
+}
+
 phys_addr_t kvm_mmu_get_httbr(void)
 {
 	return __pa(hyp_pgtable->pgd);

From patchwork Fri May 26 23:44:31 2023
X-Patchwork-Submitter: Yu Zhao
X-Patchwork-Id: 13257475
Date: Fri, 26 May 2023 17:44:31 -0600
In-Reply-To: <20230526234435.662652-1-yuzhao@google.com>
Message-Id: <20230526234435.662652-7-yuzhao@google.com>
References: <20230526234435.662652-1-yuzhao@google.com>
Subject: [PATCH mm-unstable v2 06/10] kvm/powerpc: make radix page tables RCU safe
From: Yu Zhao
To: Andrew Morton, Paolo Bonzini

KVM page tables are currently not RCU safe against remapping, i.e.,
kvmppc_unmap_free_pmd_entry_table() et al. The previous
mmu_notifier_ops members rely on kvm->mmu_lock to synchronize with
that operation.

However, the new mmu_notifier_ops member test_clear_young() provides
a fast path that does not take kvm->mmu_lock. To implement
kvm_arch_test_clear_young() for that path, orphan page tables need to
be freed by RCU.

Unmapping, specifically kvm_unmap_radix(), does not free page tables,
hence not a concern.
Signed-off-by: Yu Zhao
---
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 461307b89c3a..3b65b3b11041 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -1469,13 +1469,15 @@ int kvmppc_radix_init(void)
 {
 	unsigned long size = sizeof(void *) << RADIX_PTE_INDEX_SIZE;

-	kvm_pte_cache = kmem_cache_create("kvm-pte", size, size, 0, pte_ctor);
+	kvm_pte_cache = kmem_cache_create("kvm-pte", size, size,
+					  SLAB_TYPESAFE_BY_RCU, pte_ctor);
 	if (!kvm_pte_cache)
 		return -ENOMEM;

 	size = sizeof(void *) << RADIX_PMD_INDEX_SIZE;

-	kvm_pmd_cache = kmem_cache_create("kvm-pmd", size, size, 0, pmd_ctor);
+	kvm_pmd_cache = kmem_cache_create("kvm-pmd", size, size,
+					  SLAB_TYPESAFE_BY_RCU, pmd_ctor);
 	if (!kvm_pmd_cache) {
 		kmem_cache_destroy(kvm_pte_cache);
 		return -ENOMEM;

From patchwork Fri May 26 23:44:32 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yu Zhao
X-Patchwork-Id: 13257476
Date: Fri, 26 May 2023 17:44:32 -0600
In-Reply-To: <20230526234435.662652-1-yuzhao@google.com>
Message-Id: <20230526234435.662652-8-yuzhao@google.com>
References: <20230526234435.662652-1-yuzhao@google.com>
X-Mailer: git-send-email 2.41.0.rc0.172.g3f132b7071-goog
Subject: [PATCH mm-unstable v2 07/10] kvm/powerpc: add kvm_arch_test_clear_young()
From: Yu Zhao
To: Andrew Morton, Paolo Bonzini
Cc: Alistair Popple, Anup Patel, Ben Gardon, Borislav Petkov,
 Catalin Marinas, Chao Peng, Christophe Leroy, Dave Hansen,
 Fabiano Rosas, Gaosheng Cui, Gavin Shan, "H. Peter Anvin",
 Ingo Molnar, James Morse, "Jason A. Donenfeld", Jason Gunthorpe,
 Jonathan Corbet, Marc Zyngier, Masami Hiramatsu, Michael Ellerman,
 Michael Larabel, Mike Rapoport, Nicholas Piggin, Oliver Upton,
 Paul Mackerras, Peter Xu, Sean Christopherson, Steven Rostedt,
 Suzuki K Poulose, Thomas Gleixner, Thomas Huth, Will Deacon,
 Zenghui Yu, kvmarm@lists.linux.dev, kvm@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 linuxppc-dev@lists.ozlabs.org, linux-trace-kernel@vger.kernel.org,
 x86@kernel.org, linux-mm@google.com, Yu Zhao

Implement kvm_arch_test_clear_young() to support the fast path in
mmu_notifier_ops->test_clear_young().

It focuses on a simple case, i.e., radix MMU sets the accessed bit in
KVM PTEs and VMs are not nested, where it can rely on RCU and
pte_xchg() to safely clear the accessed bit without taking
kvm->mmu_lock. Complex cases fall back to the existing slow path
where kvm->mmu_lock is then taken.
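
The lockless update described above can be illustrated outside the
kernel. What follows is a minimal user-space sketch, not the code in
this patch: the bit layout (PTE_PRESENT, PTE_ACCESSED) and both helper
names are hypothetical, and C11 atomic_compare_exchange_strong() stands
in for pte_xchg(). It only shows why a compare-and-exchange is enough
here: if the entry changes between the read and the update, the
exchange fails and the entry is left untouched.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit layout, for illustration only. */
#define PTE_PRESENT  (UINT64_C(1) << 0)
#define PTE_ACCESSED (UINT64_C(1) << 8)

/*
 * Test and optionally clear the accessed bit of one entry without a lock.
 * Returns true if the entry was present and young when it was read.
 */
static bool test_clear_young_one(_Atomic uint64_t *ptep, bool clear)
{
	uint64_t old = atomic_load(ptep);

	if (!(old & PTE_PRESENT) || !(old & PTE_ACCESSED))
		return false;

	/* Fails harmlessly if another thread changed the entry meanwhile. */
	if (clear)
		atomic_compare_exchange_strong(ptep, &old, old & ~PTE_ACCESSED);

	return true;
}

/* Walk n entries; returns how many were found present and young. */
static size_t test_clear_young_range(_Atomic uint64_t *table, size_t n,
				     bool clear)
{
	size_t young = 0;

	assert(table != NULL);
	for (size_t i = 0; i < n; i++)
		young += test_clear_young_one(&table[i], clear);

	return young;
}
```

A failed exchange is deliberately not retried in this sketch: the entry
was still observed young, and a later pass can clear it.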

Signed-off-by: Yu Zhao
---
 arch/powerpc/include/asm/kvm_host.h    |  8 ++++
 arch/powerpc/include/asm/kvm_ppc.h     |  1 +
 arch/powerpc/kvm/book3s.c              |  6 +++
 arch/powerpc/kvm/book3s.h              |  1 +
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 59 ++++++++++++++++++++++++++
 arch/powerpc/kvm/book3s_hv.c           |  5 +++
 6 files changed, 80 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 14ee0dece853..75c260ea8a9e 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -883,4 +883,12 @@ static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
 static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}

+#define kvm_arch_has_test_clear_young kvm_arch_has_test_clear_young
+static inline bool kvm_arch_has_test_clear_young(void)
+{
+	return IS_ENABLED(CONFIG_KVM_BOOK3S_HV_POSSIBLE) &&
+	       cpu_has_feature(CPU_FTR_HVMODE) && cpu_has_feature(CPU_FTR_ARCH_300) &&
+	       radix_enabled();
+}
+
 #endif /* __POWERPC_KVM_HOST_H__ */
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 79a9c0bb8bba..ff1af6a7b44f 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -287,6 +287,7 @@ struct kvmppc_ops {
 	bool (*unmap_gfn_range)(struct kvm *kvm, struct kvm_gfn_range *range);
 	bool (*age_gfn)(struct kvm *kvm, struct kvm_gfn_range *range);
 	bool (*test_age_gfn)(struct kvm *kvm, struct kvm_gfn_range *range);
+	bool (*test_clear_young)(struct kvm *kvm, struct kvm_gfn_range *range);
 	bool (*set_spte_gfn)(struct kvm *kvm, struct kvm_gfn_range *range);
 	void (*free_memslot)(struct kvm_memory_slot *slot);
 	int (*init_vm)(struct kvm *kvm);
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 686d8d9eda3e..37bf40b0c4ff 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -899,6 +899,12 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 	return kvm->arch.kvm_ops->test_age_gfn(kvm, range);
 }

+bool kvm_arch_test_clear_young(struct kvm *kvm, struct kvm_gfn_range *range)
+{
+	return !kvm->arch.kvm_ops->test_clear_young ||
+	       kvm->arch.kvm_ops->test_clear_young(kvm, range);
+}
+
 bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 {
 	return kvm->arch.kvm_ops->set_spte_gfn(kvm, range);
diff --git a/arch/powerpc/kvm/book3s.h b/arch/powerpc/kvm/book3s.h
index 58391b4b32ed..fa2659e21ccc 100644
--- a/arch/powerpc/kvm/book3s.h
+++ b/arch/powerpc/kvm/book3s.h
@@ -12,6 +12,7 @@ extern void kvmppc_core_flush_memslot_hv(struct kvm *kvm,
 extern bool kvm_unmap_gfn_range_hv(struct kvm *kvm, struct kvm_gfn_range *range);
 extern bool kvm_age_gfn_hv(struct kvm *kvm, struct kvm_gfn_range *range);
 extern bool kvm_test_age_gfn_hv(struct kvm *kvm, struct kvm_gfn_range *range);
+extern bool kvm_test_clear_young_hv(struct kvm *kvm, struct kvm_gfn_range *range);
 extern bool kvm_set_spte_gfn_hv(struct kvm *kvm, struct kvm_gfn_range *range);

 extern int kvmppc_mmu_init_pr(struct kvm_vcpu *vcpu);
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 3b65b3b11041..0a392e9a100a 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -1088,6 +1088,65 @@ bool kvm_test_age_radix(struct kvm *kvm, struct kvm_memory_slot *memslot,
 	return ref;
 }

+bool kvm_test_clear_young_hv(struct kvm *kvm, struct kvm_gfn_range *range)
+{
+	bool err;
+	gfn_t gfn = range->start;
+
+	rcu_read_lock();
+
+	err = !kvm_is_radix(kvm);
+	if (err)
+		goto unlock;
+
+	/*
+	 * Case 1:  This function          kvmppc_switch_mmu_to_hpt()
+	 *
+	 *          rcu_read_lock()
+	 *          Test kvm_is_radix()    kvm->arch.radix = 0
+	 *          Use kvm->arch.pgtable  synchronize_rcu()
+	 *          rcu_read_unlock()
+	 *                                 kvmppc_free_radix()
+	 *
+	 *
+	 * Case 2:  This function          kvmppc_switch_mmu_to_radix()
+	 *
+	 *                                 kvmppc_init_vm_radix()
+	 *                                 smp_wmb()
+	 *          Test kvm_is_radix()    kvm->arch.radix = 1
+	 *          smp_rmb()
+	 *          Use kvm->arch.pgtable
+	 */
+	smp_rmb();
+
+	while (gfn < range->end) {
+		pte_t *ptep;
+		pte_t old, new;
+		unsigned int shift;
+
+		ptep = find_kvm_secondary_pte_unlocked(kvm, gfn * PAGE_SIZE, &shift);
+		if (!ptep)
+			goto next;
+
+		VM_WARN_ON_ONCE(!page_count(virt_to_page(ptep)));
+
+		old = READ_ONCE(*ptep);
+		if (!pte_present(old) || !pte_young(old))
+			goto next;
+
+		new = pte_mkold(old);
+
+		if (kvm_should_clear_young(range, gfn))
+			pte_xchg(ptep, old, new);
+next:
+		gfn += shift ? BIT(shift - PAGE_SHIFT) : 1;
+	}
+unlock:
+	rcu_read_unlock();
+
+	return err;
+}
+
 /* Returns the number of PAGE_SIZE pages that are dirty */
 static int kvm_radix_test_clear_dirty(struct kvm *kvm,
 				struct kvm_memory_slot *memslot, int pagenum)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 130bafdb1430..20a81ec9fde8 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -5262,6 +5262,8 @@ int kvmppc_switch_mmu_to_hpt(struct kvm *kvm)
 	spin_lock(&kvm->mmu_lock);
 	kvm->arch.radix = 0;
 	spin_unlock(&kvm->mmu_lock);
+	/* see the comments in kvm_test_clear_young_hv() */
+	synchronize_rcu();
 	kvmppc_free_radix(kvm);

 	lpcr = LPCR_VPM1;
@@ -5286,6 +5288,8 @@ int kvmppc_switch_mmu_to_radix(struct kvm *kvm)
 	if (err)
 		return err;
 	kvmppc_rmap_reset(kvm);
+	/* see the comments in kvm_test_clear_young_hv() */
+	smp_wmb();
 	/* Mutual exclusion with kvm_unmap_gfn_range etc. */
 	spin_lock(&kvm->mmu_lock);
 	kvm->arch.radix = 1;
@@ -6185,6 +6189,7 @@ static struct kvmppc_ops kvm_ops_hv = {
 	.unmap_gfn_range = kvm_unmap_gfn_range_hv,
 	.age_gfn = kvm_age_gfn_hv,
 	.test_age_gfn = kvm_test_age_gfn_hv,
+	.test_clear_young = kvm_test_clear_young_hv,
 	.set_spte_gfn = kvm_set_spte_gfn_hv,
 	.free_memslot = kvmppc_core_free_memslot_hv,
 	.init_vm = kvmppc_core_init_vm_hv,

From patchwork Fri May 26 23:44:33 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yu Zhao
X-Patchwork-Id: 13257478
Date: Fri, 26 May 2023 17:44:33 -0600
In-Reply-To: <20230526234435.662652-1-yuzhao@google.com>
Message-Id: <20230526234435.662652-9-yuzhao@google.com>
References: <20230526234435.662652-1-yuzhao@google.com>
X-Mailer: git-send-email 2.41.0.rc0.172.g3f132b7071-goog
Subject: [PATCH mm-unstable v2 08/10] kvm/x86: move tdp_mmu_enabled and shadow_accessed_mask
From: Yu Zhao
To: Andrew Morton, Paolo Bonzini
Cc: Alistair Popple, Anup Patel, Ben Gardon, Borislav Petkov,
 Catalin Marinas, Chao Peng, Christophe Leroy, Dave Hansen,
 Fabiano Rosas, Gaosheng Cui, Gavin Shan, "H. Peter Anvin",
 Ingo Molnar, James Morse, "Jason A. Donenfeld", Jason Gunthorpe,
 Jonathan Corbet, Marc Zyngier, Masami Hiramatsu, Michael Ellerman,
 Michael Larabel, Mike Rapoport, Nicholas Piggin, Oliver Upton,
 Paul Mackerras, Peter Xu, Sean Christopherson, Steven Rostedt,
 Suzuki K Poulose, Thomas Gleixner, Thomas Huth, Will Deacon,
 Zenghui Yu, kvmarm@lists.linux.dev, kvm@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 linuxppc-dev@lists.ozlabs.org, linux-trace-kernel@vger.kernel.org,
 x86@kernel.org, linux-mm@google.com, Yu Zhao

tdp_mmu_enabled and shadow_accessed_mask are needed to implement
kvm_arch_has_test_clear_young().

Signed-off-by: Yu Zhao
---
 arch/x86/include/asm/kvm_host.h | 6 ++++++
 arch/x86/kvm/mmu.h              | 6 ------
 arch/x86/kvm/mmu/spte.h         | 1 -
 3 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index fb9d1f2d6136..753c67072c47 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1772,6 +1772,7 @@ struct kvm_arch_async_pf {

 extern u32 __read_mostly kvm_nr_uret_msrs;
 extern u64 __read_mostly host_efer;
+extern u64 __read_mostly shadow_accessed_mask;
 extern bool __read_mostly allow_smaller_maxphyaddr;
 extern bool __read_mostly enable_apicv;
 extern struct kvm_x86_ops kvm_x86_ops;
@@ -1855,6 +1856,11 @@ void kvm_fire_mask_notifiers(struct kvm *kvm, unsigned irqchip, unsigned pin,
 			     bool mask);

 extern bool tdp_enabled;
+#ifdef CONFIG_X86_64
+extern bool tdp_mmu_enabled;
+#else
+#define tdp_mmu_enabled false
+#endif

 u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu);

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 92d5a1924fc1..84aedb2671ef 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -253,12 +253,6 @@ static inline bool kvm_shadow_root_allocated(struct kvm *kvm)
 	return smp_load_acquire(&kvm->arch.shadow_root_allocated);
 }

-#ifdef CONFIG_X86_64
-extern bool tdp_mmu_enabled;
-#else
-#define tdp_mmu_enabled false
-#endif
-
 static inline bool kvm_memslots_have_rmaps(struct kvm *kvm)
 {
 	return !tdp_mmu_enabled || kvm_shadow_root_allocated(kvm);
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 1279db2eab44..a82c4fa1c47b 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -153,7 +153,6 @@ extern u64 __read_mostly shadow_mmu_writable_mask;
 extern u64 __read_mostly shadow_nx_mask;
 extern u64 __read_mostly shadow_x_mask; /* mutual exclusive with nx_mask */
 extern u64 __read_mostly shadow_user_mask;
-extern u64 __read_mostly shadow_accessed_mask;
 extern u64 __read_mostly shadow_dirty_mask;
 extern u64 __read_mostly shadow_mmio_value;
 extern u64 __read_mostly shadow_mmio_mask;

From patchwork Fri May 26 23:44:34 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yu Zhao
X-Patchwork-Id: 13257479
Date: Fri, 26 May 2023 17:44:34 -0600
In-Reply-To: <20230526234435.662652-1-yuzhao@google.com>
Message-Id: <20230526234435.662652-10-yuzhao@google.com>
References: <20230526234435.662652-1-yuzhao@google.com>
X-Mailer: git-send-email 2.41.0.rc0.172.g3f132b7071-goog
Subject: [PATCH mm-unstable v2 09/10] kvm/x86: add kvm_arch_test_clear_young()
From: Yu Zhao
To: Andrew Morton, Paolo Bonzini
Cc: Alistair Popple, Anup Patel, Ben Gardon, Borislav Petkov,
 Catalin Marinas, Chao Peng, Christophe Leroy, Dave Hansen,
 Fabiano Rosas, Gaosheng Cui, Gavin Shan, "H. Peter Anvin",
 Ingo Molnar, James Morse, "Jason A. Donenfeld", Jason Gunthorpe,
 Jonathan Corbet, Marc Zyngier, Masami Hiramatsu, Michael Ellerman,
 Michael Larabel, Mike Rapoport, Nicholas Piggin, Oliver Upton,
 Paul Mackerras, Peter Xu, Sean Christopherson, Steven Rostedt,
 Suzuki K Poulose, Thomas Gleixner, Thomas Huth, Will Deacon,
 Zenghui Yu, kvmarm@lists.linux.dev, kvm@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 linuxppc-dev@lists.ozlabs.org, linux-trace-kernel@vger.kernel.org,
 x86@kernel.org, linux-mm@google.com, Yu Zhao

Implement kvm_arch_test_clear_young() to support the fast path in
mmu_notifier_ops->test_clear_young().

It focuses on a simple case, i.e., TDP MMU sets the accessed bit in
KVM PTEs and VMs are not nested, where it can rely on RCU and
clear_bit() to safely clear the accessed bit without taking
kvm->mmu_lock. Complex cases fall back to the existing slow path
where kvm->mmu_lock is then taken.
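
To illustrate the clear_bit() approach mentioned above, here is a
minimal user-space sketch, not KVM code: accessed_mask and both helper
names are hypothetical, and C11 atomic_fetch_and() stands in for the
kernel's clear_bit(). The point is that an atomic read-modify-write on
one bit cannot lose concurrent updates to the other bits of the same
entry, which a plain load followed by a store could.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Index of the lowest set bit in mask; mask must be non-zero. */
static int lowest_bit_index(uint64_t mask)
{
	int idx = 0;

	assert(mask != 0);
	while (!(mask & 1)) {
		mask >>= 1;
		idx++;
	}
	return idx;
}

/*
 * Atomically clear the accessed bit selected by accessed_mask.
 * Returns true if the bit was set before the call.
 */
static bool clear_accessed(_Atomic uint64_t *sptep, uint64_t accessed_mask)
{
	uint64_t bit = UINT64_C(1) << lowest_bit_index(accessed_mask);
	uint64_t old = atomic_fetch_and(sptep, ~bit);

	return old & bit;
}
```

Unlike a compare-and-exchange, this form never fails, so it needs no
retry loop; the trade-off is that it clears the bit even if the rest
of the entry changed since it was last read.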

Signed-off-by: Yu Zhao
---
 arch/x86/include/asm/kvm_host.h |  7 +++++++
 arch/x86/kvm/mmu/tdp_mmu.c      | 34 +++++++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 753c67072c47..d6dfdebe3d94 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2223,4 +2223,11 @@ int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
  */
 #define KVM_EXIT_HYPERCALL_MBZ		GENMASK_ULL(31, 1)

+#define kvm_arch_has_test_clear_young kvm_arch_has_test_clear_young
+static inline bool kvm_arch_has_test_clear_young(void)
+{
+	return IS_ENABLED(CONFIG_X86_64) &&
+	       (!IS_REACHABLE(CONFIG_KVM) || (tdp_mmu_enabled && shadow_accessed_mask));
+}
+
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 08340219c35a..6875a819e007 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1232,6 +1232,40 @@ bool kvm_tdp_mmu_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 	return kvm_tdp_mmu_handle_gfn(kvm, range, test_age_gfn);
 }

+bool kvm_arch_test_clear_young(struct kvm *kvm, struct kvm_gfn_range *range)
+{
+	struct kvm_mmu_page *root;
+	int offset = ffs(shadow_accessed_mask) - 1;
+
+	if (kvm_shadow_root_allocated(kvm))
+		return true;
+
+	rcu_read_lock();
+
+	list_for_each_entry_rcu(root, &kvm->arch.tdp_mmu_roots, link) {
+		struct tdp_iter iter;
+
+		if (kvm_mmu_page_as_id(root) != range->slot->as_id)
+			continue;
+
+		tdp_root_for_each_leaf_pte(iter, root, range->start, range->end) {
+			u64 *sptep = rcu_dereference(iter.sptep);
+
+			VM_WARN_ON_ONCE(!page_count(virt_to_page(sptep)));
+
+			if (!(iter.old_spte & shadow_accessed_mask))
+				continue;
+
+			if (kvm_should_clear_young(range, iter.gfn))
+				clear_bit(offset, (unsigned long *)sptep);
+		}
+	}
+
+	rcu_read_unlock();
+
+	return false;
+}
+
 static bool set_spte_gfn(struct kvm *kvm, struct tdp_iter *iter,
 			 struct kvm_gfn_range *range)
 {

From patchwork Fri May 26 23:44:35 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yu Zhao
X-Patchwork-Id: 13257502
Date: Fri, 26 May 2023 17:44:35 -0600
In-Reply-To: <20230526234435.662652-1-yuzhao@google.com>
Message-Id: <20230526234435.662652-11-yuzhao@google.com>
References: <20230526234435.662652-1-yuzhao@google.com>
X-Mailer: git-send-email 2.41.0.rc0.172.g3f132b7071-goog
Subject: [PATCH mm-unstable v2 10/10] mm: multi-gen LRU: use mmu_notifier_test_clear_young()
From: Yu Zhao
To: Andrew Morton, Paolo Bonzini
Cc: Alistair Popple, Anup Patel, Ben Gardon, Borislav Petkov,
 Catalin Marinas, Chao Peng, Christophe Leroy, Dave Hansen,
 Fabiano Rosas, Gaosheng Cui, Gavin Shan, "H. Peter Anvin",
 Ingo Molnar, James Morse, "Jason A. Donenfeld", Jason Gunthorpe,
 Jonathan Corbet, Marc Zyngier, Masami Hiramatsu, Michael Ellerman,
 Michael Larabel, Mike Rapoport, Nicholas Piggin, Oliver Upton,
 Paul Mackerras, Peter Xu, Sean Christopherson, Steven Rostedt,
 Suzuki K Poulose, Thomas Gleixner, Thomas Huth, Will Deacon,
 Zenghui Yu, kvmarm@lists.linux.dev, kvm@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 linuxppc-dev@lists.ozlabs.org, linux-trace-kernel@vger.kernel.org,
 x86@kernel.org, linux-mm@google.com, Yu Zhao

Use mmu_notifier_test_clear_young() to handle KVM PTEs in batches
when the fast path is supported. This reduces the contention on
kvm->mmu_lock when the host is under heavy memory pressure.

An existing selftest can quickly demonstrate the effectiveness of
this patch.
On a generic workstation equipped with 128 CPUs and 256GB DRAM: $ sudo max_guest_memory_test -c 64 -m 250 -s 250 MGLRU run2 ------------------ Before [1] ~64s After ~51s kswapd (MGLRU before) 100.00% balance_pgdat 100.00% shrink_node 100.00% shrink_one 99.99% try_to_shrink_lruvec 99.71% evict_folios 97.29% shrink_folio_list ==>> 13.05% folio_referenced 12.83% rmap_walk_file 12.31% folio_referenced_one 7.90% __mmu_notifier_clear_young 7.72% kvm_mmu_notifier_clear_young 7.34% _raw_write_lock kswapd (MGLRU after) 100.00% balance_pgdat 100.00% shrink_node 100.00% shrink_one 99.99% try_to_shrink_lruvec 99.59% evict_folios 80.37% shrink_folio_list ==>> 3.74% folio_referenced 3.59% rmap_walk_file 3.19% folio_referenced_one 2.53% lru_gen_look_around 1.06% __mmu_notifier_test_clear_young [1] "mm: rmap: Don't flush TLB after checking PTE young for page reference" was included so that the comparison is apples to apples. https://lore.kernel.org/r/20220706112041.3831-1-21cnbao@gmail.com/ Signed-off-by: Yu Zhao --- Documentation/admin-guide/mm/multigen_lru.rst | 6 +- include/linux/mmzone.h | 6 +- mm/rmap.c | 8 +- mm/vmscan.c | 139 ++++++++++++++++-- 4 files changed, 138 insertions(+), 21 deletions(-) diff --git a/Documentation/admin-guide/mm/multigen_lru.rst b/Documentation/admin-guide/mm/multigen_lru.rst index 33e068830497..0ae2a6d4d94c 100644 --- a/Documentation/admin-guide/mm/multigen_lru.rst +++ b/Documentation/admin-guide/mm/multigen_lru.rst @@ -48,6 +48,10 @@ Values Components verified on x86 varieties other than Intel and AMD. If it is disabled, the multi-gen LRU will suffer a negligible performance degradation. +0x0008 Clearing the accessed bit in KVM page table entries in large + batches, when KVM MMU sets it (e.g., on x86_64). This can + improve the performance of guests when the host is under memory + pressure. [yYnN] Apply to all the components above. 
 ====== ===============================================================

@@ -56,7 +60,7 @@ E.g.,

     echo y >/sys/kernel/mm/lru_gen/enabled
     cat /sys/kernel/mm/lru_gen/enabled
-    0x0007
+    0x000f
     echo 5 >/sys/kernel/mm/lru_gen/enabled
     cat /sys/kernel/mm/lru_gen/enabled
     0x0005
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 5a7ada0413da..1b148f39fabc 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -369,6 +369,7 @@ enum {
 	LRU_GEN_CORE,
 	LRU_GEN_MM_WALK,
 	LRU_GEN_NONLEAF_YOUNG,
+	LRU_GEN_KVM_MMU_WALK,
 	NR_LRU_GEN_CAPS
 };

@@ -471,7 +472,7 @@ struct lru_gen_mm_walk {
 };

 void lru_gen_init_lruvec(struct lruvec *lruvec);
-void lru_gen_look_around(struct page_vma_mapped_walk *pvmw);
+bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw);

 #ifdef CONFIG_MEMCG

@@ -559,8 +560,9 @@ static inline void lru_gen_init_lruvec(struct lruvec *lruvec)
 {
 }

-static inline void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
+static inline bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 {
+	return false;
 }

 #ifdef CONFIG_MEMCG
diff --git a/mm/rmap.c b/mm/rmap.c
index ae127f60a4fb..3a2c4e6a0887 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -825,12 +825,10 @@ static bool folio_referenced_one(struct folio *folio,
 			return false; /* To break the loop */
 		}

-		if (pvmw.pte) {
-			if (lru_gen_enabled() && pte_young(*pvmw.pte)) {
-				lru_gen_look_around(&pvmw);
+		if (lru_gen_enabled() && pvmw.pte) {
+			if (lru_gen_look_around(&pvmw))
 				referenced++;
-			}
-
+		} else if (pvmw.pte) {
 			if (ptep_clear_flush_young_notify(vma, address,
 						pvmw.pte))
 				referenced++;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index ef687f9be13c..3f734588b600 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -58,6 +58,7 @@
 #include
 #include
 #include
+#include

 #include
 #include
@@ -3244,6 +3245,20 @@ static bool should_clear_pmd_young(void)
 	return arch_has_hw_nonleaf_pmd_young() && get_cap(LRU_GEN_NONLEAF_YOUNG);
 }

+#if IS_ENABLED(CONFIG_KVM)
+#include
+
+static bool should_walk_kvm_mmu(void)
+{
+	return kvm_arch_has_test_clear_young() && get_cap(LRU_GEN_KVM_MMU_WALK);
+}
+#else
+static bool should_walk_kvm_mmu(void)
+{
+	return false;
+}
+#endif
+
 /******************************************************************************
  *                          shorthand helpers
  ******************************************************************************/
@@ -3982,6 +3997,55 @@ static struct folio *get_pfn_folio(unsigned long pfn, struct mem_cgroup *memcg,
 	return folio;
 }

+static bool test_spte_young(struct mm_struct *mm, unsigned long addr, unsigned long end,
+			    unsigned long *bitmap, unsigned long *last)
+{
+	if (!should_walk_kvm_mmu())
+		return false;
+
+	if (*last > addr)
+		goto done;
+
+	*last = end - addr > MIN_LRU_BATCH * PAGE_SIZE ?
+		addr + MIN_LRU_BATCH * PAGE_SIZE - 1 : end - 1;
+	bitmap_zero(bitmap, MIN_LRU_BATCH);
+
+	mmu_notifier_test_clear_young(mm, addr, *last + 1, false, bitmap);
+done:
+	return test_bit((*last - addr) / PAGE_SIZE, bitmap);
+}
+
+static void clear_spte_young(struct mm_struct *mm, unsigned long addr,
+			     unsigned long *bitmap, unsigned long *last)
+{
+	int i;
+	unsigned long start, end = *last + 1;
+
+	if (addr + PAGE_SIZE != end)
+		return;
+
+	i = find_last_bit(bitmap, MIN_LRU_BATCH);
+	if (i == MIN_LRU_BATCH)
+		return;
+
+	start = end - (i + 1) * PAGE_SIZE;
+
+	i = find_first_bit(bitmap, MIN_LRU_BATCH);
+
+	end -= i * PAGE_SIZE;
+
+	mmu_notifier_test_clear_young(mm, start, end, true, bitmap);
+}
+
+static void skip_spte_young(struct mm_struct *mm, unsigned long addr,
+			    unsigned long *bitmap, unsigned long *last)
+{
+	if (*last > addr)
+		__clear_bit((*last - addr) / PAGE_SIZE, bitmap);
+
+	clear_spte_young(mm, addr, bitmap, last);
+}
+
 static bool suitable_to_scan(int total, int young)
 {
 	int n = clamp_t(int, cache_line_size() / sizeof(pte_t), 2, 8);
@@ -3997,6 +4061,8 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
 	pte_t *pte;
 	spinlock_t *ptl;
 	unsigned long addr;
+	DECLARE_BITMAP(bitmap, MIN_LRU_BATCH);
+	unsigned long last = 0;
 	int total = 0;
 	int young = 0;
 	struct lru_gen_mm_walk *walk = args->private;
@@ -4015,6 +4081,7 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
 	pte = pte_offset_map(pmd, start & PMD_MASK);
 restart:
 	for (i = pte_index(start), addr = start; addr != end; i++, addr += PAGE_SIZE) {
+		bool ret;
 		unsigned long pfn;
 		struct folio *folio;

@@ -4022,20 +4089,27 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
 		walk->mm_stats[MM_LEAF_TOTAL]++;

 		pfn = get_pte_pfn(pte[i], args->vma, addr);
-		if (pfn == -1)
+		if (pfn == -1) {
+			skip_spte_young(args->vma->vm_mm, addr, bitmap, &last);
 			continue;
+		}

-		if (!pte_young(pte[i])) {
+		ret = test_spte_young(args->vma->vm_mm, addr, end, bitmap, &last);
+		if (!ret && !pte_young(pte[i])) {
+			skip_spte_young(args->vma->vm_mm, addr, bitmap, &last);
 			walk->mm_stats[MM_LEAF_OLD]++;
 			continue;
 		}

 		folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap);
-		if (!folio)
+		if (!folio) {
+			skip_spte_young(args->vma->vm_mm, addr, bitmap, &last);
 			continue;
+		}

-		if (!ptep_test_and_clear_young(args->vma, addr, pte + i))
-			VM_WARN_ON_ONCE(true);
+		clear_spte_young(args->vma->vm_mm, addr, bitmap, &last);
+		if (pte_young(pte[i]))
+			ptep_test_and_clear_young(args->vma, addr, pte + i);

 		young++;
 		walk->mm_stats[MM_LEAF_YOUNG]++;
@@ -4629,6 +4703,23 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc)
  *                          rmap/PT walk feedback
  ******************************************************************************/

+static bool should_look_around(struct vm_area_struct *vma, unsigned long addr,
+			       pte_t *pte, int *young)
+{
+	int ret = mmu_notifier_clear_young(vma->vm_mm, addr, addr + PAGE_SIZE);
+
+	if (pte_young(*pte)) {
+		ptep_test_and_clear_young(vma, addr, pte);
+		*young = true;
+		return true;
+	}
+
+	if (ret)
+		*young = true;
+
+	return ret & MMU_NOTIFIER_RANGE_LOCKLESS;
+}
+
 /*
  * This function exploits spatial locality when shrink_folio_list() walks the
  * rmap. It scans the adjacent PTEs of a young PTE and promotes hot pages. If
@@ -4636,12 +4727,14 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc)
  * the PTE table to the Bloom filter. This forms a feedback loop between the
  * eviction and the aging.
  */
-void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
+bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 {
 	int i;
 	unsigned long start;
 	unsigned long end;
 	struct lru_gen_mm_walk *walk;
+	DECLARE_BITMAP(bitmap, MIN_LRU_BATCH);
+	unsigned long last = 0;
 	int young = 0;
 	pte_t *pte = pvmw->pte;
 	unsigned long addr = pvmw->address;
@@ -4655,8 +4748,11 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 	lockdep_assert_held(pvmw->ptl);
 	VM_WARN_ON_ONCE_FOLIO(folio_test_lru(folio), folio);

+	if (!should_look_around(pvmw->vma, addr, pte, &young))
+		return young;
+
 	if (spin_is_contended(pvmw->ptl))
-		return;
+		return young;

 	/* avoid taking the LRU lock under the PTL when possible */
 	walk = current->reclaim_state ? current->reclaim_state->mm_walk : NULL;
@@ -4664,6 +4760,9 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 	start = max(addr & PMD_MASK, pvmw->vma->vm_start);
 	end = min(addr | ~PMD_MASK, pvmw->vma->vm_end - 1) + 1;

+	if (end - start == PAGE_SIZE)
+		return young;
+
 	if (end - start > MIN_LRU_BATCH * PAGE_SIZE) {
 		if (addr - start < MIN_LRU_BATCH * PAGE_SIZE / 2)
 			end = start + MIN_LRU_BATCH * PAGE_SIZE;
@@ -4677,28 +4776,37 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)

 	/* folio_update_gen() requires stable folio_memcg() */
 	if (!mem_cgroup_trylock_pages(memcg))
-		return;
+		return young;

 	arch_enter_lazy_mmu_mode();

 	pte -= (addr - start) / PAGE_SIZE;

 	for (i = 0, addr = start; addr != end; i++, addr += PAGE_SIZE) {
+		bool ret;
 		unsigned long pfn;

 		pfn = get_pte_pfn(pte[i], pvmw->vma, addr);
-		if (pfn == -1)
+		if (pfn == -1) {
+			skip_spte_young(pvmw->vma->vm_mm, addr, bitmap, &last);
 			continue;
+		}

-		if (!pte_young(pte[i]))
+		ret = test_spte_young(pvmw->vma->vm_mm, addr, end, bitmap, &last);
+		if (!ret && !pte_young(pte[i])) {
+			skip_spte_young(pvmw->vma->vm_mm, addr, bitmap, &last);
 			continue;
+		}

 		folio = get_pfn_folio(pfn, memcg, pgdat, !walk || walk->can_swap);
-		if (!folio)
+		if (!folio) {
+			skip_spte_young(pvmw->vma->vm_mm, addr, bitmap, &last);
 			continue;
+		}

-		if (!ptep_test_and_clear_young(pvmw->vma, addr, pte + i))
-			VM_WARN_ON_ONCE(true);
+		clear_spte_young(pvmw->vma->vm_mm, addr, bitmap, &last);
+		if (pte_young(pte[i]))
+			ptep_test_and_clear_young(pvmw->vma, addr, pte + i);

 		young++;

@@ -4728,6 +4836,8 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 	/* feedback from rmap walkers to page table walkers */
 	if (suitable_to_scan(i, young))
 		update_bloom_filter(lruvec, max_seq, pvmw->pmd);
+
+	return young;
 }

 /******************************************************************************
@@ -5745,6 +5855,9 @@ static ssize_t enabled_show(struct kobject *kobj, struct kobj_attribute *attr, c
 	if (should_clear_pmd_young())
 		caps |= BIT(LRU_GEN_NONLEAF_YOUNG);

+	if (should_walk_kvm_mmu())
+		caps |= BIT(LRU_GEN_KVM_MMU_WALK);
+
 	return sysfs_emit(buf, "0x%04x\n", caps);
 }
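
The batch arithmetic in test_spte_young() and clear_spte_young() is easy to misread, because the bitmap is indexed backwards from the end of the batch: the last page maps to bit 0. The following userspace sketch reproduces only that arithmetic, not the kernel code. The SKETCH_* constants and the batch_*() helper names are hypothetical, and it assumes 4 KiB pages and a 64-page batch standing in for PAGE_SIZE and MIN_LRU_BATCH.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration only; the kernel defines its own. */
#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_BATCH     64UL	/* stands in for MIN_LRU_BATCH */

/*
 * Last byte covered by the batch that starts at addr, capped at end.
 * Mirrors the "*last = ..." computation in test_spte_young().
 */
static unsigned long batch_last(unsigned long addr, unsigned long end)
{
	return end - addr > SKETCH_BATCH * SKETCH_PAGE_SIZE ?
		addr + SKETCH_BATCH * SKETCH_PAGE_SIZE - 1 : end - 1;
}

/*
 * Bit index of the page at addr within the batch ending at last.
 * The index counts down: the final page of the batch is bit 0.
 */
static unsigned long batch_bit(unsigned long addr, unsigned long last)
{
	return (last - addr) / SKETCH_PAGE_SIZE;
}

/* True only for the final page of the batch, where the clear is issued. */
static bool batch_is_final_page(unsigned long addr, unsigned long last)
{
	return addr + SKETCH_PAGE_SIZE == last + 1;
}

/*
 * Narrow [start, end) to the pages whose bits are still set, as
 * clear_spte_young() does with find_last_bit()/find_first_bit().
 * Returns false when no bit is set, i.e. there is nothing to clear.
 */
static bool batch_clear_range(uint64_t bitmap, unsigned long last,
			      unsigned long *start, unsigned long *end)
{
	unsigned long hi = 63, lo = 0;

	if (!bitmap)
		return false;

	while (!((bitmap >> hi) & 1))
		hi--;
	while (!((bitmap >> lo) & 1))
		lo++;

	/* The highest set bit is the lowest address, and vice versa. */
	*start = last + 1 - (hi + 1) * SKETCH_PAGE_SIZE;
	*end = last + 1 - lo * SKETCH_PAGE_SIZE;
	return true;
}
```

Under these assumptions, a walk starting at 0x100000 over 100 pages gets a first batch ending at 0x13ffff. With only bits 2 and 5 still set, the single batched clear covers [0x13a000, 0x13e000) rather than the whole 64-page window.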