From patchwork Mon Apr 1 23:29:40 2024
Date: Mon, 1 Apr 2024 23:29:40 +0000
Message-ID: <20240401232946.1837665-2-jthoughton@google.com>
In-Reply-To: <20240401232946.1837665-1-jthoughton@google.com>
References: <20240401232946.1837665-1-jthoughton@google.com>
Subject: [PATCH v3 1/7] mm: Add a bitmap into mmu_notifier_{clear,test}_young
From: James Houghton <jthoughton@google.com>
To: Andrew Morton, Paolo Bonzini
Cc: Yu Zhao, David Matlack, Marc Zyngier, Oliver Upton,
    Sean Christopherson, Jonathan Corbet, James Morse, Suzuki K Poulose,
    Zenghui Yu, Catalin Marinas, Will Deacon, Thomas Gleixner,
    Ingo Molnar, Borislav Petkov, Dave Hansen, "H. Peter Anvin",
    Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, Shaoqin Huang,
    Gavin Shan, Ricardo Koller, Raghavendra Rao Ananta, Ryan Roberts,
    David Rientjes, Axel Rasmussen, linux-kernel@vger.kernel.org,
    linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
    kvm@vger.kernel.org, linux-mm@kvack.org,
    linux-trace-kernel@vger.kernel.org, James Houghton

The bitmap is provided for secondary MMUs to use if they support it. For
test_young(), after it returns, the bitmap represents the pages that
were young in the interval [start, end). For clear_young(), it
represents the pages that we wish the secondary MMU to clear the
accessed/young bit for.

If a bitmap is not provided, the mmu_notifier_{test,clear}_young() API
should be unchanged except that if young PTEs are found and the
architecture supports passing in a bitmap, instead of returning 1,
MMU_NOTIFIER_YOUNG_FAST is returned. This allows MGLRU's look-around
logic to work faster, resulting in a 4% improvement in real
workloads[1]. Also introduce MMU_NOTIFIER_YOUNG_FAST to indicate to the
main mm that doing look-around is likely to be beneficial.

If the secondary MMU doesn't support the bitmap, it must return an int
that contains MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE.

[1]: https://lore.kernel.org/all/20230609005935.42390-1-yuzhao@google.com/

Suggested-by: Yu Zhao
Signed-off-by: James Houghton
---
 include/linux/mmu_notifier.h | 93 +++++++++++++++++++++++++++++++++---
 include/trace/events/kvm.h   | 13 +++--
 mm/mmu_notifier.c            | 20 +++++---
 virt/kvm/kvm_main.c          | 19 ++++++--
 4 files changed, 123 insertions(+), 22 deletions(-)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index f349e08a9dfe..daaa9db625d3 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -61,6 +61,10 @@ enum mmu_notifier_event {

 #define MMU_NOTIFIER_RANGE_BLOCKABLE (1 << 0)

+#define MMU_NOTIFIER_YOUNG (1 << 0)
+#define MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE (1 << 1)
+#define MMU_NOTIFIER_YOUNG_FAST (1 << 2)
+
 struct mmu_notifier_ops {
 	/*
 	 * Called either by mmu_notifier_unregister or when the mm is
@@ -106,21 +110,36 @@ struct mmu_notifier_ops {
 	 * clear_young is a lightweight version of clear_flush_young. Like the
 	 * latter, it is supposed to test-and-clear the young/accessed bitflag
 	 * in the secondary pte, but it may omit flushing the secondary tlb.
+	 *
+	 * If @bitmap is given but is not supported, return
+	 * MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE.
+	 *
+	 * If the walk is done "quickly" and there were young PTEs,
+	 * MMU_NOTIFIER_YOUNG_FAST is returned.
 	 */
 	int (*clear_young)(struct mmu_notifier *subscription,
 			   struct mm_struct *mm,
 			   unsigned long start,
-			   unsigned long end);
+			   unsigned long end,
+			   unsigned long *bitmap);

 	/*
 	 * test_young is called to check the young/accessed bitflag in
 	 * the secondary pte. This is used to know if the page is
 	 * frequently used without actually clearing the flag or tearing
 	 * down the secondary mapping on the page.
+	 *
+	 * If @bitmap is given but is not supported, return
+	 * MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE.
+	 *
+	 * If the walk is done "quickly" and there were young PTEs,
+	 * MMU_NOTIFIER_YOUNG_FAST is returned.
 	 */
 	int (*test_young)(struct mmu_notifier *subscription,
 			  struct mm_struct *mm,
-			  unsigned long address);
+			  unsigned long start,
+			  unsigned long end,
+			  unsigned long *bitmap);

 	/*
 	 * change_pte is called in cases that pte mapping to page is changed:
@@ -388,10 +407,11 @@ extern int __mmu_notifier_clear_flush_young(struct mm_struct *mm,
 					    unsigned long start,
 					    unsigned long end);
 extern int __mmu_notifier_clear_young(struct mm_struct *mm,
-				      unsigned long start,
-				      unsigned long end);
+				      unsigned long start, unsigned long end,
+				      unsigned long *bitmap);
 extern int __mmu_notifier_test_young(struct mm_struct *mm,
-				     unsigned long address);
+				     unsigned long start, unsigned long end,
+				     unsigned long *bitmap);
 extern void __mmu_notifier_change_pte(struct mm_struct *mm,
 				      unsigned long address, pte_t pte);
 extern int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range *r);
@@ -427,7 +447,25 @@ static inline int mmu_notifier_clear_young(struct mm_struct *mm,
 					   unsigned long end)
 {
 	if (mm_has_notifiers(mm))
-		return __mmu_notifier_clear_young(mm, start, end);
+		return __mmu_notifier_clear_young(mm, start, end, NULL);
 	return 0;
 }

+/*
+ * When @bitmap is not provided, clear the young bits in the secondary
+ * MMUs for all of the pages in the interval [start, end).
+ *
+ * If any subscribed secondary MMU does not support @bitmap, this function
+ * will return an integer containing MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE.
+ * Some work may have been done in the secondary MMU.
+ */
+static inline int mmu_notifier_clear_young_bitmap(struct mm_struct *mm,
+						  unsigned long start,
+						  unsigned long end,
+						  unsigned long *bitmap)
+{
+	if (mm_has_notifiers(mm))
+		return __mmu_notifier_clear_young(mm, start, end, bitmap);
+	return 0;
+}
+
@@ -435,7 +473,25 @@ static inline int mmu_notifier_test_young(struct mm_struct *mm,
 					  unsigned long address)
 {
 	if (mm_has_notifiers(mm))
-		return __mmu_notifier_test_young(mm, address);
+		return __mmu_notifier_test_young(mm, address, address + 1,
+						 NULL);
 	return 0;
 }

+/*
+ * When @bitmap is not provided, test the young bits in the secondary
+ * MMUs for all of the pages in the interval [start, end).
+ *
+ * If any subscribed secondary MMU does not support @bitmap, this function
+ * will return an integer containing MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE.
+ */
+static inline int mmu_notifier_test_young_bitmap(struct mm_struct *mm,
+						 unsigned long start,
+						 unsigned long end,
+						 unsigned long *bitmap)
+{
+	if (mm_has_notifiers(mm))
+		return __mmu_notifier_test_young(mm, start, end, bitmap);
+	return 0;
+}
+
@@ -644,12 +700,35 @@ static inline int mmu_notifier_clear_flush_young(struct mm_struct *mm,
 	return 0;
 }

+static inline int mmu_notifier_clear_young(struct mm_struct *mm,
+					   unsigned long start,
+					   unsigned long end)
+{
+	return 0;
+}
+
+static inline int mmu_notifier_clear_young_bitmap(struct mm_struct *mm,
+						  unsigned long start,
+						  unsigned long end,
+						  unsigned long *bitmap)
+{
+	return 0;
+}
+
 static inline int mmu_notifier_test_young(struct mm_struct *mm,
 					  unsigned long address)
 {
 	return 0;
 }

+static inline int mmu_notifier_test_young_bitmap(struct mm_struct *mm,
+						 unsigned long start,
+						 unsigned long end,
+						 unsigned long *bitmap)
+{
+	return 0;
+}
+
 static inline void mmu_notifier_change_pte(struct mm_struct *mm,
 					   unsigned long address, pte_t pte)
 {

diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h
index 011fba6b5552..e4ace8cfdbba 100644
--- a/include/trace/events/kvm.h
+++ b/include/trace/events/kvm.h
@@ -490,18 +490,21 @@ TRACE_EVENT(kvm_age_hva,
 );

 TRACE_EVENT(kvm_test_age_hva,
-	TP_PROTO(unsigned long hva),
-	TP_ARGS(hva),
+	TP_PROTO(unsigned long start, unsigned long end),
+	TP_ARGS(start, end),

 	TP_STRUCT__entry(
-		__field( unsigned long, hva )
+		__field( unsigned long, start )
+		__field( unsigned long, end )
 	),

 	TP_fast_assign(
-		__entry->hva = hva;
+		__entry->start = start;
+		__entry->end = end;
 	),

-	TP_printk("mmu notifier test age hva: %#016lx", __entry->hva)
+	TP_printk("mmu notifier test age hva: %#016lx -- %#016lx",
+		  __entry->start, __entry->end)
 );

 #endif /* _TRACE_KVM_MAIN_H */

diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index ec3b068cbbe6..e70c6222944c 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -384,7 +384,8 @@ int __mmu_notifier_clear_flush_young(struct mm_struct *mm,

 int __mmu_notifier_clear_young(struct mm_struct *mm,
 			       unsigned long start,
-			       unsigned long end)
+			       unsigned long end,
+			       unsigned long *bitmap)
 {
 	struct mmu_notifier *subscription;
 	int young = 0, id;
@@ -395,7 +396,8 @@ int __mmu_notifier_clear_young(struct mm_struct *mm,
 				  srcu_read_lock_held(&srcu)) {
 		if (subscription->ops->clear_young)
 			young |= subscription->ops->clear_young(subscription,
-								mm, start, end);
+								mm, start, end,
+								bitmap);
 	}
 	srcu_read_unlock(&srcu, id);

@@ -403,7 +405,8 @@ int __mmu_notifier_clear_young(struct mm_struct *mm,
 }

 int __mmu_notifier_test_young(struct mm_struct *mm,
-			      unsigned long address)
+			      unsigned long start, unsigned long end,
+			      unsigned long *bitmap)
 {
 	struct mmu_notifier *subscription;
 	int young = 0, id;
@@ -413,9 +416,14 @@ int __mmu_notifier_test_young(struct mm_struct *mm,
 			  &mm->notifier_subscriptions->list, hlist,
 			  srcu_read_lock_held(&srcu)) {
 		if (subscription->ops->test_young) {
-			young = subscription->ops->test_young(subscription, mm,
-							      address);
-			if (young)
+			young |= subscription->ops->test_young(subscription, mm,
+							       start, end,
+							       bitmap);
+			if (young && !bitmap)
+				/*
+				 * We're not using a bitmap, so there is no
+				 * need to check any more secondary MMUs.
+				 */
 				break;
 		}
 	}

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index fb49c2a60200..ca4b1ef9dfc2 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -917,10 +917,15 @@ static int kvm_mmu_notifier_clear_flush_young(struct mmu_notifier *mn,
 static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn,
 					struct mm_struct *mm,
 					unsigned long start,
-					unsigned long end)
+					unsigned long end,
+					unsigned long *bitmap)
 {
 	trace_kvm_age_hva(start, end);

+	/* We don't support bitmaps. Don't test or clear anything. */
+	if (bitmap)
+		return MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE;
+
 	/*
 	 * Even though we do not flush TLB, this will still adversely
 	 * affect performance on pre-Haswell Intel EPT, where there is
@@ -939,11 +944,17 @@ static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn,

 static int kvm_mmu_notifier_test_young(struct mmu_notifier *mn,
 				       struct mm_struct *mm,
-				       unsigned long address)
+				       unsigned long start,
+				       unsigned long end,
+				       unsigned long *bitmap)
 {
-	trace_kvm_test_age_hva(address);
+	trace_kvm_test_age_hva(start, end);
+
+	/* We don't support bitmaps. Don't test or clear anything. */
+	if (bitmap)
+		return MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE;

-	return kvm_handle_hva_range_no_flush(mn, address, address + 1,
+	return kvm_handle_hva_range_no_flush(mn, start, end,
 					     kvm_test_age_gfn);
 }

From patchwork Mon Apr 1 23:29:41 2024
Date: Mon, 1 Apr 2024 23:29:41 +0000
Message-ID: <20240401232946.1837665-3-jthoughton@google.com>
In-Reply-To: <20240401232946.1837665-1-jthoughton@google.com>
References: <20240401232946.1837665-1-jthoughton@google.com>
Subject: [PATCH v3 2/7] KVM: Move MMU notifier function declarations
From: James Houghton <jthoughton@google.com>
To: Andrew Morton, Paolo Bonzini
Cc: Yu Zhao, David Matlack, Marc Zyngier, Oliver Upton,
    Sean Christopherson, Jonathan Corbet, James Morse, Suzuki K Poulose,
    Zenghui Yu, Catalin Marinas, Will Deacon, Thomas Gleixner,
    Ingo Molnar, Borislav Petkov, Dave Hansen, "H. Peter Anvin",
    Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, Shaoqin Huang,
    Gavin Shan, Ricardo Koller, Raghavendra Rao Ananta, Ryan Roberts,
    David Rientjes, Axel Rasmussen, linux-kernel@vger.kernel.org,
    linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
    kvm@vger.kernel.org, linux-mm@kvack.org,
    linux-trace-kernel@vger.kernel.org, James Houghton

To allow new MMU-notifier-related functions to use gfn_to_hva_memslot(),
move some declarations around. Also move mmu_notifier_to_kvm() for wider
use later.

Signed-off-by: James Houghton
---
 include/linux/kvm_host.h | 41 +++++++++++++++++++++-------------------
 virt/kvm/kvm_main.c      |  5 -----
 2 files changed, 22 insertions(+), 24 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 48f31dcd318a..1800d03a06a9 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -257,25 +257,6 @@ bool kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
 #endif

-#ifdef CONFIG_KVM_GENERIC_MMU_NOTIFIER
-union kvm_mmu_notifier_arg {
-	pte_t pte;
-	unsigned long attributes;
-};
-
-struct kvm_gfn_range {
-	struct kvm_memory_slot *slot;
-	gfn_t start;
-	gfn_t end;
-	union kvm_mmu_notifier_arg arg;
-	bool may_block;
-};
-bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
-bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
-bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
-bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
-#endif
-
 enum {
 	OUTSIDE_GUEST_MODE,
 	IN_GUEST_MODE,
@@ -2012,6 +1993,11 @@ extern const struct kvm_stats_header kvm_vcpu_stats_header;
 extern const struct _kvm_stats_desc kvm_vcpu_stats_desc[];

 #ifdef CONFIG_KVM_GENERIC_MMU_NOTIFIER
+static inline struct kvm *mmu_notifier_to_kvm(struct mmu_notifier *mn)
+{
+	return container_of(mn, struct kvm, mmu_notifier);
+}
+
 static inline int mmu_invalidate_retry(struct kvm *kvm, unsigned long mmu_seq)
 {
 	if (unlikely(kvm->mmu_invalidate_in_progress))
@@ -2089,6 +2075,23 @@ static inline bool mmu_invalidate_retry_gfn_unsafe(struct kvm *kvm,
 	return READ_ONCE(kvm->mmu_invalidate_seq) != mmu_seq;
 }

+union kvm_mmu_notifier_arg {
+	pte_t pte;
+	unsigned long attributes;
+};
+
+struct kvm_gfn_range {
+	struct kvm_memory_slot *slot;
+	gfn_t start;
+	gfn_t end;
+	union kvm_mmu_notifier_arg arg;
+	bool may_block;
+};
+bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
+bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
+bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
+bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
 #endif

 #ifdef CONFIG_HAVE_KVM_IRQ_ROUTING

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ca4b1ef9dfc2..d0545d88c802 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -534,11 +534,6 @@ void kvm_destroy_vcpus(struct kvm *kvm)
 EXPORT_SYMBOL_GPL(kvm_destroy_vcpus);

 #ifdef CONFIG_KVM_GENERIC_MMU_NOTIFIER
-static inline struct kvm *mmu_notifier_to_kvm(struct mmu_notifier *mn)
-{
-	return container_of(mn, struct kvm, mmu_notifier);
-}
-
 typedef bool (*gfn_handler_t)(struct kvm *kvm, struct kvm_gfn_range *range);

 typedef void (*on_lock_fn_t)(struct kvm *kvm);

From patchwork Mon Apr 1 23:29:42 2024
Date: Mon, 1 Apr 2024 23:29:42 +0000
In-Reply-To: <20240401232946.1837665-1-jthoughton@google.com>
References: <20240401232946.1837665-1-jthoughton@google.com>
Message-ID: <20240401232946.1837665-4-jthoughton@google.com>
Subject: [PATCH v3 3/7] KVM: Add basic bitmap support into kvm_mmu_notifier_test/clear_young
From: James Houghton <jthoughton@google.com>
To: Andrew Morton, Paolo Bonzini
Cc: Yu Zhao, David Matlack, Marc Zyngier, Oliver Upton,
    Sean Christopherson, Jonathan Corbet, James Morse, Suzuki K Poulose,
    Zenghui Yu, Catalin Marinas, Will Deacon, Thomas Gleixner,
    Ingo Molnar, Borislav Petkov, Dave Hansen, "H. Peter Anvin",
    Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, Shaoqin Huang,
    Gavin Shan, Ricardo Koller, Raghavendra Rao Ananta, Ryan Roberts,
    David Rientjes, Axel Rasmussen, linux-kernel@vger.kernel.org,
    linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
    kvm@vger.kernel.org, linux-mm@kvack.org,
    linux-trace-kernel@vger.kernel.org, James Houghton

Add kvm_arch_prepare_bitmap_age() for architectures to indicate that
they support bitmap-based aging in kvm_mmu_notifier_test_clear_young()
and that they do not need KVM to grab the MMU lock for writing. This
function allows architectures to do any other locking or preparatory
work they need.

If an architecture does not implement kvm_arch_prepare_bitmap_age() or
is unable to do bitmap-based aging at runtime (and marks the bitmap as
unreliable):

1. If a bitmap was provided, we inform the caller that the bitmap is
   unreliable (MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE).
2. If a bitmap was not provided, fall back to the old logic.

Also add logic for architectures to easily use the provided bitmap if
they are able. The expectation is that the architecture's implementation
of kvm_gfn_test_age() will use kvm_gfn_record_young(), and kvm_gfn_age()
will use kvm_gfn_should_age().

Suggested-by: Yu Zhao
Signed-off-by: James Houghton
---
 include/linux/kvm_host.h | 60 ++++++++++++++++++++++++++
 virt/kvm/kvm_main.c      | 92 +++++++++++++++++++++++++++-----------
 2 files changed, 127 insertions(+), 25 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 1800d03a06a9..5862fd7b5f9b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1992,6 +1992,26 @@ extern const struct _kvm_stats_desc kvm_vm_stats_desc[];
 extern const struct kvm_stats_header kvm_vcpu_stats_header;
 extern const struct _kvm_stats_desc kvm_vcpu_stats_desc[];

+/*
+ * Architectures that support using bitmaps for kvm_age_gfn() and
+ * kvm_test_age_gfn should return true for kvm_arch_prepare_bitmap_age()
+ * and do any work they need to prepare. The subsequent walk will not
+ * automatically grab the KVM MMU lock, so some architectures may opt
+ * to grab it.
+ *
+ * If true is returned, a subsequent call to kvm_arch_finish_bitmap_age() is
+ * guaranteed.
+ */
+#ifndef kvm_arch_prepare_bitmap_age
+static inline bool kvm_arch_prepare_bitmap_age(struct mmu_notifier *mn)
+{
+	return false;
+}
+#endif
+#ifndef kvm_arch_finish_bitmap_age
+static inline void kvm_arch_finish_bitmap_age(struct mmu_notifier *mn) {}
+#endif
+
 #ifdef CONFIG_KVM_GENERIC_MMU_NOTIFIER
 static inline struct kvm *mmu_notifier_to_kvm(struct mmu_notifier *mn)
 {
@@ -2076,9 +2096,16 @@ static inline bool mmu_invalidate_retry_gfn_unsafe(struct kvm *kvm,
 	return READ_ONCE(kvm->mmu_invalidate_seq) != mmu_seq;
 }

+struct test_clear_young_metadata {
+	unsigned long *bitmap;
+	unsigned long bitmap_offset_end;
+	unsigned long end;
+	bool unreliable;
+};
 union kvm_mmu_notifier_arg {
 	pte_t pte;
 	unsigned long attributes;
+	struct test_clear_young_metadata *metadata;
 };

 struct kvm_gfn_range {
@@ -2087,11 +2114,44 @@ struct kvm_gfn_range {
 	gfn_t end;
 	union kvm_mmu_notifier_arg arg;
 	bool may_block;
+	bool lockless;
 };
 bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
 bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
 bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
 bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
+
+static inline void kvm_age_set_unreliable(struct kvm_gfn_range *range)
+{
+	struct test_clear_young_metadata *args = range->arg.metadata;
+
+	args->unreliable = true;
+}
+static inline unsigned long kvm_young_bitmap_offset(struct kvm_gfn_range *range,
+						    gfn_t gfn)
+{
+	struct test_clear_young_metadata *args = range->arg.metadata;
+
+	return hva_to_gfn_memslot(args->end - 1, range->slot) - gfn;
+}
+static inline void kvm_gfn_record_young(struct kvm_gfn_range *range, gfn_t gfn)
+{
+	struct test_clear_young_metadata *args = range->arg.metadata;
+
+	WARN_ON_ONCE(gfn < range->start || gfn >= range->end);
+	if (args->bitmap)
+		__set_bit(kvm_young_bitmap_offset(range, gfn), args->bitmap);
+}
+static inline bool kvm_gfn_should_age(struct kvm_gfn_range *range, gfn_t gfn)
+{
+	struct test_clear_young_metadata *args = range->arg.metadata;
+
+	WARN_ON_ONCE(gfn < range->start || gfn >= range->end);
+	if (args->bitmap)
+		return test_bit(kvm_young_bitmap_offset(range, gfn),
+				args->bitmap);
+	return true;
+}
 #endif

 #ifdef CONFIG_HAVE_KVM_IRQ_ROUTING

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d0545d88c802..7d80321e2ece 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -550,6 +550,7 @@ struct kvm_mmu_notifier_range {
 	on_lock_fn_t on_lock;
 	bool flush_on_ret;
 	bool may_block;
+	bool lockless;
 };

 /*
@@ -598,6 +599,8 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
 	struct kvm_memslots *slots;
 	int i, idx;

+	BUILD_BUG_ON(sizeof(gfn_range.arg) != sizeof(gfn_range.arg.pte));
+
 	if (WARN_ON_ONCE(range->end <= range->start))
 		return r;

@@ -637,15 +640,18 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
 			gfn_range.start = hva_to_gfn_memslot(hva_start, slot);
 			gfn_range.end = hva_to_gfn_memslot(hva_end + PAGE_SIZE - 1, slot);
 			gfn_range.slot = slot;
+			gfn_range.lockless = range->lockless;

 			if (!r.found_memslot) {
 				r.found_memslot = true;
-				KVM_MMU_LOCK(kvm);
-				if (!IS_KVM_NULL_FN(range->on_lock))
-					range->on_lock(kvm);
-
-				if (IS_KVM_NULL_FN(range->handler))
-					break;
+				if (!range->lockless) {
+					KVM_MMU_LOCK(kvm);
+					if (!IS_KVM_NULL_FN(range->on_lock))
+						range->on_lock(kvm);
+
+					if (IS_KVM_NULL_FN(range->handler))
+						break;
+				}
 			}
 			r.ret |= range->handler(kvm, &gfn_range);
 		}
@@ -654,7 +660,7 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
 	if (range->flush_on_ret && r.ret)
 		kvm_flush_remote_tlbs(kvm);

-	if (r.found_memslot)
+	if (r.found_memslot && !range->lockless)
 		KVM_MMU_UNLOCK(kvm);

 	srcu_read_unlock(&kvm->srcu, idx);
@@ -682,19 +688,24 @@ static __always_inline int kvm_handle_hva_range(struct mmu_notifier *mn,
 	return __kvm_handle_hva_range(kvm, &range).ret;
 }

-static __always_inline int kvm_handle_hva_range_no_flush(struct mmu_notifier *mn,
-							 unsigned long start,
-							 unsigned long end,
-							 gfn_handler_t handler)
+static __always_inline int kvm_handle_hva_range_no_flush(
+		struct mmu_notifier *mn,
+		unsigned long start,
+		unsigned long end,
+		gfn_handler_t handler,
+		union kvm_mmu_notifier_arg arg,
+		bool lockless)
 {
 	struct kvm *kvm = mmu_notifier_to_kvm(mn);
 	const struct kvm_mmu_notifier_range range = {
 		.start = start,
 		.end = end,
 		.handler = handler,
+		.arg = arg,
 		.on_lock = (void *)kvm_null_fn,
 		.flush_on_ret = false,
 		.may_block = false,
+		.lockless = lockless,
 	};

 	return __kvm_handle_hva_range(kvm, &range).ret;
@@ -909,15 +920,36 @@ static int kvm_mmu_notifier_clear_flush_young(struct mmu_notifier *mn,
 				     kvm_age_gfn);
 }

-static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn,
-					struct mm_struct *mm,
-					unsigned long start,
-					unsigned long end,
-					unsigned long *bitmap)
+static int kvm_mmu_notifier_test_clear_young(struct mmu_notifier *mn,
+					     struct mm_struct *mm,
+					     unsigned long start,
+					     unsigned long end,
+					     unsigned long *bitmap,
+					     bool clear)
 {
-	trace_kvm_age_hva(start, end);
+	if (kvm_arch_prepare_bitmap_age(mn)) {
+		struct test_clear_young_metadata args = {
+			.bitmap = bitmap,
+			.end = end,
+			.unreliable = false,
+		};
+		union kvm_mmu_notifier_arg arg = {
+			.metadata = &args
+		};
+		bool young;
+
+		young = kvm_handle_hva_range_no_flush(
+				mn, start, end,
+				clear ? kvm_age_gfn : kvm_test_age_gfn,
+				arg, true);
+
+		kvm_arch_finish_bitmap_age(mn);

-	/* We don't support bitmaps. Don't test or clear anything. */
+		if (!args.unreliable)
+			return young ? MMU_NOTIFIER_YOUNG_FAST : 0;
+	}
+
+	/* A bitmap was passed but the architecture doesn't support bitmaps */
 	if (bitmap)
 		return MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE;

@@ -934,7 +966,21 @@ static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn,
 	 * cadence. If we find this inaccurate, we might come up with a
 	 * more sophisticated heuristic later.
*/ - return kvm_handle_hva_range_no_flush(mn, start, end, kvm_age_gfn); + return kvm_handle_hva_range_no_flush( + mn, start, end, clear ? kvm_age_gfn : kvm_test_age_gfn, + KVM_MMU_NOTIFIER_NO_ARG, false); +} + +static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long start, + unsigned long end, + unsigned long *bitmap) +{ + trace_kvm_age_hva(start, end); + + return kvm_mmu_notifier_test_clear_young(mn, mm, start, end, bitmap, + true); } static int kvm_mmu_notifier_test_young(struct mmu_notifier *mn, @@ -945,12 +991,8 @@ static int kvm_mmu_notifier_test_young(struct mmu_notifier *mn, { trace_kvm_test_age_hva(start, end); - /* We don't support bitmaps. Don't test or clear anything. */ - if (bitmap) - return MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE; - - return kvm_handle_hva_range_no_flush(mn, start, end, - kvm_test_age_gfn); + return kvm_mmu_notifier_test_clear_young(mn, mm, start, end, bitmap, + false); } static void kvm_mmu_notifier_release(struct mmu_notifier *mn,

From patchwork Mon Apr 1 23:29:43 2024
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13613117
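The indexing convention behind kvm_young_bitmap_offset() in the patch above is easy to get backwards: bit 0 of the caller-supplied bitmap corresponds to the last page of the range, not the first. A minimal userspace sketch of the convention follows; young_bitmap_offset() and record_young() are hypothetical stand-ins, and last_gfn plays the role of hva_to_gfn_memslot(args->end - 1, range->slot).

```c
#include <assert.h>

/*
 * Model of kvm_young_bitmap_offset(): the generic code indexes the
 * caller-supplied bitmap from the end of the range, so the last gfn in
 * [start, end) maps to bit 0 and earlier gfns map to higher bits.
 */
static unsigned long young_bitmap_offset(unsigned long last_gfn,
					 unsigned long gfn)
{
	return last_gfn - gfn;
}

/* Model of kvm_gfn_record_young(): set the bit for a gfn found young. */
static void record_young(unsigned long *bitmap, unsigned long last_gfn,
			 unsigned long gfn)
{
	*bitmap |= 1UL << young_bitmap_offset(last_gfn, gfn);
}
```

For an 8-page range of gfns 0x100..0x107, gfn 0x107 lands on bit 0 and gfn 0x100 on bit 7.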
Date: Mon, 1 Apr 2024 23:29:43 +0000
In-Reply-To: <20240401232946.1837665-1-jthoughton@google.com>
References: <20240401232946.1837665-1-jthoughton@google.com>
Message-ID: <20240401232946.1837665-5-jthoughton@google.com>
Subject: [PATCH v3 4/7] KVM: x86: Move tdp_mmu_enabled and shadow_accessed_mask
From: James Houghton
To: Andrew Morton, Paolo Bonzini
Cc: Yu Zhao, David Matlack, Marc Zyngier, Oliver Upton, Sean Christopherson, Jonathan Corbet, James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, "H. Peter Anvin", Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, Shaoqin Huang, Gavin Shan, Ricardo Koller, Raghavendra Rao Ananta, Ryan Roberts, David Rientjes, Axel Rasmussen, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, kvm@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org, James Houghton

From: Yu Zhao

tdp_mmu_enabled and shadow_accessed_mask are needed to implement kvm_arch_prepare_bitmap_age().

Signed-off-by: Yu Zhao
Signed-off-by: James Houghton
---
arch/x86/include/asm/kvm_host.h | 6 ++++++ arch/x86/kvm/mmu.h | 6 ------ arch/x86/kvm/mmu/spte.h | 1 - 3 files changed, 6 insertions(+), 7 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 16e07a2eee19..3b58e2306621 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1847,6 +1847,7 @@ struct kvm_arch_async_pf { extern u32 __read_mostly kvm_nr_uret_msrs; extern u64 __read_mostly host_efer; +extern u64 __read_mostly shadow_accessed_mask; extern bool __read_mostly allow_smaller_maxphyaddr; extern bool __read_mostly enable_apicv; extern struct kvm_x86_ops kvm_x86_ops; @@ -1952,6 +1953,11 @@ void kvm_fire_mask_notifiers(struct kvm *kvm, unsigned irqchip, unsigned pin, bool mask); extern bool tdp_enabled; +#ifdef CONFIG_X86_64 +extern bool tdp_mmu_enabled; +#else +#define tdp_mmu_enabled false +#endif u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index
60f21bb4c27b..8ae279035900 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -270,12 +270,6 @@ static inline bool kvm_shadow_root_allocated(struct kvm *kvm) return smp_load_acquire(&kvm->arch.shadow_root_allocated); } -#ifdef CONFIG_X86_64 -extern bool tdp_mmu_enabled; -#else -#define tdp_mmu_enabled false -#endif - static inline bool kvm_memslots_have_rmaps(struct kvm *kvm) { return !tdp_mmu_enabled || kvm_shadow_root_allocated(kvm); diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h index a129951c9a88..f791fe045c7d 100644 --- a/arch/x86/kvm/mmu/spte.h +++ b/arch/x86/kvm/mmu/spte.h @@ -154,7 +154,6 @@ extern u64 __read_mostly shadow_mmu_writable_mask; extern u64 __read_mostly shadow_nx_mask; extern u64 __read_mostly shadow_x_mask; /* mutual exclusive with nx_mask */ extern u64 __read_mostly shadow_user_mask; -extern u64 __read_mostly shadow_accessed_mask; extern u64 __read_mostly shadow_dirty_mask; extern u64 __read_mostly shadow_mmio_value; extern u64 __read_mostly shadow_mmio_mask;

From patchwork Mon Apr 1 23:29:44 2024
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13613118
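One detail worth noting in the hunk above: defining tdp_mmu_enabled to false on !CONFIG_X86_64 builds lets common code test the flag unconditionally while the compiler folds the dead branch away. A self-contained sketch of the pattern follows; bitmap_age_supported() is a hypothetical stand-in for the check kvm_arch_prepare_bitmap_age() performs, and shadow_accessed_mask is taken as a parameter here rather than being the kernel global.

```c
#include <stdbool.h>

#ifdef CONFIG_X86_64
extern bool tdp_mmu_enabled;
#else
/*
 * On builds without CONFIG_X86_64 the flag becomes a compile-time
 * constant, so any branch on it is eliminated by the compiler.
 */
#define tdp_mmu_enabled false
#endif

/* Hypothetical stand-in mirroring kvm_arch_prepare_bitmap_age()'s test. */
static bool bitmap_age_supported(unsigned long long shadow_accessed_mask)
{
	return tdp_mmu_enabled && shadow_accessed_mask != 0;
}
```

When CONFIG_X86_64 is not defined, bitmap_age_supported() is trivially false regardless of the mask.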
Date: Mon, 1 Apr 2024 23:29:44 +0000
Message-ID: <20240401232946.1837665-6-jthoughton@google.com>
Subject: [PATCH v3 5/7] KVM: x86: Participate in bitmap-based PTE aging
From: James Houghton

Only handle the TDP MMU case for now. In other cases, if a bitmap was not provided, fall back to the slowpath that takes mmu_lock, or, if a bitmap was provided, inform the caller that the bitmap is unreliable.

Suggested-by: Yu Zhao
Signed-off-by: James Houghton
---
arch/x86/include/asm/kvm_host.h | 14 ++++++++++++++ arch/x86/kvm/mmu/mmu.c | 16 ++++++++++++++-- arch/x86/kvm/mmu/tdp_mmu.c | 10 +++++++++- 3 files changed, 37 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 3b58e2306621..c30918d0887e 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -2324,4 +2324,18 @@ int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages); */ #define KVM_EXIT_HYPERCALL_MBZ GENMASK_ULL(31, 1) +#define kvm_arch_prepare_bitmap_age kvm_arch_prepare_bitmap_age +static inline bool kvm_arch_prepare_bitmap_age(struct mmu_notifier *mn) +{ + /* + * Indicate that we support bitmap-based aging when using the TDP MMU + * and the accessed bit is available in the TDP page tables.
+ * + * We have no other preparatory work to do here, so we do not need to + * redefine kvm_arch_finish_bitmap_age(). + */ + return IS_ENABLED(CONFIG_X86_64) && tdp_mmu_enabled + && shadow_accessed_mask; +} + #endif /* _ASM_X86_KVM_HOST_H */ diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 992e651540e8..fae1a75750bb 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -1674,8 +1674,14 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) { bool young = false; - if (kvm_memslots_have_rmaps(kvm)) + if (kvm_memslots_have_rmaps(kvm)) { + if (range->lockless) { + kvm_age_set_unreliable(range); + return false; + } + young = kvm_handle_gfn_range(kvm, range, kvm_age_rmap); + } if (tdp_mmu_enabled) young |= kvm_tdp_mmu_age_gfn_range(kvm, range); @@ -1687,8 +1693,14 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) { bool young = false; - if (kvm_memslots_have_rmaps(kvm)) + if (kvm_memslots_have_rmaps(kvm)) { + if (range->lockless) { + kvm_age_set_unreliable(range); + return false; + } + young = kvm_handle_gfn_range(kvm, range, kvm_test_age_rmap); + } if (tdp_mmu_enabled) young |= kvm_tdp_mmu_test_age_gfn(kvm, range); diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index d078157e62aa..edea01bc145f 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -1217,6 +1217,9 @@ static bool age_gfn_range(struct kvm *kvm, struct tdp_iter *iter, if (!is_accessed_spte(iter->old_spte)) return false; + if (!kvm_gfn_should_age(range, iter->gfn)) + return false; + if (spte_ad_enabled(iter->old_spte)) { iter->old_spte = tdp_mmu_clear_spte_bits(iter->sptep, iter->old_spte, @@ -1250,7 +1253,12 @@ bool kvm_tdp_mmu_age_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range) static bool test_age_gfn(struct kvm *kvm, struct tdp_iter *iter, struct kvm_gfn_range *range) { - return is_accessed_spte(iter->old_spte); + bool young = is_accessed_spte(iter->old_spte); + + if (young) + 
kvm_gfn_record_young(range, iter->gfn); + + return young; } bool kvm_tdp_mmu_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)

From patchwork Mon Apr 1 23:29:45 2024
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13613119
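Putting the pieces together, the control flow that kvm_mmu_notifier_test_clear_young() implements — try the lockless bitmap walk first, and fall back when the architecture declines or a handler marks the result unreliable — can be modeled in a standalone sketch. The flag values and test_clear_young() below are illustrative stand-ins, not the kernel's definitions.

```c
#include <stdbool.h>

/* Illustrative stand-ins for the mmu_notifier young-report flags. */
#define MMU_NOTIFIER_YOUNG                   (1 << 0)
#define MMU_NOTIFIER_YOUNG_FAST              (1 << 1)
#define MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE (1 << 2)

struct walk_result {
	bool young;      /* some PTE in the range was young */
	bool unreliable; /* a handler called kvm_age_set_unreliable() */
};

/*
 * Model of the notifier's decision flow: prefer the lockless bitmap
 * walk; otherwise report an unusable bitmap or take the slowpath.
 */
static int test_clear_young(bool arch_accepts_bitmap, bool have_bitmap,
			    struct walk_result fast, struct walk_result slow)
{
	if (arch_accepts_bitmap && !fast.unreliable)
		return fast.young ? MMU_NOTIFIER_YOUNG_FAST : 0;
	if (have_bitmap)
		return MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE;
	return slow.young ? MMU_NOTIFIER_YOUNG : 0;
}
```

This mirrors the three outcomes in the patch: a reliable fast result, an unreliable bitmap reported back to the caller, and the locked slowpath when no bitmap was passed.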
Date: Mon, 1 Apr 2024 23:29:45 +0000
Message-ID: <20240401232946.1837665-7-jthoughton@google.com>
Subject: [PATCH v3 6/7] KVM: arm64: Participate in bitmap-based PTE aging
From: James Houghton

Participate in bitmap-based aging while grabbing the KVM MMU lock for reading. Ideally we wouldn't need to grab this lock at all, but that would require a more intrusive and risky change. Also pass KVM_PGTABLE_WALK_SHARED, as this software walker is safe to run in parallel with other walkers.
It is safe only to grab the KVM MMU lock for reading as the kvm_pgtable is destroyed while holding the lock for writing, and freeing of the page table pages is either done while holding the MMU lock for writing or after an RCU grace period. When mkold == false, record the young pages in the passed-in bitmap. When mkold == true, only age the pages that need aging according to the passed-in bitmap. Suggested-by: Yu Zhao Signed-off-by: James Houghton --- arch/arm64/include/asm/kvm_host.h | 5 +++++ arch/arm64/include/asm/kvm_pgtable.h | 4 +++- arch/arm64/kvm/hyp/pgtable.c | 21 ++++++++++++++------- arch/arm64/kvm/mmu.c | 23 +++++++++++++++++++++-- 4 files changed, 43 insertions(+), 10 deletions(-) diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 9e8a496fb284..e503553cb356 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -1331,4 +1331,9 @@ bool kvm_arm_vcpu_stopped(struct kvm_vcpu *vcpu); (get_idreg_field((kvm), id, fld) >= expand_field_sign(id, fld, min) && \ get_idreg_field((kvm), id, fld) <= expand_field_sign(id, fld, max)) +#define kvm_arch_prepare_bitmap_age kvm_arch_prepare_bitmap_age +bool kvm_arch_prepare_bitmap_age(struct mmu_notifier *mn); +#define kvm_arch_finish_bitmap_age kvm_arch_finish_bitmap_age +void kvm_arch_finish_bitmap_age(struct mmu_notifier *mn); + #endif /* __ARM64_KVM_HOST_H__ */ diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h index 19278dfe7978..1976b4e26188 100644 --- a/arch/arm64/include/asm/kvm_pgtable.h +++ b/arch/arm64/include/asm/kvm_pgtable.h @@ -644,6 +644,7 @@ kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr); * @addr: Intermediate physical address to identify the page-table entry. * @size: Size of the address range to visit. * @mkold: True if the access flag should be cleared. + * @range: The kvm_gfn_range that is being used for the memslot walker. 
* * The offset of @addr within a page is ignored. * @@ -657,7 +658,8 @@ kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr); * Return: True if any of the visited PTEs had the access flag set. */ bool kvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr, - u64 size, bool mkold); + u64 size, bool mkold, + struct kvm_gfn_range *range); /** * kvm_pgtable_stage2_relax_perms() - Relax the permissions enforced by a diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c index 3fae5830f8d2..e881d3595aca 100644 --- a/arch/arm64/kvm/hyp/pgtable.c +++ b/arch/arm64/kvm/hyp/pgtable.c @@ -1281,6 +1281,7 @@ kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr) } struct stage2_age_data { + struct kvm_gfn_range *range; bool mkold; bool young; }; @@ -1290,20 +1291,24 @@ static int stage2_age_walker(const struct kvm_pgtable_visit_ctx *ctx, { kvm_pte_t new = ctx->old & ~KVM_PTE_LEAF_ATTR_LO_S2_AF; struct stage2_age_data *data = ctx->arg; + gfn_t gfn = ctx->addr / PAGE_SIZE; if (!kvm_pte_valid(ctx->old) || new == ctx->old) return 0; data->young = true; + /* - * stage2_age_walker() is always called while holding the MMU lock for - * write, so this will always succeed. Nonetheless, this deliberately - * follows the race detection pattern of the other stage-2 walkers in - * case the locking mechanics of the MMU notifiers is ever changed. + * stage2_age_walker() may not be holding the MMU lock for write, so + * follow the race detection pattern of the other stage-2 walkers. */ - if (data->mkold && !stage2_try_set_pte(ctx, new)) - return -EAGAIN; + if (data->mkold) { + if (kvm_gfn_should_age(data->range, gfn) && + !stage2_try_set_pte(ctx, new)) + return -EAGAIN; + } else + kvm_gfn_record_young(data->range, gfn); /* * "But where's the TLBI?!", you scream. 
@@ -1315,10 +1320,12 @@ static int stage2_age_walker(const struct kvm_pgtable_visit_ctx *ctx, } bool kvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr, - u64 size, bool mkold) + u64 size, bool mkold, + struct kvm_gfn_range *range) { struct stage2_age_data data = { .mkold = mkold, + .range = range, }; struct kvm_pgtable_walker walker = { .cb = stage2_age_walker, diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c index 18680771cdb0..104cc23e9bb3 100644 --- a/arch/arm64/kvm/mmu.c +++ b/arch/arm64/kvm/mmu.c @@ -1802,6 +1802,25 @@ bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range) return false; } +bool kvm_arch_prepare_bitmap_age(struct mmu_notifier *mn) +{ + struct kvm *kvm = mmu_notifier_to_kvm(mn); + + /* + * We need to hold the MMU lock for reading to prevent page tables + * from being freed underneath us. + */ + read_lock(&kvm->mmu_lock); + return true; +} + +void kvm_arch_finish_bitmap_age(struct mmu_notifier *mn) +{ + struct kvm *kvm = mmu_notifier_to_kvm(mn); + + read_unlock(&kvm->mmu_lock); +} + bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) { u64 size = (range->end - range->start) << PAGE_SHIFT; @@ -1811,7 +1830,7 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) return kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt, range->start << PAGE_SHIFT, - size, true); + size, true, range); } bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) @@ -1823,7 +1842,7 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) return kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt, range->start << PAGE_SHIFT, - size, false); + size, false, range); } phys_addr_t kvm_mmu_get_httbr(void)
From patchwork Mon Apr 1 23:29:46 2024
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13613120
Date: Mon, 1 Apr 2024 23:29:46 +0000
In-Reply-To: <20240401232946.1837665-1-jthoughton@google.com>
References: <20240401232946.1837665-1-jthoughton@google.com>
X-Mailer: git-send-email 2.44.0.478.gd926399ef9-goog
Message-ID: <20240401232946.1837665-8-jthoughton@google.com>
Subject: [PATCH v3 7/7] mm: multi-gen LRU: use mmu_notifier_test_clear_young()
From: James Houghton
To: Andrew Morton , Paolo Bonzini
Cc: Yu Zhao , David Matlack , Marc Zyngier , Oliver Upton , Sean Christopherson , Jonathan Corbet , James Morse , Suzuki K Poulose , Zenghui Yu , Catalin Marinas , Will Deacon , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , Steven Rostedt , Masami Hiramatsu , Mathieu Desnoyers , Shaoqin Huang , Gavin Shan , Ricardo Koller , Raghavendra Rao Ananta , Ryan Roberts , David Rientjes , Axel Rasmussen , linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, kvm@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org, James Houghton

From: Yu Zhao

Use mmu_notifier_{test,clear}_young_bitmap() to handle KVM PTEs in batches when the fast path is supported. This reduces the contention on kvm->mmu_lock when the host is under heavy memory pressure.

An existing selftest can quickly demonstrate the effectiveness of this patch.
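The batching described above can be illustrated with a toy userspace model (this is not kernel code; the class, method names, and counters are invented for illustration, with MIN_LRU_BATCH mirroring the constant the patch uses). The point is that the per-page path pays one mmu_lock round trip per PTE, while the bitmap path harvests a whole batch in one call:

```python
MIN_LRU_BATCH = 64

class StageTwoTable:
    """Toy stand-in for KVM stage-2 page tables guarded by kvm->mmu_lock."""
    def __init__(self, npages):
        self.accessed = [False] * npages
        self.lock_acquisitions = 0  # each notifier call models one lock round trip

    def test_clear_young_one(self, gfn):
        """Unbatched baseline: one locked call per page."""
        self.lock_acquisitions += 1
        young, self.accessed[gfn] = self.accessed[gfn], False
        return young

    def test_clear_young_bitmap(self, start, end):
        """Batched path: harvest and clear [start, end) in one locked call."""
        self.lock_acquisitions += 1
        bitmap = self.accessed[start:end]
        self.accessed[start:end] = [False] * (end - start)
        return bitmap

# Every other page of a batch-sized region is young.
t1 = StageTwoTable(MIN_LRU_BATCH)
for gfn in range(0, MIN_LRU_BATCH, 2):
    t1.accessed[gfn] = True
young_unbatched = sum(t1.test_clear_young_one(g) for g in range(MIN_LRU_BATCH))

t2 = StageTwoTable(MIN_LRU_BATCH)
for gfn in range(0, MIN_LRU_BATCH, 2):
    t2.accessed[gfn] = True
young_batched = sum(t2.test_clear_young_bitmap(0, MIN_LRU_BATCH))

print(young_unbatched, t1.lock_acquisitions)  # 32 64
print(young_batched, t2.lock_acquisitions)    # 32 1
```

Both paths find the same 32 young pages, but the batched path takes the lock once instead of 64 times, which is where the reduced contention in the profiles below comes from.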
On a generic workstation equipped with 128 CPUs and 256GB DRAM:

  $ sudo max_guest_memory_test -c 64 -m 250 -s 250

  MGLRU      run2
  ------------------
  Before [1] ~64s
  After      ~51s

  kswapd (MGLRU before)
    100.00% balance_pgdat
      100.00% shrink_node
        100.00% shrink_one
          99.99% try_to_shrink_lruvec
            99.71% evict_folios
              97.29% shrink_folio_list
  ==>>        13.05% folio_referenced
                12.83% rmap_walk_file
                  12.31% folio_referenced_one
                    7.90% __mmu_notifier_clear_young
                      7.72% kvm_mmu_notifier_clear_young
                        7.34% _raw_write_lock

  kswapd (MGLRU after)
    100.00% balance_pgdat
      100.00% shrink_node
        100.00% shrink_one
          99.99% try_to_shrink_lruvec
            99.59% evict_folios
              80.37% shrink_folio_list
  ==>>         3.74% folio_referenced
                 3.59% rmap_walk_file
                   3.19% folio_referenced_one
                     2.53% lru_gen_look_around
                     1.06% __mmu_notifier_test_clear_young

[1] "mm: rmap: Don't flush TLB after checking PTE young for page reference"
    was included so that the comparison is apples to apples.
    https://lore.kernel.org/r/20220706112041.3831-1-21cnbao@gmail.com/

Signed-off-by: Yu Zhao
Signed-off-by: James Houghton
---
 Documentation/admin-guide/mm/multigen_lru.rst |   6 +-
 include/linux/mmzone.h                        |   6 +-
 mm/rmap.c                                     |   9 +-
 mm/vmscan.c                                   | 183 ++++++++++++++----
 4 files changed, 159 insertions(+), 45 deletions(-)

diff --git a/Documentation/admin-guide/mm/multigen_lru.rst b/Documentation/admin-guide/mm/multigen_lru.rst index 33e068830497..0ae2a6d4d94c 100644 --- a/Documentation/admin-guide/mm/multigen_lru.rst +++ b/Documentation/admin-guide/mm/multigen_lru.rst @@ -48,6 +48,10 @@ Values Components verified on x86 varieties other than Intel and AMD. If it is disabled, the multi-gen LRU will suffer a negligible performance degradation. +0x0008 Clearing the accessed bit in KVM page table entries in large + batches, when KVM MMU sets it (e.g., on x86_64). This can + improve the performance of guests when the host is under memory + pressure. [yYnN] Apply to all the components above.
====== =============================================================== @@ -56,7 +60,7 @@ E.g., echo y >/sys/kernel/mm/lru_gen/enabled cat /sys/kernel/mm/lru_gen/enabled - 0x0007 + 0x000f echo 5 >/sys/kernel/mm/lru_gen/enabled cat /sys/kernel/mm/lru_gen/enabled 0x0005 diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index c11b7cde81ef..a98de5106990 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -397,6 +397,7 @@ enum { LRU_GEN_CORE, LRU_GEN_MM_WALK, LRU_GEN_NONLEAF_YOUNG, + LRU_GEN_KVM_MMU_WALK, NR_LRU_GEN_CAPS }; @@ -554,7 +555,7 @@ struct lru_gen_memcg { void lru_gen_init_pgdat(struct pglist_data *pgdat); void lru_gen_init_lruvec(struct lruvec *lruvec); -void lru_gen_look_around(struct page_vma_mapped_walk *pvmw); +bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw); void lru_gen_init_memcg(struct mem_cgroup *memcg); void lru_gen_exit_memcg(struct mem_cgroup *memcg); @@ -573,8 +574,9 @@ static inline void lru_gen_init_lruvec(struct lruvec *lruvec) { } -static inline void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) +static inline bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw) { + return false; } static inline void lru_gen_init_memcg(struct mem_cgroup *memcg) diff --git a/mm/rmap.c b/mm/rmap.c index 56b313aa2ebf..41e9fc25684e 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -871,13 +871,10 @@ static bool folio_referenced_one(struct folio *folio, continue; } - if (pvmw.pte) { - if (lru_gen_enabled() && - pte_young(ptep_get(pvmw.pte))) { - lru_gen_look_around(&pvmw); + if (lru_gen_enabled() && pvmw.pte) { + if (lru_gen_look_around(&pvmw)) referenced++; - } - + } else if (pvmw.pte) { if (ptep_clear_flush_young_notify(vma, address, pvmw.pte)) referenced++; diff --git a/mm/vmscan.c b/mm/vmscan.c index 293120fe54f3..fd65f3466dfc 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -56,6 +56,7 @@ #include #include #include +#include #include #include @@ -2596,6 +2597,11 @@ static bool should_clear_pmd_young(void) return 
arch_has_hw_nonleaf_pmd_young() && get_cap(LRU_GEN_NONLEAF_YOUNG); } +static bool should_walk_kvm_mmu(void) +{ + return get_cap(LRU_GEN_KVM_MMU_WALK); +} + /****************************************************************************** * shorthand helpers ******************************************************************************/ @@ -3293,7 +3299,8 @@ static bool get_next_vma(unsigned long mask, unsigned long size, struct mm_walk return false; } -static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned long addr) +static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned long addr, + struct pglist_data *pgdat) { unsigned long pfn = pte_pfn(pte); @@ -3308,10 +3315,15 @@ static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned if (WARN_ON_ONCE(!pfn_valid(pfn))) return -1; + /* try to avoid unnecessary memory loads */ + if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) + return -1; + return pfn; } -static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned long addr) +static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned long addr, + struct pglist_data *pgdat) { unsigned long pfn = pmd_pfn(pmd); @@ -3326,6 +3338,10 @@ static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned if (WARN_ON_ONCE(!pfn_valid(pfn))) return -1; + /* try to avoid unnecessary memory loads */ + if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) + return -1; + return pfn; } @@ -3334,10 +3350,6 @@ static struct folio *get_pfn_folio(unsigned long pfn, struct mem_cgroup *memcg, { struct folio *folio; - /* try to avoid unnecessary memory loads */ - if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) - return NULL; - folio = pfn_folio(pfn); if (folio_nid(folio) != pgdat->node_id) return NULL; @@ -3352,6 +3364,52 @@ static struct folio *get_pfn_folio(unsigned long pfn, struct mem_cgroup *memcg, return folio; } +static 
bool test_spte_young(struct mm_struct *mm, unsigned long addr, unsigned long end, + unsigned long *bitmap, unsigned long *last) +{ + if (*last > addr) + goto done; + + *last = end - addr > MIN_LRU_BATCH * PAGE_SIZE ? + addr + MIN_LRU_BATCH * PAGE_SIZE - 1 : end - 1; + bitmap_zero(bitmap, MIN_LRU_BATCH); + + mmu_notifier_test_young_bitmap(mm, addr, *last + 1, bitmap); +done: + return test_bit((*last - addr) / PAGE_SIZE, bitmap); +} + +static void clear_spte_young(struct mm_struct *mm, unsigned long addr, + unsigned long *bitmap, unsigned long *last) +{ + int i; + unsigned long start, end = *last + 1; + + if (addr + PAGE_SIZE != end) + return; + + i = find_last_bit(bitmap, MIN_LRU_BATCH); + if (i == MIN_LRU_BATCH) + return; + + start = end - (i + 1) * PAGE_SIZE; + + i = find_first_bit(bitmap, MIN_LRU_BATCH); + + end -= i * PAGE_SIZE; + + mmu_notifier_clear_young_bitmap(mm, start, end, bitmap); +} + +static void skip_spte_young(struct mm_struct *mm, unsigned long addr, + unsigned long *bitmap, unsigned long *last) +{ + if (*last > addr) + __clear_bit((*last - addr) / PAGE_SIZE, bitmap); + + clear_spte_young(mm, addr, bitmap, last); +} + static bool suitable_to_scan(int total, int young) { int n = clamp_t(int, cache_line_size() / sizeof(pte_t), 2, 8); @@ -3367,6 +3425,8 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end, pte_t *pte; spinlock_t *ptl; unsigned long addr; + DECLARE_BITMAP(bitmap, MIN_LRU_BATCH); + unsigned long last = 0; int total = 0; int young = 0; struct lru_gen_mm_walk *walk = args->private; @@ -3386,6 +3446,7 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end, arch_enter_lazy_mmu_mode(); restart: for (i = pte_index(start), addr = start; addr != end; i++, addr += PAGE_SIZE) { + bool ret; unsigned long pfn; struct folio *folio; pte_t ptent = ptep_get(pte + i); @@ -3393,21 +3454,28 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end, total++; 
walk->mm_stats[MM_LEAF_TOTAL]++; - pfn = get_pte_pfn(ptent, args->vma, addr); - if (pfn == -1) + pfn = get_pte_pfn(ptent, args->vma, addr, pgdat); + if (pfn == -1) { + skip_spte_young(args->vma->vm_mm, addr, bitmap, &last); continue; + } - if (!pte_young(ptent)) { + ret = test_spte_young(args->vma->vm_mm, addr, end, bitmap, &last); + if (!ret && !pte_young(ptent)) { + skip_spte_young(args->vma->vm_mm, addr, bitmap, &last); walk->mm_stats[MM_LEAF_OLD]++; continue; } folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap); - if (!folio) + if (!folio) { + skip_spte_young(args->vma->vm_mm, addr, bitmap, &last); continue; + } - if (!ptep_test_and_clear_young(args->vma, addr, pte + i)) - VM_WARN_ON_ONCE(true); + clear_spte_young(args->vma->vm_mm, addr, bitmap, &last); + if (pte_young(ptent)) + ptep_test_and_clear_young(args->vma, addr, pte + i); young++; walk->mm_stats[MM_LEAF_YOUNG]++; @@ -3473,22 +3541,24 @@ static void walk_pmd_range_locked(pud_t *pud, unsigned long addr, struct vm_area /* don't round down the first address */ addr = i ? 
(*first & PMD_MASK) + i * PMD_SIZE : *first; - pfn = get_pmd_pfn(pmd[i], vma, addr); - if (pfn == -1) - goto next; - - if (!pmd_trans_huge(pmd[i])) { - if (should_clear_pmd_young()) + if (pmd_present(pmd[i]) && !pmd_trans_huge(pmd[i])) { + if (should_clear_pmd_young() && !mm_has_notifiers(args->mm)) pmdp_test_and_clear_young(vma, addr, pmd + i); goto next; } + pfn = get_pmd_pfn(pmd[i], vma, addr, pgdat); + if (pfn == -1) + goto next; + folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap); if (!folio) goto next; - if (!pmdp_test_and_clear_young(vma, addr, pmd + i)) + if (!pmdp_clear_young_notify(vma, addr, pmd + i)) { + walk->mm_stats[MM_LEAF_OLD]++; goto next; + } walk->mm_stats[MM_LEAF_YOUNG]++; @@ -3545,19 +3615,18 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end, } if (pmd_trans_huge(val)) { - unsigned long pfn = pmd_pfn(val); struct pglist_data *pgdat = lruvec_pgdat(walk->lruvec); + unsigned long pfn = get_pmd_pfn(val, vma, addr, pgdat); walk->mm_stats[MM_LEAF_TOTAL]++; - if (!pmd_young(val)) { - walk->mm_stats[MM_LEAF_OLD]++; + if (pfn == -1) continue; - } - /* try to avoid unnecessary memory loads */ - if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) + if (!pmd_young(val) && !mm_has_notifiers(args->mm)) { + walk->mm_stats[MM_LEAF_OLD]++; continue; + } walk_pmd_range_locked(pud, addr, vma, args, bitmap, &first); continue; @@ -3565,7 +3634,7 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end, walk->mm_stats[MM_NONLEAF_TOTAL]++; - if (should_clear_pmd_young()) { + if (should_clear_pmd_young() && !mm_has_notifiers(args->mm)) { if (!pmd_young(val)) continue; @@ -3646,6 +3715,9 @@ static void walk_mm(struct mm_struct *mm, struct lru_gen_mm_walk *walk) struct lruvec *lruvec = walk->lruvec; struct mem_cgroup *memcg = lruvec_memcg(lruvec); + if (!should_walk_kvm_mmu() && mm_has_notifiers(mm)) + return; + walk->next_addr = FIRST_USER_ADDRESS; do { @@ -4011,6 +4083,23 @@ static void 
lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc) * rmap/PT walk feedback ******************************************************************************/ +static bool should_look_around(struct vm_area_struct *vma, unsigned long addr, + pte_t *pte, int *young) +{ + int ret = mmu_notifier_clear_young(vma->vm_mm, addr, addr + PAGE_SIZE); + + if (pte_young(ptep_get(pte))) { + ptep_test_and_clear_young(vma, addr, pte); + *young = true; + return true; + } + + if (ret) + *young = true; + + return ret & MMU_NOTIFIER_YOUNG_FAST; +} + /* * This function exploits spatial locality when shrink_folio_list() walks the * rmap. It scans the adjacent PTEs of a young PTE and promotes hot pages. If @@ -4018,12 +4107,14 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc) * the PTE table to the Bloom filter. This forms a feedback loop between the * eviction and the aging. */ -void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) +bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw) { int i; unsigned long start; unsigned long end; struct lru_gen_mm_walk *walk; + DECLARE_BITMAP(bitmap, MIN_LRU_BATCH); + unsigned long last = 0; int young = 0; pte_t *pte = pvmw->pte; unsigned long addr = pvmw->address; @@ -4040,12 +4131,15 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) lockdep_assert_held(pvmw->ptl); VM_WARN_ON_ONCE_FOLIO(folio_test_lru(folio), folio); + if (!should_look_around(pvmw->vma, addr, pte, &young)) + return young; + if (spin_is_contended(pvmw->ptl)) - return; + return young; /* exclude special VMAs containing anon pages from COW */ if (vma->vm_flags & VM_SPECIAL) - return; + return young; /* avoid taking the LRU lock under the PTL when possible */ walk = current->reclaim_state ? 
current->reclaim_state->mm_walk : NULL; @@ -4053,6 +4147,9 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) start = max(addr & PMD_MASK, vma->vm_start); end = min(addr | ~PMD_MASK, vma->vm_end - 1) + 1; + if (end - start == PAGE_SIZE) + return young; + if (end - start > MIN_LRU_BATCH * PAGE_SIZE) { if (addr - start < MIN_LRU_BATCH * PAGE_SIZE / 2) end = start + MIN_LRU_BATCH * PAGE_SIZE; @@ -4066,29 +4163,38 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) /* folio_update_gen() requires stable folio_memcg() */ if (!mem_cgroup_trylock_pages(memcg)) - return; + return young; arch_enter_lazy_mmu_mode(); pte -= (addr - start) / PAGE_SIZE; for (i = 0, addr = start; addr != end; i++, addr += PAGE_SIZE) { + bool ret; unsigned long pfn; pte_t ptent = ptep_get(pte + i); - pfn = get_pte_pfn(ptent, vma, addr); - if (pfn == -1) + pfn = get_pte_pfn(ptent, vma, addr, pgdat); + if (pfn == -1) { + skip_spte_young(vma->vm_mm, addr, bitmap, &last); continue; + } - if (!pte_young(ptent)) + ret = test_spte_young(pvmw->vma->vm_mm, addr, end, bitmap, &last); + if (!ret && !pte_young(ptent)) { + skip_spte_young(pvmw->vma->vm_mm, addr, bitmap, &last); continue; + } folio = get_pfn_folio(pfn, memcg, pgdat, can_swap); - if (!folio) + if (!folio) { + skip_spte_young(vma->vm_mm, addr, bitmap, &last); continue; + } - if (!ptep_test_and_clear_young(vma, addr, pte + i)) - VM_WARN_ON_ONCE(true); + clear_spte_young(vma->vm_mm, addr, bitmap, &last); + if (pte_young(ptent)) + ptep_test_and_clear_young(vma, addr, pte + i); young++; @@ -4118,6 +4224,8 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) /* feedback from rmap walkers to page table walkers */ if (mm_state && suitable_to_scan(i, young)) update_bloom_filter(mm_state, max_seq, pvmw->pmd); + + return young; } /****************************************************************************** @@ -5154,6 +5262,9 @@ static ssize_t enabled_show(struct kobject *kobj, struct kobj_attribute *attr, c if 
(should_clear_pmd_young()) caps |= BIT(LRU_GEN_NONLEAF_YOUNG); + if (should_walk_kvm_mmu()) + caps |= BIT(LRU_GEN_KVM_MMU_WALK); + return sysfs_emit(buf, "0x%04x\n", caps); }
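The window machinery that the vmscan.c hunks add (test_spte_young()/clear_spte_young()/skip_spte_young()) can be sketched as a simplified userspace model. This is a toy: the Notifier class is invented, a set of addresses stands in for the real bitmap (which the kernel indexes from the end of the window), and page sizes are plain integers. It shows the protocol, though: one notifier call fetches young bits for up to MIN_LRU_BATCH pages, each page consumes or drops its bit, and the clears are flushed in one deferred batch when the window is exhausted:

```python
PAGE_SIZE, MIN_LRU_BATCH = 4096, 64

class Notifier:
    """Toy stand-in for the KVM MMU notifier; counts batched round trips."""
    def __init__(self, young_addrs):
        self.young = set(young_addrs)
        self.calls = 0

    def test_young_bitmap(self, start, end):
        self.calls += 1
        return {a for a in range(start, end, PAGE_SIZE) if a in self.young}

    def clear_young_bitmap(self, addrs):
        self.calls += 1
        self.young -= addrs

class Window:
    def __init__(self, notifier):
        self.n = notifier
        self.last = 0      # inclusive end of the currently fetched window
        self.bits = set()  # young addresses still pending a batched clear

    def test(self, addr, end):
        """Like test_spte_young(): fetch a new window only when exhausted."""
        if self.last <= addr:
            self.last = min(addr + MIN_LRU_BATCH * PAGE_SIZE, end) - 1
            self.bits = self.n.test_young_bitmap(addr, self.last + 1)
        return addr in self.bits

    def clear(self, addr):
        """Like clear_spte_young(): flush pending clears at the window end."""
        if addr + PAGE_SIZE == self.last + 1 and self.bits:
            self.n.clear_young_bitmap(self.bits)
            self.bits = set()

    def skip(self, addr):
        """Like skip_spte_young(): drop this page from the pending clears."""
        self.bits.discard(addr)
        self.clear(addr)

# Walk one batch of pages where every fourth page is young: one test call
# up front and one batched clear at the end, instead of a notify per page.
end = MIN_LRU_BATCH * PAGE_SIZE
n = Notifier(range(0, end, 4 * PAGE_SIZE))
w = Window(n)
young = 0
for addr in range(0, end, PAGE_SIZE):
    if w.test(addr, end):
        young += 1
        w.clear(addr)
    else:
        w.skip(addr)
print(young, n.calls)  # 16 2
```

Keeping the "skip" path explicit matters: pages whose folio is rejected (wrong node, wrong memcg) must have their pending clear dropped, otherwise the deferred batch would clear accessed bits the walker never accounted for.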