From patchwork Mon Apr 1 23:29:40 2024
Date: Mon, 1 Apr 2024 23:29:40 +0000
From: James Houghton
To: Andrew Morton, Paolo Bonzini
Cc: Yu Zhao, David Matlack, Marc Zyngier, Oliver Upton,
    Sean Christopherson, Jonathan Corbet, James Morse, Suzuki K Poulose,
    Zenghui Yu, Catalin Marinas, Will Deacon, Thomas Gleixner,
    Ingo Molnar, Borislav Petkov, Dave Hansen, "H. Peter Anvin",
    Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, Shaoqin Huang,
    Gavin Shan, Ricardo Koller, Raghavendra Rao Ananta, Ryan Roberts,
    David Rientjes, Axel Rasmussen, linux-kernel@vger.kernel.org,
    linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
    kvm@vger.kernel.org, linux-mm@kvack.org,
    linux-trace-kernel@vger.kernel.org, James Houghton
Message-ID: <20240401232946.1837665-2-jthoughton@google.com>
In-Reply-To: <20240401232946.1837665-1-jthoughton@google.com>
References: <20240401232946.1837665-1-jthoughton@google.com>
Subject: [PATCH v3 1/7] mm: Add a bitmap into mmu_notifier_{clear,test}_young

The bitmap is provided for secondary MMUs to use if they support it.
For test_young(), after it returns, the bitmap represents the pages
that were young in the interval [start, end). For clear_young(), it
represents the pages that we wish the secondary MMU to clear the
accessed/young bit for.

If a bitmap is not provided, the mmu_notifier_{test,clear}_young() API
is unchanged, except that if young PTEs are found and the architecture
supports passing in a bitmap, MMU_NOTIFIER_YOUNG_FAST is returned
instead of 1. MMU_NOTIFIER_YOUNG_FAST indicates to the main mm that
doing look-around is likely to be beneficial, which allows MGLRU's
look-around logic to work faster, resulting in a 4% improvement in
real workloads[1].

If the secondary MMU doesn't support the bitmap, it must return an int
that contains MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE.

[1]: https://lore.kernel.org/all/20230609005935.42390-1-yuzhao@google.com/

Suggested-by: Yu Zhao
Signed-off-by: James Houghton
---
 include/linux/mmu_notifier.h | 93 +++++++++++++++++++++++++++++++++---
 include/trace/events/kvm.h   | 13 +++--
 mm/mmu_notifier.c            | 20 +++++---
 virt/kvm/kvm_main.c          | 19 ++++++--
 4 files changed, 123 insertions(+), 22 deletions(-)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index f349e08a9dfe..daaa9db625d3 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -61,6 +61,10 @@ enum mmu_notifier_event {
 
 #define MMU_NOTIFIER_RANGE_BLOCKABLE (1 << 0)
 
+#define MMU_NOTIFIER_YOUNG                   (1 << 0)
+#define MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE (1 << 1)
+#define MMU_NOTIFIER_YOUNG_FAST              (1 << 2)
+
 struct mmu_notifier_ops {
 	/*
 	 * Called either by mmu_notifier_unregister or when the mm is
@@ -106,21 +110,36 @@ struct mmu_notifier_ops {
 	 * clear_young is a lightweight version of clear_flush_young. Like the
 	 * latter, it is supposed to test-and-clear the young/accessed bitflag
 	 * in the secondary pte, but it may omit flushing the secondary tlb.
+	 *
+	 * If @bitmap is given but is not supported, return
+	 * MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE.
+	 *
+	 * If the walk is done "quickly" and there were young PTEs,
+	 * MMU_NOTIFIER_YOUNG_FAST is returned.
	 */
	int (*clear_young)(struct mmu_notifier *subscription,
			   struct mm_struct *mm,
			   unsigned long start,
-			   unsigned long end);
+			   unsigned long end,
+			   unsigned long *bitmap);
 
	/*
	 * test_young is called to check the young/accessed bitflag in
	 * the secondary pte. This is used to know if the page is
	 * frequently used without actually clearing the flag or tearing
	 * down the secondary mapping on the page.
+	 *
+	 * If @bitmap is given but is not supported, return
+	 * MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE.
+	 *
+	 * If the walk is done "quickly" and there were young PTEs,
+	 * MMU_NOTIFIER_YOUNG_FAST is returned.
	 */
	int (*test_young)(struct mmu_notifier *subscription,
			  struct mm_struct *mm,
-			  unsigned long address);
+			  unsigned long start,
+			  unsigned long end,
+			  unsigned long *bitmap);
 
	/*
	 * change_pte is called in cases that pte mapping to page is changed:
@@ -388,10 +407,11 @@ extern int __mmu_notifier_clear_flush_young(struct mm_struct *mm,
					    unsigned long start,
					    unsigned long end);
 extern int __mmu_notifier_clear_young(struct mm_struct *mm,
-				      unsigned long start,
-				      unsigned long end);
+				      unsigned long start, unsigned long end,
+				      unsigned long *bitmap);
 extern int __mmu_notifier_test_young(struct mm_struct *mm,
-				     unsigned long address);
+				     unsigned long start, unsigned long end,
+				     unsigned long *bitmap);
 extern void __mmu_notifier_change_pte(struct mm_struct *mm,
				      unsigned long address, pte_t pte);
 extern int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range *r);
@@ -427,7 +447,25 @@ static inline int mmu_notifier_clear_young(struct mm_struct *mm,
					   unsigned long end)
 {
	if (mm_has_notifiers(mm))
-		return __mmu_notifier_clear_young(mm, start, end);
+		return __mmu_notifier_clear_young(mm, start, end, NULL);
+	return 0;
+}
+
+/*
+ * When @bitmap is not provided, clear the young bits in the secondary
+ * MMUs for all of the pages in the interval [start, end).
+ *
+ * If any subscribed secondary MMU does not support @bitmap, this function
+ * will return an integer containing MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE.
+ * Some work may have been done in the secondary MMU.
+ */
+static inline int mmu_notifier_clear_young_bitmap(struct mm_struct *mm,
+						  unsigned long start,
+						  unsigned long end,
+						  unsigned long *bitmap)
+{
+	if (mm_has_notifiers(mm))
+		return __mmu_notifier_clear_young(mm, start, end, bitmap);
	return 0;
 }
 
@@ -435,7 +473,25 @@ static inline int mmu_notifier_test_young(struct mm_struct *mm,
					  unsigned long address)
 {
	if (mm_has_notifiers(mm))
-		return __mmu_notifier_test_young(mm, address);
+		return __mmu_notifier_test_young(mm, address, address + 1,
+						 NULL);
+	return 0;
+}
+
+/*
+ * When @bitmap is not provided, test the young bits in the secondary
+ * MMUs for all of the pages in the interval [start, end).
+ *
+ * If any subscribed secondary MMU does not support @bitmap, this function
+ * will return an integer containing MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE.
+ */
+static inline int mmu_notifier_test_young_bitmap(struct mm_struct *mm,
+						 unsigned long start,
+						 unsigned long end,
+						 unsigned long *bitmap)
+{
+	if (mm_has_notifiers(mm))
+		return __mmu_notifier_test_young(mm, start, end, bitmap);
	return 0;
 }
 
@@ -644,12 +700,35 @@ static inline int mmu_notifier_clear_flush_young(struct mm_struct *mm,
	return 0;
 }
 
+static inline int mmu_notifier_clear_young(struct mm_struct *mm,
+					   unsigned long start,
+					   unsigned long end)
+{
+	return 0;
+}
+
+static inline int mmu_notifier_clear_young_bitmap(struct mm_struct *mm,
+						  unsigned long start,
+						  unsigned long end,
+						  unsigned long *bitmap)
+{
+	return 0;
+}
+
 static inline int mmu_notifier_test_young(struct mm_struct *mm,
					  unsigned long address)
 {
	return 0;
 }
 
+static inline int mmu_notifier_test_young_bitmap(struct mm_struct *mm,
+						 unsigned long start,
+						 unsigned long end,
+						 unsigned long *bitmap)
+{
+	return 0;
+}
+
 static inline void mmu_notifier_change_pte(struct mm_struct *mm,
					   unsigned long address, pte_t pte)
 {
diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h
index 011fba6b5552..e4ace8cfdbba 100644
--- a/include/trace/events/kvm.h
+++ b/include/trace/events/kvm.h
@@ -490,18 +490,21 @@ TRACE_EVENT(kvm_age_hva,
 );
 
 TRACE_EVENT(kvm_test_age_hva,
-	TP_PROTO(unsigned long hva),
-	TP_ARGS(hva),
+	TP_PROTO(unsigned long start, unsigned long end),
+	TP_ARGS(start, end),
 
	TP_STRUCT__entry(
-		__field( unsigned long, hva )
+		__field( unsigned long, start )
+		__field( unsigned long, end )
	),
 
	TP_fast_assign(
-		__entry->hva = hva;
+		__entry->start = start;
+		__entry->end = end;
	),
 
-	TP_printk("mmu notifier test age hva: %#016lx", __entry->hva)
+	TP_printk("mmu notifier test age hva: %#016lx -- %#016lx",
+		  __entry->start, __entry->end)
 );
 
 #endif /* _TRACE_KVM_MAIN_H */
diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index ec3b068cbbe6..e70c6222944c 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -384,7 +384,8 @@ int __mmu_notifier_clear_flush_young(struct mm_struct *mm,
 
 int __mmu_notifier_clear_young(struct mm_struct *mm,
			       unsigned long start,
-			       unsigned long end)
+			       unsigned long end,
+			       unsigned long *bitmap)
 {
	struct mmu_notifier *subscription;
	int young = 0, id;
@@ -395,7 +396,8 @@ int __mmu_notifier_clear_young(struct mm_struct *mm,
			      srcu_read_lock_held(&srcu)) {
		if (subscription->ops->clear_young)
			young |= subscription->ops->clear_young(subscription,
-								mm, start, end);
+								mm, start, end,
+								bitmap);
	}
	srcu_read_unlock(&srcu, id);
 
@@ -403,7 +405,8 @@ int __mmu_notifier_clear_young(struct mm_struct *mm,
 }
 
 int __mmu_notifier_test_young(struct mm_struct *mm,
-			      unsigned long address)
+			      unsigned long start, unsigned long end,
+			      unsigned long *bitmap)
 {
	struct mmu_notifier *subscription;
	int young = 0, id;
@@ -413,9 +416,14 @@ int __mmu_notifier_test_young(struct mm_struct *mm,
				 &mm->notifier_subscriptions->list, hlist,
				 srcu_read_lock_held(&srcu)) {
		if (subscription->ops->test_young) {
-			young = subscription->ops->test_young(subscription, mm,
-							      address);
-			if (young)
+			young |= subscription->ops->test_young(subscription, mm,
+							       start, end,
+							       bitmap);
+			if (young && !bitmap)
+				/*
+				 * We're not using a bitmap, so there is no
+				 * need to check any more secondary MMUs.
+				 */
				break;
		}
	}
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index fb49c2a60200..ca4b1ef9dfc2 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -917,10 +917,15 @@ static int kvm_mmu_notifier_clear_flush_young(struct mmu_notifier *mn,
 static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn,
					struct mm_struct *mm,
					unsigned long start,
-					unsigned long end)
+					unsigned long end,
+					unsigned long *bitmap)
 {
	trace_kvm_age_hva(start, end);
 
+	/* We don't support bitmaps. Don't test or clear anything. */
+	if (bitmap)
+		return MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE;
+
	/*
	 * Even though we do not flush TLB, this will still adversely
	 * affect performance on pre-Haswell Intel EPT, where there is
@@ -939,11 +944,17 @@ static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn,
 
 static int kvm_mmu_notifier_test_young(struct mmu_notifier *mn,
				       struct mm_struct *mm,
-				       unsigned long address)
+				       unsigned long start,
+				       unsigned long end,
+				       unsigned long *bitmap)
 {
-	trace_kvm_test_age_hva(address);
+	trace_kvm_test_age_hva(start, end);
+
+	/* We don't support bitmaps. Don't test or clear anything. */
+	if (bitmap)
+		return MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE;
 
-	return kvm_handle_hva_range_no_flush(mn, address, address + 1,
+	return kvm_handle_hva_range_no_flush(mn, start, end,
					     kvm_test_age_gfn);
 }
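For reference, here is a caller-side sketch of the API above. It is
hypothetical and not part of this series: test_young_pages() is an
invented helper, and the fallback policy is only one possible choice.
The bit indexing (bit 0 maps to the last page of the interval) mirrors
kvm_young_bitmap_offset() introduced in patch 3.

/*
 * Hypothetical caller-side helper, for illustration only: prefer the
 * bitmap-based notifier API and fall back to the single-address API
 * when any subscribed secondary MMU reports the bitmap as unreliable.
 * Assumes @bitmap holds at least (end - start) / PAGE_SIZE bits.
 */
static int test_young_pages(struct mm_struct *mm, unsigned long start,
			    unsigned long end, unsigned long *bitmap)
{
	unsigned long addr;
	int ret;

	ret = mmu_notifier_test_young_bitmap(mm, start, end, bitmap);
	if (!(ret & MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE))
		return ret;

	/*
	 * Rebuild the bitmap one page at a time. Bit 0 corresponds to
	 * the last page of [start, end), matching the convention of
	 * kvm_young_bitmap_offset() in patch 3 of this series.
	 */
	bitmap_zero(bitmap, (end - start) / PAGE_SIZE);
	ret = 0;
	for (addr = start; addr < end; addr += PAGE_SIZE) {
		if (mmu_notifier_test_young(mm, addr)) {
			__set_bit((end - PAGE_SIZE - addr) / PAGE_SIZE,
				  bitmap);
			ret |= MMU_NOTIFIER_YOUNG;
		}
	}
	return ret;
}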
From patchwork Mon Apr 1 23:29:41 2024

Date: Mon, 1 Apr 2024 23:29:41 +0000
From: James Houghton
Message-ID: <20240401232946.1837665-3-jthoughton@google.com>
Subject: [PATCH v3 2/7] KVM: Move MMU notifier function declarations

To allow new MMU-notifier-related functions to use
gfn_to_hva_memslot(), move some declarations around. Also move
mmu_notifier_to_kvm() for wider use later.

Signed-off-by: James Houghton
---
 include/linux/kvm_host.h | 41 +++++++++++++++++++++-------------------
 virt/kvm/kvm_main.c      |  5 -----
 2 files changed, 22 insertions(+), 24 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 48f31dcd318a..1800d03a06a9 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -257,25 +257,6 @@ bool kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
 #endif
 
-#ifdef CONFIG_KVM_GENERIC_MMU_NOTIFIER
-union kvm_mmu_notifier_arg {
-	pte_t pte;
-	unsigned long attributes;
-};
-
-struct kvm_gfn_range {
-	struct kvm_memory_slot *slot;
-	gfn_t start;
-	gfn_t end;
-	union kvm_mmu_notifier_arg arg;
-	bool may_block;
-};
-bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
-bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
-bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
-bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
-#endif
-
 enum {
	OUTSIDE_GUEST_MODE,
	IN_GUEST_MODE,
@@ -2012,6 +1993,11 @@ extern const struct kvm_stats_header kvm_vcpu_stats_header;
 extern const struct _kvm_stats_desc kvm_vcpu_stats_desc[];
 
 #ifdef CONFIG_KVM_GENERIC_MMU_NOTIFIER
+static inline struct kvm *mmu_notifier_to_kvm(struct mmu_notifier *mn)
+{
+	return container_of(mn, struct kvm, mmu_notifier);
+}
+
 static inline int mmu_invalidate_retry(struct kvm *kvm, unsigned long mmu_seq)
 {
	if (unlikely(kvm->mmu_invalidate_in_progress))
@@ -2089,6 +2075,23 @@ static inline bool mmu_invalidate_retry_gfn_unsafe(struct kvm *kvm,
 
	return READ_ONCE(kvm->mmu_invalidate_seq) != mmu_seq;
 }
+
+union kvm_mmu_notifier_arg {
+	pte_t pte;
+	unsigned long attributes;
+};
+
+struct kvm_gfn_range {
+	struct kvm_memory_slot *slot;
+	gfn_t start;
+	gfn_t end;
+	union kvm_mmu_notifier_arg arg;
+	bool may_block;
+};
+bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
+bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
+bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
+bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
 #endif
 
 #ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ca4b1ef9dfc2..d0545d88c802 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -534,11 +534,6 @@ void kvm_destroy_vcpus(struct kvm *kvm)
 EXPORT_SYMBOL_GPL(kvm_destroy_vcpus);
 
 #ifdef CONFIG_KVM_GENERIC_MMU_NOTIFIER
-static inline struct kvm *mmu_notifier_to_kvm(struct mmu_notifier *mn)
-{
-	return container_of(mn, struct kvm, mmu_notifier);
-}
-
 typedef bool (*gfn_handler_t)(struct kvm *kvm, struct kvm_gfn_range *range);
 
 typedef void (*on_lock_fn_t)(struct kvm *kvm);
From patchwork Mon Apr 1 23:29:42 2024

Date: Mon, 1 Apr 2024 23:29:42 +0000
From: James Houghton
Message-ID: <20240401232946.1837665-4-jthoughton@google.com>
Subject: [PATCH v3 3/7] KVM: Add basic bitmap support into
 kvm_mmu_notifier_test/clear_young

Add kvm_arch_prepare_bitmap_age() for architectures to indicate that
they support bitmap-based aging in kvm_mmu_notifier_test_clear_young()
and that they do not need KVM to grab the MMU lock for writing. This
function allows architectures to do other locking or other preparatory
work that they need.

If an architecture does not implement kvm_arch_prepare_bitmap_age() or
is unable to do bitmap-based aging at runtime (and marks the bitmap as
unreliable):
 1. If a bitmap was provided, we inform the caller that the bitmap is
    unreliable (MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE).
 2. If a bitmap was not provided, fall back to the old logic.

Also add logic for architectures to easily use the provided bitmap if
they are able. The expectation is that the architecture's
implementation of kvm_test_age_gfn() will use kvm_gfn_record_young(),
and kvm_age_gfn() will use kvm_gfn_should_age().

Suggested-by: Yu Zhao
Signed-off-by: James Houghton
---
 include/linux/kvm_host.h | 60 ++++++++++++++++++++++++++
 virt/kvm/kvm_main.c      | 92 +++++++++++++++++++++++++++++-----------
 2 files changed, 127 insertions(+), 25 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 1800d03a06a9..5862fd7b5f9b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1992,6 +1992,26 @@ extern const struct _kvm_stats_desc kvm_vm_stats_desc[];
 extern const struct kvm_stats_header kvm_vcpu_stats_header;
 extern const struct _kvm_stats_desc kvm_vcpu_stats_desc[];
 
+/*
+ * Architectures that support using bitmaps for kvm_age_gfn() and
+ * kvm_test_age_gfn should return true for kvm_arch_prepare_bitmap_age()
+ * and do any work they need to prepare. The subsequent walk will not
+ * automatically grab the KVM MMU lock, so some architectures may opt
+ * to grab it.
+ *
+ * If true is returned, a subsequent call to kvm_arch_finish_bitmap_age() is
+ * guaranteed.
+ */
+#ifndef kvm_arch_prepare_bitmap_age
+static inline bool kvm_arch_prepare_bitmap_age(struct mmu_notifier *mn)
+{
+	return false;
+}
+#endif
+#ifndef kvm_arch_finish_bitmap_age
+static inline void kvm_arch_finish_bitmap_age(struct mmu_notifier *mn) {}
+#endif
+
 #ifdef CONFIG_KVM_GENERIC_MMU_NOTIFIER
 static inline struct kvm *mmu_notifier_to_kvm(struct mmu_notifier *mn)
 {
@@ -2076,9 +2096,16 @@ static inline bool mmu_invalidate_retry_gfn_unsafe(struct kvm *kvm,
	return READ_ONCE(kvm->mmu_invalidate_seq) != mmu_seq;
 }
 
+struct test_clear_young_metadata {
+	unsigned long *bitmap;
+	unsigned long bitmap_offset_end;
+	unsigned long end;
+	bool unreliable;
+};
 union kvm_mmu_notifier_arg {
	pte_t pte;
	unsigned long attributes;
+	struct test_clear_young_metadata *metadata;
 };
 
 struct kvm_gfn_range {
@@ -2087,11 +2114,44 @@ struct kvm_gfn_range {
	gfn_t end;
	union kvm_mmu_notifier_arg arg;
	bool may_block;
+	bool lockless;
 };
 bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
 bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
 bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
 bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
+
+static inline void kvm_age_set_unreliable(struct kvm_gfn_range *range)
+{
+	struct test_clear_young_metadata *args = range->arg.metadata;
+
+	args->unreliable = true;
+}
+static inline unsigned long kvm_young_bitmap_offset(struct kvm_gfn_range *range,
+						    gfn_t gfn)
+{
+	struct test_clear_young_metadata *args = range->arg.metadata;
+
+	return hva_to_gfn_memslot(args->end - 1, range->slot) - gfn;
+}
+static inline void kvm_gfn_record_young(struct kvm_gfn_range *range, gfn_t gfn)
+{
+	struct test_clear_young_metadata *args = range->arg.metadata;
+
+	WARN_ON_ONCE(gfn < range->start || gfn >= range->end);
+	if (args->bitmap)
+		__set_bit(kvm_young_bitmap_offset(range, gfn), args->bitmap);
+}
+static inline bool kvm_gfn_should_age(struct kvm_gfn_range *range, gfn_t gfn)
+{
+	struct test_clear_young_metadata *args = range->arg.metadata;
+
+	WARN_ON_ONCE(gfn < range->start || gfn >= range->end);
+	if (args->bitmap)
+		return test_bit(kvm_young_bitmap_offset(range, gfn),
+				args->bitmap);
+	return true;
+}
 #endif
 
 #ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d0545d88c802..7d80321e2ece 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -550,6 +550,7 @@ struct kvm_mmu_notifier_range {
	on_lock_fn_t on_lock;
	bool flush_on_ret;
	bool may_block;
+	bool lockless;
 };
 
 /*
@@ -598,6 +599,8 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
	struct kvm_memslots *slots;
	int i, idx;
 
+	BUILD_BUG_ON(sizeof(gfn_range.arg) != sizeof(gfn_range.arg.pte));
+
	if (WARN_ON_ONCE(range->end <= range->start))
		return r;
 
@@ -637,15 +640,18 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
			gfn_range.start = hva_to_gfn_memslot(hva_start, slot);
			gfn_range.end = hva_to_gfn_memslot(hva_end + PAGE_SIZE - 1, slot);
			gfn_range.slot = slot;
+			gfn_range.lockless = range->lockless;
 
			if (!r.found_memslot) {
				r.found_memslot = true;
-				KVM_MMU_LOCK(kvm);
-				if (!IS_KVM_NULL_FN(range->on_lock))
-					range->on_lock(kvm);
-
-				if (IS_KVM_NULL_FN(range->handler))
-					break;
+				if (!range->lockless) {
+					KVM_MMU_LOCK(kvm);
+					if (!IS_KVM_NULL_FN(range->on_lock))
+						range->on_lock(kvm);
+
+					if (IS_KVM_NULL_FN(range->handler))
+						break;
+				}
			}
			r.ret |= range->handler(kvm, &gfn_range);
		}
@@ -654,7 +660,7 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
	if (range->flush_on_ret && r.ret)
		kvm_flush_remote_tlbs(kvm);
 
-	if (r.found_memslot)
+	if (r.found_memslot && !range->lockless)
		KVM_MMU_UNLOCK(kvm);
 
	srcu_read_unlock(&kvm->srcu, idx);
@@ -682,19 +688,24 @@ static __always_inline int kvm_handle_hva_range(struct mmu_notifier *mn,
	return __kvm_handle_hva_range(kvm, &range).ret;
 }
 
-static __always_inline int kvm_handle_hva_range_no_flush(struct mmu_notifier *mn,
-							 unsigned long start,
-							 unsigned long end,
-							 gfn_handler_t handler)
+static __always_inline int kvm_handle_hva_range_no_flush(
+		struct mmu_notifier *mn,
+		unsigned long start,
+		unsigned long end,
+		gfn_handler_t handler,
+		union kvm_mmu_notifier_arg arg,
+		bool lockless)
 {
	struct kvm *kvm = mmu_notifier_to_kvm(mn);
	const struct kvm_mmu_notifier_range range = {
		.start		= start,
		.end		= end,
		.handler	= handler,
+		.arg		= arg,
		.on_lock	= (void *)kvm_null_fn,
		.flush_on_ret	= false,
		.may_block	= false,
+		.lockless	= lockless,
	};
 
	return __kvm_handle_hva_range(kvm, &range).ret;
@@ -909,15 +920,36 @@ static int kvm_mmu_notifier_clear_flush_young(struct mmu_notifier *mn,
					kvm_age_gfn);
 }
 
-static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn,
-					struct mm_struct *mm,
-					unsigned long start,
-					unsigned long end,
-					unsigned long *bitmap)
+static int kvm_mmu_notifier_test_clear_young(struct mmu_notifier *mn,
+					     struct mm_struct *mm,
+					     unsigned long start,
+					     unsigned long end,
+					     unsigned long *bitmap,
+					     bool clear)
 {
-	trace_kvm_age_hva(start, end);
+	if (kvm_arch_prepare_bitmap_age(mn)) {
+		struct test_clear_young_metadata args = {
+			.bitmap		= bitmap,
+			.end		= end,
+			.unreliable	= false,
+		};
+		union kvm_mmu_notifier_arg arg = {
+			.metadata = &args
+		};
+		bool young;
+
+		young = kvm_handle_hva_range_no_flush(
+				mn, start, end,
+				clear ? kvm_age_gfn : kvm_test_age_gfn,
+				arg, true);
+
+		kvm_arch_finish_bitmap_age(mn);
 
-	/* We don't support bitmaps. Don't test or clear anything. */
+		if (!args.unreliable)
+			return young ? MMU_NOTIFIER_YOUNG_FAST : 0;
+	}
+
+	/* A bitmap was passed but the architecture doesn't support bitmaps */
	if (bitmap)
		return MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE;
 
@@ -934,7 +966,21 @@ static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn,
	 * cadence. If we find this inaccurate, we might come up with a
	 * more sophisticated heuristic later.
	 */
-	return kvm_handle_hva_range_no_flush(mn, start, end, kvm_age_gfn);
+	return kvm_handle_hva_range_no_flush(
+			mn, start, end, clear ? kvm_age_gfn : kvm_test_age_gfn,
+			KVM_MMU_NOTIFIER_NO_ARG, false);
+}
+
+static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn,
+					struct mm_struct *mm,
+					unsigned long start,
+					unsigned long end,
+					unsigned long *bitmap)
+{
+	trace_kvm_age_hva(start, end);
+
+	return kvm_mmu_notifier_test_clear_young(mn, mm, start, end, bitmap,
+						 true);
 }
 
 static int kvm_mmu_notifier_test_young(struct mmu_notifier *mn,
@@ -945,12 +991,8 @@ static int kvm_mmu_notifier_test_young(struct mmu_notifier *mn,
 {
	trace_kvm_test_age_hva(start, end);
 
-	/* We don't support bitmaps. Don't test or clear anything. */
-	if (bitmap)
-		return MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE;
-
-	return kvm_handle_hva_range_no_flush(mn, start, end,
-					     kvm_test_age_gfn);
+	return kvm_mmu_notifier_test_clear_young(mn, mm, start, end, bitmap,
+						 false);
 }
 
 static void kvm_mmu_notifier_release(struct mmu_notifier *mn,
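To make the contract concrete, below is a minimal, hypothetical opt-in
that an architecture could place in its asm/kvm_host.h. It is a sketch,
not part of this series: the read-lock choice is illustrative (it
matches what the arm64 patch later in this series does, and assumes the
architecture uses the rwlock flavor of mmu_lock), and
pte_is_young()/pte_clear_young() stand in for arch-specific accessors
that do not exist under those names.

/*
 * Hypothetical arch opt-in, for illustration only. Returning true
 * promises that kvm_age_gfn()/kvm_test_age_gfn() can run without KVM
 * taking the MMU lock for writing, and guarantees a later call to
 * kvm_arch_finish_bitmap_age().
 */
#define kvm_arch_prepare_bitmap_age kvm_arch_prepare_bitmap_age
static inline bool kvm_arch_prepare_bitmap_age(struct mmu_notifier *mn)
{
	/* Illustrative: pin the page tables with the MMU read lock. */
	read_lock(&mmu_notifier_to_kvm(mn)->mmu_lock);
	return true;
}

#define kvm_arch_finish_bitmap_age kvm_arch_finish_bitmap_age
static inline void kvm_arch_finish_bitmap_age(struct mmu_notifier *mn)
{
	read_unlock(&mmu_notifier_to_kvm(mn)->mmu_lock);
}

The architecture's handlers would then consult the bitmap through the
new helpers, roughly:

	/* In the arch's kvm_test_age_gfn(): report young gfns. */
	if (pte_is_young(gfn))		/* arch-specific, illustrative */
		kvm_gfn_record_young(range, gfn);

	/* In the arch's kvm_age_gfn(): only touch requested gfns. */
	if (kvm_gfn_should_age(range, gfn))
		pte_clear_young(gfn);	/* arch-specific, illustrative */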
From patchwork Mon Apr 1 23:29:43 2024

Date: Mon, 1 Apr 2024 23:29:43 +0000
From: James Houghton
Message-ID: <20240401232946.1837665-5-jthoughton@google.com>
Subject: [PATCH v3 4/7] KVM: x86: Move tdp_mmu_enabled and shadow_accessed_mask

From: Yu Zhao

tdp_mmu_enabled and shadow_accessed_mask are needed to implement
kvm_arch_prepare_bitmap_age().

Signed-off-by: Yu Zhao
Signed-off-by: James Houghton
---
 arch/x86/include/asm/kvm_host.h | 6 ++++++
 arch/x86/kvm/mmu.h              | 6 ------
 arch/x86/kvm/mmu/spte.h         | 1 -
 3 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 16e07a2eee19..3b58e2306621 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1847,6 +1847,7 @@ struct kvm_arch_async_pf {
 extern u32 __read_mostly kvm_nr_uret_msrs;
 extern u64 __read_mostly host_efer;
+extern u64 __read_mostly shadow_accessed_mask;
 extern bool __read_mostly allow_smaller_maxphyaddr;
 extern bool __read_mostly enable_apicv;
 extern struct kvm_x86_ops kvm_x86_ops;
@@ -1952,6 +1953,11 @@ void kvm_fire_mask_notifiers(struct kvm *kvm, unsigned irqchip, unsigned pin,
			     bool mask);
 
 extern bool tdp_enabled;
+#ifdef CONFIG_X86_64
+extern bool tdp_mmu_enabled;
+#else
+#define tdp_mmu_enabled false
+#endif
 
 u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu);
 
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 60f21bb4c27b..8ae279035900 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -270,12 +270,6 @@ static inline bool kvm_shadow_root_allocated(struct kvm *kvm)
	return smp_load_acquire(&kvm->arch.shadow_root_allocated);
 }
 
-#ifdef CONFIG_X86_64
-extern bool tdp_mmu_enabled;
-#else
-#define tdp_mmu_enabled false
-#endif
-
 static inline bool kvm_memslots_have_rmaps(struct kvm *kvm)
 {
	return !tdp_mmu_enabled || kvm_shadow_root_allocated(kvm);
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index a129951c9a88..f791fe045c7d 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -154,7 +154,6 @@ extern u64 __read_mostly shadow_mmu_writable_mask;
 extern u64 __read_mostly shadow_nx_mask;
 extern u64 __read_mostly shadow_x_mask; /* mutual exclusive with nx_mask */
 extern u64 __read_mostly shadow_user_mask;
-extern u64 __read_mostly shadow_accessed_mask;
 extern u64 __read_mostly shadow_dirty_mask;
 extern u64 __read_mostly shadow_mmio_value;
 extern u64 __read_mostly shadow_mmio_mask;
From patchwork Mon Apr 1 23:29:44 2024

Date: Mon, 1 Apr 2024 23:29:44 +0000
From: James Houghton
Message-ID: <20240401232946.1837665-6-jthoughton@google.com>
Subject: [PATCH v3 5/7] KVM: x86: Participate in bitmap-based PTE aging

Only handle the TDP MMU case for now. In other cases, if a bitmap was
not provided, fall back to the slowpath that takes mmu_lock, or, if a
bitmap was provided, inform the caller that the bitmap is unreliable.

Suggested-by: Yu Zhao
Signed-off-by: James Houghton
---
 arch/x86/include/asm/kvm_host.h | 14 ++++++++++++++
 arch/x86/kvm/mmu/mmu.c          | 16 ++++++++++++++--
 arch/x86/kvm/mmu/tdp_mmu.c      | 10 +++++++++-
 3 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3b58e2306621..c30918d0887e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2324,4 +2324,18 @@ int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
  */
 #define KVM_EXIT_HYPERCALL_MBZ		GENMASK_ULL(31, 1)
 
+#define kvm_arch_prepare_bitmap_age kvm_arch_prepare_bitmap_age
+static inline bool kvm_arch_prepare_bitmap_age(struct mmu_notifier *mn)
+{
+	/*
+	 * Indicate that we support bitmap-based aging when using the TDP MMU
+	 * and the accessed bit is available in the TDP page tables.
+	 *
+	 * We have no other preparatory work to do here, so we do not need to
+	 * redefine kvm_arch_finish_bitmap_age().
+	 */
+	return IS_ENABLED(CONFIG_X86_64) && tdp_mmu_enabled
+			&& shadow_accessed_mask;
+}
+
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 992e651540e8..fae1a75750bb 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1674,8 +1674,14 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 {
	bool young = false;
 
-	if (kvm_memslots_have_rmaps(kvm))
+	if (kvm_memslots_have_rmaps(kvm)) {
+		if (range->lockless) {
+			kvm_age_set_unreliable(range);
+			return false;
+		}
+
		young = kvm_handle_gfn_range(kvm, range, kvm_age_rmap);
+	}
 
	if (tdp_mmu_enabled)
		young |= kvm_tdp_mmu_age_gfn_range(kvm, range);
@@ -1687,8 +1693,14 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 {
	bool young = false;
 
-	if (kvm_memslots_have_rmaps(kvm))
+	if (kvm_memslots_have_rmaps(kvm)) {
+		if (range->lockless) {
+			kvm_age_set_unreliable(range);
+			return false;
+		}
+
		young = kvm_handle_gfn_range(kvm, range, kvm_test_age_rmap);
+	}
 
	if (tdp_mmu_enabled)
		young |= kvm_tdp_mmu_test_age_gfn(kvm, range);
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index d078157e62aa..edea01bc145f 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1217,6 +1217,9 @@ static bool age_gfn_range(struct kvm *kvm, struct tdp_iter *iter,
	if (!is_accessed_spte(iter->old_spte))
		return false;
 
+	if (!kvm_gfn_should_age(range, iter->gfn))
+		return false;
+
	if (spte_ad_enabled(iter->old_spte)) {
		iter->old_spte = tdp_mmu_clear_spte_bits(iter->sptep,
							 iter->old_spte,
@@ -1250,7 +1253,12 @@ bool kvm_tdp_mmu_age_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
 static bool test_age_gfn(struct kvm *kvm, struct tdp_iter *iter,
			 struct kvm_gfn_range *range)
 {
-	return is_accessed_spte(iter->old_spte);
+	bool young = is_accessed_spte(iter->old_spte);
+
+	if (young)
+		kvm_gfn_record_young(range, iter->gfn);
+
+	return young;
 }
 
 bool kvm_tdp_mmu_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
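The reverse bit indexing used by kvm_gfn_record_young() and
kvm_gfn_should_age() here is easy to get wrong, so a small worked
example with made-up values may help (4 KiB pages, one memslot covering
the whole notifier range):

/*
 * Worked example, hypothetical values only. For a notifier range
 * covering HVAs [0x10000, 0x14000) that maps to gfns 0x100..0x103:
 *
 *	last_gfn = hva_to_gfn_memslot(0x14000 - 1, slot) = 0x103
 *
 *	kvm_young_bitmap_offset(range, 0x103) == 0  (last page  -> bit 0)
 *	kvm_young_bitmap_offset(range, 0x100) == 3  (first page -> bit 3)
 *
 * So test_age_gfn() above records each young gfn at bit
 * (last_gfn - gfn), and age_gfn_range() consults the same bit before
 * clearing the accessed bit in the SPTE.
 */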
From patchwork Mon Apr 1 23:29:45 2024

Date: Mon, 1 Apr 2024 23:29:45 +0000
From: James Houghton
Message-ID: <20240401232946.1837665-7-jthoughton@google.com>
Subject: [PATCH v3 6/7] KVM: arm64: Participate in bitmap-based PTE aging

Participate in bitmap-based aging while grabbing the KVM MMU lock for
reading. Ideally we wouldn't need to grab this lock at all, but that
would require a more intrusive and risky change. Also pass
KVM_PGTABLE_WALK_SHARED, as this software walker is safe to run in
parallel with other walkers.

It is safe only to grab the KVM MMU lock for reading as the kvm_pgtable
is destroyed while holding the lock for writing, and freeing of the
page table pages is either done while holding the MMU lock for writing
or after an RCU grace period.

When mkold == false, record the young pages in the passed-in bitmap.

When mkold == true, only age the pages that need aging according to the
passed-in bitmap.

Suggested-by: Yu Zhao
Signed-off-by: James Houghton
---
 arch/arm64/include/asm/kvm_host.h    |  5 +++++
 arch/arm64/include/asm/kvm_pgtable.h |  4 +++-
 arch/arm64/kvm/hyp/pgtable.c         | 21 ++++++++++++++-------
 arch/arm64/kvm/mmu.c                 | 23 +++++++++++++++++++++--
 4 files changed, 43 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 9e8a496fb284..e503553cb356 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -1331,4 +1331,9 @@ bool kvm_arm_vcpu_stopped(struct kvm_vcpu *vcpu);
	(get_idreg_field((kvm), id, fld) >= expand_field_sign(id, fld, min) && \
	 get_idreg_field((kvm), id, fld) <= expand_field_sign(id, fld, max))
 
+#define kvm_arch_prepare_bitmap_age kvm_arch_prepare_bitmap_age
+bool kvm_arch_prepare_bitmap_age(struct mmu_notifier *mn);
+#define kvm_arch_finish_bitmap_age kvm_arch_finish_bitmap_age
+void kvm_arch_finish_bitmap_age(struct mmu_notifier *mn);
+
 #endif /* __ARM64_KVM_HOST_H__ */
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 19278dfe7978..1976b4e26188 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -644,6 +644,7 @@ kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr);
  * @addr:	Intermediate physical address to identify the page-table entry.
  * @size:	Size of the address range to visit.
  * @mkold:	True if the access flag should be cleared.
+ * @range:	The kvm_gfn_range that is being used for the memslot walker.
  *
  * The offset of @addr within a page is ignored.
  *
@@ -657,7 +658,8 @@ kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr);
  * Return: True if any of the visited PTEs had the access flag set.
  */
 bool kvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr,
-					 u64 size, bool mkold);
+					 u64 size, bool mkold,
+					 struct kvm_gfn_range *range);
 
 /**
  * kvm_pgtable_stage2_relax_perms() - Relax the permissions enforced by a
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 3fae5830f8d2..e881d3595aca 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -1281,6 +1281,7 @@ kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr)
 }
 
 struct stage2_age_data {
+	struct kvm_gfn_range *range;
	bool mkold;
	bool young;
 };
@@ -1290,20 +1291,24 @@ static int stage2_age_walker(const struct kvm_pgtable_visit_ctx *ctx,
 {
	kvm_pte_t new = ctx->old & ~KVM_PTE_LEAF_ATTR_LO_S2_AF;
	struct stage2_age_data *data = ctx->arg;
+	gfn_t gfn = ctx->addr / PAGE_SIZE;
 
	if (!kvm_pte_valid(ctx->old) || new == ctx->old)
		return 0;
 
	data->young = true;
+
	/*
-	 * stage2_age_walker() is always called while holding the MMU lock for
-	 * write, so this will always succeed. Nonetheless, this deliberately
-	 * follows the race detection pattern of the other stage-2 walkers in
-	 * case the locking mechanics of the MMU notifiers is ever changed.
+	 * stage2_age_walker() may not be holding the MMU lock for write, so
+	 * follow the race detection pattern of the other stage-2 walkers.
	 */
-	if (data->mkold && !stage2_try_set_pte(ctx, new))
-		return -EAGAIN;
+	if (data->mkold) {
+		if (kvm_gfn_should_age(data->range, gfn) &&
+		    !stage2_try_set_pte(ctx, new))
+			return -EAGAIN;
+	} else
+		kvm_gfn_record_young(data->range, gfn);
 
	/*
	 * "But where's the TLBI?!", you scream.
@@ -1315,10 +1320,12 @@ static int stage2_age_walker(const struct kvm_pgtable_visit_ctx *ctx,
 }
 
 bool kvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr,
-					 u64 size, bool mkold)
+					 u64 size, bool mkold,
+					 struct kvm_gfn_range *range)
 {
	struct stage2_age_data data = {
		.mkold		= mkold,
+		.range		= range,
	};
	struct kvm_pgtable_walker walker = {
		.cb		= stage2_age_walker,
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 18680771cdb0..104cc23e9bb3 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1802,6 +1802,25 @@ bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
	return false;
 }
 
+bool kvm_arch_prepare_bitmap_age(struct mmu_notifier *mn)
+{
+	struct kvm *kvm = mmu_notifier_to_kvm(mn);
+
+	/*
+	 * We need to hold the MMU lock for reading to prevent page tables
+	 * from being freed underneath us.
+	 */
+	read_lock(&kvm->mmu_lock);
+	return true;
+}
+
+void kvm_arch_finish_bitmap_age(struct mmu_notifier *mn)
+{
+	struct kvm *kvm = mmu_notifier_to_kvm(mn);
+
+	read_unlock(&kvm->mmu_lock);
+}
+
 bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 {
	u64 size = (range->end - range->start) << PAGE_SHIFT;
@@ -1811,7 +1830,7 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 
	return kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt,
						   range->start << PAGE_SHIFT,
-						   size, true);
+						   size, true, range);
 }
 
 bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
@@ -1823,7 +1842,7 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 
	return kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt,
						   range->start << PAGE_SHIFT,
-						   size, false);
+						   size, false, range);
 }
 
 phys_addr_t kvm_mmu_get_httbr(void)
@@ -1315,10 +1320,12 @@ static int stage2_age_walker(const struct kvm_pgtable_visit_ctx *ctx, } bool kvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr, - u64 size, bool mkold) + u64 size, bool mkold, + struct kvm_gfn_range *range) { struct stage2_age_data data = { .mkold = mkold, + .range = range, }; struct kvm_pgtable_walker walker = { .cb = stage2_age_walker, diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c index 18680771cdb0..104cc23e9bb3 100644 --- a/arch/arm64/kvm/mmu.c +++ b/arch/arm64/kvm/mmu.c @@ -1802,6 +1802,25 @@ bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range) return false; } +bool kvm_arch_prepare_bitmap_age(struct mmu_notifier *mn) +{ + struct kvm *kvm = mmu_notifier_to_kvm(mn); + + /* + * We need to hold the MMU lock for reading to prevent page tables + * from being freed underneath us. + */ + read_lock(&kvm->mmu_lock); + return true; +} + +void kvm_arch_finish_bitmap_age(struct mmu_notifier *mn) +{ + struct kvm *kvm = mmu_notifier_to_kvm(mn); + + read_unlock(&kvm->mmu_lock); +} + bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) { u64 size = (range->end - range->start) << PAGE_SHIFT; @@ -1811,7 +1830,7 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) return kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt, range->start << PAGE_SHIFT, - size, true); + size, true, range); } bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) @@ -1823,7 +1842,7 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) return kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt, range->start << PAGE_SHIFT, - size, false); + size, false, range); } phys_addr_t kvm_mmu_get_httbr(void)
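The two arch hooks above let the common MMU notifier code bracket an
entire aging pass with a single read_lock()/read_unlock() pair instead
of write-locking per walk. A sketch of the expected pairing
(illustrative only; the actual glue lives in the generic patches earlier
in this series):

	static int bitmap_age_sketch(struct mmu_notifier *mn, struct kvm *kvm,
				     struct kvm_gfn_range *range)
	{
		bool young = false;

		if (kvm_arch_prepare_bitmap_age(mn)) {	/* read_lock(&kvm->mmu_lock) */
			young = kvm_age_gfn(kvm, range);/* KVM_PGTABLE_WALK_SHARED walk */
			kvm_arch_finish_bitmap_age(mn);	/* read_unlock(&kvm->mmu_lock) */
		}

		return young ? MMU_NOTIFIER_YOUNG : 0;
	}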
From patchwork Mon Apr 1 23:29:46 2024
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13613104
Date: Mon, 1 Apr 2024 23:29:46 +0000
In-Reply-To: <20240401232946.1837665-1-jthoughton@google.com>
References: <20240401232946.1837665-1-jthoughton@google.com>
Message-ID: <20240401232946.1837665-8-jthoughton@google.com>
Subject: [PATCH v3 7/7] mm: multi-gen LRU: use mmu_notifier_test_clear_young()
From: James Houghton
To: Andrew Morton , Paolo Bonzini
Cc: Yu Zhao , David Matlack , Marc Zyngier , Oliver Upton , Sean Christopherson , Jonathan Corbet , James Morse , Suzuki K Poulose , Zenghui Yu , Catalin Marinas , Will Deacon , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , Steven Rostedt , Masami Hiramatsu , Mathieu Desnoyers , Shaoqin Huang , Gavin Shan , Ricardo Koller , Raghavendra Rao Ananta , Ryan Roberts , David Rientjes , Axel Rasmussen , linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, kvm@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org, James Houghton

From: Yu Zhao

Use mmu_notifier_{test,clear}_young_bitmap() to handle KVM PTEs in
batches when the fast path is supported.
This reduces the contention on kvm->mmu_lock when the host is under
heavy memory pressure.

An existing selftest can quickly demonstrate the effectiveness of this
patch. On a generic workstation equipped with 128 CPUs and 256GB DRAM:

  $ sudo max_guest_memory_test -c 64 -m 250 -s 250

  MGLRU      run2
  ---------------
  Before [1] ~64s
  After      ~51s

  kswapd (MGLRU before)
    100.00% balance_pgdat
      100.00% shrink_node
        100.00% shrink_one
          99.99% try_to_shrink_lruvec
            99.71% evict_folios
              97.29% shrink_folio_list
  ==>>          13.05% folio_referenced
                  12.83% rmap_walk_file
                    12.31% folio_referenced_one
                       7.90% __mmu_notifier_clear_young
                         7.72% kvm_mmu_notifier_clear_young
                           7.34% _raw_write_lock

  kswapd (MGLRU after)
    100.00% balance_pgdat
      100.00% shrink_node
        100.00% shrink_one
          99.99% try_to_shrink_lruvec
            99.59% evict_folios
              80.37% shrink_folio_list
  ==>>           3.74% folio_referenced
                   3.59% rmap_walk_file
                     3.19% folio_referenced_one
                       2.53% lru_gen_look_around
                       1.06% __mmu_notifier_test_clear_young

[1] "mm: rmap: Don't flush TLB after checking PTE young for page
    reference" was included so that the comparison is apples to apples.
    https://lore.kernel.org/r/20220706112041.3831-1-21cnbao@gmail.com/

Signed-off-by: Yu Zhao
Signed-off-by: James Houghton
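To show how these calls batch in practice, a condensed sketch of the PTE
loop in the vmscan changes below (not the exact patch code; mm, pte,
start and end are as in walk_pte_range(), and the only filter kept here
is pte_present()):

	DECLARE_BITMAP(bitmap, MIN_LRU_BATCH);
	unsigned long addr, last = 0;
	int young = 0;

	for (addr = start; addr != end; addr += PAGE_SIZE) {
		pte_t ptent = ptep_get(pte + (addr - start) / PAGE_SIZE);

		if (!pte_present(ptent)) {
			/* keep the bitmap cursor in sync for skipped slots */
			skip_spte_young(mm, addr, bitmap, &last);
			continue;
		}

		/* one mmu_notifier_test_young_bitmap() call covers the batch */
		if (test_spte_young(mm, addr, end, bitmap, &last) ||
		    pte_young(ptent)) {
			young++;
			/* flushed via one mmu_notifier_clear_young_bitmap() call */
			clear_spte_young(mm, addr, bitmap, &last);
		} else {
			skip_spte_young(mm, addr, bitmap, &last);
		}
	}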
--- Documentation/admin-guide/mm/multigen_lru.rst | 6 +- include/linux/mmzone.h | 6 +- mm/rmap.c | 9 +- mm/vmscan.c | 183 ++++++++++++++---- 4 files changed, 159 insertions(+), 45 deletions(-) diff --git a/Documentation/admin-guide/mm/multigen_lru.rst b/Documentation/admin-guide/mm/multigen_lru.rst index 33e068830497..0ae2a6d4d94c 100644 --- a/Documentation/admin-guide/mm/multigen_lru.rst +++ b/Documentation/admin-guide/mm/multigen_lru.rst @@ -48,6 +48,10 @@ Values Components verified on x86 varieties other than Intel and AMD. If it is disabled, the multi-gen LRU will suffer a negligible performance degradation. +0x0008 Clearing the accessed bit in KVM page table entries in large + batches, when KVM MMU sets it (e.g., on x86_64). This can + improve the performance of guests when the host is under memory + pressure. [yYnN] Apply to all the components above. ====== =============================================================== @@ -56,7 +60,7 @@ E.g., echo y >/sys/kernel/mm/lru_gen/enabled cat /sys/kernel/mm/lru_gen/enabled - 0x0007 + 0x000f echo 5 >/sys/kernel/mm/lru_gen/enabled cat /sys/kernel/mm/lru_gen/enabled 0x0005 diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index c11b7cde81ef..a98de5106990 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -397,6 +397,7 @@ enum { LRU_GEN_CORE, LRU_GEN_MM_WALK, LRU_GEN_NONLEAF_YOUNG, + LRU_GEN_KVM_MMU_WALK, NR_LRU_GEN_CAPS }; @@ -554,7 +555,7 @@ struct lru_gen_memcg { void lru_gen_init_pgdat(struct pglist_data *pgdat); void lru_gen_init_lruvec(struct lruvec *lruvec); -void lru_gen_look_around(struct page_vma_mapped_walk *pvmw); +bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw); void lru_gen_init_memcg(struct mem_cgroup *memcg); void lru_gen_exit_memcg(struct mem_cgroup *memcg); @@ -573,8 +574,9 @@ static inline void lru_gen_init_lruvec(struct lruvec *lruvec) { } -static inline void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) +static inline bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw) { + return false; } static inline void lru_gen_init_memcg(struct mem_cgroup *memcg) diff --git a/mm/rmap.c b/mm/rmap.c index 56b313aa2ebf..41e9fc25684e 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -871,13 +871,10 @@ static bool folio_referenced_one(struct folio *folio, continue; } - if (pvmw.pte) { - if (lru_gen_enabled() && - pte_young(ptep_get(pvmw.pte))) { - lru_gen_look_around(&pvmw); + if (lru_gen_enabled() && pvmw.pte) { + if (lru_gen_look_around(&pvmw)) referenced++; - } - + } else if (pvmw.pte) { if (ptep_clear_flush_young_notify(vma, address, pvmw.pte)) referenced++; diff --git a/mm/vmscan.c b/mm/vmscan.c index 293120fe54f3..fd65f3466dfc 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -56,6 +56,7 @@ #include #include #include +#include #include #include @@ -2596,6 +2597,11 @@ static bool should_clear_pmd_young(void) return arch_has_hw_nonleaf_pmd_young() && get_cap(LRU_GEN_NONLEAF_YOUNG); } +static bool should_walk_kvm_mmu(void) +{ + return get_cap(LRU_GEN_KVM_MMU_WALK); +} + /****************************************************************************** * shorthand helpers ******************************************************************************/ @@ -3293,7 +3299,8 @@ static bool get_next_vma(unsigned long mask, unsigned long size, struct mm_walk return false; } -static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned long addr) +static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned long addr, + struct pglist_data *pgdat) { unsigned long pfn = pte_pfn(pte); @@ -3308,10 +3315,15 @@ static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned if (WARN_ON_ONCE(!pfn_valid(pfn))) return -1; + /* try to avoid unnecessary memory loads */ + if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) + return -1; + return pfn; } -static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned long addr) +static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned long addr, + struct pglist_data *pgdat) { unsigned long pfn = pmd_pfn(pmd); @@ -3326,6 +3338,10 @@ static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned if (WARN_ON_ONCE(!pfn_valid(pfn))) return -1; + /* try to avoid unnecessary memory loads */ + if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) + return -1; + return pfn; }
@@ -3334,10 +3350,6 @@ static struct folio *get_pfn_folio(unsigned long pfn, struct mem_cgroup *memcg, { struct folio *folio; - /* try to avoid unnecessary memory loads */ - if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) - return NULL; - folio = pfn_folio(pfn); if (folio_nid(folio) != pgdat->node_id) return NULL; @@ -3352,6 +3364,52 @@ static struct folio *get_pfn_folio(unsigned long pfn, struct mem_cgroup *memcg, return folio; } +static bool test_spte_young(struct mm_struct *mm, unsigned long addr, unsigned long end, + unsigned long *bitmap, unsigned long *last) +{ + if (*last > addr) + goto done; + + *last = end - addr > MIN_LRU_BATCH * PAGE_SIZE ? + addr + MIN_LRU_BATCH * PAGE_SIZE - 1 : end - 1; + bitmap_zero(bitmap, MIN_LRU_BATCH); + + mmu_notifier_test_young_bitmap(mm, addr, *last + 1, bitmap); +done: + return test_bit((*last - addr) / PAGE_SIZE, bitmap); +} + +static void clear_spte_young(struct mm_struct *mm, unsigned long addr, + unsigned long *bitmap, unsigned long *last) +{ + int i; + unsigned long start, end = *last + 1; + + if (addr + PAGE_SIZE != end) + return; + + i = find_last_bit(bitmap, MIN_LRU_BATCH); + if (i == MIN_LRU_BATCH) + return; + + start = end - (i + 1) * PAGE_SIZE; + + i = find_first_bit(bitmap, MIN_LRU_BATCH); + + end -= i * PAGE_SIZE; + + mmu_notifier_clear_young_bitmap(mm, start, end, bitmap); +} + +static void skip_spte_young(struct mm_struct *mm, unsigned long addr, + unsigned long *bitmap, unsigned long *last) +{ + if (*last > addr) + __clear_bit((*last - addr) / PAGE_SIZE, bitmap); + + clear_spte_young(mm, addr, bitmap, last); +} + static bool suitable_to_scan(int total, int young) { int n = clamp_t(int, cache_line_size() / sizeof(pte_t), 2, 8); @@ -3367,6 +3425,8 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end, pte_t *pte; spinlock_t *ptl; unsigned long addr; + DECLARE_BITMAP(bitmap, MIN_LRU_BATCH); + unsigned long last = 0; int total = 0; int young = 0; struct lru_gen_mm_walk *walk = args->private; @@ -3386,6 +3446,7 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end, arch_enter_lazy_mmu_mode(); restart: for (i = pte_index(start), addr = start; addr != end; i++, addr += PAGE_SIZE) { + bool ret; unsigned long pfn; struct folio *folio; pte_t ptent = ptep_get(pte + i); @@ -3393,21 +3454,28 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end, total++; walk->mm_stats[MM_LEAF_TOTAL]++; - pfn = get_pte_pfn(ptent, args->vma, addr); - if (pfn == -1) + pfn = get_pte_pfn(ptent, args->vma, addr, pgdat); + if (pfn == -1) { + skip_spte_young(args->vma->vm_mm, addr, bitmap, &last); continue; + } - if (!pte_young(ptent)) { + ret = test_spte_young(args->vma->vm_mm, addr, end, bitmap, &last); + if (!ret && !pte_young(ptent)) { + skip_spte_young(args->vma->vm_mm, addr, bitmap, &last); walk->mm_stats[MM_LEAF_OLD]++; continue; } folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap); - if (!folio) + if (!folio) { + skip_spte_young(args->vma->vm_mm, addr, bitmap, &last); continue; + } - if (!ptep_test_and_clear_young(args->vma, addr, pte + i)) - VM_WARN_ON_ONCE(true); + clear_spte_young(args->vma->vm_mm, addr, bitmap, &last); + if (pte_young(ptent)) + ptep_test_and_clear_young(args->vma, addr, pte + i); young++; walk->mm_stats[MM_LEAF_YOUNG]++; @@ -3473,22 +3541,24 @@ static void walk_pmd_range_locked(pud_t *pud, unsigned long addr, struct vm_area /* don't round down the first address */ addr = i ?
(*first & PMD_MASK) + i * PMD_SIZE : *first; - pfn = get_pmd_pfn(pmd[i], vma, addr); - if (pfn == -1) - goto next; - - if (!pmd_trans_huge(pmd[i])) { - if (should_clear_pmd_young()) + if (pmd_present(pmd[i]) && !pmd_trans_huge(pmd[i])) { + if (should_clear_pmd_young() && !mm_has_notifiers(args->mm)) pmdp_test_and_clear_young(vma, addr, pmd + i); goto next; } + pfn = get_pmd_pfn(pmd[i], vma, addr, pgdat); + if (pfn == -1) + goto next; + folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap); if (!folio) goto next; - if (!pmdp_test_and_clear_young(vma, addr, pmd + i)) + if (!pmdp_clear_young_notify(vma, addr, pmd + i)) { + walk->mm_stats[MM_LEAF_OLD]++; goto next; + } walk->mm_stats[MM_LEAF_YOUNG]++; @@ -3545,19 +3615,18 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end, } if (pmd_trans_huge(val)) { - unsigned long pfn = pmd_pfn(val); struct pglist_data *pgdat = lruvec_pgdat(walk->lruvec); + unsigned long pfn = get_pmd_pfn(val, vma, addr, pgdat); walk->mm_stats[MM_LEAF_TOTAL]++; - if (!pmd_young(val)) { - walk->mm_stats[MM_LEAF_OLD]++; + if (pfn == -1) continue; - } - /* try to avoid unnecessary memory loads */ - if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) + if (!pmd_young(val) && !mm_has_notifiers(args->mm)) { + walk->mm_stats[MM_LEAF_OLD]++; continue; + } walk_pmd_range_locked(pud, addr, vma, args, bitmap, &first); continue; @@ -3565,7 +3634,7 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end, walk->mm_stats[MM_NONLEAF_TOTAL]++; - if (should_clear_pmd_young()) { + if (should_clear_pmd_young() && !mm_has_notifiers(args->mm)) { if (!pmd_young(val)) continue; @@ -3646,6 +3715,9 @@ static void walk_mm(struct mm_struct *mm, struct lru_gen_mm_walk *walk) struct lruvec *lruvec = walk->lruvec; struct mem_cgroup *memcg = lruvec_memcg(lruvec); + if (!should_walk_kvm_mmu() && mm_has_notifiers(mm)) + return; + walk->next_addr = FIRST_USER_ADDRESS; do { @@ -4011,6 +4083,23 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc) * rmap/PT walk feedback ******************************************************************************/ +static bool should_look_around(struct vm_area_struct *vma, unsigned long addr, + pte_t *pte, int *young) +{ + int ret = mmu_notifier_clear_young(vma->vm_mm, addr, addr + PAGE_SIZE); + + if (pte_young(ptep_get(pte))) { + ptep_test_and_clear_young(vma, addr, pte); + *young = true; + return true; + } + + if (ret) + *young = true; + + return ret & MMU_NOTIFIER_YOUNG_FAST; +} + /* * This function exploits spatial locality when shrink_folio_list() walks the * rmap. It scans the adjacent PTEs of a young PTE and promotes hot pages. If @@ -4018,12 +4107,14 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc) * the PTE table to the Bloom filter. This forms a feedback loop between the * eviction and the aging. 
*/ -void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) +bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw) { int i; unsigned long start; unsigned long end; struct lru_gen_mm_walk *walk; + DECLARE_BITMAP(bitmap, MIN_LRU_BATCH); + unsigned long last = 0; int young = 0; pte_t *pte = pvmw->pte; unsigned long addr = pvmw->address; @@ -4040,12 +4131,15 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) lockdep_assert_held(pvmw->ptl); VM_WARN_ON_ONCE_FOLIO(folio_test_lru(folio), folio); + if (!should_look_around(pvmw->vma, addr, pte, &young)) + return young; + if (spin_is_contended(pvmw->ptl)) - return; + return young; /* exclude special VMAs containing anon pages from COW */ if (vma->vm_flags & VM_SPECIAL) - return; + return young; /* avoid taking the LRU lock under the PTL when possible */ walk = current->reclaim_state ? current->reclaim_state->mm_walk : NULL; @@ -4053,6 +4147,9 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) start = max(addr & PMD_MASK, vma->vm_start); end = min(addr | ~PMD_MASK, vma->vm_end - 1) + 1; + if (end - start == PAGE_SIZE) + return young; + if (end - start > MIN_LRU_BATCH * PAGE_SIZE) { if (addr - start < MIN_LRU_BATCH * PAGE_SIZE / 2) end = start + MIN_LRU_BATCH * PAGE_SIZE; @@ -4066,29 +4163,38 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) /* folio_update_gen() requires stable folio_memcg() */ if (!mem_cgroup_trylock_pages(memcg)) - return; + return young; arch_enter_lazy_mmu_mode(); pte -= (addr - start) / PAGE_SIZE; for (i = 0, addr = start; addr != end; i++, addr += PAGE_SIZE) { + bool ret; unsigned long pfn; pte_t ptent = ptep_get(pte + i); - pfn = get_pte_pfn(ptent, vma, addr); - if (pfn == -1) + pfn = get_pte_pfn(ptent, vma, addr, pgdat); + if (pfn == -1) { + skip_spte_young(vma->vm_mm, addr, bitmap, &last); continue; + } - if (!pte_young(ptent)) + ret = test_spte_young(pvmw->vma->vm_mm, addr, end, bitmap, &last); + if (!ret && !pte_young(ptent)) { + skip_spte_young(pvmw->vma->vm_mm, addr, bitmap, &last); continue; + } folio = get_pfn_folio(pfn, memcg, pgdat, can_swap); - if (!folio) + if (!folio) { + skip_spte_young(vma->vm_mm, addr, bitmap, &last); continue; + } - if (!ptep_test_and_clear_young(vma, addr, pte + i)) - VM_WARN_ON_ONCE(true); + clear_spte_young(vma->vm_mm, addr, bitmap, &last); + if (pte_young(ptent)) + ptep_test_and_clear_young(vma, addr, pte + i); young++; @@ -4118,6 +4224,8 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) /* feedback from rmap walkers to page table walkers */ if (mm_state && suitable_to_scan(i, young)) update_bloom_filter(mm_state, max_seq, pvmw->pmd); + + return young; } /****************************************************************************** @@ -5154,6 +5262,9 @@ static ssize_t enabled_show(struct kobject *kobj, struct kobj_attribute *attr, c if (should_clear_pmd_young()) caps |= BIT(LRU_GEN_NONLEAF_YOUNG); + if (should_walk_kvm_mmu()) + caps |= BIT(LRU_GEN_KVM_MMU_WALK); + return sysfs_emit(buf, "0x%04x\n", caps); }
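With the capability wired into enabled_show() above, the KVM MMU walk is
reported as bit 0x0008 of the existing lru_gen sysfs interface,
alongside the three existing capabilities. For example (values as in the
documentation hunk above; the exact mask depends on which capabilities
the system supports):

	cat /sys/kernel/mm/lru_gen/enabled
	0x000f
	echo 7 >/sys/kernel/mm/lru_gen/enabled	# disable only the KVM MMU walk (bit 0x0008)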