From patchwork Wed Feb 26 12:01:15 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Byungchul Park <byungchul@sk.com>
X-Patchwork-Id: 13992174
From: Byungchul Park <byungchul@sk.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com,
	mgorman@techsingularity.net, hughd@google.com, willy@infradead.org,
	david@redhat.com, peterz@infradead.org, luto@kernel.org,
	tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com
Subject: [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 08/25] mm:
 introduce luf_batch to be used as hash table to store luf meta data
Date: Wed, 26 Feb 2025 21:01:15 +0900
Message-Id: <20250226120132.28469-8-byungchul@sk.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20250226120132.28469-1-byungchul@sk.com>
References: <20250226113342.GB1935@system.software.com>
 <20250226120132.28469-1-byungchul@sk.com>

Functionally, no change.  This is a preparation for the luf mechanism,
which needs to keep luf meta data per page while the page stays in the
pcp or buddy allocator.  The meta data includes the cpumask for tlb
shootdown and luf's request generation number.

Since struct page doesn't have enough room to store luf meta data,
this patch introduces a hash table to store it and makes each page
keep its hash key instead.
Since all the pages in pcp or buddy share the hash table, collisions
are inevitable, so care must be taken when reading or updating an
entry.

Signed-off-by: Byungchul Park <byungchul@sk.com>
---
 include/linux/mm_types.h |  10 ++++
 mm/internal.h            |   8 +++
 mm/rmap.c                | 122 +++++++++++++++++++++++++++++++++++++--
 3 files changed, 136 insertions(+), 4 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 7b15efbe9f529..f52d4e49e8736 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -33,6 +33,16 @@
 struct address_space;
 struct mem_cgroup;
 
+#ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
+struct luf_batch {
+	struct tlbflush_unmap_batch batch;
+	unsigned long ugen;
+	rwlock_t lock;
+};
+#else
+struct luf_batch {};
+#endif
+
 /*
  * Each physical page in the system has a struct page associated with
  * it to keep track of whatever it is we are using the page for at the
diff --git a/mm/internal.h b/mm/internal.h
index ee8af97c39f59..8ade04255dba3 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1270,6 +1270,8 @@ extern struct workqueue_struct *mm_percpu_wq;
 void try_to_unmap_flush(void);
 void try_to_unmap_flush_dirty(void);
 void flush_tlb_batched_pending(struct mm_struct *mm);
+void fold_batch(struct tlbflush_unmap_batch *dst, struct tlbflush_unmap_batch *src, bool reset);
+void fold_luf_batch(struct luf_batch *dst, struct luf_batch *src);
 #else
 static inline void try_to_unmap_flush(void)
 {
@@ -1280,6 +1282,12 @@ static inline void try_to_unmap_flush_dirty(void)
 static inline void flush_tlb_batched_pending(struct mm_struct *mm)
 {
 }
+static inline void fold_batch(struct tlbflush_unmap_batch *dst, struct tlbflush_unmap_batch *src, bool reset)
+{
+}
+static inline void fold_luf_batch(struct luf_batch *dst, struct luf_batch *src)
+{
+}
 #endif /* CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH */
 
 extern const struct trace_print_flags pageflag_names[];
diff --git a/mm/rmap.c b/mm/rmap.c
index 8439dbb194c8c..ac450a45257f6 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -641,7 +641,7 @@ struct anon_vma *folio_lock_anon_vma_read(const struct folio *folio,
  * function, ugen_before(), should be used to evaluate the temporal
  * sequence of events because the number is designed to wraparound.
  */
-static atomic_long_t __maybe_unused luf_ugen = ATOMIC_LONG_INIT(LUF_UGEN_INIT);
+static atomic_long_t luf_ugen = ATOMIC_LONG_INIT(LUF_UGEN_INIT);
 
 /*
  * Don't return invalid luf_ugen, zero.
@@ -656,6 +656,122 @@ static unsigned long __maybe_unused new_luf_ugen(void)
 	return ugen;
 }
 
+static void reset_batch(struct tlbflush_unmap_batch *batch)
+{
+	arch_tlbbatch_clear(&batch->arch);
+	batch->flush_required = false;
+	batch->writable = false;
+}
+
+void fold_batch(struct tlbflush_unmap_batch *dst,
+		struct tlbflush_unmap_batch *src, bool reset)
+{
+	if (!src->flush_required)
+		return;
+
+	/*
+	 * Fold src to dst.
+	 */
+	arch_tlbbatch_fold(&dst->arch, &src->arch);
+	dst->writable = dst->writable || src->writable;
+	dst->flush_required = true;
+
+	if (!reset)
+		return;
+
+	/*
+	 * Reset src.
+	 */
+	reset_batch(src);
+}
+
+/*
+ * The range that luf_key covers, which is 'unsigned short' type.
+ */
+#define NR_LUF_BATCH	(1 << (sizeof(short) * 8))
+
+/*
+ * Use 0th entry as accumulated batch.
+ */
+static struct luf_batch luf_batch[NR_LUF_BATCH];
+
+static void luf_batch_init(struct luf_batch *lb)
+{
+	rwlock_init(&lb->lock);
+	reset_batch(&lb->batch);
+	lb->ugen = atomic_long_read(&luf_ugen) - 1;
+}
+
+static int __init luf_init(void)
+{
+	int i;
+
+	for (i = 0; i < NR_LUF_BATCH; i++)
+		luf_batch_init(&luf_batch[i]);
+
+	return 0;
+}
+early_initcall(luf_init);
+
+/*
+ * key to point to an entry of the luf_batch array
+ *
+ * note: zero means invalid key
+ */
+static atomic_t luf_kgen = ATOMIC_INIT(1);
+
+/*
+ * Don't return invalid luf_key, zero.
+ */
+static unsigned short __maybe_unused new_luf_key(void)
+{
+	unsigned short luf_key = atomic_inc_return(&luf_kgen);
+
+	if (!luf_key)
+		luf_key = atomic_inc_return(&luf_kgen);
+
+	return luf_key;
+}
+
+static void __fold_luf_batch(struct luf_batch *dst_lb,
+			     struct tlbflush_unmap_batch *src_batch,
+			     unsigned long src_ugen)
+{
+	/*
+	 * dst_lb->ugen represents one that requires tlb shootdown for
+	 * it, that is, sort of request number. The newer it is, the
+	 * more tlb shootdown might be needed to fulfill the newer
+	 * request. Conservatively keep the newer one.
+	 */
+	if (!dst_lb->ugen || ugen_before(dst_lb->ugen, src_ugen))
+		dst_lb->ugen = src_ugen;
+	fold_batch(&dst_lb->batch, src_batch, false);
+}
+
+void fold_luf_batch(struct luf_batch *dst, struct luf_batch *src)
+{
+	unsigned long flags;
+
+	/*
+	 * Exactly same. Nothing to fold.
+	 */
+	if (dst == src)
+		return;
+
+	if (&src->lock < &dst->lock) {
+		read_lock_irqsave(&src->lock, flags);
+		write_lock(&dst->lock);
+	} else {
+		write_lock_irqsave(&dst->lock, flags);
+		read_lock(&src->lock);
+	}
+
+	__fold_luf_batch(dst, &src->batch, src->ugen);
+
+	write_unlock(&dst->lock);
+	read_unlock_irqrestore(&src->lock, flags);
+}
+
 /*
  * Flush TLB entries for recently unmapped pages from remote CPUs. It is
  * important if a PTE was dirty when it was unmapped that it's flushed
@@ -670,9 +786,7 @@ void try_to_unmap_flush(void)
 		return;
 
 	arch_tlbbatch_flush(&tlb_ubc->arch);
-	arch_tlbbatch_clear(&tlb_ubc->arch);
-	tlb_ubc->flush_required = false;
-	tlb_ubc->writable = false;
+	reset_batch(tlb_ubc);
 }
 
 /* Flush iff there are potentially writable TLB entries that can race with IO */
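
For readers following the series, here is a minimal sketch (NOT part of
the patch above) of how a later caller might consume the hash table
introduced here: allocate a luf_key with new_luf_key(), fold the pending
shootdown batch into the corresponding luf_batch[] entry, and hand the
key back so the page can carry it while sitting in pcp or buddy.  The
helper name record_luf_batch() and the idea of returning the key to the
caller are assumptions for illustration only, not the series' actual
interface.

/*
 * Illustrative sketch only, not part of this patch.  record_luf_batch()
 * is a hypothetical name; only new_luf_key(), __fold_luf_batch() and
 * luf_batch[] from the patch above are real.
 */
static unsigned short record_luf_batch(struct tlbflush_unmap_batch *src_batch,
				       unsigned long src_ugen)
{
	unsigned short luf_key = new_luf_key();
	struct luf_batch *lb = &luf_batch[luf_key];
	unsigned long flags;

	/*
	 * Another page may already have hashed to the same key, so fold
	 * into the entry instead of overwriting it.
	 */
	write_lock_irqsave(&lb->lock, flags);
	__fold_luf_batch(lb, src_batch, src_ugen);
	write_unlock_irqrestore(&lb->lock, flags);

	/* The key would then be stored in the page for later lookup. */
	return luf_key;
}

Note the lock ordering in fold_luf_batch() above: when two different
hash entries must be held at once, the lock at the lower address is
always taken first, so concurrent folds between the same pair of
entries cannot deadlock.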