From patchwork Thu Feb 20 05:20:02 2025
X-Patchwork-Submitter: Byungchul Park
X-Patchwork-Id: 13983321
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, ying.huang@intel.com,
    vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com,
    willy@infradead.org, david@redhat.com, peterz@infradead.org,
    luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
    dave.hansen@linux.intel.com, rjgolo@gmail.com
Subject: [RFC PATCH v12 01/26] x86/tlb: add APIs manipulating tlb batch's arch data
Date: Thu, 20 Feb 2025 14:20:02 +0900
Message-Id: <20250220052027.58847-2-byungchul@sk.com>
In-Reply-To: <20250220052027.58847-1-byungchul@sk.com>
References: <20250220052027.58847-1-byungchul@sk.com>

A new mechanism, LUF (Lazy Unmap Flush), defers tlb flush until folios
that have been unmapped and freed eventually get allocated again.  It's
safe for folios that had been mapped read-only and were then unmapped,
since the contents of the folios don't change while staying in pcp or
buddy, so we can still read the data through the stale tlb entries.

This is a preparation for the mechanism, which needs to recognize
read-only tlb entries by separating the tlb batch arch data into two,
one for read-only entries and the other for writable ones, and by
merging the two when needed.  It also optimizes tlb shootdown by
skipping CPUs that have already performed the tlb flush needed in the
meantime.  To support this, add APIs manipulating the arch data for
x86.

Signed-off-by: Byungchul Park
---
 arch/x86/include/asm/tlbflush.h | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 69e79fff41b80..0ae9564c7301e 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -5,6 +5,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -293,6 +294,29 @@ static inline void arch_flush_tlb_batched_pending(struct mm_struct *mm)

 extern void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch);

+static inline void arch_tlbbatch_clear(struct arch_tlbflush_unmap_batch *batch)
+{
+	cpumask_clear(&batch->cpumask);
+}
+
+static inline void arch_tlbbatch_fold(struct arch_tlbflush_unmap_batch *bdst,
+				      struct arch_tlbflush_unmap_batch *bsrc)
+{
+	cpumask_or(&bdst->cpumask, &bdst->cpumask, &bsrc->cpumask);
+}
+
+static inline bool arch_tlbbatch_need_fold(struct arch_tlbflush_unmap_batch *batch,
+					   struct mm_struct *mm)
+{
+	return !cpumask_subset(mm_cpumask(mm), &batch->cpumask);
+}
+
+static inline bool arch_tlbbatch_done(struct arch_tlbflush_unmap_batch *bdst,
+				      struct arch_tlbflush_unmap_batch *bsrc)
+{
+	return !cpumask_andnot(&bdst->cpumask, &bdst->cpumask, &bsrc->cpumask);
+}
+
 static inline bool pte_flags_need_flush(unsigned long oldflags,
 					unsigned long newflags,
 					bool ignore_access)
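The semantics of the new helpers are easier to follow with a concrete
picture of what the cpumask operations compute.  Below is a stand-alone
user-space sketch (not part of the patch) that models struct cpumask
with a plain bit mask; the batch names, CPU numbers and the batch_*
helpers are made up for illustration and only mirror the semantics of
cpumask_or(), cpumask_subset() and cpumask_andnot() used above.

#include <assert.h>
#include <stdio.h>

/* A CPU set modelled as a plain bit mask (bit n == CPU n), for illustration. */
typedef unsigned long cpuset_t;

/* Mirror of arch_tlbbatch_fold(): accumulate bsrc's CPUs into bdst. */
static void batch_fold(cpuset_t *bdst, const cpuset_t *bsrc)
{
	*bdst |= *bsrc;
}

/* Mirror of arch_tlbbatch_need_fold(): does mm run on a CPU the batch misses? */
static int batch_need_fold(const cpuset_t *batch, cpuset_t mm_cpus)
{
	return (mm_cpus & ~*batch) != 0;
}

/* Mirror of arch_tlbbatch_done(): true if bsrc already covered all of bdst. */
static int batch_done(cpuset_t *bdst, const cpuset_t *bsrc)
{
	*bdst &= ~*bsrc;
	return *bdst == 0;
}

int main(void)
{
	cpuset_t ro_batch = 0x3;	/* read-only entries pending on CPUs 0-1 */
	cpuset_t rw_batch = 0x2;	/* writable entries pending on CPU 1 */
	cpuset_t mm_cpus  = 0x6;	/* the mm has run on CPUs 1-2 */
	cpuset_t flushed  = 0x7;	/* CPUs 0-2 have flushed since then */

	/* CPU 2 is not covered yet, so the batch still has to grow. */
	assert(batch_need_fold(&ro_batch, mm_cpus));

	/* Merging the read-only batch into the writable one when needed. */
	batch_fold(&rw_batch, &ro_batch);
	assert(rw_batch == 0x3);

	/* Every pending CPU has already flushed: the shootdown can be skipped. */
	assert(batch_done(&rw_batch, &flushed));

	printf("model checks passed\n");
	return 0;
}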
From patchwork Thu Feb 20 05:20:03 2025
X-Patchwork-Submitter: Byungchul Park
X-Patchwork-Id: 13983319
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, ying.huang@intel.com,
    vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com,
    willy@infradead.org, david@redhat.com, peterz@infradead.org,
    luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
    dave.hansen@linux.intel.com, rjgolo@gmail.com
Subject: [RFC PATCH v12 02/26] arm64/tlbflush: add APIs manipulating tlb batch's arch data
Date: Thu, 20 Feb 2025 14:20:03 +0900
Message-Id: <20250220052027.58847-3-byungchul@sk.com>
In-Reply-To: <20250220052027.58847-1-byungchul@sk.com>
References: <20250220052027.58847-1-byungchul@sk.com>

A new mechanism, LUF (Lazy Unmap Flush), defers tlb flush until folios
that have been unmapped and freed eventually get allocated again.  It's
safe for folios that had been mapped read-only and were then unmapped,
since the contents of the folios don't change while staying in pcp or
buddy, so we can still read the data through the stale tlb entries.

This is a preparation for the mechanism, which requires manipulating
the tlb batch's arch data.  Even though arm64 does nothing with that
data, an arch with CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH should
still provide the APIs.

Signed-off-by: Byungchul Park
---
 arch/arm64/include/asm/tlbflush.h | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index 95fbc8c056079..a62e1ea61e4af 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -354,6 +354,33 @@ static inline void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
 	dsb(ish);
 }

+static inline void arch_tlbbatch_clear(struct arch_tlbflush_unmap_batch *batch)
+{
+	/* nothing to do */
+}
+
+static inline void arch_tlbbatch_fold(struct arch_tlbflush_unmap_batch *bdst,
+				      struct arch_tlbflush_unmap_batch *bsrc)
+{
+	/* nothing to do */
+}
+
+static inline bool arch_tlbbatch_need_fold(struct arch_tlbflush_unmap_batch *batch,
+					   struct mm_struct *mm)
+{
+	/*
+	 * Nothing is needed in this architecture.
+	 */
+	return false;
+}
+
+static inline bool arch_tlbbatch_done(struct arch_tlbflush_unmap_batch *bdst,
+				      struct arch_tlbflush_unmap_batch *bsrc)
+{
+	/* Kernel can consider tlb batch always has been done. */
+	return true;
+}
+
 /*
  * This is meant to avoid soft lock-ups on large TLB flushing ranges and not
  * necessarily a performance improvement.
From patchwork Thu Feb 20 05:20:04 2025
X-Patchwork-Submitter: Byungchul Park
X-Patchwork-Id: 13983320
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, ying.huang@intel.com,
    vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com,
    willy@infradead.org, david@redhat.com, peterz@infradead.org,
    luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
    dave.hansen@linux.intel.com, rjgolo@gmail.com
Subject: [RFC PATCH v12 03/26] riscv/tlb: add APIs manipulating tlb batch's arch data
Date: Thu, 20 Feb 2025 14:20:04 +0900
Message-Id: <20250220052027.58847-4-byungchul@sk.com>
In-Reply-To: <20250220052027.58847-1-byungchul@sk.com>
References: <20250220052027.58847-1-byungchul@sk.com>

A new mechanism, LUF (Lazy Unmap Flush), defers tlb flush until folios
that have been unmapped and freed eventually get allocated again.  It's
safe for folios that had been mapped read-only and were then unmapped,
since the contents of the folios don't change while staying in pcp or
buddy, so we can still read the data through the stale tlb entries.

This is a preparation for the mechanism, which needs to recognize
read-only tlb entries by separating the tlb batch arch data into two,
one for read-only entries and the other for writable ones, and by
merging the two when needed.  It also optimizes tlb shootdown by
skipping CPUs that have already performed the tlb flush needed in the
meantime.  To support this, add APIs manipulating the arch data for
riscv.

Signed-off-by: Byungchul Park
---
 arch/riscv/include/asm/tlbflush.h | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
index 72e5599349529..1dc7d30273d59 100644
--- a/arch/riscv/include/asm/tlbflush.h
+++ b/arch/riscv/include/asm/tlbflush.h
@@ -8,6 +8,7 @@ #define _ASM_RISCV_TLBFLUSH_H
 #include
+#include
 #include
 #include
@@ -65,6 +66,33 @@ void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
 void arch_flush_tlb_batched_pending(struct mm_struct *mm);
 void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch);

+static inline void arch_tlbbatch_clear(struct arch_tlbflush_unmap_batch *batch)
+{
+	cpumask_clear(&batch->cpumask);
+
+}
+
+static inline void arch_tlbbatch_fold(struct arch_tlbflush_unmap_batch *bdst,
+				      struct arch_tlbflush_unmap_batch *bsrc)
+{
+	cpumask_or(&bdst->cpumask, &bdst->cpumask, &bsrc->cpumask);
+
+}
+
+static inline bool arch_tlbbatch_need_fold(struct arch_tlbflush_unmap_batch *batch,
+					   struct mm_struct *mm)
+{
+	return !cpumask_subset(mm_cpumask(mm), &batch->cpumask);
+
+}
+
+static inline bool arch_tlbbatch_done(struct arch_tlbflush_unmap_batch *bdst,
+				      struct arch_tlbflush_unmap_batch *bsrc)
+{
+	return !cpumask_andnot(&bdst->cpumask, &bdst->cpumask, &bsrc->cpumask);
+
+}
+
 extern unsigned long tlb_flush_all_threshold;
 #else /* CONFIG_MMU */
 #define local_flush_tlb_all()			do { } while (0)
From patchwork Thu Feb 20 05:20:05 2025
X-Patchwork-Submitter: Byungchul Park
X-Patchwork-Id: 13983322
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, ying.huang@intel.com,
    vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com,
    willy@infradead.org, david@redhat.com, peterz@infradead.org,
    luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
    dave.hansen@linux.intel.com, rjgolo@gmail.com
Subject: [RFC PATCH v12 04/26] x86/tlb, riscv/tlb, mm/rmap: separate arch_tlbbatch_clear() out of arch_tlbbatch_flush()
Date: Thu, 20 Feb 2025 14:20:05 +0900
Message-Id: <20250220052027.58847-5-byungchul@sk.com>
In-Reply-To: <20250220052027.58847-1-byungchul@sk.com>
References: <20250220052027.58847-1-byungchul@sk.com>

A new mechanism, LUF (Lazy Unmap Flush), defers tlb flush until folios
that have been unmapped and freed eventually get allocated again.  It's
safe for folios that had been mapped read-only and were then unmapped,
since the contents of the folios don't change while staying in pcp or
buddy, so we can still read the data through the stale tlb entries.

This is a preparation for the mechanism, which requires avoiding
redundant tlb flushes by manipulating the tlb batch's arch data.  To
achieve that, separate the part that clears the tlb batch's arch data
out of arch_tlbbatch_flush().

Signed-off-by: Byungchul Park
---
 arch/riscv/mm/tlbflush.c | 1 -
 arch/x86/mm/tlb.c        | 2 --
 mm/rmap.c                | 1 +
 3 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
index 9b6e86ce38674..36f996af6256c 100644
--- a/arch/riscv/mm/tlbflush.c
+++ b/arch/riscv/mm/tlbflush.c
@@ -201,5 +201,4 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
 {
 	__flush_tlb_range(&batch->cpumask, FLUSH_TLB_NO_ASID, 0,
 			  FLUSH_TLB_MAX_SIZE, PAGE_SIZE);
-	cpumask_clear(&batch->cpumask);
 }
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 86593d1b787d8..860e49b223fd7 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -1262,8 +1262,6 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
 		local_irq_enable();
 	}

-	cpumask_clear(&batch->cpumask);
-
 	put_flush_tlb_info();
 	put_cpu();
 }
diff --git a/mm/rmap.c b/mm/rmap.c
index c6c4d4ea29a7e..2de01de164ef0 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -648,6 +648,7 @@ void try_to_unmap_flush(void)
 		return;

 	arch_tlbbatch_flush(&tlb_ubc->arch);
+	arch_tlbbatch_clear(&tlb_ubc->arch);
 	tlb_ubc->flush_required = false;
 	tlb_ubc->writable = false;
 }
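The point of the split is that the batch's cpumask survives the flush,
so later patches can consult it before it is reset.  The following
stand-alone user-space sketch (not part of the series) models the batch
as a plain bit mask; flushed_so_far and the model_* names are made up
for illustration.

#include <assert.h>
#include <stdio.h>

/* Illustrative model only: a tlb batch reduced to a CPU bit mask. */
typedef unsigned long cpuset_t;

static void model_flush(const cpuset_t *batch)
{
	/* stands in for arch_tlbbatch_flush(): IPIs the CPUs in *batch */
	printf("flush CPUs 0x%lx\n", *batch);
	/* note: no clear here any more */
}

static void model_clear(cpuset_t *batch)
{
	/* stands in for arch_tlbbatch_clear() */
	*batch = 0;
}

int main(void)
{
	cpuset_t batch = 0x6;		/* flush pending on CPUs 1 and 2 */
	cpuset_t flushed_so_far = 0;	/* hypothetical extra bookkeeping */

	model_flush(&batch);
	flushed_so_far |= batch;	/* only possible because the mask survived */
	model_clear(&batch);

	assert(batch == 0 && flushed_so_far == 0x6);
	return 0;
}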
From patchwork Thu Feb 20 05:20:06 2025
X-Patchwork-Submitter: Byungchul Park
X-Patchwork-Id: 13983317
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, ying.huang@intel.com,
    vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com,
    willy@infradead.org, david@redhat.com, peterz@infradead.org,
    luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
    dave.hansen@linux.intel.com, rjgolo@gmail.com
Subject: [RFC PATCH v12 05/26] mm/buddy: make room for a new variable, luf_key, in struct page
Date: Thu, 20 Feb 2025 14:20:06 +0900
Message-Id: <20250220052027.58847-6-byungchul@sk.com>
In-Reply-To: <20250220052027.58847-1-byungchul@sk.com>
References: <20250220052027.58847-1-byungchul@sk.com>

Functionally, no change.  This is a preparation for the luf mechanism,
which tracks the need for tlb flush for each page residing in buddy.

Since the private field in struct page is used only to store the page
order while in buddy, ranging from 0 to MAX_PAGE_ORDER, an unsigned
short is enough to hold it.  So split it into two smaller fields, order
and luf_key, so that both can be used in buddy at the same time.

Signed-off-by: Byungchul Park
---
 include/linux/mm_types.h | 42 +++++++++++++++++++++++++++++++++-------
 mm/internal.h            |  4 ++--
 mm/page_alloc.c          |  2 +-
 3 files changed, 38 insertions(+), 10 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 80fef38d9d645..20d85c4e609de 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -106,13 +106,27 @@ struct page {
 			pgoff_t index;		/* Our offset within mapping. */
 			unsigned long share;	/* share count for fsdax */
 		};
-		/**
-		 * @private: Mapping-private opaque data.
-		 * Usually used for buffer_heads if PagePrivate.
-		 * Used for swp_entry_t if swapcache flag set.
-		 * Indicates order in the buddy system if PageBuddy.
-		 */
-		unsigned long private;
+		union {
+			/**
+			 * @private: Mapping-private opaque data.
+			 * Usually used for buffer_heads if PagePrivate.
+			 * Used for swp_entry_t if swapcache flag set.
+			 * Indicates order in the buddy system if PageBuddy.
+			 */
+			unsigned long private;
+			struct {
+				/*
+				 * Indicates order in the buddy system if PageBuddy.
+				 */
+				unsigned short order;
+
+				/*
+				 * For tracking need of tlb flush,
+				 * by luf(lazy unmap flush).
+				 */
+				unsigned short luf_key;
+			};
+		};
 	};
 	struct {	/* page_pool used by netstack */
 		/**
@@ -537,6 +551,20 @@ static inline void set_page_private(struct page *page, unsigned long private)
 	page->private = private;
 }

+#define page_buddy_order(page)		((page)->order)
+
+static inline void set_page_buddy_order(struct page *page, unsigned int order)
+{
+	page->order = (unsigned short)order;
+}
+
+#define page_luf_key(page)		((page)->luf_key)
+
+static inline void set_page_luf_key(struct page *page, unsigned short luf_key)
+{
+	page->luf_key = luf_key;
+}
+
 static inline void *folio_get_private(struct folio *folio)
 {
 	return folio->private;
diff --git a/mm/internal.h b/mm/internal.h
index 5a7302baeed7c..754f1dd763448 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -541,7 +541,7 @@ struct alloc_context {
 static inline unsigned int buddy_order(struct page *page)
 {
 	/* PageBuddy() must be checked by the caller */
-	return page_private(page);
+	return page_buddy_order(page);
 }

 /*
@@ -555,7 +555,7 @@ static inline unsigned int buddy_order(struct page *page)
 * times, potentially observing different values in the tests and the actual
 * use of the result.
 */
-#define buddy_order_unsafe(page)	READ_ONCE(page_private(page))
+#define buddy_order_unsafe(page)	READ_ONCE(page_buddy_order(page))

 /*
  * This function checks whether a page is free && is the buddy
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 839708353cb77..59c26f59db3d6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -576,7 +576,7 @@ void prep_compound_page(struct page *page, unsigned int order)

 static inline void set_buddy_order(struct page *page, unsigned int order)
 {
-	set_page_private(page, order);
+	set_page_buddy_order(page, order);
 	__SetPageBuddy(page);
 }
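To see why a 16-bit field is enough for the buddy order and how the two
new fields overlay the space previously used by page->private, here is
a stand-alone user-space sketch (not part of the patch).  struct
mini_page is made up for illustration, and MAX_PAGE_ORDER is assumed to
be at its usual default of 10.

#include <assert.h>
#include <stdio.h>

/* Miniature stand-in for the union added to struct page (illustrative only). */
struct mini_page {
	union {
		unsigned long private;		/* old: order stored here when PageBuddy */
		struct {
			unsigned short order;	/* new: buddy order, 0..MAX_PAGE_ORDER */
			unsigned short luf_key;	/* new: luf bookkeeping */
		};
	};
};

int main(void)
{
	struct mini_page p = { 0 };

	/* MAX_PAGE_ORDER defaults to 10, far below USHRT_MAX (65535). */
	p.order = 10;
	p.luf_key = 1234;

	/* Both fields fit alongside each other where 'private' used to live. */
	assert(sizeof(p) == sizeof(unsigned long));
	printf("order=%hu luf_key=%hu private(raw)=%#lx\n",
	       p.order, p.luf_key, p.private);
	return 0;
}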
From patchwork Thu Feb 20 05:20:07 2025
X-Patchwork-Submitter: Byungchul Park
X-Patchwork-Id: 13983318
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, ying.huang@intel.com,
    vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com,
    willy@infradead.org, david@redhat.com, peterz@infradead.org,
    luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
    dave.hansen@linux.intel.com, rjgolo@gmail.com
Subject: [RFC PATCH v12 06/26] mm: move should_skip_kasan_poison() to mm/internal.h
Date: Thu, 20 Feb 2025 14:20:07 +0900
Message-Id: <20250220052027.58847-7-byungchul@sk.com>
In-Reply-To: <20250220052027.58847-1-byungchul@sk.com>
References: <20250220052027.58847-1-byungchul@sk.com>

Functionally, no change.  This is a preparation for the luf mechanism,
which needs to use the should_skip_kasan_poison() function from
mm/internal.h.

Signed-off-by: Byungchul Park
---
 mm/internal.h   | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 mm/page_alloc.c | 47 -----------------------------------------------
 2 files changed, 47 insertions(+), 47 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 754f1dd763448..e3084d32272e3 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1038,8 +1038,55 @@ static inline void vunmap_range_noflush(unsigned long start, unsigned long end)
 DECLARE_STATIC_KEY_TRUE(deferred_pages);

 bool __init deferred_grow_zone(struct zone *zone, unsigned int order);
+
+static inline bool deferred_pages_enabled(void)
+{
+	return static_branch_unlikely(&deferred_pages);
+}
+#else
+static inline bool deferred_pages_enabled(void)
+{
+	return false;
+}
 #endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */

+/*
+ * Skip KASAN memory poisoning when either:
+ *
+ * 1. For generic KASAN: deferred memory initialization has not yet completed.
+ *    Tag-based KASAN modes skip pages freed via deferred memory initialization
+ *    using page tags instead (see below).
+ * 2. For tag-based KASAN modes: the page has a match-all KASAN tag, indicating
+ *    that error detection is disabled for accesses via the page address.
+ *
+ * Pages will have match-all tags in the following circumstances:
+ *
+ * 1. Pages are being initialized for the first time, including during deferred
+ *    memory init; see the call to page_kasan_tag_reset in __init_single_page.
+ * 2. The allocation was not unpoisoned due to __GFP_SKIP_KASAN, with the
+ *    exception of pages unpoisoned by kasan_unpoison_vmalloc.
+ * 3. The allocation was excluded from being checked due to sampling,
+ *    see the call to kasan_unpoison_pages.
+ *
+ * Poisoning pages during deferred memory init will greatly lengthen the
+ * process and cause problem in large memory systems as the deferred pages
+ * initialization is done with interrupt disabled.
+ *
+ * Assuming that there will be no reference to those newly initialized
+ * pages before they are ever allocated, this should have no effect on
+ * KASAN memory tracking as the poison will be properly inserted at page
+ * allocation time. The only corner case is when pages are allocated by
+ * on-demand allocation and then freed again before the deferred pages
+ * initialization is done, but this is not likely to happen.
+ */
+static inline bool should_skip_kasan_poison(struct page *page)
+{
+	if (IS_ENABLED(CONFIG_KASAN_GENERIC))
+		return deferred_pages_enabled();
+
+	return page_kasan_tag(page) == KASAN_TAG_KERNEL;
+}
+
 enum mminit_level {
 	MMINIT_WARNING,
 	MMINIT_VERIFY,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 59c26f59db3d6..244cb30496be5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -299,11 +299,6 @@ int page_group_by_mobility_disabled __read_mostly;
  */
 DEFINE_STATIC_KEY_TRUE(deferred_pages);

-static inline bool deferred_pages_enabled(void)
-{
-	return static_branch_unlikely(&deferred_pages);
-}
-
 /*
  * deferred_grow_zone() is __init, but it is called from
  * get_page_from_freelist() during early boot until deferred_pages permanently
@@ -316,11 +311,6 @@ _deferred_grow_zone(struct zone *zone, unsigned int order)
 	return deferred_grow_zone(zone, order);
 }
 #else
-static inline bool deferred_pages_enabled(void)
-{
-	return false;
-}
-
 static inline bool _deferred_grow_zone(struct zone *zone, unsigned int order)
 {
 	return false;
@@ -993,43 +983,6 @@ static int free_tail_page_prepare(struct page *head_page, struct page *page)
 	return ret;
 }

-/*
- * Skip KASAN memory poisoning when either:
- *
- * 1. For generic KASAN: deferred memory initialization has not yet completed.
- *    Tag-based KASAN modes skip pages freed via deferred memory initialization
- *    using page tags instead (see below).
- * 2. For tag-based KASAN modes: the page has a match-all KASAN tag, indicating
- *    that error detection is disabled for accesses via the page address.
- *
- * Pages will have match-all tags in the following circumstances:
- *
- * 1. Pages are being initialized for the first time, including during deferred
- *    memory init; see the call to page_kasan_tag_reset in __init_single_page.
- * 2. The allocation was not unpoisoned due to __GFP_SKIP_KASAN, with the
- *    exception of pages unpoisoned by kasan_unpoison_vmalloc.
- * 3. The allocation was excluded from being checked due to sampling,
- *    see the call to kasan_unpoison_pages.
- *
- * Poisoning pages during deferred memory init will greatly lengthen the
- * process and cause problem in large memory systems as the deferred pages
- * initialization is done with interrupt disabled.
- *
- * Assuming that there will be no reference to those newly initialized
- * pages before they are ever allocated, this should have no effect on
- * KASAN memory tracking as the poison will be properly inserted at page
- * allocation time. The only corner case is when pages are allocated by
- * on-demand allocation and then freed again before the deferred pages
- * initialization is done, but this is not likely to happen.
- */
-static inline bool should_skip_kasan_poison(struct page *page)
-{
-	if (IS_ENABLED(CONFIG_KASAN_GENERIC))
-		return deferred_pages_enabled();
-
-	return page_kasan_tag(page) == KASAN_TAG_KERNEL;
-}
-
 static void kernel_init_pages(struct page *page, int numpages)
 {
 	int i;
<20250220052027.58847-8-byungchul@sk.com>
In-Reply-To: <20250220052027.58847-1-byungchul@sk.com>
References: <20250220052027.58847-1-byungchul@sk.com>

Functionally, no change. This is a preparation for the luf mechanism, which needs to evaluate the temporal sequence of events to determine whether the required tlb flush has already been done on each CPU.
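For illustration, here is a minimal standalone sketch of the wraparound-safe comparison that such a generation number relies on. This is not the kernel code; gen_before() is a stand-in that mirrors the ugen_before() helper added below.

/* Standalone sketch, not kernel code: wraparound-safe generation compare. */
#include <stdbool.h>
#include <stdio.h>

static bool gen_before(unsigned long a, unsigned long b)
{
        /* Cast the difference to signed so wraparound is handled. */
        return (long)(a - b) < 0;
}

int main(void)
{
        unsigned long a = (unsigned long)-2;    /* just before wrapping */
        unsigned long b = 3;                    /* just after wrapping */

        /* Prints 1: a is treated as older than b even though a > b. */
        printf("%d\n", gen_before(a, b));
        return 0;
}

The signed cast stays correct as long as the two values being compared are less than half the counter range apart.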
To achieve that, this patch introduced a generation number, luf_ugen, and a few APIs manipulating the number. It's worth noting the number is designed to wraparound so care must be taken when using it. Signed-off-by: Byungchul Park --- include/linux/mm.h | 34 ++++++++++++++++++++++++++++++++++ mm/rmap.c | 22 ++++++++++++++++++++++ 2 files changed, 56 insertions(+) diff --git a/include/linux/mm.h b/include/linux/mm.h index fecd47239fa99..53a5f1cb21e0d 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -4161,4 +4161,38 @@ static inline int do_mseal(unsigned long start, size_t len_in, unsigned long fla } #endif +#if defined(CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH) +/* + * luf_ugen will start with 2 so that 1 can be regarded as a passed one. + */ +#define LUF_UGEN_INIT 2 + +static inline bool ugen_before(unsigned long a, unsigned long b) +{ + /* + * Consider wraparound. + */ + return (long)(a - b) < 0; +} + +static inline unsigned long next_ugen(unsigned long ugen) +{ + if (ugen + 1) + return ugen + 1; + /* + * Avoid invalid ugen, zero. + */ + return ugen + 2; +} + +static inline unsigned long prev_ugen(unsigned long ugen) +{ + if (ugen - 1) + return ugen - 1; + /* + * Avoid invalid ugen, zero. + */ + return ugen - 2; +} +#endif #endif /* _LINUX_MM_H */ diff --git a/mm/rmap.c b/mm/rmap.c index 2de01de164ef0..ed345503e4f88 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -634,6 +634,28 @@ struct anon_vma *folio_lock_anon_vma_read(const struct folio *folio, } #ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH + +/* + * This generation number is primarily used as a global timestamp to + * determine whether tlb flush required has been done on each CPU. The + * function, ugen_before(), should be used to evaluate the temporal + * sequence of events because the number is designed to wraparound. + */ +static atomic_long_t __maybe_unused luf_ugen = ATOMIC_LONG_INIT(LUF_UGEN_INIT); + +/* + * Don't return invalid luf_ugen, zero. + */ +static unsigned long __maybe_unused new_luf_ugen(void) +{ + unsigned long ugen = atomic_long_inc_return(&luf_ugen); + + if (!ugen) + ugen = atomic_long_inc_return(&luf_ugen); + + return ugen; +} + /* * Flush TLB entries for recently unmapped pages from remote CPUs. 
It is * important if a PTE was dirty when it was unmapped that it's flushed

From patchwork Thu Feb 20 05:20:09 2025
X-Patchwork-Submitter: Byungchul Park
X-Patchwork-Id: 13983325
From: Byungchul Park
Subject: [RFC PATCH v12 08/26] mm: introduce luf_batch to be used as hash table to store luf meta data
Date: Thu, 20 Feb 2025 14:20:09 +0900
Message-Id: <20250220052027.58847-9-byungchul@sk.com>
In-Reply-To: <20250220052027.58847-1-byungchul@sk.com>
References: <20250220052027.58847-1-byungchul@sk.com>
Functionally, no change. This is a preparation for the luf mechanism, which needs to keep luf meta data per page while the page stays in pcp or the buddy allocator. The meta data includes a cpumask for tlb shootdown and luf's request generation number.
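To illustrate the idea only (made-up types, not the actual kernel structures): each page stores just a 16-bit key, every page with the same key shares one table entry, and a collision therefore folds requirements together instead of overwriting them.

/* Sketch only: a fixed-size table indexed by a 16-bit per-page key. */
#include <stdint.h>

#define NR_ENTRIES (1 << 16)            /* range of an unsigned short key */

struct meta {                           /* stand-in for the real meta data */
        uint64_t cpumask;               /* CPUs that still need a tlb flush */
        unsigned long ugen;             /* newest request generation seen */
};

static struct meta table[NR_ENTRIES];

/* Pages that collide share an entry, so fold rather than overwrite. */
static void fold(uint16_t key, uint64_t cpus, unsigned long ugen)
{
        struct meta *m = &table[key];

        m->cpumask |= cpus;             /* union of CPUs needing a flush */
        if ((long)(m->ugen - ugen) < 0) /* keep the newer generation */
                m->ugen = ugen;
}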
Since struct page doesn't have enough room to store luf meta data, this patch introduces a hash table to store them and makes each page keep its hash key instead. Since all the pages in pcp or buddy share the hash table, confliction is inevitable so care must be taken when reading or updating its entry. Signed-off-by: Byungchul Park --- include/linux/mm_types.h | 10 ++++ mm/internal.h | 8 +++ mm/rmap.c | 122 +++++++++++++++++++++++++++++++++++++-- 3 files changed, 136 insertions(+), 4 deletions(-) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 20d85c4e609de..39a6b5124b01f 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -32,6 +32,16 @@ struct address_space; struct mem_cgroup; +#ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH +struct luf_batch { + struct tlbflush_unmap_batch batch; + unsigned long ugen; + rwlock_t lock; +}; +#else +struct luf_batch {}; +#endif + /* * Each physical page in the system has a struct page associated with * it to keep track of whatever it is we are using the page for at the diff --git a/mm/internal.h b/mm/internal.h index e3084d32272e3..b38a9ae9d6993 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -1240,6 +1240,8 @@ extern struct workqueue_struct *mm_percpu_wq; void try_to_unmap_flush(void); void try_to_unmap_flush_dirty(void); void flush_tlb_batched_pending(struct mm_struct *mm); +void fold_batch(struct tlbflush_unmap_batch *dst, struct tlbflush_unmap_batch *src, bool reset); +void fold_luf_batch(struct luf_batch *dst, struct luf_batch *src); #else static inline void try_to_unmap_flush(void) { @@ -1250,6 +1252,12 @@ static inline void try_to_unmap_flush_dirty(void) static inline void flush_tlb_batched_pending(struct mm_struct *mm) { } +static inline void fold_batch(struct tlbflush_unmap_batch *dst, struct tlbflush_unmap_batch *src, bool reset) +{ +} +static inline void fold_luf_batch(struct luf_batch *dst, struct luf_batch *src) +{ +} #endif /* CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH */ extern const struct trace_print_flags pageflag_names[]; diff --git a/mm/rmap.c b/mm/rmap.c index ed345503e4f88..74fbf6c2fb3a7 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -641,7 +641,7 @@ struct anon_vma *folio_lock_anon_vma_read(const struct folio *folio, * function, ugen_before(), should be used to evaluate the temporal * sequence of events because the number is designed to wraparound. */ -static atomic_long_t __maybe_unused luf_ugen = ATOMIC_LONG_INIT(LUF_UGEN_INIT); +static atomic_long_t luf_ugen = ATOMIC_LONG_INIT(LUF_UGEN_INIT); /* * Don't return invalid luf_ugen, zero. @@ -656,6 +656,122 @@ static unsigned long __maybe_unused new_luf_ugen(void) return ugen; } +static void reset_batch(struct tlbflush_unmap_batch *batch) +{ + arch_tlbbatch_clear(&batch->arch); + batch->flush_required = false; + batch->writable = false; +} + +void fold_batch(struct tlbflush_unmap_batch *dst, + struct tlbflush_unmap_batch *src, bool reset) +{ + if (!src->flush_required) + return; + + /* + * Fold src to dst. + */ + arch_tlbbatch_fold(&dst->arch, &src->arch); + dst->writable = dst->writable || src->writable; + dst->flush_required = true; + + if (!reset) + return; + + /* + * Reset src. + */ + reset_batch(src); +} + +/* + * The range that luf_key covers, which is 'unsigned short' type. + */ +#define NR_LUF_BATCH (1 << (sizeof(short) * 8)) + +/* + * Use 0th entry as accumulated batch. 
+ */ +static struct luf_batch luf_batch[NR_LUF_BATCH]; + +static void luf_batch_init(struct luf_batch *lb) +{ + rwlock_init(&lb->lock); + reset_batch(&lb->batch); + lb->ugen = atomic_long_read(&luf_ugen) - 1; +} + +static int __init luf_init(void) +{ + int i; + + for (i = 0; i < NR_LUF_BATCH; i++) + luf_batch_init(&luf_batch[i]); + + return 0; +} +early_initcall(luf_init); + +/* + * key to point an entry of the luf_batch array + * + * note: zero means invalid key + */ +static atomic_t luf_kgen = ATOMIC_INIT(1); + +/* + * Don't return invalid luf_key, zero. + */ +static unsigned short __maybe_unused new_luf_key(void) +{ + unsigned short luf_key = atomic_inc_return(&luf_kgen); + + if (!luf_key) + luf_key = atomic_inc_return(&luf_kgen); + + return luf_key; +} + +static void __fold_luf_batch(struct luf_batch *dst_lb, + struct tlbflush_unmap_batch *src_batch, + unsigned long src_ugen) +{ + /* + * dst_lb->ugen represents one that requires tlb shootdown for + * it, that is, sort of request number. The newer it is, the + * more tlb shootdown might be needed to fulfill the newer + * request. Conservertively keep the newer one. + */ + if (!dst_lb->ugen || ugen_before(dst_lb->ugen, src_ugen)) + dst_lb->ugen = src_ugen; + fold_batch(&dst_lb->batch, src_batch, false); +} + +void fold_luf_batch(struct luf_batch *dst, struct luf_batch *src) +{ + unsigned long flags; + + /* + * Exactly same. Nothing to fold. + */ + if (dst == src) + return; + + if (&src->lock < &dst->lock) { + read_lock_irqsave(&src->lock, flags); + write_lock(&dst->lock); + } else { + write_lock_irqsave(&dst->lock, flags); + read_lock(&src->lock); + } + + __fold_luf_batch(dst, &src->batch, src->ugen); + + write_unlock(&dst->lock); + read_unlock_irqrestore(&src->lock, flags); +} + /* * Flush TLB entries for recently unmapped pages from remote CPUs. 
It is * important if a PTE was dirty when it was unmapped that it's flushed @@ -670,9 +786,7 @@ void try_to_unmap_flush(void) return; arch_tlbbatch_flush(&tlb_ubc->arch); - arch_tlbbatch_clear(&tlb_ubc->arch); - tlb_ubc->flush_required = false; - tlb_ubc->writable = false; + reset_batch(tlb_ubc); } /* Flush iff there are potentially writable TLB entries that can race with IO */

From patchwork Thu Feb 20 05:20:10 2025
X-Patchwork-Submitter: Byungchul Park
X-Patchwork-Id: 13983327
From: Byungchul Park
Subject: [RFC PATCH v12 09/26] mm: introduce API to perform
tlb shootdown on exit from page allocator
Date: Thu, 20 Feb 2025 14:20:10 +0900
Message-Id: <20250220052027.58847-10-byungchul@sk.com>
In-Reply-To: <20250220052027.58847-1-byungchul@sk.com>
References: <20250220052027.58847-1-byungchul@sk.com>

Functionally, no change. This is a preparation for the luf mechanism, which performs the tlb shootdown required on exit from the page allocator.
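A rough sketch of why a dedicated batch helps (stand-in types and names, not the mm/rmap.c code): only the CPUs recorded for the pages actually being handed out get flushed, and the task's main batch can be dropped when that flush already covers it.

/* Sketch only: a separate "takeoff" batch next to the task's main batch. */
#include <stdbool.h>
#include <stdint.h>

struct batch {
        uint64_t cpus;                  /* stand-in for a real cpumask */
        bool required;
};

static void flush_cpus(uint64_t cpus)
{
        /* stand-in for the arch flush: IPI the CPUs in the mask */
}

static void takeoff_flush(struct batch *main_b, struct batch *takeoff)
{
        if (!takeoff->required)
                return;

        flush_cpus(takeoff->cpus);

        /* If the takeoff flush was a superset, the main batch is done too. */
        if ((main_b->cpus & ~takeoff->cpus) == 0)
                *main_b = (struct batch){ 0 };

        *takeoff = (struct batch){ 0 };
}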
This patch introduced a new API rather than making use of existing try_to_unmap_flush() to avoid repeated and redundant tlb shootdown due to frequent page allocations during a session of batched unmap flush. Signed-off-by: Byungchul Park --- include/linux/sched.h | 1 + mm/internal.h | 4 ++++ mm/rmap.c | 20 ++++++++++++++++++++ 3 files changed, 25 insertions(+) diff --git a/include/linux/sched.h b/include/linux/sched.h index bb343136ddd05..8e6e7a83332cf 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1375,6 +1375,7 @@ struct task_struct { #endif struct tlbflush_unmap_batch tlb_ubc; + struct tlbflush_unmap_batch tlb_ubc_takeoff; /* Cache last used pipe for splice(): */ struct pipe_inode_info *splice_pipe; diff --git a/mm/internal.h b/mm/internal.h index b38a9ae9d6993..cbdebf8a02437 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -1239,6 +1239,7 @@ extern struct workqueue_struct *mm_percpu_wq; #ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH void try_to_unmap_flush(void); void try_to_unmap_flush_dirty(void); +void try_to_unmap_flush_takeoff(void); void flush_tlb_batched_pending(struct mm_struct *mm); void fold_batch(struct tlbflush_unmap_batch *dst, struct tlbflush_unmap_batch *src, bool reset); void fold_luf_batch(struct luf_batch *dst, struct luf_batch *src); @@ -1249,6 +1250,9 @@ static inline void try_to_unmap_flush(void) static inline void try_to_unmap_flush_dirty(void) { } +static inline void try_to_unmap_flush_takeoff(void) +{ +} static inline void flush_tlb_batched_pending(struct mm_struct *mm) { } diff --git a/mm/rmap.c b/mm/rmap.c index 74fbf6c2fb3a7..72c5e665e59a4 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -772,6 +772,26 @@ void fold_luf_batch(struct luf_batch *dst, struct luf_batch *src) read_unlock_irqrestore(&src->lock, flags); } +void try_to_unmap_flush_takeoff(void) +{ + struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; + struct tlbflush_unmap_batch *tlb_ubc_takeoff = ¤t->tlb_ubc_takeoff; + + if (!tlb_ubc_takeoff->flush_required) + return; + + arch_tlbbatch_flush(&tlb_ubc_takeoff->arch); + + /* + * Now that tlb shootdown of tlb_ubc_takeoff has been performed, + * it's good chance to shrink tlb_ubc if possible. + */ + if (arch_tlbbatch_done(&tlb_ubc->arch, &tlb_ubc_takeoff->arch)) + reset_batch(tlb_ubc); + + reset_batch(tlb_ubc_takeoff); +} + /* * Flush TLB entries for recently unmapped pages from remote CPUs. 
It is * important if a PTE was dirty when it was unmapped that it's flushed

From patchwork Thu Feb 20 05:20:11 2025
X-Patchwork-Submitter: Byungchul Park
X-Patchwork-Id: 13983324
From: Byungchul Park
Subject: [RFC PATCH v12 10/26] mm: introduce APIs to check if the page allocation is tlb shootdownable
Date: Thu, 20 Feb 2025 14:20:11 +0900
Message-Id: <20250220052027.58847-11-byungchul@sk.com>
In-Reply-To: <20250220052027.58847-1-byungchul@sk.com>
References: <20250220052027.58847-1-byungchul@sk.com>
Functionally, no change. This is a preparation for the luf mechanism, which should identify whether tlb shootdown can be performed at page allocation time. In a context with irqs disabled, or in a non-task context, tlb shootdown cannot be performed because it could deadlock. Thus, the page allocator should work being aware of whether tlb shootdown can be performed on the page being returned.
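As an illustrative sketch of the intended bracketing (hypothetical names and types, not the patch's exact code): the section that takes pages off the freelists is wrapped by a start/end pair, the start records whether the current context could safely send IPIs, and nesting stays conservative.

/* Sketch only: per-task bracketing of the page takeoff section. */
#include <stdbool.h>

struct takeoff_state {
        int started;                    /* nesting depth of start/end pairs */
        int no_shootdown;               /* > 0 means shootdown is forbidden */
};

/* IPI-based shootdown with irqs off, or outside a task, can deadlock. */
static void takeoff_start(struct takeoff_state *s, bool unsafe_context)
{
        if (unsafe_context || s->no_shootdown)
                s->no_shootdown++;      /* stay conservative while nested */
        s->started++;
}

/* Returns true when the caller may perform the deferred flush now. */
static bool takeoff_end(struct takeoff_state *s)
{
        bool can_flush;

        if (s->no_shootdown)
                s->no_shootdown--;
        can_flush = !s->no_shootdown;
        s->started--;
        return can_flush;
}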
This patch introduced APIs that pcp or buddy page allocator can use to delimit the critical sections taking off pages and indentify whether tlb shootdown can be performed. Signed-off-by: Byungchul Park --- include/linux/sched.h | 5 ++ mm/internal.h | 14 ++++ mm/page_alloc.c | 159 ++++++++++++++++++++++++++++++++++++++++++ mm/rmap.c | 2 +- 4 files changed, 179 insertions(+), 1 deletion(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 8e6e7a83332cf..c4ff83e1d5953 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1374,6 +1374,11 @@ struct task_struct { struct callback_head cid_work; #endif +#if defined(CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH) + int luf_no_shootdown; + int luf_takeoff_started; +#endif + struct tlbflush_unmap_batch tlb_ubc; struct tlbflush_unmap_batch tlb_ubc_takeoff; diff --git a/mm/internal.h b/mm/internal.h index cbdebf8a02437..55bc8ca0d6118 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -1583,6 +1583,20 @@ static inline void accept_page(struct page *page) { } #endif /* CONFIG_UNACCEPTED_MEMORY */ +#if defined(CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH) +extern struct luf_batch luf_batch[]; +bool luf_takeoff_start(void); +void luf_takeoff_end(void); +bool luf_takeoff_no_shootdown(void); +bool luf_takeoff_check(struct page *page); +bool luf_takeoff_check_and_fold(struct page *page); +#else +static inline bool luf_takeoff_start(void) { return false; } +static inline void luf_takeoff_end(void) {} +static inline bool luf_takeoff_no_shootdown(void) { return true; } +static inline bool luf_takeoff_check(struct page *page) { return true; } +static inline bool luf_takeoff_check_and_fold(struct page *page) { return true; } +#endif /* pagewalk.c */ int walk_page_range_mm(struct mm_struct *mm, unsigned long start, diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 244cb30496be5..cac2c95ca2430 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -622,6 +622,165 @@ compaction_capture(struct capture_control *capc, struct page *page, } #endif /* CONFIG_COMPACTION */ +#if defined(CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH) +static bool no_shootdown_context(void) +{ + /* + * If it performs with irq disabled, that might cause a deadlock. + * Avoid tlb shootdown in this case. + */ + return !(!irqs_disabled() && in_task()); +} + +/* + * Can be called with zone lock released and irq enabled. + */ +bool luf_takeoff_start(void) +{ + unsigned long flags; + bool no_shootdown = no_shootdown_context(); + + local_irq_save(flags); + + /* + * It's the outmost luf_takeoff_start(). + */ + if (!current->luf_takeoff_started) + VM_WARN_ON(current->luf_no_shootdown); + + /* + * current->luf_no_shootdown > 0 doesn't mean tlb shootdown is + * not allowed at all. However, it guarantees tlb shootdown is + * possible once current->luf_no_shootdown == 0. It might look + * too conservative but for now do this way for simplity. + */ + if (no_shootdown || current->luf_no_shootdown) + current->luf_no_shootdown++; + + current->luf_takeoff_started++; + local_irq_restore(flags); + + return !no_shootdown; +} + +/* + * Should be called within the same context of luf_takeoff_start(). + */ +void luf_takeoff_end(void) +{ + unsigned long flags; + bool no_shootdown; + bool outmost = false; + + local_irq_save(flags); + VM_WARN_ON(!current->luf_takeoff_started); + + /* + * Assume the context and irq flags are same as those at + * luf_takeoff_start(). 
+ */ + if (current->luf_no_shootdown) + current->luf_no_shootdown--; + + no_shootdown = !!current->luf_no_shootdown; + + current->luf_takeoff_started--; + + /* + * It's the outmost luf_takeoff_end(). + */ + if (!current->luf_takeoff_started) + outmost = true; + + local_irq_restore(flags); + + if (no_shootdown) + goto out; + + try_to_unmap_flush_takeoff(); +out: + if (outmost) + VM_WARN_ON(current->luf_no_shootdown); +} + +/* + * Can be called with zone lock released and irq enabled. + */ +bool luf_takeoff_no_shootdown(void) +{ + bool no_shootdown = true; + unsigned long flags; + + local_irq_save(flags); + + /* + * No way. Delimit using luf_takeoff_{start,end}(). + */ + if (unlikely(!current->luf_takeoff_started)) { + VM_WARN_ON(1); + goto out; + } + no_shootdown = current->luf_no_shootdown; +out: + local_irq_restore(flags); + return no_shootdown; +} + +/* + * Should be called with either zone lock held and irq disabled or pcp + * lock held. + */ +bool luf_takeoff_check(struct page *page) +{ + unsigned short luf_key = page_luf_key(page); + + /* + * No way. Delimit using luf_takeoff_{start,end}(). + */ + if (unlikely(!current->luf_takeoff_started)) { + VM_WARN_ON(1); + return false; + } + + if (!luf_key) + return true; + + return !current->luf_no_shootdown; +} + +/* + * Should be called with either zone lock held and irq disabled or pcp + * lock held. + */ +bool luf_takeoff_check_and_fold(struct page *page) +{ + struct tlbflush_unmap_batch *tlb_ubc_takeoff = ¤t->tlb_ubc_takeoff; + unsigned short luf_key = page_luf_key(page); + struct luf_batch *lb; + unsigned long flags; + + /* + * No way. Delimit using luf_takeoff_{start,end}(). + */ + if (unlikely(!current->luf_takeoff_started)) { + VM_WARN_ON(1); + return false; + } + + if (!luf_key) + return true; + + if (current->luf_no_shootdown) + return false; + + lb = &luf_batch[luf_key]; + read_lock_irqsave(&lb->lock, flags); + fold_batch(tlb_ubc_takeoff, &lb->batch, false); + read_unlock_irqrestore(&lb->lock, flags); + return true; +} +#endif + static inline void account_freepages(struct zone *zone, int nr_pages, int migratetype) { diff --git a/mm/rmap.c b/mm/rmap.c index 72c5e665e59a4..1581b1a00f974 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -693,7 +693,7 @@ void fold_batch(struct tlbflush_unmap_batch *dst, /* * Use 0th entry as accumulated batch. 
*/ -static struct luf_batch luf_batch[NR_LUF_BATCH]; +struct luf_batch luf_batch[NR_LUF_BATCH]; static void luf_batch_init(struct luf_batch *lb) {

From patchwork Thu Feb 20 05:20:12 2025
X-Patchwork-Submitter: Byungchul Park
X-Patchwork-Id: 13983328
From: Byungchul Park
Subject: [RFC PATCH v12 11/26] mm: deliver luf_key to pcp or buddy on free after unmapping
Date: Thu, 20 Feb 2025 14:20:12 +0900
Message-Id: <20250220052027.58847-12-byungchul@sk.com>
In-Reply-To: <20250220052027.58847-1-byungchul@sk.com>
References:
<20250220052027.58847-1-byungchul@sk.com>

Functionally, no change. This is a preparation for the luf mechanism, which needs to pass a luf_key to the pcp or buddy allocator on free after unmapping, e.g. during page reclaim or page migration.
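For illustration (a hypothetical helper, not the patch's code): when two buddies carrying different keys merge, the data behind the keys must be folded so neither flush requirement is lost, and the merged page ends up carrying a single key.

/* Sketch only: merging the keys of two coalescing buddy pages. */
struct luf_entry {
        unsigned long cpumask;          /* stand-in for the batched flush data */
        unsigned long ugen;
};

static struct luf_entry table[1 << 16];

static unsigned short merge_luf_key(unsigned short a, unsigned short b)
{
        if (!a)
                return b;               /* only one side was freed via luf */
        if (!b)
                return a;

        /* Both sides are luf'd: fold b's requirement into a's entry. */
        table[a].cpumask |= table[b].cpumask;
        if ((long)(table[a].ugen - table[b].ugen) < 0)
                table[a].ugen = table[b].ugen;

        return a;
}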
The luf_key will be used to track need of tlb shootdown and which cpus need to perform tlb flush, per page residing in pcp or buddy, and should be handed over properly when pages travel between pcp and buddy. Signed-off-by: Byungchul Park --- mm/internal.h | 4 +- mm/page_alloc.c | 120 ++++++++++++++++++++++++++++++++------------ mm/page_isolation.c | 6 +++ mm/page_reporting.c | 6 +++ mm/swap.c | 4 +- mm/vmscan.c | 8 +-- 6 files changed, 109 insertions(+), 39 deletions(-) diff --git a/mm/internal.h b/mm/internal.h index 55bc8ca0d6118..2bb54bc04260b 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -741,8 +741,8 @@ extern bool free_pages_prepare(struct page *page, unsigned int order); extern int user_min_free_kbytes; -void free_unref_page(struct page *page, unsigned int order); -void free_unref_folios(struct folio_batch *fbatch); +void free_unref_page(struct page *page, unsigned int order, unsigned short luf_key); +void free_unref_folios(struct folio_batch *fbatch, unsigned short luf_key); extern void zone_pcp_reset(struct zone *zone); extern void zone_pcp_disable(struct zone *zone); diff --git a/mm/page_alloc.c b/mm/page_alloc.c index cac2c95ca2430..05a1098f8c61f 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -212,7 +212,7 @@ unsigned int pageblock_order __read_mostly; #endif static void __free_pages_ok(struct page *page, unsigned int order, - fpi_t fpi_flags); + fpi_t fpi_flags, unsigned short luf_key); /* * results with 256, 32 in the lowmem_reserve sysctl: @@ -850,8 +850,13 @@ static inline void __del_page_from_free_list(struct page *page, struct zone *zon list_del(&page->buddy_list); __ClearPageBuddy(page); - set_page_private(page, 0); zone->free_area[order].nr_free--; + + /* + * Keep head page's private until post_alloc_hook(). + * + * XXX: Tail pages' private doesn't get cleared. + */ } static inline void del_page_from_free_list(struct page *page, struct zone *zone, @@ -920,7 +925,7 @@ buddy_merge_likely(unsigned long pfn, unsigned long buddy_pfn, static inline void __free_one_page(struct page *page, unsigned long pfn, struct zone *zone, unsigned int order, - int migratetype, fpi_t fpi_flags) + int migratetype, fpi_t fpi_flags, unsigned short luf_key) { struct capture_control *capc = task_capc(zone); unsigned long buddy_pfn = 0; @@ -937,10 +942,21 @@ static inline void __free_one_page(struct page *page, account_freepages(zone, 1 << order, migratetype); + /* + * Use the page's luf_key unchanged if luf_key == 0. Worth + * noting that page_luf_key() will be 0 in most cases since it's + * initialized at free_pages_prepare(). 
+ */ + if (luf_key) + set_page_luf_key(page, luf_key); + else + luf_key = page_luf_key(page); + while (order < MAX_PAGE_ORDER) { int buddy_mt = migratetype; + unsigned short buddy_luf_key; - if (compaction_capture(capc, page, order, migratetype)) { + if (!luf_key && compaction_capture(capc, page, order, migratetype)) { account_freepages(zone, -(1 << order), migratetype); return; } @@ -973,6 +989,18 @@ static inline void __free_one_page(struct page *page, else __del_page_from_free_list(buddy, zone, order, buddy_mt); + /* + * !buddy_luf_key && !luf_key : do nothing + * buddy_luf_key && !luf_key : luf_key = buddy_luf_key + * !buddy_luf_key && luf_key : do nothing + * buddy_luf_key && luf_key : merge two into luf_key + */ + buddy_luf_key = page_luf_key(buddy); + if (buddy_luf_key && !luf_key) + luf_key = buddy_luf_key; + else if (buddy_luf_key && luf_key) + fold_luf_batch(&luf_batch[luf_key], &luf_batch[buddy_luf_key]); + if (unlikely(buddy_mt != migratetype)) { /* * Match buddy type. This ensures that an @@ -984,6 +1012,7 @@ static inline void __free_one_page(struct page *page, combined_pfn = buddy_pfn & pfn; page = page + (combined_pfn - pfn); + set_page_luf_key(page, luf_key); pfn = combined_pfn; order++; } @@ -1164,6 +1193,11 @@ __always_inline bool free_pages_prepare(struct page *page, VM_BUG_ON_PAGE(PageTail(page), page); + /* + * Ensure private is zero before using it inside allocator. + */ + set_page_private(page, 0); + trace_mm_page_free(page, order); kmsan_free_page(page, order); @@ -1329,7 +1363,8 @@ static void free_pcppages_bulk(struct zone *zone, int count, count -= nr_pages; pcp->count -= nr_pages; - __free_one_page(page, pfn, zone, order, mt, FPI_NONE); + __free_one_page(page, pfn, zone, order, mt, FPI_NONE, 0); + trace_mm_page_pcpu_drain(page, order, mt); } while (count > 0 && !list_empty(list)); } @@ -1353,7 +1388,7 @@ static void split_large_buddy(struct zone *zone, struct page *page, while (pfn != end) { int mt = get_pfnblock_migratetype(page, pfn); - __free_one_page(page, pfn, zone, order, mt, fpi); + __free_one_page(page, pfn, zone, order, mt, fpi, 0); pfn += 1 << order; page = pfn_to_page(pfn); } @@ -1361,11 +1396,18 @@ static void split_large_buddy(struct zone *zone, struct page *page, static void free_one_page(struct zone *zone, struct page *page, unsigned long pfn, unsigned int order, - fpi_t fpi_flags) + fpi_t fpi_flags, unsigned short luf_key) { unsigned long flags; spin_lock_irqsave(&zone->lock, flags); + + /* + * valid luf_key can be passed only if order == 0. + */ + VM_WARN_ON(luf_key && order); + set_page_luf_key(page, luf_key); + split_large_buddy(zone, page, pfn, order, fpi_flags); spin_unlock_irqrestore(&zone->lock, flags); @@ -1373,13 +1415,13 @@ static void free_one_page(struct zone *zone, struct page *page, } static void __free_pages_ok(struct page *page, unsigned int order, - fpi_t fpi_flags) + fpi_t fpi_flags, unsigned short luf_key) { unsigned long pfn = page_to_pfn(page); struct zone *zone = page_zone(page); if (free_pages_prepare(page, order)) - free_one_page(zone, page, pfn, order, fpi_flags); + free_one_page(zone, page, pfn, order, fpi_flags, luf_key); } void __meminit __free_pages_core(struct page *page, unsigned int order, @@ -1433,7 +1475,7 @@ void __meminit __free_pages_core(struct page *page, unsigned int order, * Bypass PCP and place fresh pages right to the tail, primarily * relevant for memory onlining. 
*/ - __free_pages_ok(page, order, FPI_TO_TAIL); + __free_pages_ok(page, order, FPI_TO_TAIL, 0); } /* @@ -2459,6 +2501,10 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order, if (unlikely(page == NULL)) break; + /* + * Keep the page's luf_key. + */ + /* * Split buddy pages returned by expand() are received here in * physical page order. The page is added to the tail of @@ -2740,12 +2786,14 @@ static int nr_pcp_high(struct per_cpu_pages *pcp, struct zone *zone, static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp, struct page *page, int migratetype, - unsigned int order) + unsigned int order, unsigned short luf_key) { int high, batch; int pindex; bool free_high = false; + set_page_luf_key(page, luf_key); + /* * On freeing, reduce the number of pages that are batch allocated. * See nr_pcp_alloc() where alloc_factor is increased for subsequent @@ -2754,7 +2802,16 @@ static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp, pcp->alloc_factor >>= 1; __count_vm_events(PGFREE, 1 << order); pindex = order_to_pindex(migratetype, order); - list_add(&page->pcp_list, &pcp->lists[pindex]); + + /* + * Defer tlb shootdown as much as possible by putting luf'd + * pages to the tail. + */ + if (luf_key) + list_add_tail(&page->pcp_list, &pcp->lists[pindex]); + else + list_add(&page->pcp_list, &pcp->lists[pindex]); + pcp->count += 1 << order; batch = READ_ONCE(pcp->batch); @@ -2789,7 +2846,8 @@ static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp, /* * Free a pcp page */ -void free_unref_page(struct page *page, unsigned int order) +void free_unref_page(struct page *page, unsigned int order, + unsigned short luf_key) { unsigned long __maybe_unused UP_flags; struct per_cpu_pages *pcp; @@ -2798,7 +2856,7 @@ void free_unref_page(struct page *page, unsigned int order) int migratetype; if (!pcp_allowed_order(order)) { - __free_pages_ok(page, order, FPI_NONE); + __free_pages_ok(page, order, FPI_NONE, luf_key); return; } @@ -2815,7 +2873,7 @@ void free_unref_page(struct page *page, unsigned int order) migratetype = get_pfnblock_migratetype(page, pfn); if (unlikely(migratetype >= MIGRATE_PCPTYPES)) { if (unlikely(is_migrate_isolate(migratetype))) { - free_one_page(page_zone(page), page, pfn, order, FPI_NONE); + free_one_page(page_zone(page), page, pfn, order, FPI_NONE, luf_key); return; } migratetype = MIGRATE_MOVABLE; @@ -2825,10 +2883,10 @@ void free_unref_page(struct page *page, unsigned int order) pcp_trylock_prepare(UP_flags); pcp = pcp_spin_trylock(zone->per_cpu_pageset); if (pcp) { - free_unref_page_commit(zone, pcp, page, migratetype, order); + free_unref_page_commit(zone, pcp, page, migratetype, order, luf_key); pcp_spin_unlock(pcp); } else { - free_one_page(zone, page, pfn, order, FPI_NONE); + free_one_page(zone, page, pfn, order, FPI_NONE, luf_key); } pcp_trylock_finish(UP_flags); } @@ -2836,7 +2894,7 @@ void free_unref_page(struct page *page, unsigned int order) /* * Free a batch of folios */ -void free_unref_folios(struct folio_batch *folios) +void free_unref_folios(struct folio_batch *folios, unsigned short luf_key) { unsigned long __maybe_unused UP_flags; struct per_cpu_pages *pcp = NULL; @@ -2857,7 +2915,7 @@ void free_unref_folios(struct folio_batch *folios) */ if (!pcp_allowed_order(order)) { free_one_page(folio_zone(folio), &folio->page, - pfn, order, FPI_NONE); + pfn, order, FPI_NONE, luf_key); continue; } folio->private = (void *)(unsigned long)order; @@ -2893,7 +2951,7 @@ void free_unref_folios(struct 
folio_batch *folios) */ if (is_migrate_isolate(migratetype)) { free_one_page(zone, &folio->page, pfn, - order, FPI_NONE); + order, FPI_NONE, luf_key); continue; } @@ -2906,7 +2964,7 @@ void free_unref_folios(struct folio_batch *folios) if (unlikely(!pcp)) { pcp_trylock_finish(UP_flags); free_one_page(zone, &folio->page, pfn, - order, FPI_NONE); + order, FPI_NONE, luf_key); continue; } locked_zone = zone; @@ -2921,7 +2979,7 @@ void free_unref_folios(struct folio_batch *folios) trace_mm_page_free_batched(&folio->page); free_unref_page_commit(zone, pcp, &folio->page, migratetype, - order); + order, luf_key); } if (pcp) { @@ -3013,7 +3071,7 @@ void __putback_isolated_page(struct page *page, unsigned int order, int mt) /* Return isolated page to tail of freelist. */ __free_one_page(page, page_to_pfn(page), zone, order, mt, - FPI_SKIP_REPORT_NOTIFY | FPI_TO_TAIL); + FPI_SKIP_REPORT_NOTIFY | FPI_TO_TAIL, 0); } /* @@ -4983,11 +5041,11 @@ void __free_pages(struct page *page, unsigned int order) struct alloc_tag *tag = pgalloc_tag_get(page); if (put_page_testzero(page)) - free_unref_page(page, order); + free_unref_page(page, order, 0); else if (!head) { pgalloc_tag_sub_pages(tag, (1 << order) - 1); while (order-- > 0) - free_unref_page(page + (1 << order), order); + free_unref_page(page + (1 << order), order, 0); } } EXPORT_SYMBOL(__free_pages); @@ -5049,7 +5107,7 @@ void __page_frag_cache_drain(struct page *page, unsigned int count) VM_BUG_ON_PAGE(page_ref_count(page) == 0, page); if (page_ref_sub_and_test(page, count)) - free_unref_page(page, compound_order(page)); + free_unref_page(page, compound_order(page), 0); } EXPORT_SYMBOL(__page_frag_cache_drain); @@ -5090,7 +5148,7 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc, goto refill; if (unlikely(nc->pfmemalloc)) { - free_unref_page(page, compound_order(page)); + free_unref_page(page, compound_order(page), 0); goto refill; } @@ -5134,7 +5192,7 @@ void page_frag_free(void *addr) struct page *page = virt_to_head_page(addr); if (unlikely(put_page_testzero(page))) - free_unref_page(page, compound_order(page)); + free_unref_page(page, compound_order(page), 0); } EXPORT_SYMBOL(page_frag_free); @@ -5154,7 +5212,7 @@ static void *make_alloc_exact(unsigned long addr, unsigned int order, last = page + (1UL << order); for (page += nr; page < last; page++) - __free_pages_ok(page, 0, FPI_TO_TAIL); + __free_pages_ok(page, 0, FPI_TO_TAIL, 0); } return (void *)addr; } @@ -7124,7 +7182,7 @@ bool put_page_back_buddy(struct page *page) int migratetype = get_pfnblock_migratetype(page, pfn); ClearPageHWPoisonTakenOff(page); - __free_one_page(page, pfn, zone, 0, migratetype, FPI_NONE); + __free_one_page(page, pfn, zone, 0, migratetype, FPI_NONE, 0); if (TestClearPageHWPoison(page)) { ret = true; } @@ -7193,7 +7251,7 @@ static void __accept_page(struct zone *zone, unsigned long *flags, accept_memory(page_to_phys(page), PAGE_SIZE << MAX_PAGE_ORDER); - __free_pages_ok(page, MAX_PAGE_ORDER, FPI_TO_TAIL); + __free_pages_ok(page, MAX_PAGE_ORDER, FPI_TO_TAIL, 0); if (last) static_branch_dec(&zones_with_unaccepted_pages); diff --git a/mm/page_isolation.c b/mm/page_isolation.c index 7e04047977cfe..8467838d4dbc8 100644 --- a/mm/page_isolation.c +++ b/mm/page_isolation.c @@ -258,6 +258,12 @@ static void unset_migratetype_isolate(struct page *page, int migratetype) WARN_ON_ONCE(!move_freepages_block_isolate(zone, page, migratetype)); } else { set_pageblock_migratetype(page, migratetype); + + /* + * Do not clear the page's private to keep its luf_key + * unchanged. 
+ */ + __putback_isolated_page(page, order, migratetype); } zone->nr_isolate_pageblock--; diff --git a/mm/page_reporting.c b/mm/page_reporting.c index e4c428e61d8c1..c05afb7a395f1 100644 --- a/mm/page_reporting.c +++ b/mm/page_reporting.c @@ -116,6 +116,12 @@ page_reporting_drain(struct page_reporting_dev_info *prdev, int mt = get_pageblock_migratetype(page); unsigned int order = get_order(sg->length); + /* + * Ensure private is zero before putting into the + * allocator. + */ + set_page_private(page, 0); + __putback_isolated_page(page, order, mt); /* If the pages were not reported due to error skip flagging */ diff --git a/mm/swap.c b/mm/swap.c index 10decd9dffa17..54b0ba10dbb86 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -109,7 +109,7 @@ void __folio_put(struct folio *folio) page_cache_release(folio); folio_unqueue_deferred_split(folio); mem_cgroup_uncharge(folio); - free_unref_page(&folio->page, folio_order(folio)); + free_unref_page(&folio->page, folio_order(folio), 0); } EXPORT_SYMBOL(__folio_put); @@ -959,7 +959,7 @@ void folios_put_refs(struct folio_batch *folios, unsigned int *refs) folios->nr = j; mem_cgroup_uncharge_folios(folios); - free_unref_folios(folios); + free_unref_folios(folios, 0); } EXPORT_SYMBOL(folios_put_refs); diff --git a/mm/vmscan.c b/mm/vmscan.c index 76378bc257e38..2970a8f35d3d3 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1480,7 +1480,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list, if (folio_batch_add(&free_folios, folio) == 0) { mem_cgroup_uncharge_folios(&free_folios); try_to_unmap_flush(); - free_unref_folios(&free_folios); + free_unref_folios(&free_folios, 0); } continue; @@ -1548,7 +1548,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list, mem_cgroup_uncharge_folios(&free_folios); try_to_unmap_flush(); - free_unref_folios(&free_folios); + free_unref_folios(&free_folios, 0); list_splice(&ret_folios, folio_list); count_vm_events(PGACTIVATE, pgactivate); @@ -1868,7 +1868,7 @@ static unsigned int move_folios_to_lru(struct lruvec *lruvec, if (folio_batch_add(&free_folios, folio) == 0) { spin_unlock_irq(&lruvec->lru_lock); mem_cgroup_uncharge_folios(&free_folios); - free_unref_folios(&free_folios); + free_unref_folios(&free_folios, 0); spin_lock_irq(&lruvec->lru_lock); } @@ -1890,7 +1890,7 @@ static unsigned int move_folios_to_lru(struct lruvec *lruvec, if (free_folios.nr) { spin_unlock_irq(&lruvec->lru_lock); mem_cgroup_uncharge_folios(&free_folios); - free_unref_folios(&free_folios); + free_unref_folios(&free_folios, 0); spin_lock_irq(&lruvec->lru_lock); } From patchwork Thu Feb 20 05:20:13 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Byungchul Park X-Patchwork-Id: 13983329 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6E854C021AD for ; Thu, 20 Feb 2025 05:21:13 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 32C5728029C; Thu, 20 Feb 2025 00:20:47 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 2641128029F; Thu, 20 Feb 2025 00:20:47 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id EC1C628029E; Thu, 20 Feb 2025 00:20:46 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org 
(Postfix) with ESMTP id 9E10A28029F for ; Thu, 20 Feb 2025 00:20:46 -0500 (EST) Received: from smtpin15.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 55924160DA9 for ; Thu, 20 Feb 2025 05:20:46 +0000 (UTC) X-FDA: 83139173292.15.D785411 Received: from invmail4.hynix.com (exvmail4.hynix.com [166.125.252.92]) by imf22.hostedemail.com (Postfix) with ESMTP id 465EEC0002 for ; Thu, 20 Feb 2025 05:20:43 +0000 (UTC) Authentication-Results: imf22.hostedemail.com; dkim=none; dmarc=none; spf=pass (imf22.hostedemail.com: domain of byungchul@sk.com designates 166.125.252.92 as permitted sender) smtp.mailfrom=byungchul@sk.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1740028844; a=rsa-sha256; cv=none; b=LKI9tnhIA6zOKc0cm25XhQtHW9D+XHR49bQs1BL1IewcooE3NGVr3Wibb4CtZdzOn4LiiZ kolRvv6bx5VznOSFgjGlNJxkv6IqSiLOFg7U+krqIuSmKQ6JpEmXUib75OhwRnMXjjjJj5 y7g30S4ebaWnCss5yWI62g7ifLgsHaA= ARC-Authentication-Results: i=1; imf22.hostedemail.com; dkim=none; dmarc=none; spf=pass (imf22.hostedemail.com: domain of byungchul@sk.com designates 166.125.252.92 as permitted sender) smtp.mailfrom=byungchul@sk.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1740028844; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references; bh=whP2bQnFsDqlvqZMb4Jc88JotmoO0WPggvce4XIFNPY=; b=EKheyHAid4/nqy3VQFUN6bAn/zuuZiPUPua6/jbOhxPVrOUNFoouWds37qnha8ie6nn65u vi52DrDlIhbkkVvKu0t8JMUJMHqlfSeGZvMA81isBZ2+l+Ts7evsora1ekST0BNGRzYgII Ftjr7GyFGdAFNUkNyYHeT8e8CSmSScI= X-AuditID: a67dfc5b-3c9ff7000001d7ae-ea-67b6bba63180 From: Byungchul Park To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, ying.huang@intel.com, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, rjgolo@gmail.com Subject: [RFC PATCH v12 12/26] mm: delimit critical sections to take off pages from pcp or buddy alloctor Date: Thu, 20 Feb 2025 14:20:13 +0900 Message-Id: <20250220052027.58847-13-byungchul@sk.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20250220052027.58847-1-byungchul@sk.com> References: <20250220052027.58847-1-byungchul@sk.com> X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFnrGLMWRmVeSWpSXmKPExsXC9ZZnoe6y3dvSDR4sMrKYs34Nm8XnDf/Y LF5saGe0+Lr+F7PF0099LBaXd81hs7i35j+rxflda1ktdizdx2Rx6cACJovjvQeYLObf+8xm sXnTVGaL41OmMlr8/gFUfHLWZBYHAY/vrX0sHjtn3WX3WLCp1GPzCi2PxXteMnlsWtXJ5rHp 0yR2j3fnzrF7nJjxm8Vj3slAj/f7rrJ5bP1l59E49Rqbx+dNcgF8UVw2Kak5mWWpRfp2CVwZ x5f/YyrY2cxYsXzPO5YGxslZXYycHBICJhJ/eu+wwthfzixhB7HZBNQlbtz4yQxiiwiYSRxs /QMWZxa4yyRxoJ8NxBYWyJTYtriHEcRmEVCVaD3xEqyeF6j+SEcHG8RMeYnVGw6AxTmB4j9m 9ILFhQRMJd4tuMQEUfOeTWLbzlAIW1Li4IobLBMYeRcwMqxiFMrMK8tNzMwx0cuozMus0EvO z93ECAz9ZbV/oncwfroQfIhRgINRiYd3Ruu2dCHWxLLiytxDjBIczEoivG31W9KFeFMSK6tS i/Lji0pzUosPMUpzsCiJ8xp9K08REkhPLEnNTk0tSC2CyTJxcEo1MDrdv2l+UXvJvPWyoksb XniXPWXyC4rILqhYlH+G0+P/2vIp4txqi74X+f2yNWKdkN30+mzZ9y9HGvKaZp4zEOvuOV8Z JL8vcFUX8/ZpTR8mW5648e+9ZCtH6YN/B613FbJHKn1lM/r+TF7vpPLUP0+NazZ2ddmy+tc1 GfgZsgevYltUJBD9R4mlOCPRUIu5qDgRABKViE55AgAA X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFjrHLMWRmVeSWpSXmKPExsXC5WfdrLts97Z0g4NbNSzmrF/DZvF5wz82 ixcb2hktvq7/xWzx9FMfi8XhuSdZLS7vmsNmcW/Nf1aL87vWslrsWLqPyeLSgQVMFsd7DzBZ 
zL/3mc1i86apzBbHp0xltPj9A6j45KzJLA6CHt9b+1g8ds66y+6xYFOpx+YVWh6L97xk8ti0 qpPNY9OnSewe786dY/c4MeM3i8e8k4Ee7/ddZfNY/OIDk8fWX3YejVOvsXl83iQXwB/FZZOS mpNZllqkb5fAlXF8+T+mgp3NjBXL97xjaWCcnNXFyMkhIWAi8eXMEnYQm01AXeLGjZ/MILaI gJnEwdY/YHFmgbtMEgf62UBsYYFMiW2LexhBbBYBVYnWEy/B6nmB6o90dLBBzJSXWL3hAFic Eyj+Y0YvWFxIwFTi3YJLTBMYuRYwMqxiFMnMK8tNzMwx1SvOzqjMy6zQS87P3cQIDOVltX8m 7mD8ctn9EKMAB6MSD++Dx1vThVgTy4orcw8xSnAwK4nwttVvSRfiTUmsrEotyo8vKs1JLT7E KM3BoiTO6xWemiAkkJ5YkpqdmlqQWgSTZeLglGpgnHt7wvJP0y7922u/c6misKL3zsNL+HUC PuXz19zbxBT/7IlNSuW05S33xPas1/n3Vj7QX65Qg2UhM+epsqDD7Eud3mjEqarOWTrrWLGZ /zqR7UlqAoL9sw2aUkLUs3qkWdPqkt23/Pl6/8mkE8e/GCpcWvFj+kTLCzvn2S1R2P1E9xzb HoGH0kosxRmJhlrMRcWJAIisZQVhAgAA X-CFilter-Loop: Reflected X-Rspamd-Queue-Id: 465EEC0002 X-Stat-Signature: 66q6boxdezictfecnchgrchk87aurjgb X-Rspam-User: X-Rspamd-Server: rspam10 X-HE-Tag: 1740028843-109672 X-HE-Meta: U2FsdGVkX19os/9V/xuYvZEEuKDfIZnFLGlr2TF1HS9LO6NqV8zGgWeKQ2Bsjifh35+IrKzKax+rB45nUOh5YsqFZwfTeJ/O3yiskVcEjdCeKwLzgIkxqZAo6RHn3iNEjW0Oez4LowQBGrUCbK+3UUGwTfDYNGqDWiMscavyR/a9WWm/EnJDmSmtg+g10ZX9C2q0BXM4BAGp095VwGRhvxLlZ4RP7E02bFbRdEpmU+qeTiJPYX16uW/KlCcF0QSDpdzeNv4GvRw3j/pK/YO/U0PfrGL4no4/zzDBofYj83jGf9fF2uD13YSIFPyLYH03fif5BoOwG3kCu2Ykq8PXLSZY+YEpndKE93Jni0XMYEXj3BxcbdF3f7uE1at5lVfrzpw4F2MY2ovG9nL8NzN5FkFPy2WRjyYUQEjOspByHGFDtVN30S4OuXh7GfXciDhx1ILSklGFGIYDg2oF1rqXQO0JrEl8l6juynbnEWACTlP6BiJZmbvHwaa2vDg3Z1p6rIvAj7u2VypyVVroTLeGpqwOq6zrYGbzBYsu7g0WzT2GAuMuVw9lTj8Ylejrb9Nvqqj36zwYpDUx/gS9bReOyg9U506244iYnnhix9vgPGD4GsK6Xxz2bcD/z2+3qgaUH6ZUAY/BxwqpGKA3uIE9rjy1ux3l3RXe1CRID2ixCbEjuYm2d6KraKJ5utG2/Fqvryvxz/nkbci5zTUs2DDmUDYjh8RM72d7YKmTEG95u2xdNVxW42O8EEuJ5Of89pIBoXntv2Kq69veRamNeSt9rCcvX1Gmv9SLv+Q7CbZLbB/4jCcBEBh/7yOEG1LRXXKDmQislfGVQgpstAZnWmTVLe3rzAwjmY4TUzodEZ3+hCOgEENw6DaDaRqc4RmXZ8EUd4tfKwJOBSos8scJ2MmWAeW3r8F+7DRNbBtsBvxeXhp2idqONW+ctpahYT8sdpHusbtlTa7Upk1zI0k/BpH 4sg== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Now that luf mechanism has been introduced, tlb shootdown might be necessary when luf'd pages exit from pcp or buddy allocator. Check if it's okay to take off pages and can perform for luf'd pages before use. Signed-off-by: Byungchul Park --- mm/compaction.c | 32 ++++++++++++++++-- mm/internal.h | 2 +- mm/page_alloc.c | 79 +++++++++++++++++++++++++++++++++++++++++++-- mm/page_isolation.c | 4 ++- mm/page_reporting.c | 20 +++++++++++- 5 files changed, 129 insertions(+), 8 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 6009f5d1021a6..90f5c34f333db 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -605,6 +605,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc, page = pfn_to_page(blockpfn); + luf_takeoff_start(); /* Isolate free pages. */ for (; blockpfn < end_pfn; blockpfn += stride, page += stride) { int isolated; @@ -652,9 +653,12 @@ static unsigned long isolate_freepages_block(struct compact_control *cc, goto isolate_fail; } + if (!luf_takeoff_check(page)) + goto isolate_fail; + /* Found a free page, will break it into order-0 pages */ order = buddy_order(page); - isolated = __isolate_free_page(page, order); + isolated = __isolate_free_page(page, order, false); if (!isolated) break; set_page_private(page, order); @@ -682,6 +686,11 @@ static unsigned long isolate_freepages_block(struct compact_control *cc, if (locked) spin_unlock_irqrestore(&cc->zone->lock, flags); + /* + * Check and flush before using the pages taken off. 
+ */ + luf_takeoff_end(); + /* * Be careful to not go outside of the pageblock. */ @@ -1589,6 +1598,7 @@ static void fast_isolate_freepages(struct compact_control *cc) if (!area->nr_free) continue; + luf_takeoff_start(); spin_lock_irqsave(&cc->zone->lock, flags); freelist = &area->free_list[MIGRATE_MOVABLE]; list_for_each_entry_reverse(freepage, freelist, buddy_list) { @@ -1596,6 +1606,10 @@ static void fast_isolate_freepages(struct compact_control *cc) order_scanned++; nr_scanned++; + + if (!luf_takeoff_check(freepage)) + goto scan_next; + pfn = page_to_pfn(freepage); if (pfn >= highest) @@ -1615,7 +1629,7 @@ static void fast_isolate_freepages(struct compact_control *cc) /* Shorten the scan if a candidate is found */ limit >>= 1; } - +scan_next: if (order_scanned >= limit) break; } @@ -1633,7 +1647,7 @@ static void fast_isolate_freepages(struct compact_control *cc) /* Isolate the page if available */ if (page) { - if (__isolate_free_page(page, order)) { + if (__isolate_free_page(page, order, false)) { set_page_private(page, order); nr_isolated = 1 << order; nr_scanned += nr_isolated - 1; @@ -1650,6 +1664,11 @@ static void fast_isolate_freepages(struct compact_control *cc) spin_unlock_irqrestore(&cc->zone->lock, flags); + /* + * Check and flush before using the pages taken off. + */ + luf_takeoff_end(); + /* Skip fast search if enough freepages isolated */ if (cc->nr_freepages >= cc->nr_migratepages) break; @@ -2369,7 +2388,14 @@ static enum compact_result compact_finished(struct compact_control *cc) { int ret; + /* + * luf_takeoff_{start,end}() is required to identify whether + * this compaction context is tlb shootdownable for luf'd pages. + */ + luf_takeoff_start(); ret = __compact_finished(cc); + luf_takeoff_end(); + trace_mm_compaction_finished(cc->zone, cc->order, ret); if (ret == COMPACT_NO_SUITABLE_PAGE) ret = COMPACT_CONTINUE; diff --git a/mm/internal.h b/mm/internal.h index 2bb54bc04260b..3a6da77d04ed3 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -662,7 +662,7 @@ static inline void clear_zone_contiguous(struct zone *zone) zone->contiguous = false; } -extern int __isolate_free_page(struct page *page, unsigned int order); +extern int __isolate_free_page(struct page *page, unsigned int order, bool willputback); extern void __putback_isolated_page(struct page *page, unsigned int order, int mt); extern void memblock_free_pages(struct page *page, unsigned long pfn, diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 05a1098f8c61f..f2ea69596ff15 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -869,8 +869,13 @@ static inline void del_page_from_free_list(struct page *page, struct zone *zone, static inline struct page *get_page_from_free_area(struct free_area *area, int migratetype) { - return list_first_entry_or_null(&area->free_list[migratetype], + struct page *page = list_first_entry_or_null(&area->free_list[migratetype], struct page, buddy_list); + + if (page && luf_takeoff_check(page)) + return page; + + return NULL; } /* @@ -1579,6 +1584,8 @@ static __always_inline void page_del_and_expand(struct zone *zone, int nr_pages = 1 << high; __del_page_from_free_list(page, zone, high, migratetype); + if (unlikely(!luf_takeoff_check_and_fold(page))) + VM_WARN_ON(1); nr_pages -= expand(zone, page, low, high, migratetype); account_freepages(zone, -nr_pages, migratetype); } @@ -1950,6 +1957,13 @@ bool move_freepages_block_isolate(struct zone *zone, struct page *page, del_page_from_free_list(buddy, zone, order, get_pfnblock_migratetype(buddy, pfn)); + + /* + * No need to 
luf_takeoff_check_and_fold() since it's + * going back to buddy. luf_key will be handed over in + * split_large_buddy(). + */ + set_pageblock_migratetype(page, migratetype); split_large_buddy(zone, buddy, pfn, order, FPI_NONE); return true; @@ -1961,6 +1975,13 @@ bool move_freepages_block_isolate(struct zone *zone, struct page *page, del_page_from_free_list(page, zone, order, get_pfnblock_migratetype(page, pfn)); + + /* + * No need to luf_takeoff_check_and_fold() since it's + * going back to buddy. luf_key will be handed over in + * split_large_buddy(). + */ + set_pageblock_migratetype(page, migratetype); split_large_buddy(zone, page, pfn, order, FPI_NONE); return true; @@ -2085,6 +2106,8 @@ steal_suitable_fallback(struct zone *zone, struct page *page, unsigned int nr_added; del_page_from_free_list(page, zone, current_order, block_type); + if (unlikely(!luf_takeoff_check_and_fold(page))) + VM_WARN_ON(1); change_pageblock_range(page, current_order, start_type); nr_added = expand(zone, page, order, current_order, start_type); account_freepages(zone, nr_added, start_type); @@ -2165,6 +2188,9 @@ int find_suitable_fallback(struct free_area *area, unsigned int order, if (free_area_empty(area, fallback_mt)) continue; + if (luf_takeoff_no_shootdown()) + continue; + if (can_steal_fallback(order, migratetype)) *can_steal = true; @@ -2256,6 +2282,11 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac, pageblock_nr_pages) continue; + /* + * luf_takeoff_{start,end}() is required for + * get_page_from_free_area() to use luf_takeoff_check(). + */ + luf_takeoff_start(); spin_lock_irqsave(&zone->lock, flags); for (order = 0; order < NR_PAGE_ORDERS; order++) { struct free_area *area = &(zone->free_area[order]); @@ -2313,10 +2344,12 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac, WARN_ON_ONCE(ret == -1); if (ret > 0) { spin_unlock_irqrestore(&zone->lock, flags); + luf_takeoff_end(); return ret; } } spin_unlock_irqrestore(&zone->lock, flags); + luf_takeoff_end(); } return false; @@ -2494,6 +2527,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order, unsigned long flags; int i; + luf_takeoff_start(); spin_lock_irqsave(&zone->lock, flags); for (i = 0; i < count; ++i) { struct page *page = __rmqueue(zone, order, migratetype, @@ -2518,6 +2552,10 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order, list_add_tail(&page->pcp_list, list); } spin_unlock_irqrestore(&zone->lock, flags); + /* + * Check and flush before using the pages taken off. 
+ */ + luf_takeoff_end(); return i; } @@ -3012,7 +3050,7 @@ void split_page(struct page *page, unsigned int order) } EXPORT_SYMBOL_GPL(split_page); -int __isolate_free_page(struct page *page, unsigned int order) +int __isolate_free_page(struct page *page, unsigned int order, bool willputback) { struct zone *zone = page_zone(page); int mt = get_pageblock_migratetype(page); @@ -3031,6 +3069,8 @@ int __isolate_free_page(struct page *page, unsigned int order) } del_page_from_free_list(page, zone, order, mt); + if (unlikely(!willputback && !luf_takeoff_check_and_fold(page))) + VM_WARN_ON(1); /* * Set the pageblock if the isolated page is at least half of a @@ -3110,6 +3150,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone, do { page = NULL; + luf_takeoff_start(); spin_lock_irqsave(&zone->lock, flags); if (alloc_flags & ALLOC_HIGHATOMIC) page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC); @@ -3127,10 +3168,15 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone, if (!page) { spin_unlock_irqrestore(&zone->lock, flags); + luf_takeoff_end(); return NULL; } } spin_unlock_irqrestore(&zone->lock, flags); + /* + * Check and flush before using the pages taken off. + */ + luf_takeoff_end(); } while (check_new_pages(page, order)); __count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order); @@ -3214,6 +3260,8 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order, } page = list_first_entry(list, struct page, pcp_list); + if (!luf_takeoff_check_and_fold(page)) + return NULL; list_del(&page->pcp_list); pcp->count -= 1 << order; } while (check_new_pages(page, order)); @@ -3231,11 +3279,13 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone, struct page *page; unsigned long __maybe_unused UP_flags; + luf_takeoff_start(); /* spin_trylock may fail due to a parallel drain or IRQ reentrancy. */ pcp_trylock_prepare(UP_flags); pcp = pcp_spin_trylock(zone->per_cpu_pageset); if (!pcp) { pcp_trylock_finish(UP_flags); + luf_takeoff_end(); return NULL; } @@ -3249,6 +3299,10 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone, page = __rmqueue_pcplist(zone, order, migratetype, alloc_flags, pcp, list); pcp_spin_unlock(pcp); pcp_trylock_finish(UP_flags); + /* + * Check and flush before using the pages taken off. + */ + luf_takeoff_end(); if (page) { __count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order); zone_statistics(preferred_zone, zone, 1); @@ -4853,6 +4907,7 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid, if (unlikely(!zone)) goto failed; + luf_takeoff_start(); /* spin_trylock may fail due to a parallel drain or IRQ reentrancy. */ pcp_trylock_prepare(UP_flags); pcp = pcp_spin_trylock(zone->per_cpu_pageset); @@ -4891,6 +4946,10 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid, pcp_spin_unlock(pcp); pcp_trylock_finish(UP_flags); + /* + * Check and flush before using the pages taken off. 
+ */ + luf_takeoff_end(); __count_zid_vm_events(PGALLOC, zone_idx(zone), nr_account); zone_statistics(zonelist_zone(ac.preferred_zoneref), zone, nr_account); @@ -4900,6 +4959,7 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid, failed_irq: pcp_trylock_finish(UP_flags); + luf_takeoff_end(); failed: page = __alloc_pages_noprof(gfp, 0, preferred_nid, nodemask); @@ -7036,6 +7096,7 @@ unsigned long __offline_isolated_pages(unsigned long start_pfn, offline_mem_sections(pfn, end_pfn); zone = page_zone(pfn_to_page(pfn)); + luf_takeoff_start(); spin_lock_irqsave(&zone->lock, flags); while (pfn < end_pfn) { page = pfn_to_page(pfn); @@ -7064,9 +7125,15 @@ unsigned long __offline_isolated_pages(unsigned long start_pfn, VM_WARN_ON(get_pageblock_migratetype(page) != MIGRATE_ISOLATE); order = buddy_order(page); del_page_from_free_list(page, zone, order, MIGRATE_ISOLATE); + if (unlikely(!luf_takeoff_check_and_fold(page))) + VM_WARN_ON(1); pfn += (1 << order); } spin_unlock_irqrestore(&zone->lock, flags); + /* + * Check and flush before using the pages taken off. + */ + luf_takeoff_end(); return end_pfn - start_pfn - already_offline; } @@ -7142,6 +7209,7 @@ bool take_page_off_buddy(struct page *page) unsigned int order; bool ret = false; + luf_takeoff_start(); spin_lock_irqsave(&zone->lock, flags); for (order = 0; order < NR_PAGE_ORDERS; order++) { struct page *page_head = page - (pfn & ((1 << order) - 1)); @@ -7154,6 +7222,8 @@ bool take_page_off_buddy(struct page *page) del_page_from_free_list(page_head, zone, page_order, migratetype); + if (unlikely(!luf_takeoff_check_and_fold(page_head))) + VM_WARN_ON(1); break_down_buddy_pages(zone, page_head, page, 0, page_order, migratetype); SetPageHWPoisonTakenOff(page); @@ -7164,6 +7234,11 @@ bool take_page_off_buddy(struct page *page) break; } spin_unlock_irqrestore(&zone->lock, flags); + + /* + * Check and flush before using the pages taken off. 
+ */ + luf_takeoff_end(); return ret; } diff --git a/mm/page_isolation.c b/mm/page_isolation.c index 8467838d4dbc8..eae33d188762b 100644 --- a/mm/page_isolation.c +++ b/mm/page_isolation.c @@ -211,6 +211,7 @@ static void unset_migratetype_isolate(struct page *page, int migratetype) struct page *buddy; zone = page_zone(page); + luf_takeoff_start(); spin_lock_irqsave(&zone->lock, flags); if (!is_migrate_isolate_page(page)) goto out; @@ -229,7 +230,7 @@ static void unset_migratetype_isolate(struct page *page, int migratetype) buddy = find_buddy_page_pfn(page, page_to_pfn(page), order, NULL); if (buddy && !is_migrate_isolate_page(buddy)) { - isolated_page = !!__isolate_free_page(page, order); + isolated_page = !!__isolate_free_page(page, order, true); /* * Isolating a free page in an isolated pageblock * is expected to always work as watermarks don't @@ -269,6 +270,7 @@ static void unset_migratetype_isolate(struct page *page, int migratetype) zone->nr_isolate_pageblock--; out: spin_unlock_irqrestore(&zone->lock, flags); + luf_takeoff_end(zone); } static inline struct page * diff --git a/mm/page_reporting.c b/mm/page_reporting.c index c05afb7a395f1..03a7f5f6dc073 100644 --- a/mm/page_reporting.c +++ b/mm/page_reporting.c @@ -167,6 +167,7 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone, if (list_empty(list)) return err; + luf_takeoff_start(); spin_lock_irq(&zone->lock); /* @@ -191,6 +192,11 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone, if (PageReported(page)) continue; + if (!luf_takeoff_check(page)) { + VM_WARN_ON(1); + continue; + } + /* * If we fully consumed our budget then update our * state to indicate that we are requesting additional @@ -204,7 +210,7 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone, /* Attempt to pull page from list and place in scatterlist */ if (*offset) { - if (!__isolate_free_page(page, order)) { + if (!__isolate_free_page(page, order, false)) { next = page; break; } @@ -227,6 +233,11 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone, /* release lock before waiting on report processing */ spin_unlock_irq(&zone->lock); + /* + * Check and flush before using the pages taken off. + */ + luf_takeoff_end(); + /* begin processing pages in local list */ err = prdev->report(prdev, sgl, PAGE_REPORTING_CAPACITY); @@ -236,6 +247,8 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone, /* update budget to reflect call to report function */ budget--; + luf_takeoff_start(); + /* reacquire zone lock and resume processing */ spin_lock_irq(&zone->lock); @@ -259,6 +272,11 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone, spin_unlock_irq(&zone->lock); + /* + * Check and flush before using the pages taken off. 
+ */ + luf_takeoff_end(); + return err; }

From patchwork Thu Feb 20 05:20:14 2025
X-Patchwork-Submitter: Byungchul Park
X-Patchwork-Id: 13983330
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, ying.huang@intel.com, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, rjgolo@gmail.com
Subject: [RFC PATCH v12 13/26] mm: introduce pend_list in struct free_area to track luf'd pages
Date: Thu, 20 Feb 2025 14:20:14 +0900
Message-Id: <20250220052027.58847-14-byungchul@sk.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20250220052027.58847-1-byungchul@sk.com>
References: <20250220052027.58847-1-byungchul@sk.com>

luf'd pages require a tlb shootdown when they leave the page allocator. For some page allocation requests it is okay to return a luf'd page and then perform the tlb shootdown, but for others, e.g. in irq context, it is not. This patch splits the list in free_area into two, 'free_list' for non-luf'd pages and 'pend_list' for luf'd pages, so that the buddy allocator can work with the various constraints of the calling context.
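To make the intended allocation policy easier to follow, here is a minimal userspace sketch of the two-list idea. It is not kernel code: the types and helpers (toy_free_area, toy_page, take_page) are invented for illustration, only the names free_list, pend_list, nr_luf_pages and non_luf_pages_ok are borrowed from the patch, and locking and real watermark handling are omitted.

/*
 * Toy model of a free area that keeps non-luf'd pages on free_list and
 * luf'd pages (pages whose tlb flush is still pending) on pend_list.
 * Lists are singly linked; this only models the "which list do we take
 * a page from?" decision.
 */
#include <stdbool.h>
#include <stdio.h>

struct toy_page {
    unsigned short luf_key;        /* non-zero: tlb flush still pending */
    struct toy_page *next;
};

struct toy_free_area {
    struct toy_page *free_list;    /* non-luf'd pages */
    struct toy_page *pend_list;    /* luf'd pages */
    long nr_free;                  /* pages on both lists */
    long nr_luf_pages;             /* pages on pend_list */
    long min_wmark;
};

/* Same idea as non_luf_pages_ok(): are non-luf'd pages still plentiful? */
static bool toy_non_luf_pages_ok(struct toy_free_area *a)
{
    return a->nr_free - a->nr_luf_pages > a->min_wmark;
}

static struct toy_page *pop(struct toy_page **list)
{
    struct toy_page *p = *list;

    if (p)
        *list = p->next;
    return p;
}

/*
 * Prefer free_list while non-luf'd pages are abundant so the deferred
 * flushes stay deferred; drain pend_list first once they run low.  A
 * context that cannot perform tlb shootdown may only use free_list.
 */
static struct toy_page *take_page(struct toy_free_area *a, bool can_shootdown)
{
    bool pend_first = can_shootdown && !toy_non_luf_pages_ok(a);
    struct toy_page *p = NULL;

    if (pend_first)
        p = pop(&a->pend_list);
    if (!p)
        p = pop(&a->free_list);
    if (!p && can_shootdown)
        p = pop(&a->pend_list);

    if (p) {
        a->nr_free--;
        if (p->luf_key) {
            a->nr_luf_pages--;
            /* the caller must flush stale tlb entries before using it */
        }
    }
    return p;
}

int main(void)
{
    struct toy_page plain = { 0, NULL };
    struct toy_page luf = { 7, NULL };
    struct toy_free_area a = { &plain, &luf, 2, 1, 0 };
    struct toy_page *p = take_page(&a, false);    /* e.g. irq-like context */

    printf("no-shootdown context got a %s page\n",
           p && !p->luf_key ? "non-luf'd" : "wrong");
    return 0;
}

The point is only the ordering decision: a context that cannot flush tlbs never touches pend_list, and pend_list is preferred only when non-luf'd pages run low, so the deferred flushes stay deferred as long as possible.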
Signed-off-by: Byungchul Park --- include/linux/mmzone.h | 3 ++ kernel/power/snapshot.c | 14 ++++++ kernel/vmcore_info.c | 2 + mm/compaction.c | 33 ++++++++++--- mm/internal.h | 17 ++++++- mm/mm_init.c | 2 + mm/page_alloc.c | 105 ++++++++++++++++++++++++++++++++++------ mm/page_reporting.c | 22 ++++++--- mm/vmstat.c | 15 ++++++ 9 files changed, 184 insertions(+), 29 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index b36124145a16f..ac3178b5fc50b 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -116,6 +116,7 @@ extern int page_group_by_mobility_disabled; MIGRATETYPE_MASK) struct free_area { struct list_head free_list[MIGRATE_TYPES]; + struct list_head pend_list[MIGRATE_TYPES]; unsigned long nr_free; }; @@ -995,6 +996,8 @@ struct zone { /* Zone statistics */ atomic_long_t vm_stat[NR_VM_ZONE_STAT_ITEMS]; atomic_long_t vm_numa_event[NR_VM_NUMA_EVENT_ITEMS]; + /* Count pages that need tlb shootdown on allocation */ + atomic_long_t nr_luf_pages; } ____cacheline_internodealigned_in_smp; enum pgdat_flags { diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c index 30894d8f0a781..863b0c54185dc 100644 --- a/kernel/power/snapshot.c +++ b/kernel/power/snapshot.c @@ -1288,6 +1288,20 @@ static void mark_free_pages(struct zone *zone) swsusp_set_page_free(pfn_to_page(pfn + i)); } } + + list_for_each_entry(page, + &zone->free_area[order].pend_list[t], buddy_list) { + unsigned long i; + + pfn = page_to_pfn(page); + for (i = 0; i < (1UL << order); i++) { + if (!--page_count) { + touch_nmi_watchdog(); + page_count = WD_PAGE_COUNT; + } + swsusp_set_page_free(pfn_to_page(pfn + i)); + } + } } spin_unlock_irqrestore(&zone->lock, flags); } diff --git a/kernel/vmcore_info.c b/kernel/vmcore_info.c index 1fec61603ef32..638deb57f9ddd 100644 --- a/kernel/vmcore_info.c +++ b/kernel/vmcore_info.c @@ -188,11 +188,13 @@ static int __init crash_save_vmcoreinfo_init(void) VMCOREINFO_OFFSET(zone, vm_stat); VMCOREINFO_OFFSET(zone, spanned_pages); VMCOREINFO_OFFSET(free_area, free_list); + VMCOREINFO_OFFSET(free_area, pend_list); VMCOREINFO_OFFSET(list_head, next); VMCOREINFO_OFFSET(list_head, prev); VMCOREINFO_LENGTH(zone.free_area, NR_PAGE_ORDERS); log_buf_vmcoreinfo_setup(); VMCOREINFO_LENGTH(free_area.free_list, MIGRATE_TYPES); + VMCOREINFO_LENGTH(free_area.pend_list, MIGRATE_TYPES); VMCOREINFO_NUMBER(NR_FREE_PAGES); VMCOREINFO_NUMBER(PG_lru); VMCOREINFO_NUMBER(PG_private); diff --git a/mm/compaction.c b/mm/compaction.c index 90f5c34f333db..27f3d743762bb 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1590,24 +1590,28 @@ static void fast_isolate_freepages(struct compact_control *cc) order = next_search_order(cc, order)) { struct free_area *area = &cc->zone->free_area[order]; struct list_head *freelist; + struct list_head *high_pfn_list; struct page *freepage; unsigned long flags; unsigned int order_scanned = 0; unsigned long high_pfn = 0; + bool consider_pend = false; + bool can_shootdown; if (!area->nr_free) continue; - luf_takeoff_start(); + can_shootdown = luf_takeoff_start(); spin_lock_irqsave(&cc->zone->lock, flags); freelist = &area->free_list[MIGRATE_MOVABLE]; +retry: list_for_each_entry_reverse(freepage, freelist, buddy_list) { unsigned long pfn; order_scanned++; nr_scanned++; - if (!luf_takeoff_check(freepage)) + if (unlikely(consider_pend && !luf_takeoff_check(freepage))) goto scan_next; pfn = page_to_pfn(freepage); @@ -1620,26 +1624,34 @@ static void fast_isolate_freepages(struct compact_control *cc) cc->fast_search_fail = 0; cc->search_order = 
order; page = freepage; - break; + goto done; } if (pfn >= min_pfn && pfn > high_pfn) { high_pfn = pfn; + high_pfn_list = freelist; /* Shorten the scan if a candidate is found */ limit >>= 1; } scan_next: if (order_scanned >= limit) - break; + goto done; } + if (!consider_pend && can_shootdown) { + consider_pend = true; + freelist = &area->pend_list[MIGRATE_MOVABLE]; + goto retry; + } +done: /* Use a maximum candidate pfn if a preferred one was not found */ if (!page && high_pfn) { page = pfn_to_page(high_pfn); /* Update freepage for the list reorder below */ freepage = page; + freelist = high_pfn_list; } /* Reorder to so a future search skips recent pages */ @@ -2036,18 +2048,20 @@ static unsigned long fast_find_migrateblock(struct compact_control *cc) struct list_head *freelist; unsigned long flags; struct page *freepage; + bool consider_pend = false; if (!area->nr_free) continue; spin_lock_irqsave(&cc->zone->lock, flags); freelist = &area->free_list[MIGRATE_MOVABLE]; +retry: list_for_each_entry(freepage, freelist, buddy_list) { unsigned long free_pfn; if (nr_scanned++ >= limit) { move_freelist_tail(freelist, freepage); - break; + goto done; } free_pfn = page_to_pfn(freepage); @@ -2070,9 +2084,16 @@ static unsigned long fast_find_migrateblock(struct compact_control *cc) pfn = cc->zone->zone_start_pfn; cc->fast_search_fail = 0; found_block = true; - break; + goto done; } } + + if (!consider_pend) { + consider_pend = true; + freelist = &area->pend_list[MIGRATE_MOVABLE]; + goto retry; + } +done: spin_unlock_irqrestore(&cc->zone->lock, flags); } diff --git a/mm/internal.h b/mm/internal.h index 3a6da77d04ed3..0dc374553f9b5 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -836,11 +836,16 @@ void init_cma_reserved_pageblock(struct page *page); int find_suitable_fallback(struct free_area *area, unsigned int order, int migratetype, bool only_stealable, bool *can_steal); -static inline bool free_area_empty(struct free_area *area, int migratetype) +static inline bool free_list_empty(struct free_area *area, int migratetype) { return list_empty(&area->free_list[migratetype]); } +static inline bool free_area_empty(struct free_area *area, int migratetype) +{ + return list_empty(&area->free_list[migratetype]) && + list_empty(&area->pend_list[migratetype]); +} /* mm/util.c */ struct anon_vma *folio_anon_vma(const struct folio *folio); @@ -1590,12 +1595,22 @@ void luf_takeoff_end(void); bool luf_takeoff_no_shootdown(void); bool luf_takeoff_check(struct page *page); bool luf_takeoff_check_and_fold(struct page *page); + +static inline bool non_luf_pages_ok(struct zone *zone) +{ + unsigned long nr_free = zone_page_state(zone, NR_FREE_PAGES); + unsigned long min_wm = min_wmark_pages(zone); + unsigned long nr_luf_pages = atomic_long_read(&zone->nr_luf_pages); + + return nr_free - nr_luf_pages > min_wm; +} #else static inline bool luf_takeoff_start(void) { return false; } static inline void luf_takeoff_end(void) {} static inline bool luf_takeoff_no_shootdown(void) { return true; } static inline bool luf_takeoff_check(struct page *page) { return true; } static inline bool luf_takeoff_check_and_fold(struct page *page) { return true; } +static inline bool non_luf_pages_ok(struct zone *zone) { return true; } #endif /* pagewalk.c */ diff --git a/mm/mm_init.c b/mm/mm_init.c index 1c205b0a86ed5..12b96cd6a87b0 100644 --- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -1396,12 +1396,14 @@ static void __meminit zone_init_free_lists(struct zone *zone) unsigned int order, t; for_each_migratetype_order(order, t) { 
INIT_LIST_HEAD(&zone->free_area[order].free_list[t]); + INIT_LIST_HEAD(&zone->free_area[order].pend_list[t]); zone->free_area[order].nr_free = 0; } #ifdef CONFIG_UNACCEPTED_MEMORY INIT_LIST_HEAD(&zone->unaccepted_pages); #endif + atomic_long_set(&zone->nr_luf_pages, 0); } void __meminit init_currently_empty_zone(struct zone *zone, diff --git a/mm/page_alloc.c b/mm/page_alloc.c index f2ea69596ff15..65acc437d8387 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -804,15 +804,28 @@ static inline void __add_to_free_list(struct page *page, struct zone *zone, bool tail) { struct free_area *area = &zone->free_area[order]; + struct list_head *list; VM_WARN_ONCE(get_pageblock_migratetype(page) != migratetype, "page type is %lu, passed migratetype is %d (nr=%d)\n", get_pageblock_migratetype(page), migratetype, 1 << order); + /* + * When identifying whether a page requires tlb shootdown, false + * positive is okay because it will cause just additional tlb + * shootdown. + */ + if (page_luf_key(page)) { + list = &area->pend_list[migratetype]; + atomic_long_add(1 << order, &zone->nr_luf_pages); + } else + list = &area->free_list[migratetype]; + if (tail) - list_add_tail(&page->buddy_list, &area->free_list[migratetype]); + list_add_tail(&page->buddy_list, list); else - list_add(&page->buddy_list, &area->free_list[migratetype]); + list_add(&page->buddy_list, list); + area->nr_free++; } @@ -831,7 +844,20 @@ static inline void move_to_free_list(struct page *page, struct zone *zone, "page type is %lu, passed migratetype is %d (nr=%d)\n", get_pageblock_migratetype(page), old_mt, 1 << order); - list_move_tail(&page->buddy_list, &area->free_list[new_mt]); + /* + * The page might have been taken from a pfn where it's not + * clear which list was used. Therefore, conservatively + * consider it as pend_list, not to miss any true ones that + * require tlb shootdown. + * + * When identifying whether a page requires tlb shootdown, false + * positive is okay because it will cause just additional tlb + * shootdown. + */ + if (page_luf_key(page)) + list_move_tail(&page->buddy_list, &area->pend_list[new_mt]); + else + list_move_tail(&page->buddy_list, &area->free_list[new_mt]); account_freepages(zone, -(1 << order), old_mt); account_freepages(zone, 1 << order, new_mt); @@ -848,6 +874,9 @@ static inline void __del_page_from_free_list(struct page *page, struct zone *zon if (page_reported(page)) __ClearPageReported(page); + if (page_luf_key(page)) + atomic_long_sub(1 << order, &zone->nr_luf_pages); + list_del(&page->buddy_list); __ClearPageBuddy(page); zone->free_area[order].nr_free--; @@ -866,15 +895,48 @@ static inline void del_page_from_free_list(struct page *page, struct zone *zone, account_freepages(zone, -(1 << order), migratetype); } -static inline struct page *get_page_from_free_area(struct free_area *area, - int migratetype) +static inline struct page *get_page_from_free_area(struct zone *zone, + struct free_area *area, int migratetype) { - struct page *page = list_first_entry_or_null(&area->free_list[migratetype], - struct page, buddy_list); + struct page *page; + bool pend_first; - if (page && luf_takeoff_check(page)) - return page; + /* + * XXX: Make the decision preciser if needed e.g. using + * zone_watermark_ok() or its family, but for now, don't want to + * make it heavier. + * + * Try free_list, holding non-luf pages, first if there are + * enough non-luf pages to aggressively defer tlb flush, but + * should try pend_list first instead if not. 
+ */ + pend_first = !non_luf_pages_ok(zone); + + if (pend_first) { + page = list_first_entry_or_null(&area->pend_list[migratetype], + struct page, buddy_list); + + if (page && luf_takeoff_check(page)) + return page; + + page = list_first_entry_or_null(&area->free_list[migratetype], + struct page, buddy_list); + + if (page) + return page; + } else { + page = list_first_entry_or_null(&area->free_list[migratetype], + struct page, buddy_list); + + if (page) + return page; + page = list_first_entry_or_null(&area->pend_list[migratetype], + struct page, buddy_list); + + if (page && luf_takeoff_check(page)) + return page; + } return NULL; } @@ -1027,6 +1089,8 @@ static inline void __free_one_page(struct page *page, if (fpi_flags & FPI_TO_TAIL) to_tail = true; + else if (page_luf_key(page)) + to_tail = true; else if (is_shuffle_order(order)) to_tail = shuffle_pick_tail(); else @@ -1556,6 +1620,8 @@ static inline unsigned int expand(struct zone *zone, struct page *page, int low, unsigned int nr_added = 0; while (high > low) { + bool tail = false; + high--; size >>= 1; VM_BUG_ON_PAGE(bad_range(zone, &page[size]), &page[size]); @@ -1569,7 +1635,10 @@ static inline unsigned int expand(struct zone *zone, struct page *page, int low, if (set_page_guard(zone, &page[size], high)) continue; - __add_to_free_list(&page[size], zone, high, migratetype, false); + if (page_luf_key(&page[size])) + tail = true; + + __add_to_free_list(&page[size], zone, high, migratetype, tail); set_buddy_order(&page[size], high); nr_added += size; } @@ -1754,7 +1823,7 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order, /* Find a page of the appropriate size in the preferred list */ for (current_order = order; current_order < NR_PAGE_ORDERS; ++current_order) { area = &(zone->free_area[current_order]); - page = get_page_from_free_area(area, migratetype); + page = get_page_from_free_area(zone, area, migratetype); if (!page) continue; @@ -2188,7 +2257,8 @@ int find_suitable_fallback(struct free_area *area, unsigned int order, if (free_area_empty(area, fallback_mt)) continue; - if (luf_takeoff_no_shootdown()) + if (free_list_empty(area, fallback_mt) && + luf_takeoff_no_shootdown()) continue; if (can_steal_fallback(order, migratetype)) @@ -2292,7 +2362,7 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac, struct free_area *area = &(zone->free_area[order]); int mt; - page = get_page_from_free_area(area, MIGRATE_HIGHATOMIC); + page = get_page_from_free_area(zone, area, MIGRATE_HIGHATOMIC); if (!page) continue; @@ -2430,7 +2500,7 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype, VM_BUG_ON(current_order > MAX_PAGE_ORDER); do_steal: - page = get_page_from_free_area(area, fallback_mt); + page = get_page_from_free_area(zone, area, fallback_mt); /* take off list, maybe claim block, expand remainder */ page = steal_suitable_fallback(zone, page, current_order, order, @@ -7180,6 +7250,8 @@ static void break_down_buddy_pages(struct zone *zone, struct page *page, struct page *current_buddy; while (high > low) { + bool tail = false; + high--; size >>= 1; @@ -7193,7 +7265,10 @@ static void break_down_buddy_pages(struct zone *zone, struct page *page, if (set_page_guard(zone, current_buddy, high)) continue; - add_to_free_list(current_buddy, zone, high, migratetype, false); + if (page_luf_key(current_buddy)) + tail = true; + + add_to_free_list(current_buddy, zone, high, migratetype, tail); set_buddy_order(current_buddy, high); } } diff --git a/mm/page_reporting.c b/mm/page_reporting.c 
index 03a7f5f6dc073..e152b22fbba8a 100644 --- a/mm/page_reporting.c +++ b/mm/page_reporting.c @@ -159,15 +159,17 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone, struct page *page, *next; long budget; int err = 0; + bool consider_pend = false; + bool can_shootdown; /* * Perform early check, if free area is empty there is * nothing to process so we can skip this free_list. */ - if (list_empty(list)) + if (free_area_empty(area, mt)) return err; - luf_takeoff_start(); + can_shootdown = luf_takeoff_start(); spin_lock_irq(&zone->lock); /* @@ -185,14 +187,14 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone, * should always be a power of 2. */ budget = DIV_ROUND_UP(area->nr_free, PAGE_REPORTING_CAPACITY * 16); - +retry: /* loop through free list adding unreported pages to sg list */ list_for_each_entry_safe(page, next, list, lru) { /* We are going to skip over the reported pages. */ if (PageReported(page)) continue; - if (!luf_takeoff_check(page)) { + if (unlikely(consider_pend && !luf_takeoff_check(page))) { VM_WARN_ON(1); continue; } @@ -205,14 +207,14 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone, if (budget < 0) { atomic_set(&prdev->state, PAGE_REPORTING_REQUESTED); next = page; - break; + goto done; } /* Attempt to pull page from list and place in scatterlist */ if (*offset) { if (!__isolate_free_page(page, order, false)) { next = page; - break; + goto done; } /* Add page to scatter list */ @@ -263,9 +265,15 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone, /* exit on error */ if (err) - break; + goto done; } + if (!consider_pend && can_shootdown) { + consider_pend = true; + list = &area->pend_list[mt]; + goto retry; + } +done: /* Rotate any leftover pages to the head of the freelist */ if (!list_entry_is_head(next, list, lru) && !list_is_first(&next->lru, list)) list_rotate_to_front(&next->lru, list); diff --git a/mm/vmstat.c b/mm/vmstat.c index 4d016314a56c9..3fb9a5f6dd6da 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -1581,6 +1581,21 @@ static void pagetypeinfo_showfree_print(struct seq_file *m, break; } } + list_for_each(curr, &area->pend_list[mtype]) { + /* + * Cap the pend_list iteration because it might + * be really large and we are under a spinlock + * so a long time spent here could trigger a + * hard lockup detector. Anyway this is a + * debugging tool so knowing there is a handful + * of pages of this order should be more than + * sufficient. + */ + if (++freecount >= 100000) { + overflow = true; + break; + } + } seq_printf(m, "%s%6lu ", overflow ? 
">" : "", freecount); spin_unlock_irq(&zone->lock); cond_resched(); From patchwork Thu Feb 20 05:20:15 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Byungchul Park X-Patchwork-Id: 13983331 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 839F3C021AD for ; Thu, 20 Feb 2025 05:21:19 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 68AE92802A3; Thu, 20 Feb 2025 00:20:48 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 5E8742802A1; Thu, 20 Feb 2025 00:20:48 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4647E28029E; Thu, 20 Feb 2025 00:20:48 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 1078A2802A2 for ; Thu, 20 Feb 2025 00:20:48 -0500 (EST) Received: from smtpin25.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id A31ED4BBE7 for ; Thu, 20 Feb 2025 05:20:47 +0000 (UTC) X-FDA: 83139173334.25.F425614 Received: from invmail4.hynix.com (exvmail4.hynix.com [166.125.252.92]) by imf19.hostedemail.com (Postfix) with ESMTP id 9668F1A0009 for ; Thu, 20 Feb 2025 05:20:45 +0000 (UTC) Authentication-Results: imf19.hostedemail.com; dkim=none; dmarc=none; spf=pass (imf19.hostedemail.com: domain of byungchul@sk.com designates 166.125.252.92 as permitted sender) smtp.mailfrom=byungchul@sk.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1740028846; a=rsa-sha256; cv=none; b=2YB6Z/IvopK8uN4QR+Jypi8TUrtrEQG4CdjH6mpsQUropo/lG66pQdBoFNwMnqjs62mzxp xC2vFlTUmoGGN5rZ9YnrJ1Z3YxrsxqaizzYcp5zYRWGTjENvj52BJm1JOKnfQfPmigcw3A HrcpOthUviECbeL7CDoXN8XU+wVSyy8= ARC-Authentication-Results: i=1; imf19.hostedemail.com; dkim=none; dmarc=none; spf=pass (imf19.hostedemail.com: domain of byungchul@sk.com designates 166.125.252.92 as permitted sender) smtp.mailfrom=byungchul@sk.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1740028846; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references; bh=mXsBLddH25iJJSPexx1TGsWZenYlkNcZaNynnasNZoo=; b=PrSXtBQ0AhD1gBu1NPS4/jfYJFY72ea1q6+8L5YzYY7/hcAAulms4TZd6s+30k10T3Z0jG JLnkkk6OCrtS56nRBD3ZFvBHGCg8/6SC/vUjmifnCNV2OM/HVpRR0DX0PGMG46b5vFN30q 580pTL4kW3jIFjVQxTpBljr0siwDaeo= X-AuditID: a67dfc5b-3c9ff7000001d7ae-f4-67b6bba6ccf4 From: Byungchul Park To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, ying.huang@intel.com, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, rjgolo@gmail.com Subject: [RFC PATCH v12 14/26] mm/rmap: recognize read-only tlb entries during batched tlb flush Date: Thu, 20 Feb 2025 14:20:15 +0900 Message-Id: <20250220052027.58847-15-byungchul@sk.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20250220052027.58847-1-byungchul@sk.com> References: <20250220052027.58847-1-byungchul@sk.com> X-Brightmail-Tracker: 

Functionally, no change. This is a preparation for the luf mechanism, which requires recognizing read-only tlb entries and handling them in a different way. The newly introduced API in this patch, fold_ubc(), will be used by the luf mechanism.
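As a rough illustration of the batching split this prepares for, the following userspace sketch keeps two pending-flush batches, one for writable and one for read-only mappings, and folds the read-only one into the main batch right before an actual flush. All names here (toy_batch, add_pending, fold_toy_batch, flush_all) are made up; the real batches carry per-architecture cpu masks rather than a counter.

/*
 * Toy model of splitting pending tlb flushes into a writable and a
 * read-only batch, then folding them together before flushing.
 */
#include <stdbool.h>
#include <stdio.h>

struct toy_batch {
    unsigned long nr_pending;    /* stand-in for the arch cpumask */
    bool flush_required;
    bool writable;
};

/* Fold src into dst and reset src, like folding tlb_ubc_ro into tlb_ubc. */
static void fold_toy_batch(struct toy_batch *dst, struct toy_batch *src)
{
    dst->nr_pending += src->nr_pending;
    dst->flush_required |= src->flush_required;
    dst->writable |= src->writable;
    *src = (struct toy_batch){ 0 };
}

/* Route a pending flush to the read-only or the writable batch. */
static void add_pending(struct toy_batch *rw, struct toy_batch *ro,
                        bool pte_write, bool pte_dirty)
{
    struct toy_batch *b = pte_write ? rw : ro;

    b->nr_pending++;
    b->flush_required = true;
    if (pte_dirty)
        b->writable = true;
}

/* Flushing everything folds the read-only batch in first. */
static void flush_all(struct toy_batch *rw, struct toy_batch *ro)
{
    fold_toy_batch(rw, ro);
    if (!rw->flush_required)
        return;
    printf("flushing %lu pending entries\n", rw->nr_pending);
    *rw = (struct toy_batch){ 0 };
}

int main(void)
{
    struct toy_batch rw = { 0 }, ro = { 0 };

    add_pending(&rw, &ro, true, true);      /* writable, dirty pte */
    add_pending(&rw, &ro, false, false);    /* read-only, clean pte */
    flush_all(&rw, &ro);
    return 0;
}

Keeping the read-only entries apart is what later lets luf defer their flush, since the contents behind a read-only stale entry cannot change while the page sits in pcp or buddy.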
Signed-off-by: Byungchul Park --- include/linux/sched.h | 1 + mm/rmap.c | 16 ++++++++++++++-- 2 files changed, 15 insertions(+), 2 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index c4ff83e1d5953..a217d6011fdfe 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1381,6 +1381,7 @@ struct task_struct { struct tlbflush_unmap_batch tlb_ubc; struct tlbflush_unmap_batch tlb_ubc_takeoff; + struct tlbflush_unmap_batch tlb_ubc_ro; /* Cache last used pipe for splice(): */ struct pipe_inode_info *splice_pipe; diff --git a/mm/rmap.c b/mm/rmap.c index 1581b1a00f974..3ed6234dd777e 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -775,6 +775,7 @@ void fold_luf_batch(struct luf_batch *dst, struct luf_batch *src) void try_to_unmap_flush_takeoff(void) { struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; + struct tlbflush_unmap_batch *tlb_ubc_ro = ¤t->tlb_ubc_ro; struct tlbflush_unmap_batch *tlb_ubc_takeoff = ¤t->tlb_ubc_takeoff; if (!tlb_ubc_takeoff->flush_required) @@ -789,6 +790,9 @@ void try_to_unmap_flush_takeoff(void) if (arch_tlbbatch_done(&tlb_ubc->arch, &tlb_ubc_takeoff->arch)) reset_batch(tlb_ubc); + if (arch_tlbbatch_done(&tlb_ubc_ro->arch, &tlb_ubc_takeoff->arch)) + reset_batch(tlb_ubc_ro); + reset_batch(tlb_ubc_takeoff); } @@ -801,7 +805,9 @@ void try_to_unmap_flush_takeoff(void) void try_to_unmap_flush(void) { struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; + struct tlbflush_unmap_batch *tlb_ubc_ro = ¤t->tlb_ubc_ro; + fold_batch(tlb_ubc, tlb_ubc_ro, true); if (!tlb_ubc->flush_required) return; @@ -813,8 +819,9 @@ void try_to_unmap_flush(void) void try_to_unmap_flush_dirty(void) { struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; + struct tlbflush_unmap_batch *tlb_ubc_ro = ¤t->tlb_ubc_ro; - if (tlb_ubc->writable) + if (tlb_ubc->writable || tlb_ubc_ro->writable) try_to_unmap_flush(); } @@ -831,13 +838,18 @@ void try_to_unmap_flush_dirty(void) static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval, unsigned long uaddr) { - struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; + struct tlbflush_unmap_batch *tlb_ubc; int batch; bool writable = pte_dirty(pteval); if (!pte_accessible(mm, pteval)) return; + if (pte_write(pteval)) + tlb_ubc = ¤t->tlb_ubc; + else + tlb_ubc = ¤t->tlb_ubc_ro; + arch_tlbbatch_add_pending(&tlb_ubc->arch, mm, uaddr); tlb_ubc->flush_required = true; From patchwork Thu Feb 20 05:20:16 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Byungchul Park X-Patchwork-Id: 13983332 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id E3081C021AD for ; Thu, 20 Feb 2025 05:21:21 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 935EC2802A1; Thu, 20 Feb 2025 00:20:48 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 7BECF2802A5; Thu, 20 Feb 2025 00:20:48 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 525412802A2; Thu, 20 Feb 2025 00:20:48 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 05B492802A1 for ; Thu, 20 Feb 2025 00:20:47 -0500 (EST) Received: from smtpin01.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, ying.huang@intel.com, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, rjgolo@gmail.com
Subject: [RFC PATCH v12 15/26] fs, filemap: refactor to gather the scattered ->write_{begin,end}() calls
Date: Thu, 20 Feb 2025 14:20:16 +0900
Message-Id: <20250220052027.58847-16-byungchul@sk.com>
In-Reply-To: <20250220052027.58847-1-byungchul@sk.com>
References: <20250220052027.58847-1-byungchul@sk.com>

Functionally, no change. This is a preparation for the luf mechanism, which needs to hook page cache updates: the page cache might hold pages that have been mapped in some tasks, so any tlb flush still needed for them must be performed at that point.
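To see why the churn is worthwhile, here is a stand-alone model in plain user-space C with simplified types, not the kernel API: every caller goes through one wrapper around the ->write_begin()/->write_end() function pointers, so a later patch only has to hook the wrapper once instead of every call site.

#include <stdio.h>

struct address_space;

struct address_space_operations {
	int (*write_begin)(struct address_space *mapping, long pos, unsigned int len);
	int (*write_end)(struct address_space *mapping, long pos, unsigned int len);
};

struct address_space {
	const struct address_space_operations *a_ops;
};

/* Central wrappers, in the spirit of mapping_write_begin()/mapping_write_end(). */
static int mapping_write_begin(struct address_space *m, long pos, unsigned int len)
{
	/* a follow-up change can add a single hook here, e.g. a deferred tlb flush */
	return m->a_ops->write_begin(m, pos, len);
}

static int mapping_write_end(struct address_space *m, long pos, unsigned int len)
{
	return m->a_ops->write_end(m, pos, len);
}

/* A toy filesystem implementation of the two callbacks. */
static int toy_write_begin(struct address_space *m, long pos, unsigned int len)
{
	(void)m;
	printf("write_begin: pos=%ld len=%u\n", pos, len);
	return 0;
}

static int toy_write_end(struct address_space *m, long pos, unsigned int len)
{
	(void)m;
	printf("write_end: pos=%ld len=%u\n", pos, len);
	return 0;
}

int main(void)
{
	const struct address_space_operations ops = {
		.write_begin = toy_write_begin,
		.write_end = toy_write_end,
	};
	struct address_space mapping = { .a_ops = &ops };

	/* Callers never touch mapping.a_ops directly anymore. */
	if (!mapping_write_begin(&mapping, 0, 4096))
		mapping_write_end(&mapping, 0, 4096);
	return 0;
}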
Signed-off-by: Byungchul Park --- drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 11 ++++------- fs/affs/file.c | 4 ++-- fs/buffer.c | 14 ++++++-------- fs/exfat/file.c | 5 ++--- fs/ext4/verity.c | 5 ++--- fs/f2fs/super.c | 5 ++--- fs/f2fs/verity.c | 5 ++--- fs/namei.c | 5 ++--- include/linux/fs.h | 18 ++++++++++++++++++ mm/filemap.c | 5 ++--- 10 files changed, 42 insertions(+), 35 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c index fe69f2c8527d7..1d475d681d3de 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c @@ -422,7 +422,6 @@ shmem_pwrite(struct drm_i915_gem_object *obj, const struct drm_i915_gem_pwrite *arg) { struct address_space *mapping = obj->base.filp->f_mapping; - const struct address_space_operations *aops = mapping->a_ops; char __user *user_data = u64_to_user_ptr(arg->data_ptr); u64 remain; loff_t pos; @@ -481,7 +480,7 @@ shmem_pwrite(struct drm_i915_gem_object *obj, if (err) return err; - err = aops->write_begin(obj->base.filp, mapping, pos, len, + err = mapping_write_begin(obj->base.filp, mapping, pos, len, &folio, &data); if (err < 0) return err; @@ -492,7 +491,7 @@ shmem_pwrite(struct drm_i915_gem_object *obj, pagefault_enable(); kunmap_local(vaddr); - err = aops->write_end(obj->base.filp, mapping, pos, len, + err = mapping_write_end(obj->base.filp, mapping, pos, len, len - unwritten, folio, data); if (err < 0) return err; @@ -658,7 +657,6 @@ i915_gem_object_create_shmem_from_data(struct drm_i915_private *i915, { struct drm_i915_gem_object *obj; struct file *file; - const struct address_space_operations *aops; loff_t pos; int err; @@ -670,21 +668,20 @@ i915_gem_object_create_shmem_from_data(struct drm_i915_private *i915, GEM_BUG_ON(obj->write_domain != I915_GEM_DOMAIN_CPU); file = obj->base.filp; - aops = file->f_mapping->a_ops; pos = 0; do { unsigned int len = min_t(typeof(size), size, PAGE_SIZE); struct folio *folio; void *fsdata; - err = aops->write_begin(file, file->f_mapping, pos, len, + err = mapping_write_begin(file, file->f_mapping, pos, len, &folio, &fsdata); if (err < 0) goto fail; memcpy_to_folio(folio, offset_in_folio(folio, pos), data, len); - err = aops->write_end(file, file->f_mapping, pos, len, len, + err = mapping_write_end(file, file->f_mapping, pos, len, len, folio, fsdata); if (err < 0) goto fail; diff --git a/fs/affs/file.c b/fs/affs/file.c index a5a861dd52230..10e7f53828e93 100644 --- a/fs/affs/file.c +++ b/fs/affs/file.c @@ -885,9 +885,9 @@ affs_truncate(struct inode *inode) loff_t isize = inode->i_size; int res; - res = mapping->a_ops->write_begin(NULL, mapping, isize, 0, &folio, &fsdata); + res = mapping_write_begin(NULL, mapping, isize, 0, &folio, &fsdata); if (!res) - res = mapping->a_ops->write_end(NULL, mapping, isize, 0, 0, folio, fsdata); + res = mapping_write_end(NULL, mapping, isize, 0, 0, folio, fsdata); else inode->i_size = AFFS_I(inode)->mmu_private; mark_inode_dirty(inode); diff --git a/fs/buffer.c b/fs/buffer.c index 88e765b0699fe..7cb0295500937 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -2456,7 +2456,6 @@ EXPORT_SYMBOL(block_read_full_folio); int generic_cont_expand_simple(struct inode *inode, loff_t size) { struct address_space *mapping = inode->i_mapping; - const struct address_space_operations *aops = mapping->a_ops; struct folio *folio; void *fsdata = NULL; int err; @@ -2465,11 +2464,11 @@ int generic_cont_expand_simple(struct inode *inode, loff_t size) if (err) goto out; - err = aops->write_begin(NULL, mapping, 
size, 0, &folio, &fsdata); + err = mapping_write_begin(NULL, mapping, size, 0, &folio, &fsdata); if (err) goto out; - err = aops->write_end(NULL, mapping, size, 0, 0, folio, fsdata); + err = mapping_write_end(NULL, mapping, size, 0, 0, folio, fsdata); BUG_ON(err > 0); out: @@ -2481,7 +2480,6 @@ static int cont_expand_zero(struct file *file, struct address_space *mapping, loff_t pos, loff_t *bytes) { struct inode *inode = mapping->host; - const struct address_space_operations *aops = mapping->a_ops; unsigned int blocksize = i_blocksize(inode); struct folio *folio; void *fsdata = NULL; @@ -2501,12 +2499,12 @@ static int cont_expand_zero(struct file *file, struct address_space *mapping, } len = PAGE_SIZE - zerofrom; - err = aops->write_begin(file, mapping, curpos, len, + err = mapping_write_begin(file, mapping, curpos, len, &folio, &fsdata); if (err) goto out; folio_zero_range(folio, offset_in_folio(folio, curpos), len); - err = aops->write_end(file, mapping, curpos, len, len, + err = mapping_write_end(file, mapping, curpos, len, len, folio, fsdata); if (err < 0) goto out; @@ -2534,12 +2532,12 @@ static int cont_expand_zero(struct file *file, struct address_space *mapping, } len = offset - zerofrom; - err = aops->write_begin(file, mapping, curpos, len, + err = mapping_write_begin(file, mapping, curpos, len, &folio, &fsdata); if (err) goto out; folio_zero_range(folio, offset_in_folio(folio, curpos), len); - err = aops->write_end(file, mapping, curpos, len, len, + err = mapping_write_end(file, mapping, curpos, len, len, folio, fsdata); if (err < 0) goto out; diff --git a/fs/exfat/file.c b/fs/exfat/file.c index a25d7eb789f4c..242563b9dec95 100644 --- a/fs/exfat/file.c +++ b/fs/exfat/file.c @@ -539,7 +539,6 @@ static int exfat_extend_valid_size(struct file *file, loff_t new_valid_size) struct inode *inode = file_inode(file); struct exfat_inode_info *ei = EXFAT_I(inode); struct address_space *mapping = inode->i_mapping; - const struct address_space_operations *ops = mapping->a_ops; pos = ei->valid_size; while (pos < new_valid_size) { @@ -550,11 +549,11 @@ static int exfat_extend_valid_size(struct file *file, loff_t new_valid_size) if (pos + len > new_valid_size) len = new_valid_size - pos; - err = ops->write_begin(file, mapping, pos, len, &folio, NULL); + err = mapping_write_begin(file, mapping, pos, len, &folio, NULL); if (err) goto out; - err = ops->write_end(file, mapping, pos, len, len, folio, NULL); + err = mapping_write_end(file, mapping, pos, len, len, folio, NULL); if (err < 0) goto out; pos += len; diff --git a/fs/ext4/verity.c b/fs/ext4/verity.c index d9203228ce979..64fa43f80c73e 100644 --- a/fs/ext4/verity.c +++ b/fs/ext4/verity.c @@ -68,7 +68,6 @@ static int pagecache_write(struct inode *inode, const void *buf, size_t count, loff_t pos) { struct address_space *mapping = inode->i_mapping; - const struct address_space_operations *aops = mapping->a_ops; if (pos + count > inode->i_sb->s_maxbytes) return -EFBIG; @@ -80,13 +79,13 @@ static int pagecache_write(struct inode *inode, const void *buf, size_t count, void *fsdata = NULL; int res; - res = aops->write_begin(NULL, mapping, pos, n, &folio, &fsdata); + res = mapping_write_begin(NULL, mapping, pos, n, &folio, &fsdata); if (res) return res; memcpy_to_folio(folio, offset_in_folio(folio, pos), buf, n); - res = aops->write_end(NULL, mapping, pos, n, n, folio, fsdata); + res = mapping_write_end(NULL, mapping, pos, n, n, folio, fsdata); if (res < 0) return res; if (res != n) diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index 
87ab5696bd482..f8d5ee466807c 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -2678,7 +2678,6 @@ static ssize_t f2fs_quota_write(struct super_block *sb, int type, { struct inode *inode = sb_dqopt(sb)->files[type]; struct address_space *mapping = inode->i_mapping; - const struct address_space_operations *a_ops = mapping->a_ops; int offset = off & (sb->s_blocksize - 1); size_t towrite = len; struct folio *folio; @@ -2690,7 +2689,7 @@ static ssize_t f2fs_quota_write(struct super_block *sb, int type, tocopy = min_t(unsigned long, sb->s_blocksize - offset, towrite); retry: - err = a_ops->write_begin(NULL, mapping, off, tocopy, + err = mapping_write_begin(NULL, mapping, off, tocopy, &folio, &fsdata); if (unlikely(err)) { if (err == -ENOMEM) { @@ -2703,7 +2702,7 @@ static ssize_t f2fs_quota_write(struct super_block *sb, int type, memcpy_to_folio(folio, offset_in_folio(folio, off), data, tocopy); - a_ops->write_end(NULL, mapping, off, tocopy, tocopy, + mapping_write_end(NULL, mapping, off, tocopy, tocopy, folio, fsdata); offset = 0; towrite -= tocopy; diff --git a/fs/f2fs/verity.c b/fs/f2fs/verity.c index 2287f238ae09e..b232589546d39 100644 --- a/fs/f2fs/verity.c +++ b/fs/f2fs/verity.c @@ -72,7 +72,6 @@ static int pagecache_write(struct inode *inode, const void *buf, size_t count, loff_t pos) { struct address_space *mapping = inode->i_mapping; - const struct address_space_operations *aops = mapping->a_ops; if (pos + count > F2FS_BLK_TO_BYTES(max_file_blocks(inode))) return -EFBIG; @@ -84,13 +83,13 @@ static int pagecache_write(struct inode *inode, const void *buf, size_t count, void *fsdata = NULL; int res; - res = aops->write_begin(NULL, mapping, pos, n, &folio, &fsdata); + res = mapping_write_begin(NULL, mapping, pos, n, &folio, &fsdata); if (res) return res; memcpy_to_folio(folio, offset_in_folio(folio, pos), buf, n); - res = aops->write_end(NULL, mapping, pos, n, n, folio, fsdata); + res = mapping_write_end(NULL, mapping, pos, n, n, folio, fsdata); if (res < 0) return res; if (res != n) diff --git a/fs/namei.c b/fs/namei.c index 4a4a22a08ac20..14a701ecf1a7e 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -5349,7 +5349,6 @@ EXPORT_SYMBOL(page_readlink); int page_symlink(struct inode *inode, const char *symname, int len) { struct address_space *mapping = inode->i_mapping; - const struct address_space_operations *aops = mapping->a_ops; bool nofs = !mapping_gfp_constraint(mapping, __GFP_FS); struct folio *folio; void *fsdata = NULL; @@ -5359,7 +5358,7 @@ int page_symlink(struct inode *inode, const char *symname, int len) retry: if (nofs) flags = memalloc_nofs_save(); - err = aops->write_begin(NULL, mapping, 0, len-1, &folio, &fsdata); + err = mapping_write_begin(NULL, mapping, 0, len-1, &folio, &fsdata); if (nofs) memalloc_nofs_restore(flags); if (err) @@ -5367,7 +5366,7 @@ int page_symlink(struct inode *inode, const char *symname, int len) memcpy(folio_address(folio), symname, len - 1); - err = aops->write_end(NULL, mapping, 0, len - 1, len - 1, + err = mapping_write_end(NULL, mapping, 0, len - 1, len - 1, folio, fsdata); if (err < 0) goto fail; diff --git a/include/linux/fs.h b/include/linux/fs.h index 3559446279c15..bfd8aaeb78bb8 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -494,6 +494,24 @@ struct address_space { #define PAGECACHE_TAG_WRITEBACK XA_MARK_1 #define PAGECACHE_TAG_TOWRITE XA_MARK_2 +static inline int mapping_write_begin(struct file *file, + struct address_space *mapping, + loff_t pos, unsigned len, + struct folio **foliop, void **fsdata) +{ + return 
mapping->a_ops->write_begin(file, mapping, pos, len, foliop, + fsdata); +} + +static inline int mapping_write_end(struct file *file, + struct address_space *mapping, + loff_t pos, unsigned len, unsigned copied, + struct folio *folio, void *fsdata) +{ + return mapping->a_ops->write_end(file, mapping, pos, len, copied, + folio, fsdata); +} + /* * Returns true if any of the pages in the mapping are marked with the tag. */ diff --git a/mm/filemap.c b/mm/filemap.c index e582a1545d2ae..a4930449fc705 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -4016,7 +4016,6 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i) struct file *file = iocb->ki_filp; loff_t pos = iocb->ki_pos; struct address_space *mapping = file->f_mapping; - const struct address_space_operations *a_ops = mapping->a_ops; size_t chunk = mapping_max_folio_size(mapping); long status = 0; ssize_t written = 0; @@ -4050,7 +4049,7 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i) break; } - status = a_ops->write_begin(file, mapping, pos, bytes, + status = mapping_write_begin(file, mapping, pos, bytes, &folio, &fsdata); if (unlikely(status < 0)) break; @@ -4065,7 +4064,7 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i) copied = copy_folio_from_iter_atomic(folio, offset, bytes, i); flush_dcache_folio(folio); - status = a_ops->write_end(file, mapping, pos, bytes, copied, + status = mapping_write_end(file, mapping, pos, bytes, copied, folio, fsdata); if (unlikely(status != copied)) { iov_iter_revert(i, copied - max(status, 0L)); From patchwork Thu Feb 20 05:20:17 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Byungchul Park X-Patchwork-Id: 13983334 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id C0257C021AD for ; Thu, 20 Feb 2025 05:21:26 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 0C3052802A2; Thu, 20 Feb 2025 00:20:49 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id E9DDF2802A6; Thu, 20 Feb 2025 00:20:48 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A7FD52802A4; Thu, 20 Feb 2025 00:20:48 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 60D0128029E for ; Thu, 20 Feb 2025 00:20:48 -0500 (EST) Received: from smtpin16.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id 15D66B2A00 for ; Thu, 20 Feb 2025 05:20:48 +0000 (UTC) X-FDA: 83139173376.16.EA0C17A Received: from invmail4.hynix.com (exvmail4.hynix.com [166.125.252.92]) by imf01.hostedemail.com (Postfix) with ESMTP id BA5D140005 for ; Thu, 20 Feb 2025 05:20:45 +0000 (UTC) Authentication-Results: imf01.hostedemail.com; dkim=none; dmarc=none; spf=pass (imf01.hostedemail.com: domain of byungchul@sk.com designates 166.125.252.92 as permitted sender) smtp.mailfrom=byungchul@sk.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1740028846; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references; 
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, ying.huang@intel.com, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, rjgolo@gmail.com
Subject: [RFC PATCH v12 16/26] mm: implement LUF(Lazy Unmap Flush) deferring tlb flush when folios get unmapped
Date: Thu, 20 Feb 2025 14:20:17 +0900
Message-Id: <20250220052027.58847-17-byungchul@sk.com>
In-Reply-To: <20250220052027.58847-1-byungchul@sk.com>
References: <20250220052027.58847-1-byungchul@sk.com>

A new mechanism, LUF (Lazy Unmap Flush), defers tlb flush until folios that have been unmapped and freed eventually get allocated again. It's safe for folios that had been mapped read-only and were unmapped, as long as the contents of the folios don't change while staying in pcp or buddy, so we can still read the data through the stale tlb entries.

tlb flush can be deferred when folios get unmapped as long as the needed tlb flush is guaranteed to be performed before the folios actually become used again, and of course only if none of the corresponding ptes have write permission. Otherwise, the system will get messed up. To achieve that, for folios that map only to non-writable tlb entries, skip the tlb flush during unmapping and perform it just before the folios actually become used, that is, when they leave buddy or pcp. However, we should cancel the flush pending by LUF and perform the deferred tlb flush right away when:

1. a writable pte is newly set through the fault handler
2. a file is updated
3. kasan needs poisoning on free
4. the kernel wants to init pages on free

A stand-alone sketch of this decision flow is given after the test description below.

No matter what type of workload is used for performance evaluation, the result should be positive thanks to the unconditional reduction of tlb flushes, tlb misses and interrupts. For the test, I picked one of the most popular and heavy workloads, llama.cpp, an LLM (Large Language Model) inference engine. The result depends on memory latency and how often reclaim runs, which determine the tlb miss overhead and how many times unmapping happens. On my system, the result shows:

1. tlb shootdown interrupts are reduced by about 97%.
2. The test program runtime is reduced by about 4.5%.

The test environment and the test set are as follows:

Machine: bare metal, x86_64, Intel(R) Xeon(R) Gold 6430
CPU: 1 socket 64 core with hyper thread on
Numa: 2 nodes (64 CPUs DRAM 42GB, no CPUs CXL expander 98GB)
Config: swap off, numa balancing tiering on, demotion enabled

llama.cpp/main -m $(70G_model1) -p "who are you?" -s 1 -t 15 -n 20 &
llama.cpp/main -m $(70G_model2) -p "who are you?" -s 1 -t 15 -n 20 &
llama.cpp/main -m $(70G_model3) -p "who are you?" -s 1 -t 15 -n 20 &
wait

where, -t: nr of threads, -s: seed used to make the runtime stable, -n: nr of tokens that determines the runtime, -p: prompt to ask, -m: LLM model to use.
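As promised above, here is a small stand-alone sketch in plain C with invented names; the real checks in this patch live in mm/internal.h and mm/rmap.c. It only models the decision flow: a folio is LUF-applicable when every mapping found during the rmap walk is read-only, any of the listed events cancels the pending state by flushing immediately, and allocation flushes before the folio is reused.

#include <stdbool.h>
#include <stdio.h>

enum luf_cancel_event {
	WRITABLE_PTE_FAULT,	/* a writable pte is newly set */
	FILE_UPDATED,		/* page cache content changes */
	KASAN_POISON_ON_FREE,
	INIT_ON_FREE,
};

struct folio_state {
	bool pending_flush;	/* tlb flush deferred by luf */
};

/* Unmap path: defer the flush only if no mapping was writable. */
static void unmap_folio(struct folio_state *f, bool any_writable_mapping)
{
	if (any_writable_mapping) {
		printf("flush now (writable mapping found)\n");
		f->pending_flush = false;
	} else {
		printf("defer flush (all mappings read-only)\n");
		f->pending_flush = true;
	}
}

/* Any of the cancel events must trigger the deferred flush right away. */
static void cancel_luf(struct folio_state *f, enum luf_cancel_event ev)
{
	if (!f->pending_flush)
		return;
	printf("cancel pending luf, flush now (event %d)\n", ev);
	f->pending_flush = false;
}

/* Allocation path: the flush must happen before the folio is handed out. */
static void allocate_folio(struct folio_state *f)
{
	if (f->pending_flush)
		printf("flush before reusing the folio\n");
	f->pending_flush = false;
}

int main(void)
{
	struct folio_state f = { false };

	unmap_folio(&f, false);
	cancel_luf(&f, WRITABLE_PTE_FAULT);
	allocate_folio(&f);
	return 0;
}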
Run the test set 5 times successively with caches dropped every run via 'echo 3 > /proc/sys/vm/drop_caches'. Each inference prints its runtime at the end of each. The results are like: 1. Runtime from the output of llama.cpp BEFORE ------ llama_print_timings: total time = 883450.54 ms / 24 tokens llama_print_timings: total time = 861665.91 ms / 24 tokens llama_print_timings: total time = 898079.02 ms / 24 tokens llama_print_timings: total time = 879897.69 ms / 24 tokens llama_print_timings: total time = 892360.75 ms / 24 tokens llama_print_timings: total time = 884587.85 ms / 24 tokens llama_print_timings: total time = 861023.19 ms / 24 tokens llama_print_timings: total time = 900022.18 ms / 24 tokens llama_print_timings: total time = 878771.88 ms / 24 tokens llama_print_timings: total time = 889027.98 ms / 24 tokens llama_print_timings: total time = 880783.90 ms / 24 tokens llama_print_timings: total time = 856475.29 ms / 24 tokens llama_print_timings: total time = 896842.21 ms / 24 tokens llama_print_timings: total time = 878883.53 ms / 24 tokens llama_print_timings: total time = 890122.10 ms / 24 tokens AFTER ----- llama_print_timings: total time = 871060.86 ms / 24 tokens llama_print_timings: total time = 825609.53 ms / 24 tokens llama_print_timings: total time = 836854.81 ms / 24 tokens llama_print_timings: total time = 843147.99 ms / 24 tokens llama_print_timings: total time = 831426.65 ms / 24 tokens llama_print_timings: total time = 873939.23 ms / 24 tokens llama_print_timings: total time = 826127.69 ms / 24 tokens llama_print_timings: total time = 835489.26 ms / 24 tokens llama_print_timings: total time = 842589.62 ms / 24 tokens llama_print_timings: total time = 833700.66 ms / 24 tokens llama_print_timings: total time = 875996.19 ms / 24 tokens llama_print_timings: total time = 826401.73 ms / 24 tokens llama_print_timings: total time = 839341.28 ms / 24 tokens llama_print_timings: total time = 841075.10 ms / 24 tokens llama_print_timings: total time = 835136.41 ms / 24 tokens 2. 
tlb shootdowns from 'cat /proc/interrupts' BEFORE ------ TLB: 80911532 93691786 100296251 111062810 109769109 109862429 108968588 119175230 115779676 118377498 119325266 120300143 124514185 116697222 121068466 118031913 122660681 117494403 121819907 116960596 120936335 117217061 118630217 122322724 119595577 111693298 119232201 120030377 115334687 113179982 118808254 116353592 140987367 137095516 131724276 139742240 136501150 130428761 127585535 132483981 133430250 133756207 131786710 126365824 129812539 133850040 131742690 125142213 128572830 132234350 131945922 128417707 133355434 129972846 126331823 134050849 133991626 121129038 124637283 132830916 126875507 122322440 125776487 124340278 TLB shootdowns AFTER ----- TLB: 2121206 2615108 2983494 2911950 3055086 3092672 3204894 3346082 3286744 3307310 3357296 3315940 3428034 3112596 3143325 3185551 3186493 3322314 3330523 3339663 3156064 3272070 3296309 3198962 3332662 3315870 3234467 3353240 3281234 3300666 3345452 3173097 4009196 3932215 3898735 3726531 3717982 3671726 3728788 3724613 3799147 3691764 3620630 3684655 3666688 3393974 3448651 3487593 3446357 3618418 3671920 3712949 3575264 3715385 3641513 3630897 3691047 3630690 3504933 3662647 3629926 3443044 3832970 3548813 TLB shootdowns Signed-off-by: Byungchul Park --- include/asm-generic/tlb.h | 5 ++ include/linux/fs.h | 12 +++- include/linux/mm_types.h | 6 ++ include/linux/sched.h | 9 +++ kernel/sched/core.c | 1 + mm/internal.h | 94 ++++++++++++++++++++++++- mm/memory.c | 15 ++++ mm/pgtable-generic.c | 2 + mm/rmap.c | 141 +++++++++++++++++++++++++++++++++++--- mm/truncate.c | 55 +++++++++++++-- mm/vmscan.c | 12 +++- 11 files changed, 333 insertions(+), 19 deletions(-) diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h index 709830274b756..4a99351be111e 100644 --- a/include/asm-generic/tlb.h +++ b/include/asm-generic/tlb.h @@ -549,6 +549,11 @@ static inline void tlb_start_vma(struct mmu_gather *tlb, struct vm_area_struct * static inline void tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vma) { + /* + * Don't leave stale tlb entries for this vma. + */ + luf_flush(0); + if (tlb->fullmm) return; diff --git a/include/linux/fs.h b/include/linux/fs.h index bfd8aaeb78bb8..ec88270221bfe 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -499,8 +499,18 @@ static inline int mapping_write_begin(struct file *file, loff_t pos, unsigned len, struct folio **foliop, void **fsdata) { - return mapping->a_ops->write_begin(file, mapping, pos, len, foliop, + int ret; + + ret = mapping->a_ops->write_begin(file, mapping, pos, len, foliop, fsdata); + + /* + * Ensure to clean stale tlb entries for this mapping. 
+ */ + if (!ret) + luf_flush(0); + + return ret; } static inline int mapping_write_end(struct file *file, diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 39a6b5124b01f..b3eb5a4e45efb 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -1270,6 +1270,12 @@ extern void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm); extern void tlb_gather_mmu_fullmm(struct mmu_gather *tlb, struct mm_struct *mm); extern void tlb_finish_mmu(struct mmu_gather *tlb); +#if defined(CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH) +void luf_flush(unsigned short luf_key); +#else +static inline void luf_flush(unsigned short luf_key) {} +#endif + struct vm_fault; /** diff --git a/include/linux/sched.h b/include/linux/sched.h index a217d6011fdfe..94321d51b91e8 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1382,6 +1382,15 @@ struct task_struct { struct tlbflush_unmap_batch tlb_ubc; struct tlbflush_unmap_batch tlb_ubc_takeoff; struct tlbflush_unmap_batch tlb_ubc_ro; + struct tlbflush_unmap_batch tlb_ubc_luf; + +#if defined(CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH) + /* + * whether all the mappings of a folio during unmap are read-only + * so that luf can work on the folio + */ + bool can_luf; +#endif /* Cache last used pipe for splice(): */ struct pipe_inode_info *splice_pipe; diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 719e0ed1e9761..aea08d8a9e258 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -5225,6 +5225,7 @@ static struct rq *finish_task_switch(struct task_struct *prev) if (mm) { membarrier_mm_sync_core_before_usermode(mm); mmdrop_lazy_tlb_sched(mm); + luf_flush(0); } if (unlikely(prev_state == TASK_DEAD)) { diff --git a/mm/internal.h b/mm/internal.h index 0dc374553f9b5..fe4a1c174895f 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -1604,13 +1604,105 @@ static inline bool non_luf_pages_ok(struct zone *zone) return nr_free - nr_luf_pages > min_wm; } -#else + +unsigned short fold_unmap_luf(void); + +/* + * Reset the indicator indicating there are no writable mappings at the + * beginning of every rmap traverse for unmap. luf can work only when + * all the mappings are read-only. + */ +static inline void can_luf_init(struct folio *f) +{ + if (IS_ENABLED(CONFIG_DEBUG_PAGEALLOC)) + current->can_luf = false; + /* + * Pages might get updated inside buddy. + */ + else if (want_init_on_free()) + current->can_luf = false; + /* + * Pages might get updated inside buddy. + */ + else if (!should_skip_kasan_poison(folio_page(f, 0))) + current->can_luf = false; + /* + * XXX: Remove the constraint once luf handles zone device folio. + */ + else if (unlikely(folio_is_zone_device(f))) + current->can_luf = false; + /* + * XXX: Remove the constraint once luf handles hugetlb folio. + */ + else if (unlikely(folio_test_hugetlb(f))) + current->can_luf = false; + /* + * XXX: Remove the constraint once luf handles large folio. + */ + else if (unlikely(folio_test_large(f))) + current->can_luf = false; + /* + * Can track write of anon folios through fault handler. + */ + else if (folio_test_anon(f)) + current->can_luf = true; + /* + * Can track write of file folios through page cache or truncation. + */ + else if (folio_mapping(f)) + current->can_luf = true; + /* + * For niehter anon nor file folios, do not apply luf. + */ + else + current->can_luf = false; +} + +/* + * Mark the folio is not applicable to luf once it found a writble or + * dirty pte during rmap traverse for unmap. 
+ */ +static inline void can_luf_fail(void) +{ + current->can_luf = false; +} + +/* + * Check if all the mappings are read-only. + */ +static inline bool can_luf_test(void) +{ + return current->can_luf; +} + +static inline bool can_luf_vma(struct vm_area_struct *vma) +{ + /* + * Shared region requires a medium like file to keep all the + * associated mm_struct. luf makes use of strcut address_space + * for that purpose. + */ + if (vma->vm_flags & VM_SHARED) + return !!vma->vm_file; + + /* + * Private region can be handled through its mm_struct. + */ + return true; +} +#else /* CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH */ static inline bool luf_takeoff_start(void) { return false; } static inline void luf_takeoff_end(void) {} static inline bool luf_takeoff_no_shootdown(void) { return true; } static inline bool luf_takeoff_check(struct page *page) { return true; } static inline bool luf_takeoff_check_and_fold(struct page *page) { return true; } static inline bool non_luf_pages_ok(struct zone *zone) { return true; } +static inline unsigned short fold_unmap_luf(void) { return 0; } + +static inline void can_luf_init(struct folio *f) {} +static inline void can_luf_fail(void) {} +static inline bool can_luf_test(void) { return false; } +static inline bool can_luf_vma(struct vm_area_struct *vma) { return false; } #endif /* pagewalk.c */ diff --git a/mm/memory.c b/mm/memory.c index 209885a4134f7..0e85c49bc5028 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -6081,6 +6081,7 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address, struct mm_struct *mm = vma->vm_mm; vm_fault_t ret; bool is_droppable; + bool flush = false; __set_current_state(TASK_RUNNING); @@ -6106,6 +6107,14 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address, lru_gen_enter_fault(vma); + /* + * Any potential cases that make pte writable even forcely + * should be considered. + */ + if (vma->vm_flags & (VM_WRITE | VM_MAYWRITE) || + flags & FAULT_FLAG_WRITE) + flush = true; + if (unlikely(is_vm_hugetlb_page(vma))) ret = hugetlb_fault(vma->vm_mm, vma, address, flags); else @@ -6137,6 +6146,12 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address, out: mm_account_fault(mm, regs, address, flags, ret); + /* + * Ensure to clean stale tlb entries for this vma. + */ + if (flush) + luf_flush(0); + return ret; } EXPORT_SYMBOL_GPL(handle_mm_fault); diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c index 5297dcc38c37a..215d8d93560fd 100644 --- a/mm/pgtable-generic.c +++ b/mm/pgtable-generic.c @@ -99,6 +99,8 @@ pte_t ptep_clear_flush(struct vm_area_struct *vma, unsigned long address, pte = ptep_get_and_clear(mm, address, ptep); if (pte_accessible(mm, pte)) flush_tlb_page(vma, address); + else + luf_flush(0); return pte; } #endif diff --git a/mm/rmap.c b/mm/rmap.c index 3ed6234dd777e..0aaf02b1b34c3 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -646,7 +646,7 @@ static atomic_long_t luf_ugen = ATOMIC_LONG_INIT(LUF_UGEN_INIT); /* * Don't return invalid luf_ugen, zero. */ -static unsigned long __maybe_unused new_luf_ugen(void) +static unsigned long new_luf_ugen(void) { unsigned long ugen = atomic_long_inc_return(&luf_ugen); @@ -723,7 +723,7 @@ static atomic_t luf_kgen = ATOMIC_INIT(1); /* * Don't return invalid luf_key, zero. 
*/ -static unsigned short __maybe_unused new_luf_key(void) +static unsigned short new_luf_key(void) { unsigned short luf_key = atomic_inc_return(&luf_kgen); @@ -776,6 +776,7 @@ void try_to_unmap_flush_takeoff(void) { struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; struct tlbflush_unmap_batch *tlb_ubc_ro = ¤t->tlb_ubc_ro; + struct tlbflush_unmap_batch *tlb_ubc_luf = ¤t->tlb_ubc_luf; struct tlbflush_unmap_batch *tlb_ubc_takeoff = ¤t->tlb_ubc_takeoff; if (!tlb_ubc_takeoff->flush_required) @@ -793,9 +794,72 @@ void try_to_unmap_flush_takeoff(void) if (arch_tlbbatch_done(&tlb_ubc_ro->arch, &tlb_ubc_takeoff->arch)) reset_batch(tlb_ubc_ro); + if (arch_tlbbatch_done(&tlb_ubc_luf->arch, &tlb_ubc_takeoff->arch)) + reset_batch(tlb_ubc_luf); + reset_batch(tlb_ubc_takeoff); } +/* + * Should be called just before try_to_unmap_flush() to optimize the tlb + * shootdown using arch_tlbbatch_done(). + */ +unsigned short fold_unmap_luf(void) +{ + struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; + struct tlbflush_unmap_batch *tlb_ubc_luf = ¤t->tlb_ubc_luf; + struct luf_batch *lb; + unsigned long new_ugen; + unsigned short new_key; + unsigned long flags; + + if (!tlb_ubc_luf->flush_required) + return 0; + + /* + * fold_unmap_luf() is always followed by try_to_unmap_flush(). + */ + if (arch_tlbbatch_done(&tlb_ubc_luf->arch, &tlb_ubc->arch)) { + tlb_ubc_luf->flush_required = false; + tlb_ubc_luf->writable = false; + } + + /* + * Check again after shrinking. + */ + if (!tlb_ubc_luf->flush_required) + return 0; + + new_ugen = new_luf_ugen(); + new_key = new_luf_key(); + + /* + * Update the next entry of luf_batch table, that is the oldest + * entry among the candidate, hopefully tlb flushes have been + * done for all of the CPUs. + */ + lb = &luf_batch[new_key]; + write_lock_irqsave(&lb->lock, flags); + __fold_luf_batch(lb, tlb_ubc_luf, new_ugen); + write_unlock_irqrestore(&lb->lock, flags); + + reset_batch(tlb_ubc_luf); + return new_key; +} + +void luf_flush(unsigned short luf_key) +{ + struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; + struct luf_batch *lb = &luf_batch[luf_key]; + unsigned long flags; + + read_lock_irqsave(&lb->lock, flags); + fold_batch(tlb_ubc, &lb->batch, false); + read_unlock_irqrestore(&lb->lock, flags); + try_to_unmap_flush(); +} +EXPORT_SYMBOL(luf_flush); + /* * Flush TLB entries for recently unmapped pages from remote CPUs. 
It is * important if a PTE was dirty when it was unmapped that it's flushed @@ -806,8 +870,10 @@ void try_to_unmap_flush(void) { struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; struct tlbflush_unmap_batch *tlb_ubc_ro = ¤t->tlb_ubc_ro; + struct tlbflush_unmap_batch *tlb_ubc_luf = ¤t->tlb_ubc_luf; fold_batch(tlb_ubc, tlb_ubc_ro, true); + fold_batch(tlb_ubc, tlb_ubc_luf, true); if (!tlb_ubc->flush_required) return; @@ -820,8 +886,9 @@ void try_to_unmap_flush_dirty(void) { struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; struct tlbflush_unmap_batch *tlb_ubc_ro = ¤t->tlb_ubc_ro; + struct tlbflush_unmap_batch *tlb_ubc_luf = ¤t->tlb_ubc_luf; - if (tlb_ubc->writable || tlb_ubc_ro->writable) + if (tlb_ubc->writable || tlb_ubc_ro->writable || tlb_ubc_luf->writable) try_to_unmap_flush(); } @@ -836,7 +903,8 @@ void try_to_unmap_flush_dirty(void) (TLB_FLUSH_BATCH_PENDING_MASK / 2) static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval, - unsigned long uaddr) + unsigned long uaddr, + struct vm_area_struct *vma) { struct tlbflush_unmap_batch *tlb_ubc; int batch; @@ -845,7 +913,16 @@ static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval, if (!pte_accessible(mm, pteval)) return; - if (pte_write(pteval)) + if (can_luf_test()) { + /* + * luf cannot work with the folio once it found a + * writable or dirty mapping on it. + */ + if (pte_write(pteval) || !can_luf_vma(vma)) + can_luf_fail(); + } + + if (!can_luf_test()) tlb_ubc = ¤t->tlb_ubc; else tlb_ubc = ¤t->tlb_ubc_ro; @@ -853,6 +930,21 @@ static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval, arch_tlbbatch_add_pending(&tlb_ubc->arch, mm, uaddr); tlb_ubc->flush_required = true; + if (can_luf_test()) { + struct luf_batch *lb; + unsigned long flags; + + /* + * Accumulate to the 0th entry right away so that + * luf_flush(0) can be uesed to properly perform pending + * TLB flush once this unmapping is observed. + */ + lb = &luf_batch[0]; + write_lock_irqsave(&lb->lock, flags); + __fold_luf_batch(lb, tlb_ubc, new_luf_ugen()); + write_unlock_irqrestore(&lb->lock, flags); + } + /* * Ensure compiler does not re-order the setting of tlb_flush_batched * before the PTE is cleared. @@ -907,6 +999,8 @@ static bool should_defer_flush(struct mm_struct *mm, enum ttu_flags flags) * This must be called under the PTL so that an access to tlb_flush_batched * that is potentially a "reclaim vs mprotect/munmap/etc" race will synchronise * via the PTL. + * + * LUF(Lazy Unmap Flush) also relies on this for mprotect/munmap/etc. */ void flush_tlb_batched_pending(struct mm_struct *mm) { @@ -916,6 +1010,7 @@ void flush_tlb_batched_pending(struct mm_struct *mm) if (pending != flushed) { arch_flush_tlb_batched_pending(mm); + /* * If the new TLB flushing is pending during flushing, leave * mm->tlb_flush_batched as is, to avoid losing flushing. @@ -926,7 +1021,8 @@ void flush_tlb_batched_pending(struct mm_struct *mm) } #else static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval, - unsigned long uaddr) + unsigned long uaddr, + struct vm_area_struct *vma) { } @@ -1292,6 +1388,11 @@ int folio_mkclean(struct folio *folio) rmap_walk(folio, &rwc); + /* + * Ensure to clean stale tlb entries for this mapping. 
+ */ + luf_flush(0); + return cleaned; } EXPORT_SYMBOL_GPL(folio_mkclean); @@ -1961,7 +2062,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, */ pteval = ptep_get_and_clear(mm, address, pvmw.pte); - set_tlb_ubc_flush_pending(mm, pteval, address); + set_tlb_ubc_flush_pending(mm, pteval, address, vma); } else { pteval = ptep_clear_flush(vma, address, pvmw.pte); } @@ -2132,6 +2233,8 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, mmu_notifier_invalidate_range_end(&range); + if (!ret) + can_luf_fail(); return ret; } @@ -2164,11 +2267,21 @@ void try_to_unmap(struct folio *folio, enum ttu_flags flags) .done = folio_not_mapped, .anon_lock = folio_lock_anon_vma_read, }; + struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; + struct tlbflush_unmap_batch *tlb_ubc_ro = ¤t->tlb_ubc_ro; + struct tlbflush_unmap_batch *tlb_ubc_luf = ¤t->tlb_ubc_luf; + + can_luf_init(folio); if (flags & TTU_RMAP_LOCKED) rmap_walk_locked(folio, &rwc); else rmap_walk(folio, &rwc); + + if (can_luf_test()) + fold_batch(tlb_ubc_luf, tlb_ubc_ro, true); + else + fold_batch(tlb_ubc, tlb_ubc_ro, true); } /* @@ -2338,7 +2451,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, */ pteval = ptep_get_and_clear(mm, address, pvmw.pte); - set_tlb_ubc_flush_pending(mm, pteval, address); + set_tlb_ubc_flush_pending(mm, pteval, address, vma); } else { pteval = ptep_clear_flush(vma, address, pvmw.pte); } @@ -2494,6 +2607,8 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, mmu_notifier_invalidate_range_end(&range); + if (!ret) + can_luf_fail(); return ret; } @@ -2513,6 +2628,9 @@ void try_to_migrate(struct folio *folio, enum ttu_flags flags) .done = folio_not_mapped, .anon_lock = folio_lock_anon_vma_read, }; + struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; + struct tlbflush_unmap_batch *tlb_ubc_ro = ¤t->tlb_ubc_ro; + struct tlbflush_unmap_batch *tlb_ubc_luf = ¤t->tlb_ubc_luf; /* * Migration always ignores mlock and only supports TTU_RMAP_LOCKED and @@ -2537,10 +2655,17 @@ void try_to_migrate(struct folio *folio, enum ttu_flags flags) if (!folio_test_ksm(folio) && folio_test_anon(folio)) rwc.invalid_vma = invalid_migration_vma; + can_luf_init(folio); + if (flags & TTU_RMAP_LOCKED) rmap_walk_locked(folio, &rwc); else rmap_walk(folio, &rwc); + + if (can_luf_test()) + fold_batch(tlb_ubc_luf, tlb_ubc_ro, true); + else + fold_batch(tlb_ubc, tlb_ubc_ro, true); } #ifdef CONFIG_DEVICE_PRIVATE diff --git a/mm/truncate.c b/mm/truncate.c index e5151703ba04a..14618c53f1910 100644 --- a/mm/truncate.c +++ b/mm/truncate.c @@ -124,6 +124,11 @@ void folio_invalidate(struct folio *folio, size_t offset, size_t length) if (aops->invalidate_folio) aops->invalidate_folio(folio, offset, length); + + /* + * Ensure to clean stale tlb entries for this mapping. + */ + luf_flush(0); } EXPORT_SYMBOL_GPL(folio_invalidate); @@ -161,6 +166,11 @@ int truncate_inode_folio(struct address_space *mapping, struct folio *folio) truncate_cleanup_folio(folio); filemap_remove_folio(folio); + + /* + * Ensure to clean stale tlb entries for this mapping. + */ + luf_flush(0); return 0; } @@ -206,6 +216,12 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end) if (folio_needs_release(folio)) folio_invalidate(folio, offset, length); + + /* + * Ensure to clean stale tlb entries for this mapping. 
+ */ + luf_flush(0); + if (!folio_test_large(folio)) return true; if (split_folio(folio) == 0) @@ -247,19 +263,28 @@ EXPORT_SYMBOL(generic_error_remove_folio); */ long mapping_evict_folio(struct address_space *mapping, struct folio *folio) { + long ret = 0; + /* The page may have been truncated before it was locked */ if (!mapping) - return 0; + goto out; if (folio_test_dirty(folio) || folio_test_writeback(folio)) - return 0; + goto out; /* The refcount will be elevated if any page in the folio is mapped */ if (folio_ref_count(folio) > folio_nr_pages(folio) + folio_has_private(folio) + 1) - return 0; + goto out; if (!filemap_release_folio(folio, 0)) - return 0; + goto out; - return remove_mapping(mapping, folio); + ret = remove_mapping(mapping, folio); +out: + /* + * Ensure to clean stale tlb entries for this mapping. + */ + luf_flush(0); + + return ret; } /** @@ -299,7 +324,7 @@ void truncate_inode_pages_range(struct address_space *mapping, bool same_folio; if (mapping_empty(mapping)) - return; + goto out; /* * 'start' and 'end' always covers the range of pages to be fully @@ -387,6 +412,12 @@ void truncate_inode_pages_range(struct address_space *mapping, truncate_folio_batch_exceptionals(mapping, &fbatch, indices); folio_batch_release(&fbatch); } + +out: + /* + * Ensure to clean stale tlb entries for this mapping. + */ + luf_flush(0); } EXPORT_SYMBOL(truncate_inode_pages_range); @@ -502,6 +533,11 @@ unsigned long mapping_try_invalidate(struct address_space *mapping, folio_batch_release(&fbatch); cond_resched(); } + + /* + * Ensure to clean stale tlb entries for this mapping. + */ + luf_flush(0); return count; } @@ -594,7 +630,7 @@ int invalidate_inode_pages2_range(struct address_space *mapping, int did_range_unmap = 0; if (mapping_empty(mapping)) - return 0; + goto out; folio_batch_init(&fbatch); index = start; @@ -664,6 +700,11 @@ int invalidate_inode_pages2_range(struct address_space *mapping, if (dax_mapping(mapping)) { unmap_mapping_pages(mapping, start, end - start + 1, false); } +out: + /* + * Ensure to clean stale tlb entries for this mapping. + */ + luf_flush(0); return ret; } EXPORT_SYMBOL_GPL(invalidate_inode_pages2_range); diff --git a/mm/vmscan.c b/mm/vmscan.c index 2970a8f35d3d3..ffc4a48710f1d 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -821,6 +821,8 @@ static int __remove_mapping(struct address_space *mapping, struct folio *folio, */ long remove_mapping(struct address_space *mapping, struct folio *folio) { + long ret = 0; + if (__remove_mapping(mapping, folio, false, NULL)) { /* * Unfreezing the refcount with 1 effectively @@ -828,9 +830,15 @@ long remove_mapping(struct address_space *mapping, struct folio *folio) * atomic operation. */ folio_ref_unfreeze(folio, 1); - return folio_nr_pages(folio); + ret = folio_nr_pages(folio); } - return 0; + + /* + * Ensure to clean stale tlb entries for this mapping. 
+ */ + luf_flush(0); + + return ret; } /**
From patchwork Thu Feb 20 05:20:18 2025
X-Patchwork-Submitter: Byungchul Park
X-Patchwork-Id: 13983333
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, ying.huang@intel.com, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, rjgolo@gmail.com
Subject: [RFC PATCH v12 17/26] x86/tlb, riscv/tlb, arm64/tlbflush, mm: remove cpus from tlb shootdown that already have been done
Date: Thu, 20 Feb 2025 14:20:18 +0900
Message-Id: <20250220052027.58847-18-byungchul@sk.com>
In-Reply-To: <20250220052027.58847-1-byungchul@sk.com>
References: <20250220052027.58847-1-byungchul@sk.com>

The luf mechanism performs tlb shootdown for mappings that have been unmapped in a lazy manner. However, it doesn't have to include cpus whose tlb has already been flushed by others since the shootdown became necessary. Since luf already introduced its own generation number used as a global timestamp, luf_ugen, it's possible to selectively pick the cpus for which the required tlb flush has already been done. A stand-alone sketch of this idea follows.
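As a rough model, not the kernel implementation: an array stands in for the per-cpu ugen_done counters, and the two helpers mirror what arch_tlbbatch_diet() and arch_tlbbatch_mark_ugen() do in this patch, pruning a shootdown cpumask by comparing each cpu's last completed flush generation against the generation the batch requires.

#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS	8

/* Last luf generation whose flush each cpu is known to have completed. */
static unsigned long ugen_done[NR_CPUS];

static bool ugen_before(unsigned long a, unsigned long b)
{
	/* wrap-safe "a < b" comparison on generation numbers */
	return (long)(a - b) < 0;
}

/* Drop cpus that already flushed up to 'ugen', like arch_tlbbatch_diet(). */
static void diet_cpumask(bool *cpumask, unsigned long ugen)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (cpumask[cpu] && !ugen_before(ugen_done[cpu], ugen))
			cpumask[cpu] = false;
	}
}

/* A cpu that just flushed records the generation, like arch_tlbbatch_mark_ugen(). */
static void mark_done(int cpu, unsigned long ugen)
{
	if (ugen_before(ugen_done[cpu], ugen))
		ugen_done[cpu] = ugen;
}

int main(void)
{
	bool cpumask[NR_CPUS] = { [0] = true, [1] = true, [2] = true };

	mark_done(1, 10);		/* cpu1 already flushed up to generation 10 */
	diet_cpumask(cpumask, 7);	/* this batch only needs generation 7 */

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpumask[cpu])
			printf("still need IPI to cpu%d\n", cpu);
	return 0;
}

In the real code the per-cpu counters are updated with atomic cmpxchg and may lag behind, which only results in an unnecessary flush, never a missed one.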
This patch introduced APIs that use the generation number to select and remove those cpus so that it can perform tlb shootdown with a smaller cpumask, for all the CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH archs, x86, riscv, and arm64. Signed-off-by: Byungchul Park --- arch/arm64/include/asm/tlbflush.h | 26 +++++++ arch/riscv/include/asm/tlbflush.h | 4 ++ arch/riscv/mm/tlbflush.c | 108 ++++++++++++++++++++++++++++++ arch/x86/include/asm/tlbflush.h | 4 ++ arch/x86/mm/tlb.c | 108 ++++++++++++++++++++++++++++++ include/linux/sched.h | 1 + mm/internal.h | 4 ++ mm/page_alloc.c | 32 +++++++-- mm/rmap.c | 46 ++++++++++++- 9 files changed, 327 insertions(+), 6 deletions(-) diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h index a62e1ea61e4af..f8290bec32e01 100644 --- a/arch/arm64/include/asm/tlbflush.h +++ b/arch/arm64/include/asm/tlbflush.h @@ -354,6 +354,32 @@ static inline void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch) dsb(ish); } +static inline bool arch_tlbbatch_check_done(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen) +{ + /* + * Nothing is needed in this architecture. + */ + return true; +} + +static inline bool arch_tlbbatch_diet(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen) +{ + /* + * Nothing is needed in this architecture. + */ + return true; +} + +static inline void arch_tlbbatch_mark_ugen(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen) +{ + /* nothing to do */ +} + +static inline void arch_mm_mark_ugen(struct mm_struct *mm, unsigned long ugen) +{ + /* nothing to do */ +} + static inline void arch_tlbbatch_clear(struct arch_tlbflush_unmap_batch *batch) { /* nothing to do */ diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h index 1dc7d30273d59..ec5caeb3cf8ef 100644 --- a/arch/riscv/include/asm/tlbflush.h +++ b/arch/riscv/include/asm/tlbflush.h @@ -65,6 +65,10 @@ void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch, unsigned long uaddr); void arch_flush_tlb_batched_pending(struct mm_struct *mm); void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch); +bool arch_tlbbatch_check_done(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen); +bool arch_tlbbatch_diet(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen); +void arch_tlbbatch_mark_ugen(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen); +void arch_mm_mark_ugen(struct mm_struct *mm, unsigned long ugen); static inline void arch_tlbbatch_clear(struct arch_tlbflush_unmap_batch *batch) { diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c index 36f996af6256c..93afb7a299003 100644 --- a/arch/riscv/mm/tlbflush.c +++ b/arch/riscv/mm/tlbflush.c @@ -202,3 +202,111 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch) __flush_tlb_range(&batch->cpumask, FLUSH_TLB_NO_ASID, 0, FLUSH_TLB_MAX_SIZE, PAGE_SIZE); } + +static DEFINE_PER_CPU(atomic_long_t, ugen_done); + +static int __init luf_init_arch(void) +{ + int cpu; + + for_each_cpu(cpu, cpu_possible_mask) + atomic_long_set(per_cpu_ptr(&ugen_done, cpu), LUF_UGEN_INIT - 1); + + return 0; +} +early_initcall(luf_init_arch); + +/* + * batch will not be updated. 
+ */ +bool arch_tlbbatch_check_done(struct arch_tlbflush_unmap_batch *batch, + unsigned long ugen) +{ + int cpu; + + if (!ugen) + goto out; + + for_each_cpu(cpu, &batch->cpumask) { + unsigned long done; + + done = atomic_long_read(per_cpu_ptr(&ugen_done, cpu)); + if (ugen_before(done, ugen)) + return false; + } + return true; +out: + return cpumask_empty(&batch->cpumask); +} + +bool arch_tlbbatch_diet(struct arch_tlbflush_unmap_batch *batch, + unsigned long ugen) +{ + int cpu; + + if (!ugen) + goto out; + + for_each_cpu(cpu, &batch->cpumask) { + unsigned long done; + + done = atomic_long_read(per_cpu_ptr(&ugen_done, cpu)); + if (!ugen_before(done, ugen)) + cpumask_clear_cpu(cpu, &batch->cpumask); + } +out: + return cpumask_empty(&batch->cpumask); +} + +void arch_tlbbatch_mark_ugen(struct arch_tlbflush_unmap_batch *batch, + unsigned long ugen) +{ + int cpu; + + if (!ugen) + return; + + for_each_cpu(cpu, &batch->cpumask) { + atomic_long_t *done = per_cpu_ptr(&ugen_done, cpu); + unsigned long old = atomic_long_read(done); + + /* + * It's racy. The race results in unnecessary tlb flush + * because of the smaller ugen_done than it should be. + * However, it's okay in terms of correctness. + */ + if (!ugen_before(old, ugen)) + continue; + + /* + * It's for optimization. Just skip on fail than retry. + */ + atomic_long_cmpxchg(done, old, ugen); + } +} + +void arch_mm_mark_ugen(struct mm_struct *mm, unsigned long ugen) +{ + int cpu; + + if (!ugen) + return; + + for_each_cpu(cpu, mm_cpumask(mm)) { + atomic_long_t *done = per_cpu_ptr(&ugen_done, cpu); + unsigned long old = atomic_long_read(done); + + /* + * It's racy. The race results in unnecessary tlb flush + * because of the smaller ugen_done than it should be. + * However, it's okay in terms of correctness. + */ + if (!ugen_before(old, ugen)) + continue; + + /* + * It's for optimization. Just skip on fail than retry. + */ + atomic_long_cmpxchg(done, old, ugen); + } +} diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index 0ae9564c7301e..1fc5bacd72dff 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -293,6 +293,10 @@ static inline void arch_flush_tlb_batched_pending(struct mm_struct *mm) } extern void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch); +extern bool arch_tlbbatch_check_done(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen); +extern bool arch_tlbbatch_diet(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen); +extern void arch_tlbbatch_mark_ugen(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen); +extern void arch_mm_mark_ugen(struct mm_struct *mm, unsigned long ugen); static inline void arch_tlbbatch_clear(struct arch_tlbflush_unmap_batch *batch) { diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 860e49b223fd7..975f58fa4b30f 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -1240,6 +1240,114 @@ void __flush_tlb_all(void) } EXPORT_SYMBOL_GPL(__flush_tlb_all); +static DEFINE_PER_CPU(atomic_long_t, ugen_done); + +static int __init luf_init_arch(void) +{ + int cpu; + + for_each_cpu(cpu, cpu_possible_mask) + atomic_long_set(per_cpu_ptr(&ugen_done, cpu), LUF_UGEN_INIT - 1); + + return 0; +} +early_initcall(luf_init_arch); + +/* + * batch will not be updated. 
+ */ +bool arch_tlbbatch_check_done(struct arch_tlbflush_unmap_batch *batch, + unsigned long ugen) +{ + int cpu; + + if (!ugen) + goto out; + + for_each_cpu(cpu, &batch->cpumask) { + unsigned long done; + + done = atomic_long_read(per_cpu_ptr(&ugen_done, cpu)); + if (ugen_before(done, ugen)) + return false; + } + return true; +out: + return cpumask_empty(&batch->cpumask); +} + +bool arch_tlbbatch_diet(struct arch_tlbflush_unmap_batch *batch, + unsigned long ugen) +{ + int cpu; + + if (!ugen) + goto out; + + for_each_cpu(cpu, &batch->cpumask) { + unsigned long done; + + done = atomic_long_read(per_cpu_ptr(&ugen_done, cpu)); + if (!ugen_before(done, ugen)) + cpumask_clear_cpu(cpu, &batch->cpumask); + } +out: + return cpumask_empty(&batch->cpumask); +} + +void arch_tlbbatch_mark_ugen(struct arch_tlbflush_unmap_batch *batch, + unsigned long ugen) +{ + int cpu; + + if (!ugen) + return; + + for_each_cpu(cpu, &batch->cpumask) { + atomic_long_t *done = per_cpu_ptr(&ugen_done, cpu); + unsigned long old = atomic_long_read(done); + + /* + * It's racy. The race results in unnecessary tlb flush + * because of the smaller ugen_done than it should be. + * However, it's okay in terms of correctness. + */ + if (!ugen_before(old, ugen)) + continue; + + /* + * It's for optimization. Just skip on fail than retry. + */ + atomic_long_cmpxchg(done, old, ugen); + } +} + +void arch_mm_mark_ugen(struct mm_struct *mm, unsigned long ugen) +{ + int cpu; + + if (!ugen) + return; + + for_each_cpu(cpu, mm_cpumask(mm)) { + atomic_long_t *done = per_cpu_ptr(&ugen_done, cpu); + unsigned long old = atomic_long_read(done); + + /* + * It's racy. The race results in unnecessary tlb flush + * because of the smaller ugen_done than it should be. + * However, it's okay in terms of correctness. + */ + if (!ugen_before(old, ugen)) + continue; + + /* + * It's for optimization. Just skip on fail than retry. 
+ */ + atomic_long_cmpxchg(done, old, ugen); + } +} + void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch) { struct flush_tlb_info *info; diff --git a/include/linux/sched.h b/include/linux/sched.h index 94321d51b91e8..5c6c4fd021973 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1377,6 +1377,7 @@ struct task_struct { #if defined(CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH) int luf_no_shootdown; int luf_takeoff_started; + unsigned long luf_ugen; #endif struct tlbflush_unmap_batch tlb_ubc; diff --git a/mm/internal.h b/mm/internal.h index fe4a1c174895f..77657c17af204 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -1246,6 +1246,7 @@ void try_to_unmap_flush(void); void try_to_unmap_flush_dirty(void); void try_to_unmap_flush_takeoff(void); void flush_tlb_batched_pending(struct mm_struct *mm); +void reset_batch(struct tlbflush_unmap_batch *batch); void fold_batch(struct tlbflush_unmap_batch *dst, struct tlbflush_unmap_batch *src, bool reset); void fold_luf_batch(struct luf_batch *dst, struct luf_batch *src); #else @@ -1261,6 +1262,9 @@ static inline void try_to_unmap_flush_takeoff(void) static inline void flush_tlb_batched_pending(struct mm_struct *mm) { } +static inline void reset_batch(struct tlbflush_unmap_batch *batch) +{ +} static inline void fold_batch(struct tlbflush_unmap_batch *dst, struct tlbflush_unmap_batch *src, bool reset) { } diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 65acc437d8387..3032fedd8392b 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -668,9 +668,11 @@ bool luf_takeoff_start(void) */ void luf_takeoff_end(void) { + struct tlbflush_unmap_batch *tlb_ubc_takeoff = ¤t->tlb_ubc_takeoff; unsigned long flags; bool no_shootdown; bool outmost = false; + unsigned long cur_luf_ugen; local_irq_save(flags); VM_WARN_ON(!current->luf_takeoff_started); @@ -697,10 +699,19 @@ void luf_takeoff_end(void) if (no_shootdown) goto out; + cur_luf_ugen = current->luf_ugen; + + current->luf_ugen = 0; + + if (cur_luf_ugen && arch_tlbbatch_diet(&tlb_ubc_takeoff->arch, cur_luf_ugen)) + reset_batch(tlb_ubc_takeoff); + try_to_unmap_flush_takeoff(); out: - if (outmost) + if (outmost) { VM_WARN_ON(current->luf_no_shootdown); + VM_WARN_ON(current->luf_ugen); + } } /* @@ -757,6 +768,7 @@ bool luf_takeoff_check_and_fold(struct page *page) struct tlbflush_unmap_batch *tlb_ubc_takeoff = ¤t->tlb_ubc_takeoff; unsigned short luf_key = page_luf_key(page); struct luf_batch *lb; + unsigned long lb_ugen; unsigned long flags; /* @@ -770,13 +782,25 @@ bool luf_takeoff_check_and_fold(struct page *page) if (!luf_key) return true; - if (current->luf_no_shootdown) - return false; - lb = &luf_batch[luf_key]; read_lock_irqsave(&lb->lock, flags); + lb_ugen = lb->ugen; + + if (arch_tlbbatch_check_done(&lb->batch.arch, lb_ugen)) { + read_unlock_irqrestore(&lb->lock, flags); + return true; + } + + if (current->luf_no_shootdown) { + read_unlock_irqrestore(&lb->lock, flags); + return false; + } + fold_batch(tlb_ubc_takeoff, &lb->batch, false); read_unlock_irqrestore(&lb->lock, flags); + + if (!current->luf_ugen || ugen_before(current->luf_ugen, lb_ugen)) + current->luf_ugen = lb_ugen; return true; } #endif diff --git a/mm/rmap.c b/mm/rmap.c index 0aaf02b1b34c3..cf6667fb18fe2 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -656,7 +656,7 @@ static unsigned long new_luf_ugen(void) return ugen; } -static void reset_batch(struct tlbflush_unmap_batch *batch) +void reset_batch(struct tlbflush_unmap_batch *batch) { arch_tlbbatch_clear(&batch->arch); batch->flush_required = false; @@ -743,8 +743,14 
@@ static void __fold_luf_batch(struct luf_batch *dst_lb, * more tlb shootdown might be needed to fulfill the newer * request. Conservertively keep the newer one. */ - if (!dst_lb->ugen || ugen_before(dst_lb->ugen, src_ugen)) + if (!dst_lb->ugen || ugen_before(dst_lb->ugen, src_ugen)) { + /* + * Good chance to shrink the batch using the old ugen. + */ + if (dst_lb->ugen && arch_tlbbatch_diet(&dst_lb->batch.arch, dst_lb->ugen)) + reset_batch(&dst_lb->batch); dst_lb->ugen = src_ugen; + } fold_batch(&dst_lb->batch, src_batch, false); } @@ -772,17 +778,45 @@ void fold_luf_batch(struct luf_batch *dst, struct luf_batch *src) read_unlock_irqrestore(&src->lock, flags); } +static unsigned long tlb_flush_start(void) +{ + /* + * Memory barrier implied in the atomic operation prevents + * reading luf_ugen from happening after the following + * tlb flush. + */ + return new_luf_ugen(); +} + +static void tlb_flush_end(struct arch_tlbflush_unmap_batch *arch, + struct mm_struct *mm, unsigned long ugen) +{ + /* + * Prevent the following marking from placing prior to the + * actual tlb flush. + */ + smp_mb(); + + if (arch) + arch_tlbbatch_mark_ugen(arch, ugen); + if (mm) + arch_mm_mark_ugen(mm, ugen); +} + void try_to_unmap_flush_takeoff(void) { struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; struct tlbflush_unmap_batch *tlb_ubc_ro = ¤t->tlb_ubc_ro; struct tlbflush_unmap_batch *tlb_ubc_luf = ¤t->tlb_ubc_luf; struct tlbflush_unmap_batch *tlb_ubc_takeoff = ¤t->tlb_ubc_takeoff; + unsigned long ugen; if (!tlb_ubc_takeoff->flush_required) return; + ugen = tlb_flush_start(); arch_tlbbatch_flush(&tlb_ubc_takeoff->arch); + tlb_flush_end(&tlb_ubc_takeoff->arch, NULL, ugen); /* * Now that tlb shootdown of tlb_ubc_takeoff has been performed, @@ -871,13 +905,17 @@ void try_to_unmap_flush(void) struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; struct tlbflush_unmap_batch *tlb_ubc_ro = ¤t->tlb_ubc_ro; struct tlbflush_unmap_batch *tlb_ubc_luf = ¤t->tlb_ubc_luf; + unsigned long ugen; fold_batch(tlb_ubc, tlb_ubc_ro, true); fold_batch(tlb_ubc, tlb_ubc_luf, true); if (!tlb_ubc->flush_required) return; + ugen = tlb_flush_start(); arch_tlbbatch_flush(&tlb_ubc->arch); + tlb_flush_end(&tlb_ubc->arch, NULL, ugen); + reset_batch(tlb_ubc); } @@ -1009,7 +1047,11 @@ void flush_tlb_batched_pending(struct mm_struct *mm) int flushed = batch >> TLB_FLUSH_BATCH_FLUSHED_SHIFT; if (pending != flushed) { + unsigned long ugen; + + ugen = tlb_flush_start(); arch_flush_tlb_batched_pending(mm); + tlb_flush_end(NULL, mm, ugen); /* * If the new TLB flushing is pending during flushing, leave From patchwork Thu Feb 20 05:20:19 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Byungchul Park X-Patchwork-Id: 13983335 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 44918C021B1 for ; Thu, 20 Feb 2025 05:21:29 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 6D91028013C; Thu, 20 Feb 2025 00:20:49 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 6126B2800FF; Thu, 20 Feb 2025 00:20:49 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 35EDB2800FF; Thu, 20 Feb 2025 00:20:49 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by 
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, ying.huang@intel.com, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, rjgolo@gmail.com
Subject: [RFC PATCH v12 18/26] mm/page_alloc: retry 3 times to take pcp pages on luf check failure
Date: Thu, 20 Feb 2025 14:20:19 +0900
Message-Id: <20250220052027.58847-19-byungchul@sk.com>
In-Reply-To: <20250220052027.58847-1-byungchul@sk.com>
References: <20250220052027.58847-1-byungchul@sk.com>

Signed-off-by: Byungchul Park
---
 mm/page_alloc.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3032fedd8392b..0b6e7f235c4a1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3339,6 +3339,12 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order,
 {
 	struct page *page;
 
+	/*
+	 * Give up taking a page from pcp if the luf check fails
+	 * for 3 pages in a row due to pending tlb shootdown.
+	 */
+	int try_luf_pages = 3;
+
 	do {
 		if (list_empty(list)) {
 			int batch = nr_pcp_alloc(pcp, zone, order);
@@ -3353,11 +3359,21 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order,
 				return NULL;
 		}
 
-		page = list_first_entry(list, struct page, pcp_list);
-		if (!luf_takeoff_check_and_fold(page))
+		list_for_each_entry(page, list, pcp_list) {
+			if (luf_takeoff_check_and_fold(page)) {
+				list_del(&page->pcp_list);
+				pcp->count -= 1 << order;
+				break;
+			}
+			if (!--try_luf_pages)
+				return NULL;
+		}
+
+		/*
+		 * None of the pages in the list passed the luf check.
+		 */
+		if (list_entry_is_head(page, list, pcp_list))
 			return NULL;
-		list_del(&page->pcp_list);
-		pcp->count -= 1 << order;
 	} while (check_new_pages(page, order));
 
 	return page;
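To make the bounded retry above easier to follow, here is a tiny userspace sketch, not taken from the kernel tree: walk the per-cpu list, take the first page whose pending shootdown passes the luf check, and give up after three failures so the caller can fall back to another allocation path. The names page_model, take_page and luf_ok are made-up stand-ins for the kernel's list_for_each_entry() walk over pcp_list.

#include <stdbool.h>
#include <stdio.h>

#define NPAGES 5

struct page_model {
	int id;
	bool luf_ok;	/* true if its pending shootdown can be folded cheaply */
};

/*
 * Scan the "pcp list" (modeled as an array) and take the first page that
 * passes the luf check, giving up after three failed attempts, similar in
 * spirit to the bounded walk added to __rmqueue_pcplist() above.
 */
static struct page_model *take_page(struct page_model *list, int n)
{
	int try_luf_pages = 3;
	int i;

	for (i = 0; i < n; i++) {
		if (list[i].luf_ok)
			return &list[i];	/* corresponds to list_del() + break */
		if (!--try_luf_pages)
			return NULL;		/* three failures: give up on the pcp */
	}
	return NULL;				/* walked off the end: nothing usable */
}

int main(void)
{
	struct page_model list[NPAGES] = {
		{ 0, false }, { 1, false }, { 2, false }, { 3, true }, { 4, true },
	};
	struct page_model *page = take_page(list, NPAGES);

	printf("got %s\n", page ? "a page" : "nothing");	/* "nothing": 3 failures come first */
	return 0;
}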
From patchwork Thu Feb 20 05:20:20 2025
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, ying.huang@intel.com, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, rjgolo@gmail.com
Subject: [RFC PATCH v12 19/26] mm: skip luf tlb flush for luf'd mm that already has been done
Date: Thu, 20 Feb 2025 14:20:20 +0900
Message-Id: <20250220052027.58847-20-byungchul@sk.com>
In-Reply-To: <20250220052027.58847-1-byungchul@sk.com>
References: <20250220052027.58847-1-byungchul@sk.com>

The fault handler performs the tlb flush pended by luf when a pte becomes writable, no matter whether the required tlb flush has already been performed. By storing the luf generation number, luf_ugen, in struct mm_struct, we can skip such unnecessary tlb flushes.
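To make the skip decision concrete, here is a small userspace sketch, not part of this patch: it uses one common wraparound-safe formulation of a generation compare in the spirit of the series' ugen_before() (the kernel helper may be written differently) and shows how a flush can be skipped once the current context has already flushed past the mm's recorded luf_ugen. The mm_model, flushed_ugen and need_luf_flush names are hypothetical.

#include <stdbool.h>
#include <stdio.h>

/*
 * Wraparound-safe "a is older than b", in the spirit of the series'
 * ugen_before(); the kernel helper may differ in detail.
 */
static bool ugen_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Hypothetical stand-in for the per-mm bookkeeping added by this patch. */
struct mm_model {
	unsigned long luf_ugen;		/* newest pending-unmap generation for this mm */
};

static unsigned long flushed_ugen;	/* what this context has already flushed up to */

/* Return true only if a real tlb flush would still be required for @mm. */
static bool need_luf_flush(const struct mm_model *mm)
{
	if (!mm->luf_ugen)
		return false;		/* nothing pending for this mm */
	return ugen_before(flushed_ugen, mm->luf_ugen);
}

int main(void)
{
	struct mm_model mm = { .luf_ugen = 105 };

	flushed_ugen = 100;
	printf("flush needed: %d\n", need_luf_flush(&mm));	/* 1: flushed only up to 100 */

	flushed_ugen = 110;
	printf("flush needed: %d\n", need_luf_flush(&mm));	/* 0: already flushed past 105 */

	return 0;
}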
Signed-off-by: Byungchul Park --- include/asm-generic/tlb.h | 2 +- include/linux/mm_types.h | 9 +++++ kernel/fork.c | 1 + kernel/sched/core.c | 2 +- mm/memory.c | 22 ++++++++++-- mm/pgtable-generic.c | 2 +- mm/rmap.c | 74 +++++++++++++++++++++++++++++++++++++-- 7 files changed, 104 insertions(+), 8 deletions(-) diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h index 4a99351be111e..94b329a5127a7 100644 --- a/include/asm-generic/tlb.h +++ b/include/asm-generic/tlb.h @@ -552,7 +552,7 @@ static inline void tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vm /* * Don't leave stale tlb entries for this vma. */ - luf_flush(0); + luf_flush_vma(vma); if (tlb->fullmm) return; diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index b3eb5a4e45efb..8de4c190ad514 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -38,8 +38,10 @@ struct luf_batch { unsigned long ugen; rwlock_t lock; }; +void luf_batch_init(struct luf_batch *lb); #else struct luf_batch {}; +static inline void luf_batch_init(struct luf_batch *lb) {} #endif /* @@ -1022,6 +1024,9 @@ struct mm_struct { * moving a PROT_NONE mapped page. */ atomic_t tlb_flush_pending; + + /* luf batch for this mm */ + struct luf_batch luf_batch; #ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH /* See flush_tlb_batched_pending() */ atomic_t tlb_flush_batched; @@ -1272,8 +1277,12 @@ extern void tlb_finish_mmu(struct mmu_gather *tlb); #if defined(CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH) void luf_flush(unsigned short luf_key); +void luf_flush_mm(struct mm_struct *mm); +void luf_flush_vma(struct vm_area_struct *vma); #else static inline void luf_flush(unsigned short luf_key) {} +static inline void luf_flush_mm(struct mm_struct *mm) {} +static inline void luf_flush_vma(struct vm_area_struct *vma) {} #endif struct vm_fault; diff --git a/kernel/fork.c b/kernel/fork.c index 0061cf2450efd..593e74235ea8a 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1268,6 +1268,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p, memset(&mm->rss_stat, 0, sizeof(mm->rss_stat)); spin_lock_init(&mm->page_table_lock); spin_lock_init(&mm->arg_lock); + luf_batch_init(&mm->luf_batch); mm_init_cpumask(mm); mm_init_aio(mm); mm_init_owner(mm, p); diff --git a/kernel/sched/core.c b/kernel/sched/core.c index aea08d8a9e258..c7665cb93f617 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -5225,7 +5225,7 @@ static struct rq *finish_task_switch(struct task_struct *prev) if (mm) { membarrier_mm_sync_core_before_usermode(mm); mmdrop_lazy_tlb_sched(mm); - luf_flush(0); + luf_flush_mm(mm); } if (unlikely(prev_state == TASK_DEAD)) { diff --git a/mm/memory.c b/mm/memory.c index 0e85c49bc5028..b02f86b1adb91 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -6081,6 +6081,7 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address, struct mm_struct *mm = vma->vm_mm; vm_fault_t ret; bool is_droppable; + struct address_space *mapping = NULL; bool flush = false; __set_current_state(TASK_RUNNING); @@ -6112,9 +6113,17 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address, * should be considered. */ if (vma->vm_flags & (VM_WRITE | VM_MAYWRITE) || - flags & FAULT_FLAG_WRITE) + flags & FAULT_FLAG_WRITE) { flush = true; + /* + * Doesn't care the !VM_SHARED cases because it won't + * update the pages that might be shared with others. 
+ */ + if (vma->vm_flags & VM_SHARED && vma->vm_file) + mapping = vma->vm_file->f_mapping; + } + if (unlikely(is_vm_hugetlb_page(vma))) ret = hugetlb_fault(vma->vm_mm, vma, address, flags); else @@ -6149,8 +6158,15 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address, /* * Ensure to clean stale tlb entries for this vma. */ - if (flush) - luf_flush(0); + if (flush) { + /* + * If it has a VM_SHARED mapping, all the mms involved + * should be luf_flush'ed. + */ + if (mapping) + luf_flush(0); + luf_flush_mm(mm); + } return ret; } diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c index 215d8d93560fd..5a876c1c93a80 100644 --- a/mm/pgtable-generic.c +++ b/mm/pgtable-generic.c @@ -100,7 +100,7 @@ pte_t ptep_clear_flush(struct vm_area_struct *vma, unsigned long address, if (pte_accessible(mm, pte)) flush_tlb_page(vma, address); else - luf_flush(0); + luf_flush_vma(vma); return pte; } #endif diff --git a/mm/rmap.c b/mm/rmap.c index cf6667fb18fe2..e0304dc74c3a7 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -695,7 +695,7 @@ void fold_batch(struct tlbflush_unmap_batch *dst, */ struct luf_batch luf_batch[NR_LUF_BATCH]; -static void luf_batch_init(struct luf_batch *lb) +void luf_batch_init(struct luf_batch *lb) { rwlock_init(&lb->lock); reset_batch(&lb->batch); @@ -778,6 +778,31 @@ void fold_luf_batch(struct luf_batch *dst, struct luf_batch *src) read_unlock_irqrestore(&src->lock, flags); } +static void fold_luf_batch_mm(struct luf_batch *dst, + struct mm_struct *mm) +{ + unsigned long flags; + bool need_fold = false; + + read_lock_irqsave(&dst->lock, flags); + if (arch_tlbbatch_need_fold(&dst->batch.arch, mm)) + need_fold = true; + read_unlock(&dst->lock); + + write_lock(&dst->lock); + if (unlikely(need_fold)) + arch_tlbbatch_add_pending(&dst->batch.arch, mm, 0); + + /* + * dst->ugen represents sort of request for tlb shootdown. The + * newer it is, the more tlb shootdown might be needed to + * fulfill the newer request. Keep the newest one not to miss + * necessary tlb shootdown. + */ + dst->ugen = new_luf_ugen(); + write_unlock_irqrestore(&dst->lock, flags); +} + static unsigned long tlb_flush_start(void) { /* @@ -894,6 +919,49 @@ void luf_flush(unsigned short luf_key) } EXPORT_SYMBOL(luf_flush); +void luf_flush_vma(struct vm_area_struct *vma) +{ + struct mm_struct *mm; + struct address_space *mapping = NULL; + + if (!vma) + return; + + mm = vma->vm_mm; + /* + * Doesn't care the !VM_SHARED cases because it won't + * update the pages that might be shared with others. + */ + if (vma->vm_flags & VM_SHARED && vma->vm_file) + mapping = vma->vm_file->f_mapping; + + if (mapping) + luf_flush(0); + luf_flush_mm(mm); +} + +void luf_flush_mm(struct mm_struct *mm) +{ + struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; + struct luf_batch *lb; + unsigned long flags; + unsigned long lb_ugen; + + if (!mm) + return; + + lb = &mm->luf_batch; + read_lock_irqsave(&lb->lock, flags); + fold_batch(tlb_ubc, &lb->batch, false); + lb_ugen = lb->ugen; + read_unlock_irqrestore(&lb->lock, flags); + + if (arch_tlbbatch_diet(&tlb_ubc->arch, lb_ugen)) + return; + + try_to_unmap_flush(); +} + /* * Flush TLB entries for recently unmapped pages from remote CPUs. 
It is
 * important if a PTE was dirty when it was unmapped that it's flushed
@@ -962,8 +1030,10 @@ static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval,
 
 	if (!can_luf_test())
 		tlb_ubc = &current->tlb_ubc;
-	else
+	else {
 		tlb_ubc = &current->tlb_ubc_ro;
+		fold_luf_batch_mm(&mm->luf_batch, mm);
+	}
 
 	arch_tlbbatch_add_pending(&tlb_ubc->arch, mm, uaddr);
 	tlb_ubc->flush_required = true;

From patchwork Thu Feb 20 05:20:21 2025
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, ying.huang@intel.com, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, rjgolo@gmail.com
Subject: [RFC PATCH v12 20/26] mm, fs: skip tlb flushes for luf'd filemap that already has been done
Date: Thu, 20 Feb 2025 14:20:21 +0900
Message-Id: <20250220052027.58847-21-byungchul@sk.com>
In-Reply-To: <20250220052027.58847-1-byungchul@sk.com>
References: <20250220052027.58847-1-byungchul@sk.com>

For a luf'd filemap, tlb shootdown is currently performed whenever the page cache is updated, no matter whether the required tlb flushes have already been done.
By storing luf meta data in struct address_space and updating the luf meta data properly, we can skip unnecessary tlb flush. Signed-off-by: Byungchul Park --- fs/inode.c | 1 + include/linux/fs.h | 4 ++- include/linux/mm_types.h | 2 ++ mm/memory.c | 4 +-- mm/rmap.c | 59 +++++++++++++++++++++++++--------------- mm/truncate.c | 14 +++++----- mm/vmscan.c | 2 +- 7 files changed, 53 insertions(+), 33 deletions(-) diff --git a/fs/inode.c b/fs/inode.c index 46fbd5b234822..e155e51be2d28 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -404,6 +404,7 @@ static void __address_space_init_once(struct address_space *mapping) init_rwsem(&mapping->i_mmap_rwsem); INIT_LIST_HEAD(&mapping->i_private_list); spin_lock_init(&mapping->i_private_lock); + luf_batch_init(&mapping->luf_batch); mapping->i_mmap = RB_ROOT_CACHED; } diff --git a/include/linux/fs.h b/include/linux/fs.h index ec88270221bfe..0cc588c704cd1 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -461,6 +461,7 @@ extern const struct address_space_operations empty_aops; * @i_private_lock: For use by the owner of the address_space. * @i_private_list: For use by the owner of the address_space. * @i_private_data: For use by the owner of the address_space. + * @luf_batch: Data to track need of tlb flush by luf. */ struct address_space { struct inode *host; @@ -482,6 +483,7 @@ struct address_space { struct list_head i_private_list; struct rw_semaphore i_mmap_rwsem; void * i_private_data; + struct luf_batch luf_batch; } __attribute__((aligned(sizeof(long)))) __randomize_layout; /* * On most architectures that alignment is already the case; but @@ -508,7 +510,7 @@ static inline int mapping_write_begin(struct file *file, * Ensure to clean stale tlb entries for this mapping. */ if (!ret) - luf_flush(0); + luf_flush_mapping(mapping); return ret; } diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 8de4c190ad514..c50cfc1c6282f 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -1279,10 +1279,12 @@ extern void tlb_finish_mmu(struct mmu_gather *tlb); void luf_flush(unsigned short luf_key); void luf_flush_mm(struct mm_struct *mm); void luf_flush_vma(struct vm_area_struct *vma); +void luf_flush_mapping(struct address_space *mapping); #else static inline void luf_flush(unsigned short luf_key) {} static inline void luf_flush_mm(struct mm_struct *mm) {} static inline void luf_flush_vma(struct vm_area_struct *vma) {} +static inline void luf_flush_mapping(struct address_space *mapping) {} #endif struct vm_fault; diff --git a/mm/memory.c b/mm/memory.c index b02f86b1adb91..c98af5e567e89 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -6161,10 +6161,10 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address, if (flush) { /* * If it has a VM_SHARED mapping, all the mms involved - * should be luf_flush'ed. + * in the struct address_space should be luf_flush'ed. */ if (mapping) - luf_flush(0); + luf_flush_mapping(mapping); luf_flush_mm(mm); } diff --git a/mm/rmap.c b/mm/rmap.c index e0304dc74c3a7..0cb13e8fcd739 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -691,7 +691,7 @@ void fold_batch(struct tlbflush_unmap_batch *dst, #define NR_LUF_BATCH (1 << (sizeof(short) * 8)) /* - * Use 0th entry as accumulated batch. + * XXX: Reserve the 0th entry for later use. 
*/ struct luf_batch luf_batch[NR_LUF_BATCH]; @@ -936,7 +936,7 @@ void luf_flush_vma(struct vm_area_struct *vma) mapping = vma->vm_file->f_mapping; if (mapping) - luf_flush(0); + luf_flush_mapping(mapping); luf_flush_mm(mm); } @@ -962,6 +962,29 @@ void luf_flush_mm(struct mm_struct *mm) try_to_unmap_flush(); } +void luf_flush_mapping(struct address_space *mapping) +{ + struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; + struct luf_batch *lb; + unsigned long flags; + unsigned long lb_ugen; + + if (!mapping) + return; + + lb = &mapping->luf_batch; + read_lock_irqsave(&lb->lock, flags); + fold_batch(tlb_ubc, &lb->batch, false); + lb_ugen = lb->ugen; + read_unlock_irqrestore(&lb->lock, flags); + + if (arch_tlbbatch_diet(&tlb_ubc->arch, lb_ugen)) + return; + + try_to_unmap_flush(); +} +EXPORT_SYMBOL(luf_flush_mapping); + /* * Flush TLB entries for recently unmapped pages from remote CPUs. It is * important if a PTE was dirty when it was unmapped that it's flushed @@ -1010,7 +1033,8 @@ void try_to_unmap_flush_dirty(void) static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval, unsigned long uaddr, - struct vm_area_struct *vma) + struct vm_area_struct *vma, + struct address_space *mapping) { struct tlbflush_unmap_batch *tlb_ubc; int batch; @@ -1032,27 +1056,15 @@ static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval, tlb_ubc = ¤t->tlb_ubc; else { tlb_ubc = ¤t->tlb_ubc_ro; + fold_luf_batch_mm(&mm->luf_batch, mm); + if (mapping) + fold_luf_batch_mm(&mapping->luf_batch, mm); } arch_tlbbatch_add_pending(&tlb_ubc->arch, mm, uaddr); tlb_ubc->flush_required = true; - if (can_luf_test()) { - struct luf_batch *lb; - unsigned long flags; - - /* - * Accumulate to the 0th entry right away so that - * luf_flush(0) can be uesed to properly perform pending - * TLB flush once this unmapping is observed. - */ - lb = &luf_batch[0]; - write_lock_irqsave(&lb->lock, flags); - __fold_luf_batch(lb, tlb_ubc, new_luf_ugen()); - write_unlock_irqrestore(&lb->lock, flags); - } - /* * Ensure compiler does not re-order the setting of tlb_flush_batched * before the PTE is cleared. @@ -1134,7 +1146,8 @@ void flush_tlb_batched_pending(struct mm_struct *mm) #else static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval, unsigned long uaddr, - struct vm_area_struct *vma) + struct vm_area_struct *vma, + struct address_space *mapping) { } @@ -1503,7 +1516,7 @@ int folio_mkclean(struct folio *folio) /* * Ensure to clean stale tlb entries for this mapping. */ - luf_flush(0); + luf_flush_mapping(mapping); return cleaned; } @@ -2037,6 +2050,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, enum ttu_flags flags = (enum ttu_flags)(long)arg; unsigned long pfn; unsigned long hsz = 0; + struct address_space *mapping = folio_mapping(folio); /* * When racing against e.g. zap_pte_range() on another cpu, @@ -2174,7 +2188,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, */ pteval = ptep_get_and_clear(mm, address, pvmw.pte); - set_tlb_ubc_flush_pending(mm, pteval, address, vma); + set_tlb_ubc_flush_pending(mm, pteval, address, vma, mapping); } else { pteval = ptep_clear_flush(vma, address, pvmw.pte); } @@ -2414,6 +2428,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, enum ttu_flags flags = (enum ttu_flags)(long)arg; unsigned long pfn; unsigned long hsz = 0; + struct address_space *mapping = folio_mapping(folio); /* * When racing against e.g. 
zap_pte_range() on another cpu, @@ -2563,7 +2578,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, */ pteval = ptep_get_and_clear(mm, address, pvmw.pte); - set_tlb_ubc_flush_pending(mm, pteval, address, vma); + set_tlb_ubc_flush_pending(mm, pteval, address, vma, mapping); } else { pteval = ptep_clear_flush(vma, address, pvmw.pte); } diff --git a/mm/truncate.c b/mm/truncate.c index 14618c53f1910..f9a3416610231 100644 --- a/mm/truncate.c +++ b/mm/truncate.c @@ -128,7 +128,7 @@ void folio_invalidate(struct folio *folio, size_t offset, size_t length) /* * Ensure to clean stale tlb entries for this mapping. */ - luf_flush(0); + luf_flush_mapping(folio->mapping); } EXPORT_SYMBOL_GPL(folio_invalidate); @@ -170,7 +170,7 @@ int truncate_inode_folio(struct address_space *mapping, struct folio *folio) /* * Ensure to clean stale tlb entries for this mapping. */ - luf_flush(0); + luf_flush_mapping(mapping); return 0; } @@ -220,7 +220,7 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end) /* * Ensure to clean stale tlb entries for this mapping. */ - luf_flush(0); + luf_flush_mapping(folio->mapping); if (!folio_test_large(folio)) return true; @@ -282,7 +282,7 @@ long mapping_evict_folio(struct address_space *mapping, struct folio *folio) /* * Ensure to clean stale tlb entries for this mapping. */ - luf_flush(0); + luf_flush_mapping(mapping); return ret; } @@ -417,7 +417,7 @@ void truncate_inode_pages_range(struct address_space *mapping, /* * Ensure to clean stale tlb entries for this mapping. */ - luf_flush(0); + luf_flush_mapping(mapping); } EXPORT_SYMBOL(truncate_inode_pages_range); @@ -537,7 +537,7 @@ unsigned long mapping_try_invalidate(struct address_space *mapping, /* * Ensure to clean stale tlb entries for this mapping. */ - luf_flush(0); + luf_flush_mapping(mapping); return count; } @@ -704,7 +704,7 @@ int invalidate_inode_pages2_range(struct address_space *mapping, /* * Ensure to clean stale tlb entries for this mapping. */ - luf_flush(0); + luf_flush_mapping(mapping); return ret; } EXPORT_SYMBOL_GPL(invalidate_inode_pages2_range); diff --git a/mm/vmscan.c b/mm/vmscan.c index ffc4a48710f1d..cbca027d2a10e 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -836,7 +836,7 @@ long remove_mapping(struct address_space *mapping, struct folio *folio) /* * Ensure to clean stale tlb entries for this mapping. 
 	 */
-	luf_flush(0);
+	luf_flush_mapping(mapping);
 
 	return ret;
 }

From patchwork Thu Feb 20 05:20:22 2025
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, ying.huang@intel.com, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, rjgolo@gmail.com
Subject: [RFC PATCH v12 21/26] mm: perform luf tlb shootdown per zone in batched manner
Date: Thu, 20 Feb 2025 14:20:22 +0900
Message-Id: <20250220052027.58847-22-byungchul@sk.com>
In-Reply-To: <20250220052027.58847-1-byungchul@sk.com>
References: <20250220052027.58847-1-byungchul@sk.com>
Each luf page in buddy carries its pending tlb shootdown information and performs the corresponding tlb shootdown when it exits buddy. However, every exit from buddy then causes a small but frequent IPI. Even though the total number of IPIs is reduced, unnecessary waits on contended CPUs in the IPI handler have been observed via perf profiling. Thus, perform the luf tlb shootdown per zone in a batched manner when pages exit buddy, so as to avoid frequent IPIs.
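The idea can be pictured with a small userspace model, not taken from this patch: pending cpus are folded into a single per-zone mask tagged with a zone generation, and one shootdown round covers everything accumulated since the last flush. zone_model, zone_fold_pending and zone_flush_if_needed are made-up names; the real code tracks this state with zone->zone_batch, zone->zone_ugen and zone->zone_ugen_done introduced below.

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of a zone-wide pending shootdown batch. */
struct zone_model {
	unsigned long zone_ugen;	/* generation of the accumulated batch */
	unsigned long zone_ugen_done;	/* generation already flushed */
	unsigned int pending_cpus;	/* bitmask standing in for a cpumask */
};

/* Freeing a luf'd page: fold its pending cpus into the zone batch. */
static void zone_fold_pending(struct zone_model *z, unsigned int cpus)
{
	z->pending_cpus |= cpus;
	z->zone_ugen++;
}

/* Taking pages off the zone: one shootdown covers the whole batch. */
static void zone_flush_if_needed(struct zone_model *z)
{
	if (z->zone_ugen_done == z->zone_ugen)
		return;			/* everything accumulated is already flushed */

	printf("one IPI round for cpus 0x%x (zone_ugen %lu)\n",
	       z->pending_cpus, z->zone_ugen);
	z->pending_cpus = 0;
	z->zone_ugen_done = z->zone_ugen;
}

int main(void)
{
	struct zone_model z = { .zone_ugen = 2, .zone_ugen_done = 2 };

	/* Many frees, each with its own pending cpus, but no IPIs yet. */
	zone_fold_pending(&z, 0x3);
	zone_fold_pending(&z, 0x8);
	zone_fold_pending(&z, 0x6);

	/* The first allocation that needs clean tlbs triggers one batched flush. */
	zone_flush_if_needed(&z);
	zone_flush_if_needed(&z);	/* nothing left to do */

	return 0;
}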
Signed-off-by: Byungchul Park --- include/linux/mm.h | 44 ++++- include/linux/mm_types.h | 19 +- include/linux/mmzone.h | 9 + include/linux/sched.h | 2 + mm/compaction.c | 10 +- mm/internal.h | 13 +- mm/mm_init.c | 5 + mm/page_alloc.c | 363 +++++++++++++++++++++++++++++++-------- mm/page_reporting.c | 9 +- mm/rmap.c | 6 +- 10 files changed, 383 insertions(+), 97 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 53a5f1cb21e0d..46638e86e8073 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -4161,12 +4161,16 @@ static inline int do_mseal(unsigned long start, size_t len_in, unsigned long fla } #endif -#if defined(CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH) /* * luf_ugen will start with 2 so that 1 can be regarded as a passed one. */ #define LUF_UGEN_INIT 2 +/* + * zone_ugen will start with 2 so that 1 can be regarded as done. + */ +#define ZONE_UGEN_INIT 2 +#if defined(CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH) static inline bool ugen_before(unsigned long a, unsigned long b) { /* @@ -4177,7 +4181,11 @@ static inline bool ugen_before(unsigned long a, unsigned long b) static inline unsigned long next_ugen(unsigned long ugen) { - if (ugen + 1) + /* + * Avoid zero even in unsigned short range so as to treat + * '(unsigned short)ugen == 0' as invalid. + */ + if ((unsigned short)(ugen + 1)) return ugen + 1; /* * Avoid invalid ugen, zero. @@ -4187,7 +4195,11 @@ static inline unsigned long next_ugen(unsigned long ugen) static inline unsigned long prev_ugen(unsigned long ugen) { - if (ugen - 1) + /* + * Avoid zero even in unsigned short range so as to treat + * '(unsigned short)ugen == 0' as invalid. + */ + if ((unsigned short)(ugen - 1)) return ugen - 1; /* * Avoid invalid ugen, zero. @@ -4195,4 +4207,30 @@ static inline unsigned long prev_ugen(unsigned long ugen) return ugen - 2; } #endif + +/* + * return the biggest ugen but it should be before the real zone_ugen. + */ +static inline unsigned long page_zone_ugen(struct zone *zone, struct page *page) +{ + unsigned long zone_ugen = zone->zone_ugen; + unsigned short short_zone_ugen = page->zone_ugen; + unsigned long cand1, cand2; + + if (!short_zone_ugen) + return 0; + + cand1 = (zone_ugen & ~(unsigned long)USHRT_MAX) | short_zone_ugen; + cand2 = cand1 - USHRT_MAX - 1; + + if (!ugen_before(zone_ugen, cand1)) + return cand1; + + return cand2; +} + +static inline void set_page_zone_ugen(struct page *page, unsigned short zone_ugen) +{ + page->zone_ugen = zone_ugen; +} #endif /* _LINUX_MM_H */ diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index c50cfc1c6282f..e3132e1e5e5d2 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -132,11 +132,20 @@ struct page { */ unsigned short order; - /* - * For tracking need of tlb flush, - * by luf(lazy unmap flush). - */ - unsigned short luf_key; + union { + /* + * For tracking need of + * tlb flush, by + * luf(lazy unmap flush). + */ + unsigned short luf_key; + + /* + * Casted zone_ugen with + * unsigned short. 
+ */ + unsigned short zone_ugen; + }; }; }; }; diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index ac3178b5fc50b..3c1b04d21fda9 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -117,6 +117,7 @@ extern int page_group_by_mobility_disabled; struct free_area { struct list_head free_list[MIGRATE_TYPES]; struct list_head pend_list[MIGRATE_TYPES]; + unsigned long pend_zone_ugen[MIGRATE_TYPES]; unsigned long nr_free; }; @@ -998,6 +999,14 @@ struct zone { atomic_long_t vm_numa_event[NR_VM_NUMA_EVENT_ITEMS]; /* Count pages that need tlb shootdown on allocation */ atomic_long_t nr_luf_pages; + /* Generation number for that tlb shootdown has been done */ + unsigned long zone_ugen_done; + /* Generation number to control zone batched tlb shootdown */ + unsigned long zone_ugen; + /* Approximate latest luf_ugen that have ever entered */ + unsigned long luf_ugen; + /* Accumulated tlb batch for this zone */ + struct tlbflush_unmap_batch zone_batch; } ____cacheline_internodealigned_in_smp; enum pgdat_flags { diff --git a/include/linux/sched.h b/include/linux/sched.h index 5c6c4fd021973..463cb2fb8f919 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1378,6 +1378,8 @@ struct task_struct { int luf_no_shootdown; int luf_takeoff_started; unsigned long luf_ugen; + unsigned long zone_ugen; + unsigned long wait_zone_ugen; #endif struct tlbflush_unmap_batch tlb_ubc; diff --git a/mm/compaction.c b/mm/compaction.c index 27f3d743762bb..a7f17867decae 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -653,7 +653,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc, goto isolate_fail; } - if (!luf_takeoff_check(page)) + if (!luf_takeoff_check(cc->zone, page)) goto isolate_fail; /* Found a free page, will break it into order-0 pages */ @@ -689,7 +689,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc, /* * Check and flush before using the pages taken off. */ - luf_takeoff_end(); + luf_takeoff_end(cc->zone); /* * Be careful to not go outside of the pageblock. @@ -1611,7 +1611,7 @@ static void fast_isolate_freepages(struct compact_control *cc) order_scanned++; nr_scanned++; - if (unlikely(consider_pend && !luf_takeoff_check(freepage))) + if (unlikely(consider_pend && !luf_takeoff_check(cc->zone, freepage))) goto scan_next; pfn = page_to_pfn(freepage); @@ -1679,7 +1679,7 @@ static void fast_isolate_freepages(struct compact_control *cc) /* * Check and flush before using the pages taken off. 
*/ - luf_takeoff_end(); + luf_takeoff_end(cc->zone); /* Skip fast search if enough freepages isolated */ if (cc->nr_freepages >= cc->nr_migratepages) @@ -2415,7 +2415,7 @@ static enum compact_result compact_finished(struct compact_control *cc) */ luf_takeoff_start(); ret = __compact_finished(cc); - luf_takeoff_end(); + luf_takeoff_end(cc->zone); trace_mm_compaction_finished(cc->zone, cc->order, ret); if (ret == COMPACT_NO_SUITABLE_PAGE) diff --git a/mm/internal.h b/mm/internal.h index 77657c17af204..e634eaf220f00 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -1595,10 +1595,10 @@ static inline void accept_page(struct page *page) #if defined(CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH) extern struct luf_batch luf_batch[]; bool luf_takeoff_start(void); -void luf_takeoff_end(void); +void luf_takeoff_end(struct zone *zone); bool luf_takeoff_no_shootdown(void); -bool luf_takeoff_check(struct page *page); -bool luf_takeoff_check_and_fold(struct page *page); +bool luf_takeoff_check(struct zone *zone, struct page *page); +bool luf_takeoff_check_and_fold(struct zone *zone, struct page *page); static inline bool non_luf_pages_ok(struct zone *zone) { @@ -1608,7 +1608,6 @@ static inline bool non_luf_pages_ok(struct zone *zone) return nr_free - nr_luf_pages > min_wm; } - unsigned short fold_unmap_luf(void); /* @@ -1696,10 +1695,10 @@ static inline bool can_luf_vma(struct vm_area_struct *vma) } #else /* CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH */ static inline bool luf_takeoff_start(void) { return false; } -static inline void luf_takeoff_end(void) {} +static inline void luf_takeoff_end(struct zone *zone) {} static inline bool luf_takeoff_no_shootdown(void) { return true; } -static inline bool luf_takeoff_check(struct page *page) { return true; } -static inline bool luf_takeoff_check_and_fold(struct page *page) { return true; } +static inline bool luf_takeoff_check(struct zone *zone, struct page *page) { return true; } +static inline bool luf_takeoff_check_and_fold(struct zone *zone, struct page *page) { return true; } static inline bool non_luf_pages_ok(struct zone *zone) { return true; } static inline unsigned short fold_unmap_luf(void) { return 0; } diff --git a/mm/mm_init.c b/mm/mm_init.c index 12b96cd6a87b0..58e616ceef52a 100644 --- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -1397,6 +1397,7 @@ static void __meminit zone_init_free_lists(struct zone *zone) for_each_migratetype_order(order, t) { INIT_LIST_HEAD(&zone->free_area[order].free_list[t]); INIT_LIST_HEAD(&zone->free_area[order].pend_list[t]); + zone->free_area[order].pend_zone_ugen[t] = ZONE_UGEN_INIT; zone->free_area[order].nr_free = 0; } @@ -1404,6 +1405,10 @@ static void __meminit zone_init_free_lists(struct zone *zone) INIT_LIST_HEAD(&zone->unaccepted_pages); #endif atomic_long_set(&zone->nr_luf_pages, 0); + zone->zone_ugen_done = ZONE_UGEN_INIT - 1; + zone->zone_ugen = ZONE_UGEN_INIT; + zone->luf_ugen = LUF_UGEN_INIT - 1; + reset_batch(&zone->zone_batch); } void __meminit init_currently_empty_zone(struct zone *zone, diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 0b6e7f235c4a1..b81931c6f2cfd 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -663,16 +663,29 @@ bool luf_takeoff_start(void) return !no_shootdown; } +static void wait_zone_ugen_done(struct zone *zone, unsigned long zone_ugen) +{ + while (ugen_before(READ_ONCE(zone->zone_ugen_done), zone_ugen)) + cond_resched(); +} + +static void set_zone_ugen_done(struct zone *zone, unsigned long zone_ugen) +{ + WRITE_ONCE(zone->zone_ugen_done, zone_ugen); +} + /* * Should be called within 
the same context of luf_takeoff_start(). */ -void luf_takeoff_end(void) +void luf_takeoff_end(struct zone *zone) { struct tlbflush_unmap_batch *tlb_ubc_takeoff = ¤t->tlb_ubc_takeoff; unsigned long flags; bool no_shootdown; bool outmost = false; unsigned long cur_luf_ugen; + unsigned long cur_zone_ugen; + unsigned long cur_wait_zone_ugen; local_irq_save(flags); VM_WARN_ON(!current->luf_takeoff_started); @@ -700,6 +713,8 @@ void luf_takeoff_end(void) goto out; cur_luf_ugen = current->luf_ugen; + cur_zone_ugen = current->zone_ugen; + cur_wait_zone_ugen = current->wait_zone_ugen; current->luf_ugen = 0; @@ -707,10 +722,38 @@ void luf_takeoff_end(void) reset_batch(tlb_ubc_takeoff); try_to_unmap_flush_takeoff(); + + if (cur_wait_zone_ugen || cur_zone_ugen) { + /* + * pcp(zone == NULL) doesn't work with zone batch. + */ + if (zone) { + current->zone_ugen = 0; + current->wait_zone_ugen = 0; + + /* + * Guarantee that tlb shootdown required for the + * zone_ugen has been completed once observing + * 'zone_ugen_done'. + */ + smp_mb(); + + /* + * zone->zone_ugen_done should be updated + * sequentially. + */ + if (cur_wait_zone_ugen) + wait_zone_ugen_done(zone, cur_wait_zone_ugen); + if (cur_zone_ugen) + set_zone_ugen_done(zone, cur_zone_ugen); + } + } out: if (outmost) { VM_WARN_ON(current->luf_no_shootdown); VM_WARN_ON(current->luf_ugen); + VM_WARN_ON(current->zone_ugen); + VM_WARN_ON(current->wait_zone_ugen); } } @@ -741,9 +784,9 @@ bool luf_takeoff_no_shootdown(void) * Should be called with either zone lock held and irq disabled or pcp * lock held. */ -bool luf_takeoff_check(struct page *page) +bool luf_takeoff_check(struct zone *zone, struct page *page) { - unsigned short luf_key = page_luf_key(page); + unsigned long zone_ugen; /* * No way. Delimit using luf_takeoff_{start,end}(). @@ -753,7 +796,29 @@ bool luf_takeoff_check(struct page *page) return false; } - if (!luf_key) + if (!zone) { + unsigned short luf_key = page_luf_key(page); + + if (!luf_key) + return true; + + if (current->luf_no_shootdown) + return false; + + return true; + } + + zone_ugen = page_zone_ugen(zone, page); + if (!zone_ugen) + return true; + + /* + * Should not be zero since zone-zone_ugen has been updated in + * __free_one_page() -> update_zone_batch(). + */ + VM_WARN_ON(!zone->zone_ugen); + + if (!ugen_before(READ_ONCE(zone->zone_ugen_done), zone_ugen)) return true; return !current->luf_no_shootdown; @@ -763,13 +828,11 @@ bool luf_takeoff_check(struct page *page) * Should be called with either zone lock held and irq disabled or pcp * lock held. */ -bool luf_takeoff_check_and_fold(struct page *page) +bool luf_takeoff_check_and_fold(struct zone *zone, struct page *page) { struct tlbflush_unmap_batch *tlb_ubc_takeoff = ¤t->tlb_ubc_takeoff; - unsigned short luf_key = page_luf_key(page); - struct luf_batch *lb; - unsigned long lb_ugen; unsigned long flags; + unsigned long zone_ugen; /* * No way. Delimit using luf_takeoff_{start,end}(). 
@@ -779,28 +842,94 @@ bool luf_takeoff_check_and_fold(struct page *page) return false; } - if (!luf_key) - return true; + /* + * pcp case + */ + if (!zone) { + unsigned short luf_key = page_luf_key(page); + struct luf_batch *lb; + unsigned long lb_ugen; - lb = &luf_batch[luf_key]; - read_lock_irqsave(&lb->lock, flags); - lb_ugen = lb->ugen; + if (!luf_key) + return true; + + lb = &luf_batch[luf_key]; + read_lock_irqsave(&lb->lock, flags); + lb_ugen = lb->ugen; + + if (arch_tlbbatch_check_done(&lb->batch.arch, lb_ugen)) { + read_unlock_irqrestore(&lb->lock, flags); + return true; + } + + if (current->luf_no_shootdown) { + read_unlock_irqrestore(&lb->lock, flags); + return false; + } - if (arch_tlbbatch_check_done(&lb->batch.arch, lb_ugen)) { + fold_batch(tlb_ubc_takeoff, &lb->batch, false); read_unlock_irqrestore(&lb->lock, flags); + + if (!current->luf_ugen || ugen_before(current->luf_ugen, lb_ugen)) + current->luf_ugen = lb_ugen; return true; } - if (current->luf_no_shootdown) { - read_unlock_irqrestore(&lb->lock, flags); + zone_ugen = page_zone_ugen(zone, page); + if (!zone_ugen) + return true; + + /* + * Should not be zero since zone-zone_ugen has been updated in + * __free_one_page() -> update_zone_batch(). + */ + VM_WARN_ON(!zone->zone_ugen); + + if (!ugen_before(READ_ONCE(zone->zone_ugen_done), zone_ugen)) + return true; + + if (current->luf_no_shootdown) return false; - } - fold_batch(tlb_ubc_takeoff, &lb->batch, false); - read_unlock_irqrestore(&lb->lock, flags); + /* + * zone batched flush has been already set. + */ + if (current->zone_ugen) + return true; + + /* + * Others are already performing tlb shootdown for us. All we + * need is to wait for those to complete. + */ + if (zone_ugen != zone->zone_ugen) { + if (!current->wait_zone_ugen || + ugen_before(current->wait_zone_ugen, zone_ugen)) + current->wait_zone_ugen = zone_ugen; + /* + * It's the first time that zone->zone_ugen has been set to + * current->zone_ugen. current->luf_ugen also get set. + */ + } else { + current->wait_zone_ugen = prev_ugen(zone->zone_ugen); + current->zone_ugen = zone->zone_ugen; + current->luf_ugen = zone->luf_ugen; + + /* + * Now that tlb shootdown for the zone_ugen will be + * performed at luf_takeoff_end(), advance it so that + * the next zone->lock holder can efficiently avoid + * unnecessary tlb shootdown. + */ + zone->zone_ugen = next_ugen(zone->zone_ugen); - if (!current->luf_ugen || ugen_before(current->luf_ugen, lb_ugen)) - current->luf_ugen = lb_ugen; + /* + * All the luf pages will eventually become non-luf + * pages by tlb flushing at luf_takeoff_end() and, + * flush_pend_list_if_done() will empty pend_list. + */ + atomic_long_set(&zone->nr_luf_pages, 0); + fold_batch(tlb_ubc_takeoff, &zone->zone_batch, true); + } return true; } #endif @@ -822,6 +951,42 @@ static inline void account_freepages(struct zone *zone, int nr_pages, zone->nr_free_highatomic + nr_pages); } +static void flush_pend_list_if_done(struct zone *zone, + struct free_area *area, int migratetype) +{ + unsigned long zone_ugen_done = READ_ONCE(zone->zone_ugen_done); + + /* + * tlb shootdown required for the zone_ugen already has been + * done. Thus, let's move pages in pend_list to free_list to + * secure more non-luf pages. + */ + if (!ugen_before(zone_ugen_done, area->pend_zone_ugen[migratetype])) + list_splice_init(&area->pend_list[migratetype], + &area->free_list[migratetype]); +} + +#ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH +/* + * Should be called with zone->lock held and irq disabled. 
+ */ +static void update_zone_batch(struct zone *zone, unsigned short luf_key) +{ + unsigned long lb_ugen; + struct luf_batch *lb = &luf_batch[luf_key]; + + read_lock(&lb->lock); + fold_batch(&zone->zone_batch, &lb->batch, false); + lb_ugen = lb->ugen; + read_unlock(&lb->lock); + + if (ugen_before(zone->luf_ugen, lb_ugen)) + zone->luf_ugen = lb_ugen; +} +#else +static void update_zone_batch(struct zone *zone, unsigned short luf_key) {} +#endif + /* Used for pages not on another list */ static inline void __add_to_free_list(struct page *page, struct zone *zone, unsigned int order, int migratetype, @@ -830,6 +995,12 @@ static inline void __add_to_free_list(struct page *page, struct zone *zone, struct free_area *area = &zone->free_area[order]; struct list_head *list; + /* + * Good chance to flush pend_list just before updating the + * {free,pend}_list. + */ + flush_pend_list_if_done(zone, area, migratetype); + VM_WARN_ONCE(get_pageblock_migratetype(page) != migratetype, "page type is %lu, passed migratetype is %d (nr=%d)\n", get_pageblock_migratetype(page), migratetype, 1 << order); @@ -839,8 +1010,9 @@ static inline void __add_to_free_list(struct page *page, struct zone *zone, * positive is okay because it will cause just additional tlb * shootdown. */ - if (page_luf_key(page)) { + if (page_zone_ugen(zone, page)) { list = &area->pend_list[migratetype]; + area->pend_zone_ugen[migratetype] = zone->zone_ugen; atomic_long_add(1 << order, &zone->nr_luf_pages); } else list = &area->free_list[migratetype]; @@ -862,6 +1034,7 @@ static inline void move_to_free_list(struct page *page, struct zone *zone, unsigned int order, int old_mt, int new_mt) { struct free_area *area = &zone->free_area[order]; + unsigned long zone_ugen = page_zone_ugen(zone, page); /* Free page moving can fail, so it happens before the type update */ VM_WARN_ONCE(get_pageblock_migratetype(page) != old_mt, @@ -878,9 +1051,12 @@ static inline void move_to_free_list(struct page *page, struct zone *zone, * positive is okay because it will cause just additional tlb * shootdown. */ - if (page_luf_key(page)) + if (zone_ugen) { list_move_tail(&page->buddy_list, &area->pend_list[new_mt]); - else + if (!area->pend_zone_ugen[new_mt] || + ugen_before(area->pend_zone_ugen[new_mt], zone_ugen)) + area->pend_zone_ugen[new_mt] = zone_ugen; + } else list_move_tail(&page->buddy_list, &area->free_list[new_mt]); account_freepages(zone, -(1 << order), old_mt); @@ -898,7 +1074,7 @@ static inline void __del_page_from_free_list(struct page *page, struct zone *zon if (page_reported(page)) __ClearPageReported(page); - if (page_luf_key(page)) + if (page_zone_ugen(zone, page)) atomic_long_sub(1 << order, &zone->nr_luf_pages); list_del(&page->buddy_list); @@ -936,29 +1112,39 @@ static inline struct page *get_page_from_free_area(struct zone *zone, */ pend_first = !non_luf_pages_ok(zone); + /* + * Good chance to flush pend_list just before updating the + * {free,pend}_list. 
+ */ + flush_pend_list_if_done(zone, area, migratetype); + if (pend_first) { page = list_first_entry_or_null(&area->pend_list[migratetype], struct page, buddy_list); - if (page && luf_takeoff_check(page)) + if (page && luf_takeoff_check(zone, page)) return page; page = list_first_entry_or_null(&area->free_list[migratetype], struct page, buddy_list); - if (page) + if (page) { + set_page_zone_ugen(page, 0); return page; + } } else { page = list_first_entry_or_null(&area->free_list[migratetype], struct page, buddy_list); - if (page) + if (page) { + set_page_zone_ugen(page, 0); return page; + } page = list_first_entry_or_null(&area->pend_list[migratetype], struct page, buddy_list); - if (page && luf_takeoff_check(page)) + if (page && luf_takeoff_check(zone, page)) return page; } return NULL; @@ -1023,6 +1209,7 @@ static inline void __free_one_page(struct page *page, unsigned long combined_pfn; struct page *buddy; bool to_tail; + unsigned long zone_ugen; VM_BUG_ON(!zone_is_initialized(zone)); VM_BUG_ON_PAGE(page->flags & PAGE_FLAGS_CHECK_AT_PREP, page); @@ -1034,20 +1221,25 @@ static inline void __free_one_page(struct page *page, account_freepages(zone, 1 << order, migratetype); /* - * Use the page's luf_key unchanged if luf_key == 0. Worth - * noting that page_luf_key() will be 0 in most cases since it's - * initialized at free_pages_prepare(). + * Use the page's zone_ugen unchanged if luf_key == 0. Worth + * noting that page_zone_ugen() will be 0 in most cases since + * it's initialized at free_pages_prepare(). + * + * Update page's zone_ugen and zone's batch only if a valid + * luf_key was passed. */ - if (luf_key) - set_page_luf_key(page, luf_key); - else - luf_key = page_luf_key(page); + if (luf_key) { + zone_ugen = zone->zone_ugen; + set_page_zone_ugen(page, (unsigned short)zone_ugen); + update_zone_batch(zone, luf_key); + } else + zone_ugen = page_zone_ugen(zone, page); while (order < MAX_PAGE_ORDER) { int buddy_mt = migratetype; - unsigned short buddy_luf_key; + unsigned long buddy_zone_ugen; - if (!luf_key && compaction_capture(capc, page, order, migratetype)) { + if (!zone_ugen && compaction_capture(capc, page, order, migratetype)) { account_freepages(zone, -(1 << order), migratetype); return; } @@ -1080,17 +1272,15 @@ static inline void __free_one_page(struct page *page, else __del_page_from_free_list(buddy, zone, order, buddy_mt); + buddy_zone_ugen = page_zone_ugen(zone, buddy); + /* - * !buddy_luf_key && !luf_key : do nothing - * buddy_luf_key && !luf_key : luf_key = buddy_luf_key - * !buddy_luf_key && luf_key : do nothing - * buddy_luf_key && luf_key : merge two into luf_key + * if (!zone_ugen && !buddy_zone_ugen) : nothing to do + * if ( zone_ugen && !buddy_zone_ugen) : nothing to do */ - buddy_luf_key = page_luf_key(buddy); - if (buddy_luf_key && !luf_key) - luf_key = buddy_luf_key; - else if (buddy_luf_key && luf_key) - fold_luf_batch(&luf_batch[luf_key], &luf_batch[buddy_luf_key]); + if ((!zone_ugen && buddy_zone_ugen) || + ( zone_ugen && buddy_zone_ugen && ugen_before(zone_ugen, buddy_zone_ugen))) + zone_ugen = buddy_zone_ugen; if (unlikely(buddy_mt != migratetype)) { /* @@ -1103,7 +1293,7 @@ static inline void __free_one_page(struct page *page, combined_pfn = buddy_pfn & pfn; page = page + (combined_pfn - pfn); - set_page_luf_key(page, luf_key); + set_page_zone_ugen(page, zone_ugen); pfn = combined_pfn; order++; } @@ -1446,6 +1636,7 @@ static void free_pcppages_bulk(struct zone *zone, int count, do { unsigned long pfn; int mt; + unsigned short luf_key; page = 
list_last_entry(list, struct page, pcp_list); pfn = page_to_pfn(page); @@ -1456,7 +1647,16 @@ static void free_pcppages_bulk(struct zone *zone, int count, count -= nr_pages; pcp->count -= nr_pages; - __free_one_page(page, pfn, zone, order, mt, FPI_NONE, 0); + /* + * page private in pcp stores luf_key while it + * stores zone_ugen in buddy. Thus, the private + * needs to be cleared and the luf_key needs to + * be passed to buddy. + */ + luf_key = page_luf_key(page); + set_page_private(page, 0); + + __free_one_page(page, pfn, zone, order, mt, FPI_NONE, luf_key); trace_mm_page_pcpu_drain(page, order, mt); } while (count > 0 && !list_empty(list)); @@ -1499,7 +1699,15 @@ static void free_one_page(struct zone *zone, struct page *page, * valid luf_key can be passed only if order == 0. */ VM_WARN_ON(luf_key && order); - set_page_luf_key(page, luf_key); + + /* + * Update page's zone_ugen and zone's batch only if a valid + * luf_key was passed. + */ + if (luf_key) { + set_page_zone_ugen(page, (unsigned short)zone->zone_ugen); + update_zone_batch(zone, luf_key); + } split_large_buddy(zone, page, pfn, order, fpi_flags); spin_unlock_irqrestore(&zone->lock, flags); @@ -1659,7 +1867,7 @@ static inline unsigned int expand(struct zone *zone, struct page *page, int low, if (set_page_guard(zone, &page[size], high)) continue; - if (page_luf_key(&page[size])) + if (page_zone_ugen(zone, &page[size])) tail = true; __add_to_free_list(&page[size], zone, high, migratetype, tail); @@ -1677,7 +1885,7 @@ static __always_inline void page_del_and_expand(struct zone *zone, int nr_pages = 1 << high; __del_page_from_free_list(page, zone, high, migratetype); - if (unlikely(!luf_takeoff_check_and_fold(page))) + if (unlikely(!luf_takeoff_check_and_fold(zone, page))) VM_WARN_ON(1); nr_pages -= expand(zone, page, low, high, migratetype); account_freepages(zone, -nr_pages, migratetype); @@ -2199,7 +2407,7 @@ steal_suitable_fallback(struct zone *zone, struct page *page, unsigned int nr_added; del_page_from_free_list(page, zone, current_order, block_type); - if (unlikely(!luf_takeoff_check_and_fold(page))) + if (unlikely(!luf_takeoff_check_and_fold(zone, page))) VM_WARN_ON(1); change_pageblock_range(page, current_order, start_type); nr_added = expand(zone, page, order, current_order, start_type); @@ -2438,12 +2646,12 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac, WARN_ON_ONCE(ret == -1); if (ret > 0) { spin_unlock_irqrestore(&zone->lock, flags); - luf_takeoff_end(); + luf_takeoff_end(zone); return ret; } } spin_unlock_irqrestore(&zone->lock, flags); - luf_takeoff_end(); + luf_takeoff_end(zone); } return false; @@ -2644,12 +2852,15 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order, * pages are ordered properly. */ list_add_tail(&page->pcp_list, list); + + /* + * Reset all the luf fields. tlb shootdown will be + * performed at luf_takeoff_end() below if needed. + */ + set_page_private(page, 0); } spin_unlock_irqrestore(&zone->lock, flags); - /* - * Check and flush before using the pages taken off. 
- */ - luf_takeoff_end(); + luf_takeoff_end(zone); return i; } @@ -3163,7 +3374,7 @@ int __isolate_free_page(struct page *page, unsigned int order, bool willputback) } del_page_from_free_list(page, zone, order, mt); - if (unlikely(!willputback && !luf_takeoff_check_and_fold(page))) + if (unlikely(!willputback && !luf_takeoff_check_and_fold(zone, page))) VM_WARN_ON(1); /* @@ -3262,7 +3473,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone, if (!page) { spin_unlock_irqrestore(&zone->lock, flags); - luf_takeoff_end(); + luf_takeoff_end(zone); return NULL; } } @@ -3270,7 +3481,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone, /* * Check and flush before using the pages taken off. */ - luf_takeoff_end(); + luf_takeoff_end(zone); } while (check_new_pages(page, order)); __count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order); @@ -3360,7 +3571,7 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order, } list_for_each_entry(page, list, pcp_list) { - if (luf_takeoff_check_and_fold(page)) { + if (luf_takeoff_check_and_fold(NULL, page)) { list_del(&page->pcp_list); pcp->count -= 1 << order; break; @@ -3395,7 +3606,7 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone, pcp = pcp_spin_trylock(zone->per_cpu_pageset); if (!pcp) { pcp_trylock_finish(UP_flags); - luf_takeoff_end(); + luf_takeoff_end(NULL); return NULL; } @@ -3412,7 +3623,7 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone, /* * Check and flush before using the pages taken off. */ - luf_takeoff_end(); + luf_takeoff_end(NULL); if (page) { __count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order); zone_statistics(preferred_zone, zone, 1); @@ -3451,6 +3662,7 @@ struct page *rmqueue(struct zone *preferred_zone, migratetype); out: + /* Separate test+clear to avoid unnecessary atomics */ if ((alloc_flags & ALLOC_KSWAPD) && unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags))) { @@ -5059,7 +5271,7 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid, /* * Check and flush before using the pages taken off. */ - luf_takeoff_end(); + luf_takeoff_end(NULL); __count_zid_vm_events(PGALLOC, zone_idx(zone), nr_account); zone_statistics(zonelist_zone(ac.preferred_zoneref), zone, nr_account); @@ -5069,7 +5281,7 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid, failed_irq: pcp_trylock_finish(UP_flags); - luf_takeoff_end(); + luf_takeoff_end(NULL); failed: page = __alloc_pages_noprof(gfp, 0, preferred_nid, nodemask); @@ -7235,7 +7447,7 @@ unsigned long __offline_isolated_pages(unsigned long start_pfn, VM_WARN_ON(get_pageblock_migratetype(page) != MIGRATE_ISOLATE); order = buddy_order(page); del_page_from_free_list(page, zone, order, MIGRATE_ISOLATE); - if (unlikely(!luf_takeoff_check_and_fold(page))) + if (unlikely(!luf_takeoff_check_and_fold(zone, page))) VM_WARN_ON(1); pfn += (1 << order); } @@ -7243,7 +7455,7 @@ unsigned long __offline_isolated_pages(unsigned long start_pfn, /* * Check and flush before using the pages taken off. 
*/ - luf_takeoff_end(); + luf_takeoff_end(zone); return end_pfn - start_pfn - already_offline; } @@ -7305,7 +7517,7 @@ static void break_down_buddy_pages(struct zone *zone, struct page *page, if (set_page_guard(zone, current_buddy, high)) continue; - if (page_luf_key(current_buddy)) + if (page_zone_ugen(zone, current_buddy)) tail = true; add_to_free_list(current_buddy, zone, high, migratetype, tail); @@ -7337,7 +7549,7 @@ bool take_page_off_buddy(struct page *page) del_page_from_free_list(page_head, zone, page_order, migratetype); - if (unlikely(!luf_takeoff_check_and_fold(page_head))) + if (unlikely(!luf_takeoff_check_and_fold(zone, page_head))) VM_WARN_ON(1); break_down_buddy_pages(zone, page_head, page, 0, page_order, migratetype); @@ -7353,7 +7565,7 @@ bool take_page_off_buddy(struct page *page) /* * Check and flush before using the pages taken off. */ - luf_takeoff_end(); + luf_takeoff_end(zone); return ret; } @@ -7372,6 +7584,13 @@ bool put_page_back_buddy(struct page *page) int migratetype = get_pfnblock_migratetype(page, pfn); ClearPageHWPoisonTakenOff(page); + + /* + * Reset all the luf fields. tlb shootdown has already + * been performed by take_page_off_buddy(). + */ + set_page_private(page, 0); + __free_one_page(page, pfn, zone, 0, migratetype, FPI_NONE, 0); if (TestClearPageHWPoison(page)) { ret = true; diff --git a/mm/page_reporting.c b/mm/page_reporting.c index e152b22fbba8a..b23d3ed34ec07 100644 --- a/mm/page_reporting.c +++ b/mm/page_reporting.c @@ -118,7 +118,8 @@ page_reporting_drain(struct page_reporting_dev_info *prdev, /* * Ensure private is zero before putting into the - * allocator. + * allocator. tlb shootdown has already been performed + * at isolation. */ set_page_private(page, 0); @@ -194,7 +195,7 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone, if (PageReported(page)) continue; - if (unlikely(consider_pend && !luf_takeoff_check(page))) { + if (unlikely(consider_pend && !luf_takeoff_check(zone, page))) { VM_WARN_ON(1); continue; } @@ -238,7 +239,7 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone, /* * Check and flush before using the pages taken off. */ - luf_takeoff_end(); + luf_takeoff_end(zone); /* begin processing pages in local list */ err = prdev->report(prdev, sgl, PAGE_REPORTING_CAPACITY); @@ -283,7 +284,7 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone, /* * Check and flush before using the pages taken off. */ - luf_takeoff_end(); + luf_takeoff_end(zone); return err; } diff --git a/mm/rmap.c b/mm/rmap.c index 0cb13e8fcd739..ebe91ff1bcb16 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -650,7 +650,11 @@ static unsigned long new_luf_ugen(void) { unsigned long ugen = atomic_long_inc_return(&luf_ugen); - if (!ugen) + /* + * Avoid zero even in unsigned short range so as to treat + * '(unsigned short)ugen == 0' as invalid. 
+ */ + if (!(unsigned short)ugen) ugen = atomic_long_inc_return(&luf_ugen); return ugen;

From patchwork Thu Feb 20 05:20:23 2025
X-Patchwork-Submitter: Byungchul Park
X-Patchwork-Id: 13983338
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, ying.huang@intel.com, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, rjgolo@gmail.com
Subject: [RFC PATCH v12 22/26] mm/page_alloc: not allow to tlb shootdown if !preemptable() && non_luf_pages_ok()
Date: Thu, 20 Feb 2025 14:20:23 +0900
Message-Id: <20250220052027.58847-23-byungchul@sk.com>
In-Reply-To: <20250220052027.58847-1-byungchul@sk.com>
References: <20250220052027.58847-1-byungchul@sk.com>
Do not perform tlb shootdown if the context has preemption disabled and there are already enough non-luf pages, so as not to hurt preemptibility.
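To make the double negation in no_shootdown_context() easier to read, here is a small stand-alone sketch (plain C, not kernel code) of the resulting policy; the boolean parameters merely stand in for the kernel predicates preemptible(), in_task(), irqs_disabled() and non_luf_pages_ok().

#include <stdbool.h>
#include <stdio.h>

/* returns whether tlb shootdown may be performed in this context */
static bool shootdown_allowed(bool have_zone, bool enough_non_luf,
			      bool preemptible, bool in_task,
			      bool irqs_disabled)
{
	if (have_zone && enough_non_luf)
		/* plenty of non-luf pages: require full preemptibility */
		return preemptible && in_task;

	/* under pressure (or the pcp path): the old, weaker rule applies */
	return !irqs_disabled && in_task;
}

int main(void)
{
	/* preemption disabled but enough non-luf pages -> skip the shootdown */
	printf("%d\n", shootdown_allowed(true, true, false, true, false));	/* 0 */
	/* same context under memory pressure -> shootdown is still allowed */
	printf("%d\n", shootdown_allowed(true, false, false, true, false));	/* 1 */
	return 0;
}

In short, the shootdown is skipped only when skipping it is harmless, that is, when enough non-luf pages are available anyway.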
Signed-off-by: Byungchul Park --- mm/compaction.c | 6 +++--- mm/internal.h | 5 +++-- mm/page_alloc.c | 27 +++++++++++++++------------ mm/page_isolation.c | 2 +- mm/page_reporting.c | 4 ++-- 5 files changed, 24 insertions(+), 20 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index a7f17867decae..8fa9de6db2441 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -605,7 +605,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc, page = pfn_to_page(blockpfn); - luf_takeoff_start(); + luf_takeoff_start(cc->zone); /* Isolate free pages. */ for (; blockpfn < end_pfn; blockpfn += stride, page += stride) { int isolated; @@ -1601,7 +1601,7 @@ static void fast_isolate_freepages(struct compact_control *cc) if (!area->nr_free) continue; - can_shootdown = luf_takeoff_start(); + can_shootdown = luf_takeoff_start(cc->zone); spin_lock_irqsave(&cc->zone->lock, flags); freelist = &area->free_list[MIGRATE_MOVABLE]; retry: @@ -2413,7 +2413,7 @@ static enum compact_result compact_finished(struct compact_control *cc) * luf_takeoff_{start,end}() is required to identify whether * this compaction context is tlb shootdownable for luf'd pages. */ - luf_takeoff_start(); + luf_takeoff_start(cc->zone); ret = __compact_finished(cc); luf_takeoff_end(cc->zone); diff --git a/mm/internal.h b/mm/internal.h index e634eaf220f00..fba19c283ac48 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -1594,7 +1594,7 @@ static inline void accept_page(struct page *page) #endif /* CONFIG_UNACCEPTED_MEMORY */ #if defined(CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH) extern struct luf_batch luf_batch[]; -bool luf_takeoff_start(void); +bool luf_takeoff_start(struct zone *zone); void luf_takeoff_end(struct zone *zone); bool luf_takeoff_no_shootdown(void); bool luf_takeoff_check(struct zone *zone, struct page *page); @@ -1608,6 +1608,7 @@ static inline bool non_luf_pages_ok(struct zone *zone) return nr_free - nr_luf_pages > min_wm; } + unsigned short fold_unmap_luf(void); /* @@ -1694,7 +1695,7 @@ static inline bool can_luf_vma(struct vm_area_struct *vma) return true; } #else /* CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH */ -static inline bool luf_takeoff_start(void) { return false; } +static inline bool luf_takeoff_start(struct zone *zone) { return false; } static inline void luf_takeoff_end(struct zone *zone) {} static inline bool luf_takeoff_no_shootdown(void) { return true; } static inline bool luf_takeoff_check(struct zone *zone, struct page *page) { return true; } diff --git a/mm/page_alloc.c b/mm/page_alloc.c index b81931c6f2cfd..ccbe49b78190a 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -623,22 +623,25 @@ compaction_capture(struct capture_control *capc, struct page *page, #endif /* CONFIG_COMPACTION */ #if defined(CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH) -static bool no_shootdown_context(void) +static bool no_shootdown_context(struct zone *zone) { /* - * If it performs with irq disabled, that might cause a deadlock. - * Avoid tlb shootdown in this case. + * Tries to avoid tlb shootdown if !preemptible(). However, it + * should be allowed under heavy memory pressure. */ + if (zone && non_luf_pages_ok(zone)) + return !(preemptible() && in_task()); + return !(!irqs_disabled() && in_task()); } /* * Can be called with zone lock released and irq enabled. 
*/ -bool luf_takeoff_start(void) +bool luf_takeoff_start(struct zone *zone) { unsigned long flags; - bool no_shootdown = no_shootdown_context(); + bool no_shootdown = no_shootdown_context(zone); local_irq_save(flags); @@ -2588,7 +2591,7 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac, * luf_takeoff_{start,end}() is required for * get_page_from_free_area() to use luf_takeoff_check(). */ - luf_takeoff_start(); + luf_takeoff_start(zone); spin_lock_irqsave(&zone->lock, flags); for (order = 0; order < NR_PAGE_ORDERS; order++) { struct free_area *area = &(zone->free_area[order]); @@ -2829,7 +2832,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order, unsigned long flags; int i; - luf_takeoff_start(); + luf_takeoff_start(zone); spin_lock_irqsave(&zone->lock, flags); for (i = 0; i < count; ++i) { struct page *page = __rmqueue(zone, order, migratetype, @@ -3455,7 +3458,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone, do { page = NULL; - luf_takeoff_start(); + luf_takeoff_start(zone); spin_lock_irqsave(&zone->lock, flags); if (alloc_flags & ALLOC_HIGHATOMIC) page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC); @@ -3600,7 +3603,7 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone, struct page *page; unsigned long __maybe_unused UP_flags; - luf_takeoff_start(); + luf_takeoff_start(NULL); /* spin_trylock may fail due to a parallel drain or IRQ reentrancy. */ pcp_trylock_prepare(UP_flags); pcp = pcp_spin_trylock(zone->per_cpu_pageset); @@ -5229,7 +5232,7 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid, if (unlikely(!zone)) goto failed; - luf_takeoff_start(); + luf_takeoff_start(NULL); /* spin_trylock may fail due to a parallel drain or IRQ reentrancy. 
*/ pcp_trylock_prepare(UP_flags); pcp = pcp_spin_trylock(zone->per_cpu_pageset); @@ -7418,7 +7421,7 @@ unsigned long __offline_isolated_pages(unsigned long start_pfn, offline_mem_sections(pfn, end_pfn); zone = page_zone(pfn_to_page(pfn)); - luf_takeoff_start(); + luf_takeoff_start(zone); spin_lock_irqsave(&zone->lock, flags); while (pfn < end_pfn) { page = pfn_to_page(pfn); @@ -7536,7 +7539,7 @@ bool take_page_off_buddy(struct page *page) unsigned int order; bool ret = false; - luf_takeoff_start(); + luf_takeoff_start(zone); spin_lock_irqsave(&zone->lock, flags); for (order = 0; order < NR_PAGE_ORDERS; order++) { struct page *page_head = page - (pfn & ((1 << order) - 1)); diff --git a/mm/page_isolation.c b/mm/page_isolation.c index eae33d188762b..ccd36838f9cff 100644 --- a/mm/page_isolation.c +++ b/mm/page_isolation.c @@ -211,7 +211,7 @@ static void unset_migratetype_isolate(struct page *page, int migratetype) struct page *buddy; zone = page_zone(page); - luf_takeoff_start(); + luf_takeoff_start(zone); spin_lock_irqsave(&zone->lock, flags); if (!is_migrate_isolate_page(page)) goto out; diff --git a/mm/page_reporting.c b/mm/page_reporting.c index b23d3ed34ec07..83b66e7f0d257 100644 --- a/mm/page_reporting.c +++ b/mm/page_reporting.c @@ -170,7 +170,7 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone, if (free_area_empty(area, mt)) return err; - can_shootdown = luf_takeoff_start(); + can_shootdown = luf_takeoff_start(zone); spin_lock_irq(&zone->lock); /* @@ -250,7 +250,7 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone, /* update budget to reflect call to report function */ budget--; - luf_takeoff_start(); + luf_takeoff_start(zone); /* reacquire zone lock and resume processing */ spin_lock_irq(&zone->lock); From patchwork Thu Feb 20 05:20:24 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Byungchul Park X-Patchwork-Id: 13983341 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 15BC6C021AD for ; Thu, 20 Feb 2025 05:21:49 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 8F49A28013A; Thu, 20 Feb 2025 00:20:51 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 7EAC72802A5; Thu, 20 Feb 2025 00:20:51 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 3ED0E28013A; Thu, 20 Feb 2025 00:20:51 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id C2AE3280191 for ; Thu, 20 Feb 2025 00:20:50 -0500 (EST) Received: from smtpin07.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 7EEB0C0B29 for ; Thu, 20 Feb 2025 05:20:50 +0000 (UTC) X-FDA: 83139173460.07.3DBC45A Received: from invmail4.hynix.com (exvmail4.hynix.com [166.125.252.92]) by imf06.hostedemail.com (Postfix) with ESMTP id 3F6BF180006 for ; Thu, 20 Feb 2025 05:20:47 +0000 (UTC) Authentication-Results: imf06.hostedemail.com; dkim=none; dmarc=none; spf=pass (imf06.hostedemail.com: domain of byungchul@sk.com designates 166.125.252.92 as permitted sender) smtp.mailfrom=byungchul@sk.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1740028848; a=rsa-sha256; cv=none; 
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, ying.huang@intel.com, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, rjgolo@gmail.com
Subject: [RFC PATCH v12 23/26] mm: separate move/undo parts from migrate_pages_batch()
Date: Thu, 20 Feb 2025 14:20:24 +0900
Message-Id: <20250220052027.58847-24-byungchul@sk.com>
In-Reply-To: <20250220052027.58847-1-byungchul@sk.com>
References: <20250220052027.58847-1-byungchul@sk.com>
an4soss7i77bwwss6rfie66qrhi5ommw X-HE-Tag: 1740028847-670905 X-HE-Meta: U2FsdGVkX19fov+1l0I8vnw1pjcHzEJgbmbC9xjBBs4VIYN2LTVQuXRUhZPK1qdr89JZYjcuVfGc5ihNomfwvsMtLkIZKEiqKvrxTNgwLbRRtlwT2DRGSvbyh48M9AyO47gdqOZUYiekGTVJfaxynU83ZCWCRdx2A43QstKWAj/k0X5Lpq/wNxmwbzTV7Ta6qP+lnC143t3A6o//0EuHII6F9mNg6Y8xMe2xIsjxkoujCnfcxNbavkvowBIKjOlkWDS8PRRfFequhIH7HiGV+kXNHHBAmwaXngc9beHIzQ+OQLmcYvifpCPss8cXUEubG/wiz1qIi9aHpBvxoWwXjns0HDm5c14PZ353R4V9uTUQUQMsdR/cuyRHJVR0Y8eeQQ06sVg0+zofAxBF9ufPu2sl7sqnKJ00VHSAJwhdQB5C4oltfgzde2DOMBRiXLZ4Yif7lstfodtrLpLlIcpjB0dw/z5VwZnIzP2l+RaCv2xh+o+bRRueoxEsGJzfV88J3i/LADWAtZvmUEI3qMZtNK+ttCf87UR80IWyrFl9SiGFg7fbE8R9wkwPCLbSp+TMeIr1xaCRjaSTTP9KpMnNqvIYawNrDIBo9S70+IiHncybgMcYiqXFyJQ6t4JCJ2WJthmjongOi4whQ85Pgc+2k5L16Fbq5ySxTAx55PQIu75E77s5BHxrOTYu8S/t7KDHmLwQx3GSRiyFIfcd/0bievMrhugFjLTcq8x8bFjg3rC7uMS5jn9lhaolO89SeT4cHx6DaUgvbB9mfOIQOPJ4g/8vT6ScixYa9LWDMe4+PRYLL6zCx/KzaB6E0zlXPeMISPY2yItKZs5XnxpwxHxst8UZMFF2UmCt7s7aubwP4peHsPVRoGwE8KQKUSlIWKQyHyCb5/IlMbJzpJzub4nU6yJgWI7vsBYeTUou1n5y/eoYJULUgrC+2KrdNI9eg7LaLBVnFTadSo6WahtPK6a KxkPLVN2 LMQMfB7S79NAF72KIrsRqgytXNabRZUAAlYN5 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Functionally, no change. This is a preparation for luf mechanism that requires to use separated folio lists for its own handling during migration. Refactored migrate_pages_batch() so as to separate move/undo parts from migrate_pages_batch(). Signed-off-by: Byungchul Park --- mm/migrate.c | 134 +++++++++++++++++++++++++++++++-------------------- 1 file changed, 83 insertions(+), 51 deletions(-) diff --git a/mm/migrate.c b/mm/migrate.c index dfb5eba3c5223..5e12023dbc75a 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -1695,6 +1695,81 @@ static int migrate_hugetlbs(struct list_head *from, new_folio_t get_new_folio, return nr_failed; } +static void migrate_folios_move(struct list_head *src_folios, + struct list_head *dst_folios, + free_folio_t put_new_folio, unsigned long private, + enum migrate_mode mode, int reason, + struct list_head *ret_folios, + struct migrate_pages_stats *stats, + int *retry, int *thp_retry, int *nr_failed, + int *nr_retry_pages) +{ + struct folio *folio, *folio2, *dst, *dst2; + bool is_thp; + int nr_pages; + int rc; + + dst = list_first_entry(dst_folios, struct folio, lru); + dst2 = list_next_entry(dst, lru); + list_for_each_entry_safe(folio, folio2, src_folios, lru) { + is_thp = folio_test_large(folio) && folio_test_pmd_mappable(folio); + nr_pages = folio_nr_pages(folio); + + cond_resched(); + + rc = migrate_folio_move(put_new_folio, private, + folio, dst, mode, + reason, ret_folios); + /* + * The rules are: + * Success: folio will be freed + * -EAGAIN: stay on the unmap_folios list + * Other errno: put on ret_folios list + */ + switch (rc) { + case -EAGAIN: + *retry += 1; + *thp_retry += is_thp; + *nr_retry_pages += nr_pages; + break; + case MIGRATEPAGE_SUCCESS: + stats->nr_succeeded += nr_pages; + stats->nr_thp_succeeded += is_thp; + break; + default: + *nr_failed += 1; + stats->nr_thp_failed += is_thp; + stats->nr_failed_pages += nr_pages; + break; + } + dst = dst2; + dst2 = list_next_entry(dst, lru); + } +} + +static void migrate_folios_undo(struct list_head *src_folios, + struct list_head *dst_folios, + free_folio_t put_new_folio, unsigned long private, + struct list_head *ret_folios) +{ + struct folio *folio, *folio2, *dst, *dst2; + + dst = list_first_entry(dst_folios, struct folio, lru); + 
dst2 = list_next_entry(dst, lru); + list_for_each_entry_safe(folio, folio2, src_folios, lru) { + int old_page_state = 0; + struct anon_vma *anon_vma = NULL; + + __migrate_folio_extract(dst, &old_page_state, &anon_vma); + migrate_folio_undo_src(folio, old_page_state & PAGE_WAS_MAPPED, + anon_vma, true, ret_folios); + list_del(&dst->lru); + migrate_folio_undo_dst(dst, true, put_new_folio, private); + dst = dst2; + dst2 = list_next_entry(dst, lru); + } +} + /* * migrate_pages_batch() first unmaps folios in the from list as many as * possible, then move the unmapped folios. @@ -1717,7 +1792,7 @@ static int migrate_pages_batch(struct list_head *from, int pass = 0; bool is_thp = false; bool is_large = false; - struct folio *folio, *folio2, *dst = NULL, *dst2; + struct folio *folio, *folio2, *dst = NULL; int rc, rc_saved = 0, nr_pages; LIST_HEAD(unmap_folios); LIST_HEAD(dst_folios); @@ -1888,42 +1963,11 @@ static int migrate_pages_batch(struct list_head *from, thp_retry = 0; nr_retry_pages = 0; - dst = list_first_entry(&dst_folios, struct folio, lru); - dst2 = list_next_entry(dst, lru); - list_for_each_entry_safe(folio, folio2, &unmap_folios, lru) { - is_thp = folio_test_large(folio) && folio_test_pmd_mappable(folio); - nr_pages = folio_nr_pages(folio); - - cond_resched(); - - rc = migrate_folio_move(put_new_folio, private, - folio, dst, mode, - reason, ret_folios); - /* - * The rules are: - * Success: folio will be freed - * -EAGAIN: stay on the unmap_folios list - * Other errno: put on ret_folios list - */ - switch(rc) { - case -EAGAIN: - retry++; - thp_retry += is_thp; - nr_retry_pages += nr_pages; - break; - case MIGRATEPAGE_SUCCESS: - stats->nr_succeeded += nr_pages; - stats->nr_thp_succeeded += is_thp; - break; - default: - nr_failed++; - stats->nr_thp_failed += is_thp; - stats->nr_failed_pages += nr_pages; - break; - } - dst = dst2; - dst2 = list_next_entry(dst, lru); - } + /* Move the unmapped folios */ + migrate_folios_move(&unmap_folios, &dst_folios, + put_new_folio, private, mode, reason, + ret_folios, stats, &retry, &thp_retry, + &nr_failed, &nr_retry_pages); } nr_failed += retry; stats->nr_thp_failed += thp_retry; @@ -1932,20 +1976,8 @@ static int migrate_pages_batch(struct list_head *from, rc = rc_saved ? 
: nr_failed; out: /* Cleanup remaining folios */ - dst = list_first_entry(&dst_folios, struct folio, lru); - dst2 = list_next_entry(dst, lru); - list_for_each_entry_safe(folio, folio2, &unmap_folios, lru) { - int old_page_state = 0; - struct anon_vma *anon_vma = NULL; - - __migrate_folio_extract(dst, &old_page_state, &anon_vma); - migrate_folio_undo_src(folio, old_page_state & PAGE_WAS_MAPPED, - anon_vma, true, ret_folios); - list_del(&dst->lru); - migrate_folio_undo_dst(dst, true, put_new_folio, private); - dst = dst2; - dst2 = list_next_entry(dst, lru); - } + migrate_folios_undo(&unmap_folios, &dst_folios, + put_new_folio, private, ret_folios); return rc; } From patchwork Thu Feb 20 05:20:25 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Byungchul Park X-Patchwork-Id: 13983340 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id DA0C3C021B1 for ; Thu, 20 Feb 2025 05:21:44 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 5C8792801D3; Thu, 20 Feb 2025 00:20:51 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 5522B2801CA; Thu, 20 Feb 2025 00:20:51 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 26B0728018F; Thu, 20 Feb 2025 00:20:51 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id CC2E628013A for ; Thu, 20 Feb 2025 00:20:50 -0500 (EST) Received: from smtpin26.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 95D8BC0B2B for ; Thu, 20 Feb 2025 05:20:50 +0000 (UTC) X-FDA: 83139173460.26.FC32AA5 Received: from invmail4.hynix.com (exvmail4.hynix.com [166.125.252.92]) by imf01.hostedemail.com (Postfix) with ESMTP id 6856A4000F for ; Thu, 20 Feb 2025 05:20:48 +0000 (UTC) Authentication-Results: imf01.hostedemail.com; dkim=none; dmarc=none; spf=pass (imf01.hostedemail.com: domain of byungchul@sk.com designates 166.125.252.92 as permitted sender) smtp.mailfrom=byungchul@sk.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1740028848; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references; bh=5p5ntld4g8OjOWaErdoALvFDdNmR4Src3R46ZMd0RRY=; b=PQZJQGV4pen8+VsZOyzEg6Ughd9v0zl6rv6YP5sTtD6ah97aB0gpijQuelN0CS1S6H6G47 k1sPnVFk175u45hE2X4fdncjkGqAfazkRRD1Z1V31blzfgK8omSLwhqg6Nr++TRFd7GdXB u68YlNt8ejPWrLABHsTnCKhLh6mxbiM= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1740028848; a=rsa-sha256; cv=none; b=f6u7udexMXZ4GRcADjTq91y884ZlbHZczTAdk8YJSGOXB6RUp602rADsO54XZeYMIXHBue vUa39VCqtN1RSWw0eD77B2Y0ML5ZUi/4h4rvYR5bQkrvZpKYRIKW91MG+9+apzY2qMCBlg ifijqMJfU/CHyEIqVmEdzgNpgHf0HPg= ARC-Authentication-Results: i=1; imf01.hostedemail.com; dkim=none; dmarc=none; spf=pass (imf01.hostedemail.com: domain of byungchul@sk.com designates 166.125.252.92 as permitted sender) smtp.mailfrom=byungchul@sk.com X-AuditID: a67dfc5b-3c9ff7000001d7ae-26-67b6bba76383 From: Byungchul Park To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, ying.huang@intel.com, 
	vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com,
	willy@infradead.org, david@redhat.com, peterz@infradead.org,
	luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, rjgolo@gmail.com
Subject: [RFC PATCH v12 24/26] mm/migrate: apply luf mechanism to unmapping during migration
Date: Thu, 20 Feb 2025 14:20:25 +0900
Message-Id: <20250220052027.58847-25-byungchul@sk.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20250220052027.58847-1-byungchul@sk.com>
References: <20250220052027.58847-1-byungchul@sk.com>
Sender:
owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: A new mechanism, LUF(Lazy Unmap Flush), defers tlb flush until folios that have been unmapped and freed, eventually get allocated again. It's safe for folios that had been mapped read only and were unmapped, since the contents of the folios don't change while staying in pcp or buddy so we can still read the data through the stale tlb entries. Applied the mechanism to unmapping during migration. Signed-off-by: Byungchul Park --- include/linux/mm.h | 2 ++ include/linux/rmap.h | 2 +- mm/migrate.c | 65 ++++++++++++++++++++++++++++++++++---------- mm/rmap.c | 15 ++++++---- mm/swap.c | 2 +- 5 files changed, 63 insertions(+), 23 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 46638e86e8073..5c81c9831bc5d 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1476,6 +1476,8 @@ static inline void folio_put(struct folio *folio) __folio_put(folio); } +void page_cache_release(struct folio *folio); + /** * folio_put_refs - Reduce the reference count on a folio. * @folio: The folio. diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 683a04088f3f2..cedba4812ccc7 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -660,7 +660,7 @@ static inline int folio_try_share_anon_rmap_pmd(struct folio *folio, int folio_referenced(struct folio *, int is_locked, struct mem_cgroup *memcg, unsigned long *vm_flags); -void try_to_migrate(struct folio *folio, enum ttu_flags flags); +bool try_to_migrate(struct folio *folio, enum ttu_flags flags); void try_to_unmap(struct folio *, enum ttu_flags flags); int make_device_exclusive_range(struct mm_struct *mm, unsigned long start, diff --git a/mm/migrate.c b/mm/migrate.c index 5e12023dbc75a..6b77efee4ebd7 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -1172,7 +1172,8 @@ static void migrate_folio_undo_dst(struct folio *dst, bool locked, /* Cleanup src folio upon migration success */ static void migrate_folio_done(struct folio *src, - enum migrate_reason reason) + enum migrate_reason reason, + unsigned short luf_key) { /* * Compaction can migrate also non-LRU pages which are @@ -1183,16 +1184,30 @@ static void migrate_folio_done(struct folio *src, mod_node_page_state(folio_pgdat(src), NR_ISOLATED_ANON + folio_is_file_lru(src), -folio_nr_pages(src)); - if (reason != MR_MEMORY_FAILURE) - /* We release the page in page_handle_poison. */ + /* We release the page in page_handle_poison. */ + if (reason == MR_MEMORY_FAILURE) + luf_flush(luf_key); + else if (!luf_key) folio_put(src); + else { + /* + * Should be the last reference. + */ + if (unlikely(!folio_put_testzero(src))) + VM_WARN_ON(1); + + page_cache_release(src); + mem_cgroup_uncharge(src); + free_unref_page(&src->page, folio_order(src), luf_key); + } } /* Obtain the lock on page, remove all ptes. */ static int migrate_folio_unmap(new_folio_t get_new_folio, free_folio_t put_new_folio, unsigned long private, struct folio *src, struct folio **dstp, enum migrate_mode mode, - enum migrate_reason reason, struct list_head *ret) + enum migrate_reason reason, struct list_head *ret, + bool *can_luf) { struct folio *dst; int rc = -EAGAIN; @@ -1208,7 +1223,7 @@ static int migrate_folio_unmap(new_folio_t get_new_folio, folio_clear_unevictable(src); /* free_pages_prepare() will clear PG_isolated. 
*/ list_del(&src->lru); - migrate_folio_done(src, reason); + migrate_folio_done(src, reason, 0); return MIGRATEPAGE_SUCCESS; } @@ -1325,7 +1340,7 @@ static int migrate_folio_unmap(new_folio_t get_new_folio, /* Establish migration ptes */ VM_BUG_ON_FOLIO(folio_test_anon(src) && !folio_test_ksm(src) && !anon_vma, src); - try_to_migrate(src, mode == MIGRATE_ASYNC ? TTU_BATCH_FLUSH : 0); + *can_luf = try_to_migrate(src, mode == MIGRATE_ASYNC ? TTU_BATCH_FLUSH : 0); old_page_state |= PAGE_WAS_MAPPED; } @@ -1353,7 +1368,7 @@ static int migrate_folio_unmap(new_folio_t get_new_folio, static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private, struct folio *src, struct folio *dst, enum migrate_mode mode, enum migrate_reason reason, - struct list_head *ret) + struct list_head *ret, unsigned short luf_key) { int rc; int old_page_state = 0; @@ -1407,7 +1422,7 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private, if (anon_vma) put_anon_vma(anon_vma); folio_unlock(src); - migrate_folio_done(src, reason); + migrate_folio_done(src, reason, luf_key); return rc; out: @@ -1702,7 +1717,7 @@ static void migrate_folios_move(struct list_head *src_folios, struct list_head *ret_folios, struct migrate_pages_stats *stats, int *retry, int *thp_retry, int *nr_failed, - int *nr_retry_pages) + int *nr_retry_pages, unsigned short luf_key) { struct folio *folio, *folio2, *dst, *dst2; bool is_thp; @@ -1719,7 +1734,7 @@ static void migrate_folios_move(struct list_head *src_folios, rc = migrate_folio_move(put_new_folio, private, folio, dst, mode, - reason, ret_folios); + reason, ret_folios, luf_key); /* * The rules are: * Success: folio will be freed @@ -1796,7 +1811,11 @@ static int migrate_pages_batch(struct list_head *from, int rc, rc_saved = 0, nr_pages; LIST_HEAD(unmap_folios); LIST_HEAD(dst_folios); + LIST_HEAD(unmap_folios_luf); + LIST_HEAD(dst_folios_luf); bool nosplit = (reason == MR_NUMA_MISPLACED); + unsigned short luf_key; + bool can_luf; VM_WARN_ON_ONCE(mode != MIGRATE_ASYNC && !list_empty(from) && !list_is_singular(from)); @@ -1871,9 +1890,11 @@ static int migrate_pages_batch(struct list_head *from, continue; } + can_luf = false; rc = migrate_folio_unmap(get_new_folio, put_new_folio, private, folio, &dst, mode, reason, - ret_folios); + ret_folios, &can_luf); + /* * The rules are: * Success: folio will be freed @@ -1919,7 +1940,8 @@ static int migrate_pages_batch(struct list_head *from, /* nr_failed isn't updated for not used */ stats->nr_thp_failed += thp_retry; rc_saved = rc; - if (list_empty(&unmap_folios)) + if (list_empty(&unmap_folios) && + list_empty(&unmap_folios_luf)) goto out; else goto move; @@ -1933,8 +1955,13 @@ static int migrate_pages_batch(struct list_head *from, stats->nr_thp_succeeded += is_thp; break; case MIGRATEPAGE_UNMAP: - list_move_tail(&folio->lru, &unmap_folios); - list_add_tail(&dst->lru, &dst_folios); + if (can_luf) { + list_move_tail(&folio->lru, &unmap_folios_luf); + list_add_tail(&dst->lru, &dst_folios_luf); + } else { + list_move_tail(&folio->lru, &unmap_folios); + list_add_tail(&dst->lru, &dst_folios); + } break; default: /* @@ -1954,6 +1981,8 @@ static int migrate_pages_batch(struct list_head *from, stats->nr_thp_failed += thp_retry; stats->nr_failed_pages += nr_retry_pages; move: + /* Should be before try_to_unmap_flush() */ + luf_key = fold_unmap_luf(); /* Flush TLBs for all unmapped folios */ try_to_unmap_flush(); @@ -1967,7 +1996,11 @@ static int migrate_pages_batch(struct list_head *from, migrate_folios_move(&unmap_folios, 
&dst_folios, put_new_folio, private, mode, reason, ret_folios, stats, &retry, &thp_retry, - &nr_failed, &nr_retry_pages); + &nr_failed, &nr_retry_pages, 0); + migrate_folios_move(&unmap_folios_luf, &dst_folios_luf, + put_new_folio, private, mode, reason, + ret_folios, stats, &retry, &thp_retry, + &nr_failed, &nr_retry_pages, luf_key); } nr_failed += retry; stats->nr_thp_failed += thp_retry; @@ -1978,6 +2011,8 @@ static int migrate_pages_batch(struct list_head *from, /* Cleanup remaining folios */ migrate_folios_undo(&unmap_folios, &dst_folios, put_new_folio, private, ret_folios); + migrate_folios_undo(&unmap_folios_luf, &dst_folios_luf, + put_new_folio, private, ret_folios); return rc; } diff --git a/mm/rmap.c b/mm/rmap.c index ebe91ff1bcb16..b6b61b8103655 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -2750,8 +2750,9 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, * * Tries to remove all the page table entries which are mapping this folio and * replace them with special swap entries. Caller must hold the folio lock. + * Return true if all the mappings are read-only, otherwise false. */ -void try_to_migrate(struct folio *folio, enum ttu_flags flags) +bool try_to_migrate(struct folio *folio, enum ttu_flags flags) { struct rmap_walk_control rwc = { .rmap_one = try_to_migrate_one, @@ -2769,11 +2770,11 @@ void try_to_migrate(struct folio *folio, enum ttu_flags flags) */ if (WARN_ON_ONCE(flags & ~(TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD | TTU_SYNC | TTU_BATCH_FLUSH))) - return; + return false; if (folio_is_zone_device(folio) && (!folio_is_device_private(folio) && !folio_is_device_coherent(folio))) - return; + return false; /* * During exec, a temporary VMA is setup and later moved. @@ -2793,10 +2794,12 @@ void try_to_migrate(struct folio *folio, enum ttu_flags flags) else rmap_walk(folio, &rwc); - if (can_luf_test()) + if (can_luf_test()) { fold_batch(tlb_ubc_luf, tlb_ubc_ro, true); - else - fold_batch(tlb_ubc, tlb_ubc_ro, true); + return true; + } + fold_batch(tlb_ubc, tlb_ubc_ro, true); + return false; } #ifdef CONFIG_DEVICE_PRIVATE diff --git a/mm/swap.c b/mm/swap.c index 54b0ba10dbb86..d6c29fdc67ca5 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -84,7 +84,7 @@ static void __page_cache_release(struct folio *folio, struct lruvec **lruvecp, * This path almost never happens for VM activity - pages are normally freed * in batches. But it gets used by networking - and for compound pages. 
 */
-static void page_cache_release(struct folio *folio)
+void page_cache_release(struct folio *folio)
 {
 	struct lruvec *lruvec = NULL;
 	unsigned long flags;

From patchwork Thu Feb 20 05:20:26 2025
X-Patchwork-Submitter: Byungchul Park
X-Patchwork-Id: 13983342
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, ying.huang@intel.com,
	vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com,
	willy@infradead.org, david@redhat.com, peterz@infradead.org,
	luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, rjgolo@gmail.com
Subject: [RFC PATCH v12 25/26] mm/vmscan: apply luf mechanism to unmapping during folio reclaim
Date: Thu, 20 Feb 2025 14:20:26 +0900
Message-Id: <20250220052027.58847-26-byungchul@sk.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20250220052027.58847-1-byungchul@sk.com>
References: <20250220052027.58847-1-byungchul@sk.com>

A new mechanism, LUF(Lazy Unmap Flush), defers tlb flush until folios
that have been unmapped and freed, eventually get allocated again.  It's
safe for folios that had been mapped read-only and were unmapped, since
the contents of the folios don't change while staying in pcp or buddy so
we can still read the data through the stale tlb entries.

Applied the mechanism to unmapping during folio reclaim.
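
To make the intended use of the new return value concrete, the following
is a stand-alone, simplified sketch (not kernel code) of the control flow
a caller like shrink_folio_list() follows with this change: folios whose
mappings were all read-only are counted toward a separate batch that
would be freed with a nonzero luf_key so the tlb flush can be deferred,
while the rest keep the flush-before-free path.  The struct layout, the
stub bodies and the helper free_folios() are simplified stand-ins; only
the names try_to_unmap(), fold_unmap_luf(), try_to_unmap_flush() and
luf_key mirror this series.

/*
 * Stand-alone sketch: models how a caller can route folios based on
 * try_to_unmap()'s new boolean result.  Everything below is a toy
 * stand-in for the kernel's real types and helpers.
 */
#include <stdbool.h>
#include <stdio.h>

struct folio { int id; bool mapped_read_only; };

/* Stand-in: returns true when every mapping it removed was read-only. */
static bool try_to_unmap(struct folio *folio)
{
	return folio->mapped_read_only;
}

/* Stand-ins for fold_unmap_luf() and try_to_unmap_flush(). */
static unsigned short fold_unmap_luf(void) { return 42; /* pretend luf_key */ }
static void try_to_unmap_flush(void) { puts("tlb flush issued now"); }

/* Stand-in for freeing a batch; a nonzero luf_key means deferred flush. */
static void free_folios(const char *batch, unsigned short luf_key)
{
	printf("freeing %s batch with luf_key=%hu%s\n", batch, luf_key,
	       luf_key ? " (flush deferred until re-allocation)" : "");
}

int main(void)
{
	struct folio folios[] = { {1, true}, {2, false}, {3, true} };
	int n = (int)(sizeof(folios) / sizeof(folios[0]));
	int nr_luf = 0, nr_plain = 0;

	for (int i = 0; i < n; i++) {
		if (try_to_unmap(&folios[i]))
			nr_luf++;	/* would go to free_folios_luf */
		else
			nr_plain++;	/* would go to free_folios */
	}

	/* Mirror of the patch's ordering: take the key, flush, then free. */
	unsigned short luf_key = fold_unmap_luf();
	try_to_unmap_flush();
	free_folios("plain", 0);
	free_folios("luf", luf_key);
	printf("routed %d folios to the deferred path, %d to the normal path\n",
	       nr_luf, nr_plain);
	return 0;
}
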
Signed-off-by: Byungchul Park --- include/linux/rmap.h | 5 +++-- mm/rmap.c | 11 +++++++---- mm/vmscan.c | 37 ++++++++++++++++++++++++++++++++----- 3 files changed, 42 insertions(+), 11 deletions(-) diff --git a/include/linux/rmap.h b/include/linux/rmap.h index cedba4812ccc7..854b41441d466 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -661,7 +661,7 @@ int folio_referenced(struct folio *, int is_locked, struct mem_cgroup *memcg, unsigned long *vm_flags); bool try_to_migrate(struct folio *folio, enum ttu_flags flags); -void try_to_unmap(struct folio *, enum ttu_flags flags); +bool try_to_unmap(struct folio *, enum ttu_flags flags); int make_device_exclusive_range(struct mm_struct *mm, unsigned long start, unsigned long end, struct page **pages, @@ -794,8 +794,9 @@ static inline int folio_referenced(struct folio *folio, int is_locked, return 0; } -static inline void try_to_unmap(struct folio *folio, enum ttu_flags flags) +static inline bool try_to_unmap(struct folio *folio, enum ttu_flags flags) { + return false; } static inline int folio_mkclean(struct folio *folio) diff --git a/mm/rmap.c b/mm/rmap.c index b6b61b8103655..55003eb0b4936 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -2386,10 +2386,11 @@ static int folio_not_mapped(struct folio *folio) * Tries to remove all the page table entries which are mapping this * folio. It is the caller's responsibility to check if the folio is * still mapped if needed (use TTU_SYNC to prevent accounting races). + * Return true if all the mappings are read-only, otherwise false. * * Context: Caller must hold the folio lock. */ -void try_to_unmap(struct folio *folio, enum ttu_flags flags) +bool try_to_unmap(struct folio *folio, enum ttu_flags flags) { struct rmap_walk_control rwc = { .rmap_one = try_to_unmap_one, @@ -2408,10 +2409,12 @@ void try_to_unmap(struct folio *folio, enum ttu_flags flags) else rmap_walk(folio, &rwc); - if (can_luf_test()) + if (can_luf_test()) { fold_batch(tlb_ubc_luf, tlb_ubc_ro, true); - else - fold_batch(tlb_ubc, tlb_ubc_ro, true); + return true; + } + fold_batch(tlb_ubc, tlb_ubc_ro, true); + return false; } /* diff --git a/mm/vmscan.c b/mm/vmscan.c index cbca027d2a10e..1ece0ccfccefb 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1052,14 +1052,17 @@ static unsigned int shrink_folio_list(struct list_head *folio_list, struct reclaim_stat *stat, bool ignore_references) { struct folio_batch free_folios; + struct folio_batch free_folios_luf; LIST_HEAD(ret_folios); LIST_HEAD(demote_folios); unsigned int nr_reclaimed = 0; unsigned int pgactivate = 0; bool do_demote_pass; struct swap_iocb *plug = NULL; + unsigned short luf_key; folio_batch_init(&free_folios); + folio_batch_init(&free_folios_luf); memset(stat, 0, sizeof(*stat)); cond_resched(); do_demote_pass = can_demote(pgdat->node_id, sc); @@ -1071,6 +1074,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list, enum folio_references references = FOLIOREF_RECLAIM; bool dirty, writeback; unsigned int nr_pages; + bool can_luf = false; cond_resched(); @@ -1309,7 +1313,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list, if (folio_test_large(folio)) flags |= TTU_SYNC; - try_to_unmap(folio, flags); + can_luf = try_to_unmap(folio, flags); if (folio_mapped(folio)) { stat->nr_unmap_fail += nr_pages; if (!was_swapbacked && @@ -1453,6 +1457,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list, * leave it off the LRU). 
*/ nr_reclaimed += nr_pages; + if (can_luf) + luf_flush(fold_unmap_luf()); continue; } } @@ -1485,6 +1491,19 @@ static unsigned int shrink_folio_list(struct list_head *folio_list, nr_reclaimed += nr_pages; folio_unqueue_deferred_split(folio); + + if (can_luf) { + if (folio_batch_add(&free_folios_luf, folio) == 0) { + mem_cgroup_uncharge_folios(&free_folios); + mem_cgroup_uncharge_folios(&free_folios_luf); + luf_key = fold_unmap_luf(); + try_to_unmap_flush(); + free_unref_folios(&free_folios, 0); + free_unref_folios(&free_folios_luf, luf_key); + } + continue; + } + if (folio_batch_add(&free_folios, folio) == 0) { mem_cgroup_uncharge_folios(&free_folios); try_to_unmap_flush(); @@ -1519,9 +1538,21 @@ static unsigned int shrink_folio_list(struct list_head *folio_list, list_add(&folio->lru, &ret_folios); VM_BUG_ON_FOLIO(folio_test_lru(folio) || folio_test_unevictable(folio), folio); + if (can_luf) + luf_flush(fold_unmap_luf()); } /* 'folio_list' is always empty here */ + /* + * Finalize this turn before demote_folio_list(). + */ + mem_cgroup_uncharge_folios(&free_folios); + mem_cgroup_uncharge_folios(&free_folios_luf); + luf_key = fold_unmap_luf(); + try_to_unmap_flush(); + free_unref_folios(&free_folios, 0); + free_unref_folios(&free_folios_luf, luf_key); + /* Migrate folios selected for demotion */ stat->nr_demoted = demote_folio_list(&demote_folios, pgdat); nr_reclaimed += stat->nr_demoted; @@ -1554,10 +1585,6 @@ static unsigned int shrink_folio_list(struct list_head *folio_list, pgactivate = stat->nr_activate[0] + stat->nr_activate[1]; - mem_cgroup_uncharge_folios(&free_folios); - try_to_unmap_flush(); - free_unref_folios(&free_folios, 0); - list_splice(&ret_folios, folio_list); count_vm_events(PGACTIVATE, pgactivate); From patchwork Thu Feb 20 05:20:27 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Byungchul Park X-Patchwork-Id: 13983343 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8244FC021B0 for ; Thu, 20 Feb 2025 05:21:57 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 04E322801AE; Thu, 20 Feb 2025 00:20:52 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id EC44B2802AE; Thu, 20 Feb 2025 00:20:51 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B51372802AB; Thu, 20 Feb 2025 00:20:51 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 549852801AE for ; Thu, 20 Feb 2025 00:20:51 -0500 (EST) Received: from smtpin30.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 063CE1C8495 for ; Thu, 20 Feb 2025 05:20:51 +0000 (UTC) X-FDA: 83139173502.30.5A6FBDA Received: from invmail4.hynix.com (exvmail4.hynix.com [166.125.252.92]) by imf22.hostedemail.com (Postfix) with ESMTP id E9928C0004 for ; Thu, 20 Feb 2025 05:20:48 +0000 (UTC) Authentication-Results: imf22.hostedemail.com; dkim=none; dmarc=none; spf=pass (imf22.hostedemail.com: domain of byungchul@sk.com designates 166.125.252.92 as permitted sender) smtp.mailfrom=byungchul@sk.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1740028849; a=rsa-sha256; cv=none; 
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, ying.huang@intel.com,
	vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com,
	willy@infradead.org, david@redhat.com, peterz@infradead.org,
	luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, rjgolo@gmail.com
Subject: [RFC PATCH v12 26/26] mm/luf: implement luf debug feature
Date: Thu, 20 Feb 2025 14:20:27 +0900
Message-Id: <20250220052027.58847-27-byungchul@sk.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20250220052027.58847-1-byungchul@sk.com>
References: <20250220052027.58847-1-byungchul@sk.com>

We need a luf debug feature to detect when luf goes wrong, should that
ever happen.  As an RFC, suggest a simple implementation that reports
problematic situations caused by luf.

Signed-off-by: Byungchul Park
---
 arch/riscv/include/asm/tlbflush.h |   3 +
 arch/riscv/mm/tlbflush.c          |  35 ++++-
 arch/x86/include/asm/pgtable.h    |  10 ++
 arch/x86/include/asm/tlbflush.h   |   3 +
 arch/x86/mm/pgtable.c             |  10 ++
 arch/x86/mm/tlb.c                 |  35 ++++-
 include/linux/highmem-internal.h  |   5 +
 include/linux/mm.h                |  20 ++-
 include/linux/mm_types.h          |  16 +--
 include/linux/mm_types_task.h     |  16 +++
 include/linux/sched.h             |   5 +
 mm/highmem.c                      |   1 +
 mm/memory.c                       |  12 ++
 mm/page_alloc.c                   |  34 ++++-
 mm/page_ext.c                     |   3 +
 mm/rmap.c                         | 229 ++++++++++++++++++++++++++++++
 16 files changed, 418 insertions(+), 19 deletions(-)

diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
index ec5caeb3cf8ef..9451f3d22f229 100644
--- a/arch/riscv/include/asm/tlbflush.h
+++ b/arch/riscv/include/asm/tlbflush.h
@@ -69,6 +69,9 @@ bool arch_tlbbatch_check_done(struct arch_tlbflush_unmap_batch *batch, unsigned
 bool arch_tlbbatch_diet(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen);
 void arch_tlbbatch_mark_ugen(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen);
 void arch_mm_mark_ugen(struct mm_struct *mm, unsigned long ugen);
+#ifdef CONFIG_LUF_DEBUG
+extern void print_lufd_arch(void);
+#endif
 
 static inline void arch_tlbbatch_clear(struct arch_tlbflush_unmap_batch *batch)
 {
diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
index 93afb7a299003..de91bfe0426c2 100644
--- a/arch/riscv/mm/tlbflush.c
+++ b/arch/riscv/mm/tlbflush.c
@@ -216,6 +216,25 @@ static int __init luf_init_arch(void)
 }
 early_initcall(luf_init_arch);
 
+#ifdef CONFIG_LUF_DEBUG
+static DEFINE_SPINLOCK(luf_debug_lock);
+#define lufd_lock(f)	spin_lock_irqsave(&luf_debug_lock, (f))
+#define lufd_unlock(f)	spin_unlock_irqrestore(&luf_debug_lock, (f))
+
+void print_lufd_arch(void)
+{
+	int cpu;
+
+	pr_cont("LUFD ARCH:");
+	for_each_cpu(cpu, cpu_possible_mask)
+		pr_cont(" %lu", atomic_long_read(per_cpu_ptr(&ugen_done, cpu)));
+	pr_cont("\n");
+}
+#else
+#define lufd_lock(f)	do { (void)(f); } while(0)
+#define lufd_unlock(f)	do { (void)(f); } while(0)
+#endif
+
 /*
  * batch will not be updated.
*/ @@ -223,17 +242,22 @@ bool arch_tlbbatch_check_done(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen) { int cpu; + unsigned long flags; if (!ugen) goto out; + lufd_lock(flags); for_each_cpu(cpu, &batch->cpumask) { unsigned long done; done = atomic_long_read(per_cpu_ptr(&ugen_done, cpu)); - if (ugen_before(done, ugen)) + if (ugen_before(done, ugen)) { + lufd_unlock(flags); return false; + } } + lufd_unlock(flags); return true; out: return cpumask_empty(&batch->cpumask); @@ -243,10 +267,12 @@ bool arch_tlbbatch_diet(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen) { int cpu; + unsigned long flags; if (!ugen) goto out; + lufd_lock(flags); for_each_cpu(cpu, &batch->cpumask) { unsigned long done; @@ -254,6 +280,7 @@ bool arch_tlbbatch_diet(struct arch_tlbflush_unmap_batch *batch, if (!ugen_before(done, ugen)) cpumask_clear_cpu(cpu, &batch->cpumask); } + lufd_unlock(flags); out: return cpumask_empty(&batch->cpumask); } @@ -262,10 +289,12 @@ void arch_tlbbatch_mark_ugen(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen) { int cpu; + unsigned long flags; if (!ugen) return; + lufd_lock(flags); for_each_cpu(cpu, &batch->cpumask) { atomic_long_t *done = per_cpu_ptr(&ugen_done, cpu); unsigned long old = atomic_long_read(done); @@ -283,15 +312,18 @@ void arch_tlbbatch_mark_ugen(struct arch_tlbflush_unmap_batch *batch, */ atomic_long_cmpxchg(done, old, ugen); } + lufd_unlock(flags); } void arch_mm_mark_ugen(struct mm_struct *mm, unsigned long ugen) { int cpu; + unsigned long flags; if (!ugen) return; + lufd_lock(flags); for_each_cpu(cpu, mm_cpumask(mm)) { atomic_long_t *done = per_cpu_ptr(&ugen_done, cpu); unsigned long old = atomic_long_read(done); @@ -309,4 +341,5 @@ void arch_mm_mark_ugen(struct mm_struct *mm, unsigned long ugen) */ atomic_long_cmpxchg(done, old, ugen); } + lufd_unlock(flags); } diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 593f10aabd45a..414bcabb23b51 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -695,12 +695,22 @@ static inline pud_t pud_mkyoung(pud_t pud) return pud_set_flags(pud, _PAGE_ACCESSED); } +#ifdef CONFIG_LUF_DEBUG +pud_t pud_mkwrite(pud_t pud); +static inline pud_t __pud_mkwrite(pud_t pud) +{ + pud = pud_set_flags(pud, _PAGE_RW); + + return pud_clear_saveddirty(pud); +} +#else static inline pud_t pud_mkwrite(pud_t pud) { pud = pud_set_flags(pud, _PAGE_RW); return pud_clear_saveddirty(pud); } +#endif #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY static inline int pte_soft_dirty(pte_t pte) diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index 1fc5bacd72dff..2825f4befb272 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -297,6 +297,9 @@ extern bool arch_tlbbatch_check_done(struct arch_tlbflush_unmap_batch *batch, un extern bool arch_tlbbatch_diet(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen); extern void arch_tlbbatch_mark_ugen(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen); extern void arch_mm_mark_ugen(struct mm_struct *mm, unsigned long ugen); +#ifdef CONFIG_LUF_DEBUG +extern void print_lufd_arch(void); +#endif static inline void arch_tlbbatch_clear(struct arch_tlbflush_unmap_batch *batch) { diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c index 5745a354a241c..f72e4cfdb0a8d 100644 --- a/arch/x86/mm/pgtable.c +++ b/arch/x86/mm/pgtable.c @@ -901,6 +901,7 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr) pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma) { + 
lufd_check_pages(pte_page(pte), 0); if (vma->vm_flags & VM_SHADOW_STACK) return pte_mkwrite_shstk(pte); @@ -911,6 +912,7 @@ pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma) pmd_t pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) { + lufd_check_pages(pmd_page(pmd), PMD_ORDER); if (vma->vm_flags & VM_SHADOW_STACK) return pmd_mkwrite_shstk(pmd); @@ -919,6 +921,14 @@ pmd_t pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) return pmd_clear_saveddirty(pmd); } +#ifdef CONFIG_LUF_DEBUG +pud_t pud_mkwrite(pud_t pud) +{ + lufd_check_pages(pud_page(pud), PUD_ORDER); + return __pud_mkwrite(pud); +} +#endif + void arch_check_zapped_pte(struct vm_area_struct *vma, pte_t pte) { /* diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 975f58fa4b30f..e9ae0d8f73442 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -1253,6 +1253,25 @@ static int __init luf_init_arch(void) } early_initcall(luf_init_arch); +#ifdef CONFIG_LUF_DEBUG +static DEFINE_SPINLOCK(luf_debug_lock); +#define lufd_lock(f) spin_lock_irqsave(&luf_debug_lock, (f)) +#define lufd_unlock(f) spin_unlock_irqrestore(&luf_debug_lock, (f)) + +void print_lufd_arch(void) +{ + int cpu; + + pr_cont("LUFD ARCH:"); + for_each_cpu(cpu, cpu_possible_mask) + pr_cont(" %lu", atomic_long_read(per_cpu_ptr(&ugen_done, cpu))); + pr_cont("\n"); +} +#else +#define lufd_lock(f) do { (void)(f); } while(0) +#define lufd_unlock(f) do { (void)(f); } while(0) +#endif + /* * batch will not be updated. */ @@ -1260,17 +1279,22 @@ bool arch_tlbbatch_check_done(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen) { int cpu; + unsigned long flags; if (!ugen) goto out; + lufd_lock(flags); for_each_cpu(cpu, &batch->cpumask) { unsigned long done; done = atomic_long_read(per_cpu_ptr(&ugen_done, cpu)); - if (ugen_before(done, ugen)) + if (ugen_before(done, ugen)) { + lufd_unlock(flags); return false; + } } + lufd_unlock(flags); return true; out: return cpumask_empty(&batch->cpumask); @@ -1280,10 +1304,12 @@ bool arch_tlbbatch_diet(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen) { int cpu; + unsigned long flags; if (!ugen) goto out; + lufd_lock(flags); for_each_cpu(cpu, &batch->cpumask) { unsigned long done; @@ -1291,6 +1317,7 @@ bool arch_tlbbatch_diet(struct arch_tlbflush_unmap_batch *batch, if (!ugen_before(done, ugen)) cpumask_clear_cpu(cpu, &batch->cpumask); } + lufd_unlock(flags); out: return cpumask_empty(&batch->cpumask); } @@ -1299,10 +1326,12 @@ void arch_tlbbatch_mark_ugen(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen) { int cpu; + unsigned long flags; if (!ugen) return; + lufd_lock(flags); for_each_cpu(cpu, &batch->cpumask) { atomic_long_t *done = per_cpu_ptr(&ugen_done, cpu); unsigned long old = atomic_long_read(done); @@ -1320,15 +1349,18 @@ void arch_tlbbatch_mark_ugen(struct arch_tlbflush_unmap_batch *batch, */ atomic_long_cmpxchg(done, old, ugen); } + lufd_unlock(flags); } void arch_mm_mark_ugen(struct mm_struct *mm, unsigned long ugen) { int cpu; + unsigned long flags; if (!ugen) return; + lufd_lock(flags); for_each_cpu(cpu, mm_cpumask(mm)) { atomic_long_t *done = per_cpu_ptr(&ugen_done, cpu); unsigned long old = atomic_long_read(done); @@ -1346,6 +1378,7 @@ void arch_mm_mark_ugen(struct mm_struct *mm, unsigned long ugen) */ atomic_long_cmpxchg(done, old, ugen); } + lufd_unlock(flags); } void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch) diff --git a/include/linux/highmem-internal.h b/include/linux/highmem-internal.h index dd100e849f5e0..0792530d1be7b 100644 --- a/include/linux/highmem-internal.h 
+++ b/include/linux/highmem-internal.h @@ -41,6 +41,7 @@ static inline void *kmap(struct page *page) { void *addr; + lufd_check_pages(page, 0); might_sleep(); if (!PageHighMem(page)) addr = page_address(page); @@ -161,6 +162,7 @@ static inline struct page *kmap_to_page(void *addr) static inline void *kmap(struct page *page) { + lufd_check_pages(page, 0); might_sleep(); return page_address(page); } @@ -177,11 +179,13 @@ static inline void kunmap(struct page *page) static inline void *kmap_local_page(struct page *page) { + lufd_check_pages(page, 0); return page_address(page); } static inline void *kmap_local_folio(struct folio *folio, size_t offset) { + lufd_check_folio(folio); return page_address(&folio->page) + offset; } @@ -204,6 +208,7 @@ static inline void __kunmap_local(const void *addr) static inline void *kmap_atomic(struct page *page) { + lufd_check_pages(page, 0); if (IS_ENABLED(CONFIG_PREEMPT_RT)) migrate_disable(); else diff --git a/include/linux/mm.h b/include/linux/mm.h index 5c81c9831bc5d..9572fbbb9d73f 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -44,6 +44,24 @@ extern int sysctl_page_lock_unfairness; void mm_core_init(void); void init_mm_internals(void); +#ifdef CONFIG_LUF_DEBUG +void lufd_check_folio(struct folio *f); +void lufd_check_pages(const struct page *p, unsigned int order); +void lufd_check_zone_pages(struct zone *zone, struct page *page, unsigned int order); +void lufd_check_queued_pages(void); +void lufd_queue_page_for_check(struct page *page, int order); +void lufd_mark_folio(struct folio *f, unsigned short luf_key); +void lufd_mark_pages(struct page *p, unsigned int order, unsigned short luf_key); +#else +static inline void lufd_check_folio(struct folio *f) {} +static inline void lufd_check_pages(const struct page *p, unsigned int order) {} +static inline void lufd_check_zone_pages(struct zone *zone, struct page *page, unsigned int order) {} +static inline void lufd_check_queued_pages(void) {} +static inline void lufd_queue_page_for_check(struct page *page, int order) {} +static inline void lufd_mark_folio(struct folio *f, unsigned short luf_key) {} +static inline void lufd_mark_pages(struct page *p, unsigned int order, unsigned short luf_key) {} +#endif + #ifndef CONFIG_NUMA /* Don't use mapnrs, do it properly */ extern unsigned long max_mapnr; @@ -113,7 +131,7 @@ extern int mmap_rnd_compat_bits __read_mostly; #endif #ifndef page_to_virt -#define page_to_virt(x) __va(PFN_PHYS(page_to_pfn(x))) +#define page_to_virt(x) ({ lufd_check_pages(x, 0); __va(PFN_PHYS(page_to_pfn(x)));}) #endif #ifndef lm_alias diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index e3132e1e5e5d2..e0c5712dc46ff 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -22,6 +22,10 @@ #include +#ifdef CONFIG_LUF_DEBUG +extern struct page_ext_operations luf_debug_ops; +#endif + #ifndef AT_VECTOR_SIZE_ARCH #define AT_VECTOR_SIZE_ARCH 0 #endif @@ -32,18 +36,6 @@ struct address_space; struct mem_cgroup; -#ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH -struct luf_batch { - struct tlbflush_unmap_batch batch; - unsigned long ugen; - rwlock_t lock; -}; -void luf_batch_init(struct luf_batch *lb); -#else -struct luf_batch {}; -static inline void luf_batch_init(struct luf_batch *lb) {} -#endif - /* * Each physical page in the system has a struct page associated with * it to keep track of whatever it is we are using the page for at the diff --git a/include/linux/mm_types_task.h b/include/linux/mm_types_task.h index bff5706b76e14..b5dfc451c009b 100644 --- 
a/include/linux/mm_types_task.h +++ b/include/linux/mm_types_task.h @@ -9,6 +9,7 @@ */ #include +#include #include @@ -67,4 +68,19 @@ struct tlbflush_unmap_batch { #endif }; +#ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH +struct luf_batch { + struct tlbflush_unmap_batch batch; + unsigned long ugen; + rwlock_t lock; +}; +void luf_batch_init(struct luf_batch *lb); +#else +struct luf_batch {}; +static inline void luf_batch_init(struct luf_batch *lb) {} +#endif + +#if defined(CONFIG_LUF_DEBUG) +#define NR_LUFD_PAGES 512 +#endif #endif /* _LINUX_MM_TYPES_TASK_H */ diff --git a/include/linux/sched.h b/include/linux/sched.h index 463cb2fb8f919..eb1487fa101e6 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1380,6 +1380,11 @@ struct task_struct { unsigned long luf_ugen; unsigned long zone_ugen; unsigned long wait_zone_ugen; +#if defined(CONFIG_LUF_DEBUG) + struct page *lufd_pages[NR_LUFD_PAGES]; + int lufd_pages_order[NR_LUFD_PAGES]; + int lufd_pages_nr; +#endif #endif struct tlbflush_unmap_batch tlb_ubc; diff --git a/mm/highmem.c b/mm/highmem.c index ef3189b36cadb..a323d5a655bf9 100644 --- a/mm/highmem.c +++ b/mm/highmem.c @@ -576,6 +576,7 @@ void *__kmap_local_page_prot(struct page *page, pgprot_t prot) { void *kmap; + lufd_check_pages(page, 0); /* * To broaden the usage of the actual kmap_local() machinery always map * pages when debugging is enabled and the architecture has no problems diff --git a/mm/memory.c b/mm/memory.c index c98af5e567e89..89d047867d60d 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -6124,6 +6124,18 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address, mapping = vma->vm_file->f_mapping; } +#ifdef CONFIG_LUF_DEBUG + if (luf_flush) { + /* + * If it has a VM_SHARED mapping, all the mms involved + * in the struct address_space should be luf_flush'ed. + */ + if (mapping) + luf_flush_mapping(mapping); + luf_flush_mm(mm); + } +#endif + if (unlikely(is_vm_hugetlb_page(vma))) ret = hugetlb_fault(vma->vm_mm, vma, address, flags); else diff --git a/mm/page_alloc.c b/mm/page_alloc.c index ccbe49b78190a..c8ab60c60bb08 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -758,6 +758,8 @@ void luf_takeoff_end(struct zone *zone) VM_WARN_ON(current->zone_ugen); VM_WARN_ON(current->wait_zone_ugen); } + + lufd_check_queued_pages(); } /* @@ -853,8 +855,10 @@ bool luf_takeoff_check_and_fold(struct zone *zone, struct page *page) struct luf_batch *lb; unsigned long lb_ugen; - if (!luf_key) + if (!luf_key) { + lufd_check_pages(page, buddy_order(page)); return true; + } lb = &luf_batch[luf_key]; read_lock_irqsave(&lb->lock, flags); @@ -875,12 +879,15 @@ bool luf_takeoff_check_and_fold(struct zone *zone, struct page *page) if (!current->luf_ugen || ugen_before(current->luf_ugen, lb_ugen)) current->luf_ugen = lb_ugen; + lufd_queue_page_for_check(page, buddy_order(page)); return true; } zone_ugen = page_zone_ugen(zone, page); - if (!zone_ugen) + if (!zone_ugen) { + lufd_check_pages(page, buddy_order(page)); return true; + } /* * Should not be zero since zone-zone_ugen has been updated in @@ -888,17 +895,23 @@ bool luf_takeoff_check_and_fold(struct zone *zone, struct page *page) */ VM_WARN_ON(!zone->zone_ugen); - if (!ugen_before(READ_ONCE(zone->zone_ugen_done), zone_ugen)) + if (!ugen_before(READ_ONCE(zone->zone_ugen_done), zone_ugen)) { + lufd_check_pages(page, buddy_order(page)); return true; + } if (current->luf_no_shootdown) return false; + lufd_check_zone_pages(zone, page, buddy_order(page)); + /* * zone batched flush has been already set. 
*/ - if (current->zone_ugen) + if (current->zone_ugen) { + lufd_queue_page_for_check(page, buddy_order(page)); return true; + } /* * Others are already performing tlb shootdown for us. All we @@ -933,6 +946,7 @@ bool luf_takeoff_check_and_fold(struct zone *zone, struct page *page) atomic_long_set(&zone->nr_luf_pages, 0); fold_batch(tlb_ubc_takeoff, &zone->zone_batch, true); } + lufd_queue_page_for_check(page, buddy_order(page)); return true; } #endif @@ -1238,6 +1252,11 @@ static inline void __free_one_page(struct page *page, } else zone_ugen = page_zone_ugen(zone, page); + if (!zone_ugen) + lufd_check_pages(page, order); + else + lufd_check_zone_pages(zone, page, order); + while (order < MAX_PAGE_ORDER) { int buddy_mt = migratetype; unsigned long buddy_zone_ugen; @@ -1299,6 +1318,10 @@ static inline void __free_one_page(struct page *page, set_page_zone_ugen(page, zone_ugen); pfn = combined_pfn; order++; + if (!zone_ugen) + lufd_check_pages(page, order); + else + lufd_check_zone_pages(zone, page, order); } done_merging: @@ -3201,6 +3224,8 @@ void free_unref_page(struct page *page, unsigned int order, unsigned long pfn = page_to_pfn(page); int migratetype; + lufd_mark_pages(page, order, luf_key); + if (!pcp_allowed_order(order)) { __free_pages_ok(page, order, FPI_NONE, luf_key); return; @@ -3253,6 +3278,7 @@ void free_unref_folios(struct folio_batch *folios, unsigned short luf_key) unsigned long pfn = folio_pfn(folio); unsigned int order = folio_order(folio); + lufd_mark_folio(folio, luf_key); if (!free_pages_prepare(&folio->page, order)) continue; /* diff --git a/mm/page_ext.c b/mm/page_ext.c index 641d93f6af4c1..be40bc2a93378 100644 --- a/mm/page_ext.c +++ b/mm/page_ext.c @@ -89,6 +89,9 @@ static struct page_ext_operations *page_ext_ops[] __initdata = { #ifdef CONFIG_PAGE_TABLE_CHECK &page_table_check_ops, #endif +#ifdef CONFIG_LUF_DEBUG + &luf_debug_ops, +#endif }; unsigned long page_ext_size; diff --git a/mm/rmap.c b/mm/rmap.c index 55003eb0b4936..fd6d5cb0fa8d0 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1161,6 +1161,235 @@ static bool should_defer_flush(struct mm_struct *mm, enum ttu_flags flags) } #endif /* CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH */ +#ifdef CONFIG_LUF_DEBUG + +static bool need_luf_debug(void) +{ + return true; +} + +static void init_luf_debug(void) +{ + /* Do nothing */ +} + +struct page_ext_operations luf_debug_ops = { + .size = sizeof(struct luf_batch), + .need = need_luf_debug, + .init = init_luf_debug, + .need_shared_flags = false, +}; + +static bool __lufd_check_zone_pages(struct page *page, int nr, + struct tlbflush_unmap_batch *batch, unsigned long ugen) +{ + int i; + + for (i = 0; i < nr; i++) { + struct page_ext *page_ext; + struct luf_batch *lb; + unsigned long lb_ugen; + unsigned long flags; + bool ret; + + page_ext = page_ext_get(page + i); + if (!page_ext) + continue; + + lb = (struct luf_batch *)page_ext_data(page_ext, &luf_debug_ops); + write_lock_irqsave(&lb->lock, flags); + lb_ugen = lb->ugen; + ret = arch_tlbbatch_done(&lb->batch.arch, &batch->arch); + write_unlock_irqrestore(&lb->lock, flags); + page_ext_put(page_ext); + + if (!ret || ugen_before(ugen, lb_ugen)) + return false; + } + return true; +} + +void lufd_check_zone_pages(struct zone *zone, struct page *page, unsigned int order) +{ + bool warn; + static bool once = false; + + if (!page || !zone) + return; + + warn = !__lufd_check_zone_pages(page, 1 << order, + &zone->zone_batch, zone->luf_ugen); + + if (warn && !READ_ONCE(once)) { + WRITE_ONCE(once, true); + VM_WARN(1, "LUFD: ugen(%lu) 
page(%p) order(%u)\n", + atomic_long_read(&luf_ugen), page, order); + print_lufd_arch(); + } +} + +static bool __lufd_check_pages(const struct page *page, int nr) +{ + int i; + + for (i = 0; i < nr; i++) { + struct page_ext *page_ext; + struct luf_batch *lb; + unsigned long lb_ugen; + unsigned long flags; + bool ret; + + page_ext = page_ext_get(page + i); + if (!page_ext) + continue; + + lb = (struct luf_batch *)page_ext_data(page_ext, &luf_debug_ops); + write_lock_irqsave(&lb->lock, flags); + lb_ugen = lb->ugen; + ret = arch_tlbbatch_diet(&lb->batch.arch, lb_ugen); + write_unlock_irqrestore(&lb->lock, flags); + page_ext_put(page_ext); + + if (!ret) + return false; + } + return true; +} + +void lufd_queue_page_for_check(struct page *page, int order) +{ + struct page **parray = current->lufd_pages; + int *oarray = current->lufd_pages_order; + + if (!page) + return; + + if (current->lufd_pages_nr >= NR_LUFD_PAGES) { + VM_WARN_ONCE(1, "LUFD: NR_LUFD_PAGES is too small.\n"); + return; + } + + *(parray + current->lufd_pages_nr) = page; + *(oarray + current->lufd_pages_nr) = order; + current->lufd_pages_nr++; +} + +void lufd_check_queued_pages(void) +{ + struct page **parray = current->lufd_pages; + int *oarray = current->lufd_pages_order; + int i; + + for (i = 0; i < current->lufd_pages_nr; i++) + lufd_check_pages(*(parray + i), *(oarray + i)); + current->lufd_pages_nr = 0; +} + +void lufd_check_folio(struct folio *folio) +{ + struct page *page; + int nr; + bool warn; + static bool once = false; + + if (!folio) + return; + + page = folio_page(folio, 0); + nr = folio_nr_pages(folio); + + warn = !__lufd_check_pages(page, nr); + + if (warn && !READ_ONCE(once)) { + WRITE_ONCE(once, true); + VM_WARN(1, "LUFD: ugen(%lu) page(%p) nr(%d)\n", + atomic_long_read(&luf_ugen), page, nr); + print_lufd_arch(); + } +} +EXPORT_SYMBOL(lufd_check_folio); + +void lufd_check_pages(const struct page *page, unsigned int order) +{ + bool warn; + static bool once = false; + + if (!page) + return; + + warn = !__lufd_check_pages(page, 1 << order); + + if (warn && !READ_ONCE(once)) { + WRITE_ONCE(once, true); + VM_WARN(1, "LUFD: ugen(%lu) page(%p) order(%u)\n", + atomic_long_read(&luf_ugen), page, order); + print_lufd_arch(); + } +} +EXPORT_SYMBOL(lufd_check_pages); + +static void __lufd_mark_pages(struct page *page, int nr, unsigned short luf_key) +{ + int i; + + for (i = 0; i < nr; i++) { + struct page_ext *page_ext; + struct luf_batch *lb; + + page_ext = page_ext_get(page + i); + if (!page_ext) + continue; + + lb = (struct luf_batch *)page_ext_data(page_ext, &luf_debug_ops); + fold_luf_batch(lb, &luf_batch[luf_key]); + page_ext_put(page_ext); + } +} + +void lufd_mark_folio(struct folio *folio, unsigned short luf_key) +{ + struct page *page; + int nr; + bool warn; + static bool once = false; + + if (!luf_key) + return; + + page = folio_page(folio, 0); + nr = folio_nr_pages(folio); + + warn = !__lufd_check_pages(page, nr); + __lufd_mark_pages(page, nr, luf_key); + + if (warn && !READ_ONCE(once)) { + WRITE_ONCE(once, true); + VM_WARN(1, "LUFD: ugen(%lu) page(%p) nr(%d)\n", + atomic_long_read(&luf_ugen), page, nr); + print_lufd_arch(); + } +} + +void lufd_mark_pages(struct page *page, unsigned int order, unsigned short luf_key) +{ + bool warn; + static bool once = false; + + if (!luf_key) + return; + + warn = !__lufd_check_pages(page, 1 << order); + __lufd_mark_pages(page, 1 << order, luf_key); + + if (warn && !READ_ONCE(once)) { + WRITE_ONCE(once, true); + VM_WARN(1, "LUFD: ugen(%lu) page(%p) order(%u)\n", + 
atomic_long_read(&luf_ugen), page, order); + print_lufd_arch(); + } +} +#endif + /** * page_address_in_vma - The virtual address of a page in this VMA. * @folio: The folio containing the page.