From patchwork Thu Apr 10 00:00:18 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: SeongJae Park
X-Patchwork-Id: 14045656
From: SeongJae Park
To: Andrew Morton
Cc: SeongJae Park, "Liam R.Howlett", David Hildenbrand, Lorenzo Stoakes,
 Rik van Riel, Shakeel Butt, Vlastimil Babka, kernel-team@meta.com,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v3 0/4] mm/madvise: batch tlb flushes for MADV_DONTNEED and MADV_FREE
Date: Wed, 9 Apr 2025 17:00:18 -0700
Message-Id: <20250410000022.1901-1-sj@kernel.org>
X-Mailer: git-send-email 2.39.5

When process_madvise() is called to do MADV_DONTNEED[_LOCKED] or MADV_FREE
with multiple address ranges, tlb flushes happen for each of the given
address ranges.
Because such tlb flushes are for the same process, doing them in a batch is
more efficient while still being safe. Modify the process_madvise()
entry-level code path to do such batched tlb flushes, while the internal
unmap logic only gathers the tlb entries to flush.

In more detail, modify the entry functions to initialize an mmu_gather
object and pass it to the internal logic. Make the internal logic only
gather the tlb entries to flush into the received mmu_gather object. After
all internal function calls are done, the entry functions flush the gathered
tlb entries at once.

Because process_madvise() and madvise() share the internal unmap logic, make
the same change to the madvise() entry code as well, to keep the code
consistent and clean. This is only for keeping the code clean, and shouldn't
degrade madvise(). It could rather reduce tlb flushes when the given address
range spans multiple vmas. Because that is only a side effect of the effort
to keep the code clean, we don't measure it separately.

Similar optimizations might be applicable to other madvise behaviors such as
MADV_COLD and MADV_PAGEOUT. Those are simply out of the scope of this patch
series, though.

Patches Sequence
================

The first patch defines a new data structure for managing the information
that is required for batched tlb flushes (mmu_gather and behavior), and
updates the code paths of the MADV_DONTNEED[_LOCKED] and MADV_FREE handling
internal logic to receive it.

The second patch batches tlb flushes for MADV_FREE handling for both
madvise() and process_madvise().

The remaining two patches are for MADV_DONTNEED[_LOCKED] tlb flushes
batching. The third patch splits zap_page_range_single() for batching of
MADV_DONTNEED[_LOCKED] handling. The fourth patch batches tlb flushes for
the hint, using the sub-logic that the third patch splits out and the
helpers for batched tlb flushes that the second patch introduces for the
MADV_FREE case.

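As a rough illustration of the gather-then-flush idea described above, the
following userspace toy model counts how many flushes each approach would
issue. It is only a sketch and not the kernel code: struct toy_gather,
struct toy_range and all toy_*/advise_* functions are hypothetical names
made up for this example, and the "flush" is just a counter increment.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's mmu_gather; NOT the real struct. */
struct toy_gather {
	size_t nr_gathered;	/* ranges gathered since the last flush */
	size_t nr_flushes;	/* number of flushes issued so far */
};

/* Hypothetical address range, similar in spirit to one iovec entry. */
struct toy_range {
	unsigned long start;
	unsigned long len;
};

/* Record that a range needs flushing, without flushing yet. */
static void toy_gather_range(struct toy_gather *tlb, const struct toy_range *r)
{
	(void)r;	/* the toy model only counts ranges */
	tlb->nr_gathered++;
}

/* Flush everything gathered so far; models the expensive step. */
static void toy_flush(struct toy_gather *tlb)
{
	if (!tlb->nr_gathered)
		return;
	tlb->nr_flushes++;
	tlb->nr_gathered = 0;
}

/* Unbatched model: the per-range logic gathers and flushes on its own. */
static size_t advise_unbatched(const struct toy_range *ranges, size_t n)
{
	struct toy_gather tlb = { 0, 0 };
	size_t i;

	for (i = 0; i < n; i++) {
		toy_gather_range(&tlb, &ranges[i]);
		toy_flush(&tlb);
	}
	return tlb.nr_flushes;
}

/* Batched model: the entry level owns the gather object and flushes once. */
static size_t advise_batched(const struct toy_range *ranges, size_t n)
{
	struct toy_gather tlb = { 0, 0 };
	size_t i;

	for (i = 0; i < n; i++)
		toy_gather_range(&tlb, &ranges[i]);
	toy_flush(&tlb);
	return tlb.nr_flushes;
}
```

In this model, four ranges cost four flushes with advise_unbatched() and one
flush with advise_batched(). The real cost of a tlb flush and the real
gathering logic are of course far more involved than a counter.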
Test Results
============

I measured the latency to apply MADV_DONTNEED advice to 256 MiB memory using
multiple process_madvise() calls. I applied the advice at 4 KiB region
granularity, but with varying batch size per process_madvise() call (vlen)
from 1 to 1024. The source code for the measurement is available at
GitHub[1]. To reduce measurement errors, I did the measurement five times.

The measurement results are as below. The 'sz_batch' column shows the batch
size of process_madvise() calls. The 'Before' and 'After' columns show the
average of latencies in nanoseconds that were measured five times on kernels
built without and with the tlb flushes batching of this series (patches 3
and 4), respectively. For the baseline, the mm-new tree of 2025-04-09[2] has
been used, after reverting the second version of this patch series and
adding a temporary fix for a !CONFIG_DEBUG_VM build failure[3]. The
'B-stdev' and 'A-stdev' columns show the ratios of the latency measurements'
standard deviation to the average, in percent, for 'Before' and 'After',
respectively. 'Latency_reduction' shows the reduction of the latency that
'After' has achieved compared to 'Before', in percent. Higher
'Latency_reduction' values mean more efficiency improvements.

    sz_batch  Before       B-stdev  After        A-stdev  Latency_reduction
    1         146386348    2.78     111327360.6  3.13     23.95
    2         108222130    1.54     72131173.6   2.39     33.35
    4         93617846.8   2.76     51859294.4   2.50     44.61
    8         80555150.4   2.38     44328790     1.58     44.97
    16        77272777     1.62     37489433.2   1.16     51.48
    32        76478465.2   2.75     33570506     3.48     56.10
    64        75810266.6   1.15     27037652.6   1.61     64.34
    128       73222748     3.86     25517629.4   3.30     65.15
    256       72534970.8   2.31     25002180.4   0.94     65.53
    512       71809392     5.12     24152285.4   2.41     66.37
    1024      73281170.2   4.53     24183615     2.09     67.00

Unexpectedly, the latency has reduced (improved) even with batch size one.
I think some compiler optimizations have affected that, as was also observed
with the first version of this patch series.
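
For readers who want to recompute the 'Latency_reduction' column, the
definition given above (reduction of 'After' relative to 'Before', in
percent) can be sketched as below. The function name is hypothetical and
this is not part of the measurement program[1].

```c
/*
 * Percentage by which after_ns is lower than before_ns.
 * Hypothetical helper for illustration; before_ns must be non-zero.
 */
static double latency_reduction_pct(double before_ns, double after_ns)
{
	return (before_ns - after_ns) / before_ns * 100.0;
}
```

With the sz_batch 1 row, latency_reduction_pct(146386348, 111327360.6)
gives about 23.95, which matches the table.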
Because of the batch-size-one effect noted above, please focus on the
proportion between the improvement and the batch size. As expected, tlb
flushes batching provides latency reduction that increases with the batch
size. The efficiency gain ranges from about 33 percent with batch size 2 up
to 67 percent with batch size 1,024.

Please note that this is a very simple microbenchmark, so the real
efficiency gain on real workloads could be very different.

Changelog
=========

Changes from v2
(https://lore.kernel.org/20250404210700.2156-1-sj@kernel.org)
- Fix typos on cover letter
- Rename madvise_behavior pointers to madv_behavior
- Rename notify_unmap_single_vma() to zap_page_range_single_batched()
- Add a sanity check of tlb parameter to zap_page_range_single_batched()
- Add missed full stop of a comment
- Add details to MADV_DONTNEED tlb flush batching commit message
- Collect Reviewed-by: from Lorenzo for the second patch

Changes from v1
(https://lore.kernel.org/20250310172318.653630-1-sj@kernel.org)
- Split code cleanup part out
- Keep the order between tlb flushes and hugetlb_zap_end()
- Put mm/memory change just before its real usage
- Add VM_WARN_ON_ONCE() for invalid tlb argument to unmap_vma_single()
- Cleanups following nice reviewers suggestions

Changes from RFC
(https://lore.kernel.org/20250305181611.54484-1-sj@kernel.org)
- Clarify motivation of the change on the cover letter
- Add average and stdev of evaluation results
- Show latency reduction on evaluation results
- Fix !CONFIG_MEMORY_FAILURE build error
- Rename is_memory_populate() to is_madvise_populate()
- Squash patches 5-8
- Add kerneldoc for unmap_vm_area_struct()
- Squash patches 10 and 11
- Squash patches 12-14
- Squash patches 15 and 16

References
==========

[1] https://github.com/sjp38/eval_proc_madvise
[2] commit 3923d30a2d51 ("mm-mempolicy-support-memory-hotplug-in-weighted-interleave-checkpatch-fixes") # mm-new
[3] https://lore.kernel.org/20250409165452.305371-1-sj@kernel.org

SeongJae Park (4):
  mm/madvise: define and use madvise_behavior struct for
    madvise_do_behavior()
  mm/madvise: batch tlb flushes for MADV_FREE
  mm/memory: split non-tlb flushing part from zap_page_range_single()
  mm/madvise: batch tlb flushes for MADV_DONTNEED[_LOCKED]

 mm/internal.h |   3 ++
 mm/madvise.c  | 101 ++++++++++++++++++++++++++++++++++++++------------
 mm/memory.c   |  47 ++++++++++++++++++-----
 3 files changed, 118 insertions(+), 33 deletions(-)

base-commit: 5d1eb3ed3b3aee67f6d1bda64ef710bfcf52f342