From patchwork Mon Mar 10 17:23:09 2025
X-Patchwork-Submitter: SeongJae Park
X-Patchwork-Id: 14010441
From: SeongJae Park
To: Andrew Morton
Cc: SeongJae Park, "Liam R. Howlett", David Hildenbrand,
 Lorenzo Stoakes, Shakeel Butt, Vlastimil Babka,
 kernel-team@meta.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 0/9] mm/madvise: batch tlb flushes for MADV_DONTNEED and MADV_FREE
Date: Mon, 10 Mar 2025 10:23:09 -0700
Message-Id: <20250310172318.653630-1-sj@kernel.org>
When process_madvise() is called to do MADV_DONTNEED[_LOCKED] or
MADV_FREE with multiple address ranges, tlb flushes happen for each of
the given address ranges.  Because such tlb flushes are for the same
process, doing those in a batch is more efficient while still being
safe.  Modify the process_madvise() entry level code path to do such
batched tlb flushes, while the internal unmap logic does only the
gathering of the tlb entries to flush.

In more detail, modify the entry functions to initialize an mmu_gather
object and pass it to the internal logic.  Make the internal logic do
only the gathering of the tlb entries to flush, into the received
mmu_gather object.  After all internal function calls are done, the
entry functions flush the gathered tlb entries at once.

The inefficiency should be smaller in the madvise() use case, since it
receives only a single address range.  But if there are multiple vmas
for the range, the same problem can happen.  It is unclear whether such
a use case is common and the inefficiency is significant.  Make the
change for madvise(), too, since it doesn't really change madvise()
internal behavior, while it helps keep the code shared between
process_madvise() and madvise() internal logic clean.

Patches Sequence
================

The first four patches are minor cleanups of madvise.c for readability.

The fifth patch defines a new data structure for managing the
information required for batched tlb flushes (mmu_gather and behavior),
and updates the code paths for the MADV_DONTNEED[_LOCKED] and MADV_FREE
handling internal logic to receive it.

The sixth and seventh patches make the internal logic for handling
MADV_DONTNEED[_LOCKED] and MADV_FREE ready for batched tlb flushing.
The patches keep support for the unbatched tlb flushes use case, for
fine-grained and safe transitions.

The eighth patch updates madvise() and process_madvise() code to do the
batched tlb flushes, utilizing the changes introduced by the previous
patches.
The final, ninth patch removes the internal logic's support code for
the unbatched tlb flushes use case, which is no longer used.

Test Results
============

I measured the latency to apply MADV_DONTNEED advice to 256 MiB of
memory using multiple process_madvise() calls.  I apply the advice at a
4 KiB sized region granularity, but with varying batch size per
process_madvise() call (vlen) from 1 to 1024.  The source code for the
measurement is available at GitHub[1].  To reduce measurement errors, I
did the measurement five times.

The measurement results are as below.  The 'sz_batch' column shows the
batch size of the process_madvise() calls.  The 'Before' and 'After'
columns show the average of the latencies in nanoseconds, measured five
times on kernels built without and with the tlb flushes batching
patches of this series, respectively.  For the baseline, the
mm-unstable tree of 2025-03-07[2] has been used.  The 'B-stdev' and
'A-stdev' columns show the ratio of the latency measurements' standard
deviation to the average, in percent, for 'Before' and 'After',
respectively.  'Latency_reduction' shows the reduction of the latency
that the series has achieved, in percent.  Higher 'Latency_reduction'
values mean more efficiency improvement.

    sz_batch   Before        B-stdev   After         A-stdev   Latency_reduction
    1          128691595.4   6.09      106878698.4   2.76      16.95
    2          94046750.8    3.30      68691907      2.79      26.96
    4          80538496.8    5.35      50230673.8    5.30      37.63
    8          72672225.2    5.50      43918112      3.54      39.57
    16         66955104.4    4.25      36549941.2    1.62      45.41
    32         65814679      5.89      33060291      3.30      49.77
    64         65099205.2    2.69      26003756.4    1.56      60.06
    128        62591307.2    4.02      24008180.4    1.61      61.64
    256        64124368.6    2.93      23868856      2.20      62.78
    512        62325618      5.72      23311330.6    1.74      62.60
    1024       64802138.4    5.05      23651695.2    3.40      63.50

Interestingly, the latency has reduced (improved) even with batch size
1.  I think some compiler optimizations have affected that, as was also
observed with the previous process_madvise() mmap_lock optimization
patch series[3].
So, let's focus on the proportion between the improvement and the batch
size.  As expected, tlb flushes batching provides a latency reduction
that is proportional to the batch size.  The efficiency gain ranges
from about 27 percent with batch size 2 up to about 63 percent with
batch size 1,024.

Please note that this is a very simple microbenchmark, so the real
efficiency gain on real workloads could be very different.

Changes from RFC
(https://lore.kernel.org/20250305181611.54484-1-sj@kernel.org)
- Clarify the motivation of the change on the cover letter
- Add average and stdev of evaluation results
- Show latency reduction on evaluation results
- Fix !CONFIG_MEMORY_FAILURE build error
- Rename is_memory_populate() to is_madvise_populate()
- Squash patches 5-8
- Add kerneldoc for unmap_vm_area_struct()
- Squash patches 10 and 11
- Squash patches 12-14
- Squash patches 15 and 16

References
==========

[1] https://github.com/sjp38/eval_proc_madvise
[2] commit e664d7d28a7c ("selftest: test system mappings are sealed") # mm-unstable
[3] https://lore.kernel.org/20250211182833.4193-1-sj@kernel.org

SeongJae Park (9):
  mm/madvise: use is_memory_failure() from madvise_do_behavior()
  mm/madvise: split out populate behavior check logic
  mm/madvise: deduplicate madvise_do_behavior() skip case handlings
  mm/madvise: remove len parameter of madvise_do_behavior()
  mm/madvise: define and use madvise_behavior struct for
    madvise_do_behavior()
  mm/memory: split non-tlb flushing part from zap_page_range_single()
  mm/madvise: let madvise_{dontneed,free}_single_vma() caller batches
    tlb flushes
  mm/madvise: batch tlb flushes for
    [process_]madvise(MADV_{DONTNEED[_LOCKED],FREE})
  mm/madvise: remove !tlb support from
    madvise_{dontneed,free}_single_vma()

 mm/internal.h |   3 +
 mm/madvise.c  | 221 +++++++++++++++++++++++++++++++++-----------------
 mm/memory.c   |  38 ++++++---
 3 files changed, 176 insertions(+), 86 deletions(-)

base-commit: e993f5f5b0ac851cf60578cfee5488031dfaa80c