From: Barry Song <21cnbao@gmail.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: baolin.wang@linux.alibaba.com, chrisl@kernel.org, david@redhat.com, ioworker0@gmail.com, kasong@tencent.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, ryan.roberts@arm.com, v-songbaohua@oppo.com, x86@kernel.org, linux-riscv@lists.infradead.org, ying.huang@intel.com, zhengtangquan@oppo.com, lorenzo.stoakes@oracle.com
Subject: [PATCH v2 0/4] mm: batched unmap lazyfree large folios during reclamation
Date: Mon, 13 Jan 2025 16:38:57 +1300
Message-Id: <20250113033901.68951-1-21cnbao@gmail.com>
Commit 735ecdfaf4e8 ("mm/vmscan: avoid split lazyfree THP during shrink_folio_list()") prevents the splitting of MADV_FREE'd THP in madvise.c. However, those folios are still added to the deferred_split list in try_to_unmap_one() because we unmap PTEs and remove rmap entries one by one.

Firstly, this has made the following counter confusing:

	/sys/kernel/mm/transparent_hugepage/hugepages-size/stats/split_deferred

The split_deferred counter was originally designed to track operations such as partial unmap or madvise of large folios. In practice, however, most split_deferred events arise from memory reclamation of aligned lazyfree mTHPs, as observed by Tangquan. This discrepancy makes the split_deferred counter highly misleading.

Secondly, this approach is slow because it iterates through each PTE and removes the rmap entry one at a time for a large folio. All PTEs of a PTE-mapped large folio should instead be unmapped at once, and the entire folio should be removed from the rmap as a whole.

Thirdly, it increases the risk of a race in which lazyfree folios are incorrectly set back to swap-backed, because a speculative folio_get may occur in the shrinker's callback. Once the rmap for the first subpage has been removed, the folio is on the deferred_split list, so deferred_split_scan() may call folio_try_get(folio). If that happens while try_to_unmap_one() is still scanning the 2nd through nr_pages-th PTEs of the folio, the incremented reference count makes "ref_count == 1 + map_count" false, and the entire mTHP can be transitioned back to swap-backed:

	/*
	 * The only page refs must be one from isolation
	 * plus the rmap(s) (dropped by discard:).
	 */
	if (ref_count == 1 + map_count &&
	    (!folio_test_dirty(folio) ||
	     ...
	     (vma->vm_flags & VM_DROPPABLE))) {
		dec_mm_counter(mm, MM_ANONPAGES);
		goto discard;
	}

This patchset resolves the issue by marking only genuinely dirty folios as swap-backed, as suggested by David, and by switching try_to_unmap_one() to batched unmapping of entire folios. As a result, the deferred_split count drops to zero, and memory reclamation performance improves significantly: reclaiming 64KiB lazyfree large folios is now 2.5x faster (the detailed data is in the changelog of patch 3/4).

By the way, while the patchset primarily targets PTE-mapped large folios, Baolin and Lance also found that try_to_unmap_one() handles redirtied lazyfree PMD-mapped large folios inefficiently: it splits the PMD into PTEs and iterates over them. This patchset removes the unnecessary splitting, so redirtied PMD-mapped large folios are skipped 3.5x faster during memory reclamation (the detailed data is in the changelog of patch 4/4).

-v2:
 * describe the background and problems more clearly in the cover letter, per Lorenzo Stoakes;
 * also handle redirtied pmd-mapped large folios, per Baolin and Lance;
 * handle some corner cases such as HWPOISON and pte_unused;
 * fix riscv and x86 build issues.
-v1:
 https://lore.kernel.org/linux-mm/20250106031711.82855-1-21cnbao@gmail.com/

Barry Song (4):
  mm: Set folio swapbacked iff folios are dirty in try_to_unmap_one
  mm: Support tlbbatch flush for a range of PTEs
  mm: Support batched unmap for lazyfree large folios during reclamation
  mm: Avoid splitting pmd for lazyfree pmd-mapped THP in try_to_unmap

 arch/arm64/include/asm/tlbflush.h |  26 +++----
 arch/arm64/mm/contpte.c           |   2 +-
 arch/riscv/include/asm/tlbflush.h |   3 +-
 arch/riscv/mm/tlbflush.c          |   3 +-
 arch/x86/include/asm/tlbflush.h   |   3 +-
 mm/huge_memory.c                  |  17 ++++-
 mm/rmap.c                         | 112 ++++++++++++++++++++----------
 7 files changed, 111 insertions(+), 55 deletions(-)
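The race in the third point can be made concrete with a small user-space model. This is a sketch under stated assumptions, not kernel code: struct model_folio and the model_* functions are hypothetical; each mapped PTE is assumed to hold one folio reference, so an undisturbed folio satisfies ref_count == 1 + map_count; the VM_DROPPABLE case and the restoring of the current PTE on failure are ignored; and a single extra reference stands in for folio_try_get() in deferred_split_scan(). The batched path follows the behaviour described for patches 1 and 3: check once, clear all PTEs together, and set swap-backed only when the folio is dirty.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model of the lazyfree discard check in
 * try_to_unmap_one(). None of these names exist in the kernel.
 */
struct model_folio {
	int nr_pages;		/* PTEs mapping this large folio */
	int ref_count;		/* 1 (isolation) + one ref per mapped PTE */
	int map_count;		/* PTEs still mapped */
	bool dirty;		/* redirtied after MADV_FREE */
	bool swapbacked;	/* false while the folio is lazyfree */
	bool deferred_split;	/* queued on the deferred_split list */
};

static void model_init(struct model_folio *f, int nr_pages, bool dirty)
{
	assert(nr_pages > 0);
	f->nr_pages = nr_pages;
	f->map_count = nr_pages;
	f->ref_count = 1 + nr_pages;
	f->dirty = dirty;
	f->swapbacked = false;
	f->deferred_split = false;
}

/* The discard condition quoted above, minus the VM_DROPPABLE case. */
static bool model_can_discard(const struct model_folio *f)
{
	return f->ref_count == 1 + f->map_count && !f->dirty;
}

/*
 * Old scheme: check, clear one PTE and drop one rmap entry per iteration.
 * With @race set, the shrinker takes one speculative reference as soon as
 * the partially unmapped folio is on the deferred_split list.
 */
static bool model_unmap_per_pte(struct model_folio *f, bool race)
{
	bool raced = false;

	for (int i = 0; i < f->nr_pages; i++) {
		if (!model_can_discard(f)) {
			/* Old behaviour: swap-backed even if the folio is clean. */
			f->swapbacked = true;
			return false;
		}
		f->map_count--;
		f->ref_count--;
		if (f->map_count > 0)
			f->deferred_split = true;	/* partially mapped now */
		if (race && f->deferred_split && !raced) {
			f->ref_count++;		/* stands in for folio_try_get() */
			raced = true;
		}
	}
	return true;
}

/*
 * Batched scheme: one check, then all PTEs and the whole rmap go at once,
 * so the folio is never partially mapped and never queued. Only a dirty
 * folio is set back to swap-backed; an extra reference just aborts.
 */
static bool model_unmap_batched(struct model_folio *f)
{
	if (!model_can_discard(f)) {
		if (f->dirty)
			f->swapbacked = true;
		return false;
	}
	f->ref_count -= f->map_count;
	f->map_count = 0;
	return true;
}
```

With a clean 16-PTE folio, model_unmap_per_pte(&f, true) returns false and leaves the folio swap-backed, while model_unmap_batched(&f) discards it without ever queueing it for deferred split. A redirtied folio is still returned to swap-backed by the batched path, and a clean folio with an extra reference is left alone rather than being made swap-backed.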