From: Barry Song <21cnbao@gmail.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: 21cnbao@gmail.com, baolin.wang@linux.alibaba.com, chrisl@kernel.org, david@redhat.com, ioworker0@gmail.com, kasong@tencent.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org, lorenzo.stoakes@oracle.com, ryan.roberts@arm.com, v-songbaohua@oppo.com, x86@kernel.org, ying.huang@intel.com, zhengtangquan@oppo.com
Subject: [PATCH v3 0/4] mm: batched unmap lazyfree large folios during reclamation
Date: Wed, 15 Jan 2025 16:38:04 +1300
Message-Id: <20250115033808.40641-1-21cnbao@gmail.com>
From: Barry Song

Commit 735ecdfaf4e8 ("mm/vmscan: avoid split lazyfree THP during shrink_folio_list()") prevents the splitting of MADV_FREE'd THP in madvise.c. However, those folios are still added to the deferred_split list in try_to_unmap_one() because we unmap their PTEs and remove their rmap entries one by one.

Firstly, this has made the following counter confusing:

	/sys/kernel/mm/transparent_hugepage/hugepages-<size>/stats/split_deferred

The split_deferred counter was originally designed to track operations such as partial unmap or madvise of large folios. In practice, however, most split_deferred cases arise from memory reclamation of aligned lazyfree mTHPs, as observed by Tangquan. This discrepancy makes the split_deferred counter highly misleading.

Secondly, this approach is slow because it requires iterating through each PTE of a large folio and removing the rmap one by one. In fact, all PTEs of a PTE-mapped large folio should be unmapped at once, and the entire folio should be removed from the rmap as a whole.

Thirdly, it increases the risk of a race condition in which lazyfree folios are incorrectly set back to swap-backed, because a speculative folio_get may occur in the shrinker's callback. Once the rmap for the first subpage has been removed, the folio is on the deferred_split list, so deferred_split_scan() may call folio_try_get(folio). If that happens while try_to_unmap_one() is still scanning the remaining PTEs of the folio (the 2nd through the nr_pages-th), the elevated reference count can make the "ref_count == 1 + map_count" check in try_to_unmap_one() fail, and the entire mTHP is then transitioned back to swap-backed:

	/*
	 * The only page refs must be one from isolation
	 * plus the rmap(s) (dropped by discard:).
	 */
	if (ref_count == 1 + map_count &&
	    (!folio_test_dirty(folio) ||
	     ...
	     (vma->vm_flags & VM_DROPPABLE))) {
		dec_mm_counter(mm, MM_ANONPAGES);
		goto discard;
	}

This patchset resolves the issue by marking only genuinely dirty folios as swap-backed, as suggested by David, and by switching try_to_unmap_one() to batched unmapping of entire folios. Consequently, the deferred_split count drops to zero, and memory reclamation performance improves significantly: reclaiming 64KiB lazyfree large folios is now 2.5x faster (the specific data is in the changelog of patch 3/4).

By the way, while the patchset primarily targets PTE-mapped large folios, Baolin and Lance also found that try_to_unmap_one() handles redirtied lazyfree PMD-mapped large folios inefficiently: it splits the PMD into PTEs and iterates over them. This patchset removes the unnecessary splitting, so redirtied PMD-mapped large folios are skipped 3.5x faster during memory reclamation (the specific data is in the changelog of patch 4/4).

-v3:
 * collect Reviewed-by and Acked-by tags from Baolin, David, Lance and Will, thanks!
 * refine pmd-mapped THP lazyfree code per Baolin and Lance;
 * refine tlbbatch deferred flushing range support code per David.

-v2: https://lore.kernel.org/linux-mm/20250113033901.68951-1-21cnbao@gmail.com/
 * describe the background and problems more clearly in the cover letter per Lorenzo Stoakes;
 * also handle redirtied pmd-mapped large folios per Baolin and Lance;
 * handle some corner cases such as HWPOISON and pte_unused;
 * fix riscv and x86 build issues.

-v1: https://lore.kernel.org/linux-mm/20250106031711.82855-1-21cnbao@gmail.com/

Barry Song (4):
  mm: Set folio swapbacked iff folios are dirty in try_to_unmap_one
  mm: Support tlbbatch flush for a range of PTEs
  mm: Support batched unmap for lazyfree large folios during reclamation
  mm: Avoid splitting pmd for lazyfree pmd-mapped THP in try_to_unmap

 arch/arm64/include/asm/tlbflush.h |  25 +++----
 arch/arm64/mm/contpte.c           |   2 +-
 arch/riscv/include/asm/tlbflush.h |   5 +-
 arch/riscv/mm/tlbflush.c          |   5 +-
 arch/x86/include/asm/tlbflush.h   |   5 +-
 mm/huge_memory.c                  |  24 +++++--
 mm/rmap.c                         | 115 ++++++++++++++++++++----------
 7 files changed, 117 insertions(+), 64 deletions(-)