From patchwork Fri Feb 23 04:15:50 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Barry Song <21cnbao@gmail.com>
X-Patchwork-Id: 13568567
From: Barry Song <21cnbao@gmail.com>
To: sj@kernel.org, akpm@linux-foundation.org, damon@lists.linux.dev,
 linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, minchan@kernel.org, mhocko@suse.com,
 hannes@cmpxchg.org, Barry Song
Subject: [PATCH RFC] mm: madvise: pageout: ignore references rather than
 clearing young
Date: Fri, 23 Feb 2024 17:15:50 +1300
Message-Id: <20240223041550.77157-1-21cnbao@gmail.com>
X-Mailer: git-send-email 2.34.1

From: Barry Song

While doing MADV_PAGEOUT, the current code clears the PTE young bit so
that vmscan does not see the young flag and the reclamation of the
madvised folios can go ahead. It seems we can achieve the same result by
simply ignoring references, which lets us drop the TLB flush in madvise
and the rmap overhead in vmscan.

Regarding side effects: with the original code, if a parallel thread
accesses the madvised memory while another thread is doing the madvise,
the folios get a chance to be re-activated by vmscan. With this patch,
they will still be reclaimed. But doing PAGEOUT and accessing the same
memory at the same time is quite silly, much like a DoS, so we probably
don't need to care.

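To make the behaviour change concrete, here is a deliberately simplified
userspace model of the reclaim decision. It is only a sketch, not kernel
code: struct toy_folio, toy_clear_young(), toy_should_reclaim() and
toy_self_check() are hypothetical names invented for this illustration,
and the real reference check in vmscan considers more state than a
single young bit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a mapped folio; not a kernel structure. */
struct toy_folio {
	bool young;	/* models the PTE young (accessed) bit */
};

/* Old approach: madvise clears the young bit before handing the folio over. */
static void toy_clear_young(struct toy_folio *f)
{
	f->young = false;
}

/*
 * Models the reclaim decision: a young folio is kept unless the caller
 * asks for references to be ignored.
 */
static bool toy_should_reclaim(const struct toy_folio *f, bool ignore_references)
{
	if (ignore_references)
		return true;
	return !f->young;
}

/* Walks through both approaches on a recently accessed folio. */
static void toy_self_check(void)
{
	struct toy_folio f = { .young = true };

	/* Without either approach, the young folio would be kept. */
	assert(!toy_should_reclaim(&f, false));

	/* New approach: leave the young bit alone and ignore references. */
	assert(toy_should_reclaim(&f, true));
	assert(f.young);

	/* Old approach: clear young first, then do the normal check. */
	toy_clear_young(&f);
	assert(toy_should_reclaim(&f, false));
}
```

In this model both approaches end with the folio being reclaimed; the
difference is that the new one never has to modify the young bit, which
in the kernel is what costs the TLB flush.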
A microbenchmark as below shows about a 6% reduction in the latency of
MADV_PAGEOUT:

#include <sys/mman.h>

#define PGSIZE 4096
#define SIZE (512 * 1024 * 1024)

int main(void)
{
	int i;
	volatile long *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	/* touch every page so it is populated before the pageout */
	for (i = 0; i < SIZE / sizeof(long); i += PGSIZE / sizeof(long))
		p[i] = 0x11;

	madvise((void *)p, SIZE, MADV_PAGEOUT);
	return 0;
}

w/o patch                    w/ patch
root@10:~# time ./a.out      root@10:~# time ./a.out
real    0m49.634s            real    0m46.334s
user    0m0.637s             user    0m0.648s
sys     0m47.434s            sys     0m44.265s

Signed-off-by: Barry Song
---
 mm/damon/paddr.c |  2 +-
 mm/internal.h    |  2 +-
 mm/madvise.c     |  8 ++++----
 mm/vmscan.c      | 12 +++++++-----
 4 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
index 081e2a325778..5e6dc312072c 100644
--- a/mm/damon/paddr.c
+++ b/mm/damon/paddr.c
@@ -249,7 +249,7 @@ static unsigned long damon_pa_pageout(struct damon_region *r, struct damos *s)
 put_folio:
 		folio_put(folio);
 	}
-	applied = reclaim_pages(&folio_list);
+	applied = reclaim_pages(&folio_list, false);
 	cond_resched();
 	return applied * PAGE_SIZE;
 }
diff --git a/mm/internal.h b/mm/internal.h
index 93e229112045..36c11ea41f47 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -868,7 +868,7 @@ extern unsigned long __must_check vm_mmap_pgoff(struct file *, unsigned long,
         unsigned long, unsigned long);
 
 extern void set_pageblock_order(void);
-unsigned long reclaim_pages(struct list_head *folio_list);
+unsigned long reclaim_pages(struct list_head *folio_list, bool ignore_references);
 unsigned int reclaim_clean_pages_from_list(struct zone *zone,
 					    struct list_head *folio_list);
 /* The ALLOC_WMARK bits are used as an index to zone->watermark */
diff --git a/mm/madvise.c b/mm/madvise.c
index abde3edb04f0..44a498c94158 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -386,7 +386,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 			return 0;
 		}
 
-		if (pmd_young(orig_pmd)) {
+		if (!pageout && pmd_young(orig_pmd)) {
 			pmdp_invalidate(vma, addr, pmd);
 			orig_pmd = pmd_mkold(orig_pmd);
 
@@ -410,7 +410,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 huge_unlock:
 		spin_unlock(ptl);
 		if (pageout)
-			reclaim_pages(&folio_list);
+			reclaim_pages(&folio_list, true);
 		return 0;
 	}
 
@@ -490,7 +490,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 
 		VM_BUG_ON_FOLIO(folio_test_large(folio), folio);
 
-		if (pte_young(ptent)) {
+		if (!pageout && pte_young(ptent)) {
 			ptent = ptep_get_and_clear_full(mm, addr, pte,
 							tlb->fullmm);
 			ptent = pte_mkold(ptent);
@@ -524,7 +524,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 		pte_unmap_unlock(start_pte, ptl);
 	}
 	if (pageout)
-		reclaim_pages(&folio_list);
+		reclaim_pages(&folio_list, true);
 	cond_resched();
 
 	return 0;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 402c290fbf5a..ba2f37f46a73 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2102,7 +2102,8 @@ static void shrink_active_list(unsigned long nr_to_scan,
 }
 
 static unsigned int reclaim_folio_list(struct list_head *folio_list,
-				      struct pglist_data *pgdat)
+				      struct pglist_data *pgdat,
+				      bool ignore_references)
 {
 	struct reclaim_stat dummy_stat;
 	unsigned int nr_reclaimed;
@@ -2115,7 +2116,7 @@ static unsigned int reclaim_folio_list(struct list_head *folio_list,
 		.no_demotion = 1,
 	};
 
-	nr_reclaimed = shrink_folio_list(folio_list, pgdat, &sc, &dummy_stat, false);
+	nr_reclaimed = shrink_folio_list(folio_list, pgdat, &sc, &dummy_stat, ignore_references);
 	while (!list_empty(folio_list)) {
 		folio = lru_to_folio(folio_list);
 		list_del(&folio->lru);
@@ -2125,7 +2126,7 @@ static unsigned int reclaim_folio_list(struct list_head *folio_list,
 	return nr_reclaimed;
 }
 
-unsigned long reclaim_pages(struct list_head *folio_list)
+unsigned long reclaim_pages(struct list_head *folio_list, bool ignore_references)
 {
 	int nid;
 	unsigned int nr_reclaimed = 0;
@@ -2147,11 +2148,12 @@ unsigned long reclaim_pages(struct list_head *folio_list)
 			continue;
 		}
 
-		nr_reclaimed += reclaim_folio_list(&node_folio_list, NODE_DATA(nid));
+		nr_reclaimed += reclaim_folio_list(&node_folio_list, NODE_DATA(nid),
+						   ignore_references);
 		nid = folio_nid(lru_to_folio(folio_list));
 	} while (!list_empty(folio_list));
 
-	nr_reclaimed += reclaim_folio_list(&node_folio_list, NODE_DATA(nid));
+	nr_reclaimed += reclaim_folio_list(&node_folio_list, NODE_DATA(nid), ignore_references);
 
 	memalloc_noreclaim_restore(noreclaim_flag);
 