From patchwork Fri Apr 4 14:11:18 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Koichiro Den <koichiro.den@canonical.com>
X-Patchwork-Id: 14038523
From: Koichiro Den <koichiro.den@canonical.com>
To: linux-mm@kvack.org
Cc: akpm@linux-foundation.org, yuzhao@google.com, linux-kernel@vger.kernel.org
Subject: [PATCH] mm: vmscan: apply proportional reclaim pressure for memcg when MGLRU is enabled
Date: Fri, 4 Apr 2025 23:11:18 +0900
Message-ID: <20250404141118.3895592-1-koichiro.den@canonical.com>
X-Mailer: git-send-email 2.45.2

The scan implementation for MGLRU was missing proportional reclaim
pressure for memcg, which contradicts the description in
Documentation/admin-guide/cgroup-v2.rst (memory.{low,min} section).

This issue was revealed by the LTP memcontrol03 [1] test case. The
following example output from a local test environment with no NUMA
shows that, prior to this patch, proportional protection was not working:

* Without this patch (MGLRU enabled):
  $ sudo LTP_SINGLE_FS_TYPE=xfs ./memcontrol03
    ...
    memcontrol03.c:214: TPASS: Expect: (A/B/C memory.current=25964544) ~= 34603008
    memcontrol03.c:216: TPASS: Expect: (A/B/D memory.current=26038272) ~= 17825792
    ...

* With this patch (MGLRU enabled):
  $ sudo LTP_SINGLE_FS_TYPE=xfs ./memcontrol03
    ...
    memcontrol03.c:214: TPASS: Expect: (A/B/C memory.current=29327360) ~= 34603008
    memcontrol03.c:216: TPASS: Expect: (A/B/D memory.current=23748608) ~= 17825792
    ...

* When MGLRU is disabled:
  $ sudo LTP_SINGLE_FS_TYPE=xfs ./memcontrol03
    ...
    memcontrol03.c:214: TPASS: Expect: (A/B/C memory.current=28819456) ~= 34603008
    memcontrol03.c:216: TPASS: Expect: (A/B/D memory.current=24018944) ~= 17825792
    ...

Note that the test reports TPASS in all three cases because its pass
criteria are lenient. Even with this patch, or when MGLRU is disabled,
the results above deviate slightly from the expected values; this is
due to the relatively small memory usage compared to the
>> DEF_PRIORITY adjustment.

Factor out the proportioning logic to a new function and have MGLRU
reuse it.

[1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/controllers/memcg/memcontrol03.c

Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
---
 mm/vmscan.c | 148 +++++++++++++++++++++++++++-------------------------
 1 file changed, 78 insertions(+), 70 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index b620d74b0f66..c594d8264938 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2467,6 +2467,69 @@ static inline void calculate_pressure_balance(struct scan_control *sc,
 	*denominator = ap + fp;
 }
 
+static unsigned long apply_proportional_protection(struct mem_cgroup *memcg,
+		struct scan_control *sc, unsigned long scan)
+{
+	unsigned long min, low;
+
+	mem_cgroup_protection(sc->target_mem_cgroup, memcg, &min, &low);
+
+	if (min || low) {
+		/*
+		 * Scale a cgroup's reclaim pressure by proportioning
+		 * its current usage to its memory.low or memory.min
+		 * setting.
+		 *
+		 * This is important, as otherwise scanning aggression
+		 * becomes extremely binary -- from nothing as we
+		 * approach the memory protection threshold, to totally
+		 * nominal as we exceed it. This results in requiring
+		 * setting extremely liberal protection thresholds. It
+		 * also means we simply get no protection at all if we
+		 * set it too low, which is not ideal.
+		 *
+		 * If there is any protection in place, we reduce scan
+		 * pressure by how much of the total memory used is
+		 * within protection thresholds.
+		 *
+		 * There is one special case: in the first reclaim pass,
+		 * we skip over all groups that are within their low
+		 * protection. If that fails to reclaim enough pages to
+		 * satisfy the reclaim goal, we come back and override
+		 * the best-effort low protection. However, we still
+		 * ideally want to honor how well-behaved groups are in
+		 * that case instead of simply punishing them all
+		 * equally. As such, we reclaim them based on how much
+		 * memory they are using, reducing the scan pressure
+		 * again by how much of the total memory used is under
+		 * hard protection.
+		 */
+		unsigned long cgroup_size = mem_cgroup_size(memcg);
+		unsigned long protection;
+
+		/* memory.low scaling, make sure we retry before OOM */
+		if (!sc->memcg_low_reclaim && low > min) {
+			protection = low;
+			sc->memcg_low_skipped = 1;
+		} else {
+			protection = min;
+		}
+
+		/* Avoid TOCTOU with earlier protection check */
+		cgroup_size = max(cgroup_size, protection);
+
+		scan -= scan * protection / (cgroup_size + 1);
+
+		/*
+		 * Minimally target SWAP_CLUSTER_MAX pages to keep
+		 * reclaim moving forwards, avoiding decrementing
+		 * sc->priority further than desirable.
+		 */
+		scan = max(scan, SWAP_CLUSTER_MAX);
+	}
+	return scan;
+}
+
 /*
  * Determine how aggressively the anon and file LRU lists should be
  * scanned.
@@ -2537,70 +2600,10 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
 	for_each_evictable_lru(lru) {
 		bool file = is_file_lru(lru);
 		unsigned long lruvec_size;
-		unsigned long low, min;
 		unsigned long scan;
 
 		lruvec_size = lruvec_lru_size(lruvec, lru, sc->reclaim_idx);
-		mem_cgroup_protection(sc->target_mem_cgroup, memcg,
-				      &min, &low);
-
-		if (min || low) {
-			/*
-			 * Scale a cgroup's reclaim pressure by proportioning
-			 * its current usage to its memory.low or memory.min
-			 * setting.
-			 *
-			 * This is important, as otherwise scanning aggression
-			 * becomes extremely binary -- from nothing as we
-			 * approach the memory protection threshold, to totally
-			 * nominal as we exceed it. This results in requiring
-			 * setting extremely liberal protection thresholds. It
-			 * also means we simply get no protection at all if we
-			 * set it too low, which is not ideal.
-			 *
-			 * If there is any protection in place, we reduce scan
-			 * pressure by how much of the total memory used is
-			 * within protection thresholds.
-			 *
-			 * There is one special case: in the first reclaim pass,
-			 * we skip over all groups that are within their low
-			 * protection. If that fails to reclaim enough pages to
-			 * satisfy the reclaim goal, we come back and override
-			 * the best-effort low protection. However, we still
-			 * ideally want to honor how well-behaved groups are in
-			 * that case instead of simply punishing them all
-			 * equally. As such, we reclaim them based on how much
-			 * memory they are using, reducing the scan pressure
-			 * again by how much of the total memory used is under
-			 * hard protection.
-			 */
-			unsigned long cgroup_size = mem_cgroup_size(memcg);
-			unsigned long protection;
-
-			/* memory.low scaling, make sure we retry before OOM */
-			if (!sc->memcg_low_reclaim && low > min) {
-				protection = low;
-				sc->memcg_low_skipped = 1;
-			} else {
-				protection = min;
-			}
-
-			/* Avoid TOCTOU with earlier protection check */
-			cgroup_size = max(cgroup_size, protection);
-
-			scan = lruvec_size - lruvec_size * protection /
-				(cgroup_size + 1);
-
-			/*
-			 * Minimally target SWAP_CLUSTER_MAX pages to keep
-			 * reclaim moving forwards, avoiding decrementing
-			 * sc->priority further than desirable.
-			 */
-			scan = max(scan, SWAP_CLUSTER_MAX);
-		} else {
-			scan = lruvec_size;
-		}
-
+		scan = apply_proportional_protection(memcg, sc, lruvec_size);
 		scan >>= sc->priority;
 
 		/*
@@ -4521,8 +4524,9 @@ static bool isolate_folio(struct lruvec *lruvec, struct folio *folio, struct sca
 	return true;
 }
 
-static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
-		       int type, int tier, struct list_head *list)
+static int scan_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
+		       struct scan_control *sc, int type, int tier,
+		       struct list_head *list)
 {
 	int i;
 	int gen;
@@ -4531,7 +4535,7 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
 	int scanned = 0;
 	int isolated = 0;
 	int skipped = 0;
-	int remaining = MAX_LRU_BATCH;
+	int remaining = min(nr_to_scan, MAX_LRU_BATCH);
 	struct lru_gen_folio *lrugen = &lruvec->lrugen;
 	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
 
@@ -4642,7 +4646,8 @@ static int get_type_to_scan(struct lruvec *lruvec, int swappiness)
 	return positive_ctrl_err(&sp, &pv);
 }
 
-static int isolate_folios(struct lruvec *lruvec, struct scan_control *sc, int swappiness,
+static int isolate_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
+			  struct scan_control *sc, int swappiness,
 			  int *type_scanned, struct list_head *list)
 {
 	int i;
@@ -4654,7 +4659,7 @@ static int isolate_folios(struct lruvec *lruvec, struct scan_control *sc, int sw
 
 		*type_scanned = type;
 
-		scanned = scan_folios(lruvec, sc, type, tier, list);
+		scanned = scan_folios(nr_to_scan, lruvec, sc, type, tier, list);
 		if (scanned)
 			return scanned;
 
@@ -4664,7 +4669,8 @@ static int isolate_folios(struct lruvec *lruvec, struct scan_control *sc, int sw
 	return 0;
 }
 
-static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swappiness)
+static int evict_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
+			struct scan_control *sc, int swappiness)
 {
 	int type;
 	int scanned;
@@ -4683,7 +4689,7 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
 
 	spin_lock_irq(&lruvec->lru_lock);
 
-	scanned = isolate_folios(lruvec, sc, swappiness, &type, &list);
+	scanned = isolate_folios(nr_to_scan, lruvec, sc, swappiness, &type, &list);
 
 	scanned += try_to_inc_min_seq(lruvec, swappiness);
 
@@ -4804,6 +4810,8 @@ static long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *sc, int s
 	if (nr_to_scan && !mem_cgroup_online(memcg))
 		return nr_to_scan;
 
+	nr_to_scan = apply_proportional_protection(memcg, sc, nr_to_scan);
+
 	/* try to get away with not aging at the default priority */
 	if (!success || sc->priority == DEF_PRIORITY)
 		return nr_to_scan >> sc->priority;
@@ -4856,7 +4864,7 @@ static bool try_to_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
 		if (nr_to_scan <= 0)
 			break;
 
-		delta = evict_folios(lruvec, sc, swappiness);
+		delta = evict_folios(nr_to_scan, lruvec, sc, swappiness);
 		if (!delta)
 			break;
 
@@ -5477,7 +5485,7 @@ static int run_eviction(struct lruvec *lruvec, unsigned long seq, struct scan_co
 		if (sc->nr_reclaimed >= nr_to_reclaim)
 			return 0;
 
-		if (!evict_folios(lruvec, sc, swappiness))
+		if (!evict_folios(MAX_LRU_BATCH, lruvec, sc, swappiness))
 			return 0;
 
 		cond_resched();
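
As a rough illustration of the proportioning logic that this patch
factors out, the user-space C sketch below models
apply_proportional_protection() on plain page counts. It is not kernel
code: the helper names model_protection() and model_shift() are
hypothetical, the memcg lookups and struct scan_control are replaced by
explicit arguments, and the values used for SWAP_CLUSTER_MAX (32) and
DEF_PRIORITY (12) are assumed from mainline headers rather than taken
from this patch.

```c
#include <assert.h>

/* Assumed values, copied from mainline headers; not defined by this patch. */
#define SWAP_CLUSTER_MAX 32UL
#define DEF_PRIORITY 12

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/*
 * Hypothetical user-space model of apply_proportional_protection().
 * All sizes are in pages. low_reclaim stands in for
 * sc->memcg_low_reclaim; the sc->memcg_low_skipped side effect is
 * not modelled.
 */
static unsigned long model_protection(unsigned long scan,
				      unsigned long cgroup_size,
				      unsigned long min, unsigned long low,
				      int low_reclaim)
{
	unsigned long protection;

	if (!min && !low)
		return scan;	/* no protection configured: scan unchanged */

	/* First pass honours memory.low; the retry pass falls back to memory.min. */
	protection = (!low_reclaim && low > min) ? low : min;

	/* Usage may have dropped below the protection since it was sampled. */
	cgroup_size = max_ul(cgroup_size, protection);

	/* Reduce pressure by the protected fraction of current usage. */
	scan -= scan * protection / (cgroup_size + 1);

	/* Keep a small floor so that reclaim still makes progress. */
	return max_ul(scan, SWAP_CLUSTER_MAX);
}

/* Model of the later "scan >> sc->priority" step. */
static unsigned long model_shift(unsigned long scan, int priority)
{
	assert(priority >= 0 && priority <= DEF_PRIORITY);
	return scan >> priority;
}
```

With these assumptions, model_protection(10000, 5000, 0, 2500, 0)
returns 5001, roughly half of the unprotected target, while a cgroup
that sits entirely within its protection (cgroup_size 2000, low 2500)
is clamped to the SWAP_CLUSTER_MAX floor of 32. Passing those results
through model_shift(..., DEF_PRIORITY) gives 1 and 0 pages, which seems
consistent with the commit message's remark that small memory usage is
dominated by the >> DEF_PRIORITY adjustment.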