From patchwork Fri Jul 12 23:29:56 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yu Zhao <yuzhao@google.com>
X-Patchwork-Id: 13732229
Date: Fri, 12 Jul 2024 17:29:56 -0600
X-Mailer: git-send-email 2.45.2.993.g49e7a77208-goog
Message-ID: <20240712232956.1427127-1-yuzhao@google.com>
Subject: [PATCH mm-unstable v1] mm/mglru: fix ineffective protection calculation
From: Yu Zhao <yuzhao@google.com>
To: Andrew Morton
Cc: "T. J. Mercier", linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    Yu Zhao <yuzhao@google.com>, stable@vger.kernel.org
Mercier" , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Yu Zhao , stable@vger.kernel.org X-Rspamd-Queue-Id: E8D03100003 X-Stat-Signature: 55x8pxf7aprfhjpw7dy9qebhfuau148r X-Rspamd-Server: rspam09 X-Rspam-User: X-HE-Tag: 1720827001-275043 X-HE-Meta: U2FsdGVkX19N9BDn03J/Ceg4rsWzTo8osr/HHAtV9w4GM3QqqZIeIIBPb/rWPOrwP3XGs30RnLu5PiLmPQiuHPmZV+IQhmtTE737xKC437GemyaVpjYcYVrhSXO9fRj05aJDWinYVDxBccklH5xVJ7JWcYEXtMAXsKwikeulIga8fYuDOmqgTvgBviUYpwySVWGb2qEKo+RPu5G4iAFK4rBAvBpt8Cnsw6nD2O4Lo67idu8QMXLQ5ZYOKsmuJQFO0iETqvBEfKPWx0bGCJ5kWyC9+uHBBAdz87WWMBGmhWuVCKxZatYkBReNrfuPPekRsHAKUFZQyawZDpJiFt1bFNlE5e+b+Y/iCrGphSrLhHVmv4KWrNT+Lv4fulNNBgITnHoXzLvAJ6tS1JSX9xWKUu/H3q9oVECtWZt57VF8uXMcz67NwYsvfXAb3wCS1J3fd143hNgNySQ36ycxJYfoKbJfM3V5qUdzXZQThvsz+M4gK8M6TeXTOl4B50+HWMeAVhivNtbU91QF0bUpwqCcLVomSqQVF/tU79rqfwFy4DR+8FL522E3SUBp2b6gs7QOKz7gzAq8WD86giWhr3ajknHdk+g+UvSIKYarg2rwm6vs+YvkfMUKm3tJ5XscOz7NY4luyO6xz21nDeeTKZpP8MQ6p5GKI5HlLAMhZjf8HoLzBCF31b74WPYp2bnDRREOr60xsg4wAJDu+EVyIRUkSda0cEdT7ygxqZEHHDa1qEtPKZdoFSC8NzxkonLmv14He3orCxbNaR00bPyUMYKq1DrV3pRNBstCcXNlb3qXhJTCiaRBkg+3ahJGyvLqUFSmszv8FGAiwos+JL3ni8gXmxdA7LiiQYnMUXq+FB+CHbxF5f7kPqRKJG6jESMVCZgaKwtSYC4MYXN0YiaG3kIGmhASTyX3vh+W5khZXUJDD2xf5/JnzRhXxfEGPt3QIKFOePsQnquTAMZ1Z8iAnRB NxU9ih1y P0vgu6O6ultnEVQsz2Roz9JpeF76EjAVYr8AZV4qiX4dWdqXpZ92iPy9kwgK6aXpXWSIujIshIbY8ivYgIlATwdlPjq2Ra50BwdCNpm8/HA/cFeGtskJ/LRhyAnJorVJ1aTSHVaLxHuCQfzWhK8Jy/m/1Kpj13o0BGMYSooUSEHif88WScV9D84sk1qrBPQOHEqvYcbt4zncYR+DlMASyY+IYoEu8DVFhnkctb1XgQMap56BKVoeA2WL6p+stUbmOYQzI+P165I9U3Jd+6gbGotgVFZPPV80h1nuRM8ir6Xn4K2WfMi2WdcydvW9qYs+Bgy3OqF1VtGF9dbEv2m9kkMAsYZlRKIZIkrN2f1n215PUeh4edYpKbb+YgV1r3/2s3iCWYK3b3CIzN+xVD/oMMPZ1IIUNPQSV9YIfXI8Uur1g/nM2nWuTuDxdSOJ7Adbmzc9rLNd7q0eIIHT97bjkUPqN6T5RP5tMa3hbhf6H2wRbL+Vzd25C0984dg4C/+jlgnOu2i9AwOzXUEVRU1OG7Aw6W9/3rKFIh80zFfXSTz+IY6sdtPSw5Y9ye7RO0rN2z55V X-Bogosity: Ham, tests=bogofilter, spamicity=0.000045, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: mem_cgroup_calculate_protection() is not stateless and should only be used as part of a top-down tree traversal. shrink_one() traverses the per-node memcg LRU instead of the root_mem_cgroup tree, and therefore it should not call mem_cgroup_calculate_protection(). The existing misuse in shrink_one() can cause ineffective protection of sub-trees that are grandchildren of root_mem_cgroup. Fix it by reusing lru_gen_age_node(), which already traverses the root_mem_cgroup tree, to calculate the protection. Previously lru_gen_age_node() opportunistically skips the first pass, i.e., when scan_control->priority is DEF_PRIORITY. On the second pass, lruvec_is_sizable() uses appropriate scan_control->priority, set by set_initial_priority() from lru_gen_shrink_node(), to decide whether a memcg is too small to reclaim from. Now lru_gen_age_node() unconditionally traverses the root_mem_cgroup tree. So it should call set_initial_priority() upfront, to make sure lruvec_is_sizable() uses appropriate scan_control->priority on the first pass. Otherwise, lruvec_is_reclaimable() can return false negatives and result in premature OOM kills when min_ttl_ms is used. Reported-by: T.J. 
Mercier Fixes: e4dde56cd208 ("mm: multi-gen LRU: per-node lru_gen_folio lists") Cc: stable@vger.kernel.org Signed-off-by: Yu Zhao --- mm/vmscan.c | 86 +++++++++++++++++++++++++---------------------------- 1 file changed, 40 insertions(+), 46 deletions(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index 6216d79edb7f..525d3ffa8451 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -3915,6 +3915,32 @@ static bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long seq, * working set protection ******************************************************************************/ +static void set_initial_priority(struct pglist_data *pgdat, struct scan_control *sc) +{ + int priority; + unsigned long reclaimable; + + if (sc->priority != DEF_PRIORITY || sc->nr_to_reclaim < MIN_LRU_BATCH) + return; + /* + * Determine the initial priority based on + * (total >> priority) * reclaimed_to_scanned_ratio = nr_to_reclaim, + * where reclaimed_to_scanned_ratio = inactive / total. + */ + reclaimable = node_page_state(pgdat, NR_INACTIVE_FILE); + if (can_reclaim_anon_pages(NULL, pgdat->node_id, sc)) + reclaimable += node_page_state(pgdat, NR_INACTIVE_ANON); + + /* round down reclaimable and round up sc->nr_to_reclaim */ + priority = fls_long(reclaimable) - 1 - fls_long(sc->nr_to_reclaim - 1); + + /* + * The estimation is based on LRU pages only, so cap it to prevent + * overshoots of shrinker objects by large margins. + */ + sc->priority = clamp(priority, DEF_PRIORITY / 2, DEF_PRIORITY); +} + static bool lruvec_is_sizable(struct lruvec *lruvec, struct scan_control *sc) { int gen, type, zone; @@ -3948,19 +3974,17 @@ static bool lruvec_is_reclaimable(struct lruvec *lruvec, struct scan_control *sc struct mem_cgroup *memcg = lruvec_memcg(lruvec); DEFINE_MIN_SEQ(lruvec); + if (mem_cgroup_below_min(NULL, memcg)) + return false; + + if (!lruvec_is_sizable(lruvec, sc)) + return false; + /* see the comment on lru_gen_folio */ gen = lru_gen_from_seq(min_seq[LRU_GEN_FILE]); birth = READ_ONCE(lruvec->lrugen.timestamps[gen]); - if (time_is_after_jiffies(birth + min_ttl)) - return false; - - if (!lruvec_is_sizable(lruvec, sc)) - return false; - - mem_cgroup_calculate_protection(NULL, memcg); - - return !mem_cgroup_below_min(NULL, memcg); + return time_is_before_jiffies(birth + min_ttl); } /* to protect the working set of the last N jiffies */ @@ -3970,23 +3994,20 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc) { struct mem_cgroup *memcg; unsigned long min_ttl = READ_ONCE(lru_gen_min_ttl); + bool reclaimable = !min_ttl; VM_WARN_ON_ONCE(!current_is_kswapd()); - /* check the order to exclude compaction-induced reclaim */ - if (!min_ttl || sc->order || sc->priority == DEF_PRIORITY) - return; + set_initial_priority(pgdat, sc); memcg = mem_cgroup_iter(NULL, NULL, NULL); do { struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat); - if (lruvec_is_reclaimable(lruvec, sc, min_ttl)) { - mem_cgroup_iter_break(NULL, memcg); - return; - } + mem_cgroup_calculate_protection(NULL, memcg); - cond_resched(); + if (!reclaimable) + reclaimable = lruvec_is_reclaimable(lruvec, sc, min_ttl); } while ((memcg = mem_cgroup_iter(NULL, memcg, NULL))); /* @@ -3994,7 +4015,7 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc) * younger than min_ttl. However, another possibility is all memcgs are * either too small or below min. 
 	 */
-	if (mutex_trylock(&oom_lock)) {
+	if (!reclaimable && mutex_trylock(&oom_lock)) {
 		struct oom_control oc = {
 			.gfp_mask = sc->gfp_mask,
 		};
@@ -4786,8 +4807,7 @@ static int shrink_one(struct lruvec *lruvec, struct scan_control *sc)
 	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
 
-	mem_cgroup_calculate_protection(NULL, memcg);
-
+	/* lru_gen_age_node() called mem_cgroup_calculate_protection() */
 	if (mem_cgroup_below_min(NULL, memcg))
 		return MEMCG_LRU_YOUNG;
 
@@ -4911,32 +4931,6 @@ static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
 	blk_finish_plug(&plug);
 }
 
-static void set_initial_priority(struct pglist_data *pgdat, struct scan_control *sc)
-{
-	int priority;
-	unsigned long reclaimable;
-
-	if (sc->priority != DEF_PRIORITY || sc->nr_to_reclaim < MIN_LRU_BATCH)
-		return;
-	/*
-	 * Determine the initial priority based on
-	 * (total >> priority) * reclaimed_to_scanned_ratio = nr_to_reclaim,
-	 * where reclaimed_to_scanned_ratio = inactive / total.
-	 */
-	reclaimable = node_page_state(pgdat, NR_INACTIVE_FILE);
-	if (can_reclaim_anon_pages(NULL, pgdat->node_id, sc))
-		reclaimable += node_page_state(pgdat, NR_INACTIVE_ANON);
-
-	/* round down reclaimable and round up sc->nr_to_reclaim */
-	priority = fls_long(reclaimable) - 1 - fls_long(sc->nr_to_reclaim - 1);
-
-	/*
-	 * The estimation is based on LRU pages only, so cap it to prevent
-	 * overshoots of shrinker objects by large margins.
-	 */
-	sc->priority = clamp(priority, DEF_PRIORITY / 2, DEF_PRIORITY);
-}
-
 static void lru_gen_shrink_node(struct pglist_data *pgdat, struct scan_control *sc)
 {
 	struct blk_plug plug;