From patchwork Thu Nov 25 15:18:53 2021
X-Patchwork-Submitter: Mel Gorman
X-Patchwork-Id: 12639367
From: Mel Gorman
To: Andrew Morton
Cc: Michal Hocko, Vlastimil Babka, Alexey Avramov, Rik van Riel,
 Mike Galbraith, Darrick Wong, regressions@lists.linux.dev,
 Linux-fsdevel, Linux-MM, LKML, Mel Gorman
Subject: [PATCH 1/1] mm: vmscan: Reduce throttling due to a failure to make progress
Date: Thu, 25 Nov 2021 15:18:53 +0000
Message-Id: <20211125151853.8540-1-mgorman@techsingularity.net>
X-Mailer: git-send-email 2.31.1

Mike Galbraith, Alexey Avramov and Darrick Wong all reported similar
problems due to reclaim throttling for excessive lengths of time.
In Alexey's case, a memory hog that should go OOM quickly stalls for
several minutes before OOM is triggered. In Mike and Darrick's cases, a
small memcg environment stalled excessively even though the system had
enough memory overall.

Commit 69392a403f49 ("mm/vmscan: throttle reclaim when no progress is
being made") introduced the problem although commit a19594ca4a8b
("mm/vmscan: increase the timeout if page reclaim is not making
progress") made it worse. Systems at or near an OOM state that cannot be
recovered must reach OOM quickly and memcg should kill tasks if a memcg
is near OOM.

To address this, only stall for the first zone in the zonelist, reduce
the timeout to 1 tick for VMSCAN_THROTTLE_NOPROGRESS and only stall if
the scan control nr_reclaimed is 0 and kswapd is still active. If kswapd
has stopped reclaiming due to excessive failures, do not stall at all so
that OOM triggers relatively quickly.
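The conditions described above can be sketched as a small user-space
model. This is an illustration only, not the kernel code: the enum and
the helper noprogress_action() are hypothetical names invented for this
sketch, and the earlier early-return checks in
consider_reclaim_throttle() are left out. DEF_PRIORITY and
MAX_RECLAIM_RETRIES are assumed to have their usual kernel values.

```c
/* Assumed to match the kernel's values when this patch was posted. */
#define DEF_PRIORITY		12
#define MAX_RECLAIM_RETRIES	16

/* Hypothetical outcome type, used only by this model. */
enum throttle_action {
	ACTION_NONE,		/* no throttling is considered */
	ACTION_RESCHED,		/* kswapd gave up: cond_resched() only */
	ACTION_SLEEP_ONE_TICK,	/* throttle with a timeout of 1 tick */
};

/*
 * Hypothetical helper modelling the VMSCAN_THROTTLE_NOPROGRESS path
 * after the patch, for the first pgdat in the zonelist.
 */
static enum throttle_action noprogress_action(int priority,
					      unsigned long nr_reclaimed,
					      int kswapd_failures)
{
	/* Only consider stalling at elevated priority with zero progress. */
	if (priority >= DEF_PRIORITY - 2 || nr_reclaimed)
		return ACTION_NONE;

	/* kswapd has stopped reclaiming: do not stall, let OOM trigger. */
	if (kswapd_failures >= MAX_RECLAIM_RETRIES)
		return ACTION_RESCHED;

	/* kswapd is still active: stall briefly. */
	return ACTION_SLEEP_ONE_TICK;
}
```

In this model a direct reclaimer sleeps only when it has dropped below
DEF_PRIORITY - 2, reclaimed nothing, and kswapd has not yet given up.
Every other combination returns without sleeping.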

Alexey's test case was the most straightforward:

	for i in {1..3}; do tail /dev/zero; done

On vanilla 5.16-rc1, this test stalled and was reset after 10 minutes.
After the patch, the test gets killed after roughly 15 seconds which is
the same length of time taken in 5.15.

Link: https://lore.kernel.org/r/99e779783d6c7fce96448a3402061b9dc1b3b602.camel@gmx.de
Link: https://lore.kernel.org/r/20211124011954.7cab9bb4@mail.inbox.lv
Link: https://lore.kernel.org/r/20211022144651.19914-1-mgorman@techsingularity.net
Fixes: 69392a403f49 ("mm/vmscan: throttle reclaim when no progress is being made")
Signed-off-by: Mel Gorman
Tested-by: Darrick J. Wong
Acked-by: Vlastimil Babka
Tested-by: Mike Galbraith
Tested-by: Alexey Avramov
---
 mm/vmscan.c | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index fb9584641ac7..176ddd28df21 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1057,7 +1057,17 @@ void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason)
 
 		break;
 	case VMSCAN_THROTTLE_NOPROGRESS:
-		timeout = HZ/2;
+		timeout = 1;
+
+		/*
+		 * If kswapd is disabled, reschedule if necessary but do not
+		 * throttle as the system is likely near OOM.
+		 */
+		if (pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES) {
+			cond_resched();
+			return;
+		}
+
 		break;
 	case VMSCAN_THROTTLE_ISOLATED:
 		timeout = HZ/50;
@@ -3395,7 +3405,7 @@ static void consider_reclaim_throttle(pg_data_t *pgdat, struct scan_control *sc)
 		return;
 
 	/* Throttle if making no progress at high prioities. */
-	if (sc->priority < DEF_PRIORITY - 2)
+	if (sc->priority < DEF_PRIORITY - 2 && !sc->nr_reclaimed)
 		reclaim_throttle(pgdat, VMSCAN_THROTTLE_NOPROGRESS);
 }
 
@@ -3415,6 +3425,7 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
 	unsigned long nr_soft_scanned;
 	gfp_t orig_mask;
 	pg_data_t *last_pgdat = NULL;
+	pg_data_t *first_pgdat = NULL;
 
 	/*
 	 * If the number of buffer_heads in the machine exceeds the maximum
@@ -3478,14 +3489,18 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
 			/* need some check for avoid more shrink_zone() */
 		}
 
+		if (!first_pgdat)
+			first_pgdat = zone->zone_pgdat;
+
 		/* See comment about same check for global reclaim above */
 		if (zone->zone_pgdat == last_pgdat)
 			continue;
 		last_pgdat = zone->zone_pgdat;
 		shrink_node(zone->zone_pgdat, sc);
-		consider_reclaim_throttle(zone->zone_pgdat, sc);
 	}
 
+	consider_reclaim_throttle(first_pgdat, sc);
+
 	/*
 	 * Restore to original mask to avoid the impact on the caller if we
 	 * promoted it to __GFP_HIGHMEM.