From patchwork Tue Feb 25 14:15:34 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mel Gorman
X-Patchwork-Id: 11403961
Return-Path:
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123])
	by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2520D13A4
	for ; Tue, 25 Feb 2020 14:15:45 +0000 (UTC)
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
	by mail.kernel.org (Postfix) with ESMTP id E677F218AC
	for ; Tue, 25 Feb 2020 14:15:44 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E677F218AC
Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=techsingularity.net
Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix) id 9A7876B0007; Tue, 25 Feb 2020 09:15:40 -0500 (EST)
Delivered-To: linux-mm-outgoing@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 40) id 8478D6B000A; Tue, 25 Feb 2020 09:15:40 -0500 (EST)
X-Original-To: int-list-linux-mm@kvack.org
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042) id 66DBD6B0008; Tue, 25 Feb 2020 09:15:40 -0500 (EST)
X-Original-To: linux-mm@kvack.org
X-Delivered-To: linux-mm@kvack.org
Received: from forelay.hostedemail.com (smtprelay0171.hostedemail.com [216.40.44.171])
	by kanga.kvack.org (Postfix) with ESMTP id 4BE936B0007
	for ; Tue, 25 Feb 2020 09:15:40 -0500 (EST)
Received: from smtpin15.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251])
	by forelay01.hostedemail.com (Postfix) with ESMTP id F1E8E18034A7A
	for ; Tue, 25 Feb 2020 14:15:39 +0000 (UTC)
X-FDA: 76528847598.15.joke70_143f76b6f7853
X-Spam-Summary:
	2,0,0,8a92aedde1102b31,d41d8cd98f00b204,mgorman@techsingularity.net,,RULES_HIT:2:41:69:355:379:541:800:960:966:968:973:988:989:1260:1345:1359:1437:1535:1605:1606:1730:1747:1777:1792:2196:2198:2199:2200:2393:2559:2562:2693:2731:2890:2898:2916:3138:3139:3140:3141:3142:3834:3865:3866:3867:3868:3870:3871:3872:3874:4042:4117:4250:4321:4385:4560:4605:5007:6119:6261:6630:7903:7904:8634:8784:9592:10004:10026:10128:11026:11537:11658:11914:12043:12198:12291:12296:12297:12438:12517:12519:12522:12555:12660:12679:12683:12895:12986:13146:13149:13161:13229:13230:14093:14394:21063:21080:21433:21450:21451:21611:21627:21740:21796:21939:21966:21990:30001:30036:30045:30054:30060:30070:30074,0,RBL:46.22.139.15:@techsingularity.net:.lbl8.mailshell.net-62.2.114.100
	64.100.201.201,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not
	bulk,SPF:fp,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:24,LUA_SUMMARY:none
X-HE-Tag: joke70_143f76b6f7853
X-Filterd-Recvd-Size: 6613
Received: from outbound-smtp10.blacknight.com (outbound-smtp10.blacknight.com [46.22.139.15])
	by imf01.hostedemail.com (Postfix) with ESMTP
	for ; Tue, 25 Feb 2020 14:15:38 +0000 (UTC)
Received: from mail.blacknight.com (pemlinmail05.blacknight.ie [81.17.254.26])
	by outbound-smtp10.blacknight.com (Postfix) with ESMTPS id 749BC1C35C9
	for ; Tue, 25 Feb 2020 14:15:37 +0000 (GMT)
Received: (qmail 2025 invoked from network); 25 Feb 2020 14:15:37 -0000
Received: from unknown (HELO stampy.112glenside.lan) (mgorman@techsingularity.net@[84.203.18.57])
	by 81.17.254.9 with ESMTPA; 25 Feb 2020 14:15:37 -0000
From: Mel Gorman
To: Andrew Morton
Cc: Michal Hocko, Vlastimil Babka, Ivan Babrou, Rik van Riel, Linux-MM,
	Linux Kernel Mailing List, Mel Gorman
Subject: [PATCH 3/3] mm, vmscan: Do not reclaim for boosted watermarks at high priority
Date: Tue, 25 Feb 2020 14:15:34 +0000
Message-Id: <20200225141534.5044-4-mgorman@techsingularity.net>
X-Mailer: git-send-email 2.16.4
In-Reply-To:
 <20200225141534.5044-1-mgorman@techsingularity.net>
References: <20200225141534.5044-1-mgorman@techsingularity.net>
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:

Ivan Babrou reported the following (slightly paraphrased)

    Commit 1c30844d2dfe ("mm: reclaim small amounts of memory when
    an external fragmentation event occurs") introduced undesired
    effects in our environment.

      * NUMA with 2 x CPU
      * 128GB of RAM
      * THP disabled
      * Upgraded from 4.19 to 5.4

    Before we saw free memory hover at around 1.4GB with no
    spikes. After the upgrade we saw some machines decide that they
    need a lot more than that, with frequent spikes above 10GB,
    often only on a single numa node.

    We can see kswapd quite active in balance_pgdat (it didn't look
    like it slept at all):

    $ ps uax | fgrep kswapd
    root  1850 23.0  0.0  0  0 ?  R  Jan30 1902:24 [kswapd0]
    root  1851  1.8  0.0  0  0 ?  S  Jan30  152:16 [kswapd1]

    This in turn massively increased pressure on page cache, which
    did not go well to services that depend on having a quick
    response from a local cache backed by solid storage.

Rik van Riel indicated that he had observed something similar. Details
are sparse but the bulk of the excessive reclaim activity appears to
be on node 0. My belief is that on node 0, a DMA32 or DMA zone can get
boosted but vmscan then reclaims from higher zones until the boost is
removed. While we could apply the reclaim to just the lower zones, it
would result in a lot of pages skipped during scanning.

Watermark boosting is inherently optimistic and is only applied to
reduce the possibility of pageblocks being mixed further in the future
so high-order allocations are both more likely to succeed and be
allocated with lower latency. It was not intended that it reclaim the
world ever.

This patch limits watermark boosting. If reclaim reaches a higher
priority then reclaim based on watermark boosting is aborted.
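
For readers less familiar with the accounting involved, the following is
a rough userspace sketch of the two ideas in the diff: removing the
recorded boost from each zone up to the classzone, clamped at zero, and
detecting the point at which boosted reclaim gives up. It is not kernel
code. The names model_zone, model_acct_boosted_reclaim,
model_should_abort_boost and MODEL_NR_ZONES are hypothetical and exist
only for illustration; zone locking is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_ZONES     4   /* hypothetical: zones per node in this model */
#define MODEL_DEF_PRIORITY 12  /* mirrors the kernel's DEF_PRIORITY */

/* Hypothetical stand-in for struct zone; only the boost is modelled. */
struct model_zone {
	unsigned long watermark_boost;
};

/*
 * Subtract the boost recorded at the start of the pass from every zone
 * up to and including classzone_idx, never going below zero. The real
 * code does this under zone->lock; the model omits locking.
 */
static void model_acct_boosted_reclaim(struct model_zone *zones,
				       int classzone_idx,
				       const unsigned long *zone_boosts)
{
	assert(classzone_idx >= 0 && classzone_idx < MODEL_NR_ZONES);

	for (int i = 0; i <= classzone_idx; i++) {
		unsigned long cur = zones[i].watermark_boost;
		unsigned long dec = zone_boosts[i] < cur ? zone_boosts[i] : cur;

		zones[i].watermark_boost = cur - dec;
	}
}

/*
 * True when boosted reclaim is still pending and the scan priority has
 * escalated to DEF_PRIORITY - 2, the point at which boosting is dropped.
 */
static bool model_should_abort_boost(unsigned long nr_boost_reclaim,
				     int priority)
{
	return nr_boost_reclaim != 0 && priority == MODEL_DEF_PRIORITY - 2;
}
```

In this model a zone boosted by 50 pages with a recorded boost of 80 ends
at zero rather than wrapping around, and zones above classzone_idx are
left untouched.
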
Unfortunately, the bug reporters are not in the position to actually
test this but it makes sense that watermark boosting be aborted quickly
when reclaim is not making progress given that boosting was never
intended to reclaim or scan excessively.

While I could not reproduce the problem locally, this was compared
against a vanilla kernel and one with watermark boosting disabled. The
test results still indicate that this helps the workloads addressed by
1c30844d2dfe ("mm: reclaim small amounts of memory when an external
fragmentation event occurs") although the behaviour of THP allocation
has changed since, making a direct comparison problematic. At worst,
this patch will mitigate a problem when watermarks are persistently
boosted.

Fixes: 1c30844d2dfe ("mm: reclaim small amounts of memory when an external fragmentation event occurs")
Signed-off-by: Mel Gorman
---
 mm/vmscan.c | 46 +++++++++++++++++++++++++++++++---------------
 1 file changed, 31 insertions(+), 15 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 876370565455..40c9e48dc542 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3449,6 +3449,25 @@ static bool pgdat_balanced(pg_data_t *pgdat, int order, int classzone_idx)
 	return false;
 }
 
+static void acct_boosted_reclaim(pg_data_t *pgdat, int classzone_idx,
+				 unsigned long *zone_boosts)
+{
+	struct zone *zone;
+	unsigned long flags;
+	int i;
+
+	for (i = 0; i <= classzone_idx; i++) {
+		if (!zone_boosts[i])
+			continue;
+
+		/* Increments are under the zone lock */
+		zone = pgdat->node_zones + i;
+		spin_lock_irqsave(&zone->lock, flags);
+		zone->watermark_boost -= min(zone->watermark_boost, zone_boosts[i]);
+		spin_unlock_irqrestore(&zone->lock, flags);
+	}
+}
+
 /* Clear pgdat state for congested, dirty or under writeback. */
 static void clear_pgdat_congested(pg_data_t *pgdat)
 {
@@ -3641,9 +3660,17 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int classzone_idx)
 		if (!nr_boost_reclaim && balanced)
 			goto out;
 
-		/* Limit the priority of boosting to avoid reclaim writeback */
-		if (nr_boost_reclaim && sc.priority == DEF_PRIORITY - 2)
-			raise_priority = false;
+		/*
+		 * Abort boosting if reclaiming at higher priority is not
+		 * working to avoid excessive reclaim due to lower zones
+		 * being boosted.
+		 */
+		if (nr_boost_reclaim && sc.priority == DEF_PRIORITY - 2) {
+			acct_boosted_reclaim(pgdat, classzone_idx, zone_boosts);
+			boosted = false;
+			nr_boost_reclaim = 0;
+			goto restart;
+		}
 
 		/*
 		 * Do not writeback or swap pages for boosted reclaim. The
@@ -3725,18 +3752,7 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int classzone_idx)
 out:
 	/* If reclaim was boosted, account for the reclaim done in this pass */
 	if (boosted) {
-		unsigned long flags;
-
-		for (i = 0; i <= classzone_idx; i++) {
-			if (!zone_boosts[i])
-				continue;
-
-			/* Increments are under the zone lock */
-			zone = pgdat->node_zones + i;
-			spin_lock_irqsave(&zone->lock, flags);
-			zone->watermark_boost -= min(zone->watermark_boost, zone_boosts[i]);
-			spin_unlock_irqrestore(&zone->lock, flags);
-		}
+		acct_boosted_reclaim(pgdat, classzone_idx, zone_boosts);
 
 		/*
 		 * As there is now likely space, wakeup kcompact to defragment