From patchwork Thu Mar 13 21:05:32 2025
X-Patchwork-Submitter: Johannes Weiner
X-Patchwork-Id: 14015913
From: Johannes Weiner <hannes@cmpxchg.org>
To: Andrew Morton
Cc: Vlastimil Babka, Mel Gorman, Zi Yan, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH 1/5] mm: compaction: push watermark into compaction_suitable() callers
Date: Thu, 13 Mar 2025 17:05:32 -0400
Message-ID: <20250313210647.1314586-2-hannes@cmpxchg.org>
In-Reply-To: <20250313210647.1314586-1-hannes@cmpxchg.org>
References: <20250313210647.1314586-1-hannes@cmpxchg.org>

compaction_suitable() hardcodes the min watermark, with a boost to the
low watermark for costly orders. However, compaction_ready() requires
order-0 at the high watermark. It currently checks the marks twice.

Make the watermark a parameter to compaction_suitable() and have the
callers pass in what they require:

- compaction_zonelist_suitable() is used by the direct reclaim path,
  so use the min watermark.

- compact_suit_allocation_order() has a watermark in context derived
  from cc->alloc_flags. The only quirk is that kcompactd doesn't
  initialize cc->alloc_flags explicitly. There is a direct check in
  kcompactd_do_work() that passes ALLOC_WMARK_MIN, but there is another
  check downstack in compact_zone() that ends up passing the unset
  alloc_flags. Since they default to 0, and that coincides with
  ALLOC_WMARK_MIN, it is correct. But it's subtle. Set cc->alloc_flags
  explicitly.

- should_continue_reclaim() is direct reclaim, use the min watermark.

- Finally, consolidate the two checks in compaction_ready() to a single
  compaction_suitable() call passing the high watermark.

There is a tiny change in behavior: before, compaction_suitable()
would check order-0 against min or low, depending on costly
order. Then there'd be another high watermark check. Now, the high
watermark is passed to compaction_suitable(), and the costly
order-boost (low - min) is added on top. This means compaction_ready()
sets a marginally higher target for free pages.

In a kernel build + THP pressure test, though, this didn't show any
measurable negative effects on memory pressure or reclaim rates. As
the comment above the check says, reclaim is usually stopped short on
should_continue_reclaim(), and this just defines the worst-case
reclaim cutoff in case compaction is not making any headway.
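(For reference, the "coincidence" above is that ALLOC_WMARK_MIN is an
alias for WMARK_MIN, the first entry of enum zone_watermarks, so a
zero-initialized cc->alloc_flags happens to select the min watermark.)

The size of the behavior change can be worked out by hand. Below is a
minimal standalone sketch of the new watermark math, assuming made-up
zone values; compact_gap() mirrors the kernel's 2 << order definition,
everything else is illustrative userspace code, not the kernel's:

    #include <stdio.h>

    #define PAGE_ALLOC_COSTLY_ORDER 3

    /* mirrors compact_gap(): headroom for migration sources and targets */
    static unsigned long compact_gap(unsigned int order)
    {
            return 2UL << order;
    }

    int main(void)
    {
            unsigned long min_wmark = 1024, low_wmark = 1280, high_wmark = 1536;
            unsigned int order = 9; /* PMD-sized THP on x86-64 */
            unsigned long watermark;

            /* new scheme: the caller passes the watermark, here the high mark */
            watermark = high_wmark + compact_gap(order);
            if (order > PAGE_ALLOC_COSTLY_ORDER)
                    watermark += low_wmark - min_wmark; /* costly-order boost */

            printf("old cutoff: %lu, new cutoff: %lu (+%lu pages)\n",
                   high_wmark + compact_gap(order), watermark,
                   low_wmark - min_wmark);
            return 0;
    }

With these numbers the new compaction_ready() cutoff sits exactly
low - min = 256 pages above the old one, which is the "marginally
higher target" described above.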
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/compaction.h |  5 ++--
 mm/compaction.c            | 52 ++++++++++++++++++++------------------
 mm/vmscan.c                | 26 ++++++++++---------
 3 files changed, 45 insertions(+), 38 deletions(-)

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index 7bf0c521db63..173d9c07a895 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -95,7 +95,7 @@ extern enum compact_result try_to_compact_pages(gfp_t gfp_mask,
 		struct page **page);
 extern void reset_isolation_suitable(pg_data_t *pgdat);
 extern bool compaction_suitable(struct zone *zone, int order,
-				int highest_zoneidx);
+				unsigned long watermark, int highest_zoneidx);
 
 extern void compaction_defer_reset(struct zone *zone, int order,
 				bool alloc_success);
@@ -113,7 +113,8 @@ static inline void reset_isolation_suitable(pg_data_t *pgdat)
 }
 
 static inline bool compaction_suitable(struct zone *zone, int order,
-				       int highest_zoneidx)
+				       unsigned long watermark,
+				       int highest_zoneidx)
 {
 	return false;
 }
diff --git a/mm/compaction.c b/mm/compaction.c
index 550ce5021807..036353ef1878 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2382,40 +2382,42 @@ static enum compact_result compact_finished(struct compact_control *cc)
 }
 
 static bool __compaction_suitable(struct zone *zone, int order,
-				  int highest_zoneidx,
-				  unsigned long wmark_target)
+				  unsigned long watermark, int highest_zoneidx,
+				  unsigned long free_pages)
 {
-	unsigned long watermark;
 	/*
 	 * Watermarks for order-0 must be met for compaction to be able to
 	 * isolate free pages for migration targets. This means that the
-	 * watermark and alloc_flags have to match, or be more pessimistic than
-	 * the check in __isolate_free_page(). We don't use the direct
-	 * compactor's alloc_flags, as they are not relevant for freepage
-	 * isolation. We however do use the direct compactor's highest_zoneidx
-	 * to skip over zones where lowmem reserves would prevent allocation
-	 * even if compaction succeeds.
-	 * For costly orders, we require low watermark instead of min for
-	 * compaction to proceed to increase its chances.
+	 * watermark have to match, or be more pessimistic than the check in
+	 * __isolate_free_page().
+	 *
+	 * For costly orders, we require a higher watermark for compaction to
+	 * proceed to increase its chances.
+	 *
+	 * We use the direct compactor's highest_zoneidx to skip over zones
+	 * where lowmem reserves would prevent allocation even if compaction
+	 * succeeds.
+	 *
 	 * ALLOC_CMA is used, as pages in CMA pageblocks are considered
-	 * suitable migration targets
+	 * suitable migration targets.
 	 */
-	watermark = (order > PAGE_ALLOC_COSTLY_ORDER) ?
-				low_wmark_pages(zone) : min_wmark_pages(zone);
 	watermark += compact_gap(order);
+	if (order > PAGE_ALLOC_COSTLY_ORDER)
+		watermark += low_wmark_pages(zone) - min_wmark_pages(zone);
 	return __zone_watermark_ok(zone, 0, watermark, highest_zoneidx,
-				   ALLOC_CMA, wmark_target);
+				   ALLOC_CMA, free_pages);
 }
 
 /*
  * compaction_suitable: Is this suitable to run compaction on this zone now?
  */
-bool compaction_suitable(struct zone *zone, int order, int highest_zoneidx)
+bool compaction_suitable(struct zone *zone, int order, unsigned long watermark,
+			 int highest_zoneidx)
 {
 	enum compact_result compact_result;
 	bool suitable;
 
-	suitable = __compaction_suitable(zone, order, highest_zoneidx,
+	suitable = __compaction_suitable(zone, order, watermark, highest_zoneidx,
 					 zone_page_state(zone, NR_FREE_PAGES));
 
 	/*
 	 * fragmentation index determines if allocation failures are due to
@@ -2453,6 +2455,7 @@ bool compaction_suitable(struct zone *zone, int order, int highest_zoneidx)
 	return suitable;
 }
 
+/* Used by direct reclaimers */
 bool compaction_zonelist_suitable(struct alloc_context *ac, int order,
 		int alloc_flags)
 {
@@ -2475,8 +2478,8 @@ bool compaction_zonelist_suitable(struct alloc_context *ac, int order,
 		 */
 		available = zone_reclaimable_pages(zone) / order;
 		available += zone_page_state_snapshot(zone, NR_FREE_PAGES);
-		if (__compaction_suitable(zone, order, ac->highest_zoneidx,
-					  available))
+		if (__compaction_suitable(zone, order, min_wmark_pages(zone),
+					  ac->highest_zoneidx, available))
 			return true;
 	}
 
@@ -2513,13 +2516,13 @@ compaction_suit_allocation_order(struct zone *zone, unsigned int order,
 	 */
 	if (order > PAGE_ALLOC_COSTLY_ORDER && async &&
 	    !(alloc_flags & ALLOC_CMA)) {
-		watermark = low_wmark_pages(zone) + compact_gap(order);
-		if (!__zone_watermark_ok(zone, 0, watermark, highest_zoneidx,
-					 0, zone_page_state(zone, NR_FREE_PAGES)))
+		if (!__zone_watermark_ok(zone, 0, watermark + compact_gap(order),
+					 highest_zoneidx, 0,
+					 zone_page_state(zone, NR_FREE_PAGES)))
 			return COMPACT_SKIPPED;
 	}
 
-	if (!compaction_suitable(zone, order, highest_zoneidx))
+	if (!compaction_suitable(zone, order, watermark, highest_zoneidx))
 		return COMPACT_SKIPPED;
 
 	return COMPACT_CONTINUE;
@@ -3082,6 +3085,7 @@ static void kcompactd_do_work(pg_data_t *pgdat)
 		.mode = MIGRATE_SYNC_LIGHT,
 		.ignore_skip_hint = false,
 		.gfp_mask = GFP_KERNEL,
+		.alloc_flags = ALLOC_WMARK_MIN,
 	};
 	enum compact_result ret;
 
@@ -3100,7 +3104,7 @@ static void kcompactd_do_work(pg_data_t *pgdat)
 			continue;
 
 		ret = compaction_suit_allocation_order(zone,
-				cc.order, zoneid, ALLOC_WMARK_MIN,
+				cc.order, zoneid, cc.alloc_flags,
 				false);
 		if (ret != COMPACT_CONTINUE)
 			continue;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 2bc740637a6c..3370bdca6868 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -5890,12 +5890,15 @@ static inline bool should_continue_reclaim(struct pglist_data *pgdat,
 
 	/* If compaction would go ahead or the allocation would succeed, stop */
 	for_each_managed_zone_pgdat(zone, pgdat, z, sc->reclaim_idx) {
+		unsigned long watermark = min_wmark_pages(zone);
+
 		/* Allocation can already succeed, nothing to do */
-		if (zone_watermark_ok(zone, sc->order, min_wmark_pages(zone),
+		if (zone_watermark_ok(zone, sc->order, watermark,
 				      sc->reclaim_idx, 0))
 			return false;
 
-		if (compaction_suitable(zone, sc->order, sc->reclaim_idx))
+		if (compaction_suitable(zone, sc->order, watermark,
+					sc->reclaim_idx))
 			return false;
 	}
 
@@ -6122,22 +6125,21 @@ static inline bool compaction_ready(struct zone *zone, struct scan_control *sc)
 			      sc->reclaim_idx, 0))
 		return true;
 
-	/* Compaction cannot yet proceed. Do reclaim. */
-	if (!compaction_suitable(zone, sc->order, sc->reclaim_idx))
-		return false;
-
 	/*
-	 * Compaction is already possible, but it takes time to run and there
-	 * are potentially other callers using the pages just freed. So proceed
-	 * with reclaim to make a buffer of free pages available to give
-	 * compaction a reasonable chance of completing and allocating the page.
+	 * Direct reclaim usually targets the min watermark, but compaction
+	 * takes time to run and there are potentially other callers using the
+	 * pages just freed. So target a higher buffer to give compaction a
+	 * reasonable chance of completing and allocating the pages.
+	 *
 	 * Note that we won't actually reclaim the whole buffer in one attempt
 	 * as the target watermark in should_continue_reclaim() is lower. But if
 	 * we are already above the high+gap watermark, don't reclaim at all.
 	 */
-	watermark = high_wmark_pages(zone) + compact_gap(sc->order);
+	watermark = high_wmark_pages(zone);
+	if (compaction_suitable(zone, sc->order, watermark, sc->reclaim_idx))
+		return true;
 
-	return zone_watermark_ok_safe(zone, 0, watermark, sc->reclaim_idx);
+	return false;
 }
 
 static void consider_reclaim_throttle(pg_data_t *pgdat, struct scan_control *sc)

From patchwork Thu Mar 13 21:05:33 2025
X-Patchwork-Submitter: Johannes Weiner
X-Patchwork-Id: 14015914

From: Johannes Weiner <hannes@cmpxchg.org>
To: Andrew Morton
Cc: Vlastimil Babka, Mel Gorman, Zi Yan, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH 2/5] mm: page_alloc: trace type pollution from compaction capturing
Date: Thu, 13 Mar 2025 17:05:33 -0400
Message-ID: <20250313210647.1314586-3-hannes@cmpxchg.org>
In-Reply-To: <20250313210647.1314586-1-hannes@cmpxchg.org>
References: <20250313210647.1314586-1-hannes@cmpxchg.org>

When the page allocator places pages of a certain migratetype into
blocks of another type, it has lasting effects on the ability to
compact and defragment down the line. For improving placement and
compaction, visibility into such events is crucial.

The most common case, allocator fallbacks, is already annotated, but
compaction capturing is also allowed to grab pages of a different
type. Extend the tracepoint to cover this case.
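The event extended here is the existing mm_page_alloc_extfrag
tracepoint in the kmem trace group, so the new capture events should
be observable with the usual tooling alongside the fallback events,
for example with perf record -e kmem:mm_page_alloc_extfrag or via the
corresponding tracefs event.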
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/page_alloc.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9b4a5e6dfee9..6f0404941886 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -614,6 +614,10 @@ compaction_capture(struct capture_control *capc, struct page *page,
 	    capc->cc->migratetype != MIGRATE_MOVABLE)
 		return false;
 
+	if (migratetype != capc->cc->migratetype)
+		trace_mm_page_alloc_extfrag(page, capc->cc->order, order,
+					    capc->cc->migratetype, migratetype);
+
 	capc->page = page;
 	return true;
 }

From patchwork Thu Mar 13 21:05:34 2025
X-Patchwork-Submitter: Johannes Weiner
X-Patchwork-Id: 14015915

From: Johannes Weiner <hannes@cmpxchg.org>
To: Andrew Morton
Cc: Vlastimil Babka, Mel Gorman, Zi Yan, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH 3/5] mm: page_alloc: defrag_mode
Date: Thu, 13 Mar 2025 17:05:34 -0400
Message-ID: <20250313210647.1314586-4-hannes@cmpxchg.org>
In-Reply-To: <20250313210647.1314586-1-hannes@cmpxchg.org>
References: <20250313210647.1314586-1-hannes@cmpxchg.org>

The page allocator groups requests by migratetype to stave off
fragmentation. However, in practice this is routinely defeated by the
fact that it gives up *before* invoking reclaim and compaction - which
may well produce suitable pages. As a result, fragmentation of
physical memory is a common ongoing process in many load scenarios.

Fragmentation deteriorates compaction's ability to produce huge
pages. Depending on the lifetime of the fragmenting allocations, those
effects can be long-lasting or even permanent, requiring drastic
measures like forcible idle states or even reboots as the only
reliable ways to recover the address space for THP production.

In a kernel build test with supplemental THP pressure, the THP
allocation rate steadily declines over 15 runs:

    thp_fault_alloc
    61988
    56474
    57258
    50187
    52388
    55409
    52925
    47648
    43669
    40621
    36077
    41721
    36685
    34641
    33215

This is a hurdle in adopting THP in any environment where hosts are
shared between multiple overlapping workloads (cloud environments),
and rarely experience true idle periods.

To make THP a reliable and predictable optimization, there needs to be
a stronger guarantee to avoid such fragmentation.

Introduce defrag_mode. When enabled, reclaim/compaction is invoked to
its full extent *before* falling back. Specifically, ALLOC_NOFRAGMENT
is enforced on the allocator fastpath and the reclaiming slowpath.

For now, fallbacks are permitted to avert OOMs. There is a plan to add
defrag_mode=2 to prefer OOMs over fragmentation, but this requires
additional prep work in compaction and the reserve management to make
it ready for all possible allocation contexts.
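In rough terms, the allocation flow with defrag_mode=1 becomes the
following. This is a compilable userspace sketch, not kernel code;
try_freelists() and reclaim_and_compact() are illustrative stubs
standing in for get_page_from_freelist() and the slowpath retry loop:

    #include <stdbool.h>
    #include <stdio.h>

    #define ALLOC_NOFRAGMENT 0x1

    /* stand-in stubs; pretend memory is fragmented/full */
    static bool try_freelists(int order, int flags)
    {
            (void)order; (void)flags;
            return false;
    }

    static bool reclaim_and_compact(int order)
    {
            static int attempts;
            (void)order;
            return attempts++ < 3; /* make progress a few times, then stop */
    }

    static bool allocate(int order)
    {
            int flags = ALLOC_NOFRAGMENT;  /* fastpath: never mix block types */

            if (try_freelists(order, flags))
                    return true;

            /* slowpath: keep ALLOC_NOFRAGMENT while reclaim/compaction runs */
            while (reclaim_and_compact(order))
                    if (try_freelists(order, flags))
                            return true;

            /* only now permit a polluting fallback, to avert an OOM */
            flags &= ~ALLOC_NOFRAGMENT;
            return try_freelists(order, flags);
    }

    int main(void)
    {
            printf("order-9 allocation %s\n",
                   allocate(9) ? "succeeded" : "failed");
            return 0;
    }

The contrast with vanilla behavior is the ordering: vanilla drops
ALLOC_NOFRAGMENT as soon as the freelists come up empty, before any
reclaim or compaction has run.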

The following test results are from a kernel build with periodic
bursts of THP allocations, over 15 runs:

                                           vanilla   defrag_mode=1
    @claimer[unmovable]:                       189             103
    @claimer[movable]:                          92             103
    @claimer[reclaimable]:                     207              61
    @pollute[unmovable from movable]:           25               0
    @pollute[unmovable from reclaimable]:       28               0
    @pollute[movable from unmovable]:        38835               0
    @pollute[movable from reclaimable]:     147136               0
    @pollute[reclaimable from unmovable]:      178               0
    @pollute[reclaimable from movable]:         33               0
    @steal[unmovable from movable]:             11               0
    @steal[unmovable from reclaimable]:          5               0
    @steal[reclaimable from unmovable]:        107               0
    @steal[reclaimable from movable]:           90               0
    @steal[movable from reclaimable]:          354               0
    @steal[movable from unmovable]:            130               0

Both types of polluting fallbacks are eliminated in this workload.

Interestingly, whole block conversions are reduced as well. This is
because once a block is claimed for a type, its empty space remains
available for future allocations, instead of being padded with
fallbacks; this allows the native type to group up instead of
spreading out to new blocks.

The assumption in the allocator has been that pollution from movable
allocations is less harmful than from other types, since they can be
reclaimed or migrated out should the space be needed. However, since
fallbacks occur *before* reclaim/compaction is invoked, movable
pollution will still cause non-movable allocations to spread out and
claim more blocks.

Without fragmentation, THP rates hold steady with defrag_mode=1:

    thp_fault_alloc
    32478
    20725
    45045
    32130
    14018
    21711
    40791
    29134
    34458
    45381
    28305
    17265
    22584
    28454
    30850

While the downward trend is eliminated, the keen reader will of course
notice that the baseline rate is much smaller than the vanilla
kernel's to begin with. This is due to deficiencies in how reclaim and
compaction are currently driven: ALLOC_NOFRAGMENT increases the extent
to which smaller allocations are competing with THPs for pageblocks,
while making no effort themselves to reclaim or compact beyond their
own request size. This effect already exists with the current usage of
ALLOC_NOFRAGMENT, but is amplified by defrag_mode insisting on whole
block stealing much more strongly.

Subsequent patches will address defrag_mode reclaim strategy to raise
the THP success baseline above the vanilla kernel.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 Documentation/admin-guide/sysctl/vm.rst |  9 +++++++++
 mm/page_alloc.c                         | 27 +++++++++++++++++++++++--
 2 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index ec6343ee4248..e169dbf48180 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -29,6 +29,7 @@ files can be found in mm/swap.c.
 - compaction_proactiveness
 - compaction_proactiveness_leeway
 - compact_unevictable_allowed
+- defrag_mode
 - dirty_background_bytes
 - dirty_background_ratio
 - dirty_bytes
@@ -162,6 +163,14 @@ On CONFIG_PREEMPT_RT the default value is 0 in order to avoid a page fault, due
 to compaction, which would block the task from becoming active until the fault
 is resolved.
 
+defrag_mode
+===========
+
+When set to 1, the page allocator tries harder to avoid fragmentation
+and maintain the ability to produce huge pages / higher-order pages.
+
+It is recommended to enable this right after boot, as fragmentation,
+once it occurred, can be long-lasting or even permanent.
 
 dirty_background_bytes
 ======================
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6f0404941886..9a02772c2461 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -273,6 +273,7 @@ int min_free_kbytes = 1024;
 int user_min_free_kbytes = -1;
 static int watermark_boost_factor __read_mostly = 15000;
 static int watermark_scale_factor = 10;
+static int defrag_mode;
 
 /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
 int movable_zone;
@@ -3389,6 +3390,11 @@ alloc_flags_nofragment(struct zone *zone, gfp_t gfp_mask)
 	 */
 	alloc_flags = (__force int) (gfp_mask & __GFP_KSWAPD_RECLAIM);
 
+	if (defrag_mode) {
+		alloc_flags |= ALLOC_NOFRAGMENT;
+		return alloc_flags;
+	}
+
 #ifdef CONFIG_ZONE_DMA32
 	if (!zone)
 		return alloc_flags;
@@ -3480,7 +3486,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 			continue;
 		}
 
-		if (no_fallback && nr_online_nodes > 1 &&
+		if (no_fallback && !defrag_mode && nr_online_nodes > 1 &&
 		    zone != zonelist_zone(ac->preferred_zoneref)) {
 			int local_nid;
 
@@ -3591,7 +3597,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 	 * It's possible on a UMA machine to get through all zones that are
 	 * fragmented. If avoiding fragmentation, reset and try again.
 	 */
-	if (no_fallback) {
+	if (no_fallback && !defrag_mode) {
 		alloc_flags &= ~ALLOC_NOFRAGMENT;
 		goto retry;
 	}
@@ -4128,6 +4134,9 @@ gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
 
 	alloc_flags = gfp_to_alloc_flags_cma(gfp_mask, alloc_flags);
 
+	if (defrag_mode)
+		alloc_flags |= ALLOC_NOFRAGMENT;
+
 	return alloc_flags;
 }
@@ -4510,6 +4519,11 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 				 &compaction_retries))
 		goto retry;
 
+	/* Reclaim/compaction failed to prevent the fallback */
+	if (defrag_mode) {
+		alloc_flags &= ~ALLOC_NOFRAGMENT;
+		goto retry;
+	}
 
 	/*
 	 * Deal with possible cpuset update races or zonelist updates to avoid
@@ -6286,6 +6300,15 @@ static const struct ctl_table page_alloc_sysctl_table[] = {
 		.extra1 = SYSCTL_ONE,
 		.extra2 = SYSCTL_THREE_THOUSAND,
 	},
+	{
+		.procname = "defrag_mode",
+		.data = &defrag_mode,
+		.maxlen = sizeof(defrag_mode),
+		.mode = 0644,
+		.proc_handler = proc_dointvec_minmax,
+		.extra1 = SYSCTL_ZERO,
+		.extra2 = SYSCTL_ONE,
+	},
 	{
 		.procname = "percpu_pagelist_high_fraction",
 		.data = &percpu_pagelist_high_fraction,
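With the sysctl in place, the mode can be toggled at runtime:

    echo 1 > /proc/sys/vm/defrag_mode

(The path follows from the vm sysctl table above; per the
documentation hunk, enabling it right after boot, before fragmentation
sets in, is the recommended usage.)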

From patchwork Thu Mar 13 21:05:35 2025
X-Patchwork-Submitter: Johannes Weiner
X-Patchwork-Id: 14015916

From: Johannes Weiner <hannes@cmpxchg.org>
To: Andrew Morton
Cc: Vlastimil Babka, Mel Gorman, Zi Yan, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH 4/5] mm: page_alloc: defrag_mode kswapd/kcompactd assistance
Date: Thu, 13 Mar 2025 17:05:35 -0400
Message-ID: <20250313210647.1314586-5-hannes@cmpxchg.org>
In-Reply-To: <20250313210647.1314586-1-hannes@cmpxchg.org>
References: <20250313210647.1314586-1-hannes@cmpxchg.org>

When defrag_mode is enabled, allocation fallbacks strongly prefer
whole block conversions instead of polluting or stealing partially
used blocks. This means there is a demand for pageblocks even from
sub-block requests. Let kswapd/kcompactd help produce them.

By the time kswapd gets woken up, normal rmqueue and block conversion
fallbacks have been attempted and failed.
So always wake kswapd with the block order; it will take care of
producing a suitable compaction gap and then chain-wake kcompactd with
the block order when it's done.

                                    VANILLA              DEFRAGMODE-ASYNC
Hugealloc Time mean          52739.45 (  +0.00%)    34300.36 ( -34.96%)
Hugealloc Time stddev        56541.26 (  +0.00%)    36390.42 ( -35.64%)
Kbuild Real time               197.47 (  +0.00%)      196.13 (  -0.67%)
Kbuild User time              1240.49 (  +0.00%)     1234.74 (  -0.46%)
Kbuild System time              70.08 (  +0.00%)       62.62 ( -10.50%)
THP fault alloc              46727.07 (  +0.00%)    57054.53 ( +22.10%)
THP fault fallback           21910.60 (  +0.00%)    11581.40 ( -47.14%)
Direct compact fail            195.80 (  +0.00%)      107.80 ( -44.72%)
Direct compact success           7.93 (  +0.00%)        4.53 ( -38.06%)
Direct compact success rate %    3.51 (  +0.00%)        3.20 (  -6.89%)
Compact daemon scanned migrate 3369601.27 ( +0.00%) 5461033.93 ( +62.07%)
Compact daemon scanned free  5075474.47 (  +0.00%)  5824897.93 ( +14.77%)
Compact direct scanned migrate 161787.27 ( +0.00%)    58336.93 ( -63.94%)
Compact direct scanned free    163467.53 ( +0.00%)    32791.87 ( -79.94%)
Compact total migrate scanned 3531388.53 ( +0.00%)  5519370.87 ( +56.29%)
Compact total free scanned   5238942.00 (  +0.00%)  5857689.80 ( +11.81%)
Alloc stall                   2371.07 (  +0.00%)      2424.60 (  +2.26%)
Pages kswapd scanned         2160926.73 (  +0.00%) 2657018.33 ( +22.96%)
Pages kswapd reclaimed        533191.07 (  +0.00%)  559583.07 (  +4.95%)
Pages direct scanned          400450.33 (  +0.00%)  722094.07 ( +80.32%)
Pages direct reclaimed         94441.73 (  +0.00%)  107257.80 ( +13.57%)
Pages total scanned          2561377.07 (  +0.00%) 3379112.40 ( +31.93%)
Pages total reclaimed         627632.80 (  +0.00%)  666840.87 (  +6.25%)
Swap out                       47959.53 (  +0.00%)   77238.20 ( +61.05%)
Swap in                         7276.00 (  +0.00%)   11712.80 ( +60.97%)
File refaults                 138043.00 (  +0.00%)  143438.80 (  +3.91%)

With this patch, defrag_mode=1 beats the vanilla kernel in THP success
rates and allocation latencies. The trend holds over time:

    thp_fault_alloc
    VANILLA   DEFRAGMODE-ASYNC
      61988              52066
      56474              58844
      57258              58233
      50187              58476
      52388              54516
      55409              59938
      52925              57204
      47648              60238
      43669              55733
      40621              56211
      36077              59861
      41721              57771
      36685              58579
      34641              51868
      33215              56280

DEFRAGMODE-ASYNC also wins on %sys as ~3/4 of the direct compaction
work is shifted to kcompactd.

Reclaim activity is higher. Part of that is simply due to the
increased memory footprint from higher THP use. The other aspect is
that *direct* reclaim/compaction are still going for requested orders
rather than targeting the page blocks required for fallbacks, which is
less efficient than it could be. However, this is already a useful
tradeoff to make, as in many environments peak periods are short and
retaining the ability to produce THP through them is more important.
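For scale: on x86-64 with 4KiB pages and 2MiB THPs, pageblock_order is
9, and the wakeup math works out as in this small illustrative sketch
(standard kernel constants restated in userspace, not kernel code):

    #include <stdio.h>

    int main(void)
    {
            unsigned int order = 2;            /* a sub-block request */
            unsigned int pageblock_order = 9;  /* 2MiB blocks of 4KiB pages */
            unsigned int reclaim_order =
                    order > pageblock_order ? order : pageblock_order;
            unsigned long gap = 2UL << reclaim_order;  /* compact_gap() */

            printf("wake kswapd at order %u, compaction gap %lu pages (%lu KiB)\n",
                   reclaim_order, gap, gap * 4);
            return 0;
    }

So even an order-2 wakeup under defrag_mode asks kswapd for order-9
headroom, roughly 4MiB of compaction gap.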
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/page_alloc.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9a02772c2461..4a0d8f871e56 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4076,15 +4076,21 @@ static void wake_all_kswapds(unsigned int order, gfp_t gfp_mask,
 	struct zone *zone;
 	pg_data_t *last_pgdat = NULL;
 	enum zone_type highest_zoneidx = ac->highest_zoneidx;
+	unsigned int reclaim_order;
+
+	if (defrag_mode)
+		reclaim_order = max(order, pageblock_order);
+	else
+		reclaim_order = order;
 
 	for_each_zone_zonelist_nodemask(zone, z, ac->zonelist, highest_zoneidx,
 					ac->nodemask) {
 		if (!managed_zone(zone))
 			continue;
-		if (last_pgdat != zone->zone_pgdat) {
-			wakeup_kswapd(zone, gfp_mask, order, highest_zoneidx);
-			last_pgdat = zone->zone_pgdat;
-		}
+		if (last_pgdat == zone->zone_pgdat)
+			continue;
+		wakeup_kswapd(zone, gfp_mask, reclaim_order, highest_zoneidx);
+		last_pgdat = zone->zone_pgdat;
 	}
 }

From patchwork Thu Mar 13 21:05:36 2025
X-Patchwork-Submitter: Johannes Weiner
X-Patchwork-Id: 14015917

From: Johannes Weiner <hannes@cmpxchg.org>
To: Andrew Morton
Cc: Vlastimil Babka, Mel Gorman, Zi Yan, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH 5/5] mm: page_alloc: defrag_mode kswapd/kcompactd watermarks
Date: Thu, 13 Mar 2025 17:05:36 -0400
Message-ID: <20250313210647.1314586-6-hannes@cmpxchg.org>
In-Reply-To: <20250313210647.1314586-1-hannes@cmpxchg.org>

The previous patch added pageblock_order reclaim to kswapd/kcompactd,
which helps, but produces only one block at a time. Allocation stalls
and THP failure rates are still higher than they could be.

To adequately reflect ALLOC_NOFRAGMENT demand for pageblocks, change
the watermarking for kswapd & kcompactd: instead of targeting the high
watermark in order-0 pages and checking for one suitable block, simply
require that the high watermark is entirely met in pageblocks.

To this end, track the number of free pages within contiguous
pageblocks, then change pgdat_balanced() and compact_finished() to
check watermarks against this new value.
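[Editorial illustration] For orientation before the full diff below: the balance test boils down to feeding the existing watermark check a stricter free-page count. A condensed sketch, not the patch itself; the wrapper name is made up, while __zone_watermark_ok(), high_wmark_pages() and zone_page_state() are the existing helpers the patch uses:

	/* Sketch: is the high watermark met in whole pageblocks? */
	static bool zone_balanced_in_blocks(struct zone *zone, unsigned int order,
					    int highest_zoneidx)
	{
		/* free memory counted only within fully-free pageblocks */
		unsigned long free = zone_page_state(zone, NR_FREE_PAGES_BLOCKS);

		/* same watermark as before, fed the stricter free count */
		return __zone_watermark_ok(zone, order, high_wmark_pages(zone),
					   highest_zoneidx, 0, free);
	}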
This further reduces THP latencies and allocation stalls, and improves
THP success rates against the previous patch:

                                        DEFRAGMODE-ASYNC       DEFRAGMODE-ASYNC-WMARKS
Hugealloc Time mean                 34300.36 (    +0.00%)      28904.00 (   -15.73%)
Hugealloc Time stddev               36390.42 (    +0.00%)      33464.37 (    -8.04%)
Kbuild Real time                      196.13 (    +0.00%)        196.59 (    +0.23%)
Kbuild User time                     1234.74 (    +0.00%)       1231.67 (    -0.25%)
Kbuild System time                     62.62 (    +0.00%)         59.10 (    -5.54%)
THP fault alloc                     57054.53 (    +0.00%)      63223.67 (   +10.81%)
THP fault fallback                  11581.40 (    +0.00%)       5412.47 (   -53.26%)
Direct compact fail                   107.80 (    +0.00%)         59.07 (   -44.79%)
Direct compact success                  4.53 (    +0.00%)          2.80 (   -31.33%)
Direct compact success rate %           3.20 (    +0.00%)          3.99 (   +18.66%)
Compact daemon scanned migrate    5461033.93 (    +0.00%)    2267500.33 (   -58.48%)
Compact daemon scanned free       5824897.93 (    +0.00%)    2339773.00 (   -59.83%)
Compact direct scanned migrate      58336.93 (    +0.00%)      47659.93 (   -18.30%)
Compact direct scanned free         32791.87 (    +0.00%)      40729.67 (   +24.21%)
Compact total migrate scanned     5519370.87 (    +0.00%)    2315160.27 (   -58.05%)
Compact total free scanned        5857689.80 (    +0.00%)    2380502.67 (   -59.36%)
Alloc stall                          2424.60 (    +0.00%)        638.87 (   -73.62%)
Pages kswapd scanned              2657018.33 (    +0.00%)    4002186.33 (   +50.63%)
Pages kswapd reclaimed             559583.07 (    +0.00%)     718577.80 (   +28.41%)
Pages direct scanned               722094.07 (    +0.00%)     355172.73 (   -50.81%)
Pages direct reclaimed             107257.80 (    +0.00%)      31162.80 (   -70.95%)
Pages total scanned               3379112.40 (    +0.00%)    4357359.07 (   +28.95%)
Pages total reclaimed              666840.87 (    +0.00%)     749740.60 (   +12.43%)
Swap out                            77238.20 (    +0.00%)     110084.33 (   +42.53%)
Swap in                             11712.80 (    +0.00%)      24457.00 (  +108.80%)
File refaults                      143438.80 (    +0.00%)     188226.93 (   +31.22%)

Also of note is that compaction work overall is reduced. The reason for
this is that when free pageblocks are more readily available, allocations
are also much more likely to get physically placed in LRU order, instead
of being forced to scavenge free space here and there. This means that
reclaim by itself has better chances of freeing up whole blocks, and the
system relies less on compaction.
Comparing all changes to the vanilla kernel:

                                                 VANILLA       DEFRAGMODE-ASYNC-WMARKS
Hugealloc Time mean                 52739.45 (    +0.00%)      28904.00 (   -45.19%)
Hugealloc Time stddev               56541.26 (    +0.00%)      33464.37 (   -40.81%)
Kbuild Real time                      197.47 (    +0.00%)        196.59 (    -0.44%)
Kbuild User time                     1240.49 (    +0.00%)       1231.67 (    -0.71%)
Kbuild System time                     70.08 (    +0.00%)         59.10 (   -15.45%)
THP fault alloc                     46727.07 (    +0.00%)      63223.67 (   +35.30%)
THP fault fallback                  21910.60 (    +0.00%)       5412.47 (   -75.29%)
Direct compact fail                   195.80 (    +0.00%)         59.07 (   -69.48%)
Direct compact success                  7.93 (    +0.00%)          2.80 (   -57.46%)
Direct compact success rate %           3.51 (    +0.00%)          3.99 (   +10.49%)
Compact daemon scanned migrate    3369601.27 (    +0.00%)    2267500.33 (   -32.71%)
Compact daemon scanned free       5075474.47 (    +0.00%)    2339773.00 (   -53.90%)
Compact direct scanned migrate     161787.27 (    +0.00%)      47659.93 (   -70.54%)
Compact direct scanned free        163467.53 (    +0.00%)      40729.67 (   -75.08%)
Compact total migrate scanned     3531388.53 (    +0.00%)    2315160.27 (   -34.44%)
Compact total free scanned        5238942.00 (    +0.00%)    2380502.67 (   -54.56%)
Alloc stall                          2371.07 (    +0.00%)        638.87 (   -73.02%)
Pages kswapd scanned              2160926.73 (    +0.00%)    4002186.33 (   +85.21%)
Pages kswapd reclaimed             533191.07 (    +0.00%)     718577.80 (   +34.77%)
Pages direct scanned               400450.33 (    +0.00%)     355172.73 (   -11.31%)
Pages direct reclaimed              94441.73 (    +0.00%)      31162.80 (   -67.00%)
Pages total scanned               2561377.07 (    +0.00%)    4357359.07 (   +70.12%)
Pages total reclaimed              627632.80 (    +0.00%)     749740.60 (   +19.46%)
Swap out                            47959.53 (    +0.00%)     110084.33 (  +129.53%)
Swap in                              7276.00 (    +0.00%)      24457.00 (  +236.10%)
File refaults                      138043.00 (    +0.00%)     188226.93 (   +36.35%)

THP allocation latencies and %sys time are down dramatically. THP
allocation failures are down from nearly 50% to 8.5%. And to recall
previous data points, the success rates are steady and reliable without
the cumulative deterioration of fragmentation events.

Compaction work is down overall. Direct compaction work especially is
drastically reduced. As an aside, its success rate of 4% indicates there
is room for improvement. For now it's good to rely on it less.

Reclaim work is up overall, however direct reclaim work is down. Part of
the increase can be attributed to a higher use of THPs, which due to
internal fragmentation increase the memory footprint. This is not
necessarily an unexpected side-effect for users of THP. However, taking
both points together, there may well be some opportunities for
fine-tuning in the reclaim/compaction coordination.
Signed-off-by: Johannes Weiner
---
 include/linux/mmzone.h |  1 +
 mm/compaction.c        | 37 ++++++++++++++++++++++++++++++-------
 mm/internal.h          |  1 +
 mm/page_alloc.c        | 29 +++++++++++++++++++++++------
 mm/vmscan.c            | 15 ++++++++++++++-
 mm/vmstat.c            |  1 +
 6 files changed, 70 insertions(+), 14 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index dbb0ad69e17f..37c29f3fbca8 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -138,6 +138,7 @@ enum numa_stat_item {
 enum zone_stat_item {
 	/* First 128 byte cacheline (assuming 64 bit words) */
 	NR_FREE_PAGES,
+	NR_FREE_PAGES_BLOCKS,
 	NR_ZONE_LRU_BASE, /* Used only for compaction and reclaim retry */
 	NR_ZONE_INACTIVE_ANON = NR_ZONE_LRU_BASE,
 	NR_ZONE_ACTIVE_ANON,
diff --git a/mm/compaction.c b/mm/compaction.c
index 036353ef1878..4a2ccb82d0b2 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2329,6 +2329,22 @@ static enum compact_result __compact_finished(struct compact_control *cc)
 	if (!pageblock_aligned(cc->migrate_pfn))
 		return COMPACT_CONTINUE;
 
+	/*
+	 * When defrag_mode is enabled, make kcompactd target
+	 * watermarks in whole pageblocks. Because they can be stolen
+	 * without polluting, no further fallback checks are needed.
+	 */
+	if (defrag_mode && !cc->direct_compaction) {
+		if (__zone_watermark_ok(cc->zone, cc->order,
+					high_wmark_pages(cc->zone),
+					cc->highest_zoneidx, cc->alloc_flags,
+					zone_page_state(cc->zone,
+							NR_FREE_PAGES_BLOCKS)))
+			return COMPACT_SUCCESS;
+
+		return COMPACT_CONTINUE;
+	}
+
 	/* Direct compactor: Is a suitable page free? */
 	ret = COMPACT_NO_SUITABLE_PAGE;
 	for (order = cc->order; order < NR_PAGE_ORDERS; order++) {
@@ -2496,13 +2512,19 @@ bool compaction_zonelist_suitable(struct alloc_context *ac, int order,
 static enum compact_result
 compaction_suit_allocation_order(struct zone *zone, unsigned int order,
 				 int highest_zoneidx, unsigned int alloc_flags,
-				 bool async)
+				 bool async, bool kcompactd)
 {
+	unsigned long free_pages;
 	unsigned long watermark;
 
+	if (kcompactd && defrag_mode)
+		free_pages = zone_page_state(zone, NR_FREE_PAGES_BLOCKS);
+	else
+		free_pages = zone_page_state(zone, NR_FREE_PAGES);
+
 	watermark = wmark_pages(zone, alloc_flags & ALLOC_WMARK_MASK);
-	if (zone_watermark_ok(zone, order, watermark, highest_zoneidx,
-			      alloc_flags))
+	if (__zone_watermark_ok(zone, order, watermark, highest_zoneidx,
+				alloc_flags, free_pages))
 		return COMPACT_SUCCESS;
 
 	/*
@@ -2558,7 +2580,8 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
 		ret = compaction_suit_allocation_order(cc->zone, cc->order,
 						       cc->highest_zoneidx,
 						       cc->alloc_flags,
-						       cc->mode == MIGRATE_ASYNC);
+						       cc->mode == MIGRATE_ASYNC,
+						       !cc->direct_compaction);
 		if (ret != COMPACT_CONTINUE)
 			return ret;
 	}
@@ -3062,7 +3085,7 @@ static bool kcompactd_node_suitable(pg_data_t *pgdat)
 		ret = compaction_suit_allocation_order(zone,
 				pgdat->kcompactd_max_order,
 				highest_zoneidx, ALLOC_WMARK_MIN,
-				false);
+				false, true);
 		if (ret == COMPACT_CONTINUE)
 			return true;
 	}
@@ -3085,7 +3108,7 @@ static void kcompactd_do_work(pg_data_t *pgdat)
 		.mode = MIGRATE_SYNC_LIGHT,
 		.ignore_skip_hint = false,
 		.gfp_mask = GFP_KERNEL,
-		.alloc_flags = ALLOC_WMARK_MIN,
+		.alloc_flags = ALLOC_WMARK_HIGH,
 	};
 	enum compact_result ret;
 
@@ -3105,7 +3128,7 @@ static void kcompactd_do_work(pg_data_t *pgdat)
 
 		ret = compaction_suit_allocation_order(zone,
 				cc.order, zoneid, cc.alloc_flags,
-				false);
+				false, true);
 		if (ret != COMPACT_CONTINUE)
 			continue;
 
diff --git a/mm/internal.h b/mm/internal.h
index 2f52a65272c1..286520a424fe 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -536,6 +536,7 @@ extern char * const zone_names[MAX_NR_ZONES];
 DECLARE_STATIC_KEY_MAYBE(CONFIG_DEBUG_VM, check_pages_enabled);
 
 extern int min_free_kbytes;
+extern int defrag_mode;
 
 void setup_per_zone_wmarks(void);
 void calculate_min_free_kbytes(void);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4a0d8f871e56..c33c08e278f9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -273,7 +273,7 @@ int min_free_kbytes = 1024;
 int user_min_free_kbytes = -1;
 static int watermark_boost_factor __read_mostly = 15000;
 static int watermark_scale_factor = 10;
-static int defrag_mode;
+int defrag_mode;
 
 /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
 int movable_zone;
@@ -660,16 +660,20 @@ static inline void __add_to_free_list(struct page *page, struct zone *zone,
 				      unsigned int order, int migratetype,
 				      bool tail)
 {
 	struct free_area *area = &zone->free_area[order];
+	int nr_pages = 1 << order;
 
 	VM_WARN_ONCE(get_pageblock_migratetype(page) != migratetype,
 		     "page type is %lu, passed migratetype is %d (nr=%d)\n",
-		     get_pageblock_migratetype(page), migratetype, 1 << order);
+		     get_pageblock_migratetype(page), migratetype, nr_pages);
 
 	if (tail)
 		list_add_tail(&page->buddy_list, &area->free_list[migratetype]);
 	else
 		list_add(&page->buddy_list, &area->free_list[migratetype]);
 
 	area->nr_free++;
+
+	if (order >= pageblock_order && !is_migrate_isolate(migratetype))
+		__mod_zone_page_state(zone, NR_FREE_PAGES_BLOCKS, nr_pages);
 }
 
@@ -681,24 +685,34 @@ static inline void move_to_free_list(struct page *page, struct zone *zone,
 				     unsigned int order, int old_mt, int new_mt)
 {
 	struct free_area *area = &zone->free_area[order];
+	int nr_pages = 1 << order;
 
 	/* Free page moving can fail, so it happens before the type update */
 	VM_WARN_ONCE(get_pageblock_migratetype(page) != old_mt,
 		     "page type is %lu, passed migratetype is %d (nr=%d)\n",
-		     get_pageblock_migratetype(page), old_mt, 1 << order);
+		     get_pageblock_migratetype(page), old_mt, nr_pages);
 
 	list_move_tail(&page->buddy_list, &area->free_list[new_mt]);
 
-	account_freepages(zone, -(1 << order), old_mt);
-	account_freepages(zone, 1 << order, new_mt);
+	account_freepages(zone, -nr_pages, old_mt);
+	account_freepages(zone, nr_pages, new_mt);
+
+	if (order >= pageblock_order &&
+	    is_migrate_isolate(old_mt) != is_migrate_isolate(new_mt)) {
+		if (!is_migrate_isolate(old_mt))
+			nr_pages = -nr_pages;
+		__mod_zone_page_state(zone, NR_FREE_PAGES_BLOCKS, nr_pages);
+	}
 }
 
 static inline void __del_page_from_free_list(struct page *page, struct zone *zone,
 					     unsigned int order, int migratetype)
 {
+	int nr_pages = 1 << order;
+
 	VM_WARN_ONCE(get_pageblock_migratetype(page) != migratetype,
 		     "page type is %lu, passed migratetype is %d (nr=%d)\n",
-		     get_pageblock_migratetype(page), migratetype, 1 << order);
+		     get_pageblock_migratetype(page), migratetype, nr_pages);
 
 	/* clear reported state and update reported page count */
 	if (page_reported(page))
@@ -708,6 +722,9 @@ static inline void __del_page_from_free_list(struct page *page, struct zone *zone,
 	__ClearPageBuddy(page);
 	set_page_private(page, 0);
 	zone->free_area[order].nr_free--;
+
+	if (order >= pageblock_order && !is_migrate_isolate(migratetype))
+		__mod_zone_page_state(zone, NR_FREE_PAGES_BLOCKS, -nr_pages);
 }
 
 static inline void del_page_from_free_list(struct page *page, struct zone *zone,
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 3370bdca6868..b5c7dfc2b189 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -6724,11 +6724,24 @@ static bool pgdat_balanced(pg_data_t *pgdat, int order, int highest_zoneidx)
 	 * meet watermarks.
 	 */
 	for_each_managed_zone_pgdat(zone, pgdat, i, highest_zoneidx) {
+		unsigned long free_pages;
+
 		if (sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING)
 			mark = promo_wmark_pages(zone);
 		else
 			mark = high_wmark_pages(zone);
-		if (zone_watermark_ok_safe(zone, order, mark, highest_zoneidx))
+
+		/*
+		 * In defrag_mode, watermarks must be met in whole
+		 * blocks to avoid polluting allocator fallbacks.
+		 */
+		if (defrag_mode)
+			free_pages = zone_page_state(zone, NR_FREE_PAGES_BLOCKS);
+		else
+			free_pages = zone_page_state(zone, NR_FREE_PAGES);
+
+		if (__zone_watermark_ok(zone, order, mark, highest_zoneidx,
+					0, free_pages))
 			return true;
 	}
 
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 16bfe1c694dd..ed49a86348f7 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1190,6 +1190,7 @@ int fragmentation_index(struct zone *zone, unsigned int order)
 const char * const vmstat_text[] = {
 	/* enum zone_stat_item counters */
 	"nr_free_pages",
+	"nr_free_pages_blocks",
 	"nr_zone_inactive_anon",
 	"nr_zone_active_anon",
 	"nr_zone_inactive_file",
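
[Editorial illustration] Since the new counter is wired into vmstat_text[], it appears in /proc/vmstat on patched kernels. A small userspace sketch, for illustration only and not part of the patch, to read it back:

	/* Userspace illustration: print nr_free_pages_blocks from /proc/vmstat */
	#include <stdio.h>
	#include <string.h>

	int main(void)
	{
		char line[128];
		FILE *f = fopen("/proc/vmstat", "r");

		if (!f) {
			perror("/proc/vmstat");
			return 1;
		}
		while (fgets(line, sizeof(line), f)) {
			/* counters are emitted in vmstat_text[] order */
			if (!strncmp(line, "nr_free_pages_blocks ", 21)) {
				fputs(line, stdout);
				break;
			}
		}
		fclose(f);
		return 0;
	}

Multiplied by the page size, the value gives the amount of free memory currently sitting in whole pageblocks, which is what defrag_mode now balances against the high watermark.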