From patchwork Tue Mar 8 23:47:19 2022
X-Patchwork-Submitter: Yu Zhao
X-Patchwork-Id: 12774487
Date: Tue, 8 Mar 2022 16:47:19 -0700
In-Reply-To: <20220308234723.3834941-1-yuzhao@google.com>
Message-Id: <20220308234723.3834941-10-yuzhao@google.com>
Mime-Version: 1.0
References: <20220308234723.3834941-1-yuzhao@google.com>
X-Mailer: git-send-email 2.35.1.616.g0bdcbb4464-goog
Subject: [PATCH v8 09/14] mm: multi-gen LRU: optimize multiple memcgs
From: Yu Zhao
To: Andrew Morton
Cc: Andi Kleen, Aneesh Kumar, Catalin Marinas, Dave Hansen, Hillf Danton,
    Jens Axboe, Jesse Barnes, Johannes Weiner, Jonathan Corbet,
    Linus Torvalds, Matthew Wilcox, Mel Gorman, Michael Larabel,
    Michal Hocko, Mike Rapoport, Rik van Riel, Vlastimil Babka,
    Will Deacon, Ying Huang,
    linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    page-reclaim@google.com, x86@kernel.org, Yu Zhao, Brian Geffon,
    Jan Alexander Steffens, Oleksandr Natalenko, Steven Barrett,
    Suleiman Souhlal, Daniel Byrne, Donald Carr, Holger Hoffstätte,
    Konstantin Kharlamov, Shuang Zhai, Sofia Trinh, Vaibhav Jain

When multiple memcgs are available, it is possible to make better choices
based on generations and tiers and therefore improve the overall
performance under global memory pressure. This patch adds a rudimentary
optimization that selects memcgs that can drop single-use unmapped clean
pages first. Doing so reduces the chance of going into the aging path or
swapping, both of which can be costly.

A typical example that benefits from this optimization is a server running
mixed types of workloads, e.g., a heavy anon workload in one memcg and a
heavy buffered I/O workload in the other.

Though this optimization can be applied to both kswapd and direct reclaim,
it is only added to kswapd to keep the patchset manageable. Later
improvements will cover the direct reclaim path.
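For reviewers skimming the diff below, the kswapd-side handshake can be
sketched as a minimal, self-contained C program. The struct and helper here
are hypothetical simplifications for illustration only; the real bits are
bitfields added to struct scan_control in mm/vmscan.c:

```c
#include <stdbool.h>

/*
 * Hypothetical, simplified stand-in for the three bits this patch adds
 * to struct scan_control (the kernel uses unsigned int bitfields).
 */
struct sc_bits {
	bool memcgs_need_aging;
	bool memcgs_need_swapping;
	bool memcgs_avoid_swapping;
};

/*
 * Mirrors the check lru_gen_age_node() performs at the start of a kswapd
 * pass: if the eviction path cleared memcgs_need_aging during the last
 * pass, skip the costly aging walk now, and avoid swapping too if no
 * swapping was needed; otherwise re-arm the flags and fall through to
 * aging.
 */
static bool age_node_should_skip(struct sc_bits *sc)
{
	if (!sc->memcgs_need_aging) {
		sc->memcgs_need_aging = true;
		sc->memcgs_avoid_swapping = !sc->memcgs_need_swapping;
		sc->memcgs_need_swapping = true;
		return true;	/* skip the aging walk this pass */
	}

	sc->memcgs_need_swapping = true;
	sc->memcgs_avoid_swapping = true;
	return false;		/* proceed with aging */
}
```

The flags are deliberately optimistic: they are set at the top of each pass
and only cleared by the eviction path when it proves aging or swapping is
actually needed.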
Server benchmark results:
  Mixed workloads:
    fio (buffered I/O): -[28, 30]%
                IOPS         BW
      patch1-7: 3117k        11.9GiB/s
      patch1-8: 2217k        8661MiB/s

    memcached (anon): +[247, 251]%
                Ops/sec      KB/sec
      patch1-7: 563772.35    21900.01
      patch1-8: 1968343.76   76461.24

  Mixed workloads:
    fio (buffered I/O): -[4, 6]%
                IOPS         BW
      5.17-rc2: 2338k        9133MiB/s
      patch1-8: 2217k        8661MiB/s

    memcached (anon): +[524, 530]%
                Ops/sec      KB/sec
      5.17-rc2: 313821.65    12190.55
      patch1-8: 1968343.76   76461.24

  Configurations:
    (changes since patch 5)

    cat mixed.sh
    modprobe brd rd_nr=2 rd_size=56623104

    swapoff -a
    mkswap /dev/ram0
    swapon /dev/ram0

    mkfs.ext4 /dev/ram1
    mount -t ext4 /dev/ram1 /mnt

    memtier_benchmark -S /var/run/memcached/memcached.sock \
      -P memcache_binary -n allkeys --key-minimum=1 \
      --key-maximum=50000000 --key-pattern=P:P -c 1 -t 36 \
      --ratio 1:0 --pipeline 8 -d 2000

    fio -name=mglru --numjobs=36 --directory=/mnt --size=1408m \
      --buffered=1 --ioengine=io_uring --iodepth=128 \
      --iodepth_batch_submit=32 --iodepth_batch_complete=32 \
      --rw=randread --random_distribution=random --norandommap \
      --time_based --ramp_time=10m --runtime=90m --group_reporting &
    pid=$!
    sleep 200

    memtier_benchmark -S /var/run/memcached/memcached.sock \
      -P memcache_binary -n allkeys --key-minimum=1 \
      --key-maximum=50000000 --key-pattern=R:R -c 1 -t 36 \
      --ratio 0:1 --pipeline 8 --randomize --distinct-client-seed

    kill -INT $pid
    wait

Client benchmark results:
  no change (CONFIG_MEMCG=n)

Signed-off-by: Yu Zhao
Acked-by: Brian Geffon
Acked-by: Jan Alexander Steffens (heftig)
Acked-by: Oleksandr Natalenko
Acked-by: Steven Barrett
Acked-by: Suleiman Souhlal
Tested-by: Daniel Byrne
Tested-by: Donald Carr
Tested-by: Holger Hoffstätte
Tested-by: Konstantin Kharlamov
Tested-by: Shuang Zhai
Tested-by: Sofia Trinh
Tested-by: Vaibhav Jain
---
 mm/vmscan.c | 45 +++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 41 insertions(+), 4 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 67dc4190e790..7375c9dae08f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -127,6 +127,13 @@ struct scan_control {
 	/* Always discard instead of demoting to lower tier memory */
 	unsigned int no_demotion:1;

+#ifdef CONFIG_LRU_GEN
+	/* help make better choices when multiple memcgs are available */
+	unsigned int memcgs_need_aging:1;
+	unsigned int memcgs_need_swapping:1;
+	unsigned int memcgs_avoid_swapping:1;
+#endif
+
 	/* Allocation order */
 	s8 order;

@@ -4348,6 +4355,22 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc)

 	VM_BUG_ON(!current_is_kswapd());

+	/*
+	 * To reduce the chance of going into the aging path or swapping, which
+	 * can be costly, optimistically skip them unless their corresponding
+	 * flags were cleared in the eviction path. This improves the overall
+	 * performance when multiple memcgs are available.
+	 */
+	if (!sc->memcgs_need_aging) {
+		sc->memcgs_need_aging = true;
+		sc->memcgs_avoid_swapping = !sc->memcgs_need_swapping;
+		sc->memcgs_need_swapping = true;
+		return;
+	}
+
+	sc->memcgs_need_swapping = true;
+	sc->memcgs_avoid_swapping = true;
+
 	current->reclaim_state->mm_walk = &pgdat->mm_walk;

 	memcg = mem_cgroup_iter(NULL, NULL, NULL);
@@ -4750,7 +4773,8 @@ static int isolate_folios(struct lruvec *lruvec, struct scan_control *sc, int sw
 	return scanned;
 }

-static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swappiness)
+static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swappiness,
+			bool *swapped)
 {
 	int type;
 	int scanned;
@@ -4816,6 +4840,9 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap

 	sc->nr_reclaimed += reclaimed;

+	if (type == LRU_GEN_ANON && swapped)
+		*swapped = true;
+
 	return scanned;
 }

@@ -4844,8 +4871,10 @@ static long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *sc, bool
 	if (!nr_to_scan)
 		return 0;

-	if (!need_aging)
+	if (!need_aging) {
+		sc->memcgs_need_aging = false;
 		return nr_to_scan;
+	}

 	/* leave the work to lru_gen_age_node() */
 	if (current_is_kswapd())
@@ -4867,6 +4896,8 @@ static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc
 {
 	struct blk_plug plug;
 	long scanned = 0;
+	bool swapped = false;
+	unsigned long reclaimed = sc->nr_reclaimed;
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);

 	lru_add_drain();
@@ -4892,13 +4923,19 @@ static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc
 		if (!nr_to_scan)
 			break;

-		delta = evict_folios(lruvec, sc, swappiness);
+		delta = evict_folios(lruvec, sc, swappiness, &swapped);
 		if (!delta)
 			break;

+		if (sc->memcgs_avoid_swapping && swappiness < 200 && swapped)
+			break;
+
 		scanned += delta;
-		if (scanned >= nr_to_scan)
+		if (scanned >= nr_to_scan) {
+			if (!swapped && sc->nr_reclaimed - reclaimed >= MIN_LRU_BATCH)
+				sc->memcgs_need_swapping = false;
 			break;
+		}

 		cond_resched();
 	}
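The eviction-side decisions above can likewise be sketched as two small
predicates. This is a hypothetical reduction for illustration: the names
should_stop_eviction() and can_skip_swapping_next() do not exist in the
kernel, and MIN_LRU_BATCH is given an arbitrary stand-in value here:

```c
#include <stdbool.h>

/* Stand-in value; the kernel derives MIN_LRU_BATCH from BITS_PER_LONG. */
#define MIN_LRU_BATCH 64

/*
 * Hypothetical reduction of the early-exit check added to the eviction
 * loop: once an anon folio from this memcg has hit swap, stop evicting
 * from it, unless swapping is explicitly wanted (swappiness == 200).
 */
static bool should_stop_eviction(bool avoid_swapping, int swappiness,
				 bool swapped)
{
	return avoid_swapping && swappiness < 200 && swapped;
}

/*
 * Hypothetical reduction of the bookkeeping done once the scan target is
 * met: if enough pages were reclaimed without touching swap, the next
 * kswapd pass may clear memcgs_need_swapping and avoid swap entirely.
 */
static bool can_skip_swapping_next(long scanned, long nr_to_scan,
				   bool swapped,
				   unsigned long reclaimed_delta)
{
	return scanned >= nr_to_scan && !swapped &&
	       reclaimed_delta >= MIN_LRU_BATCH;
}
```

Together with the kswapd-side flags, this is what steers reclaim toward
memcgs that can drop single-use unmapped clean pages before any memcg is
asked to swap.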