From: Yu Zhao
Date: Tue, 8 Feb 2022 01:18:58 -0700
Message-Id: <20220208081902.3550911-9-yuzhao@google.com>
In-Reply-To: <20220208081902.3550911-1-yuzhao@google.com>
Subject: [PATCH v7 08/12] mm: multigenerational LRU: optimize multiple memcgs
To: Andrew Morton, Johannes Weiner, Mel Gorman, Michal Hocko
Cc: Andi Kleen, Aneesh Kumar, Barry Song <21cnbao@gmail.com>, Catalin Marinas, Dave Hansen, Hillf Danton, Jens Axboe, Jesse Barnes, Jonathan Corbet, Linus Torvalds, Matthew Wilcox, Michael Larabel, Mike Rapoport, Rik van Riel, Vlastimil Babka, Will Deacon, Ying Huang,
    linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, page-reclaim@google.com, x86@kernel.org, Yu Zhao, Brian Geffon, Jan Alexander Steffens, Oleksandr Natalenko, Steven Barrett, Suleiman Souhlal, Daniel Byrne, Donald Carr, Holger Hoffstätte, Konstantin Kharlamov, Shuang Zhai, Sofia Trinh

When multiple memcgs are available, it's possible to improve the overall performance under global memory pressure by making better choices based on generations and tiers. This patch adds a rudimentary optimization that selects memcgs that can drop single-use, unmapped, clean pages first, which reduces the chance of going into the aging path or swapping, both of which can be costly.

Its goal is to improve the overall performance when there are mixed types of workloads, e.g., a heavy anon workload in one memcg and a heavy buffered I/O workload in another.

Though this optimization can be applied to both kswapd and direct reclaim, it's only added to kswapd to keep the patchset manageable. Later improvements will cover the direct reclaim path.
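The cost asymmetry behind this heuristic can be shown with a toy model. The sketch below assumes each memcg is reduced to two counters (clean single-use page cache that can be dropped cheaply, and anon pages that can only be reclaimed by swapping) and reclaims clean pages from every memcg before swapping from any of them. struct toy_memcg and toy_reclaim() are hypothetical names for illustration only; this is not how the patch is implemented, and generations, tiers and aging are ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of one memcg (toy model, not kernel code). */
struct toy_memcg {
	unsigned long clean_file; /* single-use, unmapped, clean: cheap to drop */
	unsigned long anon;       /* can only be reclaimed by swapping */
};

/*
 * Reclaim `target` pages across `n` memcgs, taking clean file pages from
 * every memcg before swapping anon pages from any of them.
 * Returns the number of pages that had to be swapped out.
 */
static unsigned long toy_reclaim(struct toy_memcg *memcgs, size_t n,
				 unsigned long target)
{
	unsigned long reclaimed = 0, swapped = 0;
	size_t i;

	/* Pass 1: drop clean page cache only; never swap. */
	for (i = 0; i < n && reclaimed < target; i++) {
		unsigned long take = memcgs[i].clean_file;

		if (take > target - reclaimed)
			take = target - reclaimed;
		memcgs[i].clean_file -= take;
		reclaimed += take;
	}

	/* Pass 2: fall back to swapping only if pass 1 fell short. */
	for (i = 0; i < n && reclaimed < target; i++) {
		unsigned long take = memcgs[i].anon;

		if (take > target - reclaimed)
			take = target - reclaimed;
		memcgs[i].anon -= take;
		reclaimed += take;
		swapped += take;
	}

	assert(reclaimed <= target);
	return swapped;
}
```

With an anon-heavy memcg {0, 1000} listed ahead of a buffered I/O memcg {1000, 0}, toy_reclaim(m, 2, 500) returns 0, whereas taking the 500 pages from the first memcg alone would have swapped all of them.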
Server benchmark results:
  Mixed workloads:
    fio (buffered I/O): -[28, 30]%
                IOPS         BW
      patch1-7: 3117k        11.9GiB/s
      patch1-8: 2217k        8661MiB/s

    memcached (anon): +[247, 251]%
                Ops/sec      KB/sec
      patch1-7: 563772.35    21900.01
      patch1-8: 1968343.76   76461.24

  Mixed workloads:
    fio (buffered I/O): -[4, 6]%
                IOPS         BW
      5.17-rc2: 2338k        9133MiB/s
      patch1-8: 2217k        8661MiB/s

    memcached (anon): +[524, 530]%
                Ops/sec      KB/sec
      5.17-rc2: 313821.65    12190.55
      patch1-8: 1968343.76   76461.24

  Configurations:
    (changes since patch 5)

    cat combined.sh
    modprobe brd rd_nr=2 rd_size=56623104

    swapoff -a
    mkswap /dev/ram0
    swapon /dev/ram0

    mkfs.ext4 /dev/ram1
    mount -t ext4 /dev/ram1 /mnt

    memtier_benchmark -S /var/run/memcached/memcached.sock \
      -P memcache_binary -n allkeys --key-minimum=1 \
      --key-maximum=50000000 --key-pattern=P:P -c 1 -t 36 \
      --ratio 1:0 --pipeline 8 -d 2000

    fio -name=mglru --numjobs=36 --directory=/mnt --size=1408m \
      --buffered=1 --ioengine=io_uring --iodepth=128 \
      --iodepth_batch_submit=32 --iodepth_batch_complete=32 \
      --rw=randread --random_distribution=random --norandommap \
      --time_based --ramp_time=10m --runtime=90m --group_reporting &
    pid=$!

    sleep 200

    memtier_benchmark -S /var/run/memcached/memcached.sock \
      -P memcache_binary -n allkeys --key-minimum=1 \
      --key-maximum=50000000 --key-pattern=R:R -c 1 -t 36 \
      --ratio 0:1 --pipeline 8 --randomize --distinct-client-seed

    kill -INT $pid
    wait

Client benchmark results:
  no change (CONFIG_MEMCG=n)

Signed-off-by: Yu Zhao
Acked-by: Brian Geffon
Acked-by: Jan Alexander Steffens (heftig)
Acked-by: Oleksandr Natalenko
Acked-by: Steven Barrett
Acked-by: Suleiman Souhlal
Tested-by: Daniel Byrne
Tested-by: Donald Carr
Tested-by: Holger Hoffstätte
Tested-by: Konstantin Kharlamov
Tested-by: Shuang Zhai
Tested-by: Sofia Trinh
---
 mm/vmscan.c | 45 +++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 41 insertions(+), 4 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5ab6cd332fcc..fc09b6c10624 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -127,6 +127,13 @@ struct scan_control {
 	/* Always discard instead of demoting to lower tier memory */
 	unsigned int no_demotion:1;
 
+#ifdef CONFIG_LRU_GEN
+	/* help make better choices when multiple memcgs are available */
+	unsigned int memcgs_need_aging:1;
+	unsigned int memcgs_need_swapping:1;
+	unsigned int memcgs_avoid_swapping:1;
+#endif
+
 	/* Allocation order */
 	s8 order;
 
@@ -4343,6 +4350,22 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc)
 
 	VM_BUG_ON(!current_is_kswapd());
 
+	/*
+	 * To reduce the chance of going into the aging path or swapping, which
+	 * can be costly, optimistically skip them unless their corresponding
+	 * flags were cleared in the eviction path. This improves the overall
+	 * performance when multiple memcgs are available.
+	 */
+	if (!sc->memcgs_need_aging) {
+		sc->memcgs_need_aging = true;
+		sc->memcgs_avoid_swapping = !sc->memcgs_need_swapping;
+		sc->memcgs_need_swapping = true;
+		return;
+	}
+
+	sc->memcgs_need_swapping = true;
+	sc->memcgs_avoid_swapping = true;
+
 	current->reclaim_state->mm_walk = &pgdat->mm_walk;
 
 	memcg = mem_cgroup_iter(NULL, NULL, NULL);
@@ -4745,7 +4768,8 @@ static int isolate_folios(struct lruvec *lruvec, struct scan_control *sc, int sw
 	return scanned;
 }
 
-static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swappiness)
+static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swappiness,
+			bool *swapped)
 {
 	int type;
 	int scanned;
@@ -4810,6 +4834,9 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
 
 	sc->nr_reclaimed += reclaimed;
 
+	if (!type && swapped)
+		*swapped = true;
+
 	return scanned;
 }
 
@@ -4838,8 +4865,10 @@ static long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *sc, bool
 	if (!nr_to_scan)
 		return 0;
 
-	if (!need_aging)
+	if (!need_aging) {
+		sc->memcgs_need_aging = false;
 		return nr_to_scan;
+	}
 
 	/* leave the work to lru_gen_age_node() */
 	if (current_is_kswapd())
@@ -4861,6 +4890,8 @@ static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc
 {
 	struct blk_plug plug;
 	long scanned = 0;
+	bool swapped = false;
+	unsigned long reclaimed = sc->nr_reclaimed;
 	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
 
@@ -4887,13 +4918,19 @@ static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc
 		if (!nr_to_scan)
 			break;
 
-		delta = evict_folios(lruvec, sc, swappiness);
+		delta = evict_folios(lruvec, sc, swappiness, &swapped);
 		if (!delta)
 			break;
 
+		if (sc->memcgs_avoid_swapping && swappiness < 200 && swapped)
+			break;
+
 		scanned += delta;
-		if (scanned >= nr_to_scan)
+		if (scanned >= nr_to_scan) {
+			if (!swapped && sc->nr_reclaimed - reclaimed >= MIN_LRU_BATCH)
+				sc->memcgs_need_swapping = false;
 			break;
+		}
 
 		cond_resched();
 	}
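The three new scan_control bits interact across successive calls that share one scan_control, which is hard to follow from the hunks alone. The userspace model below is a sketch under stated assumptions: struct memcg_flags and the model_* helpers are hypothetical stand-ins that mirror only the flag logic added to lru_gen_age_node(), get_nr_to_scan() and lru_gen_shrink_lruvec(); the reclaim threshold is passed in as min_batch instead of using MIN_LRU_BATCH, and the flags are assumed to start at zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the three new scan_control bits. */
struct memcg_flags {
	bool memcgs_need_aging;
	bool memcgs_need_swapping;
	bool memcgs_avoid_swapping;
};

/*
 * Models the new prologue of lru_gen_age_node(): returns true if the aging
 * walk runs on this call, false if it is skipped.
 */
static bool model_age_node(struct memcg_flags *f)
{
	if (!f->memcgs_need_aging) {
		/* Some memcg could evict without aging: skip aging again. */
		f->memcgs_need_aging = true;
		/* Keep avoiding swap only if some memcg reclaimed a batch without it. */
		f->memcgs_avoid_swapping = !f->memcgs_need_swapping;
		f->memcgs_need_swapping = true;
		return false;
	}

	/* No memcg could evict without aging: age, and reset the swap hints. */
	f->memcgs_need_swapping = true;
	f->memcgs_avoid_swapping = true;
	return true;
}

/* Models the get_nr_to_scan() change: an evictable memcg clears the flag. */
static void model_get_nr_to_scan(struct memcg_flags *f, bool need_aging)
{
	if (!need_aging)
		f->memcgs_need_aging = false;
}

/* Models the early exit added to the eviction loop. */
static bool model_stop_after_swap(const struct memcg_flags *f, int swappiness,
				  bool swapped)
{
	return f->memcgs_avoid_swapping && swappiness < 200 && swapped;
}

/* Models the check when scanned >= nr_to_scan in lru_gen_shrink_lruvec(). */
static void model_target_reached(struct memcg_flags *f, bool swapped,
				 unsigned long reclaimed, unsigned long min_batch)
{
	if (!swapped && reclaimed >= min_batch)
		f->memcgs_need_swapping = false;
}
```

Driving the model in sequence shows the intended behavior: the first call skips aging and avoids swapping; as long as some memcg evicts without aging and reclaims at least a batch without swapping, later calls keep skipping aging and avoiding swapping; if no memcg manages a clean batch, the next call still skips aging but allows swapping; and only when no memcg could evict without aging does the aging walk run.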