From patchwork Tue Jul 25 18:57:29 2023
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 13326948
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Yu Zhao, Roman Gushchin, Johannes Weiner, Michal Hocko,
 Hugh Dickins, Nhat Pham, Yuanchu Xie, Suren Baghdasaryan, T. J. Mercier,
 Kairui Song
Subject: [RFC PATCH 1/4] workingset: simplify and use a more intuitive model
Date: Wed, 26 Jul 2023 02:57:29 +0800
Message-ID: <20230725185733.43929-2-ryncsn@gmail.com>
In-Reply-To: <20230725185733.43929-1-ryncsn@gmail.com>
References: <20230725185733.43929-1-ryncsn@gmail.com>
From: Kairui Song

This removes workingset_activation() and reduces the number of calls to
workingset_age_nonresident(). The idea behind this change is a new way to
calculate the refault distance that works well in most cases and also fits
MGLRU (in later commits).

The current refault distance is based on two assumptions:
1. Activating an inactive page left-shifts the LRU pages (consider the LRU
   starting from the right).
2. Evicting an inactive page left-shifts the LRU pages.

Assumption 2 is correct, but assumption 1 is not always true: the activated
page could be anywhere in the LRU list, so it only left-shifts the pages to
its right, and one page can get activated and deactivated multiple times.
MGLRU doesn't fit this model either, since there are multiple generations,
and pages are constantly aged and activated as generations grow.

So instead, introduce a new idea here: the "Shadow LRU Position". Simply
consider the evicted pages as still in memory, each with an eviction
sequence number as before.
Let NA denote the `nonresident_age` counter, which is increased on each
eviction. The "Shadow LRU Position" of an evicted page is then:

  SP = (NA's reading @ current) - (NA's reading @ eviction)

  +---------------------------------------+==========+========+
  | *             Shadow LRU        O O O | INACTIVE | ACTIVE |
  +----+----------------------------------+==========+========+
       |                                  |
       +----------------------------------+
                       SP

  refault page  O -> Hole left by previously refaulted page
                * -> The page corresponding to SP

SP simply stands for how far the current workload could have pushed a page
out of memory. That also means that if the page had started on the INACTIVE
part, it *may* deserve re-activation if shifting it SP slots to the right
into the ACTIVE list still would not exceed total memory, that is:

  SP + NR_INACTIVE < NR_INACTIVE + NR_ACTIVE

which simplifies to:

  SP < NR_ACTIVE

Since this is only an estimation based on several hypotheses, and it
actually bends the normal LRU routine when the LRU is working well, throttle
it by two factors:

1. Previously refaulted pages may leave "holes" in the shadow LRU, which
   lowers the re-activation rate for distant shadow pages.
2. When the ACTIVE part of the LRU is long enough, challenging the
   established workingset by activating a one-time-faulted inactive page may
   not be a good idea, so throttle it by the ratio of ACTIVE to INACTIVE.

Combining all of the above, upon refault:
- If the ACTIVE LRU is low, check SP < NR_ACTIVE for re-activation.
- If the ACTIVE LRU is high, check
  SP < min(NR_ACTIVE, NR_INACTIVE) / (exponential ratio of ACTIVE/INACTIVE).

This is simpler than before, since lruvec operations are no longer needed
when activating a page, and so far a few benchmarks show fair results.
Using the memtier and fio tests from commit ac35a4902374, but scaled down to
fit my test environment, plus some other tests:

memtier test (with a 16G ramdisk as swap and a 2G cgroup limit):

  memcached -u nobody -m 16384 -s /tmp/memcached.socket \
    -a 0700 -t 12 -B binary &
  memtier_benchmark -S /tmp/memcached.socket -P memcache_binary -n allkeys \
    --key-minimum=1 --key-maximum=24000000 --key-pattern=P:P -c 1 \
    -t 12 --ratio 1:0 --pipeline 8 -d 2000 -x 6

fio test (with a 16G ramdisk on /mnt and a 4G cgroup limit):

  fio -name=refault --numjobs=12 --directory=/mnt --size=1024m \
    --buffered=1 --ioengine=io_uring --iodepth=128 \
    --iodepth_batch_submit=32 --iodepth_batch_complete=32 \
    --rw=randread --random_distribution=random --norandommap \
    --time_based --ramp_time=5m --runtime=5m --group_reporting

pgbench was set up using phoronix-test-suite with scale 1000 and 50 clients
on a 5G VM. The Linux kernel compilation test was done with defconfig on a
2G VM.

Before:
  memcached: 48157.04 ops/s
  read: IOPS=2003k, BW=7823MiB/s (8203MB/s)(2292GiB/300001msec)
  pgbench: 5845 qps
  build-linux: 247.063

After:
  memcached: 49144.55 ops/s
  read: IOPS=2005k, BW=7832MiB/s (8212MB/s)(2294GiB/300002msec)
  pgbench: 5832 qps
  build-linux: 247.302

Signed-off-by: Kairui Song
---
 include/linux/swap.h |   2 -
 mm/swap.c            |   1 -
 mm/vmscan.c          |   2 -
 mm/workingset.c      | 217 +++++++++++++++++++++----------------------
 4 files changed, 108 insertions(+), 114 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 456546443f1f..43e48023c4c4 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -350,10 +350,8 @@ static inline void folio_set_swap_entry(struct folio *folio, swp_entry_t entry)
 
 /* linux/mm/workingset.c */
 bool workingset_test_recent(void *shadow, bool file, bool *workingset);
-void workingset_age_nonresident(struct lruvec *lruvec, unsigned long nr_pages);
 void *workingset_eviction(struct folio *folio, struct mem_cgroup *target_memcg);
 void workingset_refault(struct folio *folio, void *shadow);
-void workingset_activation(struct folio *folio);
 
 /* Only track the nodes of mappings with shadow entries */
 void workingset_update_node(struct xa_node *node);
diff --git a/mm/swap.c b/mm/swap.c
index cd8f0150ba3a..685b446fd4f9 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -482,7 +482,6 @@ void folio_mark_accessed(struct folio *folio)
 		else
 			__lru_cache_activate_folio(folio);
 		folio_clear_referenced(folio);
-		workingset_activation(folio);
 	}
 	if (folio_test_idle(folio))
 		folio_clear_idle(folio);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 1080209a568b..e7906f7fdc77 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2539,8 +2539,6 @@ static unsigned int move_folios_to_lru(struct lruvec *lruvec,
 		lruvec_add_folio(lruvec, folio);
 		nr_pages = folio_nr_pages(folio);
 		nr_moved += nr_pages;
-		if (folio_test_active(folio))
-			workingset_age_nonresident(lruvec, nr_pages);
 	}
 
 	/*
diff --git a/mm/workingset.c b/mm/workingset.c
index 4686ae363000..c0dea2c05f55 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -180,9 +180,10 @@
  */
 #define WORKINGSET_SHIFT 1
-#define EVICTION_SHIFT	((BITS_PER_LONG - BITS_PER_XA_VALUE) +	\
+#define EVICTION_SHIFT	((BITS_PER_LONG - BITS_PER_XA_VALUE) +	\
 			 WORKINGSET_SHIFT + NODES_SHIFT + \
 			 MEM_CGROUP_ID_SHIFT)
+#define EVICTION_BITS	(BITS_PER_LONG - (EVICTION_SHIFT))
 #define EVICTION_MASK	(~0UL >> EVICTION_SHIFT)
 
 /*
@@ -226,8 +227,105 @@ static void unpack_shadow(void *shadow, int *memcgidp, pg_data_t **pgdat,
 	*workingsetp = workingset;
 }
 
-#ifdef CONFIG_LRU_GEN
+/*
+ * Get the distance reading at eviction time.
+ */
+static inline unsigned long lru_eviction(struct lruvec *lruvec,
+					 int bits, int bucket_order)
+{
+	unsigned long eviction = atomic_long_read(&lruvec->nonresident_age);
+
+	eviction >>= bucket_order;
+	eviction &= ~0UL >> (BITS_PER_LONG - bits);
+
+	return eviction;
+}
+
+/*
+ * Calculate and test refault distance
+ */
+static bool lru_refault(struct mem_cgroup *memcg,
+			struct lruvec *lruvec,
+			unsigned long eviction,
+			int bits, int bucket_order)
+{
+	unsigned long refault, distance;
+	unsigned long active, inactive;
+
+	eviction <<= bucket_order;
+	refault = atomic_long_read(&lruvec->nonresident_age);
+
+	/*
+	 * The unsigned subtraction here gives an accurate distance
+	 * across nonresident_age overflows in most cases. There is a
+	 * special case: usually, shadow entries have a short lifetime
+	 * and are either refaulted or reclaimed along with the inode
+	 * before they get too old. But it is not impossible for the
+	 * nonresident_age to lap a shadow entry in the field, which
+	 * can then result in a false small refault distance, leading
+	 * to a false activation should this old entry actually
+	 * refault again. However, earlier kernels used to deactivate
+	 * unconditionally with *every* reclaim invocation for the
+	 * longest time, so the occasional inappropriate activation
+	 * leading to pressure on the active list is not a problem.
+	 */
+	distance = (refault - eviction) & (~0UL >> (BITS_PER_LONG - bits));
+
+	active = lruvec_page_state(lruvec, NR_ACTIVE_FILE);
+	inactive = lruvec_page_state(lruvec, NR_INACTIVE_FILE);
+	if (mem_cgroup_get_nr_swap_pages(memcg) > 0) {
+		active += lruvec_page_state(lruvec, NR_ACTIVE_ANON);
+		inactive += lruvec_page_state(lruvec, NR_INACTIVE_ANON);
+	}
+
+	/*
+	 * When there are already enough active pages, be less aggressive
+	 * on activating pages, challenge already established workingset with
+	 * one time refaulted page may not be a good idea, especially as
+	 * the gap between active workingset and inactive queue grows larger.
+	 */
+	if (active > inactive)
+		return distance < inactive >> (1 + (fls_long(active) - fls_long(inactive)) / 2);
+
+	/*
+	 * Compare the distance to the existing workingset size. We
+	 * don't activate pages that couldn't stay resident even if
+	 * all the memory was available to the workingset. Whether
+	 * workingset competition needs to consider anon or not depends
+	 * on having free swap space.
+	 */
+	return distance < active;
+}
+
+/**
+ * workingset_age_nonresident - age non-resident entries as LRU ages
+ * @lruvec: the lruvec that was aged
+ * @nr_pages: the number of pages to count
+ *
+ * As in-memory pages are aged, non-resident pages need to be aged as
+ * well, in order for the refault distances later on to be comparable
+ * to the in-memory dimensions. This function allows reclaim and LRU
+ * operations to drive the non-resident aging along in parallel.
+ */
+static void workingset_age_nonresident(struct lruvec *lruvec, unsigned long nr_pages)
+{
+	/*
+	 * Reclaiming a cgroup means reclaiming all its children in a
+	 * round-robin fashion. That means that each cgroup has an LRU
+	 * order that is composed of the LRU orders of its child
+	 * cgroups; and every page has an LRU position not just in the
+	 * cgroup that owns it, but in all of that group's ancestors.
+	 *
+	 * So when the physical inactive list of a leaf cgroup ages,
+	 * the virtual inactive lists of all its parents, including
+	 * the root cgroup's, age as well.
+	 */
+	do {
+		atomic_long_add(nr_pages, &lruvec->nonresident_age);
+	} while ((lruvec = parent_lruvec(lruvec)));
+}
 
+#ifdef CONFIG_LRU_GEN
 static void *lru_gen_eviction(struct folio *folio)
 {
 	int hist;
@@ -342,34 +440,6 @@ static void lru_gen_refault(struct folio *folio, void *shadow)
 
 #endif /* CONFIG_LRU_GEN */
 
-/**
- * workingset_age_nonresident - age non-resident entries as LRU ages
- * @lruvec: the lruvec that was aged
- * @nr_pages: the number of pages to count
- *
- * As in-memory pages are aged, non-resident pages need to be aged as
- * well, in order for the refault distances later on to be comparable
- * to the in-memory dimensions. This function allows reclaim and LRU
- * operations to drive the non-resident aging along in parallel.
- */
-void workingset_age_nonresident(struct lruvec *lruvec, unsigned long nr_pages)
-{
-	/*
-	 * Reclaiming a cgroup means reclaiming all its children in a
-	 * round-robin fashion. That means that each cgroup has an LRU
-	 * order that is composed of the LRU orders of its child
-	 * cgroups; and every page has an LRU position not just in the
-	 * cgroup that owns it, but in all of that group's ancestors.
-	 *
-	 * So when the physical inactive list of a leaf cgroup ages,
-	 * the virtual inactive lists of all its parents, including
-	 * the root cgroup's, age as well.
-	 */
-	do {
-		atomic_long_add(nr_pages, &lruvec->nonresident_age);
-	} while ((lruvec = parent_lruvec(lruvec)));
-}
-
 /**
  * workingset_eviction - note the eviction of a folio from memory
  * @target_memcg: the cgroup that is causing the reclaim
@@ -396,11 +466,11 @@ void *workingset_eviction(struct folio *folio, struct mem_cgroup *target_memcg)
 	lruvec = mem_cgroup_lruvec(target_memcg, pgdat);
 	/* XXX: target_memcg can be NULL, go through lruvec */
 	memcgid = mem_cgroup_id(lruvec_memcg(lruvec));
-	eviction = atomic_long_read(&lruvec->nonresident_age);
-	eviction >>= bucket_order;
+
+	eviction = lru_eviction(lruvec, EVICTION_BITS, bucket_order);
 	workingset_age_nonresident(lruvec, folio_nr_pages(folio));
 	return pack_shadow(memcgid, pgdat, eviction,
-			folio_test_workingset(folio));
+			   folio_test_workingset(folio));
 }
 
 /**
@@ -418,9 +488,6 @@ bool workingset_test_recent(void *shadow, bool file, bool *workingset)
 {
 	struct mem_cgroup *eviction_memcg;
 	struct lruvec *eviction_lruvec;
-	unsigned long refault_distance;
-	unsigned long workingset_size;
-	unsigned long refault;
 	int memcgid;
 	struct pglist_data *pgdat;
 	unsigned long eviction;
@@ -429,7 +496,6 @@ bool workingset_test_recent(void *shadow, bool file, bool *workingset)
 		return lru_gen_test_recent(shadow, file, &eviction_lruvec, &eviction, workingset);
 
 	unpack_shadow(shadow, &memcgid, &pgdat, &eviction, workingset);
-	eviction <<= bucket_order;
 
 	/*
 	 * Look up the memcg associated with the stored ID. It might
@@ -450,50 +516,10 @@ bool workingset_test_recent(void *shadow, bool file, bool *workingset)
 	eviction_memcg = mem_cgroup_from_id(memcgid);
 	if (!mem_cgroup_disabled() && !eviction_memcg)
 		return false;
 
 	eviction_lruvec = mem_cgroup_lruvec(eviction_memcg, pgdat);
-	refault = atomic_long_read(&eviction_lruvec->nonresident_age);
-
-	/*
-	 * Calculate the refault distance
-	 *
-	 * The unsigned subtraction here gives an accurate distance
-	 * across nonresident_age overflows in most cases. There is a
-	 * special case: usually, shadow entries have a short lifetime
-	 * and are either refaulted or reclaimed along with the inode
-	 * before they get too old. But it is not impossible for the
-	 * nonresident_age to lap a shadow entry in the field, which
-	 * can then result in a false small refault distance, leading
-	 * to a false activation should this old entry actually
-	 * refault again. However, earlier kernels used to deactivate
-	 * unconditionally with *every* reclaim invocation for the
-	 * longest time, so the occasional inappropriate activation
-	 * leading to pressure on the active list is not a problem.
-	 */
-	refault_distance = (refault - eviction) & EVICTION_MASK;
-
-	/*
-	 * Compare the distance to the existing workingset size. We
-	 * don't activate pages that couldn't stay resident even if
-	 * all the memory was available to the workingset. Whether
-	 * workingset competition needs to consider anon or not depends
-	 * on having free swap space.
-	 */
-	workingset_size = lruvec_page_state(eviction_lruvec, NR_ACTIVE_FILE);
-	if (!file) {
-		workingset_size += lruvec_page_state(eviction_lruvec,
-						     NR_INACTIVE_FILE);
-	}
-	if (mem_cgroup_get_nr_swap_pages(eviction_memcg) > 0) {
-		workingset_size += lruvec_page_state(eviction_lruvec,
-						     NR_ACTIVE_ANON);
-		if (file) {
-			workingset_size += lruvec_page_state(eviction_lruvec,
-							     NR_INACTIVE_ANON);
-		}
-	}
-
-	return refault_distance <= workingset_size;
+	return lru_refault(eviction_memcg, eviction_lruvec, eviction,
+			   EVICTION_BITS, bucket_order);
 }
 
 /**
@@ -543,7 +569,6 @@ void workingset_refault(struct folio *folio, void *shadow)
 		goto out;
 
 	folio_set_active(folio);
-	workingset_age_nonresident(lruvec, nr);
 	mod_lruvec_state(lruvec, WORKINGSET_ACTIVATE_BASE + file, nr);
 
 	/* Folio was active prior to eviction */
@@ -560,30 +585,6 @@ void workingset_refault(struct folio *folio, void *shadow)
 	rcu_read_unlock();
 }
 
-/**
- * workingset_activation - note a page activation
- * @folio: Folio that is being activated.
- */
-void workingset_activation(struct folio *folio)
-{
-	struct mem_cgroup *memcg;
-
-	rcu_read_lock();
-	/*
-	 * Filter non-memcg pages here, e.g. unmap can call
-	 * mark_page_accessed() on VDSO pages.
-	 *
-	 * XXX: See workingset_refault() - this should return
-	 * root_mem_cgroup even for !CONFIG_MEMCG.
-	 */
-	memcg = folio_memcg_rcu(folio);
-	if (!mem_cgroup_disabled() && !memcg)
-		goto out;
-	workingset_age_nonresident(folio_lruvec(folio), folio_nr_pages(folio));
-out:
-	rcu_read_unlock();
-}
-
 /*
  * Shadow entries reflect the share of the working set that does not
  * fit into memory, so their number depends on the access pattern of
@@ -777,7 +778,6 @@ static struct lock_class_key shadow_nodes_key;
 
 static int __init workingset_init(void)
 {
-	unsigned int timestamp_bits;
 	unsigned int max_order;
 	int ret;
@@ -789,12 +789,11 @@ static int __init workingset_init(void)
 	 * some more pages at runtime, so keep working with up to
 	 * double the initial memory by using totalram_pages as-is.
 	 */
-	timestamp_bits = BITS_PER_LONG - EVICTION_SHIFT;
 	max_order = fls_long(totalram_pages() - 1);
-	if (max_order > timestamp_bits)
-		bucket_order = max_order - timestamp_bits;
+	if (max_order > EVICTION_BITS)
+		bucket_order = max_order - EVICTION_BITS;
 	pr_info("workingset: timestamp_bits=%d max_order=%d bucket_order=%u\n",
-		timestamp_bits, max_order, bucket_order);
+		EVICTION_BITS, max_order, bucket_order);
 
 	ret = prealloc_shrinker(&workingset_shadow_shrinker, "mm-shadow");
 	if (ret)

From patchwork Tue Jul 25 18:57:30 2023
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 13326949
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Yu Zhao, Roman Gushchin, Johannes Weiner, Michal Hocko,
 Hugh Dickins, Nhat Pham, Yuanchu Xie, Suren Baghdasaryan, T. J. Mercier,
 Kairui Song
Subject: [RFC PATCH 2/4] workingset: simplify lru_gen_test_recent
Date: Wed, 26 Jul 2023 02:57:30 +0800
Message-ID: <20230725185733.43929-3-ryncsn@gmail.com>
In-Reply-To: <20230725185733.43929-1-ryncsn@gmail.com>
References: <20230725185733.43929-1-ryncsn@gmail.com>

From: Kairui Song

Simplify the code and move some common paths into the caller, preparing for
the following commits.
Signed-off-by: Kairui Song
---
 mm/workingset.c | 30 +++++++++++++-----------------
 1 file changed, 13 insertions(+), 17 deletions(-)

diff --git a/mm/workingset.c b/mm/workingset.c
index c0dea2c05f55..126f1fec41ed 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -357,42 +357,38 @@ static void *lru_gen_eviction(struct folio *folio)
  * Tests if the shadow entry is for a folio that was recently evicted.
  * Fills in @lruvec, @token, @workingset with the values unpacked from shadow.
  */
-static bool lru_gen_test_recent(void *shadow, bool file, struct lruvec **lruvec,
-				unsigned long *token, bool *workingset)
+static bool lru_gen_test_recent(struct lruvec *lruvec, bool file,
+				unsigned long token)
 {
-	int memcg_id;
 	unsigned long min_seq;
-	struct mem_cgroup *memcg;
-	struct pglist_data *pgdat;
 
-	unpack_shadow(shadow, &memcg_id, &pgdat, token, workingset);
-
-	memcg = mem_cgroup_from_id(memcg_id);
-	*lruvec = mem_cgroup_lruvec(memcg, pgdat);
-
-	min_seq = READ_ONCE((*lruvec)->lrugen.min_seq[file]);
-	return (*token >> LRU_REFS_WIDTH) == (min_seq & (EVICTION_MASK >> LRU_REFS_WIDTH));
+	min_seq = READ_ONCE(lruvec->lrugen.min_seq[file]);
+	return (token >> LRU_REFS_WIDTH) == (min_seq & (EVICTION_MASK >> LRU_REFS_WIDTH));
 }

 static void lru_gen_refault(struct folio *folio, void *shadow)
 {
+	int memcgid;
 	bool recent;
-	int hist, tier, refs;
 	bool workingset;
 	unsigned long token;
+	int hist, tier, refs;
 	struct lruvec *lruvec;
+	struct pglist_data *pgdat;
 	struct lru_gen_folio *lrugen;
 	int type = folio_is_file_lru(folio);
 	int delta = folio_nr_pages(folio);

 	rcu_read_lock();

-	recent = lru_gen_test_recent(shadow, type, &lruvec, &token, &workingset);
+	unpack_shadow(shadow, &memcgid, &pgdat, &token, &workingset);
+	lruvec = mem_cgroup_lruvec(mem_cgroup_from_id(memcgid), pgdat);
 	if (lruvec != folio_lruvec(folio))
 		goto unlock;

 	mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + type, delta);

+	recent = lru_gen_test_recent(lruvec, type, token);
 	if (!recent)
 		goto unlock;

@@ -492,9 +488,6 @@ bool workingset_test_recent(void *shadow, bool file, bool *workingset)
 	struct pglist_data *pgdat;
 	unsigned long eviction;

-	if (lru_gen_enabled())
-		return lru_gen_test_recent(shadow, file, &eviction_lruvec, &eviction, workingset);
-
 	unpack_shadow(shadow, &memcgid, &pgdat, &eviction, workingset);

 	/*
@@ -518,6 +511,9 @@ bool workingset_test_recent(void *shadow, bool file, bool *workingset)
 		return false;

 	eviction_lruvec = mem_cgroup_lruvec(eviction_memcg, pgdat);

+	if (lru_gen_enabled())
+		return lru_gen_test_recent(eviction_lruvec, file, eviction);
+
 	return lru_refault(eviction_memcg, eviction_lruvec, eviction,
			   EVICTION_BITS, bucket_order);
 }
From patchwork Tue Jul 25 18:57:31 2023
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Yu Zhao, Roman Gushchin, Johannes Weiner, Michal Hocko,
    Hugh Dickins, Nhat Pham, Yuanchu Xie, Suren Baghdasaryan, T.J. Mercier,
    Kairui Song
Subject: [RFC PATCH 3/4] lru_gen: convert avg_total and avg_refaulted to atomic
Date: Wed, 26 Jul 2023 02:57:31 +0800
Message-ID: <20230725185733.43929-4-ryncsn@gmail.com>
In-Reply-To: <20230725185733.43929-1-ryncsn@gmail.com>
References: <20230725185733.43929-1-ryncsn@gmail.com>
From: Kairui Song

No functional change; this prepares for a later patch.

Signed-off-by: Kairui Song
---
 include/linux/mmzone.h |  4 ++--
 mm/vmscan.c            | 16 ++++++++--------
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 5e50b78d58ea..4ab6bedd3c5b 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -425,9 +425,9 @@ struct lru_gen_folio {
 	/* the multi-gen LRU sizes, eventually consistent */
 	long nr_pages[MAX_NR_GENS][ANON_AND_FILE][MAX_NR_ZONES];
 	/* the exponential moving average of refaulted */
-	unsigned long avg_refaulted[ANON_AND_FILE][MAX_NR_TIERS];
+	atomic_long_t avg_refaulted[ANON_AND_FILE][MAX_NR_TIERS];
 	/* the exponential moving average of evicted+protected */
-	unsigned long avg_total[ANON_AND_FILE][MAX_NR_TIERS];
+	atomic_long_t avg_total[ANON_AND_FILE][MAX_NR_TIERS];
 	/* the first tier doesn't need protection, hence the minus one */
 	unsigned long protected[NR_HIST_GENS][ANON_AND_FILE][MAX_NR_TIERS - 1];
 	/* can be modified without holding the LRU lock */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index e7906f7fdc77..d34817795c70 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3705,9 +3705,9 @@ static void read_ctrl_pos(struct lruvec *lruvec, int type, int tier, int gain,
 	struct lru_gen_folio *lrugen = &lruvec->lrugen;
 	int hist = lru_hist_from_seq(lrugen->min_seq[type]);

-	pos->refaulted = lrugen->avg_refaulted[type][tier] +
+	pos->refaulted = atomic_long_read(&lrugen->avg_refaulted[type][tier]) +
			 atomic_long_read(&lrugen->refaulted[hist][type][tier]);
-	pos->total = lrugen->avg_total[type][tier] +
+	pos->total = atomic_long_read(&lrugen->avg_total[type][tier]) +
		     atomic_long_read(&lrugen->evicted[hist][type][tier]);
 	if (tier)
 		pos->total += lrugen->protected[hist][type][tier - 1];
@@ -3732,15 +3732,15 @@ static void reset_ctrl_pos(struct lruvec *lruvec, int type, bool carryover)
 	if (carryover) {
 		unsigned long sum;

-		sum = lrugen->avg_refaulted[type][tier] +
+		sum = atomic_long_read(&lrugen->avg_refaulted[type][tier]) +
		      atomic_long_read(&lrugen->refaulted[hist][type][tier]);
-		WRITE_ONCE(lrugen->avg_refaulted[type][tier], sum / 2);
+		atomic_long_set(&lrugen->avg_refaulted[type][tier], sum / 2);

-		sum = lrugen->avg_total[type][tier] +
+		sum = atomic_long_read(&lrugen->avg_total[type][tier]) +
		      atomic_long_read(&lrugen->evicted[hist][type][tier]);
 		if (tier)
 			sum += lrugen->protected[hist][type][tier - 1];
-		WRITE_ONCE(lrugen->avg_total[type][tier], sum / 2);
+		atomic_long_set(&lrugen->avg_total[type][tier], sum / 2);
 	}

 	if (clear) {
@@ -5869,8 +5869,8 @@ static void lru_gen_seq_show_full(struct seq_file *m, struct lruvec *lruvec,
 		if (seq == max_seq) {
 			s = "RT ";
-			n[0] = READ_ONCE(lrugen->avg_refaulted[type][tier]);
-			n[1] = READ_ONCE(lrugen->avg_total[type][tier]);
+			n[0] = atomic_long_read(&lrugen->avg_refaulted[type][tier]);
+			n[1] = atomic_long_read(&lrugen->avg_total[type][tier]);
 		} else if (seq == min_seq[type] || NR_HIST_GENS > 1) {
 			s = "rep";
 			n[0] = atomic_long_read(&lrugen->refaulted[hist][type][tier]);
From patchwork Tue Jul 25 18:57:32 2023
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Yu Zhao, Roman Gushchin, Johannes Weiner, Michal Hocko,
    Hugh Dickins, Nhat Pham, Yuanchu Xie, Suren Baghdasaryan, T.J. Mercier,
    Kairui Song
Subject: [RFC PATCH 4/4] workingset, lru_gen: apply refault-distance based re-activation
Date: Wed, 26 Jul 2023 02:57:32 +0800
Message-ID: <20230725185733.43929-5-ryncsn@gmail.com>
In-Reply-To: <20230725185733.43929-1-ryncsn@gmail.com>
References: <20230725185733.43929-1-ryncsn@gmail.com>
From: Kairui Song

I noticed MGLRU not working very well on certain workloads, observed on
some heavily stressed databases. This happens when the file page
workingset size exceeds total memory, and the access distance (the number
of left shifts a page undergoes before it gets activated, considering the
LRU starts from the right) of file pages is also larger than total
memory. All file pages are then stuck in the oldest generation and are
read in and evicted repeatedly, while the idle anon pages never get aged.
The PID controller doesn't kick in until there are some minor access
pattern changes, and file pages are never promoted or reused.

Even though memory can't cover the whole workingset, refault-distance
based re-activation can help hold part of the workingset in memory, which
reduces the IO workload significantly. So apply it to MGLRU as well.

The updated refault-distance model fits MGLRU well in most cases if we
simply treat the last two generations as the inactive LRU and the first
two generations as the active LRU.
Some minor tinkering is done to fit the logic better, and the refault
distance now also contributes to page tiering and MGLRU's PID-based
refault detection:

- If a tier-0 page has a qualified refault distance, promote it to a
  higher tier and send it to the second oldest generation.
- If a tier >= 1 page has a qualified refault distance, mark it as
  active and send it to the youngest generation.
- Increase the reference count of every page that has a qualified
  refault distance, and increase the PID-controlled refault rate of the
  updated tier.

The following benchmark shows a major improvement. To simulate the
workload, I set up a 3-replica MongoDB cluster using Docker, each replica
in a standalone cgroup, set to use 5 GB of cache and 10 GB of oplog, on a
32 GB VM. The benchmark is done using
https://github.com/apavlo/py-tpcc.git, modified to run the STOCK_LEVEL
query only, to simulate a slow query and get a stable result.

Before the patch (with 10G swap; the result doesn't change whether swap
is on or not):

$ tpcc.py --config=mongodb.config mongodb --duration=900 --warehouses=500 --clients=30
==================================================================
Execution Results after 904 seconds
------------------------------------------------------------------
                 Executed      Time (µs)        Rate
  STOCK_LEVEL    503           27150226136.4    0.02 txn/s
------------------------------------------------------------------
  TOTAL          503           27150226136.4    0.02 txn/s

$ cat /proc/vmstat | grep working
workingset_nodes 53391
workingset_refault_anon 0
workingset_refault_file 23856735
workingset_activate_anon 0
workingset_activate_file 23845737
workingset_restore_anon 0
workingset_restore_file 18280692
workingset_nodereclaim 1024

$ free -m
               total        used        free      shared  buff/cache   available
Mem:           31837        6752         379          23       24706       24607
Swap:          10239           0       10239

After the patch (with 10G swap on the same disk; similar result using ZRAM):

$ tpcc.py --config=mongodb.config mongodb --duration=900 --warehouses=500 --clients=30
==================================================================
Execution Results after 903 seconds
------------------------------------------------------------------
                 Executed      Time (µs)        Rate
  STOCK_LEVEL    2575          27094953498.8    0.10 txn/s
------------------------------------------------------------------
  TOTAL          2575          27094953498.8    0.10 txn/s

$ cat /proc/vmstat | grep working
workingset_nodes 78249
workingset_refault_anon 10139
workingset_refault_file 23001863
workingset_activate_anon 7238
workingset_activate_file 6718032
workingset_restore_anon 7432
workingset_restore_file 6719406
workingset_nodereclaim 9747

$ free -m
               total        used        free      shared  buff/cache   available
Mem:           31837        7376         320           3       24140       24014
Swap:          10239        1662        8577

The performance is 5x better than before, and the idle anon pages now get
swapped out as expected. The result also improves under lower test
stress.

I also checked the benchmark with memtier/memcached and fio, using a
setup similar to the one in commit ac35a4902374 but scaled down to fit my
test environment:

memcached test (with 16G ramdisk as swap and 2G cgroup limit):

  memcached -u nobody -m 16384 -s /tmp/memcached.socket -a 0766 \
    -t 12 -B binary &

  memtier_benchmark -S /tmp/memcached.socket -P memcache_binary -n allkeys \
    --key-minimum=1 --key-maximum=24000000 --key-pattern=P:P -c 1 \
    -t 12 --ratio 1:0 --pipeline 8 -d 2000 -x 6

fio test (with 16G ramdisk on /mnt and 4G cgroup limit):

  fio -name=refault --numjobs=12 --directory=/mnt --size=1024m \
    --buffered=1 --ioengine=io_uring --iodepth=128 \
    --iodepth_batch_submit=32 --iodepth_batch_complete=32 \
    --rw=randread --random_distribution=random --norandommap \
    --time_based --ramp_time=5m --runtime=5m --group_reporting

Before this patch:

memcached read:
            Ops/sec   Hits/sec  Misses/sec  Avg. Latency  p50 Latency  p99 Latency  p99.9 Latency  KB/sec
  Best      52832.79  0.00      0.00        1.82042       1.70300      4.54300      6.27100        105641.69
  Worst     46613.56  0.00      0.00        2.05686       1.77500      7.80700      11.83900       93206.05
  Avg (6x)  51024.85  0.00      0.00        1.88506       1.73500      5.43900      9.47100        102026.64

fio:
  read: IOPS=2211k, BW=8637MiB/s (9056MB/s)(2530GiB/300001msec)

After this patch:

memcached read:
            Ops/sec   Avg. Latency  p50 Latency  p99 Latency  p99.9 Latency  KB/sec
  Best      54218.92  1.76930       1.65500      4.41500      6.27100        108413.34
  Worst     47640.13  2.01495       1.74300      7.64700      11.64700       95258.72
  Avg (6x)  51408.33  1.86988       1.71900      5.43900      9.34300        102793.42

fio:
  read: IOPS=2166k, BW=8462MiB/s (8873MB/s)(2479GiB/300001msec)

memcached looks OK, but there is a ~2% performance drop in the fio test.
After some profiling, this is mainly caused by the extra atomic
operations and new functions; there seems to be no LRU accuracy drop.

Signed-off-by: Kairui Song
---
 mm/workingset.c | 74 ++++++++++++++++++++++++++++++++++---------------
 1 file changed, 51 insertions(+), 23 deletions(-)

diff --git a/mm/workingset.c b/mm/workingset.c
index 126f1fec41ed..40cb0df980f7 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -185,6 +185,7 @@
 					 MEM_CGROUP_ID_SHIFT)
 #define EVICTION_BITS		(BITS_PER_LONG - (EVICTION_SHIFT))
 #define EVICTION_MASK		(~0UL >> EVICTION_SHIFT)
+#define LRU_GEN_EVICTION_BITS	(EVICTION_BITS - LRU_REFS_WIDTH - LRU_GEN_WIDTH)

 /*
  * Eviction timestamps need to be able to cover the full range of
@@ -195,6 +196,7 @@
  * evictions into coarser buckets by shaving off lower timestamp bits.
  */
 static unsigned int bucket_order __read_mostly;
+static unsigned int lru_gen_bucket_order __read_mostly;

 static void *pack_shadow(int memcgid, pg_data_t *pgdat, unsigned long eviction,
			 bool workingset)
@@ -345,10 +347,14 @@ static void *lru_gen_eviction(struct folio *folio)
 	lruvec = mem_cgroup_lruvec(memcg, pgdat);
 	lrugen = &lruvec->lrugen;
 	min_seq = READ_ONCE(lrugen->min_seq[type]);
+
 	token = (min_seq << LRU_REFS_WIDTH) | max(refs - 1, 0);
+	token <<= LRU_GEN_EVICTION_BITS;
+	token |= lru_eviction(lruvec, LRU_GEN_EVICTION_BITS, lru_gen_bucket_order);

 	hist = lru_hist_from_seq(min_seq);
 	atomic_long_add(delta, &lrugen->evicted[hist][type][tier]);
+	workingset_age_nonresident(lruvec, folio_nr_pages(folio));

 	return pack_shadow(mem_cgroup_id(memcg), pgdat, token, refs);
 }
@@ -363,44 +369,55 @@ static bool lru_gen_test_recent(struct lruvec *lruvec, bool file,
 	unsigned long min_seq;

 	min_seq = READ_ONCE(lruvec->lrugen.min_seq[file]);
+	token >>= LRU_GEN_EVICTION_BITS;
 	return (token >> LRU_REFS_WIDTH) == (min_seq & (EVICTION_MASK >> LRU_REFS_WIDTH));
 }

 static void lru_gen_refault(struct folio *folio, void *shadow)
 {
 	int memcgid;
-	bool recent;
+	bool refault;
 	bool workingset;
 	unsigned long token;
+	bool recent = false;
+	int refault_tier = 0;
 	int hist, tier, refs;
 	struct lruvec *lruvec;
+	struct mem_cgroup *memcg;
 	struct pglist_data *pgdat;
 	struct lru_gen_folio *lrugen;
 	int type = folio_is_file_lru(folio);
 	int delta = folio_nr_pages(folio);

-	rcu_read_lock();
-
 	unpack_shadow(shadow, &memcgid, &pgdat, &token, &workingset);
-	lruvec = mem_cgroup_lruvec(mem_cgroup_from_id(memcgid), pgdat);
-	if (lruvec != folio_lruvec(folio))
-		goto unlock;
+	memcg = mem_cgroup_from_id(memcgid);
+	lruvec = mem_cgroup_lruvec(memcg, pgdat);
+	/* memcg can be NULL, go through lruvec */
+	memcg = lruvec_memcg(lruvec);

 	mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + type, delta);
-
-	recent = lru_gen_test_recent(lruvec, type, token);
-	if (!recent)
-		goto unlock;
+	refault = lru_refault(memcg, lruvec, token, LRU_GEN_EVICTION_BITS,
+			      lru_gen_bucket_order);
+	if (lruvec == folio_lruvec(folio))
+		recent = lru_gen_test_recent(lruvec, type, token);
+	if (!recent && !refault)
+		return;

 	lrugen = &lruvec->lrugen;
-
 	hist = lru_hist_from_seq(READ_ONCE(lrugen->min_seq[type]));
 	/* see the comment in folio_lru_refs() */
+	token >>= LRU_GEN_EVICTION_BITS;
 	refs = (token & (BIT(LRU_REFS_WIDTH) - 1)) + workingset;
 	tier = lru_tier_from_refs(refs);
-
-	atomic_long_add(delta, &lrugen->refaulted[hist][type][tier]);
-	mod_lruvec_state(lruvec, WORKINGSET_ACTIVATE_BASE + type, delta);
+	refault_tier = tier;
+
+	if (refault) {
+		if (refs)
+			folio_set_active(folio);
+		if (refs != BIT(LRU_REFS_WIDTH))
+			refault_tier = lru_tier_from_refs(refs + 1);
+		mod_lruvec_state(lruvec, WORKINGSET_ACTIVATE_BASE + type, delta);
+	}

 	/*
 	 * Count the following two cases as stalls:
@@ -409,12 +426,17 @@ static void lru_gen_refault(struct folio *folio, void *shadow)
 	 * 2. For pages accessed multiple times through file descriptors,
 	 *    numbers of accesses might have been out of the range.
 	 */
-	if (lru_gen_in_fault() || refs == BIT(LRU_REFS_WIDTH)) {
+	if (refault || lru_gen_in_fault() || refs == BIT(LRU_REFS_WIDTH)) {
 		folio_set_workingset(folio);
 		mod_lruvec_state(lruvec, WORKINGSET_RESTORE_BASE + type, delta);
 	}
-unlock:
-	rcu_read_unlock();
+
+	if (recent && refault_tier == tier) {
+		atomic_long_add(delta, &lrugen->refaulted[hist][type][tier]);
+	} else {
+		atomic_long_add(delta, &lrugen->avg_total[type][refault_tier]);
+		atomic_long_add(delta, &lrugen->avg_refaulted[type][refault_tier]);
+	}
 }

 #else /* !CONFIG_LRU_GEN */
@@ -536,16 +558,15 @@ void workingset_refault(struct folio *folio, void *shadow)
 	bool workingset;
 	long nr;

-	if (lru_gen_enabled()) {
-		lru_gen_refault(folio, shadow);
-		return;
-	}
-
 	/* Flush stats (and potentially sleep) before holding RCU read lock */
 	mem_cgroup_flush_stats_ratelimited();
-
 	rcu_read_lock();

+	if (lru_gen_enabled()) {
+		lru_gen_refault(folio, shadow);
+		goto out;
+	}
+
 	/*
 	 * The activation decision for this folio is made at the level
 	 * where the eviction occurred, as that is where the LRU order
@@ -791,6 +812,13 @@ static int __init workingset_init(void)
 	pr_info("workingset: timestamp_bits=%d max_order=%d bucket_order=%u\n",
 		EVICTION_BITS, max_order, bucket_order);

+#ifdef CONFIG_LRU_GEN
+	if (max_order > LRU_GEN_EVICTION_BITS)
+		lru_gen_bucket_order = max_order - LRU_GEN_EVICTION_BITS;
+	pr_info("workingset: lru_gen_timestamp_bits=%d lru_gen_bucket_order=%u\n",
+		LRU_GEN_EVICTION_BITS, lru_gen_bucket_order);
+#endif
+
 	ret = prealloc_shrinker(&workingset_shadow_shrinker, "mm-shadow");
 	if (ret)
 		goto err;