From patchwork Wed Dec 14 22:51:22 2022
From: Yuanchu Xie
Date: Wed, 14 Dec 2022 14:51:22 -0800
Subject: [RFC PATCH 1/2] mm: multi-gen LRU: periodic aging
To: Johannes Weiner, Michal Hocko, Roman Gushchin, Yu Zhao
Cc: Andrew Morton, Shakeel Butt, Muchun Song, linux-kernel@vger.kernel.org, linux-mm@kvack.org, cgroups@vger.kernel.org, Yuanchu Xie
Message-ID: <20221214225123.2770216-2-yuanchu@google.com>
In-Reply-To: <20221214225123.2770216-1-yuanchu@google.com>
References: <20221214225123.2770216-1-yuanchu@google.com>
X-Patchwork-Id: 13073644
X-Mailer: git-send-email 2.39.0.314.g84b9a713c41-goog

Periodically age MGLRU-enabled lruvecs to turn MGLRU generations into
time-based working set information. This includes an interface to set
the periodic aging interval and a new kthread to perform aging.

memory.periodic_aging: a new root-level-only file in cgroupfs.
Writing an interval (in seconds) to memory.periodic_aging sets the
aging interval and opts into periodic aging; writing 0 stops it.
Reading the file reports aging_interval and late_count, the number of
aging intervals that kold spent entirely aging without sleeping.

kold: a new kthread that ages memcgs based on the set aging interval.
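For illustration, a minimal userspace sketch of driving this interface from C. It assumes the root-level file is reachable as memory.periodic_aging under a cgroup v2 mount (commonly /sys/fs/cgroup, an assumption), that the interval is in seconds (kold_run() multiplies it by HZ), and that reads return the two lines printed by memory_periodic_aging_show(). The struct and both helpers are hypothetical, not part of this patch.

```c
#include <stdio.h>

/* Hypothetical holder for the two fields printed by memory.periodic_aging. */
struct periodic_aging_info {
	unsigned int aging_interval; /* seconds; 0 means periodic aging is off */
	unsigned int late_count;     /* intervals kold spent aging without sleeping */
};

/*
 * Hypothetical parser for "aging_interval %u\nlate_count %u\n".
 * Returns 0 on success, -1 if either field is missing.
 */
int parse_periodic_aging(const char *text, struct periodic_aging_info *info)
{
	/* A space in a scanf format matches any whitespace, including '\n'. */
	if (sscanf(text, "aging_interval %u late_count %u",
		   &info->aging_interval, &info->late_count) != 2)
		return -1;
	return 0;
}

/*
 * Hypothetical helper: write an interval in seconds (0 stops kold).
 * Returns 0 on success, -1 on failure; the kernel rejects the write with
 * EOPNOTSUPP when MGLRU is disabled.
 */
int set_periodic_aging(const char *path, unsigned int seconds)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%u\n", seconds) > 0;
	/* stdio buffers the write, so a kernel error surfaces at fclose(). */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Checking the result of fclose() matters here because the write(2) that reaches memory_periodic_aging_write() typically happens only when the stream is flushed.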
Signed-off-by: Yuanchu Xie
---
 include/linux/kold.h   |  44 ++++++++++++
 include/linux/mmzone.h |   4 +-
 mm/Makefile            |   3 +
 mm/kold.c              | 150 +++++++++++++++++++++++++++++++++++++++++
 mm/memcontrol.c        |  52 ++++++++++++++
 mm/vmscan.c            |  35 +++++++++-
 6 files changed, 286 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/kold.h
 create mode 100644 mm/kold.c

diff --git a/include/linux/kold.h b/include/linux/kold.h
new file mode 100644
index 000000000000..10b0dbe09a5c
--- /dev/null
+++ b/include/linux/kold.h
@@ -0,0 +1,44 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * Periodic aging for multi-gen LRU
+ *
+ * Copyright (C) 2022 Yuanchu Xie
+ */
+#ifndef KOLD_H_
+#define KOLD_H_
+
+#include
+
+struct kold_stats {
+	/* late: kold spent an entire aging interval aging without sleeping;
+	 * the stat is aggregated once per aging interval
+	 */
+	unsigned int late_count;
+};
+
+int kold_set_interval(unsigned int interval);
+unsigned int kold_get_interval(void);
+int kold_get_stats(struct kold_stats *stats);
+
+/* returns the creation timestamp of the youngest generation */
+unsigned long lru_gen_force_age_lruvec(struct mem_cgroup *memcg, int nid,
+				       unsigned long min_ttl);
+
+#ifndef CONFIG_MEMCG
+int kold_set_interval(unsigned int interval)
+{
+	return 0;
+}
+
+unsigned int kold_get_interval(void)
+{
+	return 0;
+}
+
+int kold_get_stats(struct kold_stats *stats)
+{
+	return -1;
+}
+#endif /* CONFIG_MEMCG */
+
+#endif /* KOLD_H_ */
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 5f74891556f3..929c777b826a 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1218,7 +1218,9 @@ typedef struct pglist_data {
 
 #ifdef CONFIG_LRU_GEN
 	/* kswap mm walk data */
-	struct lru_gen_mm_walk mm_walk;
+	struct lru_gen_mm_walk mm_walk;
+	/* kold periodic aging walk data */
+	struct lru_gen_mm_walk kold_mm_walk;
 #endif
 
 	CACHELINE_PADDING(_pad2_);
diff --git a/mm/Makefile b/mm/Makefile
index 8e105e5b3e29..8bd554a6eb7d 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -98,6 +98,9 @@ obj-$(CONFIG_DEVICE_MIGRATION) += migrate_device.o
 obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += huge_memory.o khugepaged.o
 obj-$(CONFIG_PAGE_COUNTER) += page_counter.o
 obj-$(CONFIG_MEMCG) += memcontrol.o vmpressure.o
+ifdef CONFIG_LRU_GEN
+obj-$(CONFIG_MEMCG) += kold.o
+endif
 ifdef CONFIG_SWAP
 obj-$(CONFIG_MEMCG) += swap_cgroup.o
 endif
diff --git a/mm/kold.c b/mm/kold.c
new file mode 100644
index 000000000000..094574177968
--- /dev/null
+++ b/mm/kold.c
@@ -0,0 +1,150 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2022 Yuanchu Xie
+ */
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+static struct task_struct *kold_thread __read_mostly;
+/* protects kold_thread */
+static DEFINE_MUTEX(kold_mutex);
+
+static unsigned int aging_interval __read_mostly;
+static unsigned int late_count;
+
+/* try to move to a cpu on the target node */
+static void try_move_current_to_node(int nid)
+{
+	struct cpumask node_cpus;
+
+	cpumask_and(&node_cpus, cpumask_of_node(nid), cpu_online_mask);
+	if (!cpumask_empty(&node_cpus))
+		set_cpus_allowed_ptr(current, &node_cpus);
+}
+
+static int kold_run(void *none)
+{
+	int nid;
+	unsigned int flags;
+	unsigned long last_interval_start_time = jiffies;
+	bool sleep_since_last_full_scan = false;
+	struct mem_cgroup *memcg;
+	struct reclaim_state reclaim_state = {};
+
+	while (!kthread_should_stop()) {
+		unsigned long interval =
+			(unsigned long)(READ_ONCE(aging_interval)) * HZ;
+		unsigned long next_wakeup_tick = jiffies + interval;
+		long timeout_ticks;
+
+		current->reclaim_state = &reclaim_state;
+		flags = memalloc_noreclaim_save();
+
+		for_each_node_state(nid, N_MEMORY) {
+			pg_data_t *pgdat = NODE_DATA(nid);
+
+			try_move_current_to_node(nid);
+			reclaim_state.mm_walk = &pgdat->kold_mm_walk;
+
+			memcg = mem_cgroup_iter(NULL, NULL, NULL);
+			do {
+				unsigned long young_timestamp =
+					lru_gen_force_age_lruvec(memcg, nid,
+								 interval);
+
+				if (time_before(young_timestamp + interval,
+						next_wakeup_tick)) {
+					next_wakeup_tick = young_timestamp + interval;
+				}
+			} while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)));
+		}
+
+		memalloc_noreclaim_restore(flags);
+		current->reclaim_state = NULL;
+
+		/* late_count stats update */
+		if (time_is_before_jiffies(last_interval_start_time + interval)) {
+			last_interval_start_time += interval;
+			if (!sleep_since_last_full_scan) {
+				WRITE_ONCE(late_count,
+					   READ_ONCE(late_count) + 1);
+			}
+			sleep_since_last_full_scan = false;
+		}
+
+		/* sleep until next aging */
+		timeout_ticks = -(long)(jiffies - next_wakeup_tick);
+		if (timeout_ticks > 0 && timeout_ticks != MAX_SCHEDULE_TIMEOUT) {
+			sleep_since_last_full_scan = true;
+			schedule_timeout_idle(timeout_ticks);
+		}
+	}
+	return 0;
+}
+
+int kold_get_stats(struct kold_stats *stats)
+{
+	stats->late_count = READ_ONCE(late_count);
+	return 0;
+}
+
+unsigned int kold_get_interval(void)
+{
+	return READ_ONCE(aging_interval);
+}
+
+int kold_set_interval(unsigned int interval)
+{
+	int err = 0;
+
+	mutex_lock(&kold_mutex);
+	if (interval && !kold_thread) {
+		if (!lru_gen_enabled()) {
+			err = -EOPNOTSUPP;
+			goto cleanup;
+		}
+		kold_thread = kthread_create(kold_run, NULL, "kold");
+
+		if (IS_ERR(kold_thread)) {
+			pr_err("kold: kthread_create(kold_run) failed\n");
+			err = PTR_ERR(kold_thread);
+			kold_thread = NULL;
+			goto cleanup;
+		}
+		WRITE_ONCE(aging_interval, interval);
+		wake_up_process(kold_thread);
+	} else {
+		if (!interval && kold_thread) {
+			kthread_stop(kold_thread);
+			kold_thread = NULL;
+		}
+		WRITE_ONCE(aging_interval, interval);
+	}
+
+cleanup:
+	mutex_unlock(&kold_mutex);
+	return err;
+}
+
+static int __init kold_init(void)
+{
+	return 0;
+}
+
+module_init(kold_init);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2d8549ae1b30..7d2fb3fc4580 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -63,6 +63,7 @@
 #include
 #include
 #include
+#include
 #include "internal.h"
 #include
 #include
@@ -6569,6 +6570,49 @@ static ssize_t memory_oom_group_write(struct kernfs_open_file *of,
 	return nbytes;
 }
 
+#ifdef CONFIG_LRU_GEN
+static int memory_periodic_aging_show(struct seq_file *m, void *v)
+{
+	unsigned int interval = kold_get_interval();
+	struct kold_stats stats;
+	int err;
+
+	err = kold_get_stats(&stats);
+
+	if (err)
+		return err;
+
+	seq_printf(m, "aging_interval %u\n", interval);
+	seq_printf(m, "late_count %u\n", stats.late_count);
+	return 0;
+}
+
+static ssize_t memory_periodic_aging_write(struct kernfs_open_file *of,
+					   char *buf, size_t nbytes,
+					   loff_t off)
+{
+	unsigned int new_interval;
+	int err;
+
+	if (!lru_gen_enabled())
+		return -EOPNOTSUPP;
+
+	buf = strstrip(buf);
+	if (!buf)
+		return -EINVAL;
+
+	err = kstrtouint(buf, 0, &new_interval);
+	if (err)
+		return err;
+
+	err = kold_set_interval(new_interval);
+	if (err)
+		return err;
+
+	return nbytes;
+}
+#endif /* CONFIG_LRU_GEN */
+
 static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf,
 			      size_t nbytes, loff_t off)
 {
@@ -6679,6 +6723,14 @@ static struct cftype memory_files[] = {
 		.flags = CFTYPE_NS_DELEGATABLE,
 		.write = memory_reclaim,
 	},
+#ifdef CONFIG_LRU_GEN
+	{
+		.name = "periodic_aging",
+		.flags = CFTYPE_ONLY_ON_ROOT,
+		.seq_show = memory_periodic_aging_show,
+		.write = memory_periodic_aging_write,
+	},
+#endif
 	{ }	/* terminate */
 };
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 04d8b88e5216..0fea21366fc8 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -54,6 +54,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -5279,8 +5280,10 @@ static void lru_gen_change_state(bool enabled)
 
 	if (enabled)
 		static_branch_enable_cpuslocked(&lru_gen_caps[LRU_GEN_CORE]);
-	else
+	else {
 		static_branch_disable_cpuslocked(&lru_gen_caps[LRU_GEN_CORE]);
+		kold_set_interval(0);
+	}
 
 	memcg = mem_cgroup_iter(NULL, NULL, NULL);
 	do {
@@ -5760,6 +5763,36 @@ static const struct file_operations lru_gen_ro_fops = {
 	.release = seq_release,
 };
 
+/******************************************************************************
+ * periodic aging (kold)
+ ******************************************************************************/
+
+/* age the lruvec if its youngest generation is older than min_ttl;
+ * return the timestamp of the youngest generation
+ */
+unsigned long lru_gen_force_age_lruvec(struct mem_cgroup *memcg, int nid,
+				       unsigned long min_ttl)
+{
+	struct scan_control sc = {
+		.may_writepage = true,
+		.may_unmap = true,
+		.may_swap = true,
+		.reclaim_idx = MAX_NR_ZONES - 1,
+		.gfp_mask = GFP_KERNEL,
+	};
+	struct lruvec *lruvec = get_lruvec(memcg, nid);
+	DEFINE_MAX_SEQ(lruvec);
+	int gen = lru_gen_from_seq(max_seq);
+	unsigned long birth_timestamp =
+		READ_ONCE(lruvec->lrugen.timestamps[gen]);
+
+	if (time_is_before_jiffies(birth_timestamp + min_ttl))
+		try_to_inc_max_seq(lruvec, max_seq, &sc, true, true);
+
+	return READ_ONCE(lruvec->lrugen.timestamps[lru_gen_from_seq(
+		READ_ONCE((lruvec)->lrugen.max_seq))]);
+}
+
 /******************************************************************************
  * initialization
  ******************************************************************************/