From patchwork Tue Aug 13 16:56:14 2024
X-Patchwork-Submitter: Yuanchu Xie
X-Patchwork-Id: 13762346
Date: Tue, 13 Aug 2024 09:56:14 -0700
In-Reply-To: <20240813165619.748102-1-yuanchu@google.com>
References: <20240813165619.748102-1-yuanchu@google.com>
Message-ID: <20240813165619.748102-4-yuanchu@google.com>
Subject: [PATCH v3 3/7] mm: report workingset during memory pressure driven scanning
From: Yuanchu Xie
To: David Hildenbrand, "Aneesh Kumar K.V", Khalid Aziz, Henry Huang, Yu Zhao, Dan Williams, Gregory Price, Huang Ying, Andrew Morton, Lance Yang, Randy Dunlap, Muhammad Usama Anjum
Cc: Kalesh Singh, Wei Xu, David Rientjes, Greg Kroah-Hartman, "Rafael J. Wysocki", Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song, Shuah Khan, Yosry Ahmed, Matthew Wilcox, Sudarshan Rajagopalan, Kairui Song, "Michael S. Tsirkin", Vasily Averin, Nhat Pham, Miaohe Lin, Qi Zheng, Abel Wu, "Vishal Moola (Oracle)", Kefeng Wang, Yuanchu Xie, linux-kernel@vger.kernel.org, linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kselftest@vger.kernel.org

When a node reaches its low watermark and wakes up kswapd, or when
non-proactive direct reclaim runs, notify all userspace programs polling
the workingset page age histogram (workingset_report/page_age) of the
memory pressure. A userspace agent can then read the workingset report
promptly and make policy decisions such as logging, OOM-killing, or
migration.

Sysfs interface:
/sys/devices/system/node/nodeX/workingset_report/report_threshold
	Time in milliseconds. A notification is sent only if the node's
	current page age report is at least this old, which limits how
	often the userspace agent is notified of node memory pressure.
	A value of 0 disables notification.
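The following is a minimal sketch of how a userspace agent might consume this interface. It assumes the sysfs layout described above and the standard sysfs poll convention: read the attribute once, poll() for POLLPRI | POLLERR, then seek to offset 0 and re-read. The helper names (wsr_path, write_threshold_ms, read_report, wait_for_report) are hypothetical and are not part of the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: build the path of a per-node workingset_report file. */
static int wsr_path(char *buf, size_t len, int nid, const char *file)
{
	int n = snprintf(buf, len,
			 "/sys/devices/system/node/node%d/workingset_report/%s",
			 nid, file);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: write report_threshold in milliseconds.
 * The kernel side parses the value with kstrtouint(). */
static int write_threshold_ms(const char *path, unsigned int ms)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%u\n", ms) > 0;
	if (fclose(f) != 0)	/* sysfs write errors surface at flush time */
		ok = 0;
	return ok ? 0 : -1;
}

/* Re-read the attribute from offset 0. sysfs poll() reports a change
 * relative to the last read, so read once before each wait. */
static ssize_t read_report(int fd, char *buf, size_t len)
{
	ssize_t n;

	if (len == 0 || lseek(fd, 0, SEEK_SET) < 0)
		return -1;
	n = read(fd, buf, len - 1);
	if (n >= 0)
		buf[n] = '\0';
	return n;
}

/* Wait for kernfs_notify() on page_age.
 * Returns 1 on notification, 0 on timeout, -1 on error. */
static int wait_for_report(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLPRI | POLLERR };
	int ret = poll(&pfd, 1, timeout_ms);

	if (ret < 0)
		return -1;
	if (ret == 0)
		return 0;
	return (pfd.revents & (POLLPRI | POLLERR)) ? 1 : 0;
}
```

A monitoring loop would follow these steps:

1. Open page_age with O_RDONLY.
2. Call read_report() once.
3. Alternate between wait_for_report() and read_report(), acting on each fresh histogram.

report_threshold is created with __ATTR_RW (mode 0644), so writing it normally requires root.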

Signed-off-by: Yuanchu Xie
Change-Id: Ib1fdb726da921e8d467822a99da8f384c6181951
---
 include/linux/workingset_report.h |  4 +++
 mm/internal.h                     | 12 ++++++++
 mm/vmscan.c                       | 46 +++++++++++++++++++++++++++++++
 mm/workingset_report.c            | 43 ++++++++++++++++++++++++++++-
 4 files changed, 104 insertions(+), 1 deletion(-)

diff --git a/include/linux/workingset_report.h b/include/linux/workingset_report.h
index 8bae6a600410..2ec8b927b200 100644
--- a/include/linux/workingset_report.h
+++ b/include/linux/workingset_report.h
@@ -37,7 +37,11 @@ struct wsr_page_age_histo {
 };
 
 struct wsr_state {
+	unsigned long report_threshold;
 	unsigned long refresh_interval;
+
+	struct kernfs_node *page_age_sys_file;
+
 	/* breakdown of workingset by page age */
 	struct mutex page_age_lock;
 	struct wsr_page_age_histo *page_age;
diff --git a/mm/internal.h b/mm/internal.h
index f7d790f5d41c..19af882c506e 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -416,6 +416,18 @@ bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long seq,
 		bool can_swap, bool force_scan);
 void set_task_reclaim_state(struct task_struct *task, struct reclaim_state *rs);
 
+#ifdef CONFIG_WORKINGSET_REPORT
+/*
+ * in mm/workingset_report.c:
+ */
+void notify_workingset(struct mem_cgroup *memcg, struct pglist_data *pgdat);
+#else
+static inline void notify_workingset(struct mem_cgroup *memcg,
+				     struct pglist_data *pgdat)
+{
+}
+#endif
+
 /*
  * in mm/rmap.c:
  */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 8ab1b456d2cf..b07fd2016c75 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2574,6 +2574,15 @@ static bool can_age_anon_pages(struct pglist_data *pgdat,
 	return can_demote(pgdat->node_id, sc);
 }
 
+#ifdef CONFIG_WORKINGSET_REPORT
+static void try_to_report_workingset(struct pglist_data *pgdat, struct scan_control *sc);
+#else
+static inline void try_to_report_workingset(struct pglist_data *pgdat,
+					    struct scan_control *sc)
+{
+}
+#endif
+
 #ifdef CONFIG_LRU_GEN
 
 #ifdef CONFIG_LRU_GEN_ENABLED
@@ -4000,6 +4009,8 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc)
 
 	set_initial_priority(pgdat, sc);
 
+	try_to_report_workingset(pgdat, sc);
+
 	memcg = mem_cgroup_iter(NULL, NULL, NULL);
 	do {
 		struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);
@@ -5640,6 +5651,38 @@ static int __init init_lru_gen(void)
 };
 late_initcall(init_lru_gen);
 
+#ifdef CONFIG_WORKINGSET_REPORT
+static void try_to_report_workingset(struct pglist_data *pgdat,
+				     struct scan_control *sc)
+{
+	struct mem_cgroup *memcg = sc->target_mem_cgroup;
+	struct wsr_state *wsr = &mem_cgroup_lruvec(memcg, pgdat)->wsr;
+	unsigned long threshold = READ_ONCE(wsr->report_threshold);
+
+	if (sc->priority == DEF_PRIORITY)
+		return;
+
+	if (!threshold)
+		return;
+
+	if (!mutex_trylock(&wsr->page_age_lock))
+		return;
+
+	if (!wsr->page_age) {
+		mutex_unlock(&wsr->page_age_lock);
+		return;
+	}
+
+	if (time_is_after_jiffies(wsr->page_age->timestamp + threshold)) {
+		mutex_unlock(&wsr->page_age_lock);
+		return;
+	}
+
+	mutex_unlock(&wsr->page_age_lock);
+	notify_workingset(memcg, pgdat);
+}
+#endif /* CONFIG_WORKINGSET_REPORT */
+
 #else /* !CONFIG_LRU_GEN */
 
 static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc)
@@ -6191,6 +6234,9 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
 		if (zone->zone_pgdat == last_pgdat)
 			continue;
 		last_pgdat = zone->zone_pgdat;
+
+		if (!sc->proactive)
+			try_to_report_workingset(zone->zone_pgdat, sc);
 		shrink_node(zone->zone_pgdat, sc);
 	}
 
diff --git a/mm/workingset_report.c b/mm/workingset_report.c
index fe553c0a653e..801ac8e5c1da 100644
--- a/mm/workingset_report.c
+++ b/mm/workingset_report.c
@@ -311,6 +311,33 @@ static struct wsr_state *kobj_to_wsr(struct kobject *kobj)
 	return &mem_cgroup_lruvec(NULL, kobj_to_pgdat(kobj))->wsr;
 }
 
+static ssize_t report_threshold_show(struct kobject *kobj,
+				     struct kobj_attribute *attr, char *buf)
+{
+	struct wsr_state *wsr = kobj_to_wsr(kobj);
+	unsigned long threshold = READ_ONCE(wsr->report_threshold);
+
+	return sysfs_emit(buf, "%u\n", jiffies_to_msecs(threshold));
+}
+
+static ssize_t report_threshold_store(struct kobject *kobj,
+				      struct kobj_attribute *attr,
+				      const char *buf, size_t len)
+{
+	unsigned int threshold;
+	struct wsr_state *wsr = kobj_to_wsr(kobj);
+
+	if (kstrtouint(buf, 0, &threshold))
+		return -EINVAL;
+
+	WRITE_ONCE(wsr->report_threshold, msecs_to_jiffies(threshold));
+
+	return len;
+}
+
+static struct kobj_attribute report_threshold_attr =
+	__ATTR_RW(report_threshold);
+
 static ssize_t refresh_interval_show(struct kobject *kobj,
 				     struct kobj_attribute *attr, char *buf)
 {
@@ -465,6 +492,7 @@ static ssize_t page_age_show(struct kobject *kobj, struct kobj_attribute *attr,
 static struct kobj_attribute page_age_attr = __ATTR_RO(page_age);
 
 static struct attribute *workingset_report_attrs[] = {
+	&report_threshold_attr.attr,
 	&refresh_interval_attr.attr,
 	&page_age_intervals_attr.attr,
 	&page_age_attr.attr,
@@ -486,8 +514,13 @@ void wsr_init_sysfs(struct node *node)
 
 	wsr = kobj_to_wsr(kobj);
 
-	if (sysfs_create_group(kobj, &workingset_report_attr_group))
+	if (sysfs_create_group(kobj, &workingset_report_attr_group)) {
 		pr_warn("Workingset report failed to create sysfs files\n");
+		return;
+	}
+
+	wsr->page_age_sys_file =
+		kernfs_walk_and_get(kobj->sd, "workingset_report/page_age");
 }
 EXPORT_SYMBOL_GPL(wsr_init_sysfs);
 
@@ -500,6 +533,14 @@ void wsr_remove_sysfs(struct node *node)
 		return;
 
 	wsr = kobj_to_wsr(kobj);
+	kernfs_put(wsr->page_age_sys_file);
 	sysfs_remove_group(kobj, &workingset_report_attr_group);
 }
 EXPORT_SYMBOL_GPL(wsr_remove_sysfs);
+
+void notify_workingset(struct mem_cgroup *memcg, struct pglist_data *pgdat)
+{
+	struct wsr_state *wsr = &mem_cgroup_lruvec(memcg, pgdat)->wsr;
+
+	kernfs_notify(wsr->page_age_sys_file);
+}
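The userspace model below illustrates two things:

- The gating logic in try_to_report_workingset().
- The millisecond/jiffies round trip in report_threshold_store() and report_threshold_show().

It is a sketch under stated assumptions, not kernel code:

- HZ is assumed to be 250, chosen only for illustration.
- DEF_PRIORITY is hard-coded as 12, its value in include/linux/mmzone.h.
- The conversions mimic msecs_to_jiffies() and jiffies_to_msecs() for an HZ that divides 1000. They omit the MAX_JIFFY_OFFSET clamp the kernel applies to very large inputs.
- All names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Assumptions for this sketch only: real kernels use CONFIG_HZ. */
#define SKETCH_HZ		250u
#define SKETCH_MSEC_PER_SEC	1000u
#define SKETCH_DEF_PRIORITY	12	/* DEF_PRIORITY in include/linux/mmzone.h */

/* Round up to whole ticks, like msecs_to_jiffies() when HZ divides 1000. */
static unsigned long ms_to_ticks(unsigned int ms)
{
	unsigned int ms_per_tick = SKETCH_MSEC_PER_SEC / SKETCH_HZ;

	return ((unsigned long)ms + ms_per_tick - 1) / ms_per_tick;
}

/* Like jiffies_to_msecs() when HZ divides 1000. */
static unsigned int ticks_to_ms(unsigned long ticks)
{
	return (unsigned int)(ticks * (SKETCH_MSEC_PER_SEC / SKETCH_HZ));
}

/* Wraparound-safe equivalent of time_is_after_jiffies(deadline). */
static bool deadline_in_future(unsigned long now, unsigned long deadline)
{
	return (long)(now - deadline) < 0;
}

/* Mirrors the early returns in try_to_report_workingset(). */
static bool should_notify(unsigned long now, int priority,
			  unsigned long threshold, bool have_report,
			  unsigned long report_timestamp)
{
	if (priority == SKETCH_DEF_PRIORITY)
		return false;	/* first reclaim pass: skip */
	if (!threshold)
		return false;	/* report_threshold == 0 disables notification */
	if (!have_report)
		return false;	/* no page age histogram generated yet */
	/* Notify only once the report is at least `threshold` ticks old. */
	return !deadline_in_future(now, report_timestamp + threshold);
}
```

Because the threshold is stored in jiffies, a millisecond value is rounded up to whole ticks when written. Under the assumed HZ of 250, writing 1001 to report_threshold therefore reads back as 1004.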