From patchwork Mon Apr  5 17:08:25 2021
X-Patchwork-Submitter: Tim Chen
X-Patchwork-Id: 12183451
From: Tim Chen
To: Michal Hocko
Cc: Tim Chen, Johannes Weiner, Andrew Morton, Dave Hansen, Ying Huang,
 Dan Williams, David Rientjes, Shakeel Butt, linux-mm@kvack.org,
 cgroups@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH v1 01/11] mm: Define top tier memory node mask
Date: Mon,  5 Apr 2021 10:08:25 -0700
Message-Id: <57544494cb67299fabfa01dd17885f7b6a4266bb.1617642417.git.tim.c.chen@linux.intel.com>

Traditionally, all RAM is DRAM.  Some DRAM might be closer/faster than
others, but a byte of media has about the same cost whether it is close
or far.  But with new memory tiers such as High-Bandwidth Memory or
Persistent Memory, there is a choice between fast/expensive memory and
slow/cheap memory.  The fast/expensive memory lives in the top tier of
the memory hierarchy, and it is a precious resource that needs to be
accounted and managed on a per memory cgroup basis.

Define top tier memory as the memory nodes that have no demotion path
leading into them from higher tier memory.

Signed-off-by: Tim Chen
---
 drivers/base/node.c      | 2 ++
 include/linux/nodemask.h | 1 +
 mm/memory_hotplug.c      | 3 +++
 mm/migrate.c             | 1 +
 mm/page_alloc.c          | 5 ++++-
 5 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 04f71c7bc3f8..9eb214ac331f 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -1016,6 +1016,7 @@ static struct node_attr node_state_attr[] = {
 #endif
 	[N_MEMORY] = _NODE_ATTR(has_memory, N_MEMORY),
 	[N_CPU] = _NODE_ATTR(has_cpu, N_CPU),
+	[N_TOPTIER] = _NODE_ATTR(is_toptier, N_TOPTIER),
 	[N_GENERIC_INITIATOR] = _NODE_ATTR(has_generic_initiator,
 					   N_GENERIC_INITIATOR),
 };
@@ -1029,6 +1030,7 @@ static struct attribute *node_state_attrs[] = {
 #endif
 	&node_state_attr[N_MEMORY].attr.attr,
 	&node_state_attr[N_CPU].attr.attr,
+	&node_state_attr[N_TOPTIER].attr.attr,
 	&node_state_attr[N_GENERIC_INITIATOR].attr.attr,
 	NULL
 };
diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h
index ac398e143c9a..3003401ed7f0 100644
--- a/include/linux/nodemask.h
+++ b/include/linux/nodemask.h
@@ -399,6 +399,7 @@ enum node_states {
 #endif
 	N_MEMORY,		/* The node has memory(regular, high, movable) */
 	N_CPU,		/* The node has one or more cpus */
+	N_TOPTIER,	/* Top tier node, no demotion path into node */
 	N_GENERIC_INITIATOR,	/* The node has one or more Generic Initiators */
 	NR_NODE_STATES
 };
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 7550b88e2432..7b21560d4c4d 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -36,6 +36,7 @@
 #include
 #include
 #include
+#include
 
 #include
@@ -654,6 +655,8 @@ static void node_states_set_node(int node, struct memory_notify *arg)
 
 	if (arg->status_change_nid >= 0)
 		node_set_state(node, N_MEMORY);
+
+	node_set_state(node, N_TOPTIER);
 }
 
 static void __meminit resize_zone_range(struct zone *zone, unsigned long start_pfn,
diff --git a/mm/migrate.c b/mm/migrate.c
index 72223fd7e623..e84aedf611da 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -3439,6 +3439,7 @@ static int establish_migrate_target(int node, nodemask_t *used)
 		return NUMA_NO_NODE;
 
 	node_demotion[node] = migration_target;
+	node_clear_state(migration_target, N_TOPTIER);
 
 	return migration_target;
 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ff058941ccfa..471a2c342c4f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -157,6 +157,7 @@ nodemask_t node_states[NR_NODE_STATES] __read_mostly = {
 	[N_MEMORY] = { { [0] = 1UL } },
 	[N_CPU] = { { [0] = 1UL } },
 #endif	/* NUMA */
+	[N_TOPTIER] = NODE_MASK_ALL,
 };
 EXPORT_SYMBOL(node_states);
 
@@ -7590,8 +7591,10 @@ void __init free_area_init(unsigned long *max_zone_pfn)
 		free_area_init_node(nid);
 
 		/* Any memory on that node */
-		if (pgdat->node_present_pages)
+		if (pgdat->node_present_pages) {
 			node_set_state(nid, N_MEMORY);
+			node_set_state(nid, N_TOPTIER);
+		}
 		check_for_memory(pgdat, nid);
 	}
 }
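The node_state_attr entry above creates an is_toptier file next to
has_memory and has_cpu under /sys/devices/system/node.  A minimal
userspace sketch of how the mask would surface, assuming a kernel with
this patch applied (illustration only, not part of the series):

#include <stdio.h>

int main(void)
{
	char buf[256];
	/* Path inferred from the _NODE_ATTR(is_toptier, N_TOPTIER)
	 * entry above; a sibling of /sys/devices/system/node/has_memory. */
	FILE *f = fopen("/sys/devices/system/node/is_toptier", "r");

	if (!f)
		return 1;	/* kernel without this patch */
	if (fgets(buf, sizeof(buf), f))
		printf("top tier nodes: %s", buf);	/* e.g. "0-1" */
	fclose(f);
	return 0;
}

On a DRAM+PMEM machine configured for demotion, the mask would shrink to
just the DRAM nodes once establish_migrate_target() clears N_TOPTIER on
each demotion target.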
From patchwork Mon Apr  5 17:08:26 2021
X-Patchwork-Submitter: Tim Chen
X-Patchwork-Id: 12183453
From: Tim Chen
To: Michal Hocko
Cc: Tim Chen, Johannes Weiner, Andrew Morton, Dave Hansen, Ying Huang,
 Dan Williams, David Rientjes, Shakeel Butt, linux-mm@kvack.org,
 cgroups@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH v1 02/11] mm: Add soft memory limit for mem cgroup
Date: Mon,  5 Apr 2021 10:08:26 -0700
Message-Id: <96026428f135c1199a3216fde6e48317fc45486f.1617642417.git.tim.c.chen@linux.intel.com>

For each memory cgroup, define a soft memory limit on its top tier
memory consumption.  Memory cgroups that exceed their top tier limit
will be selected to have their top tier memory demoted to a lower tier
under memory pressure.

Signed-off-by: Tim Chen
---
 include/linux/memcontrol.h |  1 +
 mm/memcontrol.c            | 18 ++++++++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index eeb0b52203e9..25d8b9acec7c 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -230,6 +230,7 @@ struct mem_cgroup {
 	struct work_struct high_work;
 
 	unsigned long soft_limit;
+	unsigned long toptier_soft_limit;
 
 	/* vmpressure notifications */
 	struct vmpressure vmpressure;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 41a3f22b6639..9a9d677a6654 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3603,6 +3603,7 @@ enum {
 	RES_MAX_USAGE,
 	RES_FAILCNT,
 	RES_SOFT_LIMIT,
+	RES_TOPTIER_SOFT_LIMIT,
 };
 
 static u64 mem_cgroup_read_u64(struct cgroup_subsys_state *css,
@@ -3643,6 +3644,8 @@ static u64 mem_cgroup_read_u64(struct cgroup_subsys_state *css,
 		return counter->failcnt;
 	case RES_SOFT_LIMIT:
 		return (u64)memcg->soft_limit * PAGE_SIZE;
+	case RES_TOPTIER_SOFT_LIMIT:
+		return (u64)memcg->toptier_soft_limit * PAGE_SIZE;
 	default:
 		BUG();
 	}
@@ -3881,6 +3884,14 @@ static ssize_t mem_cgroup_write(struct kernfs_open_file *of,
 		memcg->soft_limit = nr_pages;
 		ret = 0;
 		break;
+	case RES_TOPTIER_SOFT_LIMIT:
+		if (mem_cgroup_is_root(memcg)) { /* Can't set limit on root */
+			ret = -EINVAL;
+			break;
+		}
+		memcg->toptier_soft_limit = nr_pages;
+		ret = 0;
+		break;
 	}
 	return ret ?: nbytes;
 }
@@ -5029,6 +5040,12 @@ static struct cftype mem_cgroup_legacy_files[] = {
 		.write = mem_cgroup_write,
 		.read_u64 = mem_cgroup_read_u64,
 	},
+	{
+		.name = "toptier_soft_limit_in_bytes",
+		.private = MEMFILE_PRIVATE(_MEM, RES_TOPTIER_SOFT_LIMIT),
+		.write = mem_cgroup_write,
+		.read_u64 = mem_cgroup_read_u64,
+	},
 	{
 		.name = "failcnt",
 		.private = MEMFILE_PRIVATE(_MEM, RES_FAILCNT),
@@ -5365,6 +5382,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 	page_counter_set_high(&memcg->memory, PAGE_COUNTER_MAX);
 	memcg->soft_limit = PAGE_COUNTER_MAX;
 	page_counter_set_high(&memcg->swap, PAGE_COUNTER_MAX);
+	memcg->toptier_soft_limit = PAGE_COUNTER_MAX;
 	if (parent) {
 		memcg->swappiness = mem_cgroup_swappiness(parent);
 		memcg->oom_kill_disable = parent->oom_kill_disable;
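Since the new knob is added to mem_cgroup_legacy_files, it should appear
with the usual "memory." prefix when the v1 memory controller is
mounted.  A hedged userspace sketch; the cgroup name "db" and the 30 GiB
value are invented, only the trailing file name comes from the cftype
entry above:

#include <stdio.h>

int main(void)
{
	/* Hypothetical cgroup path; only the file name is defined
	 * by this patch. */
	const char *knob =
		"/sys/fs/cgroup/memory/db/memory.toptier_soft_limit_in_bytes";
	FILE *f = fopen(knob, "w");

	if (!f)
		return 1;
	fprintf(f, "%llu\n", 30ULL << 30);	/* 30 GiB of top tier memory */
	return fclose(f) ? 1 : 0;
}

As with the existing soft_limit_in_bytes, the limit is advisory: nothing
is enforced at charge time; the value only steers demotion under
pressure.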
From patchwork Mon Apr  5 17:08:27 2021
X-Patchwork-Submitter: Tim Chen
X-Patchwork-Id: 12183455
From: Tim Chen
To: Michal Hocko
Cc: Tim Chen, Johannes Weiner, Andrew Morton, Dave Hansen, Ying Huang,
 Dan Williams, David Rientjes, Shakeel Butt, linux-mm@kvack.org,
 cgroups@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH v1 03/11] mm: Account the top tier memory usage per cgroup
Date: Mon,  5 Apr 2021 10:08:27 -0700

For each memory cgroup, account its top tier memory usage at the time a
top tier page is charged to or uncharged from the cgroup.

Signed-off-by: Tim Chen
---
 include/linux/memcontrol.h |  1 +
 mm/memcontrol.c            | 39 +++++++++++++++++++++++++++++++++++++-
 2 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 25d8b9acec7c..609d8590950c 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -225,6 +225,7 @@ struct mem_cgroup {
 	/* Legacy consumer-oriented counters */
 	struct page_counter kmem;		/* v1 only */
 	struct page_counter tcpmem;		/* v1 only */
+	struct page_counter toptier;
 
 	/* Range enforcement for interrupt charges */
 	struct work_struct high_work;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 9a9d677a6654..fe7bb8613f5a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -253,6 +253,13 @@ struct cgroup_subsys_state *vmpressure_to_css(struct vmpressure *vmpr)
 	return &container_of(vmpr, struct mem_cgroup, vmpressure)->css;
 }
 
+static inline bool top_tier(struct page *page)
+{
+	int nid = page_to_nid(page);
+
+	return node_state(nid, N_TOPTIER);
+}
+
 #ifdef CONFIG_MEMCG_KMEM
 extern spinlock_t css_set_lock;
 
@@ -951,6 +958,23 @@ static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg,
 	__this_cpu_add(memcg->vmstats_percpu->nr_page_events, nr_pages);
 }
 
+static inline void mem_cgroup_charge_toptier(struct mem_cgroup *memcg,
+					struct page *page,
+					int nr_pages)
+{
+	if (!top_tier(page))
+		return;
+
+	if (nr_pages >= 0)
+		page_counter_charge(&memcg->toptier,
+					(unsigned long) nr_pages);
+	else {
+		nr_pages = -nr_pages;
+		page_counter_uncharge(&memcg->toptier,
+					(unsigned long) nr_pages);
+	}
+}
+
 static bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg,
 				       enum mem_cgroup_events_target target)
 {
@@ -2932,6 +2956,7 @@ static void commit_charge(struct page *page, struct mem_cgroup *memcg)
 	 *   - exclusive reference
 	 */
 	page->memcg_data = (unsigned long)memcg;
+	mem_cgroup_charge_toptier(memcg, page, thp_nr_pages(page));
 }
 
 #ifdef CONFIG_MEMCG_KMEM
@@ -3138,6 +3163,7 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
 	if (!ret) {
 		page->memcg_data = (unsigned long)memcg | MEMCG_DATA_KMEM;
+		mem_cgroup_charge_toptier(memcg, page, 1 << order);
 		return 0;
 	}
 	css_put(&memcg->css);
@@ -3161,6 +3187,7 @@ void __memcg_kmem_uncharge_page(struct page *page, int order)
 	VM_BUG_ON_PAGE(mem_cgroup_is_root(memcg), page);
 	__memcg_kmem_uncharge(memcg, nr_pages);
 	page->memcg_data = 0;
+	mem_cgroup_charge_toptier(memcg, page, -nr_pages);
 	css_put(&memcg->css);
 }
@@ -5389,11 +5416,13 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 
 		page_counter_init(&memcg->memory, &parent->memory);
 		page_counter_init(&memcg->swap, &parent->swap);
+		page_counter_init(&memcg->toptier, &parent->toptier);
 		page_counter_init(&memcg->kmem, &parent->kmem);
 		page_counter_init(&memcg->tcpmem, &parent->tcpmem);
 	} else {
 		page_counter_init(&memcg->memory, NULL);
 		page_counter_init(&memcg->swap, NULL);
+		page_counter_init(&memcg->toptier, NULL);
 		page_counter_init(&memcg->kmem, NULL);
 		page_counter_init(&memcg->tcpmem, NULL);
@@ -5745,6 +5774,8 @@ static int mem_cgroup_move_account(struct page *page,
 	css_put(&from->css);
 
 	page->memcg_data = (unsigned long)to;
+	mem_cgroup_charge_toptier(to, page, nr_pages);
+	mem_cgroup_charge_toptier(from, page, -nr_pages);
 
 	__unlock_page_memcg(from);
@@ -6832,6 +6863,7 @@ struct uncharge_gather {
 	unsigned long nr_pages;
 	unsigned long pgpgout;
 	unsigned long nr_kmem;
+	unsigned long nr_toptier;
 	struct page *dummy_page;
 };
@@ -6846,6 +6878,7 @@ static void uncharge_batch(const struct uncharge_gather *ug)
 
 	if (!mem_cgroup_is_root(ug->memcg)) {
 		page_counter_uncharge(&ug->memcg->memory, ug->nr_pages);
+		page_counter_uncharge(&ug->memcg->toptier, ug->nr_toptier);
 		if (do_memsw_account())
 			page_counter_uncharge(&ug->memcg->memsw, ug->nr_pages);
 		if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && ug->nr_kmem)
@@ -6891,6 +6924,8 @@ static void uncharge_page(struct page *page, struct uncharge_gather *ug)
 
 	nr_pages = compound_nr(page);
 	ug->nr_pages += nr_pages;
+	if (top_tier(page))
+		ug->nr_toptier += nr_pages;
 
 	if (PageMemcgKmem(page))
 		ug->nr_kmem += nr_pages;
@@ -7216,8 +7251,10 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
 
 	page->memcg_data = 0;
 
-	if (!mem_cgroup_is_root(memcg))
+	if (!mem_cgroup_is_root(memcg)) {
 		page_counter_uncharge(&memcg->memory, nr_entries);
+		mem_cgroup_charge_toptier(memcg, page, -nr_entries);
+	}
 
 	if (!cgroup_memory_noswap && memcg != swap_memcg) {
 		if (!mem_cgroup_is_root(swap_memcg))
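mem_cgroup_charge_toptier() folds charging and uncharging into one
helper via the sign of nr_pages, and is a no-op for lower tier pages.  A
standalone model of that convention (illustration only; the plain
counter stands in for the memcg->toptier page counter):

#include <stdio.h>

static long toptier_pages;	/* stands in for memcg->toptier */

static void charge_toptier(int page_is_toptier, long nr_pages)
{
	if (!page_is_toptier)
		return;			/* lower tier pages are not tracked */
	toptier_pages += nr_pages;	/* negative nr_pages uncharges */
}

int main(void)
{
	charge_toptier(1, 512);		/* commit_charge() of a 2MB THP */
	charge_toptier(0, 64);		/* lower tier charge: ignored */
	charge_toptier(1, -512);	/* matching uncharge */
	printf("%ld\n", toptier_pages);	/* back to 0 */
	return 0;
}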
From patchwork Mon Apr  5 17:08:28 2021
X-Patchwork-Submitter: Tim Chen
X-Patchwork-Id: 12183457
From: Tim Chen
To: Michal Hocko
Cc: Tim Chen, Johannes Weiner, Andrew Morton, Dave Hansen, Ying Huang,
 Dan Williams, David Rientjes, Shakeel Butt, linux-mm@kvack.org,
 cgroups@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH v1 04/11] mm: Report top tier memory usage in sysfs
Date: Mon,  5 Apr 2021 10:08:28 -0700
Message-Id: <1196182bf902e36c8ecbf2afb7dd570e96c99ff4.1617642417.git.tim.c.chen@linux.intel.com>

In the memory cgroup's sysfs interface, report the cgroup's usage of top
tier memory in a new file: "toptier_usage_in_bytes".

Signed-off-by: Tim Chen
---
 mm/memcontrol.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fe7bb8613f5a..68590f46fa76 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3631,6 +3631,7 @@ enum {
 	RES_FAILCNT,
 	RES_SOFT_LIMIT,
 	RES_TOPTIER_SOFT_LIMIT,
+	RES_TOPTIER_USAGE,
 };
@@ -3673,6 +3674,8 @@ static u64 mem_cgroup_read_u64(struct cgroup_subsys_state *css,
 		return (u64)memcg->soft_limit * PAGE_SIZE;
 	case RES_TOPTIER_SOFT_LIMIT:
 		return (u64)memcg->toptier_soft_limit * PAGE_SIZE;
+	case RES_TOPTIER_USAGE:
+		return (u64)page_counter_read(&memcg->toptier) * PAGE_SIZE;
 	default:
 		BUG();
 	}
@@ -5073,6 +5076,11 @@ static struct cftype mem_cgroup_legacy_files[] = {
 		.write = mem_cgroup_write,
 		.read_u64 = mem_cgroup_read_u64,
 	},
+	{
+		.name = "toptier_usage_in_bytes",
+		.private = MEMFILE_PRIVATE(_MEM, RES_TOPTIER_USAGE),
+		.read_u64 = mem_cgroup_read_u64,
+	},
 	{
 		.name = "failcnt",
 		.private = MEMFILE_PRIVATE(_MEM, RES_FAILCNT),
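Combined with the limit from patch 02, userspace can compute a cgroup's
toptier excess with the same usage-minus-limit arithmetic the kernel
uses.  A sketch; the cgroup path is hypothetical, the file names come
from the cftype entries in patches 02 and 04:

#include <stdio.h>

static unsigned long long read_ull(const char *path)
{
	unsigned long long v = 0;
	FILE *f = fopen(path, "r");

	if (f) {
		if (fscanf(f, "%llu", &v) != 1)
			v = 0;
		fclose(f);
	}
	return v;
}

int main(void)
{
	const char *dir = "/sys/fs/cgroup/memory/db";	/* invented group */
	char usage[128], limit[128];
	unsigned long long u, l;

	snprintf(usage, sizeof(usage),
		 "%s/memory.toptier_usage_in_bytes", dir);
	snprintf(limit, sizeof(limit),
		 "%s/memory.toptier_soft_limit_in_bytes", dir);
	u = read_ull(usage);
	l = read_ull(limit);
	printf("toptier excess: %llu bytes\n", u > l ? u - l : 0);
	return 0;
}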
From patchwork Mon Apr  5 17:08:29 2021
X-Patchwork-Submitter: Tim Chen
X-Patchwork-Id: 12183459
From: Tim Chen
To: Michal Hocko
Cc: Tim Chen, Johannes Weiner, Andrew Morton, Dave Hansen, Ying Huang,
 Dan Williams, David Rientjes, Shakeel Butt, linux-mm@kvack.org,
 cgroups@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH v1 05/11] mm: Add soft_limit_top_tier tree for mem cgroup
Date: Mon,  5 Apr 2021 10:08:29 -0700
Message-Id: <04b7c9bce901d271eae216dcfbb928aadc8d48d0.1617642417.git.tim.c.chen@linux.intel.com>

Define a per-node soft_limit_toptier red-black tree that sorts and
tracks cgroups by each group's excess over its toptier soft limit.  A
cgroup is added to the tree if it has exceeded its top tier soft limit
and has used pages on the node.

Signed-off-by: Tim Chen
---
 mm/memcontrol.c | 68 +++++++++++++++++++++++++++++++++++++------------
 1 file changed, 52 insertions(+), 16 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 68590f46fa76..90a78ff3fca8 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -122,6 +122,7 @@ struct mem_cgroup_tree {
 };
 
 static struct mem_cgroup_tree soft_limit_tree __read_mostly;
+static struct mem_cgroup_tree soft_limit_toptier_tree __read_mostly;
 
 /* for OOM */
 struct mem_cgroup_eventfd_list {
@@ -590,17 +591,27 @@ mem_cgroup_page_nodeinfo(struct mem_cgroup *memcg, struct page *page)
 }
 
 static struct mem_cgroup_tree_per_node *
-soft_limit_tree_node(int nid)
-{
-	return soft_limit_tree.rb_tree_per_node[nid];
+soft_limit_tree_node(int nid, enum node_states type)
+{
+	switch (type) {
+	case N_MEMORY:
+		return soft_limit_tree.rb_tree_per_node[nid];
+	case N_TOPTIER:
+		if (node_state(nid, N_TOPTIER))
+			return soft_limit_toptier_tree.rb_tree_per_node[nid];
+		else
+			return NULL;
+	default:
+		return NULL;
+	}
 }
 
 static struct mem_cgroup_tree_per_node *
-soft_limit_tree_from_page(struct page *page)
+soft_limit_tree_from_page(struct page *page, enum node_states type)
 {
 	int nid = page_to_nid(page);
 
-	return soft_limit_tree.rb_tree_per_node[nid];
+	return soft_limit_tree_node(nid, type);
 }
 
 static void __mem_cgroup_insert_exceeded(struct mem_cgroup_per_node *mz,
@@ -661,12 +672,24 @@ static void mem_cgroup_remove_exceeded(struct mem_cgroup_per_node *mz,
 	spin_unlock_irqrestore(&mctz->lock, flags);
 }
 
-static unsigned long soft_limit_excess(struct mem_cgroup *memcg)
+static unsigned long soft_limit_excess(struct mem_cgroup *memcg, enum node_states type)
 {
-	unsigned long nr_pages = page_counter_read(&memcg->memory);
-	unsigned long soft_limit = READ_ONCE(memcg->soft_limit);
+	unsigned long nr_pages;
+	unsigned long soft_limit;
 	unsigned long excess = 0;
 
+	switch (type) {
+	case N_MEMORY:
+		nr_pages = page_counter_read(&memcg->memory);
+		soft_limit = READ_ONCE(memcg->soft_limit);
+		break;
+	case N_TOPTIER:
+		nr_pages = page_counter_read(&memcg->toptier);
+		soft_limit = READ_ONCE(memcg->toptier_soft_limit);
+		break;
+	default:
+		return 0;
+	}
+
 	if (nr_pages > soft_limit)
 		excess = nr_pages - soft_limit;
@@ -679,7 +702,7 @@ static void mem_cgroup_update_tree(struct mem_cgroup *memcg, struct page *page)
 	struct mem_cgroup_per_node *mz;
 	struct mem_cgroup_tree_per_node *mctz;
 
-	mctz = soft_limit_tree_from_page(page);
+	mctz = soft_limit_tree_from_page(page, N_MEMORY);
 	if (!mctz)
 		return;
 	/*
@@ -688,7 +711,7 @@ static void mem_cgroup_update_tree(struct mem_cgroup *memcg, struct page *page)
 	 */
 	for (; memcg; memcg = parent_mem_cgroup(memcg)) {
 		mz = mem_cgroup_page_nodeinfo(memcg, page);
-		excess = soft_limit_excess(memcg);
+		excess = soft_limit_excess(memcg, N_MEMORY);
 		/*
 		 * We have to update the tree if mz is on RB-tree or
 		 * mem is over its softlimit.
@@ -718,7 +741,7 @@ static void mem_cgroup_remove_from_trees(struct mem_cgroup *memcg)
 
 	for_each_node(nid) {
 		mz = mem_cgroup_nodeinfo(memcg, nid);
-		mctz = soft_limit_tree_node(nid);
+		mctz = soft_limit_tree_node(nid, N_MEMORY);
 		if (mctz)
 			mem_cgroup_remove_exceeded(mz, mctz);
 	}
@@ -742,7 +765,7 @@ __mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz)
 	 * position in the tree.
 	 */
 	__mem_cgroup_remove_exceeded(mz, mctz);
-	if (!soft_limit_excess(mz->memcg) ||
+	if (!soft_limit_excess(mz->memcg, N_MEMORY) ||
 	    !css_tryget(&mz->memcg->css))
 		goto retry;
 done:
@@ -1805,7 +1828,7 @@ static int mem_cgroup_soft_reclaim(struct mem_cgroup *root_memcg,
 		.pgdat = pgdat,
 	};
 
-	excess = soft_limit_excess(root_memcg);
+	excess = soft_limit_excess(root_memcg, N_MEMORY);
 
 	while (1) {
 		victim = mem_cgroup_iter(root_memcg, victim, &reclaim);
@@ -1834,7 +1857,7 @@ static int mem_cgroup_soft_reclaim(struct mem_cgroup *root_memcg,
 		total += mem_cgroup_shrink_node(victim, gfp_mask, false,
 					pgdat, &nr_scanned);
 		*total_scanned += nr_scanned;
-		if (!soft_limit_excess(root_memcg))
+		if (!soft_limit_excess(root_memcg, N_MEMORY))
 			break;
 	}
 	mem_cgroup_iter_break(root_memcg, victim);
@@ -3457,7 +3480,7 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
 	if (order > 0)
 		return 0;
 
-	mctz = soft_limit_tree_node(pgdat->node_id);
+	mctz = soft_limit_tree_node(pgdat->node_id, N_MEMORY);
 
 	/*
 	 * Do not even bother to check the largest node if the root
@@ -3513,7 +3536,7 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
 		if (!reclaimed)
 			next_mz = __mem_cgroup_largest_soft_limit_node(mctz);
 
-		excess = soft_limit_excess(mz->memcg);
+		excess = soft_limit_excess(mz->memcg, N_MEMORY);
 		/*
 		 * One school of thought says that we should not add
 		 * back the node to the tree if reclaim returns 0.
@@ -7189,6 +7212,19 @@ static int __init mem_cgroup_init(void)
 		rtpn->rb_rightmost = NULL;
 		spin_lock_init(&rtpn->lock);
 		soft_limit_tree.rb_tree_per_node[node] = rtpn;
+
+		if (!node_state(node, N_TOPTIER)) {
+			soft_limit_toptier_tree.rb_tree_per_node[node] = NULL;
+			continue;
+		}
+
+		rtpn = kzalloc_node(sizeof(*rtpn), GFP_KERNEL,
+				    node_online(node) ? node : NUMA_NO_NODE);
+
+		rtpn->rb_root = RB_ROOT;
+		rtpn->rb_rightmost = NULL;
+		spin_lock_init(&rtpn->lock);
+		soft_limit_toptier_tree.rb_tree_per_node[node] = rtpn;
 	}
 
 	return 0;
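The point of the second tree is that a cgroup can sit comfortably under
its overall soft limit while still being over its toptier limit, so
soft_limit_excess() now picks the counter/limit pair by type.  A
standalone model with invented numbers:

#include <stdio.h>

enum type { MEMORY, TOPTIER };

struct memcg_model {
	unsigned long mem_pages, mem_limit;	/* overall usage/limit */
	unsigned long top_pages, top_limit;	/* toptier usage/limit */
};

static unsigned long excess(const struct memcg_model *m, enum type t)
{
	unsigned long pages = (t == TOPTIER) ? m->top_pages : m->mem_pages;
	unsigned long limit = (t == TOPTIER) ? m->top_limit : m->mem_limit;

	return pages > limit ? pages - limit : 0;	/* clamped at zero */
}

int main(void)
{
	struct memcg_model m = {
		.mem_pages = 1000, .mem_limit = 2000,	/* under overall */
		.top_pages = 700,  .top_limit = 400,	/* over toptier  */
	};

	printf("memory excess:  %lu\n", excess(&m, MEMORY));	/* 0   */
	printf("toptier excess: %lu\n", excess(&m, TOPTIER));	/* 300 */
	return 0;
}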
From patchwork Mon Apr  5 17:08:30 2021
X-Patchwork-Submitter: Tim Chen
X-Patchwork-Id: 12183461
From: Tim Chen
To: Michal Hocko
Cc: Tim Chen, Johannes Weiner, Andrew Morton, Dave Hansen, Ying Huang,
 Dan Williams, David Rientjes, Shakeel Butt, linux-mm@kvack.org,
 cgroups@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH v1 06/11] mm: Handle top tier memory in cgroup soft limit memory tree utilities
Date: Mon,  5 Apr 2021 10:08:30 -0700
Message-Id: <86f4bad592a5232226c1779e6acce117a32b41ee.1617642417.git.tim.c.chen@linux.intel.com>

Update the utility functions __mem_cgroup_insert_exceeded() and
__mem_cgroup_remove_exceeded() to allow addition and removal of cgroups
from the new red-black tree that tracks the cgroups exceeding their
toptier memory limits.

Also update mem_cgroup_largest_soft_limit_node() to allow returning the
cgroup with the largest excess usage of toptier memory.

Signed-off-by: Tim Chen
---
 include/linux/memcontrol.h |   9 +++
 mm/memcontrol.c            | 152 +++++++++++++++++++++++++++----------
 2 files changed, 122 insertions(+), 39 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 609d8590950c..0ed8ddfd5436 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -124,6 +124,15 @@ struct mem_cgroup_per_node {
 	unsigned long		usage_in_excess;/* Set to the value by which */
 						/* the soft limit is exceeded*/
 	bool			on_tree;
+
+	struct rb_node		toptier_tree_node;	/* RB tree node */
+	unsigned long		toptier_usage_in_excess; /* Set to the value by which */
+						/* the soft limit is exceeded*/
+	bool			on_toptier_tree;
+
+	bool			congested;	/* memcg has many dirty pages */
+						/* backed by a congested BDI */
+
 	struct mem_cgroup	*memcg;		/* Back pointer, we cannot */
 						/* use container_of	   */
 };
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 90a78ff3fca8..8a7648b79635 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -616,24 +616,44 @@ soft_limit_tree_from_page(struct page *page, enum node_states type)
 
 static void __mem_cgroup_insert_exceeded(struct mem_cgroup_per_node *mz,
 					 struct mem_cgroup_tree_per_node *mctz,
-					 unsigned long new_usage_in_excess)
+					 unsigned long new_usage_in_excess,
+					 enum node_states type)
 {
 	struct rb_node **p = &mctz->rb_root.rb_node;
-	struct rb_node *parent = NULL;
+	struct rb_node *parent = NULL, *mz_tree_node;
 	struct mem_cgroup_per_node *mz_node;
-	bool rightmost = true;
+	bool rightmost = true, *mz_on_tree;
+	unsigned long usage_in_excess, *mz_usage_in_excess;
 
-	if (mz->on_tree)
+	if (type == N_TOPTIER) {
+		mz_usage_in_excess = &mz->toptier_usage_in_excess;
+		mz_tree_node = &mz->toptier_tree_node;
+		mz_on_tree = &mz->on_toptier_tree;
+	} else {
+		mz_usage_in_excess = &mz->usage_in_excess;
+		mz_tree_node = &mz->tree_node;
+		mz_on_tree = &mz->on_tree;
+	}
+
+	if (*mz_on_tree)
 		return;
 
-	mz->usage_in_excess = new_usage_in_excess;
-	if (!mz->usage_in_excess)
+	if (!new_usage_in_excess)
 		return;
+
 	while (*p) {
 		parent = *p;
-		mz_node = rb_entry(parent, struct mem_cgroup_per_node,
+		if (type == N_TOPTIER) {
+			mz_node = rb_entry(parent, struct mem_cgroup_per_node,
+					toptier_tree_node);
+			usage_in_excess = mz_node->toptier_usage_in_excess;
+		} else {
+			mz_node = rb_entry(parent, struct mem_cgroup_per_node,
 					tree_node);
-		if (mz->usage_in_excess < mz_node->usage_in_excess) {
+			usage_in_excess = mz_node->usage_in_excess;
+		}
+
+		if (new_usage_in_excess < usage_in_excess) {
 			p = &(*p)->rb_left;
 			rightmost = false;
 		} else {
@@ -642,33 +662,47 @@ static void __mem_cgroup_insert_exceeded(struct mem_cgroup_per_node *mz,
 	}
 
 	if (rightmost)
-		mctz->rb_rightmost = &mz->tree_node;
+		mctz->rb_rightmost = mz_tree_node;
 
-	rb_link_node(&mz->tree_node, parent, p);
-	rb_insert_color(&mz->tree_node, &mctz->rb_root);
-	mz->on_tree = true;
+	rb_link_node(mz_tree_node, parent, p);
+	rb_insert_color(mz_tree_node, &mctz->rb_root);
+	*mz_usage_in_excess = new_usage_in_excess;
+	*mz_on_tree = true;
 }
 
 static void __mem_cgroup_remove_exceeded(struct mem_cgroup_per_node *mz,
-					 struct mem_cgroup_tree_per_node *mctz)
+					 struct mem_cgroup_tree_per_node *mctz,
+					 enum node_states type)
 {
-	if (!mz->on_tree)
+	bool *mz_on_tree;
+	struct rb_node *mz_tree_node;
+
+	if (type == N_TOPTIER) {
+		mz_tree_node = &mz->toptier_tree_node;
+		mz_on_tree = &mz->on_toptier_tree;
+	} else {
+		mz_tree_node = &mz->tree_node;
+		mz_on_tree = &mz->on_tree;
+	}
+
+	if (!(*mz_on_tree))
 		return;
 
-	if (&mz->tree_node == mctz->rb_rightmost)
-		mctz->rb_rightmost = rb_prev(&mz->tree_node);
+	if (mz_tree_node == mctz->rb_rightmost)
+		mctz->rb_rightmost = rb_prev(mz_tree_node);
 
-	rb_erase(&mz->tree_node, &mctz->rb_root);
-	mz->on_tree = false;
+	rb_erase(mz_tree_node, &mctz->rb_root);
+	*mz_on_tree = false;
 }
 
 static void mem_cgroup_remove_exceeded(struct mem_cgroup_per_node *mz,
-				       struct mem_cgroup_tree_per_node *mctz)
+				       struct mem_cgroup_tree_per_node *mctz,
+				       enum node_states type)
 {
 	unsigned long flags;
 
 	spin_lock_irqsave(&mctz->lock, flags);
-	__mem_cgroup_remove_exceeded(mz, mctz);
+	__mem_cgroup_remove_exceeded(mz, mctz, type);
 	spin_unlock_irqrestore(&mctz->lock, flags);
 }
@@ -696,13 +730,18 @@ static unsigned long soft_limit_excess(struct mem_cgroup *memcg, enum node_state
 	return excess;
 }
 
-static void mem_cgroup_update_tree(struct mem_cgroup *memcg, struct page *page)
+static void mem_cgroup_update_tree(struct mem_cgroup *bottom_memcg, struct page *page)
 {
 	unsigned long excess;
 	struct mem_cgroup_per_node *mz;
 	struct mem_cgroup_tree_per_node *mctz;
+	enum node_states type = N_MEMORY;
+	struct mem_cgroup *memcg;
+
+repeat_toptier:
+	memcg = bottom_memcg;
+	mctz = soft_limit_tree_from_page(page, type);
 
-	mctz = soft_limit_tree_from_page(page, N_MEMORY);
 	if (!mctz)
 		return;
 	/*
@@ -710,27 +749,37 @@ static void mem_cgroup_update_tree(struct mem_cgroup *memcg, struct page *page)
 	 * because their event counter is not touched.
 	 */
 	for (; memcg; memcg = parent_mem_cgroup(memcg)) {
+		bool on_tree;
+
 		mz = mem_cgroup_page_nodeinfo(memcg, page);
-		excess = soft_limit_excess(memcg, N_MEMORY);
+		excess = soft_limit_excess(memcg, type);
+
+		on_tree = (type == N_MEMORY) ? mz->on_tree : mz->on_toptier_tree;
 		/*
 		 * We have to update the tree if mz is on RB-tree or
 		 * mem is over its softlimit.
 		 */
-		if (excess || mz->on_tree) {
+		if (excess || on_tree) {
 			unsigned long flags;
 
 			spin_lock_irqsave(&mctz->lock, flags);
 			/* if on-tree, remove it */
-			if (mz->on_tree)
-				__mem_cgroup_remove_exceeded(mz, mctz);
+			if (on_tree)
+				__mem_cgroup_remove_exceeded(mz, mctz, type);
+
 			/*
 			 * Insert again. mz->usage_in_excess will be updated.
 			 * If excess is 0, no tree ops.
 			 */
-			__mem_cgroup_insert_exceeded(mz, mctz, excess);
+			__mem_cgroup_insert_exceeded(mz, mctz, excess, type);
+
 			spin_unlock_irqrestore(&mctz->lock, flags);
 		}
 	}
+	if (type == N_MEMORY) {
+		type = N_TOPTIER;
+		goto repeat_toptier;
+	}
 }
 
 static void mem_cgroup_remove_from_trees(struct mem_cgroup *memcg)
@@ -743,12 +792,16 @@ static void mem_cgroup_remove_from_trees(struct mem_cgroup *memcg)
 		mz = mem_cgroup_nodeinfo(memcg, nid);
 		mctz = soft_limit_tree_node(nid, N_MEMORY);
 		if (mctz)
-			mem_cgroup_remove_exceeded(mz, mctz);
+			mem_cgroup_remove_exceeded(mz, mctz, N_MEMORY);
+		mctz = soft_limit_tree_node(nid, N_TOPTIER);
+		if (mctz)
+			mem_cgroup_remove_exceeded(mz, mctz, N_TOPTIER);
 	}
 }
 
 static struct mem_cgroup_per_node *
-__mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz)
+__mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz,
+				     enum node_states type)
 {
 	struct mem_cgroup_per_node *mz;
@@ -757,15 +810,19 @@ __mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz)
 	if (!mctz->rb_rightmost)
 		goto done;		/* Nothing to reclaim from */
 
-	mz = rb_entry(mctz->rb_rightmost,
+	if (type == N_TOPTIER)
+		mz = rb_entry(mctz->rb_rightmost,
+			      struct mem_cgroup_per_node, toptier_tree_node);
+	else
+		mz = rb_entry(mctz->rb_rightmost,
 			struct mem_cgroup_per_node, tree_node);
 	/*
 	 * Remove the node now but someone else can add it back,
 	 * we will to add it back at the end of reclaim to its correct
 	 * position in the tree.
 	 */
-	__mem_cgroup_remove_exceeded(mz, mctz);
-	if (!soft_limit_excess(mz->memcg, N_MEMORY) ||
+	__mem_cgroup_remove_exceeded(mz, mctz, type);
+	if (!soft_limit_excess(mz->memcg, type) ||
 	    !css_tryget(&mz->memcg->css))
 		goto retry;
 done:
@@ -773,12 +830,13 @@ __mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz)
 }
 
 static struct mem_cgroup_per_node *
-mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz)
+mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz,
+				   enum node_states type)
 {
 	struct mem_cgroup_per_node *mz;
 
 	spin_lock_irq(&mctz->lock);
-	mz = __mem_cgroup_largest_soft_limit_node(mctz);
+	mz = __mem_cgroup_largest_soft_limit_node(mctz, type);
 	spin_unlock_irq(&mctz->lock);
 	return mz;
 }
@@ -3472,7 +3530,7 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
 	struct mem_cgroup_per_node *mz, *next_mz = NULL;
 	unsigned long reclaimed;
 	int loop = 0;
-	struct mem_cgroup_tree_per_node *mctz;
+	struct mem_cgroup_tree_per_node *mctz, *mctz_sibling;
 	unsigned long excess;
 	unsigned long nr_scanned;
 	int migration_nid;
@@ -3481,6 +3539,7 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
 		return 0;
 
 	mctz = soft_limit_tree_node(pgdat->node_id, N_MEMORY);
+	mctz_sibling = soft_limit_tree_node(pgdat->node_id, N_TOPTIER);
 
 	/*
 	 * Do not even bother to check the largest node if the root
@@ -3516,7 +3575,7 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
 		if (next_mz)
 			mz = next_mz;
 		else
-			mz = mem_cgroup_largest_soft_limit_node(mctz);
+			mz = mem_cgroup_largest_soft_limit_node(mctz, N_MEMORY);
 		if (!mz)
 			break;
@@ -3526,7 +3585,7 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
 		nr_reclaimed += reclaimed;
 		*total_scanned += nr_scanned;
 		spin_lock_irq(&mctz->lock);
-		__mem_cgroup_remove_exceeded(mz, mctz);
+		__mem_cgroup_remove_exceeded(mz, mctz, N_MEMORY);
 
 		/*
 		 * If we failed to reclaim anything from this memory cgroup
@@ -3534,7 +3593,8 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
 		 */
 		next_mz = NULL;
 		if (!reclaimed)
-			next_mz = __mem_cgroup_largest_soft_limit_node(mctz);
+			next_mz =
+			      __mem_cgroup_largest_soft_limit_node(mctz, N_MEMORY);
 
 		excess = soft_limit_excess(mz->memcg, N_MEMORY);
 		/*
@@ -3546,8 +3606,20 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
 		 * term TODO.
 		 */
 		/* If excess == 0, no tree ops */
-		__mem_cgroup_insert_exceeded(mz, mctz, excess);
+		__mem_cgroup_insert_exceeded(mz, mctz, excess, N_MEMORY);
 		spin_unlock_irq(&mctz->lock);
+
+		/* update both affected N_MEMORY and N_TOPTIER trees */
+		if (mctz_sibling) {
+			spin_lock_irq(&mctz_sibling->lock);
+			__mem_cgroup_remove_exceeded(mz, mctz_sibling,
+						     N_TOPTIER);
+			excess = soft_limit_excess(mz->memcg, N_TOPTIER);
+			__mem_cgroup_insert_exceeded(mz, mctz, excess,
+						     N_TOPTIER);
+			spin_unlock_irq(&mctz_sibling->lock);
+		}
+
 		css_put(&mz->memcg->css);
 		loop++;
 		/*
@@ -5312,6 +5384,8 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
 	lruvec_init(&pn->lruvec);
 	pn->usage_in_excess = 0;
 	pn->on_tree = false;
+	pn->toptier_usage_in_excess = 0;
+	pn->on_toptier_tree = false;
 	pn->memcg = memcg;
 
 	memcg->nodeinfo[node] = pn;
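Both trees keep the per-node mem_cgroup_per_node entries ordered by
excess, so the rightmost entry is always the worst offender and
mem_cgroup_largest_soft_limit_node() can pick a victim without a scan.
A toy model of that selection policy, with a linear scan standing in
for the red-black tree and invented data:

#include <stdio.h>

struct cg {
	const char *name;
	unsigned long toptier_excess;	/* pages over toptier soft limit */
};

int main(void)
{
	struct cg cgs[] = { { "A", 100 }, { "B", 700 }, { "C", 350 } };
	struct cg *victim = &cgs[0];
	int i;

	/* Equivalent of taking mctz->rb_rightmost from the tree. */
	for (i = 1; i < 3; i++)
		if (cgs[i].toptier_excess > victim->toptier_excess)
			victim = &cgs[i];

	printf("demote from %s first\n", victim->name);	/* B */
	return 0;
}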
fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Apr 2021 11:09:09 -0700 IronPort-SDR: zciwvXhVlJxANCS1RQuETFmvpI/SVAShGiAS1SOFk+fup6XS3eaAdNiHzq00W3PW/LzFQBiUp1 11xpUJFo26pQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.81,307,1610438400"; d="scan'208";a="448153935" Received: from skl-02.jf.intel.com ([10.54.74.28]) by fmsmga002.fm.intel.com with ESMTP; 05 Apr 2021 11:09:08 -0700 From: Tim Chen To: Michal Hocko Cc: Tim Chen , Johannes Weiner , Andrew Morton , Dave Hansen , Ying Huang , Dan Williams , David Rientjes , Shakeel Butt , linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [RFC PATCH v1 07/11] mm: Account the total top tier memory in use Date: Mon, 5 Apr 2021 10:08:31 -0700 Message-Id: <9170c90b0f58dee05a2b2c1d3789d674df42ed65.1617642417.git.tim.c.chen@linux.intel.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: References: MIME-Version: 1.0 X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: A8676C0007C2 X-Stat-Signature: 8pykzgmedzyqhgorcaki3egag64u6cge Received-SPF: none (linux.intel.com>: No applicable sender policy available) receiver=imf06; identity=mailfrom; envelope-from=""; helo=mga17.intel.com; client-ip=192.55.52.151 X-HE-DKIM-Result: none/none X-HE-Tag: 1617646150-318228 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Track the global top tier memory usage stats. They are used as the basis of deciding when to start demoting pages from memory cgroups that have exceeded their soft limit. We start reclaiming top tier memory when the total top tier memory is low. Signed-off-by: Tim Chen --- include/linux/vmstat.h | 18 ++++++++++++++++++ mm/vmstat.c | 20 +++++++++++++++++--- 2 files changed, 35 insertions(+), 3 deletions(-) diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h index e1a4fa9abb3a..a3ad5a937fd8 100644 --- a/include/linux/vmstat.h +++ b/include/linux/vmstat.h @@ -139,6 +139,7 @@ static inline void vm_events_fold_cpu(int cpu) * Zone and node-based page accounting with per cpu differentials. 
*/ extern atomic_long_t vm_zone_stat[NR_VM_ZONE_STAT_ITEMS]; +extern atomic_long_t vm_toptier_zone_stat[NR_VM_ZONE_STAT_ITEMS]; extern atomic_long_t vm_numa_stat[NR_VM_NUMA_STAT_ITEMS]; extern atomic_long_t vm_node_stat[NR_VM_NODE_STAT_ITEMS]; @@ -175,6 +176,8 @@ static inline void zone_page_state_add(long x, struct zone *zone, { atomic_long_add(x, &zone->vm_stat[item]); atomic_long_add(x, &vm_zone_stat[item]); + if (node_state(zone->zone_pgdat->node_id, N_TOPTIER)) + atomic_long_add(x, &vm_toptier_zone_stat[item]); } static inline void node_page_state_add(long x, struct pglist_data *pgdat, @@ -212,6 +215,17 @@ static inline unsigned long global_node_page_state(enum node_stat_item item) return global_node_page_state_pages(item); } +static inline unsigned long global_toptier_zone_page_state(enum zone_stat_item item) +{ + long x = atomic_long_read(&vm_toptier_zone_stat[item]); + +#ifdef CONFIG_SMP + if (x < 0) + x = 0; +#endif + return x; +} + static inline unsigned long zone_page_state(struct zone *zone, enum zone_stat_item item) { @@ -325,6 +339,8 @@ static inline void __inc_zone_state(struct zone *zone, enum zone_stat_item item) { atomic_long_inc(&zone->vm_stat[item]); atomic_long_inc(&vm_zone_stat[item]); + if (node_state(zone->zone_pgdat->node_id, N_TOPTIER)) + atomic_long_inc(&vm_toptier_zone_stat[item]); } static inline void __inc_node_state(struct pglist_data *pgdat, enum node_stat_item item) @@ -337,6 +353,8 @@ static inline void __dec_zone_state(struct zone *zone, enum zone_stat_item item) { atomic_long_dec(&zone->vm_stat[item]); atomic_long_dec(&vm_zone_stat[item]); + if (node_state(zone->zone_pgdat->node_id, N_TOPTIER)) + atomic_long_dec(&vm_toptier_zone_stat[item]); } static inline void __dec_node_state(struct pglist_data *pgdat, enum node_stat_item item) diff --git a/mm/vmstat.c b/mm/vmstat.c index f299d2e89acb..b59efbcaef4e 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -161,9 +161,11 @@ void vm_events_fold_cpu(int cpu) * vm_stat contains the global counters */ atomic_long_t vm_zone_stat[NR_VM_ZONE_STAT_ITEMS] __cacheline_aligned_in_smp; +atomic_long_t vm_toptier_zone_stat[NR_VM_ZONE_STAT_ITEMS] __cacheline_aligned_in_smp; atomic_long_t vm_numa_stat[NR_VM_NUMA_STAT_ITEMS] __cacheline_aligned_in_smp; atomic_long_t vm_node_stat[NR_VM_NODE_STAT_ITEMS] __cacheline_aligned_in_smp; EXPORT_SYMBOL(vm_zone_stat); +EXPORT_SYMBOL(vm_toptier_zone_stat); EXPORT_SYMBOL(vm_numa_stat); EXPORT_SYMBOL(vm_node_stat); @@ -695,7 +697,7 @@ EXPORT_SYMBOL(dec_node_page_state); * Returns the number of counters updated. 
 include/linux/vmstat.h | 18 ++++++++++++++++++
 mm/vmstat.c            | 20 +++++++++++++++++---
 2 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index e1a4fa9abb3a..a3ad5a937fd8 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -139,6 +139,7 @@ static inline void vm_events_fold_cpu(int cpu)
  * Zone and node-based page accounting with per cpu differentials.
  */
 extern atomic_long_t vm_zone_stat[NR_VM_ZONE_STAT_ITEMS];
+extern atomic_long_t vm_toptier_zone_stat[NR_VM_ZONE_STAT_ITEMS];
 extern atomic_long_t vm_numa_stat[NR_VM_NUMA_STAT_ITEMS];
 extern atomic_long_t vm_node_stat[NR_VM_NODE_STAT_ITEMS];
@@ -175,6 +176,8 @@ static inline void zone_page_state_add(long x, struct zone *zone,
 {
     atomic_long_add(x, &zone->vm_stat[item]);
     atomic_long_add(x, &vm_zone_stat[item]);
+    if (node_state(zone->zone_pgdat->node_id, N_TOPTIER))
+        atomic_long_add(x, &vm_toptier_zone_stat[item]);
 }

 static inline void node_page_state_add(long x, struct pglist_data *pgdat,
@@ -212,6 +215,17 @@ static inline unsigned long global_node_page_state(enum node_stat_item item)
     return global_node_page_state_pages(item);
 }

+static inline unsigned long global_toptier_zone_page_state(enum zone_stat_item item)
+{
+    long x = atomic_long_read(&vm_toptier_zone_stat[item]);
+
+#ifdef CONFIG_SMP
+    if (x < 0)
+        x = 0;
+#endif
+    return x;
+}
+
 static inline unsigned long zone_page_state(struct zone *zone,
                     enum zone_stat_item item)
 {
@@ -325,6 +339,8 @@ static inline void __inc_zone_state(struct zone *zone, enum zone_stat_item item)
 {
     atomic_long_inc(&zone->vm_stat[item]);
     atomic_long_inc(&vm_zone_stat[item]);
+    if (node_state(zone->zone_pgdat->node_id, N_TOPTIER))
+        atomic_long_inc(&vm_toptier_zone_stat[item]);
 }

 static inline void __inc_node_state(struct pglist_data *pgdat, enum node_stat_item item)
@@ -337,6 +353,8 @@ static inline void __dec_zone_state(struct zone *zone, enum zone_stat_item item)
 {
     atomic_long_dec(&zone->vm_stat[item]);
     atomic_long_dec(&vm_zone_stat[item]);
+    if (node_state(zone->zone_pgdat->node_id, N_TOPTIER))
+        atomic_long_dec(&vm_toptier_zone_stat[item]);
 }

 static inline void __dec_node_state(struct pglist_data *pgdat, enum node_stat_item item)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index f299d2e89acb..b59efbcaef4e 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -161,9 +161,11 @@ void vm_events_fold_cpu(int cpu)
  * vm_stat contains the global counters
  */
 atomic_long_t vm_zone_stat[NR_VM_ZONE_STAT_ITEMS] __cacheline_aligned_in_smp;
+atomic_long_t vm_toptier_zone_stat[NR_VM_ZONE_STAT_ITEMS] __cacheline_aligned_in_smp;
 atomic_long_t vm_numa_stat[NR_VM_NUMA_STAT_ITEMS] __cacheline_aligned_in_smp;
 atomic_long_t vm_node_stat[NR_VM_NODE_STAT_ITEMS] __cacheline_aligned_in_smp;
 EXPORT_SYMBOL(vm_zone_stat);
+EXPORT_SYMBOL(vm_toptier_zone_stat);
 EXPORT_SYMBOL(vm_numa_stat);
 EXPORT_SYMBOL(vm_node_stat);
@@ -695,7 +697,7 @@ EXPORT_SYMBOL(dec_node_page_state);
  * Returns the number of counters updated.
  */
 #ifdef CONFIG_NUMA
-static int fold_diff(int *zone_diff, int *numa_diff, int *node_diff)
+static int fold_diff(int *zone_diff, int *numa_diff, int *node_diff, int *toptier_diff)
 {
     int i;
     int changes = 0;
@@ -717,6 +719,11 @@ static int fold_diff(int *zone_diff, int *numa_diff, int *node_diff)
             atomic_long_add(node_diff[i], &vm_node_stat[i]);
             changes++;
     }
+
+    for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++)
+        if (toptier_diff[i]) {
+            atomic_long_add(toptier_diff[i], &vm_toptier_zone_stat[i]);
+        }
     return changes;
 }
 #else
@@ -762,6 +769,7 @@ static int refresh_cpu_vm_stats(bool do_pagesets)
     struct zone *zone;
     int i;
     int global_zone_diff[NR_VM_ZONE_STAT_ITEMS] = { 0, };
+    int global_toptier_diff[NR_VM_ZONE_STAT_ITEMS] = { 0, };
 #ifdef CONFIG_NUMA
     int global_numa_diff[NR_VM_NUMA_STAT_ITEMS] = { 0, };
 #endif
@@ -779,6 +787,9 @@ static int refresh_cpu_vm_stats(bool do_pagesets)
                 atomic_long_add(v, &zone->vm_stat[i]);
                 global_zone_diff[i] += v;
+                if (node_state(zone->zone_pgdat->node_id, N_TOPTIER)) {
+                    global_toptier_diff[i] += v;
+                }
 #ifdef CONFIG_NUMA
                 /* 3 seconds idle till flush */
                 __this_cpu_write(p->expire, 3);
@@ -846,7 +857,7 @@ static int refresh_cpu_vm_stats(bool do_pagesets)
 #ifdef CONFIG_NUMA
     changes += fold_diff(global_zone_diff, global_numa_diff,
-                 global_node_diff);
+                 global_node_diff, global_toptier_diff);
 #else
     changes += fold_diff(global_zone_diff, global_node_diff);
 #endif
@@ -868,6 +879,7 @@ void cpu_vm_stats_fold(int cpu)
     int global_numa_diff[NR_VM_NUMA_STAT_ITEMS] = { 0, };
 #endif
     int global_node_diff[NR_VM_NODE_STAT_ITEMS] = { 0, };
+    int global_toptier_diff[NR_VM_NODE_STAT_ITEMS] = { 0, };

     for_each_populated_zone(zone) {
         struct per_cpu_pageset *p;
@@ -910,11 +922,13 @@ void cpu_vm_stats_fold(int cpu)
             p->vm_node_stat_diff[i] = 0;
             atomic_long_add(v, &pgdat->vm_stat[i]);
             global_node_diff[i] += v;
+            if (node_state(pgdat->node_id, N_TOPTIER))
+                global_toptier_diff[i] += v;
         }
     }

 #ifdef CONFIG_NUMA
-    fold_diff(global_zone_diff, global_numa_diff, global_node_diff);
+    fold_diff(global_zone_diff, global_numa_diff, global_node_diff, global_toptier_diff);
 #else
     fold_diff(global_zone_diff, global_node_diff);
 #endif

From patchwork Mon Apr 5 17:08:32 2021
X-Patchwork-Id: 12183465
From: Tim Chen
To: Michal Hocko
Cc: Tim Chen, Johannes Weiner, Andrew Morton, Dave Hansen, Ying Huang,
 Dan Williams, David Rientjes, Shakeel Butt, linux-mm@kvack.org,
 cgroups@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH v1 08/11] mm: Add toptier option for mem_cgroup_soft_limit_reclaim()
Date: Mon, 5 Apr 2021 10:08:32 -0700

Add a toptier reclaim type to mem_cgroup_soft_limit_reclaim(). This
option reclaims top tier memory from cgroups in order of their excess
usage of top tier memory.

Signed-off-by: Tim Chen
---
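The reclaim entry point is now parameterized by a node state: N_MEMORY
keeps the historical soft limit behavior over all memory, while
N_TOPTIER walks the tree of cgroups ordered by their excess top tier
usage and keeps the sibling tree in sync. Roughly, the two call
patterns look like this (a sketch of caller code, not part of the
patch; the surrounding variables are assumed to exist at the call
site):

    unsigned long scanned = 0, reclaimed;

    /* Classic soft limit reclaim, as direct reclaim and kswapd do today */
    reclaimed = mem_cgroup_soft_limit_reclaim(pgdat, order, gfp_mask,
                                              &scanned, N_MEMORY);

    /* Top tier demotion (used by kswapd in patch 9 of this series) */
    reclaimed = mem_cgroup_soft_limit_reclaim(pgdat, 0, GFP_KERNEL,
                                              &scanned, N_TOPTIER);

Note that an N_TOPTIER request on a node that is not itself top tier
returns 0 immediately; there is nothing to demote from such a node.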
 include/linux/memcontrol.h |  9 ++++---
 mm/memcontrol.c            | 48 ++++++++++++++++++++++++--------------
 mm/vmscan.c                |  4 ++--
 3 files changed, 39 insertions(+), 22 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 0ed8ddfd5436..c494c4b11ba2 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -21,6 +21,7 @@
 #include
 #include
 #include
+#include

 struct mem_cgroup;
 struct obj_cgroup;
@@ -1003,7 +1004,8 @@ static inline void mod_memcg_lruvec_state(struct lruvec *lruvec,

 unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
                         gfp_t gfp_mask,
-                        unsigned long *total_scanned);
+                        unsigned long *total_scanned,
+                        enum node_states type);

 void __count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx,
               unsigned long count);
@@ -1421,8 +1423,9 @@ static inline void mod_lruvec_kmem_state(void *p, enum node_stat_item idx,

 static inline
 unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
-                        gfp_t gfp_mask,
-                        unsigned long *total_scanned)
+                        gfp_t gfp_mask,
+                        unsigned long *total_scanned,
+                        enum node_states type)
 {
     return 0;
 }

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8a7648b79635..9f75475ae833 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1875,7 +1875,8 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
 static int mem_cgroup_soft_reclaim(struct mem_cgroup *root_memcg,
                    pg_data_t *pgdat,
                    gfp_t gfp_mask,
-                   unsigned long *total_scanned)
+                   unsigned long *total_scanned,
+                   enum node_states type)
 {
     struct mem_cgroup *victim = NULL;
     int total = 0;
@@ -1886,7 +1887,7 @@ static int mem_cgroup_soft_reclaim(struct mem_cgroup *root_memcg,
         .pgdat = pgdat,
     };

-    excess = soft_limit_excess(root_memcg, N_MEMORY);
+    excess = soft_limit_excess(root_memcg, type);

     while (1) {
         victim = mem_cgroup_iter(root_memcg, victim, &reclaim);
@@ -1915,7 +1916,7 @@ static int mem_cgroup_soft_reclaim(struct mem_cgroup *root_memcg,
         total += mem_cgroup_shrink_node(victim, gfp_mask, false,
                         pgdat, &nr_scanned);
         *total_scanned += nr_scanned;
-        if (!soft_limit_excess(root_memcg, N_MEMORY))
+        if (!soft_limit_excess(root_memcg, type))
             break;
     }
     mem_cgroup_iter_break(root_memcg, victim);
@@ -3524,7 +3525,8 @@ static int mem_cgroup_resize_max(struct mem_cgroup *memcg,

 unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
                         gfp_t gfp_mask,
-                        unsigned long *total_scanned)
+                        unsigned long *total_scanned,
+                        enum node_states type)
 {
     unsigned long nr_reclaimed = 0;
     struct mem_cgroup_per_node *mz, *next_mz = NULL;
@@ -3534,12 +3536,24 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
     unsigned long excess;
     unsigned long nr_scanned;
     int migration_nid;
+    enum node_states sibling_type;

     if (order > 0)
         return 0;

-    mctz = soft_limit_tree_node(pgdat->node_id, N_MEMORY);
-    mctz_sibling = soft_limit_tree_node(pgdat->node_id, N_TOPTIER);
+    if (type != N_MEMORY && type != N_TOPTIER)
+        return 0;
+
+    if (type == N_TOPTIER && !node_state(pgdat->node_id, N_TOPTIER))
+        return 0;
+
+    if (type == N_TOPTIER)
+        sibling_type = N_MEMORY;
+    else
+        sibling_type = N_TOPTIER;
+
+    mctz = soft_limit_tree_node(pgdat->node_id, type);
+    mctz_sibling = soft_limit_tree_node(pgdat->node_id, sibling_type);

     /*
      * Do not even bother to check the largest node if the root
@@ -3558,11 +3572,11 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
     if (migration_nid != -1) {
         struct mem_cgroup_tree_per_node *mmctz;

-        mmctz = soft_limit_tree_node(migration_nid);
+        mmctz = soft_limit_tree_node(migration_nid, type);
         if (mmctz && !RB_EMPTY_ROOT(&mmctz->rb_root)) {
             pgdat = NODE_DATA(migration_nid);
             return mem_cgroup_soft_limit_reclaim(pgdat, order,
-                        gfp_mask, total_scanned);
+                        gfp_mask, total_scanned, type);
         }
     }

@@ -3575,17 +3589,17 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
         if (next_mz)
             mz = next_mz;
         else
-            mz = mem_cgroup_largest_soft_limit_node(mctz, N_MEMORY);
+            mz = mem_cgroup_largest_soft_limit_node(mctz, type);
         if (!mz)
             break;

         nr_scanned = 0;
         reclaimed = mem_cgroup_soft_reclaim(mz->memcg, pgdat,
-                            gfp_mask, &nr_scanned);
+                            gfp_mask, &nr_scanned, type);
         nr_reclaimed += reclaimed;
         *total_scanned += nr_scanned;
         spin_lock_irq(&mctz->lock);
-        __mem_cgroup_remove_exceeded(mz, mctz, N_MEMORY);
+        __mem_cgroup_remove_exceeded(mz, mctz, type);

         /*
          * If we failed to reclaim anything from this memory cgroup
@@ -3594,9 +3608,9 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
         next_mz = NULL;
         if (!reclaimed)
             next_mz =
-                __mem_cgroup_largest_soft_limit_node(mctz, N_MEMORY);
+                __mem_cgroup_largest_soft_limit_node(mctz, type);

-        excess = soft_limit_excess(mz->memcg, N_MEMORY);
+        excess = soft_limit_excess(mz->memcg, type);
         /*
          * One school of thought says that we should not add
          * back the node to the tree if reclaim returns 0.
@@ -3606,17 +3620,17 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
          * term TODO.
          */
         /* If excess == 0, no tree ops */
-        __mem_cgroup_insert_exceeded(mz, mctz, excess, N_MEMORY);
+        __mem_cgroup_insert_exceeded(mz, mctz, excess, type);
         spin_unlock_irq(&mctz->lock);

         /* update both affected N_MEMORY and N_TOPTIER trees */
         if (mctz_sibling) {
             spin_lock_irq(&mctz_sibling->lock);
             __mem_cgroup_remove_exceeded(mz, mctz_sibling,
-                             N_TOPTIER);
-            excess = soft_limit_excess(mz->memcg, N_TOPTIER);
+                             sibling_type);
+            excess = soft_limit_excess(mz->memcg, sibling_type);
             __mem_cgroup_insert_exceeded(mz, mctz, excess,
-                             N_TOPTIER);
+                             sibling_type);
             spin_unlock_irq(&mctz_sibling->lock);
         }

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 3b200b7170a9..11bb0c6fa524 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3134,7 +3134,7 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
             nr_soft_scanned = 0;
             nr_soft_reclaimed = mem_cgroup_soft_limit_reclaim(zone->zone_pgdat,
                         sc->order, sc->gfp_mask,
-                        &nr_soft_scanned);
+                        &nr_soft_scanned, N_MEMORY);
             sc->nr_reclaimed += nr_soft_reclaimed;
             sc->nr_scanned += nr_soft_scanned;
             /* need some check for avoid more shrink_zone() */
@@ -3849,7 +3849,7 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int highest_zoneidx)
         sc.nr_scanned = 0;
         nr_soft_scanned = 0;
         nr_soft_reclaimed = mem_cgroup_soft_limit_reclaim(pgdat, sc.order,
-                        sc.gfp_mask, &nr_soft_scanned);
+                        sc.gfp_mask, &nr_soft_scanned, N_MEMORY);
         sc.nr_reclaimed += nr_soft_reclaimed;
         /*

From patchwork Mon Apr 5 17:08:33 2021
X-Patchwork-Id: 12183467
From: Tim Chen
To: Michal Hocko
Cc: Tim Chen, Johannes Weiner, Andrew Morton, Dave Hansen, Ying Huang,
 Dan Williams, David Rientjes, Shakeel Butt, linux-mm@kvack.org,
 cgroups@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH v1 09/11] mm: Use kswapd to demote pages when toptier memory is tight
Date: Mon, 5 Apr 2021 10:08:33 -0700
Message-Id: <83c06bf70e38360358c84daab399f18f57e7eba4.1617642417.git.tim.c.chen@linux.intel.com>

Demote pages from memory cgroups with excess top tier memory usage
when top tier memory is tight. When free top tier memory falls below
the fraction "toptier_scale_factor/10000" of the overall top tier
memory in a node, kswapd reclaims top tier memory from those mem
cgroups that have exceeded their top tier memory soft limit, by
demoting their top tier pages to a lower memory tier.

Signed-off-by: Tim Chen
---
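The threshold is realized as a new per-zone watermark, WMARK_TOPTIER,
derived from toptier_scale_factor and clamped so that it is never
below twice the high watermark and never above the zone's managed
pages. The computation done in __setup_per_zone_wmarks() boils down
to the following (restated as a standalone helper for clarity; the
helper itself is illustrative, not part of the patch):

    /* Sketch: the WMARK_TOPTIER value this patch computes per zone. */
    static unsigned long toptier_wmark(unsigned long managed_pages,
                                       unsigned long wmark_high,
                                       unsigned int scale_factor)
    {
        /* scale_factor/10000 of the zone's managed pages... */
        unsigned long mark = mult_frac(managed_pages, scale_factor, 10000);

        /* ...raised to at least 2 * high, then capped at managed */
        mark = max(mark, 2 * wmark_high);
        mark = min(mark, managed_pages);
        return mark;
    }

With the default scale factor of 2000, kswapd begins demoting once
free pages in a top tier node's ZONE_NORMAL drop below roughly 20% of
its managed pages (or twice the high watermark, whichever is larger).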
 Documentation/admin-guide/sysctl/vm.rst | 12 +++++
 include/linux/mmzone.h                  |  2 +
 mm/page_alloc.c                         | 14 +++++
 mm/vmscan.c                             | 69 ++++++++++++++++++++++++-
 4 files changed, 96 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 9de3847c3469..6b49e2e90953 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -74,6 +74,7 @@ Currently, these files are in /proc/sys/vm:
 - vfs_cache_pressure
 - watermark_boost_factor
 - watermark_scale_factor
+- toptier_scale_factor
 - zone_reclaim_mode

@@ -962,6 +963,17 @@ too small for the allocation bursts occurring in the system. This knob
 can then be used to tune kswapd aggressiveness accordingly.

+toptier_scale_factor
+====================
+
+This factor controls when kswapd wakes up to demote pages of those
+cgroups that have exceeded their memory soft limit.
+
+The unit is in fractions of 10,000. The default value of 2000 means
+that if less than 20% of the top tier memory in a node/system is
+free, kswapd will start to demote pages of those memory cgroups that
+have exceeded their memory soft limit.
+
 zone_reclaim_mode
 =================

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index bbe649c4fdee..4ee0073d255f 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -332,12 +332,14 @@ enum zone_watermarks {
     WMARK_MIN,
     WMARK_LOW,
     WMARK_HIGH,
+    WMARK_TOPTIER,
     NR_WMARK
 };

 #define min_wmark_pages(z) (z->_watermark[WMARK_MIN] + z->watermark_boost)
 #define low_wmark_pages(z) (z->_watermark[WMARK_LOW] + z->watermark_boost)
 #define high_wmark_pages(z) (z->_watermark[WMARK_HIGH] + z->watermark_boost)
+#define toptier_wmark_pages(z) (z->_watermark[WMARK_TOPTIER] + z->watermark_boost)
 #define wmark_pages(z, i) (z->_watermark[i] + z->watermark_boost)

 struct per_cpu_pages {

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 471a2c342c4f..20f3caee60f3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7964,6 +7964,20 @@ static void __setup_per_zone_wmarks(void)
         zone->_watermark[WMARK_LOW]  = min_wmark_pages(zone) + tmp;
         zone->_watermark[WMARK_HIGH] = min_wmark_pages(zone) + tmp * 2;

+        tmp = mult_frac(zone_managed_pages(zone),
+                toptier_scale_factor, 10000);
+        /*
+         * Clamp toptier watermark between twice high watermark
+         * and max managed pages.
+         */
+        if (tmp < 2 * zone->_watermark[WMARK_HIGH])
+            tmp = 2 * zone->_watermark[WMARK_HIGH];
+        if (tmp > zone_managed_pages(zone))
+            tmp = zone_managed_pages(zone);
+        zone->_watermark[WMARK_TOPTIER] = tmp;
+
         zone->watermark_boost = 0;

         spin_unlock_irqrestore(&zone->lock, flags);

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 11bb0c6fa524..270880c8baef 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -185,6 +185,7 @@ static void set_task_reclaim_state(struct task_struct *task,
 static LIST_HEAD(shrinker_list);
 static DECLARE_RWSEM(shrinker_rwsem);

+int toptier_scale_factor = 2000;
 #ifdef CONFIG_MEMCG
 /*
@@ -3624,6 +3625,34 @@ static bool pgdat_balanced(pg_data_t *pgdat, int order, int highest_zoneidx)
     return false;
 }

+static bool pgdat_toptier_balanced(pg_data_t *pgdat, int order, int classzone_idx)
+{
+    int i;
+    unsigned long mark;
+    struct zone *zone;
+
+    zone = pgdat->node_zones + ZONE_NORMAL;
+
+    if (!node_state(pgdat->node_id, N_TOPTIER) ||
+        next_demotion_node(pgdat->node_id) == -1 ||
+        order > 0 || classzone_idx < ZONE_NORMAL) {
+        return true;
+    }
+
+    zone = pgdat->node_zones + ZONE_NORMAL;
+
+    if (!managed_zone(zone))
+        return true;
+
+    mark = min(toptier_wmark_pages(zone),
+           zone_managed_pages(zone));
+
+    if (zone_page_state(zone, NR_FREE_PAGES) < mark)
+        return false;
+
+    return true;
+}
+
 /* Clear pgdat state for congested, dirty or under writeback. */
 static void clear_pgdat_congested(pg_data_t *pgdat)
 {
@@ -4049,6 +4078,39 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_o
     finish_wait(&pgdat->kswapd_wait, &wait);
 }

+static bool toptier_soft_reclaim(pg_data_t *pgdat,
+                 unsigned int reclaim_order,
+                 unsigned int classzone_idx)
+{
+    unsigned long nr_soft_scanned, nr_soft_reclaimed;
+    int ret;
+    struct scan_control sc = {
+        .gfp_mask = GFP_KERNEL,
+        .order = reclaim_order,
+        .may_unmap = 1,
+    };
+
+    if (!node_state(pgdat->node_id, N_TOPTIER) || kthread_should_stop())
+        return false;
+
+    set_task_reclaim_state(current, &sc.reclaim_state);
+
+    if (!pgdat_toptier_balanced(pgdat, 0, classzone_idx)) {
+        nr_soft_scanned = 0;
+        nr_soft_reclaimed = mem_cgroup_soft_limit_reclaim(pgdat,
+                    0, GFP_KERNEL,
+                    &nr_soft_scanned, N_TOPTIER);
+    }
+
+    set_task_reclaim_state(current, NULL);
+
+    if (prepare_kswapd_sleep(pgdat, reclaim_order, classzone_idx) &&
+        !kthread_should_stop())
+        return true;
+    else
+        return false;
+}
+
 /*
  * The background pageout daemon, started as a kernel thread
  * from the init process.
@@ -4108,6 +4170,10 @@ static int kswapd(void *p)
         WRITE_ONCE(pgdat->kswapd_order, 0);
         WRITE_ONCE(pgdat->kswapd_highest_zoneidx, MAX_NR_ZONES);

+        if (toptier_soft_reclaim(pgdat, 0,
+                     highest_zoneidx))
+            goto kswapd_try_sleep;
+
         ret = try_to_freeze();
         if (kthread_should_stop())
             break;
@@ -4173,7 +4239,8 @@ void wakeup_kswapd(struct zone *zone, gfp_t gfp_flags, int order,

     /* Hopeless node, leave it to direct reclaim if possible */
     if (pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES ||
-        (pgdat_balanced(pgdat, order, highest_zoneidx) &&
+        (pgdat_toptier_balanced(pgdat, 0, highest_zoneidx) &&
+         pgdat_balanced(pgdat, order, highest_zoneidx) &&
          !pgdat_watermark_boosted(pgdat, highest_zoneidx))) {
         /*
          * There may be plenty of free memory available, but it's too

From patchwork Mon Apr 5 17:08:34 2021
X-Patchwork-Id: 12183469
From: Tim Chen
To: Michal Hocko
Cc: Tim Chen, Johannes Weiner, Andrew Morton, Dave Hansen, Ying Huang,
 Dan Williams, David Rientjes, Shakeel Butt, linux-mm@kvack.org,
 cgroups@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH v1 10/11] mm: Set toptier_scale_factor via sysctl
Date: Mon, 5 Apr 2021 10:08:34 -0700

Update the toptier_scale_factor via sysctl. This variable determines
when kswapd wakes up to reclaim toptier memory from those mem cgroups
exceeding their toptier memory soft limit.

Signed-off-by: Tim Chen
---
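The knob behaves like the existing watermark_scale_factor: writes are
validated by proc_dointvec_minmax() against the [1, 10000] bounds and
then immediately recompute WMARK_TOPTIER for every zone through
setup_per_zone_wmarks(). For example, assuming this series is
applied, an administrator could raise the demotion threshold to 30%
with "echo 3000 > /proc/sys/vm/toptier_scale_factor"; the resulting
per-zone mark shows up in the new "toptier" line of /proc/zoneinfo
added below.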
 include/linux/mm.h     |  4 ++++
 include/linux/mmzone.h |  2 ++
 kernel/sysctl.c        | 10 ++++++++++
 mm/page_alloc.c        | 15 +++++++++++++++
 mm/vmstat.c            |  2 ++
 5 files changed, 33 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index a43429d51fc0..af39e221d0f9 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3179,6 +3179,10 @@ static inline bool debug_guardpage_enabled(void) { return false; }
 static inline bool page_is_guard(struct page *page) { return false; }
 #endif /* CONFIG_DEBUG_PAGEALLOC */

+#ifdef CONFIG_MIGRATION
+extern int toptier_scale_factor;
+#endif
+
 #if MAX_NUMNODES > 1
 void __init setup_nr_node_ids(void);
 #else

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 4ee0073d255f..789319dffe1c 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1003,6 +1003,8 @@
 int min_free_kbytes_sysctl_handler(struct ctl_table *, int, void *, size_t *,
         loff_t *);
 int watermark_scale_factor_sysctl_handler(struct ctl_table *, int, void *,
         size_t *, loff_t *);
+int toptier_scale_factor_sysctl_handler(struct ctl_table *, int,
+        void __user *, size_t *, loff_t *);
 extern int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES];
 int lowmem_reserve_ratio_sysctl_handler(struct ctl_table *, int, void *,
         size_t *, loff_t *);

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 57f89fe1b0f2..e97c974f37b7 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -112,6 +112,7 @@ static int sixty = 60;
 #endif

 static int __maybe_unused neg_one = -1;
+static int __maybe_unused one = 1;
 static int __maybe_unused two = 2;
 static int __maybe_unused three = 3;
 static int __maybe_unused four = 4;
@@ -2956,6 +2957,15 @@ static struct ctl_table vm_table[] = {
         .extra1        = SYSCTL_ONE,
         .extra2        = &one_thousand,
     },
+    {
+        .procname    = "toptier_scale_factor",
+        .data        = &toptier_scale_factor,
+        .maxlen        = sizeof(toptier_scale_factor),
+        .mode        = 0644,
+        .proc_handler    = toptier_scale_factor_sysctl_handler,
+        .extra1        = &one,
+        .extra2        = &ten_thousand,
+    },
     {
         .procname    = "percpu_pagelist_fraction",
         .data        = &percpu_pagelist_fraction,

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 20f3caee60f3..91212a837d8e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8094,6 +8094,21 @@ int watermark_scale_factor_sysctl_handler(struct ctl_table *table, int write,
     return 0;
 }

+int toptier_scale_factor_sysctl_handler(struct ctl_table *table, int write,
+        void __user *buffer, size_t *length, loff_t *ppos)
+{
+    int rc;
+
+    rc = proc_dointvec_minmax(table, write, buffer, length, ppos);
+    if (rc)
+        return rc;
+
+    if (write)
+        setup_per_zone_wmarks();
+
+    return 0;
+}
+
 #ifdef CONFIG_NUMA
 static void setup_min_unmapped_ratio(void)
 {

diff --git a/mm/vmstat.c b/mm/vmstat.c
index b59efbcaef4e..c581753cf076 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1658,6 +1658,7 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
            "\n        min      %lu"
            "\n        low      %lu"
            "\n        high     %lu"
+           "\n        toptier  %lu"
            "\n        spanned  %lu"
            "\n        present  %lu"
            "\n        managed  %lu",
@@ -1665,6 +1666,7 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
            min_wmark_pages(zone),
            low_wmark_pages(zone),
            high_wmark_pages(zone),
+           toptier_wmark_pages(zone),
            zone->spanned_pages,
            zone->present_pages,
            zone_managed_pages(zone));

From patchwork Mon Apr 5 17:08:35 2021
X-Patchwork-Id: 12183471
From: Tim Chen
To: Michal Hocko
Cc: Tim Chen, Johannes Weiner, Andrew Morton, Dave Hansen, Ying Huang,
 Dan Williams, David Rientjes, Shakeel Butt, linux-mm@kvack.org,
 cgroups@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH v1 11/11] mm: Wake up kswapd if toptier memory needs soft reclaim
Date: Mon, 5 Apr 2021 10:08:35 -0700
Message-Id: <9d6b4eff73aef26fff3ecfbfc0693b84309d8b4c.1617642418.git.tim.c.chen@linux.intel.com>

Detect during page allocation whether free toptier memory is low. If
so, wake up kswapd to reclaim memory from those mem cgroups that have
exceeded their soft limit.

Signed-off-by: Tim Chen
---
 include/linux/mmzone.h | 3 +++
 mm/page_alloc.c        | 2 ++
 mm/vmscan.c            | 2 +-
 3 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 789319dffe1c..3603948e95cc 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -886,6 +886,8 @@ bool zone_watermark_ok(struct zone *z, unsigned int order,
         unsigned int alloc_flags);
 bool zone_watermark_ok_safe(struct zone *z, unsigned int order,
         unsigned long mark, int highest_zoneidx);
+bool pgdat_toptier_balanced(pg_data_t *pgdat, int order, int classzone_idx);
+
 /*
  * Memory initialization context, use to differentiate memory added by
  * the platform statically or via memory hotplug interface.
@@ -1466,5 +1468,6 @@ void sparse_init(void);
 #endif

 #endif /* !__GENERATING_BOUNDS.H */
+
 #endif /* !__ASSEMBLY__ */
 #endif /* _LINUX_MMZONE_H */

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 91212a837d8e..ca8aa789a967 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3519,6 +3519,8 @@ struct page *rmqueue(struct zone *preferred_zone,
     if (test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags)) {
         clear_bit(ZONE_BOOSTED_WATERMARK, &zone->flags);
         wakeup_kswapd(zone, 0, 0, zone_idx(zone));
+    } else if (!pgdat_toptier_balanced(zone->zone_pgdat, order, zone_idx(zone))) {
+        wakeup_kswapd(zone, 0, 0, zone_idx(zone));
     }

     VM_BUG_ON_PAGE(page && bad_range(zone, page), page);

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 270880c8baef..8fe709e3f5e4 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3625,7 +3625,7 @@ static bool pgdat_balanced(pg_data_t *pgdat, int order, int highest_zoneidx)
     return false;
 }

-static bool pgdat_toptier_balanced(pg_data_t *pgdat, int order, int classzone_idx)
+bool pgdat_toptier_balanced(pg_data_t *pgdat, int order, int classzone_idx)
 {
     int i;
     unsigned long mark;