From patchwork Tue Dec 10 21:37:44 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Gregory Price <gourry@gourry.net>
X-Patchwork-Id: 13902106
From: Gregory Price <gourry@gourry.net>
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, nehagholkar@meta.com, abhishekd@meta.com,
	kernel-team@meta.com, david@redhat.com, nphamcs@gmail.com,
	gourry@gourry.net, akpm@linux-foundation.org, hannes@cmpxchg.org,
	kbusch@meta.com, ying.huang@linux.alibaba.com
Subject: [RFC v2 PATCH 5/5] migrate,sysfs: add pagecache promotion
Date: Tue, 10 Dec 2024 16:37:44 -0500
Message-ID: <20241210213744.2968-6-gourry@gourry.net>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20241210213744.2968-1-gourry@gourry.net>
References: <20241210213744.2968-1-gourry@gourry.net>
MIME-Version: 1.0
Add /sys/kernel/mm/numa/pagecache_promotion_enabled

When page cache lands on lower tiers, there is no way for promotion to
occur unless it becomes memory-mapped and exposed to NUMA hint faults.
Simply promoting pages unconditionally, however, opens up a significant
possibility of performance regressions.

Similar to the `demotion_enabled` sysfs entry, provide a sysfs toggle
to enable and disable page cache promotion. This option enables
opportunistic promotion of unmapped page cache during syscall access.

This option is intended for operational conditions where demoted page
cache will eventually contain memory which becomes hot - and where said
memory is likely to cause performance issues by being trapped on the
lower tier of memory.

A page cache folio is considered a promotion candidate when:

0) tiering and pagecache promotion are enabled
1) the folio resides on a node not in the top tier
2) the folio is already marked referenced and active
3) multiple accesses occur in the (referenced & active) state in
   quick succession

Since promotion is not safe to execute unconditionally from within
folio_mark_accessed, we defer promotion to a new task_work captured in
the task_struct. This ensures that the task doing the access has some
hand in promoting pages - even among deduplicated read-only files.

We use numa_hint_fault_latency to help identify when a folio is
accessed multiple times in a short period. Along with folio flag
checks, this helps us minimize promoting pages on the first few
accesses. The promotion node is always the local node of the
promoting cpu.
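For context, exercising this path requires both NUMA balancing in
memory-tiering mode and the new toggle. A minimal userspace sketch
(root required; mode 2 is NUMA_BALANCING_MEMORY_TIERING, and the sysfs
path is the one added by this patch):

#include <stdio.h>

/* Write a string to a procfs/sysfs file; returns 0 on success. */
static int write_str(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	fputs(val, f);
	return fclose(f);
}

int main(void)
{
	/* 2 == NUMA_BALANCING_MEMORY_TIERING */
	if (write_str("/proc/sys/kernel/numa_balancing", "2"))
		return 1;
	/* the knob added below; kstrtobool accepts "true"/"1"/"y" */
	return write_str("/sys/kernel/mm/numa/pagecache_promotion_enabled",
			 "true") ? 1 : 0;
}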
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Gregory Price <gourry@gourry.net>
---
 .../ABI/testing/sysfs-kernel-mm-numa | 20 +++++++
 include/linux/memory-tiers.h         |  2 +
 include/linux/migrate.h              |  2 +
 include/linux/sched.h                |  3 +
 include/linux/sched/numa_balancing.h |  5 ++
 init/init_task.c                     |  1 +
 kernel/sched/fair.c                  | 26 +++++++-
 mm/memory-tiers.c                    | 27 +++++++++
 mm/migrate.c                         | 59 +++++++++++++++++++
 mm/swap.c                            |  3 +
 10 files changed, 147 insertions(+), 1 deletion(-)

diff --git a/Documentation/ABI/testing/sysfs-kernel-mm-numa b/Documentation/ABI/testing/sysfs-kernel-mm-numa
index 77e559d4ed80..b846e7d80cba 100644
--- a/Documentation/ABI/testing/sysfs-kernel-mm-numa
+++ b/Documentation/ABI/testing/sysfs-kernel-mm-numa
@@ -22,3 +22,23 @@ Description: Enable/disable demoting pages during reclaim
 		the guarantees of cpusets.  This should not be enabled
 		on systems which need strict cpuset location
 		guarantees.
+
+What:		/sys/kernel/mm/numa/pagecache_promotion_enabled
+Date:		November 2024
+Contact:	Linux memory management mailing list <linux-mm@kvack.org>
+Description:	Enable/disable promoting pages during file access
+
+		Page migration during file access is intended for systems
+		with tiered memory configurations that have significant
+		unmapped file cache usage. By default, file cache memory
+		on slower tiers will not be opportunistically promoted by
+		normal NUMA hint faults, because the system has no way to
+		track them. This option enables opportunistic promotion
+		of pages that are accessed via syscall (e.g. read/write)
+		if multiple accesses occur in quick succession.
+
+		It may move data to a NUMA node that does not fall into
+		the cpuset of the allocating process which might be
+		construed to violate the guarantees of cpusets. This
+		should not be enabled on systems which need strict cpuset
+		location guarantees.
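As a hedged usage sketch of the knob documented above (the show()
handler added later in this patch emits "true" or "false"):

#include <stdio.h>

int main(void)
{
	char state[16] = "";
	FILE *f = fopen("/sys/kernel/mm/numa/pagecache_promotion_enabled", "r");

	if (!f)
		return 1;
	/* prints "true" or "false" */
	if (fgets(state, sizeof(state), f))
		printf("pagecache promotion: %s", state);
	return fclose(f) ? 1 : 0;
}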
diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
index 0dc0cf2863e2..fa96a67b8996 100644
--- a/include/linux/memory-tiers.h
+++ b/include/linux/memory-tiers.h
@@ -37,6 +37,7 @@ struct access_coordinate;
 #ifdef CONFIG_NUMA
 extern bool numa_demotion_enabled;
+extern bool numa_pagecache_promotion_enabled;
 extern struct memory_dev_type *default_dram_type;
 extern nodemask_t default_dram_nodes;
 struct memory_dev_type *alloc_memory_type(int adistance);
@@ -76,6 +77,7 @@ static inline bool node_is_toptier(int node)
 #else
 #define numa_demotion_enabled			false
+#define numa_pagecache_promotion_enabled	false
 #define default_dram_type	NULL
 #define default_dram_nodes	NODE_MASK_NONE
 /*
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 29919faea2f1..cf58a97d4216 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -145,6 +145,7 @@ const struct movable_operations *page_movable_ops(struct page *page)
 int migrate_misplaced_folio_prepare(struct folio *folio,
 		struct vm_area_struct *vma, int node);
 int migrate_misplaced_folio(struct folio *folio, int node);
+void promotion_candidate(struct folio *folio);
 #else
 static inline int migrate_misplaced_folio_prepare(struct folio *folio,
 		struct vm_area_struct *vma, int node)
@@ -155,6 +156,7 @@ static inline int migrate_misplaced_folio(struct folio *folio, int node)
 {
 	return -EAGAIN; /* can't migrate now */
 }
+static inline void promotion_candidate(struct folio *folio) { }
 #endif /* CONFIG_NUMA_BALANCING */

 #ifdef CONFIG_MIGRATION
diff --git a/include/linux/sched.h b/include/linux/sched.h
index d380bffee2ef..faa84fb7a756 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1356,6 +1356,9 @@ struct task_struct {
 	unsigned long numa_faults_locality[3];
 	unsigned long numa_pages_migrated;
+
+	struct callback_head numa_promo_work;
+	struct list_head promo_list;
 #endif /* CONFIG_NUMA_BALANCING */

 #ifdef CONFIG_RSEQ
diff --git a/include/linux/sched/numa_balancing.h b/include/linux/sched/numa_balancing.h
index 52b22c5c396d..cc7750d754ff 100644
--- a/include/linux/sched/numa_balancing.h
+++ b/include/linux/sched/numa_balancing.h
@@ -32,6 +32,7 @@ extern void set_numabalancing_state(bool enabled);
 extern void task_numa_free(struct task_struct *p, bool final);
 bool should_numa_migrate_memory(struct task_struct *p, struct folio *folio,
 				int src_nid, int dst_cpu);
+int numa_hint_fault_latency(struct folio *folio);
 #else
 static inline void task_numa_fault(int last_node, int node, int pages,
 				   int flags)
@@ -52,6 +53,10 @@ static inline bool should_numa_migrate_memory(struct task_struct *p,
 {
 	return true;
 }
+static inline int numa_hint_fault_latency(struct folio *folio)
+{
+	return 0;
+}
 #endif

 #endif /* _LINUX_SCHED_NUMA_BALANCING_H */
diff --git a/init/init_task.c b/init/init_task.c
index e557f622bd90..f831980748c4 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -187,6 +187,7 @@ struct task_struct init_task __aligned(L1_CACHE_BYTES) = {
 	.numa_preferred_nid = NUMA_NO_NODE,
 	.numa_group	= NULL,
 	.numa_faults	= NULL,
+	.promo_list	= LIST_HEAD_INIT(init_task.promo_list),
 #endif
 #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
 	.kasan_depth	= 1,
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a59ae2e23daf..047f02091773 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -42,6 +42,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -1842,7 +1843,7 @@ static bool pgdat_free_space_enough(struct pglist_data *pgdat)
  * The smaller the hint page fault latency, the higher the possibility
  * for the page to be hot.
  */
-static int numa_hint_fault_latency(struct folio *folio)
+int numa_hint_fault_latency(struct folio *folio)
 {
 	int last_time, time;

@@ -3534,6 +3535,27 @@ static void task_numa_work(struct callback_head *work)
 	}
 }

+static void task_numa_promotion_work(struct callback_head *work)
+{
+	struct task_struct *p = current;
+	struct list_head *promo_list = &p->promo_list;
+	struct folio *folio, *tmp;
+	int nid = numa_node_id();
+
+	SCHED_WARN_ON(p != container_of(work, struct task_struct, numa_promo_work));
+
+	work->next = work;
+
+	if (list_empty(promo_list))
+		return;
+
+	list_for_each_entry_safe(folio, tmp, promo_list, lru) {
+		list_del_init(&folio->lru);
+		migrate_misplaced_folio(folio, nid);
+	}
+}
+
+
 void init_numa_balancing(unsigned long clone_flags, struct task_struct *p)
 {
 	int mm_users = 0;
@@ -3558,8 +3580,10 @@ void init_numa_balancing(unsigned long clone_flags, struct task_struct *p)
 	RCU_INIT_POINTER(p->numa_group, NULL);
 	p->last_task_numa_placement = 0;
 	p->last_sum_exec_runtime = 0;
+	INIT_LIST_HEAD(&p->promo_list);

 	init_task_work(&p->numa_work, task_numa_work);
+	init_task_work(&p->numa_promo_work, task_numa_promotion_work);

 	/* New address space, reset the preferred nid */
 	if (!(clone_flags & CLONE_VM)) {
diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index fc14fe53e9b7..4c44598e485e 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -935,6 +935,7 @@ static int __init memory_tier_init(void)
 subsys_initcall(memory_tier_init);

 bool numa_demotion_enabled = false;
+bool numa_pagecache_promotion_enabled;

 #ifdef CONFIG_MIGRATION
 #ifdef CONFIG_SYSFS
@@ -957,11 +958,37 @@ static ssize_t demotion_enabled_store(struct kobject *kobj,
 	return count;
 }

+static ssize_t pagecache_promotion_enabled_show(struct kobject *kobj,
+						struct kobj_attribute *attr,
+						char *buf)
+{
+	return sysfs_emit(buf, "%s\n",
+			  numa_pagecache_promotion_enabled ? "true" : "false");
+}
+
+static ssize_t pagecache_promotion_enabled_store(struct kobject *kobj,
+						 struct kobj_attribute *attr,
+						 const char *buf, size_t count)
+{
+	ssize_t ret;
+
+	ret = kstrtobool(buf, &numa_pagecache_promotion_enabled);
+	if (ret)
+		return ret;
+
+	return count;
+}
+
+
 static struct kobj_attribute numa_demotion_enabled_attr =
 	__ATTR_RW(demotion_enabled);
+static struct kobj_attribute numa_pagecache_promotion_enabled_attr =
+	__ATTR_RW(pagecache_promotion_enabled);
+
 static struct attribute *numa_attrs[] = {
 	&numa_demotion_enabled_attr.attr,
+	&numa_pagecache_promotion_enabled_attr.attr,
 	NULL,
 };
diff --git a/mm/migrate.c b/mm/migrate.c
index af07b399060b..320258a1aaba 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -44,6 +44,8 @@
 #include
 #include
 #include
+#include
+#include
 #include

@@ -2710,5 +2712,62 @@ int migrate_misplaced_folio(struct folio *folio, int node)
 	BUG_ON(!list_empty(&migratepages));
 	return nr_remaining ? -EAGAIN : 0;
 }
+
+/**
+ * promotion_candidate() - report a promotion candidate folio
+ *
+ * @folio: The folio reported as a candidate
+ *
+ * Records folio access time and places the folio on the task promotion list
+ * if access time is less than the threshold. The folio will be isolated from
+ * LRU if selected, and task_work will putback the folio on promotion failure.
+ *
+ * If selected, takes a folio reference to be released in task work.
+ */
+void promotion_candidate(struct folio *folio)
+{
+	struct task_struct *task = current;
+	struct list_head *promo_list = &task->promo_list;
+	struct callback_head *work = &task->numa_promo_work;
+	struct address_space *mapping = folio_mapping(folio);
+	bool write = mapping ? mapping->gfp_mask & __GFP_WRITE : false;
+	int nid = folio_nid(folio);
+	int flags, last_cpupid;
+
+	/*
+	 * Only do this work if:
+	 * 1) tiering and pagecache promotion are enabled
+	 * 2) the page can actually be promoted
+	 * 3) the hint-fault latency is relatively hot
+	 * 4) the folio is not already isolated
+	 * 5) this is not a kernel thread context
+	 */
+	if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING) ||
+	    !numa_pagecache_promotion_enabled ||
+	    node_is_toptier(nid) ||
+	    numa_hint_fault_latency(folio) >= PAGE_ACCESS_TIME_MASK ||
+	    folio_test_isolated(folio) ||
+	    (current->flags & PF_KTHREAD)) {
+		return;
+	}
+
+	nid = numa_migrate_check(folio, NULL, 0, &flags, write, &last_cpupid);
+	if (nid == NUMA_NO_NODE)
+		return;
+
+	if (migrate_misplaced_folio_prepare(folio, NULL, nid))
+		return;
+
+	/*
+	 * Ensure task can schedule work, otherwise we'll leak folios.
+	 * If the list is not empty, task work has already been scheduled.
+	 */
+	if (list_empty(promo_list) && task_work_add(task, work, TWA_RESUME)) {
+		folio_putback_lru(folio);
+		return;
+	}
+	list_add(&folio->lru, promo_list);
+}
+EXPORT_SYMBOL(promotion_candidate);
 #endif /* CONFIG_NUMA_BALANCING */
 #endif /* CONFIG_NUMA */
diff --git a/mm/swap.c b/mm/swap.c
index 320b959b74c6..57909c349388 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -37,6 +37,7 @@
 #include
 #include
 #include
+#include

 #include "internal.h"

@@ -469,6 +470,8 @@ void folio_mark_accessed(struct folio *folio)
 			__lru_cache_activate_folio(folio);
 		folio_clear_referenced(folio);
 		workingset_activation(folio);
+	} else {
+		promotion_candidate(folio);
 	}
 	if (folio_test_idle(folio))
 		folio_clear_idle(folio);
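
Because the hook sits in the already-referenced branch of
folio_mark_accessed(), a folio only becomes a candidate after repeated
reads in quick succession. A hedged demonstration sketch (/mnt/file is
an assumed test file whose page cache has been demoted to a lower
tier; successful promotions should be reflected in the existing
pgpromote_* counters in /proc/vmstat):

#include <fcntl.h>
#include <unistd.h>

int main(void)
{
	static char buf[1 << 20];
	ssize_t n;
	int fd = open("/mnt/file", O_RDONLY);	/* assumed test file */

	if (fd < 0)
		return 1;
	/*
	 * Re-read the same region several times back-to-back so its
	 * folios pass the referenced/active and hint-fault-latency
	 * checks and get queued for promotion at syscall return.
	 */
	for (int pass = 0; pass < 3; pass++) {
		if (lseek(fd, 0, SEEK_SET) < 0)
			break;
		while ((n = read(fd, buf, sizeof(buf))) > 0)
			;	/* each buffered read marks folios accessed */
	}
	close(fd);
	return 0;
}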