[03/10] mm/migrate: update node demotion order during on hotplug events

Message ID	20210401183221.977831DE@viggo.jf.intel.com (mailing list archive)
State	New, archived
Headers
Return-Path: <SRS0=d9cH=I6=kvack.org=owner-linux-mm@kernel.org>
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D98B960FE6
IronPort-SDR: XoA6Kvqm68JKskqHp9KE2aGLCUOs6sh5G2Bum/5gm/PQ2KmDTZ91CjT3Xue7jqi6e7F6inwN+K
 W/zU0s2TSfXg==
IronPort-SDR: Y0roNlBaLwmP8lEi19A2FmT1Z4gnlFleFU7X65ylshetlv0eFrmuP77Um1v6XSAHXDK+Pc6mF7
 BKvnN7AGyrbw==
Subject: [PATCH 03/10] mm/migrate: update node demotion order during on hotplug events
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org,Dave Hansen <dave.hansen@linux.intel.com>,shy828301@gmail.com,weixugc@google.com,rientjes@google.com,ying.huang@intel.com,dan.j.williams@intel.com,david@redhat.com,osalvador@suse.de
From: Dave Hansen <dave.hansen@linux.intel.com>
Date: Thu, 01 Apr 2021 11:32:21 -0700
References: <20210401183216.443C4443@viggo.jf.intel.com>
In-Reply-To: <20210401183216.443C4443@viggo.jf.intel.com>
Message-Id: <20210401183221.977831DE@viggo.jf.intel.com>
Received-SPF: none (linux.intel.com>: No applicable sender policy available) receiver=imf14; identity=mailfrom; envelope-from="<dave.hansen@linux.intel.com>"; helo=mga02.intel.com; client-ip=134.134.136.20
Sender: owner-linux-mm@kvack.org
Precedence: bulk
Series	Migrate Pages in lieu of discard
[00/10,v7,RESEND] Migrate Pages in lieu of discard
[01/10] mm/numa: node demotion data structure and lookup
[02/10] mm/numa: automatically generate node migration order
[03/10] mm/migrate: update node demotion order during on hotplug events
[04/10] mm/migrate: make migrate_pages() return nr_succeeded
[05/10] mm/migrate: demote pages during reclaim
[06/10] mm/vmscan: add page demotion counter
[07/10] mm/vmscan: add helper for querying ability to age anonymous pages
[08/10] mm/vmscan: Consider anonymous pages without swap
[09/10] mm/vmscan: never demote for memcg reclaim
[10/10] mm/migrate: new zone_reclaim_mode to enable reclaim migration

diff -puN mm/migrate.c~enable-numa-demotion mm/migrate.c
--- a/mm/migrate.c~enable-numa-demotion	2021-03-31 15:17:13.056000258 -0700
+++ b/mm/migrate.c	2021-03-31 15:17:13.062000258 -0700
@@ -49,6 +49,7 @@
 #include <linux/sched/mm.h>
 #include <linux/ptrace.h>
 #include <linux/oom.h>
+#include <linux/memory.h>
 
 #include <asm/tlbflush.h>
 
@@ -1198,8 +1199,12 @@ out:
  */
 
 /*
- * Writes to this array occur without locking. READ_ONCE()
- * is recommended for readers to ensure consistent reads.
+ * Writes to this array occur without locking. Cycles are
+ * not allowed: Node X demotes to Y which demotes to X...
+ *
+ * If multiple reads are performed, a single rcu_read_lock()
+ * must be held over all reads to ensure that no cycles are
+ * observed.
  */
 static int node_demotion[MAX_NUMNODES] __read_mostly =
 	{[0 ... MAX_NUMNODES - 1] = NUMA_NO_NODE};
@@ -1215,13 +1220,22 @@ static int node_demotion[MAX_NUMNODES] _
  */
 int next_demotion_node(int node)
 {
+	int target;
+
 	/*
-	 * node_demotion[] is updated without excluding
-	 * this function from running. READ_ONCE() avoids
-	 * reading multiple, inconsistent 'node' values
-	 * during an update.
+	 * node_demotion[] is updated without excluding this
+	 * function from running. RCU doesn't provide any
+	 * compiler barriers, so the READ_ONCE() is required
+	 * to avoid compiler reordering or read merging.
+	 *
+	 * Make sure to use RCU over entire code blocks if
+	 * node_demotion[] reads need to be consistent.
 	 */
-	return READ_ONCE(node_demotion[node]);
+	rcu_read_lock();
+	target = READ_ONCE(node_demotion[node]);
+	rcu_read_unlock();
+
+	return target;
 }
 
 /*
@@ -3226,8 +3240,9 @@ void migrate_vma_finalize(struct migrate
 EXPORT_SYMBOL(migrate_vma_finalize);
 #endif /* CONFIG_DEVICE_PRIVATE */
 
+#if defined(CONFIG_MEMORY_HOTPLUG)
 /* Disable reclaim-based migration. */
-static void disable_all_migrate_targets(void)
+static void __disable_all_migrate_targets(void)
 {
 	int node;
 
@@ -3235,6 +3250,25 @@ static void disable_all_migrate_targets(
 		node_demotion[node] = NUMA_NO_NODE;
 }
 
+static void disable_all_migrate_targets(void)
+{
+	__disable_all_migrate_targets();
+
+	/*
+	 * Ensure that the "disable" is visible across the system.
+	 * Readers will see either a combination of before+disable
+	 * state or disable+after. They will never see before and
+	 * after state together.
+	 *
+	 * The before+after state together might have cycles and
+	 * could cause readers to do things like loop until this
+	 * function finishes. This ensures they can only see a
+	 * single "bad" read and would, for instance, only loop
+	 * once.
+	 */
+	synchronize_rcu();
+}
+
 /*
  * Find an automatic demotion target for 'node'.
  * Failing here is OK. It might just indicate
@@ -3297,20 +3331,6 @@ static void __set_migration_target_nodes
 	disable_all_migrate_targets();
 
 	/*
-	 * Ensure that the "disable" is visible across the system.
-	 * Readers will see either a combination of before+disable
-	 * state or disable+after. They will never see before and
-	 * after state together.
-	 *
-	 * The before+after state together might have cycles and
-	 * could cause readers to do things like loop until this
-	 * function finishes. This ensures they can only see a
-	 * single "bad" read and would, for instance, only loop
-	 * once.
-	 */
-	smp_wmb();
-
-	/*
 	 * Allocations go close to CPUs, first. Assume that
 	 * the migration path starts at the nodes with CPUs.
 	 */
@@ -3347,10 +3367,96 @@ again:
 /*
  * For callers that do not hold get_online_mems() already.
  */
-__maybe_unused // <- temporay to prevent warnings during bisects
 static void set_migration_target_nodes(void)
 {
 	get_online_mems();
 	__set_migration_target_nodes();
 	put_online_mems();
 }
+
+/*
+ * React to hotplug events that might affect the migration targets
+ * like events that online or offline NUMA nodes.
+ *
+ * The ordering is also currently dependent on which nodes have
+ * CPUs. That means we need CPU on/offline notification too.
+ */
+static int migration_online_cpu(unsigned int cpu)
+{
+	set_migration_target_nodes();
+	return 0;
+}
+
+static int migration_offline_cpu(unsigned int cpu)
+{
+	set_migration_target_nodes();
+	return 0;
+}
+
+/*
+ * This leaves migrate-on-reclaim transiently disabled between
+ * the MEM_GOING_OFFLINE and MEM_OFFLINE events. This runs
+ * whether reclaim-based migration is enabled or not, which
+ * ensures that the user can turn reclaim-based migration at
+ * any time without needing to recalculate migration targets.
+ *
+ * These callbacks already hold get_online_mems(). That is why
+ * __set_migration_target_nodes() can be used as opposed to
+ * set_migration_target_nodes().
+ */
+static int __meminit migrate_on_reclaim_callback(struct notifier_block *self,
+						 unsigned long action, void *arg)
+{
+	switch (action) {
+	case MEM_GOING_OFFLINE:
+		/*
+		 * Make sure there are not transient states where
+		 * an offline node is a migration target. This
+		 * will leave migration disabled until the offline
+		 * completes and the MEM_OFFLINE case below runs.
+		 */
+		disable_all_migrate_targets();
+		break;
+	case MEM_OFFLINE:
+	case MEM_ONLINE:
+		/*
+		 * Recalculate the target nodes once the node
+		 * reaches its final state (online or offline).
+		 */
+		__set_migration_target_nodes();
+		break;
+	case MEM_CANCEL_OFFLINE:
+		/*
+		 * MEM_GOING_OFFLINE disabled all the migration
+		 * targets. Reenable them.
+		 */
+		__set_migration_target_nodes();
+		break;
+	case MEM_GOING_ONLINE:
+	case MEM_CANCEL_ONLINE:
+		break;
+	}
+
+	return notifier_from_errno(0);
+}
+
+static int __init migrate_on_reclaim_init(void)
+{
+	int ret;
+
+	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "migrate on reclaim",
+				migration_online_cpu,
+				migration_offline_cpu);
+	/*
+	 * In the unlikely case that this fails, the automatic
+	 * migration targets may become suboptimal for nodes
+	 * where N_CPU changes. With such a small impact in a
+	 * rare case, do not bother trying to do anything special.
+	 */
+	WARN_ON(ret < 0);
+
+	hotplug_memory_notifier(migrate_on_reclaim_callback, 100);
+	return 0;
+}
+late_initcall(migrate_on_reclaim_init);
+#endif /* CONFIG_MEMORY_HOTPLUG */
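
The comments in the patch argue that a reader must never see "before" and "after" state mixed, because the mix can contain a cycle, while "before+disable" or "disable+after" cannot. The following is a minimal single-threaded user-space sketch of that argument only. It does not model RCU, READ_ONCE() or concurrency, and every name in it (MAX_NODES, NO_NODE, demotion[], next_node(), disable_all(), chain_length()) is a hypothetical stand-in, not a kernel symbol.

```c
#include <assert.h>

#define MAX_NODES 4      /* hypothetical small stand-in for MAX_NUMNODES */
#define NO_NODE   (-1)   /* hypothetical stand-in for NUMA_NO_NODE */

/* Toy demotion table: demotion[n] is the node that n demotes to. */
static int demotion[MAX_NODES] = { NO_NODE, NO_NODE, NO_NODE, NO_NODE };

/* A single lookup, loosely analogous to next_demotion_node(). */
static int next_node(int node)
{
	assert(node >= 0 && node < MAX_NODES);
	return demotion[node];
}

/* The "disable" step: afterwards every chain terminates at once. */
static void disable_all(void)
{
	for (int n = 0; n < MAX_NODES; n++)
		demotion[n] = NO_NODE;
}

/*
 * Follow the chain starting at 'node'. Returns the number of hops,
 * or -1 if the walk exceeds what an acyclic chain over MAX_NODES
 * nodes could need, which means a cycle was observed.
 */
static int chain_length(int node)
{
	int hops = 0;

	while ((node = next_node(node)) != NO_NODE) {
		if (++hops >= MAX_NODES)
			return -1;
	}
	return hops;
}
```

In this model, writing a new order (1 demotes to 0) directly over an old order (0 demotes to 1) makes chain_length(0) report a cycle, whereas calling disable_all() between the two orders leaves every intermediate table acyclic. In the patch, it is synchronize_rcu() that keeps a reader holding rcu_read_lock() across its reads from spanning both sides of the disable step; the sketch makes no attempt to reproduce that.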
