From: Oscar Salvador
To: Andrew Morton
Cc: David Hildenbrand, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Vlastimil Babka, Harry Yoo, Jonathan Cameron, linux-cxl@vger.kernel.org, Oscar Salvador
Subject: [PATCH v2 2/3] mm,memory_hotplug: Implement numa node notifier
Date: Tue, 8 Apr 2025 10:41:52 +0200
Message-ID: <20250408084153.255762-3-osalvador@suse.de>
In-Reply-To: <20250408084153.255762-1-osalvador@suse.de>
There are at least five consumers of hotplug_memory_notifier that are
really only interested in whether a NUMA node changed its state, e.g.
going from being memory-aware to becoming memoryless, and vice versa.

Implement a specific notifier for NUMA node state changes, and convert
the consumers that only care about those changes to use it.

Signed-off-by: Oscar Salvador
Reviewed-by: Harry Yoo
Reviewed-by: Jonathan Cameron
---
 drivers/acpi/numa/hmat.c  |   6 +-
 drivers/base/node.c       |  19 +++++++
 drivers/cxl/core/region.c |  14 ++---
 drivers/cxl/cxl.h         |   4 +-
 include/linux/memory.h    |  38 ++++++++++++-
 kernel/cgroup/cpuset.c    |   2 +-
 mm/memory-tiers.c         |   8 +--
 mm/memory_hotplug.c       | 117 +++++++++++++++++++++-----------------
 mm/slub.c                 |  13 ++---
 9 files changed, 144 insertions(+), 77 deletions(-)

diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
index bfbb08b1e6af..d18f3efa2149 100644
--- a/drivers/acpi/numa/hmat.c
+++ b/drivers/acpi/numa/hmat.c
@@ -918,10 +918,10 @@ static int hmat_callback(struct notifier_block *self,
 			 unsigned long action, void *arg)
 {
 	struct memory_target *target;
-	struct memory_notify *mnb = arg;
+	struct node_notify *mnb = arg;
 	int pxm, nid = mnb->status_change_nid;
 
-	if (nid == NUMA_NO_NODE || action != MEM_ONLINE)
+	if (nid == NUMA_NO_NODE || action != NODE_BECAME_MEM_AWARE)
 		return NOTIFY_OK;
 
 	pxm = node_to_pxm(nid);
@@ -1074,7 +1074,7 @@ static __init int hmat_init(void)
 	hmat_register_targets();
 
 	/* Keep the table and structures if the notifier may use them */
-	if (hotplug_memory_notifier(hmat_callback, HMAT_CALLBACK_PRI))
+	if (hotplug_node_notifier(hmat_callback, HMAT_CALLBACK_PRI))
 		goto out_put;
 
 	if (!hmat_set_default_dram_perf())
diff --git a/drivers/base/node.c b/drivers/base/node.c
index 0ea653fa3433..182c71dfb5b8 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -110,6 +110,25 @@ static const struct attribute_group *node_access_node_groups[] = {
 	NULL,
 };
 
+static BLOCKING_NOTIFIER_HEAD(node_chain);
+
+int register_node_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_register(&node_chain, nb);
+}
+EXPORT_SYMBOL(register_node_notifier);
+
+void unregister_node_notifier(struct notifier_block *nb)
+{
+	blocking_notifier_chain_unregister(&node_chain, nb);
+}
+EXPORT_SYMBOL(unregister_node_notifier);
+
+int node_notify(unsigned long val, void *v)
+{
+	return blocking_notifier_call_chain(&node_chain, val, v);
+}
+
 static void node_remove_accesses(struct node *node)
 {
 	struct node_access_nodes *c, *cnext;
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index e8d11a988fd9..7d187088f557 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2409,12 +2409,12 @@ static int cxl_region_perf_attrs_callback(struct notifier_block *nb,
 					  unsigned long action, void *arg)
 {
 	struct cxl_region *cxlr = container_of(nb, struct cxl_region,
-					       memory_notifier);
-	struct memory_notify *mnb = arg;
+					       node_notifier);
+	struct node_notify *mnb = arg;
 	int nid = mnb->status_change_nid;
 	int region_nid;
 
-	if (nid == NUMA_NO_NODE || action != MEM_ONLINE)
+	if (nid == NUMA_NO_NODE || action != NODE_BECAME_MEM_AWARE)
 		return NOTIFY_DONE;
 
 	/*
@@ -3388,7 +3388,7 @@ static void shutdown_notifiers(void *_cxlr)
 {
 	struct cxl_region *cxlr = _cxlr;
 
-	unregister_memory_notifier(&cxlr->memory_notifier);
+	unregister_node_notifier(&cxlr->node_notifier);
 	unregister_mt_adistance_algorithm(&cxlr->adist_notifier);
 }
 
@@ -3427,9 +3427,9 @@ static int cxl_region_probe(struct device *dev)
 	if (rc)
 		return rc;
 
-	cxlr->memory_notifier.notifier_call = cxl_region_perf_attrs_callback;
-	cxlr->memory_notifier.priority = CXL_CALLBACK_PRI;
-	register_memory_notifier(&cxlr->memory_notifier);
+	cxlr->node_notifier.notifier_call = cxl_region_perf_attrs_callback;
+	cxlr->node_notifier.priority = CXL_CALLBACK_PRI;
+	register_node_notifier(&cxlr->node_notifier);
 
 	cxlr->adist_notifier.notifier_call = cxl_region_calculate_adistance;
 	cxlr->adist_notifier.priority = 100;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index bbbaa0d0a670..d4c9a499de7a 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -532,7 +532,7 @@ struct cxl_region_params {
  * @flags: Region state flags
  * @params: active + config params for the region
  * @coord: QoS access coordinates for the region
- * @memory_notifier: notifier for setting the access coordinates to node
+ * @node_notifier: notifier for setting the access coordinates to node
  * @adist_notifier: notifier for calculating the abstract distance of node
  */
 struct cxl_region {
@@ -545,7 +545,7 @@ struct cxl_region {
 	unsigned long flags;
 	struct cxl_region_params params;
 	struct access_coordinate coord[ACCESS_COORDINATE_MAX];
-	struct notifier_block memory_notifier;
+	struct notifier_block node_notifier;
 	struct notifier_block adist_notifier;
 };
 
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 12daa6ec7d09..a5b8068cf182 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -99,6 +99,14 @@ int set_memory_block_size_order(unsigned int order);
 #define MEM_PREPARE_ONLINE	(1<<6)
 #define MEM_FINISH_OFFLINE	(1<<7)
 
+/* These states are used for numa node notifiers */
+#define NODE_BECOMING_MEM_AWARE		(1<<0)
+#define NODE_BECAME_MEM_AWARE		(1<<1)
+#define NODE_BECOMING_MEMORYLESS	(1<<2)
+#define NODE_BECAME_MEMORYLESS		(1<<3)
+#define NODE_CANCEL_MEM_AWARE		(1<<4)
+#define NODE_CANCEL_MEMORYLESS		(1<<5)
+
 struct memory_notify {
 	/*
 	 * The altmap_start_pfn and altmap_nr_pages fields are designated for
@@ -109,7 +117,10 @@ struct memory_notify {
 	unsigned long altmap_nr_pages;
 	unsigned long start_pfn;
 	unsigned long nr_pages;
-	int status_change_nid_normal;
+	int status_change_nid;
+};
+
+struct node_notify {
 	int status_change_nid;
 };
 
@@ -149,15 +160,34 @@ static inline int hotplug_memory_notifier(notifier_fn_t fn, int pri)
 {
 	return 0;
 }
+
+static inline int register_node_notifier(struct notifier_block *nb)
+{
+	return 0;
+}
+static inline void unregister_node_notifier(struct notifier_block *nb)
+{
+}
+static inline int node_notify(unsigned long val, void *v)
+{
+	return 0;
+}
+static inline int hotplug_node_notifier(notifier_fn_t fn, int pri)
+{
+	return 0;
+}
 #else /* CONFIG_MEMORY_HOTPLUG */
 extern int register_memory_notifier(struct notifier_block *nb);
+extern int register_node_notifier(struct notifier_block *nb);
 extern void unregister_memory_notifier(struct notifier_block *nb);
+extern void unregister_node_notifier(struct notifier_block *nb);
 int create_memory_block_devices(unsigned long start, unsigned long size,
 				struct vmem_altmap *altmap,
 				struct memory_group *group);
 void remove_memory_block_devices(unsigned long start, unsigned long size);
 extern void memory_dev_init(void);
 extern int memory_notify(unsigned long val, void *v);
+extern int node_notify(unsigned long val, void *v);
 extern struct memory_block *find_memory_block(unsigned long section_nr);
 typedef int (*walk_memory_blocks_func_t)(struct memory_block *, void *);
 extern int walk_memory_blocks(unsigned long start, unsigned long size,
@@ -177,6 +207,12 @@ int walk_dynamic_memory_groups(int nid, walk_memory_groups_func_t func,
 	register_memory_notifier(&fn##_mem_nb);			\
 })
 
+#define hotplug_node_notifier(fn, pri) ({		\
+	static __meminitdata struct notifier_block fn##_node_nb =\
+		{ .notifier_call = fn, .priority = pri };\
+	register_node_notifier(&fn##_node_nb);			\
+})
+
 #ifdef CONFIG_NUMA
 void memory_block_add_nid(struct memory_block *mem, int nid,
 			  enum meminit_context context);
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 39c1fc643d77..3323cf2124bf 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3938,7 +3938,7 @@ void __init cpuset_init_smp(void)
 	cpumask_copy(top_cpuset.effective_cpus, cpu_active_mask);
 	top_cpuset.effective_mems = node_states[N_MEMORY];
 
-	hotplug_memory_notifier(cpuset_track_online_nodes, CPUSET_CALLBACK_PRI);
+	hotplug_node_notifier(cpuset_track_online_nodes, CPUSET_CALLBACK_PRI);
 
 	cpuset_migrate_mm_wq = alloc_ordered_workqueue("cpuset_migrate_mm", 0);
 	BUG_ON(!cpuset_migrate_mm_wq);
diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index fc14fe53e9b7..dfe6c28c8352 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -872,7 +872,7 @@ static int __meminit memtier_hotplug_callback(struct notifier_block *self,
 					      unsigned long action, void *_arg)
 {
 	struct memory_tier *memtier;
-	struct memory_notify *arg = _arg;
+	struct node_notify *arg = _arg;
 
 	/*
 	 * Only update the node migration order when a node is
@@ -882,13 +882,13 @@ static int __meminit memtier_hotplug_callback(struct notifier_block *self,
 		return notifier_from_errno(0);
 
 	switch (action) {
-	case MEM_OFFLINE:
+	case NODE_BECAME_MEMORYLESS:
 		mutex_lock(&memory_tier_lock);
 		if (clear_node_memory_tier(arg->status_change_nid))
 			establish_demotion_targets();
 		mutex_unlock(&memory_tier_lock);
 		break;
-	case MEM_ONLINE:
+	case NODE_BECAME_MEM_AWARE:
 		mutex_lock(&memory_tier_lock);
 		memtier = set_node_memory_tier(arg->status_change_nid);
 		if (!IS_ERR(memtier))
@@ -929,7 +929,7 @@ static int __init memory_tier_init(void)
 	nodes_and(default_dram_nodes, node_states[N_MEMORY],
 		  node_states[N_CPU]);
 
-	hotplug_memory_notifier(memtier_hotplug_callback, MEMTIER_HOTPLUG_PRI);
+	hotplug_node_notifier(memtier_hotplug_callback, MEMTIER_HOTPLUG_PRI);
 	return 0;
 }
 subsys_initcall(memory_tier_init);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 8305483de38b..84248f2e36f8 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -701,24 +701,18 @@ static void online_pages_range(unsigned long start_pfn, unsigned long nr_pages)
 
 /* check which state of node_states will be changed when online memory */
 static void node_states_check_changes_online(unsigned long nr_pages,
-	struct zone *zone, struct memory_notify *arg)
+	struct zone *zone, struct node_notify *arg)
 {
 	int nid = zone_to_nid(zone);
 
 	arg->status_change_nid = NUMA_NO_NODE;
-	arg->status_change_nid_normal = NUMA_NO_NODE;
 
 	if (!node_state(nid, N_MEMORY))
 		arg->status_change_nid = nid;
-	if (zone_idx(zone) <= ZONE_NORMAL && !node_state(nid, N_NORMAL_MEMORY))
-		arg->status_change_nid_normal = nid;
 }
 
-static void node_states_set_node(int node, struct memory_notify *arg)
+static void node_states_set_node(int node, struct node_notify *arg)
 {
-	if (arg->status_change_nid_normal >= 0)
-		node_set_state(node, N_NORMAL_MEMORY);
-
 	if (arg->status_change_nid >= 0)
 		node_set_state(node, N_MEMORY);
 }
@@ -1177,7 +1171,9 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
 	int need_zonelists_rebuild = 0;
 	const int nid = zone_to_nid(zone);
 	int ret;
-	struct memory_notify arg;
+	struct memory_notify mem_arg;
+	struct node_notify node_arg;
+	bool cancel_mem_notifier_on_err = false, cancel_node_notifier_on_err = false;
 
 	/*
 	 * {on,off}lining is constrained to full memory sections (or more
@@ -1194,11 +1190,22 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
 	/* associate pfn range with the zone */
 	move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_ISOLATE);
 
-	arg.start_pfn = pfn;
-	arg.nr_pages = nr_pages;
-	node_states_check_changes_online(nr_pages, zone, &arg);
+	mem_arg.start_pfn = pfn;
+	mem_arg.nr_pages = nr_pages;
+	node_states_check_changes_online(nr_pages, zone, &node_arg);
+
+	if (node_arg.status_change_nid >= 0) {
+		/* Node is becoming memory aware. Notify consumers */
+		cancel_node_notifier_on_err = true;
+		ret = node_notify(NODE_BECOMING_MEM_AWARE, &node_arg);
+		ret = notifier_to_errno(ret);
+		if (ret)
+			goto failed_addition;
+	}
 
-	ret = memory_notify(MEM_GOING_ONLINE, &arg);
+	cancel_mem_notifier_on_err = true;
+	mem_arg.status_change_nid = node_arg.status_change_nid;
+	ret = memory_notify(MEM_GOING_ONLINE, &mem_arg);
 	ret = notifier_to_errno(ret);
 	if (ret)
 		goto failed_addition;
@@ -1224,7 +1231,7 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
 	online_pages_range(pfn, nr_pages);
 	adjust_present_page_count(pfn_to_page(pfn), group, nr_pages);
 
-	node_states_set_node(nid, &arg);
+	node_states_set_node(nid, &node_arg);
 	if (need_zonelists_rebuild)
 		build_all_zonelists(NULL);
 
@@ -1245,16 +1252,26 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
 	kswapd_run(nid);
 	kcompactd_run(nid);
 
+	if (node_arg.status_change_nid >= 0)
+		/*
+		 * Node went from memoryless to have memory. Notify interested
+		 * consumers
+		 */
+		node_notify(NODE_BECAME_MEM_AWARE, &node_arg);
+
 	writeback_set_ratelimit();
 
-	memory_notify(MEM_ONLINE, &arg);
+	memory_notify(MEM_ONLINE, &mem_arg);
 	return 0;
 
 failed_addition:
 	pr_debug("online_pages [mem %#010llx-%#010llx] failed\n",
 		 (unsigned long long) pfn << PAGE_SHIFT,
 		 (((unsigned long long) pfn + nr_pages) << PAGE_SHIFT) - 1);
-	memory_notify(MEM_CANCEL_ONLINE, &arg);
+	if (cancel_mem_notifier_on_err)
+		memory_notify(MEM_CANCEL_ONLINE, &mem_arg);
+	if (cancel_node_notifier_on_err)
+		node_notify(NODE_CANCEL_MEM_AWARE, &node_arg);
 	remove_pfn_range_from_zone(zone, pfn, nr_pages);
 	return ret;
 }
@@ -1892,48 +1909,29 @@ early_param("movable_node", cmdline_parse_movable_node);
 
 /* check which state of node_states will be changed when offline memory */
 static void node_states_check_changes_offline(unsigned long nr_pages,
-		struct zone *zone, struct memory_notify *arg)
+		struct zone *zone, struct node_notify *arg)
 {
 	struct pglist_data *pgdat = zone->zone_pgdat;
 	unsigned long present_pages = 0;
 	enum zone_type zt;
 
 	arg->status_change_nid = NUMA_NO_NODE;
-	arg->status_change_nid_normal = NUMA_NO_NODE;
 
 	/*
-	 * Check whether node_states[N_NORMAL_MEMORY] will be changed.
-	 * If the memory to be offline is within the range
-	 * [0..ZONE_NORMAL], and it is the last present memory there,
-	 * the zones in that range will become empty after the offlining,
-	 * thus we can determine that we need to clear the node from
-	 * node_states[N_NORMAL_MEMORY].
+	 * Here we count the possible pages within the range [0..ZONE_MOVABLE].
+	 * If after having accounted all the pages, we see that the nr_pages to
+	 * be offlined is over or equal to the accounted pages, we know that the
+	 * node will become empty, and so, we can clear it for N_MEMORY.
 	 */
-	for (zt = 0; zt <= ZONE_NORMAL; zt++)
+	for (zt = 0; zt <= ZONE_MOVABLE; zt++)
 		present_pages += pgdat->node_zones[zt].present_pages;
-	if (zone_idx(zone) <= ZONE_NORMAL && nr_pages >= present_pages)
-		arg->status_change_nid_normal = zone_to_nid(zone);
-
-	/*
-	 * We have accounted the pages from [0..ZONE_NORMAL); ZONE_HIGHMEM
-	 * does not apply as we don't support 32bit.
-	 * Here we count the possible pages from ZONE_MOVABLE.
-	 * If after having accounted all the pages, we see that the nr_pages
-	 * to be offlined is over or equal to the accounted pages,
-	 * we know that the node will become empty, and so, we can clear
-	 * it for N_MEMORY as well.
-	 */
-	present_pages += pgdat->node_zones[ZONE_MOVABLE].present_pages;
 
 	if (nr_pages >= present_pages)
 		arg->status_change_nid = zone_to_nid(zone);
 }
 
-static void node_states_clear_node(int node, struct memory_notify *arg)
+static void node_states_clear_node(int node, struct node_notify *arg)
 {
-	if (arg->status_change_nid_normal >= 0)
-		node_clear_state(node, N_NORMAL_MEMORY);
-
 	if (arg->status_change_nid >= 0)
 		node_clear_state(node, N_MEMORY);
 }
@@ -1957,7 +1955,9 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 	unsigned long pfn, managed_pages, system_ram_pages = 0;
 	const int node = zone_to_nid(zone);
 	unsigned long flags;
-	struct memory_notify arg;
+	struct memory_notify mem_arg;
+	struct node_notify node_arg;
+	bool cancel_mem_notifier_on_err = false, cancel_node_notifier_on_err = false;
 	char *reason;
 	int ret;
 
@@ -2016,11 +2016,21 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 		goto failed_removal_pcplists_disabled;
 	}
 
-	arg.start_pfn = start_pfn;
-	arg.nr_pages = nr_pages;
-	node_states_check_changes_offline(nr_pages, zone, &arg);
+	mem_arg.start_pfn = start_pfn;
+	mem_arg.nr_pages = nr_pages;
+	node_states_check_changes_offline(nr_pages, zone, &node_arg);
+
+	if (node_arg.status_change_nid >= 0) {
+		cancel_node_notifier_on_err = true;
+		ret = node_notify(NODE_BECOMING_MEMORYLESS, &node_arg);
+		ret = notifier_to_errno(ret);
+		if (ret)
+			goto failed_removal_isolated;
+	}
 
-	ret = memory_notify(MEM_GOING_OFFLINE, &arg);
+	cancel_mem_notifier_on_err = true;
+	mem_arg.status_change_nid = node_arg.status_change_nid;
+	ret = memory_notify(MEM_GOING_OFFLINE, &mem_arg);
 	ret = notifier_to_errno(ret);
 	if (ret) {
 		reason = "notifier failure";
@@ -2100,27 +2110,32 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 	 * Make sure to mark the node as memory-less before rebuilding the zone
 	 * list. Otherwise this node would still appear in the fallback lists.
 	 */
-	node_states_clear_node(node, &arg);
+	node_states_clear_node(node, &node_arg);
 	if (!populated_zone(zone)) {
 		zone_pcp_reset(zone);
 		build_all_zonelists(NULL);
 	}
 
-	if (arg.status_change_nid >= 0) {
+	if (node_arg.status_change_nid >= 0) {
 		kcompactd_stop(node);
 		kswapd_stop(node);
+		/* Node went memoryless. Notify interested consumers */
+		node_notify(NODE_BECAME_MEMORYLESS, &node_arg);
 	}
 
 	writeback_set_ratelimit();
 
-	memory_notify(MEM_OFFLINE, &arg);
+	memory_notify(MEM_OFFLINE, &mem_arg);
 	remove_pfn_range_from_zone(zone, start_pfn, nr_pages);
 	return 0;
 
 failed_removal_isolated:
 	/* pushback to free area */
 	undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
-	memory_notify(MEM_CANCEL_OFFLINE, &arg);
+	if (cancel_mem_notifier_on_err)
+		memory_notify(MEM_CANCEL_OFFLINE, &mem_arg);
+	if (cancel_node_notifier_on_err)
+		node_notify(NODE_CANCEL_MEMORYLESS, &node_arg);
 failed_removal_pcplists_disabled:
 	lru_cache_enable();
 	zone_pcp_enable(zone);
diff --git a/mm/slub.c b/mm/slub.c
index e716b4cb2d0e..5c0f5d33b551 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -6168,8 +6168,8 @@ static int slab_mem_going_online_callback(void *arg)
 {
 	struct kmem_cache_node *n;
 	struct kmem_cache *s;
-	struct memory_notify *marg = arg;
-	int nid = marg->status_change_nid;
+	struct node_notify *narg = arg;
+	int nid = narg->status_change_nid;
 	int ret = 0;
 
 	/*
@@ -6221,15 +6221,12 @@ static int slab_memory_callback(struct notifier_block *self,
 	int ret = 0;
 
 	switch (action) {
-	case MEM_GOING_ONLINE:
+	case NODE_BECOMING_MEM_AWARE:
 		ret = slab_mem_going_online_callback(arg);
 		break;
-	case MEM_GOING_OFFLINE:
+	case NODE_BECOMING_MEMORYLESS:
 		ret = slab_mem_going_offline_callback(arg);
 		break;
-	case MEM_ONLINE:
-	case MEM_CANCEL_OFFLINE:
-		break;
 	}
 	if (ret)
 		ret = notifier_from_errno(ret);
@@ -6304,7 +6301,7 @@ void __init kmem_cache_init(void)
 			sizeof(struct kmem_cache_node),
 			SLAB_HWCACHE_ALIGN | SLAB_NO_OBJ_EXT, 0, 0);
 
-	hotplug_memory_notifier(slab_memory_callback, SLAB_CALLBACK_PRI);
+	hotplug_node_notifier(slab_memory_callback, SLAB_CALLBACK_PRI);
 
 	/* Able to allocate the per node structures */
 	slab_state = PARTIAL;