From patchwork Thu Sep 26 22:49:01 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Frederic Weisbecker
X-Patchwork-Id: 13813708
From: Frederic Weisbecker
To: LKML
Cc: Frederic Weisbecker, Andrew Morton, Kees Cook, Peter Zijlstra,
 Thomas Gleixner, Michal Hocko, Vlastimil Babka, linux-mm@kvack.org,
 "Paul E. McKenney", Neeraj Upadhyay, Joel Fernandes, Boqun Feng,
 Uladzislau Rezki, Zqiang, rcu@vger.kernel.org
Subject: [PATCH 13/20] kthread: Default affine kthread to its preferred NUMA node
Date: Fri, 27 Sep 2024 00:49:01 +0200
Message-ID: <20240926224910.11106-14-frederic@kernel.org>
X-Mailer: git-send-email 2.46.0
In-Reply-To: <20240926224910.11106-1-frederic@kernel.org>
References: <20240926224910.11106-1-frederic@kernel.org>

Kthreads attached to a preferred NUMA node for their task structure
allocation can also be assumed to run preferably within that same node.

A more precise affinity is usually requested by calling
kthread_create_on_cpu() or kthread_bind[_mask]() before the first wakeup.

For the others, a default affinity to the node is desired and sometimes
implemented with more or less success when it comes to dealing with
hotplug events and nohz_full / CPU Isolation interactions:

- kcompactd is affine to its node and handles hotplug but not CPU Isolation
- kswapd is affine to its node and ignores hotplug and CPU Isolation
- A bunch of drivers create their kthreads on a specific node and
  don't take care of affining them further.

Handle that default node affinity preference at the generic level
instead, provided a kthread is created on an actual node and doesn't
apply any specific affinity such as a given CPU or a custom cpumask to
bind to before its first wakeup.

This generic handling is aware of CPU hotplug events and CPU isolation
such that:

* When a housekeeping CPU goes up that is part of the node of a given
  kthread, the related task is re-affined to its own node if it was
  previously running on the default last resort online housekeeping set
  from other nodes.

* When a housekeeping CPU goes down while it was part of the node of a
  kthread, the running task is migrated (or the sleeping task is woken
  up) automatically by the scheduler to other housekeepers within the
  same node or, as a last resort, to all housekeepers from other nodes.

Acked-by: Vlastimil Babka
Signed-off-by: Frederic Weisbecker
---
 include/linux/cpuhotplug.h |   1 +
 kernel/kthread.c           | 106 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 106 insertions(+), 1 deletion(-)

diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 2361ed4d2b15..228f27150a93 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -239,6 +239,7 @@ enum cpuhp_state {
 	CPUHP_AP_WORKQUEUE_ONLINE,
 	CPUHP_AP_RANDOM_ONLINE,
 	CPUHP_AP_RCUTREE_ONLINE,
+	CPUHP_AP_KTHREADS_ONLINE,
 	CPUHP_AP_BASE_CACHEINFO_ONLINE,
 	CPUHP_AP_ONLINE_DYN,
 	CPUHP_AP_ONLINE_DYN_END		= CPUHP_AP_ONLINE_DYN + 40,
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 1527a522cdd3..736276d313c2 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -35,6 +35,9 @@ static DEFINE_SPINLOCK(kthread_create_lock);
 static LIST_HEAD(kthread_create_list);
 struct task_struct *kthreadd_task;
 
+static LIST_HEAD(kthreads_hotplug);
+static DEFINE_MUTEX(kthreads_hotplug_lock);
+
 struct kthread_create_info
 {
 	/* Information passed to kthread() from kthreadd. */
@@ -53,6 +56,7 @@ struct kthread_create_info
 struct kthread {
 	unsigned long flags;
 	unsigned int cpu;
+	unsigned int node;
 	int started;
 	int result;
 	int (*threadfn)(void *);
@@ -64,6 +68,8 @@ struct kthread {
 #endif
 	/* To store the full name if task comm is truncated. */
 	char *full_name;
+	struct task_struct *task;
+	struct list_head hotplug_node;
 };
 
 enum KTHREAD_BITS {
@@ -122,8 +128,11 @@ bool set_kthread_struct(struct task_struct *p)
 
 	init_completion(&kthread->exited);
 	init_completion(&kthread->parked);
+	INIT_LIST_HEAD(&kthread->hotplug_node);
 	p->vfork_done = &kthread->exited;
 
+	kthread->task = p;
+	kthread->node = tsk_fork_get_node(current);
 	p->worker_private = kthread;
 	return true;
 }
@@ -314,6 +323,11 @@ void __noreturn kthread_exit(long result)
 {
 	struct kthread *kthread = to_kthread(current);
 	kthread->result = result;
+	if (!list_empty(&kthread->hotplug_node)) {
+		mutex_lock(&kthreads_hotplug_lock);
+		list_del(&kthread->hotplug_node);
+		mutex_unlock(&kthreads_hotplug_lock);
+	}
 	do_exit(0);
 }
 EXPORT_SYMBOL(kthread_exit);
@@ -339,6 +353,48 @@ void __noreturn kthread_complete_and_exit(struct completion *comp, long code)
 }
 EXPORT_SYMBOL(kthread_complete_and_exit);
 
+static void kthread_fetch_affinity(struct kthread *kthread, struct cpumask *cpumask)
+{
+	cpumask_and(cpumask, cpumask_of_node(kthread->node),
+		    housekeeping_cpumask(HK_TYPE_KTHREAD));
+
+	if (cpumask_empty(cpumask))
+		cpumask_copy(cpumask, housekeeping_cpumask(HK_TYPE_KTHREAD));
+}
+
+static void kthread_affine_node(void)
+{
+	struct kthread *kthread = to_kthread(current);
+	cpumask_var_t affinity;
+
+	WARN_ON_ONCE(kthread_is_per_cpu(current));
+
+	if (kthread->node == NUMA_NO_NODE) {
+		housekeeping_affine(current, HK_TYPE_RCU);
+	} else {
+		if (!zalloc_cpumask_var(&affinity, GFP_KERNEL)) {
+			WARN_ON_ONCE(1);
+			return;
+		}
+
+		mutex_lock(&kthreads_hotplug_lock);
+		WARN_ON_ONCE(!list_empty(&kthread->hotplug_node));
+		list_add_tail(&kthread->hotplug_node, &kthreads_hotplug);
+		/*
+		 * The node cpumask is racy when read from kthread() but:
+		 * - a racing CPU going down will either fail on the subsequent
+		 *   call to set_cpus_allowed_ptr() or be migrated to housekeepers
+		 *   afterwards by the scheduler.
+		 * - a racing CPU going up will be handled by kthreads_online_cpu()
+		 */
+		kthread_fetch_affinity(kthread, affinity);
+		set_cpus_allowed_ptr(current, affinity);
+		mutex_unlock(&kthreads_hotplug_lock);
+
+		free_cpumask_var(affinity);
+	}
+}
+
 static int kthread(void *_create)
 {
 	static const struct sched_param param = { .sched_priority = 0 };
@@ -369,7 +425,6 @@ static int kthread(void *_create)
 	 * back to default in case they have been changed.
 	 */
 	sched_setscheduler_nocheck(current, SCHED_NORMAL, &param);
-	set_cpus_allowed_ptr(current, housekeeping_cpumask(HK_TYPE_KTHREAD));
 
 	/* OK, tell user we're spawned, wait for stop or wakeup */
 	__set_current_state(TASK_UNINTERRUPTIBLE);
@@ -385,6 +440,9 @@ static int kthread(void *_create)
 
 	self->started = 1;
 
+	if (!(current->flags & PF_NO_SETAFFINITY))
+		kthread_affine_node();
+
 	ret = -EINTR;
 	if (!test_bit(KTHREAD_SHOULD_STOP, &self->flags)) {
 		cgroup_kthread_ready();
@@ -779,6 +837,52 @@ int kthreadd(void *unused)
 	return 0;
 }
 
+/*
+ * Re-affine kthreads according to their preferences
+ * and the newly online CPU. The CPU down part is handled
+ * by select_fallback_rq() which default re-affines to
+ * housekeepers in case the preferred affinity doesn't
+ * apply anymore.
+ */
+static int kthreads_online_cpu(unsigned int cpu)
+{
+	cpumask_var_t affinity;
+	struct kthread *k;
+	int ret;
+
+	guard(mutex)(&kthreads_hotplug_lock);
+
+	if (list_empty(&kthreads_hotplug))
+		return 0;
+
+	if (!zalloc_cpumask_var(&affinity, GFP_KERNEL))
+		return -ENOMEM;
+
+	ret = 0;
+
+	list_for_each_entry(k, &kthreads_hotplug, hotplug_node) {
+		if (WARN_ON_ONCE((k->task->flags & PF_NO_SETAFFINITY) ||
+				 kthread_is_per_cpu(k->task) ||
+				 k->node == NUMA_NO_NODE)) {
+			ret = -EINVAL;
+			continue;
+		}
+		kthread_fetch_affinity(k, affinity);
+		set_cpus_allowed_ptr(k->task, affinity);
+	}
+
+	free_cpumask_var(affinity);
+
+	return ret;
+}
+
+static int kthreads_init(void)
+{
+	return cpuhp_setup_state(CPUHP_AP_KTHREADS_ONLINE, "kthreads:online",
+				kthreads_online_cpu, NULL);
+}
+early_initcall(kthreads_init);
+
 void __kthread_init_worker(struct kthread_worker *worker,
 				const char *name,
 				struct lock_class_key *key)
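
Illustration (not part of the patch): the node-first, housekeeping-fallback
rule applied by kthread_fetch_affinity() can be tried out in userspace. The
sketch below is only a model under stated assumptions: cpumask_t as a 64-bit
bitmap and the fetch_affinity() helper are hypothetical stand-ins for
struct cpumask and kthread_fetch_affinity(), the caller is assumed to pass
the CPUs of the node that are currently usable, and the housekeeping set is
assumed to be non-empty.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cpumask: bit N set means CPU N. */
typedef uint64_t cpumask_t;

/*
 * Model of the rule in kthread_fetch_affinity(): prefer the housekeeping
 * CPUs that belong to the kthread's node; if that intersection is empty,
 * fall back to the whole housekeeping set.
 */
static cpumask_t fetch_affinity(cpumask_t node_cpus, cpumask_t housekeeping)
{
	cpumask_t mask;

	/* Assumption of this sketch: at least one housekeeping CPU exists. */
	assert(housekeeping != 0);

	mask = node_cpus & housekeeping;
	return mask ? mask : housekeeping;
}
```

With a node covering CPUs 0-3 (0x0F) and housekeeping CPUs 0, 1, 4 and 5
(0x33), the result is CPUs 0-1 (0x03). If only CPUs 2-3 of the node remain
usable (0x0C), the intersection is empty and the result falls back to the
full housekeeping set (0x33). Recomputing the mask once a housekeeping CPU
of the node is usable again narrows the affinity back to the node, which
is the role kthreads_online_cpu() plays in the patch.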