From patchwork Mon Sep 16 22:49:16 2024
X-Patchwork-Submitter: Frederic Weisbecker
X-Patchwork-Id: 13805807
From: Frederic Weisbecker
To: LKML
Cc: Frederic Weisbecker, Andrew Morton, Kees Cook, Peter Zijlstra,
 Thomas Gleixner, Michal Hocko, Vlastimil Babka, linux-mm@kvack.org,
 "Paul E. McKenney", Neeraj Upadhyay, Joel Fernandes, Boqun Feng,
 Zqiang, rcu@vger.kernel.org
Subject: [PATCH 12/19] kthread: Default affine kthread to its preferred NUMA node
Date: Tue, 17 Sep 2024 00:49:16 +0200
Message-ID: <20240916224925.20540-13-frederic@kernel.org>
X-Mailer: git-send-email 2.46.0
In-Reply-To: <20240916224925.20540-1-frederic@kernel.org>
References: <20240916224925.20540-1-frederic@kernel.org>

Kthreads attached to a preferred NUMA node for their task structure
allocation can also be assumed to run preferably within that same node.

A more precise affinity is usually requested by calling
kthread_create_on_cpu() or kthread_bind[_mask]() before the first wakeup.

For the others, a default affinity to the node is desired. It is
sometimes implemented, with more or less success when it comes to
dealing with hotplug events and nohz_full / CPU Isolation interactions:

- kcompactd is affine to its node and handles hotplug but not CPU Isolation
- kswapd is affine to its node and ignores hotplug and CPU Isolation
- A bunch of drivers create their kthreads on a specific node and
  don't bother affining them any further.

Handle that default node affinity preference at the generic level
instead, provided a kthread is created on an actual node and doesn't
apply any specific affinity, such as a given CPU or a custom cpumask to
bind to, before its first wake-up.

This generic handling is aware of CPU hotplug events and CPU isolation
such that:

* When a housekeeping CPU goes up and is part of the node of a given
  kthread, it is added to the kthread's applied affinity set (and the
  last-resort default set of online housekeeping CPUs is possibly
  removed from it).

* When a housekeeping CPU goes down while it was part of the node of a
  kthread, it is removed from the kthread's applied affinity. The last
  resort is to affine the kthread to all online housekeeping CPUs.

Acked-by: Vlastimil Babka
Signed-off-by: Frederic Weisbecker
---
 include/linux/cpuhotplug.h |   1 +
 kernel/kthread.c           | 120 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 120 insertions(+), 1 deletion(-)

diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 9316c39260e0..89d852538b72 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -240,6 +240,7 @@ enum cpuhp_state {
 	CPUHP_AP_WORKQUEUE_ONLINE,
 	CPUHP_AP_RANDOM_ONLINE,
 	CPUHP_AP_RCUTREE_ONLINE,
+	CPUHP_AP_KTHREADS_ONLINE,
 	CPUHP_AP_BASE_CACHEINFO_ONLINE,
 	CPUHP_AP_ONLINE_DYN,
 	CPUHP_AP_ONLINE_DYN_END		= CPUHP_AP_ONLINE_DYN + 40,
diff --git a/kernel/kthread.c b/kernel/kthread.c
index ecb719f54f7a..eee5925e7725 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -35,6 +35,10 @@ static DEFINE_SPINLOCK(kthread_create_lock);
 static LIST_HEAD(kthread_create_list);
 struct task_struct *kthreadd_task;
 
+static struct cpumask kthread_online_mask;
+static LIST_HEAD(kthreads_hotplug);
+static DEFINE_MUTEX(kthreads_hotplug_lock);
+
 struct kthread_create_info
 {
 	/* Information passed to kthread() from kthreadd. */
@@ -53,6 +57,7 @@ struct kthread_create_info
 struct kthread {
 	unsigned long flags;
 	unsigned int cpu;
+	unsigned int node;
 	int started;
 	int result;
 	int (*threadfn)(void *);
@@ -64,6 +69,8 @@ struct kthread {
 #endif
 	/* To store the full name if task comm is truncated. */
 	char *full_name;
+	struct task_struct *task;
+	struct list_head hotplug_node;
 };
 
 enum KTHREAD_BITS {
@@ -122,8 +129,11 @@ bool set_kthread_struct(struct task_struct *p)
 
 	init_completion(&kthread->exited);
 	init_completion(&kthread->parked);
+	INIT_LIST_HEAD(&kthread->hotplug_node);
 	p->vfork_done = &kthread->exited;
 
+	kthread->task = p;
+	kthread->node = tsk_fork_get_node(current);
 	p->worker_private = kthread;
 	return true;
 }
@@ -314,6 +324,13 @@ void __noreturn kthread_exit(long result)
 {
 	struct kthread *kthread = to_kthread(current);
 	kthread->result = result;
+	if (!list_empty(&kthread->hotplug_node)) {
+		mutex_lock(&kthreads_hotplug_lock);
+		list_del(&kthread->hotplug_node);
+		/* Make sure the kthread never gets re-affined globally */
+		set_cpus_allowed_ptr(current, housekeeping_cpumask(HK_TYPE_KTHREAD));
+		mutex_unlock(&kthreads_hotplug_lock);
+	}
 	do_exit(0);
 }
 EXPORT_SYMBOL(kthread_exit);
@@ -339,6 +356,45 @@ void __noreturn kthread_complete_and_exit(struct completion *comp, long code)
 }
 EXPORT_SYMBOL(kthread_complete_and_exit);
 
+static void kthread_fetch_affinity(struct kthread *k, struct cpumask *mask)
+{
+	if (k->node == NUMA_NO_NODE) {
+		cpumask_copy(mask, housekeeping_cpumask(HK_TYPE_KTHREAD));
+	} else {
+		/*
+		 * The node cpumask is racy when read from kthread() but:
+		 * - a racing CPU going down won't be present in kthread_online_mask
+		 * - a racing CPU going up will be handled by kthreads_online_cpu()
+		 */
+		cpumask_and(mask, cpumask_of_node(k->node), &kthread_online_mask);
+		cpumask_and(mask, mask, housekeeping_cpumask(HK_TYPE_KTHREAD));
+		if (cpumask_empty(mask))
+			cpumask_copy(mask, housekeeping_cpumask(HK_TYPE_KTHREAD));
+	}
+}
+
+static int kthread_affine_node(void)
+{
+	struct kthread *kthread = to_kthread(current);
+	cpumask_var_t affinity;
+
+	WARN_ON_ONCE(kthread_is_per_cpu(current));
+
+	if (!zalloc_cpumask_var(&affinity, GFP_KERNEL))
+		return -ENOMEM;
+
+	mutex_lock(&kthreads_hotplug_lock);
+	WARN_ON_ONCE(!list_empty(&kthread->hotplug_node));
+	list_add_tail(&kthread->hotplug_node, &kthreads_hotplug);
+	kthread_fetch_affinity(kthread, affinity);
+	set_cpus_allowed_ptr(current, affinity);
+	mutex_unlock(&kthreads_hotplug_lock);
+
+	free_cpumask_var(affinity);
+
+	return 0;
+}
+
 static int kthread(void *_create)
 {
 	static const struct sched_param param = { .sched_priority = 0 };
@@ -369,7 +425,6 @@ static int kthread(void *_create)
 	 * back to default in case they have been changed.
 	 */
 	sched_setscheduler_nocheck(current, SCHED_NORMAL, &param);
-	set_cpus_allowed_ptr(current, housekeeping_cpumask(HK_TYPE_KTHREAD));
 
 	/* OK, tell user we're spawned, wait for stop or wakeup */
 	__set_current_state(TASK_UNINTERRUPTIBLE);
@@ -385,6 +440,9 @@ static int kthread(void *_create)
 
 	self->started = 1;
 
+	if (!(current->flags & PF_NO_SETAFFINITY))
+		kthread_affine_node();
+
 	ret = -EINTR;
 	if (!test_bit(KTHREAD_SHOULD_STOP, &self->flags)) {
 		cgroup_kthread_ready();
@@ -779,6 +837,66 @@ int kthreadd(void *unused)
 	return 0;
 }
 
+static int kthreads_hotplug_update(void)
+{
+	cpumask_var_t affinity;
+	struct kthread *k;
+	int err;
+
+	if (list_empty(&kthreads_hotplug))
+		return 0;
+
+	if (!zalloc_cpumask_var(&affinity, GFP_KERNEL))
+		return -ENOMEM;
+
+	err = 0;
+
+	list_for_each_entry(k, &kthreads_hotplug, hotplug_node) {
+		if (WARN_ON_ONCE((k->task->flags & PF_NO_SETAFFINITY) ||
+				 kthread_is_per_cpu(k->task))) {
+			err = -EINVAL;
+			continue;
+		}
+		kthread_fetch_affinity(k, affinity);
+		set_cpus_allowed_ptr(k->task, affinity);
+	}
+
+	free_cpumask_var(affinity);
+
+	return err;
+}
+
+static int kthreads_offline_cpu(unsigned int cpu)
+{
+	int ret = 0;
+
+	mutex_lock(&kthreads_hotplug_lock);
+	cpumask_clear_cpu(cpu, &kthread_online_mask);
+	ret = kthreads_hotplug_update();
+	mutex_unlock(&kthreads_hotplug_lock);
+
+	return ret;
+}
+
+static int kthreads_online_cpu(unsigned int cpu)
+{
+	int ret = 0;
+
+	mutex_lock(&kthreads_hotplug_lock);
+	cpumask_set_cpu(cpu, &kthread_online_mask);
+	ret = kthreads_hotplug_update();
+	mutex_unlock(&kthreads_hotplug_lock);
+
+	return ret;
+}
+
+static int kthreads_init(void)
+{
+	return cpuhp_setup_state(CPUHP_AP_KTHREADS_ONLINE, "kthreads:online",
+				kthreads_online_cpu, kthreads_offline_cpu);
+}
+early_initcall(kthreads_init);
+
 void __kthread_init_worker(struct kthread_worker *worker,
 				const char *name,
 				struct lock_class_key *key)
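
Illustration only, not part of the patch: the mask computation in
kthread_fetch_affinity() can be modelled in user space to check the
fallback behaviour described in the changelog. The sketch below uses a
64-bit word in place of struct cpumask and assumes a made-up two-node
topology. The names cpu_set64, MODEL_NO_NODE, MODEL_NR_NODES,
model_node_cpus and model_fetch_affinity are hypothetical and do not
exist in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cpumask: one bit per CPU. */
typedef uint64_t cpu_set64;

#define MODEL_NO_NODE	(-1)	/* plays the role of NUMA_NO_NODE */
#define MODEL_NR_NODES	2

/* Assumed topology for this example: CPUs 0-3 on node 0, CPUs 4-7 on node 1. */
static const cpu_set64 model_node_cpus[MODEL_NR_NODES] = { 0x0f, 0xf0 };

/*
 * Same shape as kthread_fetch_affinity(): start from the node's CPUs,
 * keep only those that are online and housekeeping, and fall back to the
 * whole housekeeping set if nothing is left.
 */
static cpu_set64 model_fetch_affinity(int node, cpu_set64 online,
				      cpu_set64 housekeeping)
{
	cpu_set64 mask;

	/* No preferred node: use the housekeeping set directly. */
	if (node == MODEL_NO_NODE)
		return housekeeping;

	assert(node >= 0 && node < MODEL_NR_NODES);

	mask = model_node_cpus[node] & online & housekeeping;
	if (!mask)
		mask = housekeeping;	/* last resort */

	return mask;
}
```

With housekeeping CPUs 0-5 (0x3f) and every CPU online, a kthread on
node 1 ends up on CPUs 4-5 (0x30). Once CPUs 4-7 are all offline, the
same call returns the full housekeeping set, which is the last-resort
case from the changelog.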