From patchwork Wed Dec 11 15:40:24 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Frederic Weisbecker X-Patchwork-Id: 13903662 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3D635E77180 for ; Wed, 11 Dec 2024 15:41:14 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 9D1856B0092; Wed, 11 Dec 2024 10:41:13 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 980ED6B0093; Wed, 11 Dec 2024 10:41:13 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 821AC6B0095; Wed, 11 Dec 2024 10:41:13 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 5EBFE6B0092 for ; Wed, 11 Dec 2024 10:41:13 -0500 (EST) Received: from smtpin23.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 1688E4301E for ; Wed, 11 Dec 2024 15:41:13 +0000 (UTC) X-FDA: 82883091270.23.5D8506E Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf16.hostedemail.com (Postfix) with ESMTP id 941B5180006 for ; Wed, 11 Dec 2024 15:40:47 +0000 (UTC) Authentication-Results: imf16.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b=Nc7sYBQH; dmarc=pass (policy=quarantine) header.from=kernel.org; spf=pass (imf16.hostedemail.com: domain of frederic@kernel.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=frederic@kernel.org ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1733931661; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: 
content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=2C1DesTlD5IAw4irhlcOAfgC9IFycDCvt73b0GVaSrw=; b=lWc3q8nRM1uMmCEF8TB0afh9Dov7g9/QS/N3/c76PWriATt3/RxosUI2uOR2uugU/FVDuX YCNy/vIelgEGqmsJaJa5ZJYweKFYzqIt5NyYVNflKV53jyg5sKbVEnoxPUCffq+F9NFg80 I//z1hwsyHghaOcJNBDS0pdiurWz1LA= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1733931661; a=rsa-sha256; cv=none; b=V/Qts6MmVsLZh+fz4UTxKqbx+DmD40QdgWP8Ix5Hu5V7943Vf3hszqVRBFZpHAGfA2/y/o BuYvpQGctdag8AakolhETyGAWwsHlrR7guYj6enlsatH/ycvTmvqD3kPKnZ2vax3dWtI74 U9lCMnoEOt9loS7o4Ar13MHHvLP7ed4= ARC-Authentication-Results: i=1; imf16.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b=Nc7sYBQH; dmarc=pass (policy=quarantine) header.from=kernel.org; spf=pass (imf16.hostedemail.com: domain of frederic@kernel.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=frederic@kernel.org Received: from smtp.kernel.org (transwarp.subspace.kernel.org [100.75.92.58]) by dfw.source.kernel.org (Postfix) with ESMTP id B36E85C041E; Wed, 11 Dec 2024 15:40:27 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id A2DA0C4CEDD; Wed, 11 Dec 2024 15:41:06 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1733931669; bh=2qfuTV6k+A8iDq1teARtVDQzBAoKchz+tMTAsMIX4+E=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=Nc7sYBQHWsneHm+RDymrniaj4mitPN7j7ea2YdKP9niGyvkp1gXpfHct8rHMa0YrG f6feb1+HULCsQIuaNNjRvMFIisKqI5UDVEynQ6L5LQorZ7Z4wG2lAGhay7NKO1y+cZ aGyNBpPV21LSFM/FPM+6Itrf7nXwHXtZoCi83bikfpV03LmnrcEb+3dQhBHJ9DVuOq F/EcAj7YWWA1devSxAGgVuSlW7rRIm4sjiHDLig0WcerfXUSeH/wrXyX9k3PN5c6Xb +6R/3najAYR04xWmYNvgUQ6yrWiHnlSeTdr+8FYOUhYju18hmYs9M/HtOtrTv8Z8Ck 1kcjrlunevEMA== From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker , Andrew Morton , Kees Cook , Peter Zijlstra , Thomas Gleixner , Michal Hocko , Vlastimil Babka , linux-mm@kvack.org, "Paul E. 
McKenney" , Neeraj Upadhyay , Joel Fernandes , Boqun Feng , Zqiang , rcu@vger.kernel.org, Uladzislau Rezki Subject: [PATCH 11/19] kthread: Make sure kthread hasn't started while binding it Date: Wed, 11 Dec 2024 16:40:24 +0100 Message-ID: <20241211154035.75565-12-frederic@kernel.org> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20241211154035.75565-1-frederic@kernel.org> References: <20241211154035.75565-1-frederic@kernel.org> MIME-Version: 1.0 X-Stat-Signature: 84p8i34o78qd3nboyrkj5cwo8zwu6xy6 X-Rspamd-Queue-Id: 941B5180006 X-Rspam-User: X-Rspamd-Server: rspam01 X-HE-Tag: 1733931647-759300 X-HE-Meta: U2FsdGVkX1/4jwOu2+AWvO1YXc1N06fE5T9ZVJ2ds4yg+C5YTa6Xg7yuSaTlcy7ZmbXiwMuBytNbJVLEL2tp2NdcOdlkZRxVNTrOtnT0y9NQNLha6T+gftXqwgj91eQrerfLOOTqO4KAF/TNgMi9jSEdOtCPjVWtdrctfIirEba8JKDyhYgpBYrkkesLZwAzw5GcBGpCsuhspr775aNodZrmLKoLt2CmRSlXRky21gL2Nad7OzKXLfikMDn+SUxEScFVojeJxoyqe+s4ZdlE29KzMPCvA30Rw1Q2Fzw39uQENFqDjNTAoKO8Mry6UEXGiybMn3os2gZyBM66/LYK3e8NAeZYZDOmGXqjLq6mq4BqmkF27xKQtMyNBKCku3umY/mairqwThof+YS4FKMevSvhhbO3uCHor1ULuPhNJJpyIn2E4YOauWNMjHj2l8ngDLOtW9UybujExQaHJM0WFnOo3B07X4SPaqmvSHwKRqcDyFBPYQb8N+7B0UkkIW2dtQd5pmrKlLX9tl+/dHZjikzUTxbasFyi7Yc8sOXsTTDfEkjOldFXYkbxImTzq9wvsSk+oQIAfsaw0dyPLWZU429v6RPjew4036EkP6t5Pinx/pwtayms3Myc+y/2h8rE1lZc1uT47DmQXE+A9gI/KvuE7pgeiCGaWsrokRHBFEOjkNFrjKTTOs1L40lqSwtzXD9Pp29VfjNHZBfZgADeN9Cr5rh7wHN8oQRAQujgLJ61/+412N5UevKcgNIAhtW9/aPNn0nWpCJ+oK2P21ttfwcN0hgI5shuNhxRDP1UlBZHHDNFxMyPKSAiGVV5D93qx4UvbWdRkQr9bOcZTW/thvgP3cjPsiwgJ+mcawbn1xDu0ONRmR2qE4UV61JOy+sN+s/VMokdlgOPYzxrHTLAuE2p77fhH2bSbNKuXKHaaY4WnpxaigcgvBpLZY7mq0m5xg9XxH7qVkvtcPUNUWG rYGlrb5C wESpsuSXxODDK4LqQt1VhD3KYwYHSvU793fa0CKa1swLbuFdkrp2cPMrhyVaNSb6i65x58zsnWQh7m3pqER5/sHsuROXcfD2IOdkeKbb8/8OJG3hOX7myp9xC3x2RwBtnUNRp8dDQi5l51onX5eaKa25WZXNmEGCd5R90TZWh0fW9KyqsPZnGNYPCxCr0oGNPGuwawtDtSuy/CDwM+PJYN4fhPAAnKOzqPrROciA8fBbffd3hxOEcGTV48ZpmoQw6Kcqf1cS3sGnalHT7RldYvQd1aD7FpJX26BXOHlbPz5SmBqP/n/dqUWtx6g== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: 
owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe:

Make sure the kthread is sleeping in the schedule_preempt_disabled()
call before calling its handler when kthread_bind[_mask]() is called on
it. This provides a sanity check verifying that the task is not randomly
blocked later at some point within its function handler, in which case
it could be concurrently awakened, leaving the call to
do_set_cpus_allowed() without any effect until the next voluntary sleep.

Rely on the wake-up ordering to ensure that the newly introduced
"started" field returns the expected value:

    TASK A                                   TASK B
    ------                                   ------
READ kthread->started
wake_up_process(B)
   rq_lock()
   ...
   rq_unlock() // RELEASE
                                           schedule()
                                              rq_lock() // ACQUIRE
                                              // schedule task B
                                              rq_unlock()
                                              WRITE kthread->started

Similarly, writing kthread->started before subsequent voluntary sleeps
will be visible after calling wait_task_inactive() in
__kthread_bind_mask(), reporting potential misuse of the API.

Upcoming patches will make further use of this facility.

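The sanity check described above can be pictured with a small user-space model. This is only a sketch under stated assumptions: struct kthread_model, model_first_wakeup() and model_bind_mask() are hypothetical names that do not exist in the kernel, the affinity mask is a plain unsigned long, and the scheduler ordering shown in the diagram is not modelled at all. It only shows which call order would trip the WARN_ON_ONCE(kthread->started) check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the kernel's struct kthread. */
struct kthread_model {
	int started;           /* set once the thread left its initial sleep */
	unsigned long allowed; /* toy affinity mask, one bit per CPU */
};

/* Models the point in kthread() right after schedule_preempt_disabled(). */
void model_first_wakeup(struct kthread_model *k)
{
	assert(k != NULL);
	k->started = 1;
}

/*
 * Models kthread_bind_mask(): apply the mask, then report whether the
 * WARN_ON_ONCE(kthread->started) check would have fired (1) or not (0).
 */
int model_bind_mask(struct kthread_model *k, unsigned long mask)
{
	assert(k != NULL);
	k->allowed = mask;
	return k->started != 0;
}
```

In this model, binding before model_first_wakeup() reports no warning, while binding afterwards reports one, which mirrors the API misuse the patch wants to catch.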
Acked-by: Vlastimil Babka
Signed-off-by: Frederic Weisbecker
---
 kernel/kthread.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index a5ac612b1609..b6f9ce475a4f 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -53,6 +53,7 @@ struct kthread_create_info
 struct kthread {
 	unsigned long flags;
 	unsigned int cpu;
+	int started;
 	int result;
 	int (*threadfn)(void *);
 	void *data;
@@ -382,6 +383,8 @@ static int kthread(void *_create)
 	schedule_preempt_disabled();
 	preempt_enable();
 
+	self->started = 1;
+
 	ret = -EINTR;
 	if (!test_bit(KTHREAD_SHOULD_STOP, &self->flags)) {
 		cgroup_kthread_ready();
@@ -540,7 +543,9 @@ static void __kthread_bind(struct task_struct *p, unsigned int cpu, unsigned int
 
 void kthread_bind_mask(struct task_struct *p, const struct cpumask *mask)
 {
+	struct kthread *kthread = to_kthread(p);
 	__kthread_bind_mask(p, mask, TASK_UNINTERRUPTIBLE);
+	WARN_ON_ONCE(kthread->started);
 }
 
 /**
@@ -554,7 +559,9 @@ void kthread_bind_mask(struct task_struct *p, const struct cpumask *mask)
  */
 void kthread_bind(struct task_struct *p, unsigned int cpu)
 {
+	struct kthread *kthread = to_kthread(p);
 	__kthread_bind(p, cpu, TASK_UNINTERRUPTIBLE);
+	WARN_ON_ONCE(kthread->started);
 }
 EXPORT_SYMBOL(kthread_bind);
 

From patchwork Wed Dec 11 15:40:25 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Frederic Weisbecker X-Patchwork-Id: 13903663 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id BBE79E77180 for ; Wed, 11 Dec 2024 15:41:17 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 53D6F6B0095; Wed, 11 Dec 2024 10:41:17 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 4EF236B0098; Wed, 11 Dec 2024 10:41:17 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org 
Received: by kanga.kvack.org (Postfix, from userid 63042) id 367F86B0099; Wed, 11 Dec 2024 10:41:17 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 11B566B0095 for ; Wed, 11 Dec 2024 10:41:17 -0500 (EST) Received: from smtpin14.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id 8064CAF310 for ; Wed, 11 Dec 2024 15:41:16 +0000 (UTC) X-FDA: 82883091942.14.3C56319 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf24.hostedemail.com (Postfix) with ESMTP id C56B418000F for ; Wed, 11 Dec 2024 15:41:11 +0000 (UTC) Authentication-Results: imf24.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b=cHTxkkVn; spf=pass (imf24.hostedemail.com: domain of frederic@kernel.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=frederic@kernel.org; dmarc=pass (policy=quarantine) header.from=kernel.org ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1733931650; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=onQI45ii58UNueXOSDyL3wz/jz1jpp4THZgMbgek52o=; b=8FVocDAxw0VqZ4z7yx7kO8R2Jhp+YJwgMuBGA2BP2MsDkOYYuDfCPNvL6ci/a8XuZ76SGl O2jytIR1UJO0lVqGpcZVilzrmFATdCxdrXVP5T3wz5qhsj2ytRi0V/DQeWJoinECDBmDEV YTrbsT+EZY2xlPTnBkYaD+CIQviZFEs= ARC-Authentication-Results: i=1; imf24.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b=cHTxkkVn; spf=pass (imf24.hostedemail.com: domain of frederic@kernel.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=frederic@kernel.org; dmarc=pass (policy=quarantine) header.from=kernel.org ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; 
t=1733931650; a=rsa-sha256; cv=none; b=ndeivQpvEbMDqX0xP4ZzmXLvCuldLtR9FV2e2dNBGf0sKt8g6EuTWikXLIDcHGE1IrBDca L2bY7inPlrDUumuX4QcqFH9mjvx5l5MSfMcN0Ox7MnkhPXiC61U+JO7u9KP3yAjMKMKUAf iaHxY+PAhwTwUdMvBWgr4dSRrmrJsUY= Received: from smtp.kernel.org (transwarp.subspace.kernel.org [100.75.92.58]) by dfw.source.kernel.org (Postfix) with ESMTP id 7BCBD5C041E; Wed, 11 Dec 2024 15:40:31 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 4FD8BC4CEDE; Wed, 11 Dec 2024 15:41:10 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1733931673; bh=Zl9E3LG4ePzB3S1zktQqMaaRMvPxhBEbf67CVv5m/KE=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=cHTxkkVnesefVanc4+eZ3LWwfNJWnYX7m658TiDwNSPmtX/3AnFpkbX4TrJ/ImQy4 Hu4WVheUhsjZawEfpaSNjDmofDHG5tL8WAJgb9pFg0vauoXffLb/bjDBeXVP/ymgrU KJyu5XUCrwo0usqm95LRig9tq26ZTpe/WlQb8Wbf2RsIkj4GLJlDVQTjZfIPIL+cZ9 tX08hMyrBYYVU32XsWlR3jaEdJkBbRiQkecnHN9+b9UVv/zxjT7HbRNknY0gLixVD+ Bz66hGcuGvyv1oHODsCFMOSYEwWtYO3HEfm+O5JppuSLhmsIA0tbIF2YFQtCWwckYE o1Iw05m/KSfCg== From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker , Andrew Morton , Kees Cook , Peter Zijlstra , Thomas Gleixner , Michal Hocko , Vlastimil Babka , linux-mm@kvack.org, "Paul E. 
McKenney" , Neeraj Upadhyay , Joel Fernandes , Boqun Feng , Uladzislau Rezki , Zqiang , rcu@vger.kernel.org Subject: [PATCH 12/19] kthread: Default affine kthread to its preferred NUMA node Date: Wed, 11 Dec 2024 16:40:25 +0100 Message-ID: <20241211154035.75565-13-frederic@kernel.org> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20241211154035.75565-1-frederic@kernel.org> References: <20241211154035.75565-1-frederic@kernel.org> MIME-Version: 1.0 X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: C56B418000F X-Rspam-User: X-Stat-Signature: 3kgk1qnopwzueobbqoe47pyk3yqzi5ue X-HE-Tag: 1733931671-843804 X-HE-Meta: U2FsdGVkX1+Lj2hMdCft+yGyWe0ivwLhEKCiFTeRa7PfbizG1UizON1ZxV0CnDlG/ELwHfvAziq6zeyO89XCPeJ5mwvAreDmp6teYXEaM5dcoYdCMMG8z6Vt9Oz0XECv6sW9WVECbqfxl6EAQe9I72X1NTUs7q95zr+chz4DjCSrb5LGYQJ7BLmqILyAKKcVxv6wQ/G1xJ77D3QuWhci31/5+W0OC5GlOMGoSeKGj24cMIaxBcBkGqw9TtyzTyge2+p3jESIxNb+RkkPzCkiArrdfMqlvDfHRHWCWjo3I86n9c3oAnG8SV96vyZsGoIM+obRQsDj2J2f33oQUgzlpX7NQBqmF4IYGUw5JZHLjv1QMJyhlkTlcyg9PyLPBndHv4vxzwJS2gutZLawg6mGxT98nfui85BPipVXaJemUd7+e4xHzJBx/ucYT0qGxX0JGmLsOaIqsLUZloa0JYikzTYvE5DWI0AglVYCLM5muw/ScouCm/uyeW/nJZotS+3iCcSwu1Azrih9hrsVtefB1ld320jd4ddqRURNkqC1xyFzz0hHyqv+ApImVN5CYOSeys4looZOceveor+DCoVImnBZkYh0D5utvgPseiwQFaz/1vVbNYcXccLlmvvzBZORcEcTfKARf0pSdbkRIoM4JBI71YqrXMSdKHs4MRrDXldsQXz6hhKelxangZ5QOkoZsJjdyRkznZqJ/eBRAdjwD/g8kbkFOuGrP8GNGGQCbfbF+S48733//zMewVqfv7yNVa5Gv863hm+lkogbS2OKNiMyYp7JIXxaNPkFlAmbVXa5AyU63K+vLLk+O/keK8IMCTH+fPBdFuK+Q4CYQ1rdBWg9rNRl+/KkPOM72ypvBqsFK1yTxw6zN414lc+Kz/WwCAHMMGVBNmvOuYaQj4y6809eXt/BV0WpxTJRfI++3uL+Ypv5CBdLJVsf5zbX7cu6lr6HUEbqCB/J+IR5Le4 lfeNFNPK wm/Ywf0JXILm2QXjrLtB02Oj4pw9XIyu1Lo3Hf9FFt7IlncMOOl5MwPc4kRSd0YQ+VXnYqJeBqUjpW29oGNNvz8zxIk73JhqM5ZPeV7rN9hCC2HjX3f1VJEgsWVQi8a+WPXwUNFEpsygvpGC82J8s2eQvSyh/0GppaGN2fQJ4Z6OmEexwnYoJzwSAyE9bdbYwvBLDvu+kG1ezYlbHuhuYbV8Ciz6Auy9vtcu9I8hzgaRheZGWlskX7F6H0By/tk8alYJVr2ENR0Cgea1D0bg0ZLhJtOqfv0HJ8LKcc1tXsMxm6BW8XzDlmETyUw== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: 
owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe:

Kthreads attached to a preferred NUMA node for their task structure
allocation can also be assumed to run preferably within that same node.

A more precise affinity is usually requested by calling
kthread_create_on_cpu() or kthread_bind[_mask]() before the first
wakeup.

For the others, a default affinity to the node is desired and sometimes
implemented, with more or less success when it comes to dealing with
hotplug events and nohz_full / CPU Isolation interactions:

- kcompactd is affine to its node and handles hotplug but not CPU
  Isolation
- kswapd is affine to its node and ignores hotplug and CPU Isolation
- A bunch of drivers create their kthreads on a specific node and
  don't bother affining them any further.

Handle that default node affinity preference at the generic level
instead, provided a kthread is created on an actual node and doesn't
apply any specific affinity such as a given CPU or a custom cpumask to
bind to before its first wake-up.

This generic handling is aware of CPU hotplug events and CPU isolation
such that:

* When a housekeeping CPU goes up that is part of the node of a given
  kthread, the related task is re-affined to its own node if it was
  previously running on the default last resort online housekeeping set
  from other nodes.

* When a housekeeping CPU goes down while it was part of the node of a
  kthread, the running task is migrated (or the sleeping task is woken
  up) automatically by the scheduler to other housekeepers within the
  same node or, as a last resort, to all housekeepers from other nodes.

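The affinity rule described above (node CPUs restricted to housekeepers, with all housekeepers as a last resort) can be sketched in user space with plain 64-bit bitmasks. This is an illustration under stated assumptions, not the kernel code: cpumask_model_t and model_fetch_affinity() are hypothetical names, one bit stands for one CPU, node_cpus stands for the CPUs of the node that are currently usable, and the housekeeping mask is assumed to be non-empty.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy cpumask: bit N set means CPU N is in the mask. */
typedef uint64_t cpumask_model_t;

/*
 * Prefer the housekeeping CPUs of the kthread's node. If the node has no
 * housekeeping CPU available, fall back to every housekeeping CPU.
 */
cpumask_model_t model_fetch_affinity(cpumask_model_t node_cpus,
				     cpumask_model_t housekeeping)
{
	cpumask_model_t mask;

	/* Assumption of this model: at least one housekeeping CPU exists. */
	assert(housekeeping != 0);

	mask = node_cpus & housekeeping;
	return mask ? mask : housekeeping;
}
```

For instance, with node CPUs 4-7 (0xF0) and housekeeping CPUs 0-1 and 4-5 (0x33), the model picks CPUs 4-5. If the housekeepers are CPUs 0-3 only (0x0F), the intersection is empty and the model falls back to CPUs 0-3. Recomputing the mask when a CPU comes online is what lets a kthread return to its own node in this picture.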
Acked-by: Vlastimil Babka
Signed-off-by: Frederic Weisbecker
---
 include/linux/cpuhotplug.h |   1 +
 kernel/kthread.c           | 106 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 106 insertions(+), 1 deletion(-)

diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index a04b73c40173..6cc5e484547c 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -240,6 +240,7 @@ enum cpuhp_state {
 	CPUHP_AP_WORKQUEUE_ONLINE,
 	CPUHP_AP_RANDOM_ONLINE,
 	CPUHP_AP_RCUTREE_ONLINE,
+	CPUHP_AP_KTHREADS_ONLINE,
 	CPUHP_AP_BASE_CACHEINFO_ONLINE,
 	CPUHP_AP_ONLINE_DYN,
 	CPUHP_AP_ONLINE_DYN_END		= CPUHP_AP_ONLINE_DYN + 40,
diff --git a/kernel/kthread.c b/kernel/kthread.c
index b6f9ce475a4f..3394ff024a5a 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -35,6 +35,9 @@ static DEFINE_SPINLOCK(kthread_create_lock);
 static LIST_HEAD(kthread_create_list);
 struct task_struct *kthreadd_task;
 
+static LIST_HEAD(kthreads_hotplug);
+static DEFINE_MUTEX(kthreads_hotplug_lock);
+
 struct kthread_create_info
 {
 	/* Information passed to kthread() from kthreadd. */
@@ -53,6 +56,7 @@ struct kthread_create_info
 struct kthread {
 	unsigned long flags;
 	unsigned int cpu;
+	unsigned int node;
 	int started;
 	int result;
 	int (*threadfn)(void *);
@@ -64,6 +68,8 @@ struct kthread {
 #endif
 	/* To store the full name if task comm is truncated. */
 	char *full_name;
+	struct task_struct *task;
+	struct list_head hotplug_node;
 };
 
 enum KTHREAD_BITS {
@@ -122,8 +128,11 @@ bool set_kthread_struct(struct task_struct *p)
 
 	init_completion(&kthread->exited);
 	init_completion(&kthread->parked);
+	INIT_LIST_HEAD(&kthread->hotplug_node);
 	p->vfork_done = &kthread->exited;
 
+	kthread->task = p;
+	kthread->node = tsk_fork_get_node(current);
 	p->worker_private = kthread;
 	return true;
 }
@@ -314,6 +323,11 @@ void __noreturn kthread_exit(long result)
 {
 	struct kthread *kthread = to_kthread(current);
 	kthread->result = result;
+	if (!list_empty(&kthread->hotplug_node)) {
+		mutex_lock(&kthreads_hotplug_lock);
+		list_del(&kthread->hotplug_node);
+		mutex_unlock(&kthreads_hotplug_lock);
+	}
 	do_exit(0);
 }
 EXPORT_SYMBOL(kthread_exit);
@@ -339,6 +353,48 @@ void __noreturn kthread_complete_and_exit(struct completion *comp, long code)
 }
 EXPORT_SYMBOL(kthread_complete_and_exit);
 
+static void kthread_fetch_affinity(struct kthread *kthread, struct cpumask *cpumask)
+{
+	cpumask_and(cpumask, cpumask_of_node(kthread->node),
+		    housekeeping_cpumask(HK_TYPE_KTHREAD));
+
+	if (cpumask_empty(cpumask))
+		cpumask_copy(cpumask, housekeeping_cpumask(HK_TYPE_KTHREAD));
+}
+
+static void kthread_affine_node(void)
+{
+	struct kthread *kthread = to_kthread(current);
+	cpumask_var_t affinity;
+
+	WARN_ON_ONCE(kthread_is_per_cpu(current));
+
+	if (kthread->node == NUMA_NO_NODE) {
+		housekeeping_affine(current, HK_TYPE_KTHREAD);
+	} else {
+		if (!zalloc_cpumask_var(&affinity, GFP_KERNEL)) {
+			WARN_ON_ONCE(1);
+			return;
+		}
+
+		mutex_lock(&kthreads_hotplug_lock);
+		WARN_ON_ONCE(!list_empty(&kthread->hotplug_node));
+		list_add_tail(&kthread->hotplug_node, &kthreads_hotplug);
+		/*
+		 * The node cpumask is racy when read from kthread() but:
+		 * - a racing CPU going down will either fail on the subsequent
+		 *   call to set_cpus_allowed_ptr() or be migrated to housekeepers
+		 *   afterwards by the scheduler.
+		 * - a racing CPU going up will be handled by kthreads_online_cpu()
+		 */
+		kthread_fetch_affinity(kthread, affinity);
+		set_cpus_allowed_ptr(current, affinity);
+		mutex_unlock(&kthreads_hotplug_lock);
+
+		free_cpumask_var(affinity);
+	}
+}
+
 static int kthread(void *_create)
 {
 	static const struct sched_param param = { .sched_priority = 0 };
@@ -369,7 +425,6 @@ static int kthread(void *_create)
 	 * back to default in case they have been changed.
 	 */
 	sched_setscheduler_nocheck(current, SCHED_NORMAL, &param);
-	set_cpus_allowed_ptr(current, housekeeping_cpumask(HK_TYPE_KTHREAD));
 
 	/* OK, tell user we're spawned, wait for stop or wakeup */
 	__set_current_state(TASK_UNINTERRUPTIBLE);
@@ -385,6 +440,9 @@ static int kthread(void *_create)
 
 	self->started = 1;
 
+	if (!(current->flags & PF_NO_SETAFFINITY))
+		kthread_affine_node();
+
 	ret = -EINTR;
 	if (!test_bit(KTHREAD_SHOULD_STOP, &self->flags)) {
 		cgroup_kthread_ready();
@@ -781,6 +839,52 @@ int kthreadd(void *unused)
 	return 0;
 }
 
+/*
+ * Re-affine kthreads according to their preferences
+ * and the newly online CPU. The CPU down part is handled
+ * by select_fallback_rq() which default re-affines to
+ * housekeepers in case the preferred affinity doesn't
+ * apply anymore.
+ */
+static int kthreads_online_cpu(unsigned int cpu)
+{
+	cpumask_var_t affinity;
+	struct kthread *k;
+	int ret;
+
+	guard(mutex)(&kthreads_hotplug_lock);
+
+	if (list_empty(&kthreads_hotplug))
+		return 0;
+
+	if (!zalloc_cpumask_var(&affinity, GFP_KERNEL))
+		return -ENOMEM;
+
+	ret = 0;
+
+	list_for_each_entry(k, &kthreads_hotplug, hotplug_node) {
+		if (WARN_ON_ONCE((k->task->flags & PF_NO_SETAFFINITY) ||
+				 kthread_is_per_cpu(k->task) ||
+				 k->node == NUMA_NO_NODE)) {
+			ret = -EINVAL;
+			continue;
+		}
+		kthread_fetch_affinity(k, affinity);
+		set_cpus_allowed_ptr(k->task, affinity);
+	}
+
+	free_cpumask_var(affinity);
+
+	return ret;
+}
+
+static int kthreads_init(void)
+{
+	return cpuhp_setup_state(CPUHP_AP_KTHREADS_ONLINE, "kthreads:online",
+				kthreads_online_cpu, NULL);
+}
+early_initcall(kthreads_init);
+
 void __kthread_init_worker(struct kthread_worker *worker,
 				const char *name,
 				struct lock_class_key *key)

From patchwork Wed Dec 11 15:40:26 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Frederic Weisbecker X-Patchwork-Id: 13903664 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 832CCE7717D for ; Wed, 11 Dec 2024 15:41:20 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 0F97A6B0099; Wed, 11 Dec 2024 10:41:20 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 0A7F86B009A; Wed, 11 Dec 2024 10:41:20 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E8BAE6B009B; Wed, 11 Dec 2024 10:41:19 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id C2AD86B0099 for ; Wed, 11 Dec 2024 10:41:19 -0500 (EST) Received: 
from smtpin28.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id 2ADCF140EDD for ; Wed, 11 Dec 2024 15:41:19 +0000 (UTC) X-FDA: 82883091984.28.BFEF461 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf14.hostedemail.com (Postfix) with ESMTP id CEEE0100013 for ; Wed, 11 Dec 2024 15:40:51 +0000 (UTC) Authentication-Results: imf14.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b=oeq3o9e2; dmarc=pass (policy=quarantine) header.from=kernel.org; spf=pass (imf14.hostedemail.com: domain of frederic@kernel.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=frederic@kernel.org ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1733931655; a=rsa-sha256; cv=none; b=h6F7+O8fsfQrGrzj3I6Ou6JRjqiwgE8DYihtNjB4n4AZErxF/Qz85tuxm6Zjjg7pDxlSjh dzvx/coQHEa1h7D0VmAf0Y+M80WY0tiwKfCkSvD2VvgFw8T6xT/crQ8UUP6nUGcwIvzVsi ArjGinqq9KjXCLnHaoULMEX+reMHXqI= ARC-Authentication-Results: i=1; imf14.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b=oeq3o9e2; dmarc=pass (policy=quarantine) header.from=kernel.org; spf=pass (imf14.hostedemail.com: domain of frederic@kernel.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=frederic@kernel.org ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1733931655; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=QMfzeDE6TO0jUOHFKLhVcYsWmCqJHbXGFSX6Ot+8q1Q=; b=fh8eYqttayga1woYNwxXMsg5tOrtbo28ca/v1H5C91bUYOWHrVSgPrx4m++uFb8v13mJR8 Fmw5BTSveEbmf/ACX9dODaUzme4Kcwj9tXnyMushFWyz/51u/HXgasnV9nR2Tap1NUu1yW UzUCSGrvLAIuIwXfaXm+dzFx4vDCn0M= Received: from smtp.kernel.org (transwarp.subspace.kernel.org [100.75.92.58]) by dfw.source.kernel.org (Postfix) with ESMTP id 
2D9045C62D8; Wed, 11 Dec 2024 15:40:34 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 10F0FC4CEDD; Wed, 11 Dec 2024 15:41:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1733931676; bh=7BtrEplH3qQrg9kZixHgQN9470zkhnd7tdHddmuV+9U=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=oeq3o9e2NAUqySEkDCKQkStfvImclCo4E0hGcJqRAGbj2T9C+ll7Pdcz8Laf6EWV/ vxe1zptvJkQR4sEtfJVWkpF/zzUUytTpDonSs/vH0n1eGMp2fuPaW3G4NDw5xJUDSR ArT/hV5/GrlRsYvSno3sTVFPCNF2P7/6dtu9NARfeNTGjNeKmIwCiBpdfA9PpHVrxG fK02SjiY7i2t2uV95xavpTQ064kY9vagR6FnOu21YVfXpUyUZqDxb8nYfmRP1v4KO5 Wcl4MBxrNQuY8p6UXqV1Zz9YvPUpnrau1q2WUb1dODjm2Dq0IosI4rjLfErpjc4sxV 88UdVL2bSSh4g== From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker , Michal Hocko , Vlastimil Babka , Andrew Morton , linux-mm@kvack.org, Peter Zijlstra , Thomas Gleixner , Michal Hocko Subject: [PATCH 13/19] mm: Create/affine kcompactd to its preferred node Date: Wed, 11 Dec 2024 16:40:26 +0100 Message-ID: <20241211154035.75565-14-frederic@kernel.org> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20241211154035.75565-1-frederic@kernel.org> References: <20241211154035.75565-1-frederic@kernel.org> MIME-Version: 1.0 X-Stat-Signature: ux334kspgxhm958jpfbbzmc8crxygxtw X-Rspam-User: X-Rspamd-Queue-Id: CEEE0100013 X-Rspamd-Server: rspam08 X-HE-Tag: 1733931651-822108 X-HE-Meta: 
U2FsdGVkX19zvcaebzAa+LhL1xweLxSjqKJ9WJO9Ah/86PLa+kXQsSpnE6duhCTY/UHPPFCPfe21+hem1PY9VVSoHonzXaONa+hQeJdHZmXCr9Xz4Wn3ODz6wmwRHedo2DE0gR2+vJEcPlGx6hU+ta43O6Ov3OlErAkxscNkxs7UdHL1gAK06OxcVa73di6Pq2RLvkJkpENolNKqo/hhABLTbInngLWI+BGaCuuxBA352e9jp34K/NlZoPPTpCOGkwdmidaUG5A08T70Kat0KzRXSCkoAqQrcnGG76ii8QGTiGNbLGohGVB3XDuGoDnrCQEeq10/8ZjzKUZ102JkWbaniw+dqlw3ry5wsmnYAM+SkFVet61ERan7RDACXCWtOI7W4vyyUdhnrZtZMfUqKU0Pux51eHVIbYZeJIF6bHaqxKJNuCWm0ORE5FzJck/bLNIHco1fC7wzAGdk1HSqho37TMukQ5MGprXCxVtyGxiTZwTO8MXvirRlk9k/PJlWaGj3JjCu+LcG4oAOS/TvI/CXuPPNEjBCgBtDuoRMXJtZWkMVjQ5bv4cN/Fdu+hxXaQsnMJZnZGTb3ZDcRBC8Kp5wP14QUB26U8JYnnNuf1MeFal5vYYRZhBG/uQVrMKugJKCIeEG06wISy0xB/oGfb1cqeJCahE1w2V6eWUXKCTJogAgN4BxQI+41mGCEjPdEBZ4BL+e1+thXQxwE71TsTS50cAs+6eiZaGex6MKsmmiT+hZNLLxNc34uUHpZZFsMvLE98yaplxJFXCyQ8UZ6sDtmw5ZVGJpEoPyfSlsPmlQENeuDFPgcyKesrvtBZQSRUkOCNZgrXXt9O0Ctd/aEdC/1chMhN+pte70tUNwUeITLiMgNWNWmjZFZ8whHMPSrDeSJkRx9J/tR0esDEia+8Xz1/adHUHT95R7AsBRsy6Ra60eB4VzOGjAwLxH1amQaGyTB7nUth+qS/BKyT4 pD8xFJZT JRCwqqAwph/Hs2s6vxBWBbyxZ4phJv6MJ/FlXGj11Em8AEQ8tOAEFzFLAsmAQa/XKuNGCBi5ZOOneLV5udJXeWh4FO6Qk3XnJA/iyBvlveZqeSFF2ipQX8GZYdysMGbWfQVqRnEs62DA97CAzdsYMrlEQ46iCeXPv7zpkBYlZk1ZWwwSkkfDD136WP1D8NcC4+vwKO8pA/l0Z/UqY3KYl5Y+UVu6SdOIZCfflPjRY/pPwhy9lx+jEDgB22h+fIkquklwYkTx/QcWcQlbaVRUQTPKS7NQePQInqN4Q X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe:

Kcompactd is dedicated to a specific node. As such, it should preferably
be affine to that node, both memory-wise and CPU-wise.

Use the proper kthread API to achieve that. As a bonus, the kthread API
takes care of CPU-hotplug events and CPU isolation on its behalf.

Acked-by: Vlastimil Babka
Acked-by: Michal Hocko
Signed-off-by: Frederic Weisbecker
---
 mm/compaction.c | 43 +++----------------------------------------
 1 file changed, 3 insertions(+), 40 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index a2b16b08cbbf..a31c0f5758cf 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -3154,15 +3154,9 @@ void wakeup_kcompactd(pg_data_t *pgdat, int order, int highest_zoneidx)
 static int kcompactd(void *p)
 {
 	pg_data_t *pgdat = (pg_data_t *)p;
-	struct task_struct *tsk = current;
 	long default_timeout = msecs_to_jiffies(HPAGE_FRAG_CHECK_INTERVAL_MSEC);
 	long timeout = default_timeout;
 
-	const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id);
-
-	if (!cpumask_empty(cpumask))
-		set_cpus_allowed_ptr(tsk, cpumask);
-
 	set_freezable();
 
 	pgdat->kcompactd_max_order = 0;
@@ -3233,10 +3227,12 @@ void __meminit kcompactd_run(int nid)
 	if (pgdat->kcompactd)
 		return;
 
-	pgdat->kcompactd = kthread_run(kcompactd, pgdat, "kcompactd%d", nid);
+	pgdat->kcompactd = kthread_create_on_node(kcompactd, pgdat, nid, "kcompactd%d", nid);
 	if (IS_ERR(pgdat->kcompactd)) {
 		pr_err("Failed to start kcompactd on node %d\n", nid);
 		pgdat->kcompactd = NULL;
+	} else {
+		wake_up_process(pgdat->kcompactd);
 	}
 }
 
@@ -3254,30 +3250,6 @@ void __meminit kcompactd_stop(int nid)
 	}
 }
 
-/*
- * It's optimal to keep kcompactd on the same CPUs as their memory, but
- * not required for correctness. So if the last cpu in a node goes
- * away, we get changed to run anywhere: as the first one comes back,
- * restore their cpu bindings.
- */
-static int kcompactd_cpu_online(unsigned int cpu)
-{
-	int nid;
-
-	for_each_node_state(nid, N_MEMORY) {
-		pg_data_t *pgdat = NODE_DATA(nid);
-		const struct cpumask *mask;
-
-		mask = cpumask_of_node(pgdat->node_id);
-
-		if (cpumask_any_and(cpu_online_mask, mask) < nr_cpu_ids)
-			/* One of our CPUs online: restore mask */
-			if (pgdat->kcompactd)
-				set_cpus_allowed_ptr(pgdat->kcompactd, mask);
-	}
-	return 0;
-}
-
 static int proc_dointvec_minmax_warn_RT_change(const struct ctl_table *table,
 		int write, void *buffer, size_t *lenp, loff_t *ppos)
 {
@@ -3337,15 +3309,6 @@ static struct ctl_table vm_compaction[] = {
 static int __init kcompactd_init(void)
 {
 	int nid;
-	int ret;
-
-	ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
-					"mm/compaction:online",
-					kcompactd_cpu_online, NULL);
-	if (ret < 0) {
-		pr_err("kcompactd: failed to register hotplug callbacks.\n");
-		return ret;
-	}
 
 	for_each_node_state(nid, N_MEMORY)
 		kcompactd_run(nid);

From patchwork Wed Dec 11 15:40:27 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Frederic Weisbecker X-Patchwork-Id: 13903665 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5D603E77182 for ; Wed, 11 Dec 2024 15:41:22 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id DCCAC6B009A; Wed, 11 Dec 2024 10:41:21 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id D79566B009B; Wed, 11 Dec 2024 10:41:21 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id BCCAD6B009C; Wed, 11 Dec 2024 10:41:21 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 92D6B6B009A for ; Wed, 11 Dec 2024 10:41:21 
From: Frederic Weisbecker
To: LKML
Cc: Frederic Weisbecker, Michal Hocko, Vlastimil Babka, linux-mm@kvack.org, Andrew Morton, Peter Zijlstra, Thomas Gleixner, Michal Hocko
Subject: [PATCH 14/19] mm: Create/affine kswapd to its preferred node
Date: Wed, 11 Dec 2024 16:40:27 +0100
Message-ID: <20241211154035.75565-15-frederic@kernel.org>
X-Mailer: git-send-email 2.46.0
In-Reply-To: <20241211154035.75565-1-frederic@kernel.org>
References: <20241211154035.75565-1-frederic@kernel.org>

kswapd is dedicated to a specific node. As such it wants to be preferably
affine to it, both memory-wise and CPU-wise.

Use the proper kthread API to achieve that. As a bonus it takes care of
CPU-hotplug events and CPU-isolation on its behalf.

Acked-by: Vlastimil Babka
Acked-by: Michal Hocko
Signed-off-by: Frederic Weisbecker
---
 mm/vmscan.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 76378bc257e3..ec4eab23fb19 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -7182,10 +7182,6 @@ static int kswapd(void *p)
 	unsigned int highest_zoneidx = MAX_NR_ZONES - 1;
 	pg_data_t *pgdat = (pg_data_t *)p;
 	struct task_struct *tsk = current;
-	const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id);
-
-	if (!cpumask_empty(cpumask))
-		set_cpus_allowed_ptr(tsk, cpumask);
 
 	/*
 	 * Tell the memory management that we're a "memory allocator",
@@ -7354,13 +7350,15 @@ void __meminit kswapd_run(int nid)
 
 	pgdat_kswapd_lock(pgdat);
 	if (!pgdat->kswapd) {
-		pgdat->kswapd = kthread_run(kswapd, pgdat, "kswapd%d", nid);
+		pgdat->kswapd = kthread_create_on_node(kswapd, pgdat, nid, "kswapd%d", nid);
 		if (IS_ERR(pgdat->kswapd)) {
 			/* failure at boot is fatal */
 			pr_err("Failed to start kswapd on node %d,ret=%ld\n",
 				   nid, PTR_ERR(pgdat->kswapd));
 			BUG_ON(system_state < SYSTEM_RUNNING);
 			pgdat->kswapd = NULL;
+		} else {
+			wake_up_process(pgdat->kswapd);
 		}
 	}
 	pgdat_kswapd_unlock(pgdat);

From patchwork Wed Dec 11 15:40:28 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Frederic Weisbecker
X-Patchwork-Id: 13903666
From: Frederic Weisbecker
To: LKML
Cc: Frederic Weisbecker, Andrew Morton, Kees Cook, Peter Zijlstra, Thomas Gleixner, Michal Hocko, Vlastimil Babka, linux-mm@kvack.org, "Paul E. McKenney", Neeraj Upadhyay, Joel Fernandes, Boqun Feng, Uladzislau Rezki, Zqiang, rcu@vger.kernel.org
Subject: [PATCH 15/19] kthread: Implement preferred affinity
Date: Wed, 11 Dec 2024 16:40:28 +0100
Message-ID: <20241211154035.75565-16-frederic@kernel.org>
X-Mailer: git-send-email 2.46.0
In-Reply-To: <20241211154035.75565-1-frederic@kernel.org>
References: <20241211154035.75565-1-frederic@kernel.org>

Affining kthreads follows one of four existing patterns:

1) Per-CPU kthreads must stay affine to a single CPU and never execute
   relevant code on any other CPU. This is currently handled by smpboot
   code, which takes care of CPU-hotplug operations.

2) Kthreads that _have_ to be affine to a specific set of CPUs and can't
   run anywhere else. The affinity is set through kthread_bind_mask()
   and the subsystem handles CPU-hotplug operations by itself.

3) Kthreads that prefer to be affine to a specific NUMA node. That
   preferred affinity is applied by default when an actual node ID is
   passed on kthread creation, provided the kthread is not per-CPU and
   no call to kthread_bind_mask() has been issued before the first
   wake-up.

4) Similar to the previous point, but the kthreads have a preferred
   affinity other than a node. It is set manually like for any other
   task, and CPU-hotplug is supposed to be handled by the relevant
   subsystem so that the task is properly reaffined whenever a given CPU
   from the preferred affinity comes up. Care must also be taken so that
   the preferred affinity doesn't cross housekeeping cpumask boundaries.

Provide a function to handle the last usecase, mostly reusing the
current node default affinity infrastructure. kthread_affine_preferred()
is introduced, to be used just like kthread_bind_mask(), right after
kthread creation and before the first wake-up. The kthread is then
affine right away to the cpumask passed through the API if it has online
housekeeping CPUs. Otherwise it will be affine to all online
housekeeping CPUs as a last resort.

As with node affinity, it is aware of CPU hotplug events such that:

* When a housekeeping CPU goes up that is part of the preferred affinity
  of a given kthread, the related task is re-affined to that preferred
  affinity if it was previously running on the default last resort
  online housekeeping set.

* When a housekeeping CPU goes down while it was part of the preferred
  affinity of a kthread, the running task is migrated (or the sleeping
  task is woken up) automatically by the scheduler to other housekeepers
  within the preferred affinity or, as a last resort, to all
  housekeepers from other nodes.

Acked-by: Vlastimil Babka
Signed-off-by: Frederic Weisbecker
---
 include/linux/kthread.h |  1 +
 kernel/kthread.c        | 68 ++++++++++++++++++++++++++++++++++++-----
 2 files changed, 62 insertions(+), 7 deletions(-)

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index b11f53c1ba2e..30209bdf83a2 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -85,6 +85,7 @@ kthread_run_on_cpu(int (*threadfn)(void *data), void *data,
 void free_kthread_struct(struct task_struct *k);
 void kthread_bind(struct task_struct *k, unsigned int cpu);
 void kthread_bind_mask(struct task_struct *k, const struct cpumask *mask);
+int kthread_affine_preferred(struct task_struct *p, const struct cpumask *mask);
 int kthread_stop(struct task_struct *k);
 int kthread_stop_put(struct task_struct *k);
 bool kthread_should_stop(void);
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 3394ff024a5a..6bb958a75a0b 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -70,6 +70,7 @@ struct kthread {
 	char *full_name;
 	struct task_struct *task;
 	struct list_head hotplug_node;
+	struct cpumask *preferred_affinity;
 };
 
 enum KTHREAD_BITS {
@@ -327,6 +328,11 @@ void __noreturn kthread_exit(long result)
 		mutex_lock(&kthreads_hotplug_lock);
 		list_del(&kthread->hotplug_node);
 		mutex_unlock(&kthreads_hotplug_lock);
+
+		if (kthread->preferred_affinity) {
+			kfree(kthread->preferred_affinity);
+			kthread->preferred_affinity = NULL;
+		}
 	}
 	do_exit(0);
 }
@@ -355,9 +361,17 @@ EXPORT_SYMBOL(kthread_complete_and_exit);
 
 static void kthread_fetch_affinity(struct kthread *kthread, struct cpumask *cpumask)
 {
-	cpumask_and(cpumask, cpumask_of_node(kthread->node),
-		    housekeeping_cpumask(HK_TYPE_KTHREAD));
+	const struct cpumask *pref;
 
+	if (kthread->preferred_affinity) {
+		pref = kthread->preferred_affinity;
+	} else {
+		if (WARN_ON_ONCE(kthread->node == NUMA_NO_NODE))
+			return;
+		pref = cpumask_of_node(kthread->node);
+	}
+
+	cpumask_and(cpumask, pref, housekeeping_cpumask(HK_TYPE_KTHREAD));
 	if (cpumask_empty(cpumask))
 		cpumask_copy(cpumask, housekeeping_cpumask(HK_TYPE_KTHREAD));
 }
@@ -440,7 +454,7 @@ static int kthread(void *_create)
 
 	self->started = 1;
 
-	if (!(current->flags & PF_NO_SETAFFINITY))
+	if (!(current->flags & PF_NO_SETAFFINITY) && !self->preferred_affinity)
 		kthread_affine_node();
 
 	ret = -EINTR;
@@ -839,12 +853,53 @@ int kthreadd(void *unused)
 	return 0;
 }
 
+int kthread_affine_preferred(struct task_struct *p, const struct cpumask *mask)
+{
+	struct kthread *kthread = to_kthread(p);
+	cpumask_var_t affinity;
+	unsigned long flags;
+	int ret;
+
+	if (!wait_task_inactive(p, TASK_UNINTERRUPTIBLE) || kthread->started) {
+		WARN_ON(1);
+		return -EINVAL;
+	}
+
+	WARN_ON_ONCE(kthread->preferred_affinity);
+
+	if (!zalloc_cpumask_var(&affinity, GFP_KERNEL))
+		return -ENOMEM;
+
+	kthread->preferred_affinity = kzalloc(sizeof(struct cpumask), GFP_KERNEL);
+	if (!kthread->preferred_affinity) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	mutex_lock(&kthreads_hotplug_lock);
+	cpumask_copy(kthread->preferred_affinity, mask);
+	WARN_ON_ONCE(!list_empty(&kthread->hotplug_node));
+	list_add_tail(&kthread->hotplug_node, &kthreads_hotplug);
+	kthread_fetch_affinity(kthread, affinity);
+
+	/* It's safe because the task is inactive. */
+	raw_spin_lock_irqsave(&p->pi_lock, flags);
+	do_set_cpus_allowed(p, affinity);
+	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+
+	mutex_unlock(&kthreads_hotplug_lock);
+out:
+	free_cpumask_var(affinity);
+
+	return 0;
+}
+
 /*
  * Re-affine kthreads according to their preferences
  * and the newly online CPU. The CPU down part is handled
  * by select_fallback_rq() which default re-affines to
- * housekeepers in case the preferred affinity doesn't
- * apply anymore.
+ * housekeepers from other nodes in case the preferred
+ * affinity doesn't apply anymore.
  */
 static int kthreads_online_cpu(unsigned int cpu)
 {
@@ -864,8 +919,7 @@ static int kthreads_online_cpu(unsigned int cpu)
 
 	list_for_each_entry(k, &kthreads_hotplug, hotplug_node) {
 		if (WARN_ON_ONCE((k->task->flags & PF_NO_SETAFFINITY) ||
-				 kthread_is_per_cpu(k->task) ||
-				 k->node == NUMA_NO_NODE)) {
+				 kthread_is_per_cpu(k->task))) {
 			ret = -EINVAL;
 			continue;
 		}
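
For readers who want to see the fallback rule in isolation, here is a
minimal userspace sketch of the mask selection that
kthread_fetch_affinity() performs in the patch above: the preferred mask
is intersected with the housekeeping mask, and the whole housekeeping
mask is used when that intersection is empty. This is not kernel code.
The names cpu_bits, model_fetch_affinity() and model_self_check() are
hypothetical, a uint64_t stands in for struct cpumask (so at most 64
CPUs), and CPU online state is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a cpumask: bit N set means CPU N is included. */
typedef uint64_t cpu_bits;

/*
 * Userspace model (an assumption, not the kernel implementation) of the
 * selection rule: prefer (preferred & housekeeping), and fall back to the
 * full housekeeping mask when the intersection is empty.
 */
static cpu_bits model_fetch_affinity(cpu_bits preferred, cpu_bits housekeeping)
{
	cpu_bits effective = preferred & housekeeping;

	return effective ? effective : housekeeping;
}

/* Small self-check exercising both the normal and the fallback path. */
static void model_self_check(void)
{
	const cpu_bits housekeeping = 0x0f;	/* CPUs 0-3 are housekeepers */

	/* Preferred CPUs 2-5: only CPUs 2-3 are housekeepers. */
	assert(model_fetch_affinity(0x3c, housekeeping) == 0x0c);

	/* Preferred CPUs 4-7: no housekeeper among them, so fall back. */
	assert(model_fetch_affinity(0xf0, housekeeping) == 0x0f);
}
```

Calling model_self_check() from a test program exercises both paths. In
the kernel, the same rule is re-evaluated on CPU-up events via
kthreads_online_cpu(), while the CPU-down side is left to the scheduler,
as the comment in the patch states.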