From patchwork Tue Nov 12 14:22:37 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Frederic Weisbecker
X-Patchwork-Id: 13872296
From: Frederic Weisbecker
To: LKML
Cc: Frederic Weisbecker, Andrew Morton, Kees Cook, Peter Zijlstra,
 Thomas Gleixner, Michal Hocko, Vlastimil Babka, linux-mm@kvack.org,
 "Paul E. McKenney", Neeraj Upadhyay, Joel Fernandes, Boqun Feng, Zqiang,
 rcu@vger.kernel.org, Uladzislau Rezki
Subject: [PATCH 13/21] kthread: Make sure kthread hasn't started while binding it
Date: Tue, 12 Nov 2024 15:22:37 +0100
Message-ID: <20241112142248.20503-14-frederic@kernel.org>
X-Mailer: git-send-email 2.46.0
In-Reply-To: <20241112142248.20503-1-frederic@kernel.org>
References: <20241112142248.20503-1-frederic@kernel.org>

Make sure the kthread is sleeping in the schedule_preempt_disabled()
call before calling its handler when kthread_bind[_mask]() is called
on it. This provides a sanity check verifying that the task is not
randomly blocked later at some point within its function handler, in
which case it could be concurrently awakened, leaving the call to
do_set_cpus_allowed() without any effect until the next voluntary
sleep.

Rely on the wake-up ordering to ensure that the newly introduced
"started" field returns the expected value:

    TASK A                                   TASK B
    ------                                   ------
READ kthread->started
wake_up_process(B)
   rq_lock()
   ...
   rq_unlock() // RELEASE
                                           schedule()
                                              rq_lock() // ACQUIRE
                                              // schedule task B
                                              rq_unlock()
                                              WRITE kthread->started

Similarly, writing kthread->started before subsequent voluntary sleeps
will be visible after calling wait_task_inactive() in
__kthread_bind_mask(), reporting potential misuse of the API.

Upcoming patches will make further use of this facility.
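For illustration only, the check can be modelled in userspace. This is
a minimal sketch, not kernel code: struct model_kthread,
model_first_wakeup() and model_bind() are hypothetical names, and the
wake-up is a plain function call, so none of the runqueue-lock ordering
described above is exercised.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct kthread; not the kernel structure. */
struct model_kthread {
	int started;      /* mirrors the new "started" field */
	unsigned int cpu; /* CPU recorded by the last bind */
};

/*
 * Models the first wake-up: the thread leaves its initial sleep and
 * marks itself started before entering its handler.
 */
static void model_first_wakeup(struct model_kthread *k)
{
	k->started = 1;
}

/*
 * Models kthread_bind(): record the CPU, then report whether the
 * kernel would have hit WARN_ON_ONCE(kthread->started).
 */
static bool model_bind(struct model_kthread *k, unsigned int cpu)
{
	k->cpu = cpu;
	return k->started != 0;
}
```

Binding before model_first_wakeup() reports no misuse. Binding
afterwards returns true, which is the case where the kernel would
trigger the warning.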

Acked-by: Vlastimil Babka
Signed-off-by: Frederic Weisbecker
---
 kernel/kthread.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 9bb36897b6c6..b9bdb21a0101 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -53,6 +53,7 @@ struct kthread_create_info
 struct kthread {
 	unsigned long flags;
 	unsigned int cpu;
+	int started;
 	int result;
 	int (*threadfn)(void *);
 	void *data;
@@ -382,6 +383,8 @@ static int kthread(void *_create)
 	schedule_preempt_disabled();
 	preempt_enable();
 
+	self->started = 1;
+
 	ret = -EINTR;
 	if (!test_bit(KTHREAD_SHOULD_STOP, &self->flags)) {
 		cgroup_kthread_ready();
@@ -540,7 +543,9 @@ static void __kthread_bind(struct task_struct *p, unsigned int cpu, unsigned int
 
 void kthread_bind_mask(struct task_struct *p, const struct cpumask *mask)
 {
+	struct kthread *kthread = to_kthread(p);
 	__kthread_bind_mask(p, mask, TASK_UNINTERRUPTIBLE);
+	WARN_ON_ONCE(kthread->started);
 }
 
 /**
@@ -554,7 +559,9 @@ void kthread_bind_mask(struct task_struct *p, const struct cpumask *mask)
  */
 void kthread_bind(struct task_struct *p, unsigned int cpu)
 {
+	struct kthread *kthread = to_kthread(p);
 	__kthread_bind(p, cpu, TASK_UNINTERRUPTIBLE);
+	WARN_ON_ONCE(kthread->started);
 }
 EXPORT_SYMBOL(kthread_bind);
 

From patchwork Tue Nov 12 14:22:38 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Frederic Weisbecker
X-Patchwork-Id: 13872297
From: Frederic Weisbecker
To: LKML
Cc: Frederic Weisbecker, Andrew Morton, Kees Cook, Peter Zijlstra,
 Thomas Gleixner, Michal Hocko, Vlastimil Babka, linux-mm@kvack.org,
 "Paul E. McKenney", Neeraj Upadhyay, Joel Fernandes, Boqun Feng,
 Uladzislau Rezki, Zqiang, rcu@vger.kernel.org
Subject: [PATCH 14/21] kthread: Default affine kthread to its preferred NUMA node
Date: Tue, 12 Nov 2024 15:22:38 +0100
Message-ID: <20241112142248.20503-15-frederic@kernel.org>
X-Mailer: git-send-email 2.46.0
In-Reply-To: <20241112142248.20503-1-frederic@kernel.org>
References: <20241112142248.20503-1-frederic@kernel.org>

Kthreads attached to a preferred NUMA node for their task structure
allocation can also be assumed to run preferably within that same node.

A more precise affinity is usually requested by calling
kthread_create_on_cpu() or kthread_bind[_mask]() before the first
wakeup.

For the others, a default affinity to the node is desired and sometimes
implemented with more or less success when it comes to dealing with
hotplug events and nohz_full / CPU Isolation interactions:

- kcompactd is affine to its node and handles hotplug but not CPU
  Isolation
- kswapd is affine to its node and ignores hotplug and CPU Isolation
- A bunch of drivers create their kthreads on a specific node and
  don't bother affining them further.

Handle that default node affinity preference at the generic level
instead, provided a kthread is created on an actual node and doesn't
apply any specific affinity such as a given CPU or a custom cpumask to
bind to before its first wake-up.

This generic handling is aware of CPU hotplug events and CPU isolation
such that:

* When a housekeeping CPU goes up that is part of the node of a given
  kthread, the related task is re-affined to its own node if it was
  previously running on the default last resort online housekeeping set
  from other nodes.

* When a housekeeping CPU goes down while it was part of the node of a
  kthread, the running task is migrated (or the sleeping task is woken
  up) automatically by the scheduler to other housekeepers within the
  same node or, as a last resort, to all housekeepers from other nodes.
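The node-then-housekeeping fallback described above can be sketched in
userspace as follows. This is an illustration under stated assumptions,
not the kernel implementation: model_cpumask (one bit per CPU in a
uint64_t) and model_fetch_affinity() are hypothetical stand-ins for
struct cpumask and the affinity computation, and node_cpus is assumed
to hold only the CPUs of the node that are currently online.

```c
#include <stdint.h>

/* One bit per CPU; a hypothetical stand-in for struct cpumask. */
typedef uint64_t model_cpumask;

/*
 * Prefer the housekeeping CPUs that belong to the kthread's node.
 * If the node has none, fall back to every housekeeping CPU.
 */
static model_cpumask model_fetch_affinity(model_cpumask node_cpus,
					  model_cpumask housekeeping_cpus)
{
	model_cpumask affinity = node_cpus & housekeeping_cpus;

	if (!affinity)
		affinity = housekeeping_cpus;

	return affinity;
}
```

With node CPUs 4-7 (0xf0) and housekeeping CPUs 0, 1, 4 and 5 (0x33),
the result is 0x30, i.e. CPUs 4 and 5. If the node only has CPUs 6-7
(0xc0), no housekeeper belongs to it and the result falls back to 0x33.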

Acked-by: Vlastimil Babka
Signed-off-by: Frederic Weisbecker
---
 include/linux/cpuhotplug.h |   1 +
 kernel/kthread.c           | 106 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 106 insertions(+), 1 deletion(-)

diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 2361ed4d2b15..228f27150a93 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -239,6 +239,7 @@ enum cpuhp_state {
 	CPUHP_AP_WORKQUEUE_ONLINE,
 	CPUHP_AP_RANDOM_ONLINE,
 	CPUHP_AP_RCUTREE_ONLINE,
+	CPUHP_AP_KTHREADS_ONLINE,
 	CPUHP_AP_BASE_CACHEINFO_ONLINE,
 	CPUHP_AP_ONLINE_DYN,
 	CPUHP_AP_ONLINE_DYN_END		= CPUHP_AP_ONLINE_DYN + 40,
diff --git a/kernel/kthread.c b/kernel/kthread.c
index b9bdb21a0101..df6a0551e8ba 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -35,6 +35,9 @@ static DEFINE_SPINLOCK(kthread_create_lock);
 static LIST_HEAD(kthread_create_list);
 struct task_struct *kthreadd_task;
 
+static LIST_HEAD(kthreads_hotplug);
+static DEFINE_MUTEX(kthreads_hotplug_lock);
+
 struct kthread_create_info
 {
 	/* Information passed to kthread() from kthreadd. */
@@ -53,6 +56,7 @@ struct kthread_create_info
 struct kthread {
 	unsigned long flags;
 	unsigned int cpu;
+	unsigned int node;
 	int started;
 	int result;
 	int (*threadfn)(void *);
@@ -64,6 +68,8 @@ struct kthread {
 #endif
 	/* To store the full name if task comm is truncated. */
 	char *full_name;
+	struct task_struct *task;
+	struct list_head hotplug_node;
 };
 
 enum KTHREAD_BITS {
@@ -122,8 +128,11 @@ bool set_kthread_struct(struct task_struct *p)
 
 	init_completion(&kthread->exited);
 	init_completion(&kthread->parked);
+	INIT_LIST_HEAD(&kthread->hotplug_node);
 	p->vfork_done = &kthread->exited;
 
+	kthread->task = p;
+	kthread->node = tsk_fork_get_node(current);
 	p->worker_private = kthread;
 	return true;
 }
@@ -314,6 +323,11 @@ void __noreturn kthread_exit(long result)
 {
 	struct kthread *kthread = to_kthread(current);
 	kthread->result = result;
+	if (!list_empty(&kthread->hotplug_node)) {
+		mutex_lock(&kthreads_hotplug_lock);
+		list_del(&kthread->hotplug_node);
+		mutex_unlock(&kthreads_hotplug_lock);
+	}
 	do_exit(0);
 }
 EXPORT_SYMBOL(kthread_exit);
@@ -339,6 +353,48 @@ void __noreturn kthread_complete_and_exit(struct completion *comp, long code)
 }
 EXPORT_SYMBOL(kthread_complete_and_exit);
 
+static void kthread_fetch_affinity(struct kthread *kthread, struct cpumask *cpumask)
+{
+	cpumask_and(cpumask, cpumask_of_node(kthread->node),
+		    housekeeping_cpumask(HK_TYPE_KTHREAD));
+
+	if (cpumask_empty(cpumask))
+		cpumask_copy(cpumask, housekeeping_cpumask(HK_TYPE_KTHREAD));
+}
+
+static void kthread_affine_node(void)
+{
+	struct kthread *kthread = to_kthread(current);
+	cpumask_var_t affinity;
+
+	WARN_ON_ONCE(kthread_is_per_cpu(current));
+
+	if (kthread->node == NUMA_NO_NODE) {
+		housekeeping_affine(current, HK_TYPE_RCU);
+	} else {
+		if (!zalloc_cpumask_var(&affinity, GFP_KERNEL)) {
+			WARN_ON_ONCE(1);
+			return;
+		}
+
+		mutex_lock(&kthreads_hotplug_lock);
+		WARN_ON_ONCE(!list_empty(&kthread->hotplug_node));
+		list_add_tail(&kthread->hotplug_node, &kthreads_hotplug);
+		/*
+		 * The node cpumask is racy when read from kthread() but:
+		 * - a racing CPU going down will either fail on the subsequent
+		 *   call to set_cpus_allowed_ptr() or be migrated to housekeepers
+		 *   afterwards by the scheduler.
+		 * - a racing CPU going up will be handled by kthreads_online_cpu()
+		 */
+		kthread_fetch_affinity(kthread, affinity);
+		set_cpus_allowed_ptr(current, affinity);
+		mutex_unlock(&kthreads_hotplug_lock);
+
+		free_cpumask_var(affinity);
+	}
+}
+
 static int kthread(void *_create)
 {
 	static const struct sched_param param = { .sched_priority = 0 };
@@ -369,7 +425,6 @@ static int kthread(void *_create)
 	 * back to default in case they have been changed.
 	 */
 	sched_setscheduler_nocheck(current, SCHED_NORMAL, &param);
-	set_cpus_allowed_ptr(current, housekeeping_cpumask(HK_TYPE_KTHREAD));
 
 	/* OK, tell user we're spawned, wait for stop or wakeup */
 	__set_current_state(TASK_UNINTERRUPTIBLE);
@@ -385,6 +440,9 @@ static int kthread(void *_create)
 
 	self->started = 1;
 
+	if (!(current->flags & PF_NO_SETAFFINITY))
+		kthread_affine_node();
+
 	ret = -EINTR;
 	if (!test_bit(KTHREAD_SHOULD_STOP, &self->flags)) {
 		cgroup_kthread_ready();
@@ -781,6 +839,52 @@ int kthreadd(void *unused)
 	return 0;
 }
 
+/*
+ * Re-affine kthreads according to their preferences
+ * and the newly online CPU. The CPU down part is handled
+ * by select_fallback_rq() which default re-affines to
+ * housekeepers in case the preferred affinity doesn't
+ * apply anymore.
+ */
+static int kthreads_online_cpu(unsigned int cpu)
+{
+	cpumask_var_t affinity;
+	struct kthread *k;
+	int ret;
+
+	guard(mutex)(&kthreads_hotplug_lock);
+
+	if (list_empty(&kthreads_hotplug))
+		return 0;
+
+	if (!zalloc_cpumask_var(&affinity, GFP_KERNEL))
+		return -ENOMEM;
+
+	ret = 0;
+
+	list_for_each_entry(k, &kthreads_hotplug, hotplug_node) {
+		if (WARN_ON_ONCE((k->task->flags & PF_NO_SETAFFINITY) ||
+				 kthread_is_per_cpu(k->task) ||
+				 k->node == NUMA_NO_NODE)) {
+			ret = -EINVAL;
+			continue;
+		}
+		kthread_fetch_affinity(k, affinity);
+		set_cpus_allowed_ptr(k->task, affinity);
+	}
+
+	free_cpumask_var(affinity);
+
+	return ret;
+}
+
+static int kthreads_init(void)
+{
+	return cpuhp_setup_state(CPUHP_AP_KTHREADS_ONLINE, "kthreads:online",
+				kthreads_online_cpu, NULL);
+}
+early_initcall(kthreads_init);
+
 void __kthread_init_worker(struct kthread_worker *worker,
 				const char *name,
 				struct lock_class_key *key)

From patchwork Tue Nov 12 14:22:39 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Frederic Weisbecker
X-Patchwork-Id: 13872298
From: Frederic Weisbecker
To: LKML
Cc: Frederic Weisbecker, Michal Hocko, Vlastimil Babka, Andrew Morton,
 linux-mm@kvack.org, Peter Zijlstra, Thomas Gleixner, Michal Hocko
Subject: [PATCH 15/21] mm: Create/affine kcompactd to its preferred node
Date: Tue, 12 Nov 2024 15:22:39 +0100
Message-ID: <20241112142248.20503-16-frederic@kernel.org>
X-Mailer: git-send-email 2.46.0
In-Reply-To: <20241112142248.20503-1-frederic@kernel.org>
References: <20241112142248.20503-1-frederic@kernel.org>

Kcompactd is dedicated to a specific node. As such it wants to be
preferably affine to it, memory and CPUs-wise.

Use the proper kthread API to achieve that. As a bonus it takes care of
CPU-hotplug events and CPU-isolation on its behalf.

Acked-by: Vlastimil Babka
Acked-by: Michal Hocko
Signed-off-by: Frederic Weisbecker
---
 mm/compaction.c | 43 +++----------------------------------------
 1 file changed, 3 insertions(+), 40 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index a2b16b08cbbf..a31c0f5758cf 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -3154,15 +3154,9 @@ void wakeup_kcompactd(pg_data_t *pgdat, int order, int highest_zoneidx)
 static int kcompactd(void *p)
 {
 	pg_data_t *pgdat = (pg_data_t *)p;
-	struct task_struct *tsk = current;
 	long default_timeout = msecs_to_jiffies(HPAGE_FRAG_CHECK_INTERVAL_MSEC);
 	long timeout = default_timeout;
 
-	const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id);
-
-	if (!cpumask_empty(cpumask))
-		set_cpus_allowed_ptr(tsk, cpumask);
-
 	set_freezable();
 
 	pgdat->kcompactd_max_order = 0;
@@ -3233,10 +3227,12 @@ void __meminit kcompactd_run(int nid)
 	if (pgdat->kcompactd)
 		return;
 
-	pgdat->kcompactd = kthread_run(kcompactd, pgdat, "kcompactd%d", nid);
+	pgdat->kcompactd = kthread_create_on_node(kcompactd, pgdat, nid, "kcompactd%d", nid);
 	if (IS_ERR(pgdat->kcompactd)) {
 		pr_err("Failed to start kcompactd on node %d\n", nid);
 		pgdat->kcompactd = NULL;
+	} else {
+		wake_up_process(pgdat->kcompactd);
 	}
 }
 
@@ -3254,30 +3250,6 @@ void __meminit kcompactd_stop(int nid)
 	}
 }
 
-/*
- * It's optimal to keep kcompactd on the same CPUs as their memory, but
- * not required for correctness. So if the last cpu in a node goes
- * away, we get changed to run anywhere: as the first one comes back,
- * restore their cpu bindings.
- */
-static int kcompactd_cpu_online(unsigned int cpu)
-{
-	int nid;
-
-	for_each_node_state(nid, N_MEMORY) {
-		pg_data_t *pgdat = NODE_DATA(nid);
-		const struct cpumask *mask;
-
-		mask = cpumask_of_node(pgdat->node_id);
-
-		if (cpumask_any_and(cpu_online_mask, mask) < nr_cpu_ids)
-			/* One of our CPUs online: restore mask */
-			if (pgdat->kcompactd)
-				set_cpus_allowed_ptr(pgdat->kcompactd, mask);
-	}
-	return 0;
-}
-
 static int proc_dointvec_minmax_warn_RT_change(const struct ctl_table *table,
 		int write, void *buffer, size_t *lenp, loff_t *ppos)
 {
@@ -3337,15 +3309,6 @@ static struct ctl_table vm_compaction[] = {
 static int __init kcompactd_init(void)
 {
 	int nid;
-	int ret;
-
-	ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
-					"mm/compaction:online",
-					kcompactd_cpu_online, NULL);
-	if (ret < 0) {
-		pr_err("kcompactd: failed to register hotplug callbacks.\n");
-		return ret;
-	}
 
 	for_each_node_state(nid, N_MEMORY)
 		kcompactd_run(nid);

From patchwork Tue Nov 12 14:22:40 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Frederic Weisbecker
X-Patchwork-Id: 13872308
From: Frederic Weisbecker
To: LKML
Cc: Frederic Weisbecker, Michal Hocko, Vlastimil Babka,
 linux-mm@kvack.org, Andrew Morton, Peter Zijlstra, Thomas Gleixner,
 Michal Hocko
Subject: [PATCH 16/21] mm: Create/affine kswapd to its preferred node
Date: Tue, 12 Nov 2024 15:22:40 +0100
Message-ID: <20241112142248.20503-17-frederic@kernel.org>
X-Mailer: git-send-email 2.46.0
In-Reply-To: <20241112142248.20503-1-frederic@kernel.org>
References: <20241112142248.20503-1-frederic@kernel.org>

kswapd is dedicated to a specific node. As such it wants to be
preferably affine to it, memory and CPUs-wise.

Use the proper kthread API to achieve that. As a bonus it takes care of
CPU-hotplug events and CPU-isolation on its behalf.

Acked-by: Vlastimil Babka
Acked-by: Michal Hocko
Signed-off-by: Frederic Weisbecker
---
 mm/vmscan.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 28ba2b06fc7d..5d7686bef51c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -7168,10 +7168,6 @@ static int kswapd(void *p)
 	unsigned int highest_zoneidx = MAX_NR_ZONES - 1;
 	pg_data_t *pgdat = (pg_data_t *)p;
 	struct task_struct *tsk = current;
-	const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id);
-
-	if (!cpumask_empty(cpumask))
-		set_cpus_allowed_ptr(tsk, cpumask);
 
 	/*
 	 * Tell the memory management that we're a "memory allocator",
@@ -7340,13 +7336,15 @@ void __meminit kswapd_run(int nid)
 
 	pgdat_kswapd_lock(pgdat);
 	if (!pgdat->kswapd) {
-		pgdat->kswapd = kthread_run(kswapd, pgdat, "kswapd%d", nid);
+		pgdat->kswapd = kthread_create_on_node(kswapd, pgdat, nid, "kswapd%d", nid);
 		if (IS_ERR(pgdat->kswapd)) {
 			/* failure at boot is fatal */
 			pr_err("Failed to start kswapd on node %d,ret=%ld\n",
 				   nid, PTR_ERR(pgdat->kswapd));
 			BUG_ON(system_state < SYSTEM_RUNNING);
 			pgdat->kswapd = NULL;
+		} else {
+			wake_up_process(pgdat->kswapd);
 		}
 	}
 	pgdat_kswapd_unlock(pgdat);

From patchwork Tue Nov 12 14:22:41 2024
X-Patchwork-Submitter: Frederic Weisbecker
X-Patchwork-Id: 13872309
From: Frederic Weisbecker
To: LKML
Cc: Frederic Weisbecker, Andrew Morton, Kees Cook, Peter Zijlstra, Thomas Gleixner, Michal Hocko, Vlastimil Babka, linux-mm@kvack.org, "Paul E. McKenney", Neeraj Upadhyay, Joel Fernandes, Boqun Feng, Uladzislau Rezki, Zqiang, rcu@vger.kernel.org
Subject: [PATCH 17/21] kthread: Implement preferred affinity
Date: Tue, 12 Nov 2024 15:22:41 +0100
Message-ID: <20241112142248.20503-18-frederic@kernel.org>
In-Reply-To: <20241112142248.20503-1-frederic@kernel.org>
References: <20241112142248.20503-1-frederic@kernel.org>

Kthread affinity currently follows one of four patterns:

1) Per-CPU kthreads must stay affine to a single CPU and never execute
   relevant code on any other CPU. This is currently handled by smpboot
   code which takes care of CPU-hotplug operations.

2) Kthreads that _have_ to be affine to a specific set of CPUs and can't
   run anywhere else. The affinity is set through kthread_bind_mask()
   and the subsystem handles CPU-hotplug operations by itself.

3) Kthreads that prefer to be affine to a specific NUMA node. That
   preferred affinity is applied by default when an actual node ID is
   passed on kthread creation, provided the kthread is not per-CPU and
   no call to kthread_bind_mask() has been issued before the first
   wake-up.

4) Similar to the previous point but kthreads have a preferred affinity
   other than a node. It is set manually like any other task and
   CPU-hotplug is supposed to be handled by the relevant subsystem so
   that the task is properly reaffined whenever a given CPU from the
   preferred affinity comes up. Also care must be taken so that the
   preferred affinity doesn't cross housekeeping cpumask boundaries.

Provide a function to handle the last use case, mostly reusing the
current node default affinity infrastructure. kthread_affine_preferred()
is introduced, to be used just like kthread_bind_mask(), right after
kthread creation and before the first wake up. The kthread is then
affine right away to the cpumask passed through the API if it has online
housekeeping CPUs. Otherwise it will be affine to all online
housekeeping CPUs as a last resort.

As with node affinity, it is aware of CPU hotplug events such that:

* When a housekeeping CPU goes up that is part of the preferred affinity
  of a given kthread, the related task is re-affined to that preferred
  affinity if it was previously running on the default last resort
  online housekeeping set.

* When a housekeeping CPU goes down while it was part of the preferred
  affinity of a kthread, the running task is migrated (or the sleeping
  task is woken up) automatically by the scheduler to other housekeepers
  within the preferred affinity or, as a last resort, to all
  housekeepers from other nodes.

Acked-by: Vlastimil Babka
Signed-off-by: Frederic Weisbecker
---
 include/linux/kthread.h |  1 +
 kernel/kthread.c        | 68 ++++++++++++++++++++++++++++++++++++-----
 2 files changed, 62 insertions(+), 7 deletions(-)

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index b11f53c1ba2e..30209bdf83a2 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -85,6 +85,7 @@ kthread_run_on_cpu(int (*threadfn)(void *data), void *data,
 void free_kthread_struct(struct task_struct *k);
 void kthread_bind(struct task_struct *k, unsigned int cpu);
 void kthread_bind_mask(struct task_struct *k, const struct cpumask *mask);
+int kthread_affine_preferred(struct task_struct *p, const struct cpumask *mask);
 int kthread_stop(struct task_struct *k);
 int kthread_stop_put(struct task_struct *k);
 bool kthread_should_stop(void);
diff --git a/kernel/kthread.c b/kernel/kthread.c
index df6a0551e8ba..43724fc6e021 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -70,6 +70,7 @@ struct kthread {
 	char *full_name;
 	struct task_struct *task;
 	struct list_head hotplug_node;
+	struct cpumask *preferred_affinity;
 };
 
 enum KTHREAD_BITS {
@@ -327,6 +328,11 @@ void __noreturn kthread_exit(long result)
 		mutex_lock(&kthreads_hotplug_lock);
 		list_del(&kthread->hotplug_node);
 		mutex_unlock(&kthreads_hotplug_lock);
+
+		if (kthread->preferred_affinity) {
+			kfree(kthread->preferred_affinity);
+			kthread->preferred_affinity = NULL;
+		}
 	}
 	do_exit(0);
 }
@@ -355,9 +361,17 @@ EXPORT_SYMBOL(kthread_complete_and_exit);
 
 static void kthread_fetch_affinity(struct kthread *kthread, struct cpumask *cpumask)
 {
-	cpumask_and(cpumask, cpumask_of_node(kthread->node),
-		    housekeeping_cpumask(HK_TYPE_KTHREAD));
+	const struct cpumask *pref;
 
+	if (kthread->preferred_affinity) {
+		pref = kthread->preferred_affinity;
+	} else {
+		if (WARN_ON_ONCE(kthread->node == NUMA_NO_NODE))
+			return;
+		pref = cpumask_of_node(kthread->node);
+	}
+
+	cpumask_and(cpumask, pref, housekeeping_cpumask(HK_TYPE_KTHREAD));
 	if (cpumask_empty(cpumask))
 		cpumask_copy(cpumask, housekeeping_cpumask(HK_TYPE_KTHREAD));
 }
@@ -440,7 +454,7 @@ static int kthread(void *_create)
 
 	self->started = 1;
 
-	if (!(current->flags & PF_NO_SETAFFINITY))
+	if (!(current->flags & PF_NO_SETAFFINITY) && !self->preferred_affinity)
 		kthread_affine_node();
 
 	ret = -EINTR;
@@ -839,12 +853,53 @@ int kthreadd(void *unused)
 	return 0;
 }
 
+int kthread_affine_preferred(struct task_struct *p, const struct cpumask *mask)
+{
+	struct kthread *kthread = to_kthread(p);
+	cpumask_var_t affinity;
+	unsigned long flags;
+	int ret = 0;
+
+	if (!wait_task_inactive(p, TASK_UNINTERRUPTIBLE) || kthread->started) {
+		WARN_ON(1);
+		return -EINVAL;
+	}
+
+	WARN_ON_ONCE(kthread->preferred_affinity);
+
+	if (!zalloc_cpumask_var(&affinity, GFP_KERNEL))
+		return -ENOMEM;
+
+	kthread->preferred_affinity = kzalloc(sizeof(struct cpumask), GFP_KERNEL);
+	if (!kthread->preferred_affinity) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	mutex_lock(&kthreads_hotplug_lock);
+	cpumask_copy(kthread->preferred_affinity, mask);
+	WARN_ON_ONCE(!list_empty(&kthread->hotplug_node));
+	list_add_tail(&kthread->hotplug_node, &kthreads_hotplug);
+	kthread_fetch_affinity(kthread, affinity);
+
+	/* It's safe because the task is inactive. */
+	raw_spin_lock_irqsave(&p->pi_lock, flags);
+	do_set_cpus_allowed(p, affinity);
+	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+
+	mutex_unlock(&kthreads_hotplug_lock);
+out:
+	free_cpumask_var(affinity);
+
+	return ret;
+}
+
 /*
  * Re-affine kthreads according to their preferences
  * and the newly online CPU. The CPU down part is handled
  * by select_fallback_rq() which default re-affines to
- * housekeepers in case the preferred affinity doesn't
- * apply anymore.
+ * housekeepers from other nodes in case the preferred
+ * affinity doesn't apply anymore.
  */
 static int kthreads_online_cpu(unsigned int cpu)
 {
@@ -864,8 +919,7 @@ static int kthreads_online_cpu(unsigned int cpu)
 
 	list_for_each_entry(k, &kthreads_hotplug, hotplug_node) {
 		if (WARN_ON_ONCE((k->task->flags & PF_NO_SETAFFINITY) ||
-				 kthread_is_per_cpu(k->task) ||
-				 k->node == NUMA_NO_NODE)) {
+				 kthread_is_per_cpu(k->task))) {
 			ret = -EINVAL;
 			continue;
 		}