From patchwork Fri Jan 12 15:46:15 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Frederic Weisbecker X-Patchwork-Id: 13518585 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3D1366EB7D; Fri, 12 Jan 2024 15:46:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="hqvOXEaD" Received: by smtp.kernel.org (Postfix) with ESMTPSA id B2E52C43399; Fri, 12 Jan 2024 15:46:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1705074393; bh=5LTULEAdFpJcQBG6D6ceSXiQtGzk7tED42bNjO0Rk+M=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=hqvOXEaDSm1CyW5xE6ar9yif2JT/BSSeb2hAxaUYvxBC6RV1EP+KCjwxqnM4uTE5d ROXKPFvKXMRZAaQSuZlw9GGniXf904gAus9MjM1pD+J8eNFhVBCnk0jzebVLg92FUl cF3wK2nh3V0kXZI6b3Umu13fTPJTisxf4RZIHX7Yus7DyXamvjfI41l4vVq20EakH2 bddQVhGqId+9VUGOA3l5xGOFZHX6altm21y38nHW/TdVLH8m1QwYnexISLbIIF1jiK blXNHTKWz7zh31Y66VGCeg14Iimr3ahuwaE3N0rhD9BPwEUx5sVUc3OJ1hpNCo4e6y ekPoSnbYMIVWw== From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker , Boqun Feng , Joel Fernandes , Neeraj Upadhyay , "Paul E . McKenney" , Uladzislau Rezki , rcu , Kalesh Singh Subject: [PATCH 1/7] rcu/exp: Fix RCU expedited parallel grace period kworker allocation failure recovery Date: Fri, 12 Jan 2024 16:46:15 +0100 Message-Id: <20240112154621.261852-2-frederic@kernel.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240112154621.261852-1-frederic@kernel.org> References: <20240112154621.261852-1-frederic@kernel.org> Precedence: bulk X-Mailing-List: rcu@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Under CONFIG_RCU_EXP_KTHREAD=y, the nodes initialization for expedited grace periods is queued to a kworker. However if the allocation of that kworker failed, the nodes initialization is performed synchronously by the caller instead. Now the check for kworker initialization failure relies on the kworker pointer to be NULL while its value might actually encapsulate an allocation failure error. Make sure to handle this case. Reviewed-by: Kalesh Singh Fixes: 9621fbee44df ("rcu: Move expedited grace period (GP) work to RT kthread_worker") Signed-off-by: Frederic Weisbecker --- kernel/rcu/tree.c | 1 + 1 file changed, 1 insertion(+) diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 499803234176..b000c1a83fc2 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -4760,6 +4760,7 @@ static void __init rcu_start_exp_gp_kworkers(void) rcu_exp_par_gp_kworker = kthread_create_worker(0, par_gp_kworker_name); if (IS_ERR_OR_NULL(rcu_exp_par_gp_kworker)) { pr_err("Failed to create %s!\n", par_gp_kworker_name); + rcu_exp_par_gp_kworker = NULL; kthread_destroy_worker(rcu_exp_gp_kworker); return; } From patchwork Fri Jan 12 15:46:16 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Frederic Weisbecker X-Patchwork-Id: 13518586 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6643B7317A; Fri, 12 Jan 2024 15:46:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="V8L388gC" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 38020C433A6; Fri, 12 Jan 2024 15:46:34 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1705074396; bh=ZJYHTwI5oMs3AJIma8DdT7q/1S5boVpTb/VWCHsVrwM=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=V8L388gCFiFAPfJpvHRXTW20amcvbGG24SdVNjndMJL8nHjTvMmUHcdqKrydxUyV/ rTsL0wBbmuha5OjYIdinbVSOjyUVNcC9B9B4ChWyXd4pVxU0h5mVmJiYIIDUb9q5IO amXR2FhNdMmRVoR+CrWAZhKQvRrr6i5KP/zmooVe41ao0JKf2w1o91JNlTbHbyK6K+ z8y/DGQ0tiIYxneqmbTx3XHUh2Me5XucgTiL59h3sl8aWy9RJx10eBTa/XnkhVV/LS J3h0gBt7F+iheOSkWPHxXx24RfcIJTiBk2YBbm5oH5j8/rhbd8FunUkvAPgyCy1eQF kyulePQrD/htg== From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker , Boqun Feng , Joel Fernandes , Neeraj Upadhyay , "Paul E . McKenney" , Uladzislau Rezki , rcu , Kalesh Singh Subject: [PATCH 2/7] rcu/exp: Handle RCU expedited grace period kworker allocation failure Date: Fri, 12 Jan 2024 16:46:16 +0100 Message-Id: <20240112154621.261852-3-frederic@kernel.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240112154621.261852-1-frederic@kernel.org> References: <20240112154621.261852-1-frederic@kernel.org> Precedence: bulk X-Mailing-List: rcu@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Just like is done for the kworker performing nodes initialization, gracefully handle the possible allocation failure of the RCU expedited grace period main kworker. While at it perform a rename of the related checking functions to better reflect the expedited specifics. Reviewed-by: Kalesh Singh Fixes: 9621fbee44df ("rcu: Move expedited grace period (GP) work to RT kthread_worker") Signed-off-by: Frederic Weisbecker --- kernel/rcu/tree.c | 2 ++ kernel/rcu/tree_exp.h | 25 +++++++++++++++++++------ 2 files changed, 21 insertions(+), 6 deletions(-) diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index b000c1a83fc2..5cd90f1762e9 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -4754,6 +4754,7 @@ static void __init rcu_start_exp_gp_kworkers(void) rcu_exp_gp_kworker = kthread_create_worker(0, gp_kworker_name); if (IS_ERR_OR_NULL(rcu_exp_gp_kworker)) { pr_err("Failed to create %s!\n", gp_kworker_name); + rcu_exp_gp_kworker = NULL; return; } @@ -4762,6 +4763,7 @@ static void __init rcu_start_exp_gp_kworkers(void) pr_err("Failed to create %s!\n", par_gp_kworker_name); rcu_exp_par_gp_kworker = NULL; kthread_destroy_worker(rcu_exp_gp_kworker); + rcu_exp_gp_kworker = NULL; return; } diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h index 014ddf672165..6123a60d9a4d 100644 --- a/kernel/rcu/tree_exp.h +++ b/kernel/rcu/tree_exp.h @@ -427,7 +427,12 @@ static void sync_rcu_exp_select_node_cpus(struct kthread_work *wp) __sync_rcu_exp_select_node_cpus(rewp); } -static inline bool rcu_gp_par_worker_started(void) +static inline bool rcu_exp_worker_started(void) +{ + return !!READ_ONCE(rcu_exp_gp_kworker); +} + +static inline bool rcu_exp_par_worker_started(void) { return !!READ_ONCE(rcu_exp_par_gp_kworker); } @@ -477,7 +482,12 @@ static void sync_rcu_exp_select_node_cpus(struct work_struct *wp) __sync_rcu_exp_select_node_cpus(rewp); } -static inline bool rcu_gp_par_worker_started(void) +static inline bool rcu_exp_worker_started(void) +{ + return !!READ_ONCE(rcu_gp_wq); +} + +static inline bool rcu_exp_par_worker_started(void) { return !!READ_ONCE(rcu_par_gp_wq); } @@ -540,7 +550,7 @@ static void sync_rcu_exp_select_cpus(void) rnp->exp_need_flush = false; if (!READ_ONCE(rnp->expmask)) continue; /* Avoid early boot non-existent wq. */ - if (!rcu_gp_par_worker_started() || + if (!rcu_exp_par_worker_started() || rcu_scheduler_active != RCU_SCHEDULER_RUNNING || rcu_is_last_leaf_node(rnp)) { /* No worker started yet or last leaf, do direct call. */ @@ -955,7 +965,7 @@ static void rcu_exp_print_detail_task_stall_rnp(struct rcu_node *rnp) */ void synchronize_rcu_expedited(void) { - bool boottime = (rcu_scheduler_active == RCU_SCHEDULER_INIT); + bool use_worker; unsigned long flags; struct rcu_exp_work rew; struct rcu_node *rnp; @@ -966,6 +976,9 @@ void synchronize_rcu_expedited(void) lock_is_held(&rcu_sched_lock_map), "Illegal synchronize_rcu_expedited() in RCU read-side critical section"); + use_worker = (rcu_scheduler_active != RCU_SCHEDULER_INIT) && + rcu_exp_worker_started(); + /* Is the state is such that the call is a grace period? */ if (rcu_blocking_is_gp()) { // Note well that this code runs with !PREEMPT && !SMP. @@ -995,7 +1008,7 @@ void synchronize_rcu_expedited(void) return; /* Someone else did our work for us. */ /* Ensure that load happens before action based on it. */ - if (unlikely(boottime)) { + if (unlikely(!use_worker)) { /* Direct call during scheduler init and early_initcalls(). */ rcu_exp_sel_wait_wake(s); } else { @@ -1013,7 +1026,7 @@ void synchronize_rcu_expedited(void) /* Let the next expedited grace period start. */ mutex_unlock(&rcu_state.exp_mutex); - if (likely(!boottime)) + if (likely(use_worker)) synchronize_rcu_expedited_destroy_work(&rew); } EXPORT_SYMBOL_GPL(synchronize_rcu_expedited); From patchwork Fri Jan 12 15:46:17 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Frederic Weisbecker X-Patchwork-Id: 13518587 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 089C57319F; Fri, 12 Jan 2024 15:46:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="uS5uBpu4" Received: by smtp.kernel.org (Postfix) with ESMTPSA id ADB5FC433B1; Fri, 12 Jan 2024 15:46:36 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1705074398; bh=/cQxWnqvZS3pZ1bKPTWJvWEqbyrOWscjBjQf2J79KU8=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=uS5uBpu4kmfDvBa6MjrP2/TzKSp5FYWW9aaF9yP9XYVFRx/Nk8Ha0xuiBe2SY+EJX h/Q3e+WveP34vV+jJ1/sZLgVmR+Seez65KaL1eyTtt5ocdYzk6NRzSKm2eOo4h0I18 AgJ+iR0rMeJfLB3om+1zMJe2fIQKmUchD7AKm623NtIb3GgLXwR6ElPrqH5dZTulrI wqftYvX92Qlasa0lmfgLGibORq3bpTvZnK35ILr9XBb30e4jyMecA7mjIM5/ZPJXo1 TsQdm53r0YNHKy/LA9rCOnPR4LZ7v7pIBw274xP/kVqssVLp7/b6X/PUkr+ewoqOsh 9gZ47S9YFKGLg== From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker , Boqun Feng , Joel Fernandes , Neeraj Upadhyay , "Paul E . McKenney" , Uladzislau Rezki , rcu Subject: [PATCH 3/7] rcu: s/boost_kthread_mutex/kthread_mutex Date: Fri, 12 Jan 2024 16:46:17 +0100 Message-Id: <20240112154621.261852-4-frederic@kernel.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240112154621.261852-1-frederic@kernel.org> References: <20240112154621.261852-1-frederic@kernel.org> Precedence: bulk X-Mailing-List: rcu@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 This mutex is currently protecting per node boost kthreads creation and affinity setting across CPU hotplug operations. Since the expedited kworkers will soon be split per node as well, they will be subject to the same concurrency constraints against hotplug. Therefore their creation and affinity tuning operations will be grouped with those of boost kthreads and then rely on the same mutex. To prepare for that, generalize its name. Signed-off-by: Frederic Weisbecker --- kernel/rcu/tree.c | 2 +- kernel/rcu/tree.h | 2 +- kernel/rcu/tree_plugin.h | 10 +++++----- 3 files changed, 7 insertions(+), 7 deletions(-) diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 5cd90f1762e9..0c818e75e8e2 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -4929,7 +4929,7 @@ static void __init rcu_init_one(void) init_waitqueue_head(&rnp->exp_wq[2]); init_waitqueue_head(&rnp->exp_wq[3]); spin_lock_init(&rnp->exp_lock); - mutex_init(&rnp->boost_kthread_mutex); + mutex_init(&rnp->kthread_mutex); raw_spin_lock_init(&rnp->exp_poll_lock); rnp->exp_seq_poll_rq = RCU_GET_STATE_COMPLETED; INIT_WORK(&rnp->exp_poll_wq, sync_rcu_do_polled_gp); diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h index e9821a8422db..13e7b0d907ab 100644 --- a/kernel/rcu/tree.h +++ b/kernel/rcu/tree.h @@ -113,7 +113,7 @@ struct rcu_node { /* side effect, not as a lock. */ unsigned long boost_time; /* When to start boosting (jiffies). */ - struct mutex boost_kthread_mutex; + struct mutex kthread_mutex; /* Exclusion for thread spawning and affinity */ /* manipulation. */ struct task_struct *boost_kthread_task; diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index 41021080ad25..0d307674915c 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -1195,7 +1195,7 @@ static void rcu_spawn_one_boost_kthread(struct rcu_node *rnp) struct sched_param sp; struct task_struct *t; - mutex_lock(&rnp->boost_kthread_mutex); + mutex_lock(&rnp->kthread_mutex); if (rnp->boost_kthread_task || !rcu_scheduler_fully_active) goto out; @@ -1212,7 +1212,7 @@ static void rcu_spawn_one_boost_kthread(struct rcu_node *rnp) wake_up_process(t); /* get to TASK_INTERRUPTIBLE quickly. */ out: - mutex_unlock(&rnp->boost_kthread_mutex); + mutex_unlock(&rnp->kthread_mutex); } /* @@ -1224,7 +1224,7 @@ static void rcu_spawn_one_boost_kthread(struct rcu_node *rnp) * no outgoing CPU. If there are no CPUs left in the affinity set, * this function allows the kthread to execute on any CPU. * - * Any future concurrent calls are serialized via ->boost_kthread_mutex. + * Any future concurrent calls are serialized via ->kthread_mutex. */ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu) { @@ -1237,7 +1237,7 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu) return; if (!zalloc_cpumask_var(&cm, GFP_KERNEL)) return; - mutex_lock(&rnp->boost_kthread_mutex); + mutex_lock(&rnp->kthread_mutex); mask = rcu_rnp_online_cpus(rnp); for_each_leaf_node_possible_cpu(rnp, cpu) if ((mask & leaf_node_cpu_bit(rnp, cpu)) && @@ -1250,7 +1250,7 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu) cpumask_clear_cpu(outgoingcpu, cm); } set_cpus_allowed_ptr(t, cm); - mutex_unlock(&rnp->boost_kthread_mutex); + mutex_unlock(&rnp->kthread_mutex); free_cpumask_var(cm); } From patchwork Fri Jan 12 15:46:18 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Frederic Weisbecker X-Patchwork-Id: 13518588 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 053B1745DD; Fri, 12 Jan 2024 15:46:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="iWqY4Itv" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 02C57C43390; Fri, 12 Jan 2024 15:46:38 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1705074400; bh=eRD9p7kQfk59EpEoFx0hyM9cqrA750l1QKynzHAEjBk=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=iWqY4ItvYn/nlg+ThvpzoZmAGs/jIV+67i8VsPEXEae22c4PqUqPVjG5/Es9L9/cH MVnPX0UQ+TC1bA4tjGVTMrUmwFwpiwYZdvKA0sUDEGQtxhckBPOjSHiwtrV4V60d0o mIfeTXKDqQh3+F594UCw65oHTiXxJnWyJeCeWuZw19Hhjl0VIe56RRaLRtxnWfEVlN mdpzwcB0vahoysza8T6RAWQkAtsBDjoIDKTrA9z002TjOyLXLXTO9KeJpGa4fRmH7B UClUcnjoMyIuMgb/utu957OSGEkKyDqMOxkulGaffBmSgu2TI+dFADCASmJrPsLD9i 360pRstM54iNg== From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker , Boqun Feng , Joel Fernandes , Neeraj Upadhyay , "Paul E . McKenney" , Uladzislau Rezki , rcu Subject: [PATCH 4/7] rcu/exp: Move expedited kthread worker creation functions above rcutree_prepare_cpu() Date: Fri, 12 Jan 2024 16:46:18 +0100 Message-Id: <20240112154621.261852-5-frederic@kernel.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240112154621.261852-1-frederic@kernel.org> References: <20240112154621.261852-1-frederic@kernel.org> Precedence: bulk X-Mailing-List: rcu@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 The expedited kthread worker performing the per node initialization is going to be split into per node kthreads. As such, the future per node kthread creation will need to be called from CPU hotplug callbacks instead of an initcall, right beside the per node boost kthread creation. To prepare for that, move the kthread worker creation above rcutree_prepare_cpu() as a first step to make the review smoother for the upcoming modifications. No intended functional change. Signed-off-by: Frederic Weisbecker --- kernel/rcu/tree.c | 96 +++++++++++++++++++++++------------------------ 1 file changed, 48 insertions(+), 48 deletions(-) diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 0c818e75e8e2..8f4fc8882e93 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -4405,6 +4405,54 @@ rcu_boot_init_percpu_data(int cpu) rcu_boot_init_nocb_percpu_data(rdp); } +#ifdef CONFIG_RCU_EXP_KTHREAD +struct kthread_worker *rcu_exp_gp_kworker; +struct kthread_worker *rcu_exp_par_gp_kworker; + +static void __init rcu_start_exp_gp_kworkers(void) +{ + const char *par_gp_kworker_name = "rcu_exp_par_gp_kthread_worker"; + const char *gp_kworker_name = "rcu_exp_gp_kthread_worker"; + struct sched_param param = { .sched_priority = kthread_prio }; + + rcu_exp_gp_kworker = kthread_create_worker(0, gp_kworker_name); + if (IS_ERR_OR_NULL(rcu_exp_gp_kworker)) { + pr_err("Failed to create %s!\n", gp_kworker_name); + rcu_exp_gp_kworker = NULL; + return; + } + + rcu_exp_par_gp_kworker = kthread_create_worker(0, par_gp_kworker_name); + if (IS_ERR_OR_NULL(rcu_exp_par_gp_kworker)) { + pr_err("Failed to create %s!\n", par_gp_kworker_name); + rcu_exp_par_gp_kworker = NULL; + kthread_destroy_worker(rcu_exp_gp_kworker); + rcu_exp_gp_kworker = NULL; + return; + } + + sched_setscheduler_nocheck(rcu_exp_gp_kworker->task, SCHED_FIFO, ¶m); + sched_setscheduler_nocheck(rcu_exp_par_gp_kworker->task, SCHED_FIFO, + ¶m); +} + +static inline void rcu_alloc_par_gp_wq(void) +{ +} +#else /* !CONFIG_RCU_EXP_KTHREAD */ +struct workqueue_struct *rcu_par_gp_wq; + +static void __init rcu_start_exp_gp_kworkers(void) +{ +} + +static inline void rcu_alloc_par_gp_wq(void) +{ + rcu_par_gp_wq = alloc_workqueue("rcu_par_gp", WQ_MEM_RECLAIM, 0); + WARN_ON(!rcu_par_gp_wq); +} +#endif /* CONFIG_RCU_EXP_KTHREAD */ + /* * Invoked early in the CPU-online process, when pretty much all services * are available. The incoming CPU is not present. @@ -4741,54 +4789,6 @@ static int rcu_pm_notify(struct notifier_block *self, return NOTIFY_OK; } -#ifdef CONFIG_RCU_EXP_KTHREAD -struct kthread_worker *rcu_exp_gp_kworker; -struct kthread_worker *rcu_exp_par_gp_kworker; - -static void __init rcu_start_exp_gp_kworkers(void) -{ - const char *par_gp_kworker_name = "rcu_exp_par_gp_kthread_worker"; - const char *gp_kworker_name = "rcu_exp_gp_kthread_worker"; - struct sched_param param = { .sched_priority = kthread_prio }; - - rcu_exp_gp_kworker = kthread_create_worker(0, gp_kworker_name); - if (IS_ERR_OR_NULL(rcu_exp_gp_kworker)) { - pr_err("Failed to create %s!\n", gp_kworker_name); - rcu_exp_gp_kworker = NULL; - return; - } - - rcu_exp_par_gp_kworker = kthread_create_worker(0, par_gp_kworker_name); - if (IS_ERR_OR_NULL(rcu_exp_par_gp_kworker)) { - pr_err("Failed to create %s!\n", par_gp_kworker_name); - rcu_exp_par_gp_kworker = NULL; - kthread_destroy_worker(rcu_exp_gp_kworker); - rcu_exp_gp_kworker = NULL; - return; - } - - sched_setscheduler_nocheck(rcu_exp_gp_kworker->task, SCHED_FIFO, ¶m); - sched_setscheduler_nocheck(rcu_exp_par_gp_kworker->task, SCHED_FIFO, - ¶m); -} - -static inline void rcu_alloc_par_gp_wq(void) -{ -} -#else /* !CONFIG_RCU_EXP_KTHREAD */ -struct workqueue_struct *rcu_par_gp_wq; - -static void __init rcu_start_exp_gp_kworkers(void) -{ -} - -static inline void rcu_alloc_par_gp_wq(void) -{ - rcu_par_gp_wq = alloc_workqueue("rcu_par_gp", WQ_MEM_RECLAIM, 0); - WARN_ON(!rcu_par_gp_wq); -} -#endif /* CONFIG_RCU_EXP_KTHREAD */ - /* * Spawn the kthreads that handle RCU's grace periods. */ From patchwork Fri Jan 12 15:46:19 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Frederic Weisbecker X-Patchwork-Id: 13518589 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4B28F745FD; Fri, 12 Jan 2024 15:46:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="B3LfSmTl" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 4DF70C43390; Fri, 12 Jan 2024 15:46:41 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1705074403; bh=1IQabjvd17B12deVrDUr1/xKUrLOgjkUshKCpo4h+Yg=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=B3LfSmTlRJu9qNVlOBptMzJL5aEZLZ3OlaQSdxWnjzg7y3ayQ5ET4FUnIPEvGSZuL mnqF6MCbdpeZsZaX7XtPZyONoRN/4bW0yY6fPK+l0PhcyM+Z5jKo8pIOjUngWXKaTI 0v24jY6KX9Hf6DIJhpxipvdGow8jkn1IZSQB19Qevn9mz0S4PJhRCBZQN3TArUpBOL AU9hpr2dN+/jRlHH/A139XCZwvKPX9RxZ8uRi9bpAujfOWsQNfXWdguVHVSV80fhyz IAph+yPXe2Pc1M2cQfkGlT0p2W7O7bvECk5oqE6jlDVqU0hBMKFN7uz+ST3Gm8Y8CY g3ENoYMFadDuQ== From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker , Boqun Feng , Joel Fernandes , Neeraj Upadhyay , "Paul E . McKenney" , Uladzislau Rezki , rcu Subject: [PATCH 5/7] rcu/exp: Make parallel exp gp kworker per rcu node Date: Fri, 12 Jan 2024 16:46:19 +0100 Message-Id: <20240112154621.261852-6-frederic@kernel.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240112154621.261852-1-frederic@kernel.org> References: <20240112154621.261852-1-frederic@kernel.org> Precedence: bulk X-Mailing-List: rcu@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 When CONFIG_RCU_EXP_KTHREAD=n, the expedited grace period per node initialization is performed in parallel via workqueues (one work per node). However in CONFIG_RCU_EXP_KTHREAD=y, this per node initialization is performed by a single kworker serializing each node initialization (one work for all nodes). The second part is certainly less scalable and efficient beyond a single leaf node. To improve this, expand this single kworker into per-node kworkers. This new layout is eventually intended to remove the workqueues based implementation since it will essentially now become duplicate code. Signed-off-by: Frederic Weisbecker --- kernel/rcu/rcu.h | 1 - kernel/rcu/tree.c | 65 ++++++++++++++++++++++++++-------------- kernel/rcu/tree.h | 3 ++ kernel/rcu/tree_exp.h | 10 +++---- kernel/rcu/tree_plugin.h | 10 ++----- 5 files changed, 54 insertions(+), 35 deletions(-) diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h index dcfb666f2499..4bc8cd6d461e 100644 --- a/kernel/rcu/rcu.h +++ b/kernel/rcu/rcu.h @@ -625,7 +625,6 @@ void rcu_force_quiescent_state(void); extern struct workqueue_struct *rcu_gp_wq; #ifdef CONFIG_RCU_EXP_KTHREAD extern struct kthread_worker *rcu_exp_gp_kworker; -extern struct kthread_worker *rcu_exp_par_gp_kworker; #else /* !CONFIG_RCU_EXP_KTHREAD */ extern struct workqueue_struct *rcu_par_gp_wq; #endif /* CONFIG_RCU_EXP_KTHREAD */ diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 8f4fc8882e93..af6dd4e0be48 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -4407,33 +4407,39 @@ rcu_boot_init_percpu_data(int cpu) #ifdef CONFIG_RCU_EXP_KTHREAD struct kthread_worker *rcu_exp_gp_kworker; -struct kthread_worker *rcu_exp_par_gp_kworker; -static void __init rcu_start_exp_gp_kworkers(void) +static void rcu_spawn_exp_par_gp_kworker(struct rcu_node *rnp) { - const char *par_gp_kworker_name = "rcu_exp_par_gp_kthread_worker"; - const char *gp_kworker_name = "rcu_exp_gp_kthread_worker"; + struct kthread_worker *kworker; + const char *name = "rcu_exp_par_gp_kthread_worker/%d"; struct sched_param param = { .sched_priority = kthread_prio }; + int rnp_index = rnp - rcu_get_root(); - rcu_exp_gp_kworker = kthread_create_worker(0, gp_kworker_name); + if (rnp->exp_kworker) + return; + + kworker = kthread_create_worker(0, name, rnp_index); + if (IS_ERR_OR_NULL(kworker)) { + pr_err("Failed to create par gp kworker on %d/%d\n", + rnp->grplo, rnp->grphi); + return; + } + WRITE_ONCE(rnp->exp_kworker, kworker); + sched_setscheduler_nocheck(kworker->task, SCHED_FIFO, ¶m); +} + +static void __init rcu_start_exp_gp_kworker(void) +{ + const char *name = "rcu_exp_gp_kthread_worker"; + struct sched_param param = { .sched_priority = kthread_prio }; + + rcu_exp_gp_kworker = kthread_create_worker(0, name); if (IS_ERR_OR_NULL(rcu_exp_gp_kworker)) { - pr_err("Failed to create %s!\n", gp_kworker_name); + pr_err("Failed to create %s!\n", name); rcu_exp_gp_kworker = NULL; return; } - - rcu_exp_par_gp_kworker = kthread_create_worker(0, par_gp_kworker_name); - if (IS_ERR_OR_NULL(rcu_exp_par_gp_kworker)) { - pr_err("Failed to create %s!\n", par_gp_kworker_name); - rcu_exp_par_gp_kworker = NULL; - kthread_destroy_worker(rcu_exp_gp_kworker); - rcu_exp_gp_kworker = NULL; - return; - } - sched_setscheduler_nocheck(rcu_exp_gp_kworker->task, SCHED_FIFO, ¶m); - sched_setscheduler_nocheck(rcu_exp_par_gp_kworker->task, SCHED_FIFO, - ¶m); } static inline void rcu_alloc_par_gp_wq(void) @@ -4442,7 +4448,11 @@ static inline void rcu_alloc_par_gp_wq(void) #else /* !CONFIG_RCU_EXP_KTHREAD */ struct workqueue_struct *rcu_par_gp_wq; -static void __init rcu_start_exp_gp_kworkers(void) +static void rcu_spawn_exp_par_gp_kworker(struct rcu_node *rnp) +{ +} + +static void __init rcu_start_exp_gp_kworker(void) { } @@ -4453,6 +4463,17 @@ static inline void rcu_alloc_par_gp_wq(void) } #endif /* CONFIG_RCU_EXP_KTHREAD */ +static void rcu_spawn_rnp_kthreads(struct rcu_node *rnp) +{ + if ((IS_ENABLED(CONFIG_RCU_EXP_KTHREAD) || + IS_ENABLED(CONFIG_RCU_BOOST)) && rcu_scheduler_fully_active) { + mutex_lock(&rnp->kthread_mutex); + rcu_spawn_one_boost_kthread(rnp); + rcu_spawn_exp_par_gp_kworker(rnp); + mutex_unlock(&rnp->kthread_mutex); + } +} + /* * Invoked early in the CPU-online process, when pretty much all services * are available. The incoming CPU is not present. @@ -4501,7 +4522,7 @@ int rcutree_prepare_cpu(unsigned int cpu) rdp->rcu_iw_gp_seq = rdp->gp_seq - 1; trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("cpuonl")); raw_spin_unlock_irqrestore_rcu_node(rnp, flags); - rcu_spawn_one_boost_kthread(rnp); + rcu_spawn_rnp_kthreads(rnp); rcu_spawn_cpu_nocb_kthread(cpu); WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus + 1); @@ -4823,10 +4844,10 @@ static int __init rcu_spawn_gp_kthread(void) * due to rcu_scheduler_fully_active. */ rcu_spawn_cpu_nocb_kthread(smp_processor_id()); - rcu_spawn_one_boost_kthread(rdp->mynode); + rcu_spawn_rnp_kthreads(rdp->mynode); rcu_spawn_core_kthreads(); /* Create kthread worker for expedited GPs */ - rcu_start_exp_gp_kworkers(); + rcu_start_exp_gp_kworker(); return 0; } early_initcall(rcu_spawn_gp_kthread); diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h index 13e7b0d907ab..e173808f486f 100644 --- a/kernel/rcu/tree.h +++ b/kernel/rcu/tree.h @@ -72,6 +72,9 @@ struct rcu_node { /* Online CPUs for next expedited GP. */ /* Any CPU that has ever been online will */ /* have its bit set. */ + struct kthread_worker *exp_kworker; + /* Workers performing per node expedited GP */ + /* initialization. */ unsigned long cbovldmask; /* CPUs experiencing callback overload. */ unsigned long ffmask; /* Fully functional CPUs. */ diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h index 6123a60d9a4d..0318a8a062d5 100644 --- a/kernel/rcu/tree_exp.h +++ b/kernel/rcu/tree_exp.h @@ -432,9 +432,9 @@ static inline bool rcu_exp_worker_started(void) return !!READ_ONCE(rcu_exp_gp_kworker); } -static inline bool rcu_exp_par_worker_started(void) +static inline bool rcu_exp_par_worker_started(struct rcu_node *rnp) { - return !!READ_ONCE(rcu_exp_par_gp_kworker); + return !!READ_ONCE(rnp->exp_kworker); } static inline void sync_rcu_exp_select_cpus_queue_work(struct rcu_node *rnp) @@ -445,7 +445,7 @@ static inline void sync_rcu_exp_select_cpus_queue_work(struct rcu_node *rnp) * another work item on the same kthread worker can result in * deadlock. */ - kthread_queue_work(rcu_exp_par_gp_kworker, &rnp->rew.rew_work); + kthread_queue_work(READ_ONCE(rnp->exp_kworker), &rnp->rew.rew_work); } static inline void sync_rcu_exp_select_cpus_flush_work(struct rcu_node *rnp) @@ -487,7 +487,7 @@ static inline bool rcu_exp_worker_started(void) return !!READ_ONCE(rcu_gp_wq); } -static inline bool rcu_exp_par_worker_started(void) +static inline bool rcu_exp_par_worker_started(struct rcu_node *rnp) { return !!READ_ONCE(rcu_par_gp_wq); } @@ -550,7 +550,7 @@ static void sync_rcu_exp_select_cpus(void) rnp->exp_need_flush = false; if (!READ_ONCE(rnp->expmask)) continue; /* Avoid early boot non-existent wq. */ - if (!rcu_exp_par_worker_started() || + if (!rcu_exp_par_worker_started(rnp) || rcu_scheduler_active != RCU_SCHEDULER_RUNNING || rcu_is_last_leaf_node(rnp)) { /* No worker started yet or last leaf, do direct call. */ diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index 0d307674915c..09bdd36ca9ff 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -1195,14 +1195,13 @@ static void rcu_spawn_one_boost_kthread(struct rcu_node *rnp) struct sched_param sp; struct task_struct *t; - mutex_lock(&rnp->kthread_mutex); - if (rnp->boost_kthread_task || !rcu_scheduler_fully_active) - goto out; + if (rnp->boost_kthread_task) + return; t = kthread_create(rcu_boost_kthread, (void *)rnp, "rcub/%d", rnp_index); if (WARN_ON_ONCE(IS_ERR(t))) - goto out; + return; raw_spin_lock_irqsave_rcu_node(rnp, flags); rnp->boost_kthread_task = t; @@ -1210,9 +1209,6 @@ static void rcu_spawn_one_boost_kthread(struct rcu_node *rnp) sp.sched_priority = kthread_prio; sched_setscheduler_nocheck(t, SCHED_FIFO, &sp); wake_up_process(t); /* get to TASK_INTERRUPTIBLE quickly. */ - - out: - mutex_unlock(&rnp->kthread_mutex); } /* From patchwork Fri Jan 12 15:46:20 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Frederic Weisbecker X-Patchwork-Id: 13518590 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 13CFF74E1B; Fri, 12 Jan 2024 15:46:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="tY1pMKUv" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 95A49C433F1; Fri, 12 Jan 2024 15:46:43 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1705074405; bh=W7d1v2FEbfmQrL73SdAC9c2Lc8l9OAodYD1knjYh+P4=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=tY1pMKUvqPs4oTm7xBPHiPsvQ2a569yt4ovw2IxwZDbiF4LTJmkCx0DcjDP4H8lJ4 AKX7YqSCLH3V5SdLMJrVJhUoMrsgZ9SDB8ndMj8SQ2huSe1WSC3c6FJnpz4+MumGNs EECSZkOOFlv1ErfuABj0T8MDwFL7zJ0UCHmNStCelDdktnXPSmuPp/Murka0QHO0Bs h45+ghD5v/5hNzg8zLDB1dFZsvApDDTvxngIPNxBjnqIIs75qD3Ap7ApxmKKj81no3 RH1m5R0x7FBvCKNXkv6YnnPSZL6CA5pK9NUDZ9OTCEIgWSUhI+Retrwt+zaTncornM ET2C1lKttjnfw== From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker , Boqun Feng , Joel Fernandes , Neeraj Upadhyay , "Paul E . McKenney" , Uladzislau Rezki , rcu Subject: [PATCH 6/7] rcu/exp: Handle parallel exp gp kworkers affinity Date: Fri, 12 Jan 2024 16:46:20 +0100 Message-Id: <20240112154621.261852-7-frederic@kernel.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240112154621.261852-1-frederic@kernel.org> References: <20240112154621.261852-1-frederic@kernel.org> Precedence: bulk X-Mailing-List: rcu@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Affine the parallel expedited gp kworkers to their respective RCU node in order to make them close to the cache their are playing with. This reuses the boost kthreads machinery that probe into CPU hotplug operations such that the kthreads become/stay affine to their respective node as soon/long as they contain online CPUs. Otherwise and if the current CPU going down was the last online on the leaf node, the related kthread is affine to the housekeeping CPUs. In the long run, this affinity VS CPU hotplug operation game should probably be implemented at the generic kthread level. Signed-off-by: Frederic Weisbecker --- kernel/rcu/tree.c | 79 +++++++++++++++++++++++++++++++++++++--- kernel/rcu/tree_plugin.h | 42 ++------------------- 2 files changed, 78 insertions(+), 43 deletions(-) diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index af6dd4e0be48..10fc2551d160 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -145,7 +145,7 @@ static int rcu_scheduler_fully_active __read_mostly; static void rcu_report_qs_rnp(unsigned long mask, struct rcu_node *rnp, unsigned long gps, unsigned long flags); -static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu); +static struct task_struct *rcu_boost_task(struct rcu_node *rnp); static void invoke_rcu_core(void); static void rcu_report_exp_rdp(struct rcu_data *rdp); static void sync_sched_exp_online_cleanup(int cpu); @@ -4428,6 +4428,16 @@ static void rcu_spawn_exp_par_gp_kworker(struct rcu_node *rnp) sched_setscheduler_nocheck(kworker->task, SCHED_FIFO, ¶m); } +static struct task_struct *rcu_exp_par_gp_task(struct rcu_node *rnp) +{ + struct kthread_worker *kworker = READ_ONCE(rnp->exp_kworker); + + if (!kworker) + return NULL; + + return kworker->task; +} + static void __init rcu_start_exp_gp_kworker(void) { const char *name = "rcu_exp_gp_kthread_worker"; @@ -4452,6 +4462,11 @@ static void rcu_spawn_exp_par_gp_kworker(struct rcu_node *rnp) { } +static struct task_struct *rcu_exp_par_gp_task(struct rcu_node *rnp) +{ + return NULL; +} + static void __init rcu_start_exp_gp_kworker(void) { } @@ -4530,13 +4545,67 @@ int rcutree_prepare_cpu(unsigned int cpu) } /* - * Update RCU priority boot kthread affinity for CPU-hotplug changes. + * Update kthreads affinity during CPU-hotplug changes. + * + * Set the per-rcu_node kthread's affinity to cover all CPUs that are + * served by the rcu_node in question. The CPU hotplug lock is still + * held, so the value of rnp->qsmaskinit will be stable. + * + * We don't include outgoingcpu in the affinity set, use -1 if there is + * no outgoing CPU. If there are no CPUs left in the affinity set, + * this function allows the kthread to execute on any CPU. + * + * Any future concurrent calls are serialized via ->kthread_mutex. */ -static void rcutree_affinity_setting(unsigned int cpu, int outgoing) +static void rcutree_affinity_setting(unsigned int cpu, int outgoingcpu) { - struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu); + cpumask_var_t cm; + unsigned long mask; + struct rcu_data *rdp; + struct rcu_node *rnp; + struct task_struct *task_boost, *task_exp; - rcu_boost_kthread_setaffinity(rdp->mynode, outgoing); + if (!IS_ENABLED(CONFIG_RCU_EXP_KTHREAD) && !IS_ENABLED(CONFIG_RCU_BOOST)) + return; + + rdp = per_cpu_ptr(&rcu_data, cpu); + rnp = rdp->mynode; + + task_boost = rcu_boost_task(rnp); + task_exp = rcu_exp_par_gp_task(rnp); + + /* + * If CPU is the boot one, those tasks are created later from early + * initcall since kthreadd must be created first. + */ + if (!task_boost && !task_exp) + return; + + if (!zalloc_cpumask_var(&cm, GFP_KERNEL)) + return; + + mutex_lock(&rnp->kthread_mutex); + mask = rcu_rnp_online_cpus(rnp); + for_each_leaf_node_possible_cpu(rnp, cpu) + if ((mask & leaf_node_cpu_bit(rnp, cpu)) && + cpu != outgoingcpu) + cpumask_set_cpu(cpu, cm); + cpumask_and(cm, cm, housekeeping_cpumask(HK_TYPE_RCU)); + if (cpumask_empty(cm)) { + cpumask_copy(cm, housekeeping_cpumask(HK_TYPE_RCU)); + if (outgoingcpu >= 0) + cpumask_clear_cpu(outgoingcpu, cm); + } + + if (task_exp) + set_cpus_allowed_ptr(task_exp, cm); + + if (task_boost) + set_cpus_allowed_ptr(task_boost, cm); + + mutex_unlock(&rnp->kthread_mutex); + + free_cpumask_var(cm); } /* diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index 09bdd36ca9ff..08246cca663f 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -1211,43 +1211,9 @@ static void rcu_spawn_one_boost_kthread(struct rcu_node *rnp) wake_up_process(t); /* get to TASK_INTERRUPTIBLE quickly. */ } -/* - * Set the per-rcu_node kthread's affinity to cover all CPUs that are - * served by the rcu_node in question. The CPU hotplug lock is still - * held, so the value of rnp->qsmaskinit will be stable. - * - * We don't include outgoingcpu in the affinity set, use -1 if there is - * no outgoing CPU. If there are no CPUs left in the affinity set, - * this function allows the kthread to execute on any CPU. - * - * Any future concurrent calls are serialized via ->kthread_mutex. - */ -static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu) +static struct task_struct *rcu_boost_task(struct rcu_node *rnp) { - struct task_struct *t = rnp->boost_kthread_task; - unsigned long mask; - cpumask_var_t cm; - int cpu; - - if (!t) - return; - if (!zalloc_cpumask_var(&cm, GFP_KERNEL)) - return; - mutex_lock(&rnp->kthread_mutex); - mask = rcu_rnp_online_cpus(rnp); - for_each_leaf_node_possible_cpu(rnp, cpu) - if ((mask & leaf_node_cpu_bit(rnp, cpu)) && - cpu != outgoingcpu) - cpumask_set_cpu(cpu, cm); - cpumask_and(cm, cm, housekeeping_cpumask(HK_TYPE_RCU)); - if (cpumask_empty(cm)) { - cpumask_copy(cm, housekeeping_cpumask(HK_TYPE_RCU)); - if (outgoingcpu >= 0) - cpumask_clear_cpu(outgoingcpu, cm); - } - set_cpus_allowed_ptr(t, cm); - mutex_unlock(&rnp->kthread_mutex); - free_cpumask_var(cm); + return READ_ONCE(rnp->boost_kthread_task); } #else /* #ifdef CONFIG_RCU_BOOST */ @@ -1266,10 +1232,10 @@ static void rcu_spawn_one_boost_kthread(struct rcu_node *rnp) { } -static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu) +static struct task_struct * rcu_boost_task(struct rcu_node *rnp) { + return NULL; } - #endif /* #else #ifdef CONFIG_RCU_BOOST */ /* From patchwork Fri Jan 12 15:46:21 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Frederic Weisbecker X-Patchwork-Id: 13518591 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A9709768E0; Fri, 12 Jan 2024 15:46:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="aO/cYc46" Received: by smtp.kernel.org (Postfix) with ESMTPSA id DCAA9C43399; Fri, 12 Jan 2024 15:46:45 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1705074408; bh=ghE3NvV6Osi8mVKGPqrvdRHt1SQMGOctvMN/yTIyii8=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=aO/cYc46DIMH3mlv7H3k3jWKxDvCCCttXDIa+56/NUh/ybmPJ0lMTmI/VO+ziEBHK xFrxNxtFKQ2S0hr9f31ZYD4S9NviEMqUVtjmBT9n0I+qQBt1L8T46YWtgZbUEKS9NF 8EbPE9hpvs265Cqa+mq9q5EbXRc1Uw7WK4HCywWm3eBwXUBR1u8tI1RiwTkc/tNOi0 Vk6cATHsIKHWJ2nsXxNHiDEvTcqY9bCrdqi/tpe6CkiUHuCWa/6BItjDe8YgSJwmJc +d2OdEXmySVgUwog8t+p91l/NFwOecEXeVIE5XQlO4z6+U/cdaMKdGLVyZwrM69Wyh l5JhhgIz4bIag== From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker , Boqun Feng , Joel Fernandes , Neeraj Upadhyay , "Paul E . McKenney" , Uladzislau Rezki , rcu , Anna-Maria Behnsen , Thomas Gleixner , Neeraj upadhyay Subject: [PATCH 7/7] rcu/exp: Remove rcu_par_gp_wq Date: Fri, 12 Jan 2024 16:46:21 +0100 Message-Id: <20240112154621.261852-8-frederic@kernel.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240112154621.261852-1-frederic@kernel.org> References: <20240112154621.261852-1-frederic@kernel.org> Precedence: bulk X-Mailing-List: rcu@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 TREE04 running on short iterations can produce writer stalls of the following kind: ??? Writer stall state RTWS_EXP_SYNC(4) g3968 f0x0 ->state 0x2 cpu 0 task:rcu_torture_wri state:D stack:14568 pid:83 ppid:2 flags:0x00004000 Call Trace: __schedule+0x2de/0x850 ? trace_event_raw_event_rcu_exp_funnel_lock+0x6d/0xb0 schedule+0x4f/0x90 synchronize_rcu_expedited+0x430/0x670 ? __pfx_autoremove_wake_function+0x10/0x10 ? __pfx_synchronize_rcu_expedited+0x10/0x10 do_rtws_sync.constprop.0+0xde/0x230 rcu_torture_writer+0x4b4/0xcd0 ? __pfx_rcu_torture_writer+0x10/0x10 kthread+0xc7/0xf0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2f/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 Waiting for an expedited grace period and polling for an expedited grace period both are operations that internally rely on the same workqueue performing necessary asynchronous work. However, a dependency chain is involved between those two operations, as depicted below: ====== CPU 0 ======= ====== CPU 1 ======= synchronize_rcu_expedited() exp_funnel_lock() mutex_lock(&rcu_state.exp_mutex); start_poll_synchronize_rcu_expedited queue_work(rcu_gp_wq, &rnp->exp_poll_wq); synchronize_rcu_expedited_queue_work() queue_work(rcu_gp_wq, &rew->rew_work); wait_event() // A, wait for &rew->rew_work completion mutex_unlock() // B //======> switch to kworker sync_rcu_do_polled_gp() { synchronize_rcu_expedited() exp_funnel_lock() mutex_lock(&rcu_state.exp_mutex); // C, wait B .... } // D Since workqueues are usually implemented on top of several kworkers handling the queue concurrently, the above situation wouldn't deadlock most of the time because A then doesn't depend on D. But in case of memory stress, a single kworker may end up handling alone all the works in a serialized way. In that case the above layout becomes a problem because A then waits for D, closing a circular dependency: A -> D -> C -> B -> A This however only happens when CONFIG_RCU_EXP_KTHREAD=n. Indeed synchronize_rcu_expedited() is otherwise implemented on top of a kthread worker while polling still relies on rcu_gp_wq workqueue, breaking the above circular dependency chain. Fix this with making expedited grace period to always rely on kthread worker. The workqueue based implementation is essentially a duplicate anyway now that the per-node initialization is performed by per-node kthread workers. Meanwhile the CONFIG_RCU_EXP_KTHREAD switch is still kept around to manage the scheduler policy of these kthread workers. Reported-by: Anna-Maria Behnsen Reported-by: Thomas Gleixner Suggested-by: Joel Fernandes Suggested-by: Paul E. McKenney Suggested-by: Neeraj upadhyay Signed-off-by: Frederic Weisbecker --- kernel/rcu/rcu.h | 4 --- kernel/rcu/tree.c | 40 ++++-------------------- kernel/rcu/tree.h | 6 +--- kernel/rcu/tree_exp.h | 73 +------------------------------------------ 4 files changed, 8 insertions(+), 115 deletions(-) diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h index 4bc8cd6d461e..4e65a92e528e 100644 --- a/kernel/rcu/rcu.h +++ b/kernel/rcu/rcu.h @@ -623,11 +623,7 @@ int rcu_get_gp_kthreads_prio(void); void rcu_fwd_progress_check(unsigned long j); void rcu_force_quiescent_state(void); extern struct workqueue_struct *rcu_gp_wq; -#ifdef CONFIG_RCU_EXP_KTHREAD extern struct kthread_worker *rcu_exp_gp_kworker; -#else /* !CONFIG_RCU_EXP_KTHREAD */ -extern struct workqueue_struct *rcu_par_gp_wq; -#endif /* CONFIG_RCU_EXP_KTHREAD */ void rcu_gp_slow_register(atomic_t *rgssp); void rcu_gp_slow_unregister(atomic_t *rgssp); #endif /* #else #ifdef CONFIG_TINY_RCU */ diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 10fc2551d160..944e55085262 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -4405,7 +4405,6 @@ rcu_boot_init_percpu_data(int cpu) rcu_boot_init_nocb_percpu_data(rdp); } -#ifdef CONFIG_RCU_EXP_KTHREAD struct kthread_worker *rcu_exp_gp_kworker; static void rcu_spawn_exp_par_gp_kworker(struct rcu_node *rnp) @@ -4425,7 +4424,9 @@ static void rcu_spawn_exp_par_gp_kworker(struct rcu_node *rnp) return; } WRITE_ONCE(rnp->exp_kworker, kworker); - sched_setscheduler_nocheck(kworker->task, SCHED_FIFO, ¶m); + + if (IS_ENABLED(CONFIG_RCU_EXP_KTHREAD)) + sched_setscheduler_nocheck(kworker->task, SCHED_FIFO, ¶m); } static struct task_struct *rcu_exp_par_gp_task(struct rcu_node *rnp) @@ -4449,39 +4450,14 @@ static void __init rcu_start_exp_gp_kworker(void) rcu_exp_gp_kworker = NULL; return; } - sched_setscheduler_nocheck(rcu_exp_gp_kworker->task, SCHED_FIFO, ¶m); -} - -static inline void rcu_alloc_par_gp_wq(void) -{ -} -#else /* !CONFIG_RCU_EXP_KTHREAD */ -struct workqueue_struct *rcu_par_gp_wq; - -static void rcu_spawn_exp_par_gp_kworker(struct rcu_node *rnp) -{ -} - -static struct task_struct *rcu_exp_par_gp_task(struct rcu_node *rnp) -{ - return NULL; -} - -static void __init rcu_start_exp_gp_kworker(void) -{ -} -static inline void rcu_alloc_par_gp_wq(void) -{ - rcu_par_gp_wq = alloc_workqueue("rcu_par_gp", WQ_MEM_RECLAIM, 0); - WARN_ON(!rcu_par_gp_wq); + if (IS_ENABLED(CONFIG_RCU_EXP_KTHREAD)) + sched_setscheduler_nocheck(rcu_exp_gp_kworker->task, SCHED_FIFO, ¶m); } -#endif /* CONFIG_RCU_EXP_KTHREAD */ static void rcu_spawn_rnp_kthreads(struct rcu_node *rnp) { - if ((IS_ENABLED(CONFIG_RCU_EXP_KTHREAD) || - IS_ENABLED(CONFIG_RCU_BOOST)) && rcu_scheduler_fully_active) { + if (rcu_scheduler_fully_active) { mutex_lock(&rnp->kthread_mutex); rcu_spawn_one_boost_kthread(rnp); rcu_spawn_exp_par_gp_kworker(rnp); @@ -4565,9 +4541,6 @@ static void rcutree_affinity_setting(unsigned int cpu, int outgoingcpu) struct rcu_node *rnp; struct task_struct *task_boost, *task_exp; - if (!IS_ENABLED(CONFIG_RCU_EXP_KTHREAD) && !IS_ENABLED(CONFIG_RCU_BOOST)) - return; - rdp = per_cpu_ptr(&rcu_data, cpu); rnp = rdp->mynode; @@ -5256,7 +5229,6 @@ void __init rcu_init(void) /* Create workqueue for Tree SRCU and for expedited GPs. */ rcu_gp_wq = alloc_workqueue("rcu_gp", WQ_MEM_RECLAIM, 0); WARN_ON(!rcu_gp_wq); - rcu_alloc_par_gp_wq(); /* Fill in default value for rcutree.qovld boot parameter. */ /* -After- the rcu_node ->lock fields are initialized! */ diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h index e173808f486f..f35e47f24d80 100644 --- a/kernel/rcu/tree.h +++ b/kernel/rcu/tree.h @@ -21,14 +21,10 @@ #include "rcu_segcblist.h" -/* Communicate arguments to a workqueue handler. */ +/* Communicate arguments to a kthread worker handler. */ struct rcu_exp_work { unsigned long rew_s; -#ifdef CONFIG_RCU_EXP_KTHREAD struct kthread_work rew_work; -#else - struct work_struct rew_work; -#endif /* CONFIG_RCU_EXP_KTHREAD */ }; /* RCU's kthread states for tracing. */ diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h index 0318a8a062d5..6b83537480b1 100644 --- a/kernel/rcu/tree_exp.h +++ b/kernel/rcu/tree_exp.h @@ -418,7 +418,6 @@ static void __sync_rcu_exp_select_node_cpus(struct rcu_exp_work *rewp) static void rcu_exp_sel_wait_wake(unsigned long s); -#ifdef CONFIG_RCU_EXP_KTHREAD static void sync_rcu_exp_select_node_cpus(struct kthread_work *wp) { struct rcu_exp_work *rewp = @@ -470,69 +469,6 @@ static inline void synchronize_rcu_expedited_queue_work(struct rcu_exp_work *rew kthread_queue_work(rcu_exp_gp_kworker, &rew->rew_work); } -static inline void synchronize_rcu_expedited_destroy_work(struct rcu_exp_work *rew) -{ -} -#else /* !CONFIG_RCU_EXP_KTHREAD */ -static void sync_rcu_exp_select_node_cpus(struct work_struct *wp) -{ - struct rcu_exp_work *rewp = - container_of(wp, struct rcu_exp_work, rew_work); - - __sync_rcu_exp_select_node_cpus(rewp); -} - -static inline bool rcu_exp_worker_started(void) -{ - return !!READ_ONCE(rcu_gp_wq); -} - -static inline bool rcu_exp_par_worker_started(struct rcu_node *rnp) -{ - return !!READ_ONCE(rcu_par_gp_wq); -} - -static inline void sync_rcu_exp_select_cpus_queue_work(struct rcu_node *rnp) -{ - int cpu = find_next_bit(&rnp->ffmask, BITS_PER_LONG, -1); - - INIT_WORK(&rnp->rew.rew_work, sync_rcu_exp_select_node_cpus); - /* If all offline, queue the work on an unbound CPU. */ - if (unlikely(cpu > rnp->grphi - rnp->grplo)) - cpu = WORK_CPU_UNBOUND; - else - cpu += rnp->grplo; - queue_work_on(cpu, rcu_par_gp_wq, &rnp->rew.rew_work); -} - -static inline void sync_rcu_exp_select_cpus_flush_work(struct rcu_node *rnp) -{ - flush_work(&rnp->rew.rew_work); -} - -/* - * Work-queue handler to drive an expedited grace period forward. - */ -static void wait_rcu_exp_gp(struct work_struct *wp) -{ - struct rcu_exp_work *rewp; - - rewp = container_of(wp, struct rcu_exp_work, rew_work); - rcu_exp_sel_wait_wake(rewp->rew_s); -} - -static inline void synchronize_rcu_expedited_queue_work(struct rcu_exp_work *rew) -{ - INIT_WORK_ONSTACK(&rew->rew_work, wait_rcu_exp_gp); - queue_work(rcu_gp_wq, &rew->rew_work); -} - -static inline void synchronize_rcu_expedited_destroy_work(struct rcu_exp_work *rew) -{ - destroy_work_on_stack(&rew->rew_work); -} -#endif /* CONFIG_RCU_EXP_KTHREAD */ - /* * Select the nodes that the upcoming expedited grace period needs * to wait for. @@ -965,7 +901,6 @@ static void rcu_exp_print_detail_task_stall_rnp(struct rcu_node *rnp) */ void synchronize_rcu_expedited(void) { - bool use_worker; unsigned long flags; struct rcu_exp_work rew; struct rcu_node *rnp; @@ -976,9 +911,6 @@ void synchronize_rcu_expedited(void) lock_is_held(&rcu_sched_lock_map), "Illegal synchronize_rcu_expedited() in RCU read-side critical section"); - use_worker = (rcu_scheduler_active != RCU_SCHEDULER_INIT) && - rcu_exp_worker_started(); - /* Is the state is such that the call is a grace period? */ if (rcu_blocking_is_gp()) { // Note well that this code runs with !PREEMPT && !SMP. @@ -1008,7 +940,7 @@ void synchronize_rcu_expedited(void) return; /* Someone else did our work for us. */ /* Ensure that load happens before action based on it. */ - if (unlikely(!use_worker)) { + if (unlikely((rcu_scheduler_active == RCU_SCHEDULER_INIT) || !rcu_exp_worker_started())) { /* Direct call during scheduler init and early_initcalls(). */ rcu_exp_sel_wait_wake(s); } else { @@ -1025,9 +957,6 @@ void synchronize_rcu_expedited(void) /* Let the next expedited grace period start. */ mutex_unlock(&rcu_state.exp_mutex); - - if (likely(use_worker)) - synchronize_rcu_expedited_destroy_work(&rew); } EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);