From patchwork Mon Jun 20 22:44:57 2022
X-Patchwork-Submitter: "Paul E. McKenney"
X-Patchwork-Id: 12888342
From: "Paul E. McKenney"
To: rcu@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, kernel-team@fb.com, rostedt@goodmis.org, Frederic Weisbecker, Neeraj Upadhyay, Boqun Feng, Uladzislau Rezki, Joel Fernandes, Zqiang, "Paul E . McKenney"
Subject: [PATCH rcu 1/7] rcu/nocb: Add/del rdp to iterate from rcuog itself
Date: Mon, 20 Jun 2022 15:44:57 -0700
Message-Id: <20220620224503.3841196-1-paulmck@kernel.org>
In-Reply-To: <20220620224455.GA3840881@paulmck-ThinkPad-P17-Gen-1>
References: <20220620224455.GA3840881@paulmck-ThinkPad-P17-Gen-1>

From: Frederic Weisbecker

NOCB rdps are part of a group whose list is iterated by the corresponding rdp leader. This list is RCU-traversed because an rdp can be added or deleted concurrently. Upon addition, a new iteration of the list after a synchronization point (a pair of LOCK/UNLOCK of ->nocb_gp_lock) is forced in order to make sure that:

1) we didn't miss a new element added in the middle of an iteration,
2) we didn't ignore a whole subset of the list due to an element being quickly deleted and then re-added, and
3) we are protected against other, less obvious surprises.

Although this layout is expected to be safe, it doesn't help anybody sleep well. Instead, simplify the nocb state toggling by moving the list modification from the nocb (de-)offloading workqueue to the rcuog kthreads themselves.
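As a rough, stand-alone illustration of that handshake, here is a minimal user-space model in plain C with pthreads. The names (request_toggle(), leader(), struct node) are invented for this sketch; it is not part of the patch, which operates on struct rcu_data under ->nocb_gp_lock as shown in the diff below. The point it models: the (de-)offloading side never edits the leader's list itself, it only publishes the target in a single "toggling" slot and wakes the leader, and the leader alone adds or removes entries, so list mutation and list iteration always happen in the same thread.

/* toggle_model.c -- toy model only; build with: cc -pthread toggle_model.c */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

struct node {
	int cpu;
	bool offloaded;		/* stand-in for SEGCBLIST_OFFLOADED */
	struct node *next;	/* stand-in for ->nocb_entry_rdp */
};

static pthread_mutex_t gp_lock = PTHREAD_MUTEX_INITIALIZER;	/* ~ ->nocb_gp_lock */
static pthread_cond_t  gp_wq   = PTHREAD_COND_INITIALIZER;	/* ~ ->nocb_gp_wq */
static struct node *toggling;	/* ~ ->nocb_toggling_rdp: the handshake slot */
static struct node *head;	/* ~ ->nocb_head_rdp: touched only by leader() */

/* (De-)offload side: publish the target and wake the leader. */
static void request_toggle(struct node *n)
{
	pthread_mutex_lock(&gp_lock);
	toggling = n;
	pthread_cond_signal(&gp_wq);
	pthread_mutex_unlock(&gp_lock);
}

/* Leader side: the only thread that ever modifies or walks the list. */
static void *leader(void *arg)
{
	(void)arg;
	for (;;) {
		struct node *n, *p;

		pthread_mutex_lock(&gp_lock);
		while (!toggling)
			pthread_cond_wait(&gp_wq, &gp_lock);
		n = toggling;
		toggling = NULL;	/* consume the slot before acting on it */
		pthread_mutex_unlock(&gp_lock);

		if (n->offloaded) {	/* "offloading": add to the iterated list */
			n->next = head;
			head = n;
		} else {		/* "de-offloading": remove from the list */
			struct node **pp = &head;

			while (*pp && *pp != n)
				pp = &(*pp)->next;
			if (*pp)
				*pp = n->next;
		}
		/* Iterating needs no RCU and no lockless flag checks: same thread. */
		printf("leader now iterates:");
		for (p = head; p; p = p->next)
			printf(" cpu%d", p->cpu);
		printf("\n");
	}
	return NULL;
}

int main(void)
{
	static struct node a = { .cpu = 1 };
	pthread_t tid;

	pthread_create(&tid, NULL, leader, NULL);

	a.offloaded = true;
	request_toggle(&a);	/* "offload" cpu1: leader links it in */
	sleep(1);		/* the real code serializes back-to-back toggles via
				 * the segcblist state machine and ->nocb_state_wq */
	a.offloaded = false;
	request_toggle(&a);	/* "de-offload" cpu1: leader unlinks it */
	sleep(1);
	return 0;
}

The patch below does the same thing with ->nocb_toggling_rdp consumed by nocb_gp_wait() and applied by nocb_gp_toggle_rdp(), plus the SEGCBLIST flag and wakeup handling that this toy omits.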
Whenever the rdp leader is expected to (re-)set the SEGCBLIST_KTHREAD_GP flag of a target rdp, the latter is queued so that the leader handles the flag flip along with adding or deleting the target rdp to the list to iterate. This way the list modification and iteration happen from the same kthread and those operations can't race altogether. As a bonus, the flags for each rdp don't need to be checked locklessly before each iteration, which is one less opportunity to produce nightmares. Signed-off-by: Frederic Weisbecker Cc: Neeraj Upadhyay Cc: Boqun Feng Cc: Uladzislau Rezki Cc: Joel Fernandes Cc: Zqiang Signed-off-by: Paul E. McKenney Reviewed-by: Neeraj Upadhyay --- kernel/rcu/tree.h | 1 + kernel/rcu/tree_nocb.h | 138 +++++++++++++++++++++-------------------- 2 files changed, 71 insertions(+), 68 deletions(-) diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h index 2ccf5845957df..4f8532c33558f 100644 --- a/kernel/rcu/tree.h +++ b/kernel/rcu/tree.h @@ -235,6 +235,7 @@ struct rcu_data { * if rdp_gp. */ struct list_head nocb_entry_rdp; /* rcu_data node in wakeup chain. */ + struct rcu_data *nocb_toggling_rdp; /* rdp queued for (de-)offloading */ /* The following fields are used by CB kthread, hence new cacheline. */ struct rcu_data *nocb_gp_rdp ____cacheline_internodealigned_in_smp; diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h index 46694e13398a3..dac74952e1d1b 100644 --- a/kernel/rcu/tree_nocb.h +++ b/kernel/rcu/tree_nocb.h @@ -546,52 +546,43 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone, } } -/* - * Check if we ignore this rdp. - * - * We check that without holding the nocb lock but - * we make sure not to miss a freshly offloaded rdp - * with the current ordering: - * - * rdp_offload_toggle() nocb_gp_enabled_cb() - * ------------------------- ---------------------------- - * WRITE flags LOCK nocb_gp_lock - * LOCK nocb_gp_lock READ/WRITE nocb_gp_sleep - * READ/WRITE nocb_gp_sleep UNLOCK nocb_gp_lock - * UNLOCK nocb_gp_lock READ flags - */ -static inline bool nocb_gp_enabled_cb(struct rcu_data *rdp) -{ - u8 flags = SEGCBLIST_OFFLOADED | SEGCBLIST_KTHREAD_GP; - - return rcu_segcblist_test_flags(&rdp->cblist, flags); -} - -static inline bool nocb_gp_update_state_deoffloading(struct rcu_data *rdp, - bool *needwake_state) +static int nocb_gp_toggle_rdp(struct rcu_data *rdp, + bool *wake_state) { struct rcu_segcblist *cblist = &rdp->cblist; + unsigned long flags; + int ret; - if (rcu_segcblist_test_flags(cblist, SEGCBLIST_OFFLOADED)) { - if (!rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_GP)) { - rcu_segcblist_set_flags(cblist, SEGCBLIST_KTHREAD_GP); - if (rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB)) - *needwake_state = true; - } - return false; + rcu_nocb_lock_irqsave(rdp, flags); + if (rcu_segcblist_test_flags(cblist, SEGCBLIST_OFFLOADED) && + !rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_GP)) { + /* + * Offloading. Set our flag and notify the offload worker. + * We will handle this rdp until it ever gets de-offloaded. + */ + rcu_segcblist_set_flags(cblist, SEGCBLIST_KTHREAD_GP); + if (rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB)) + *wake_state = true; + ret = 1; + } else if (!rcu_segcblist_test_flags(cblist, SEGCBLIST_OFFLOADED) && + rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_GP)) { + /* + * De-offloading. Clear our flag and notify the de-offload worker. + * We will ignore this rdp until it ever gets re-offloaded. 
+ */ + rcu_segcblist_clear_flags(cblist, SEGCBLIST_KTHREAD_GP); + if (!rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB)) + *wake_state = true; + ret = 0; + } else { + WARN_ON_ONCE(1); + ret = -1; } - /* - * De-offloading. Clear our flag and notify the de-offload worker. - * We will ignore this rdp until it ever gets re-offloaded. - */ - WARN_ON_ONCE(!rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_GP)); - rcu_segcblist_clear_flags(cblist, SEGCBLIST_KTHREAD_GP); - if (!rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB)) - *needwake_state = true; - return true; -} + rcu_nocb_unlock_irqrestore(rdp, flags); + return ret; +} /* * No-CBs GP kthreads come here to wait for additional callbacks to show up @@ -609,7 +600,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp) bool needwait_gp = false; // This prevents actual uninitialized use. bool needwake; bool needwake_gp; - struct rcu_data *rdp; + struct rcu_data *rdp, *rdp_toggling = NULL; struct rcu_node *rnp; unsigned long wait_gp_seq = 0; // Suppress "use uninitialized" warning. bool wasempty = false; @@ -634,19 +625,10 @@ static void nocb_gp_wait(struct rcu_data *my_rdp) * is added to the list, so the skipped-over rcu_data structures * won't be ignored for long. */ - list_for_each_entry_rcu(rdp, &my_rdp->nocb_head_rdp, nocb_entry_rdp, 1) { - bool needwake_state = false; - - if (!nocb_gp_enabled_cb(rdp)) - continue; + list_for_each_entry(rdp, &my_rdp->nocb_head_rdp, nocb_entry_rdp) { trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Check")); rcu_nocb_lock_irqsave(rdp, flags); - if (nocb_gp_update_state_deoffloading(rdp, &needwake_state)) { - rcu_nocb_unlock_irqrestore(rdp, flags); - if (needwake_state) - swake_up_one(&rdp->nocb_state_wq); - continue; - } + lockdep_assert_held(&rdp->nocb_lock); bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass); if (bypass_ncbs && (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + 1) || @@ -656,8 +638,6 @@ static void nocb_gp_wait(struct rcu_data *my_rdp) bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass); } else if (!bypass_ncbs && rcu_segcblist_empty(&rdp->cblist)) { rcu_nocb_unlock_irqrestore(rdp, flags); - if (needwake_state) - swake_up_one(&rdp->nocb_state_wq); continue; /* No callbacks here, try next. */ } if (bypass_ncbs) { @@ -705,8 +685,6 @@ static void nocb_gp_wait(struct rcu_data *my_rdp) } if (needwake_gp) rcu_gp_kthread_wake(); - if (needwake_state) - swake_up_one(&rdp->nocb_state_wq); } my_rdp->nocb_gp_bypass = bypass; @@ -739,15 +717,49 @@ static void nocb_gp_wait(struct rcu_data *my_rdp) !READ_ONCE(my_rdp->nocb_gp_sleep)); trace_rcu_this_gp(rnp, my_rdp, wait_gp_seq, TPS("EndWait")); } + if (!rcu_nocb_poll) { raw_spin_lock_irqsave(&my_rdp->nocb_gp_lock, flags); + // (De-)queue an rdp to/from the group if its nocb state is changing + rdp_toggling = my_rdp->nocb_toggling_rdp; + if (rdp_toggling) + my_rdp->nocb_toggling_rdp = NULL; + if (my_rdp->nocb_defer_wakeup > RCU_NOCB_WAKE_NOT) { WRITE_ONCE(my_rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT); del_timer(&my_rdp->nocb_timer); } WRITE_ONCE(my_rdp->nocb_gp_sleep, true); raw_spin_unlock_irqrestore(&my_rdp->nocb_gp_lock, flags); + } else { + rdp_toggling = READ_ONCE(my_rdp->nocb_toggling_rdp); + if (rdp_toggling) { + /* + * Paranoid locking to make sure nocb_toggling_rdp is well + * reset *before* we (re)set SEGCBLIST_KTHREAD_GP or we could + * race with another round of nocb toggling for this rdp. + * Nocb locking should prevent from that already but we stick + * to paranoia, especially in rare path. 
+ */ + raw_spin_lock_irqsave(&my_rdp->nocb_gp_lock, flags); + my_rdp->nocb_toggling_rdp = NULL; + raw_spin_unlock_irqrestore(&my_rdp->nocb_gp_lock, flags); + } + } + + if (rdp_toggling) { + bool wake_state = false; + int ret; + + ret = nocb_gp_toggle_rdp(rdp_toggling, &wake_state); + if (ret == 1) + list_add_tail(&rdp_toggling->nocb_entry_rdp, &my_rdp->nocb_head_rdp); + else if (ret == 0) + list_del(&rdp_toggling->nocb_entry_rdp); + if (wake_state) + swake_up_one(&rdp_toggling->nocb_state_wq); } + my_rdp->nocb_gp_seq = -1; WARN_ON(signal_pending(current)); } @@ -966,6 +978,8 @@ static int rdp_offload_toggle(struct rcu_data *rdp, swake_up_one(&rdp->nocb_cb_wq); raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags); + // Queue this rdp for add/del to/from the list to iterate on rcuog + WRITE_ONCE(rdp_gp->nocb_toggling_rdp, rdp); if (rdp_gp->nocb_gp_sleep) { rdp_gp->nocb_gp_sleep = false; wake_gp = true; @@ -1013,8 +1027,6 @@ static long rcu_nocb_rdp_deoffload(void *arg) swait_event_exclusive(rdp->nocb_state_wq, !rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB | SEGCBLIST_KTHREAD_GP)); - /* Stop nocb_gp_wait() from iterating over this structure. */ - list_del_rcu(&rdp->nocb_entry_rdp); /* * Lock one last time to acquire latest callback updates from kthreads * so we can later handle callbacks locally without locking. @@ -1079,16 +1091,6 @@ static long rcu_nocb_rdp_offload(void *arg) pr_info("Offloading %d\n", rdp->cpu); - /* - * Cause future nocb_gp_wait() invocations to iterate over - * structure, resetting ->nocb_gp_sleep and waking up the related - * "rcuog". Since nocb_gp_wait() in turn locks ->nocb_gp_lock - * before setting ->nocb_gp_sleep again, we are guaranteed to - * iterate this newly added structure before "rcuog" goes to - * sleep again. - */ - list_add_tail_rcu(&rdp->nocb_entry_rdp, &rdp->nocb_gp_rdp->nocb_head_rdp); - /* * Can't use rcu_nocb_lock_irqsave() before SEGCBLIST_LOCKING * is set. From patchwork Mon Jun 20 22:44:58 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paul E. 
McKenney" X-Patchwork-Id: 12888341 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9E135C43334 for ; Mon, 20 Jun 2022 22:45:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S244260AbiFTWpH (ORCPT ); Mon, 20 Jun 2022 18:45:07 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49618 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244303AbiFTWpH (ORCPT ); Mon, 20 Jun 2022 18:45:07 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4674014022; Mon, 20 Jun 2022 15:45:06 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id D4C866137C; Mon, 20 Jun 2022 22:45:05 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 3ADD3C341C5; Mon, 20 Jun 2022 22:45:05 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1655765105; bh=J/mJiAf5BiNxtLCxXf+imq/vEjNCEVMz8bfuRKWJtpg=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=pVgRaCDO9Jrh+30vuXL5TFNTmAzz0mYEbk0girSTVzUOWTuuhvAnw08sAkbtxUFkt bwLW5H7tG2b2NQF8jHls7oLlq+Nf1x6a00skS3xRGeHn7vqAOIiNtrGVzdbcHYWFEt ra0SYmO4z+9cL4RVX9Oi9YFk4uzxKz0UwrtstPiF2jv9Ep4omLSAC/P/o3tuG6l+P2 0UZ0nIwBrJB+kI441fxtpMTJ0TXq9HVrw7ERgcUjdatdWgrdgOeNpnPNy4Lr9Ml0aV kE/HqfTIrrOnvm3kaeL74/8X/iTD7fOQ/GFqWAb08hWlQOu6Q0jybxst4NnOGm2IL7 ikMJv1z/mnUYg== Received: by paulmck-ThinkPad-P17-Gen-1.home (Postfix, from userid 1000) id E97AA5C05C8; Mon, 20 Jun 2022 15:45:04 -0700 (PDT) From: "Paul E. McKenney" To: rcu@vger.kernel.org Cc: linux-kernel@vger.kernel.org, kernel-team@fb.com, rostedt@goodmis.org, Zqiang , Neeraj Upadhyay , Boqun Feng , Uladzislau Rezki , Joel Fernandes , Frederic Weisbecker , "Paul E . McKenney" Subject: [PATCH rcu 2/7] rcu/nocb: Invert rcu_state.barrier_mutex VS hotplug lock locking order Date: Mon, 20 Jun 2022 15:44:58 -0700 Message-Id: <20220620224503.3841196-2-paulmck@kernel.org> X-Mailer: git-send-email 2.31.1.189.g2e36527f23 In-Reply-To: <20220620224455.GA3840881@paulmck-ThinkPad-P17-Gen-1> References: <20220620224455.GA3840881@paulmck-ThinkPad-P17-Gen-1> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: rcu@vger.kernel.org From: Zqiang In case of failure to spawn either rcuog or rcuo[p] kthreads for a given rdp, rcu_nocb_rdp_deoffload() needs to be called with the hotplug lock and the barrier_mutex held. However cpus write lock is already held while calling rcutree_prepare_cpu(). It's not possible to call rcu_nocb_rdp_deoffload() from there with just locking the barrier_mutex or this would result in a locking inversion against rcu_nocb_cpu_deoffload() which holds both locks in the reverse order. Simply solve this with inverting the locking order inside rcu_nocb_cpu_[de]offload(). This will be a pre-requisite to toggle NOCB states toward cpusets anyway. Signed-off-by: Zqiang Cc: Neeraj Upadhyay Cc: Boqun Feng Cc: Uladzislau Rezki Cc: Joel Fernandes Signed-off-by: Frederic Weisbecker Signed-off-by: Paul E. 
McKenney Reviewed-by: Neeraj Upadhyay --- kernel/rcu/tree_nocb.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h index dac74952e1d1b..f2f2cab6285a1 100644 --- a/kernel/rcu/tree_nocb.h +++ b/kernel/rcu/tree_nocb.h @@ -1055,8 +1055,8 @@ int rcu_nocb_cpu_deoffload(int cpu) struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu); int ret = 0; - mutex_lock(&rcu_state.barrier_mutex); cpus_read_lock(); + mutex_lock(&rcu_state.barrier_mutex); if (rcu_rdp_is_offloaded(rdp)) { if (cpu_online(cpu)) { ret = work_on_cpu(cpu, rcu_nocb_rdp_deoffload, rdp); @@ -1067,8 +1067,8 @@ int rcu_nocb_cpu_deoffload(int cpu) ret = -EINVAL; } } - cpus_read_unlock(); mutex_unlock(&rcu_state.barrier_mutex); + cpus_read_unlock(); return ret; } @@ -1134,8 +1134,8 @@ int rcu_nocb_cpu_offload(int cpu) struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu); int ret = 0; - mutex_lock(&rcu_state.barrier_mutex); cpus_read_lock(); + mutex_lock(&rcu_state.barrier_mutex); if (!rcu_rdp_is_offloaded(rdp)) { if (cpu_online(cpu)) { ret = work_on_cpu(cpu, rcu_nocb_rdp_offload, rdp); @@ -1146,8 +1146,8 @@ int rcu_nocb_cpu_offload(int cpu) ret = -EINVAL; } } - cpus_read_unlock(); mutex_unlock(&rcu_state.barrier_mutex); + cpus_read_unlock(); return ret; } From patchwork Mon Jun 20 22:44:59 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paul E. McKenney" X-Patchwork-Id: 12888346 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8F241C433EF for ; Mon, 20 Jun 2022 22:45:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S244645AbiFTWpK (ORCPT ); Mon, 20 Jun 2022 18:45:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49666 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244520AbiFTWpJ (ORCPT ); Mon, 20 Jun 2022 18:45:09 -0400 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DAFC21400A; Mon, 20 Jun 2022 15:45:07 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 74AFBB8162B; Mon, 20 Jun 2022 22:45:06 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 35569C3411B; Mon, 20 Jun 2022 22:45:05 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1655765105; bh=cQbg2ryuG/N2qvkfvmVUmcUhzkg7BvIShVSh77S0juM=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=LXFQQDRiDagN0Wc8W5BP8V9XoBDfE81irw0r8Vpo7b9nNzN3wj4iBGdPBzWy1E/d7 HqsAOX+eYc3SPS9K3J7G3YVxzLCrSXE7nt8zdYSIxSQ02SP8SbmbV1jAHeBAAhBdyP b4tlkbLXOh/5xxhjq61hLFF4kEnMkvf/6jSfiHmySsEzWcgQxyPdMDdxxk/4WlfRfy IS58DLtbQ9j5Ccup8gd3jAzakNqUWngTTvegs3TqBLpPUYX6SYtLbpulkt1KK4CbDT imIudZOrQjlV8DYCh4OsRc4J2sIp3x0vyxy0z4vLFFxwcWjsUKaEdGH0kMX4+cj2lx UiCeBVZ3VHUag== Received: by paulmck-ThinkPad-P17-Gen-1.home (Postfix, from userid 1000) id EB7E25C0A15; Mon, 20 Jun 2022 15:45:04 -0700 (PDT) From: "Paul E. 
McKenney" To: rcu@vger.kernel.org Cc: linux-kernel@vger.kernel.org, kernel-team@fb.com, rostedt@goodmis.org, Zqiang , Neeraj Upadhyay , Boqun Feng , Uladzislau Rezki , Joel Fernandes , Frederic Weisbecker , "Paul E . McKenney" Subject: [PATCH rcu 3/7] rcu/nocb: Fix NOCB kthreads spawn failure with rcu_nocb_rdp_deoffload() direct call Date: Mon, 20 Jun 2022 15:44:59 -0700 Message-Id: <20220620224503.3841196-3-paulmck@kernel.org> X-Mailer: git-send-email 2.31.1.189.g2e36527f23 In-Reply-To: <20220620224455.GA3840881@paulmck-ThinkPad-P17-Gen-1> References: <20220620224455.GA3840881@paulmck-ThinkPad-P17-Gen-1> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: rcu@vger.kernel.org From: Zqiang If the rcuog/o[p] kthreads spawn failed, the offloaded rdp needs to be explicitly deoffloaded, otherwise the target rdp is still considered offloaded even though nothing actually handles the callbacks. Signed-off-by: Zqiang Cc: Neeraj Upadhyay Cc: Boqun Feng Cc: Uladzislau Rezki Cc: Joel Fernandes Signed-off-by: Frederic Weisbecker Signed-off-by: Paul E. McKenney Reviewed-by: Neeraj Upadhyay --- kernel/rcu/tree_nocb.h | 80 +++++++++++++++++++++++++++++++++--------- 1 file changed, 64 insertions(+), 16 deletions(-) diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h index f2f2cab6285a1..4cf9a29bba79d 100644 --- a/kernel/rcu/tree_nocb.h +++ b/kernel/rcu/tree_nocb.h @@ -986,10 +986,7 @@ static int rdp_offload_toggle(struct rcu_data *rdp, } raw_spin_unlock_irqrestore(&rdp_gp->nocb_gp_lock, flags); - if (wake_gp) - wake_up_process(rdp_gp->nocb_gp_kthread); - - return 0; + return wake_gp; } static long rcu_nocb_rdp_deoffload(void *arg) @@ -997,9 +994,15 @@ static long rcu_nocb_rdp_deoffload(void *arg) struct rcu_data *rdp = arg; struct rcu_segcblist *cblist = &rdp->cblist; unsigned long flags; - int ret; + int wake_gp; + struct rcu_data *rdp_gp = rdp->nocb_gp_rdp; - WARN_ON_ONCE(rdp->cpu != raw_smp_processor_id()); + /* + * rcu_nocb_rdp_deoffload() may be called directly if + * rcuog/o[p] spawn failed, because at this time the rdp->cpu + * is not online yet. + */ + WARN_ON_ONCE((rdp->cpu != raw_smp_processor_id()) && cpu_online(rdp->cpu)); pr_info("De-offloading %d\n", rdp->cpu); @@ -1023,10 +1026,41 @@ static long rcu_nocb_rdp_deoffload(void *arg) */ rcu_segcblist_set_flags(cblist, SEGCBLIST_RCU_CORE); invoke_rcu_core(); - ret = rdp_offload_toggle(rdp, false, flags); - swait_event_exclusive(rdp->nocb_state_wq, - !rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB | - SEGCBLIST_KTHREAD_GP)); + wake_gp = rdp_offload_toggle(rdp, false, flags); + + mutex_lock(&rdp_gp->nocb_gp_kthread_mutex); + if (rdp_gp->nocb_gp_kthread) { + if (wake_gp) + wake_up_process(rdp_gp->nocb_gp_kthread); + + /* + * If rcuo[p] kthread spawn failed, directly remove SEGCBLIST_KTHREAD_CB. + * Just wait SEGCBLIST_KTHREAD_GP to be cleared by rcuog. + */ + if (!rdp->nocb_cb_kthread) { + rcu_nocb_lock_irqsave(rdp, flags); + rcu_segcblist_clear_flags(&rdp->cblist, SEGCBLIST_KTHREAD_CB); + rcu_nocb_unlock_irqrestore(rdp, flags); + } + + swait_event_exclusive(rdp->nocb_state_wq, + !rcu_segcblist_test_flags(cblist, + SEGCBLIST_KTHREAD_CB | SEGCBLIST_KTHREAD_GP)); + } else { + /* + * No kthread to clear the flags for us or remove the rdp from the nocb list + * to iterate. Do it here instead. Locking doesn't look stricly necessary + * but we stick to paranoia in this rare path. 
+ */ + rcu_nocb_lock_irqsave(rdp, flags); + rcu_segcblist_clear_flags(&rdp->cblist, + SEGCBLIST_KTHREAD_CB | SEGCBLIST_KTHREAD_GP); + rcu_nocb_unlock_irqrestore(rdp, flags); + + list_del(&rdp->nocb_entry_rdp); + } + mutex_unlock(&rdp_gp->nocb_gp_kthread_mutex); + /* * Lock one last time to acquire latest callback updates from kthreads * so we can later handle callbacks locally without locking. @@ -1047,7 +1081,7 @@ static long rcu_nocb_rdp_deoffload(void *arg) WARN_ON_ONCE(rcu_cblist_n_cbs(&rdp->nocb_bypass)); - return ret; + return 0; } int rcu_nocb_cpu_deoffload(int cpu) @@ -1079,7 +1113,8 @@ static long rcu_nocb_rdp_offload(void *arg) struct rcu_data *rdp = arg; struct rcu_segcblist *cblist = &rdp->cblist; unsigned long flags; - int ret; + int wake_gp; + struct rcu_data *rdp_gp = rdp->nocb_gp_rdp; WARN_ON_ONCE(rdp->cpu != raw_smp_processor_id()); /* @@ -1089,6 +1124,9 @@ static long rcu_nocb_rdp_offload(void *arg) if (!rdp->nocb_gp_rdp) return -EINVAL; + if (WARN_ON_ONCE(!rdp_gp->nocb_gp_kthread)) + return -EINVAL; + pr_info("Offloading %d\n", rdp->cpu); /* @@ -1113,7 +1151,9 @@ static long rcu_nocb_rdp_offload(void *arg) * WRITE flags READ callbacks * rcu_nocb_unlock() rcu_nocb_unlock() */ - ret = rdp_offload_toggle(rdp, true, flags); + wake_gp = rdp_offload_toggle(rdp, true, flags); + if (wake_gp) + wake_up_process(rdp_gp->nocb_gp_kthread); swait_event_exclusive(rdp->nocb_state_wq, rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB) && rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_GP)); @@ -1126,7 +1166,7 @@ static long rcu_nocb_rdp_offload(void *arg) rcu_segcblist_clear_flags(cblist, SEGCBLIST_RCU_CORE); rcu_nocb_unlock_irqrestore(rdp, flags); - return ret; + return 0; } int rcu_nocb_cpu_offload(int cpu) @@ -1248,7 +1288,7 @@ static void rcu_spawn_cpu_nocb_kthread(int cpu) "rcuog/%d", rdp_gp->cpu); if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo GP kthread, OOM is now expected behavior\n", __func__)) { mutex_unlock(&rdp_gp->nocb_gp_kthread_mutex); - return; + goto end; } WRITE_ONCE(rdp_gp->nocb_gp_kthread, t); if (kthread_prio) @@ -1260,12 +1300,20 @@ static void rcu_spawn_cpu_nocb_kthread(int cpu) t = kthread_run(rcu_nocb_cb_kthread, rdp, "rcuo%c/%d", rcu_state.abbr, cpu); if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo CB kthread, OOM is now expected behavior\n", __func__)) - return; + goto end; if (kthread_prio) sched_setscheduler_nocheck(t, SCHED_FIFO, &sp); WRITE_ONCE(rdp->nocb_cb_kthread, t); WRITE_ONCE(rdp->nocb_gp_kthread, rdp_gp->nocb_gp_kthread); + return; +end: + mutex_lock(&rcu_state.barrier_mutex); + if (rcu_rdp_is_offloaded(rdp)) { + rcu_nocb_rdp_deoffload(rdp); + cpumask_clear_cpu(cpu, rcu_nocb_mask); + } + mutex_unlock(&rcu_state.barrier_mutex); } /* How many CB CPU IDs per GP kthread? Default of -1 for sqrt(nr_cpu_ids). */ From patchwork Mon Jun 20 22:45:00 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paul E. 
McKenney" X-Patchwork-Id: 12888344 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1FA66CCA47C for ; Mon, 20 Jun 2022 22:45:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S244537AbiFTWpJ (ORCPT ); Mon, 20 Jun 2022 18:45:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49646 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244164AbiFTWpH (ORCPT ); Mon, 20 Jun 2022 18:45:07 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 48C9A15719; Mon, 20 Jun 2022 15:45:06 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id D89DD61380; Mon, 20 Jun 2022 22:45:05 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 3D79DC341C7; Mon, 20 Jun 2022 22:45:05 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1655765105; bh=MMOx/Mf+yKKYBlxJNUzhxx2Qq1PONGZHWw4SbSZ2Fg0=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=De2IbPfUhj6EBIINGusqFVBFmyqaHC2Q4V/WSdOprQoKjbnXGSOGNvojCrZb/yfM8 a6OsC+yxw2LdULBw3Kc8ZzflHVoZMX+G5pMUYr1Ds7e/+gX0LmK/HmdS6a23FY2DPM BoBv4g4TZH+lzMexrpNjCli8DPnPwNzxHumebBMKGGuJI+XOjl2cqxfzcv5OC1SFAH K9jCJmmGg7AGAggM9NhfEh7kKNXEtYgLOUR9GQ/HstpsTXlcjM7PGPahlYCBzUiNcS j8sIBw+WLrpPM7OWChmjQnamGW213G/bWXi9uwcsIoy+OFYdWL4CCl8kEYSrS9Qxtd pydIILwYsLO7w== Received: by paulmck-ThinkPad-P17-Gen-1.home (Postfix, from userid 1000) id ED9825C0A33; Mon, 20 Jun 2022 15:45:04 -0700 (PDT) From: "Paul E. McKenney" To: rcu@vger.kernel.org Cc: linux-kernel@vger.kernel.org, kernel-team@fb.com, rostedt@goodmis.org, Joel Fernandes , Kalesh Singh , Uladzislau Rezki , kernel test robot , "Paul E . McKenney" Subject: [PATCH rcu 4/7] rcu/nocb: Add an option to offload all CPUs on boot Date: Mon, 20 Jun 2022 15:45:00 -0700 Message-Id: <20220620224503.3841196-4-paulmck@kernel.org> X-Mailer: git-send-email 2.31.1.189.g2e36527f23 In-Reply-To: <20220620224455.GA3840881@paulmck-ThinkPad-P17-Gen-1> References: <20220620224455.GA3840881@paulmck-ThinkPad-P17-Gen-1> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: rcu@vger.kernel.org From: Joel Fernandes Systems built with CONFIG_RCU_NOCB_CPU=y but booted without either the rcu_nocbs= or rcu_nohz_full= kernel-boot parameters will not have callback offloading on any of the CPUs, nor can any of the CPUs be switched to enable callback offloading at runtime. Although this is intentional, it would be nice to have a way to offload all the CPUs without having to make random bootloaders specify either the rcu_nocbs= or the rcu_nohz_full= kernel-boot parameters. This commit therefore provides a new CONFIG_RCU_NOCB_CPU_DEFAULT_ALL Kconfig option that switches the default so as to offload callback processing on all of the CPUs. This default can still be overridden using the rcu_nocbs= and rcu_nohz_full= kernel-boot parameters. Reviewed-by: Kalesh Singh Reviewed-by: Uladzislau Rezki (In v4.1, fixed issues with CONFIG maze reported by kernel test robot). Reported-by: kernel test robot Signed-off-by: Joel Fernandes Signed-off-by: Paul E. 
McKenney Reviewed-by: Neeraj Upadhyay --- Documentation/admin-guide/kernel-parameters.txt | 6 ++++++ kernel/rcu/Kconfig | 13 +++++++++++++ kernel/rcu/tree_nocb.h | 15 ++++++++++++++- 3 files changed, 33 insertions(+), 1 deletion(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 2522b11e593f2..34605c275294c 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -3659,6 +3659,9 @@ just as if they had also been called out in the rcu_nocbs= boot parameter. + Note that this argument takes precedence over + the CONFIG_RCU_NOCB_CPU_DEFAULT_ALL option. + noiotrap [SH] Disables trapped I/O port accesses. noirqdebug [X86-32] Disables the code which attempts to detect and @@ -4557,6 +4560,9 @@ no-callback mode from boot but the mode may be toggled at runtime via cpusets. + Note that this argument takes precedence over + the CONFIG_RCU_NOCB_CPU_DEFAULT_ALL option. + rcu_nocb_poll [KNL] Rather than requiring that offloaded CPUs (specified by rcu_nocbs= above) explicitly diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig index 1c630e573548d..27aab870ae4cf 100644 --- a/kernel/rcu/Kconfig +++ b/kernel/rcu/Kconfig @@ -262,6 +262,19 @@ config RCU_NOCB_CPU Say Y here if you need reduced OS jitter, despite added overhead. Say N here if you are unsure. +config RCU_NOCB_CPU_DEFAULT_ALL + bool "Offload RCU callback processing from all CPUs by default" + depends on RCU_NOCB_CPU + default n + help + Use this option to offload callback processing from all CPUs + by default, in the absence of the rcu_nocbs or nohz_full boot + parameter. This also avoids the need to use any boot parameters + to achieve the effect of offloading all CPUs on boot. + + Say Y here if you want offload all CPUs by default on boot. + Say N here if you are unsure. + config TASKS_TRACE_RCU_READ_MB bool "Tasks Trace RCU readers use memory barriers in user and idle" depends on RCU_EXPERT && TASKS_TRACE_RCU diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h index 4cf9a29bba79d..60cc92cc66552 100644 --- a/kernel/rcu/tree_nocb.h +++ b/kernel/rcu/tree_nocb.h @@ -1197,11 +1197,21 @@ void __init rcu_init_nohz(void) { int cpu; bool need_rcu_nocb_mask = false; + bool offload_all = false; struct rcu_data *rdp; +#if defined(CONFIG_RCU_NOCB_CPU_DEFAULT_ALL) + if (!rcu_state.nocb_is_setup) { + need_rcu_nocb_mask = true; + offload_all = true; + } +#endif /* #if defined(CONFIG_RCU_NOCB_CPU_DEFAULT_ALL) */ + #if defined(CONFIG_NO_HZ_FULL) - if (tick_nohz_full_running && !cpumask_empty(tick_nohz_full_mask)) + if (tick_nohz_full_running && !cpumask_empty(tick_nohz_full_mask)) { need_rcu_nocb_mask = true; + offload_all = false; /* NO_HZ_FULL has its own mask. */ + } #endif /* #if defined(CONFIG_NO_HZ_FULL) */ if (need_rcu_nocb_mask) { @@ -1222,6 +1232,9 @@ void __init rcu_init_nohz(void) cpumask_or(rcu_nocb_mask, rcu_nocb_mask, tick_nohz_full_mask); #endif /* #if defined(CONFIG_NO_HZ_FULL) */ + if (offload_all) + cpumask_setall(rcu_nocb_mask); + if (!cpumask_subset(rcu_nocb_mask, cpu_possible_mask)) { pr_info("\tNote: kernel parameter 'rcu_nocbs=', 'nohz_full', or 'isolcpus=' contains nonexistent CPUs.\n"); cpumask_and(rcu_nocb_mask, cpu_possible_mask, From patchwork Mon Jun 20 22:45:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paul E. 
McKenney" X-Patchwork-Id: 12888343 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B0F64CCA480 for ; Mon, 20 Jun 2022 22:45:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S244425AbiFTWpI (ORCPT ); Mon, 20 Jun 2022 18:45:08 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49648 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244163AbiFTWpH (ORCPT ); Mon, 20 Jun 2022 18:45:07 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4B41E19F8E; Mon, 20 Jun 2022 15:45:06 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id DBCCA61381; Mon, 20 Jun 2022 22:45:05 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 446A9C3411C; Mon, 20 Jun 2022 22:45:05 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1655765105; bh=XDrxmKz0bTXwZ0WpBKmpBvBfH5ezkcFLIoxVI2Soqag=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=LlL/btuFxDjEpX19ZQKfgM+uDaD57gGiZjBz3/2+SX/A7DnJpvlChtmTzrn5EbZet x2wuu+/8TYh38L63GcbRWVsqcF4aMz9mAsI9SZ12Q5XKBDlV9fbVFM7rqgWNDnXhB5 9ue9ih1dMMPpT4SFcM3jLdLUla1MAJDQsi1lF0rSGhreX0FGUQXMIzK0B2TOHE0Xd1 FroOnoKj9S01sWioYXusBx6eYfeYZRD7HpDz/WwVsERPYOcRlsCXbs0UDJn3n9qlgX 1Y9uvKdrzWMKIr1uOugigSgKKibs8o9+Ru1qVnh+wtjSMqdsTJXpNX8o8c3Y7Ym8C6 pKXlrGLx0+atg== Received: by paulmck-ThinkPad-P17-Gen-1.home (Postfix, from userid 1000) id EF8955C0ADC; Mon, 20 Jun 2022 15:45:04 -0700 (PDT) From: "Paul E. McKenney" To: rcu@vger.kernel.org Cc: linux-kernel@vger.kernel.org, kernel-team@fb.com, rostedt@goodmis.org, Zqiang , kernel test robot , "Paul E . McKenney" Subject: [PATCH rcu 5/7] rcu: Add nocb_cb_kthread check to rcu_is_callbacks_kthread() Date: Mon, 20 Jun 2022 15:45:01 -0700 Message-Id: <20220620224503.3841196-5-paulmck@kernel.org> X-Mailer: git-send-email 2.31.1.189.g2e36527f23 In-Reply-To: <20220620224455.GA3840881@paulmck-ThinkPad-P17-Gen-1> References: <20220620224455.GA3840881@paulmck-ThinkPad-P17-Gen-1> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: rcu@vger.kernel.org From: Zqiang Callbacks are invoked in RCU kthreads when calbacks are offloaded (rcu_nocbs boot parameter) or when RCU's softirq handler has been offloaded to rcuc kthreads (use_softirq==0). The current code allows for the rcu_nocbs case but not the use_softirq case. This commit adds support for the use_softirq case. Reported-by: kernel test robot Signed-off-by: Zqiang Signed-off-by: Paul E. 
McKenney Reviewed-by: Neeraj Upadhyay --- kernel/rcu/tree.c | 4 ++-- kernel/rcu/tree.h | 2 +- kernel/rcu/tree_plugin.h | 33 +++++++++++++++++++-------------- 3 files changed, 22 insertions(+), 17 deletions(-) diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index c25ba442044a6..74455671e6cf2 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -2530,7 +2530,7 @@ static void rcu_do_batch(struct rcu_data *rdp) trace_rcu_batch_end(rcu_state.name, 0, !rcu_segcblist_empty(&rdp->cblist), need_resched(), is_idle_task(current), - rcu_is_callbacks_kthread()); + rcu_is_callbacks_kthread(rdp)); return; } @@ -2608,7 +2608,7 @@ static void rcu_do_batch(struct rcu_data *rdp) rcu_nocb_lock_irqsave(rdp, flags); rdp->n_cbs_invoked += count; trace_rcu_batch_end(rcu_state.name, count, !!rcl.head, need_resched(), - is_idle_task(current), rcu_is_callbacks_kthread()); + is_idle_task(current), rcu_is_callbacks_kthread(rdp)); /* Update counts and requeue any remaining callbacks. */ rcu_segcblist_insert_done_cbs(&rdp->cblist, &rcl); diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h index 4f8532c33558f..649ad4f0129b1 100644 --- a/kernel/rcu/tree.h +++ b/kernel/rcu/tree.h @@ -426,7 +426,7 @@ static void rcu_flavor_sched_clock_irq(int user); static void dump_blkd_tasks(struct rcu_node *rnp, int ncheck); static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags); static void rcu_preempt_boost_start_gp(struct rcu_node *rnp); -static bool rcu_is_callbacks_kthread(void); +static bool rcu_is_callbacks_kthread(struct rcu_data *rdp); static void rcu_cpu_kthread_setup(unsigned int cpu); static void rcu_spawn_one_boost_kthread(struct rcu_node *rnp); static bool rcu_preempt_has_tasks(struct rcu_node *rnp); diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index c8ba0fe17267c..0483e1338c413 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -1012,6 +1012,25 @@ static void rcu_cpu_kthread_setup(unsigned int cpu) WRITE_ONCE(rdp->rcuc_activity, jiffies); } +static bool rcu_is_callbacks_nocb_kthread(struct rcu_data *rdp) +{ +#ifdef CONFIG_RCU_NOCB_CPU + return rdp->nocb_cb_kthread == current; +#else + return false; +#endif +} + +/* + * Is the current CPU running the RCU-callbacks kthread? + * Caller must have preemption disabled. + */ +static bool rcu_is_callbacks_kthread(struct rcu_data *rdp) +{ + return rdp->rcu_cpu_kthread_task == current || + rcu_is_callbacks_nocb_kthread(rdp); +} + #ifdef CONFIG_RCU_BOOST /* @@ -1151,15 +1170,6 @@ static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags) } } -/* - * Is the current CPU running the RCU-callbacks kthread? - * Caller must have preemption disabled. - */ -static bool rcu_is_callbacks_kthread(void) -{ - return __this_cpu_read(rcu_data.rcu_cpu_kthread_task) == current; -} - #define RCU_BOOST_DELAY_JIFFIES DIV_ROUND_UP(CONFIG_RCU_BOOST_DELAY * HZ, 1000) /* @@ -1242,11 +1252,6 @@ static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags) raw_spin_unlock_irqrestore_rcu_node(rnp, flags); } -static bool rcu_is_callbacks_kthread(void) -{ - return false; -} - static void rcu_preempt_boost_start_gp(struct rcu_node *rnp) { } From patchwork Mon Jun 20 22:45:02 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paul E. 
McKenney" X-Patchwork-Id: 12888345 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 56AD0CCA482 for ; Mon, 20 Jun 2022 22:45:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S244303AbiFTWpJ (ORCPT ); Mon, 20 Jun 2022 18:45:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49652 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244004AbiFTWpH (ORCPT ); Mon, 20 Jun 2022 18:45:07 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EB70B1400F; Mon, 20 Jun 2022 15:45:06 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 6B4556135A; Mon, 20 Jun 2022 22:45:06 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 90953C341C8; Mon, 20 Jun 2022 22:45:05 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1655765105; bh=YeBHehtFq6AdT6OCOL99JKnvZVJzXnbqY95m1WmhH7E=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=QwwfJdbr1zUhilxZV10pPyd3wRCsuY28CIvT80dRwy/rjvRlzEEAcSAZyhhzQ77zr XwMXMgp9GWIcj8e7EalIHffLystOsPfi4xqJ1KDb1Uckr8RQwutPo0iPEKJ67G/3BF kw7oS+VGEYtDSxAiQk4vdCHdsR1pQS94J8Ljxaq0TWQD+PvoCWCnXhwg4PGfpoXmX5 BMqoZdmMBb7AOFRt7qA5qmabQ/ESzBgW7BMJLcuxzZidRDKewrhVfmmdvnaUzgdUNN mXanotKBen1cPJBJEpfUYm2dxN7YI82/3BdI3tmTM2dR3OcX7I3N+c4ZIUnqP3H31+ D+Nl+PAP8UflA== Received: by paulmck-ThinkPad-P17-Gen-1.home (Postfix, from userid 1000) id F1E055C0B06; Mon, 20 Jun 2022 15:45:04 -0700 (PDT) From: "Paul E. McKenney" To: rcu@vger.kernel.org Cc: linux-kernel@vger.kernel.org, kernel-team@fb.com, rostedt@goodmis.org, "Uladzislau Rezki (Sony)" , "Paul E . McKenney" , Joel Fernandes Subject: [PATCH rcu 6/7] rcu/nocb: Add option to opt rcuo kthreads out of RT priority Date: Mon, 20 Jun 2022 15:45:02 -0700 Message-Id: <20220620224503.3841196-6-paulmck@kernel.org> X-Mailer: git-send-email 2.31.1.189.g2e36527f23 In-Reply-To: <20220620224455.GA3840881@paulmck-ThinkPad-P17-Gen-1> References: <20220620224455.GA3840881@paulmck-ThinkPad-P17-Gen-1> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: rcu@vger.kernel.org From: "Uladzislau Rezki (Sony)" This commit introduces a RCU_NOCB_CPU_CB_BOOST Kconfig option that prevents rcuo kthreads from running at real-time priority, even in kernels built with RCU_BOOST. This capability is important to devices needing low-latency (as in a few milliseconds) response from expedited RCU grace periods, but which are not running a classic real-time workload. On such devices, permitting the rcuo kthreads to run at real-time priority results in unacceptable latencies imposed on the application tasks, which run as SCHED_OTHER. See for example the following trace output: <...>-60 [006] d..1 2979.028717: rcu_batch_start: rcu_preempt CBs=34619 bl=270 If that rcuop kthread were permitted to run at real-time SCHED_FIFO priority, it would monopolize its CPU for hundreds of milliseconds while invoking those 34619 RCU callback functions, which would cause an unacceptably long latency spike for many application stacks on Android platforms. 
However, some existing real-time workloads require that callback invocation run at SCHED_FIFO priority, for example, those running on systems with heavy SCHED_OTHER background loads. (It is the real-time system's administrator's responsibility to make sure that important real-time tasks run at a higher priority than do RCU's kthreads.) Therefore, this new RCU_NOCB_CPU_CB_BOOST Kconfig option defaults to "y" on kernels built with PREEMPT_RT and defaults to "n" otherwise. The effect is to preserve current behavior for real-time systems, but for other systems to allow expedited RCU grace periods to run with real-time priority while continuing to invoke RCU callbacks as SCHED_OTHER. As you would expect, this RCU_NOCB_CPU_CB_BOOST Kconfig option has no effect except on CPUs with offloaded RCU callbacks. Signed-off-by: Uladzislau Rezki (Sony) Signed-off-by: Paul E. McKenney Acked-by: Joel Fernandes (Google) Reviewed-by: Neeraj Upadhyay --- kernel/rcu/Kconfig | 16 ++++++++++++++++ kernel/rcu/tree.c | 6 +++++- kernel/rcu/tree_nocb.h | 3 ++- 3 files changed, 23 insertions(+), 2 deletions(-) diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig index 27aab870ae4cf..c05ca52cdf64d 100644 --- a/kernel/rcu/Kconfig +++ b/kernel/rcu/Kconfig @@ -275,6 +275,22 @@ config RCU_NOCB_CPU_DEFAULT_ALL Say Y here if you want offload all CPUs by default on boot. Say N here if you are unsure. +config RCU_NOCB_CPU_CB_BOOST + bool "Offload RCU callback from real-time kthread" + depends on RCU_NOCB_CPU && RCU_BOOST + default y if PREEMPT_RT + help + Use this option to invoke offloaded callbacks as SCHED_FIFO + to avoid starvation by heavy SCHED_OTHER background load. + Of course, running as SCHED_FIFO during callback floods will + cause the rcuo[ps] kthreads to monopolize the CPU for hundreds + of milliseconds or more. Therefore, when enabling this option, + it is your responsibility to ensure that latency-sensitive + tasks either run with higher priority or run on some other CPU. + + Say Y here if you want to set RT priority for offloading kthreads. + Say N here if you are building a !PREEMPT_RT kernel and are unsure. + config TASKS_TRACE_RCU_READ_MB bool "Tasks Trace RCU readers use memory barriers in user and idle" depends on RCU_EXPERT && TASKS_TRACE_RCU diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 74455671e6cf2..3b9f45ebb4999 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -154,7 +154,11 @@ static void sync_sched_exp_online_cleanup(int cpu); static void check_cb_ovld_locked(struct rcu_data *rdp, struct rcu_node *rnp); static bool rcu_rdp_is_offloaded(struct rcu_data *rdp); -/* rcuc/rcub/rcuop kthread realtime priority */ +/* + * rcuc/rcub/rcuop kthread realtime priority. The "rcuop" + * real-time priority(enabling/disabling) is controlled by + * the extra CONFIG_RCU_NOCB_CPU_CB_BOOST configuration. + */ static int kthread_prio = IS_ENABLED(CONFIG_RCU_BOOST) ? 
1 : 0; module_param(kthread_prio, int, 0444); diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h index 60cc92cc66552..fa8e4f82e60c0 100644 --- a/kernel/rcu/tree_nocb.h +++ b/kernel/rcu/tree_nocb.h @@ -1315,8 +1315,9 @@ static void rcu_spawn_cpu_nocb_kthread(int cpu) if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo CB kthread, OOM is now expected behavior\n", __func__)) goto end; - if (kthread_prio) + if (IS_ENABLED(CONFIG_RCU_NOCB_CPU_CB_BOOST) && kthread_prio) sched_setscheduler_nocheck(t, SCHED_FIFO, &sp); + WRITE_ONCE(rdp->nocb_cb_kthread, t); WRITE_ONCE(rdp->nocb_gp_kthread, rdp_gp->nocb_gp_kthread); return; From patchwork Mon Jun 20 22:45:03 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paul E. McKenney" X-Patchwork-Id: 12888347 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 52F9CC43334 for ; Mon, 20 Jun 2022 22:45:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S244259AbiFTWpL (ORCPT ); Mon, 20 Jun 2022 18:45:11 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49710 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244646AbiFTWpK (ORCPT ); Mon, 20 Jun 2022 18:45:10 -0400 Received: from sin.source.kernel.org (sin.source.kernel.org [145.40.73.55]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2E19514022; Mon, 20 Jun 2022 15:45:09 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sin.source.kernel.org (Postfix) with ESMTPS id 97E21CE179E; Mon, 20 Jun 2022 22:45:07 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 9097AC341CA; Mon, 20 Jun 2022 22:45:05 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1655765105; bh=l+tebky4TEgEPH7xlL3vd3F/iKwURUSr+8Dhto1NkeI=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=kn+fY/za3b/dCuWSJA3LIonsb2y/u0EDBpD5zGeNjIl3lsqa3Dxpo7WJgKFOgEpWh 3ajx7V4PhKrfBK4X7FEMgGHeD5jeZGh5QiI0xgE+XZMMLlpvcldjaAWacRoxEq2k++ KNCGrkFCHmKV0JAfSc00YFisAexYV19m4fafpkaADdlN8QVPD0F6UEgW1Ufo2IjI5L aUB+6vN47jNHPgCQejWZMGfQ37GFFeAQ04dzs6LBlxA1IeTxlDpvcyJSweoIFK1xmY YvXgPcmz3VZ4BlGDir2o7ei86JEl0qQE6CG8RX8fzQNjriyY9ULZTVSkwqhb45TNNu +vsJrPnfVrKCw== Received: by paulmck-ThinkPad-P17-Gen-1.home (Postfix, from userid 1000) id F3A375C0BCC; Mon, 20 Jun 2022 15:45:04 -0700 (PDT) From: "Paul E. McKenney" To: rcu@vger.kernel.org Cc: linux-kernel@vger.kernel.org, kernel-team@fb.com, rostedt@goodmis.org, Zqiang , Frederic Weisbecker , "Paul E . McKenney" Subject: [PATCH rcu 7/7] rcu/nocb: Avoid polling when my_rdp->nocb_head_rdp list is empty Date: Mon, 20 Jun 2022 15:45:03 -0700 Message-Id: <20220620224503.3841196-7-paulmck@kernel.org> X-Mailer: git-send-email 2.31.1.189.g2e36527f23 In-Reply-To: <20220620224455.GA3840881@paulmck-ThinkPad-P17-Gen-1> References: <20220620224455.GA3840881@paulmck-ThinkPad-P17-Gen-1> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: rcu@vger.kernel.org From: Zqiang Currently, if the 'rcu_nocb_poll' kernel boot parameter is enabled, all rcuog kthreads enter polling mode. 
However, if all of a given group of rcuo kthreads correspond to CPUs that have been de-offloaded, the corresponding rcuog kthread will nonetheless still wake up periodically, unnecessarily consuming power and perturbing workloads. Fortunately, this situation is easily detected by the fact that the rcuog kthread's CPU's rcu_data structure's ->nocb_head_rdp list is empty. This commit saves power and avoids unnecessarily perturbing workloads by putting an rcuog kthread to sleep during any time period when all of its rcuo kthreads' CPUs are de-offloaded. Co-developed-by: Frederic Weisbecker Signed-off-by: Frederic Weisbecker Signed-off-by: Zqiang Signed-off-by: Paul E. McKenney Reviewed-by: Neeraj Upadhyay --- kernel/rcu/tree_nocb.h | 24 +++++++++++++++++++----- 1 file changed, 19 insertions(+), 5 deletions(-) diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h index fa8e4f82e60c0..a8f574d8850d2 100644 --- a/kernel/rcu/tree_nocb.h +++ b/kernel/rcu/tree_nocb.h @@ -584,6 +584,14 @@ static int nocb_gp_toggle_rdp(struct rcu_data *rdp, return ret; } +static void nocb_gp_sleep(struct rcu_data *my_rdp, int cpu) +{ + trace_rcu_nocb_wake(rcu_state.name, cpu, TPS("Sleep")); + swait_event_interruptible_exclusive(my_rdp->nocb_gp_wq, + !READ_ONCE(my_rdp->nocb_gp_sleep)); + trace_rcu_nocb_wake(rcu_state.name, cpu, TPS("EndSleep")); +} + /* * No-CBs GP kthreads come here to wait for additional callbacks to show up * or for grace periods to end. @@ -701,13 +709,19 @@ static void nocb_gp_wait(struct rcu_data *my_rdp) /* Polling, so trace if first poll in the series. */ if (gotcbs) trace_rcu_nocb_wake(rcu_state.name, cpu, TPS("Poll")); - schedule_timeout_idle(1); + if (list_empty(&my_rdp->nocb_head_rdp)) { + raw_spin_lock_irqsave(&my_rdp->nocb_gp_lock, flags); + if (!my_rdp->nocb_toggling_rdp) + WRITE_ONCE(my_rdp->nocb_gp_sleep, true); + raw_spin_unlock_irqrestore(&my_rdp->nocb_gp_lock, flags); + /* Wait for any offloading rdp */ + nocb_gp_sleep(my_rdp, cpu); + } else { + schedule_timeout_idle(1); + } } else if (!needwait_gp) { /* Wait for callbacks to appear. */ - trace_rcu_nocb_wake(rcu_state.name, cpu, TPS("Sleep")); - swait_event_interruptible_exclusive(my_rdp->nocb_gp_wq, - !READ_ONCE(my_rdp->nocb_gp_sleep)); - trace_rcu_nocb_wake(rcu_state.name, cpu, TPS("EndSleep")); + nocb_gp_sleep(my_rdp, cpu); } else { rnp = my_rdp->mynode; trace_rcu_this_gp(rnp, my_rdp, wait_gp_seq, TPS("StartWait"));