From patchwork Wed May 31 10:17:28 2023
From: Frederic Weisbecker
To: "Paul E. McKenney"
Cc: LKML, Frederic Weisbecker, rcu, Uladzislau Rezki, Neeraj Upadhyay,
    Joel Fernandes, Giovanni Gherdovich
Subject: [PATCH 1/9] rcu: Assume IRQS disabled from rcu_report_dead()
Date: Wed, 31 May 2023 12:17:28 +0200
Message-Id: <20230531101736.12981-2-frederic@kernel.org>
In-Reply-To: <20230531101736.12981-1-frederic@kernel.org>
References: <20230531101736.12981-1-frederic@kernel.org>

rcu_report_dead() is the last RCU word from a CPU going down through the
hotplug path. It is called in the idle loop right before the CPU shuts down
for good. Because it removes the CPU from the grace-period state machine and
reports an ultimate quiescent state if necessary, no further use of RCU is
allowed. Therefore it is expected that IRQs are disabled upon calling this
function and are not to be re-enabled again until the CPU shuts down.

Remove the IRQ disabling from that function and verify instead that it is
actually called with IRQs disabled, as expected at that special point in the
idle path.
Signed-off-by: Frederic Weisbecker
Reviewed-by: Joel Fernandes (Google)
---
 kernel/rcu/tree.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index fae9b4e29c93..bc4e7c9b51cb 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4476,11 +4476,16 @@ void rcu_cpu_starting(unsigned int cpu)
  */
 void rcu_report_dead(unsigned int cpu)
 {
-	unsigned long flags, seq_flags;
+	unsigned long flags;
 	unsigned long mask;
 	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
 	struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's rdp & rnp. */

+	/*
+	 * IRQS must be disabled from now on and until the CPU dies, or an interrupt
+	 * may introduce a new READ-side while it is actually off the QS masks.
+	 */
+	lockdep_assert_irqs_disabled();
 	// Do any dangling deferred wakeups.
 	do_nocb_deferred_wakeup(rdp);

@@ -4488,7 +4493,6 @@ void rcu_report_dead(unsigned int cpu)

 	/* Remove outgoing CPU from mask in the leaf rcu_node structure. */
 	mask = rdp->grpmask;
-	local_irq_save(seq_flags);
 	arch_spin_lock(&rcu_state.ofl_lock);
 	raw_spin_lock_irqsave_rcu_node(rnp, flags); /* Enforce GP memory-order guarantee. */
 	rdp->rcu_ofl_gp_seq = READ_ONCE(rcu_state.gp_seq);
@@ -4502,8 +4506,6 @@ void rcu_report_dead(unsigned int cpu)
 	WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext & ~mask);
 	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 	arch_spin_unlock(&rcu_state.ofl_lock);
-	local_irq_restore(seq_flags);
-
 	rdp->cpu_started = false;
 }
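(Editorial illustration, not part of the posted patch.) The change above encodes a calling convention rather than establishing one: the hotplug/idle path is expected to arrive with IRQs already disabled and to keep them disabled until the CPU is dead, so the function merely asserts that with lockdep. A minimal sketch of the pattern, using a hypothetical helper name:

	/* example_cpu_teardown_step() is a made-up name, for illustration only. */
	static void example_cpu_teardown_step(void)
	{
		/* Caller must have IRQs off and keep them off until the CPU dies. */
		lockdep_assert_irqs_disabled();

		/* ... report the final quiescent state, no local_irq_save()/restore() ... */
	}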
McKenney" Cc: LKML , Frederic Weisbecker , rcu , Uladzislau Rezki , Neeraj Upadhyay , Joel Fernandes , Giovanni Gherdovich Subject: [PATCH 2/9] rcu: Use rcu_segcblist_segempty() instead of open coding it Date: Wed, 31 May 2023 12:17:29 +0200 Message-Id: <20230531101736.12981-3-frederic@kernel.org> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20230531101736.12981-1-frederic@kernel.org> References: <20230531101736.12981-1-frederic@kernel.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: rcu@vger.kernel.org This makes the code more readable. Signed-off-by: Frederic Weisbecker Reviewed-by: Joel Fernandes (Google) Reviewed-by: Qiuxu Zhuo --- kernel/rcu/rcu_segcblist.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c index f71fac422c8f..1693ea22ef1b 100644 --- a/kernel/rcu/rcu_segcblist.c +++ b/kernel/rcu/rcu_segcblist.c @@ -368,7 +368,7 @@ bool rcu_segcblist_entrain(struct rcu_segcblist *rsclp, smp_mb(); /* Ensure counts are updated before callback is entrained. */ rhp->next = NULL; for (i = RCU_NEXT_TAIL; i > RCU_DONE_TAIL; i--) - if (rsclp->tails[i] != rsclp->tails[i - 1]) + if (!rcu_segcblist_segempty(rsclp, i)) break; rcu_segcblist_inc_seglen(rsclp, i); WRITE_ONCE(*rsclp->tails[i], rhp); @@ -551,7 +551,7 @@ bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq) * as their ->gp_seq[] grace-period completion sequence number. */ for (i = RCU_NEXT_READY_TAIL; i > RCU_DONE_TAIL; i--) - if (rsclp->tails[i] != rsclp->tails[i - 1] && + if (!rcu_segcblist_segempty(rsclp, i) && ULONG_CMP_LT(rsclp->gp_seq[i], seq)) break; From patchwork Wed May 31 10:17:30 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Frederic Weisbecker X-Patchwork-Id: 13261915 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 50348C77B7C for ; Wed, 31 May 2023 10:18:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235893AbjEaKSL (ORCPT ); Wed, 31 May 2023 06:18:11 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35948 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234902AbjEaKR6 (ORCPT ); Wed, 31 May 2023 06:17:58 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 67BDB13E; Wed, 31 May 2023 03:17:57 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 7298760EFE; Wed, 31 May 2023 10:17:56 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 6ED96C433EF; Wed, 31 May 2023 10:17:53 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1685528275; bh=V2pULL6IX7rEXNXR3IXgioKKTYc27dEUYXc+7cPP0iA=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=m/Zre9LvUfzC2p/nEewycKx9N2uzuB5M4uMoOrgQ51lx1M78xh13elU8WAn6WzmBx FmasKscjTZWx/8oAo0ziDp1QgjQuRKuI4+MvyzxPmtaWrr0u8fRJKZgnojAXLolsIQ OcdN1Cvw7PuxpSsGIuqO04M+YrC1/q13zPurAQloyFpgBWK5INUDAjnnrOsFFmDGIQ 2UvBEIPDKYuB23h25IupgEpOIpRNpQU5YMPA3bkux0tHVoCxFKnqjNIM3CDhe3hMps 
From patchwork Wed May 31 10:17:30 2023
From: Frederic Weisbecker
To: "Paul E. McKenney"
Cc: LKML, Frederic Weisbecker, rcu, Uladzislau Rezki, Neeraj Upadhyay,
    Joel Fernandes, Giovanni Gherdovich
Subject: [PATCH 3/9] rcu: Rename jiffies_till_flush to jiffies_lazy_flush
Date: Wed, 31 May 2023 12:17:30 +0200
Message-Id: <20230531101736.12981-4-frederic@kernel.org>
In-Reply-To: <20230531101736.12981-1-frederic@kernel.org>
References: <20230531101736.12981-1-frederic@kernel.org>

The variable name jiffies_till_flush is too generic and therefore:

* It may shadow a global variable
* It doesn't tell what it operates on

Make the name more precise, along with the related APIs.

Signed-off-by: Frederic Weisbecker
---
 kernel/rcu/rcu.h       |  8 ++++----
 kernel/rcu/rcuscale.c  |  6 +++---
 kernel/rcu/tree_nocb.h | 20 ++++++++++----------
 3 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 98c1544cf572..236d6f837c49 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -526,11 +526,11 @@ enum rcutorture_type {
 };

 #if defined(CONFIG_RCU_LAZY)
-unsigned long rcu_lazy_get_jiffies_till_flush(void);
-void rcu_lazy_set_jiffies_till_flush(unsigned long j);
+unsigned long rcu_get_jiffies_lazy_flush(void);
+void rcu_set_jiffies_lazy_flush(unsigned long j);
 #else
-static inline unsigned long rcu_lazy_get_jiffies_till_flush(void) { return 0; }
-static inline void rcu_lazy_set_jiffies_till_flush(unsigned long j) { }
+static inline unsigned long rcu_get_jiffies_lazy_flush(void) { return 0; }
+static inline void rcu_set_jiffies_lazy_flush(unsigned long j) { }
 #endif

 #if defined(CONFIG_TREE_RCU)
diff --git a/kernel/rcu/rcuscale.c b/kernel/rcu/rcuscale.c
index 7fba3ab66e35..53ec80868791 100644
--- a/kernel/rcu/rcuscale.c
+++ b/kernel/rcu/rcuscale.c
@@ -725,9 +725,9 @@ kfree_scale_init(void)

 	if (kfree_by_call_rcu) {
 		/* do a test to check the timeout. */
-		orig_jif = rcu_lazy_get_jiffies_till_flush();
+		orig_jif = rcu_get_jiffies_lazy_flush();

-		rcu_lazy_set_jiffies_till_flush(2 * HZ);
+		rcu_set_jiffies_lazy_flush(2 * HZ);
 		rcu_barrier();

 		jif_start = jiffies;
@@ -736,7 +736,7 @@ kfree_scale_init(void)

 		smp_cond_load_relaxed(&rcu_lazy_test1_cb_called, VAL == 1);

-		rcu_lazy_set_jiffies_till_flush(orig_jif);
+		rcu_set_jiffies_lazy_flush(orig_jif);

 		if (WARN_ON_ONCE(jiffies_at_lazy_cb - jif_start < 2 * HZ)) {
 			pr_alert("ERROR: call_rcu() CBs are not being lazy as expected!\n");
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 43229d2b0c44..8320eb77b58b 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -264,21 +264,21 @@ static bool wake_nocb_gp(struct rcu_data *rdp, bool force)
  * left unsubmitted to RCU after those many jiffies.
  */
 #define LAZY_FLUSH_JIFFIES (10 * HZ)
-static unsigned long jiffies_till_flush = LAZY_FLUSH_JIFFIES;
+static unsigned long jiffies_lazy_flush = LAZY_FLUSH_JIFFIES;

 #ifdef CONFIG_RCU_LAZY
 // To be called only from test code.
-void rcu_lazy_set_jiffies_till_flush(unsigned long jif)
+void rcu_lazy_set_jiffies_lazy_flush(unsigned long jif)
 {
-	jiffies_till_flush = jif;
+	jiffies_lazy_flush = jif;
 }
-EXPORT_SYMBOL(rcu_lazy_set_jiffies_till_flush);
+EXPORT_SYMBOL(rcu_lazy_set_jiffies_lazy_flush);

-unsigned long rcu_lazy_get_jiffies_till_flush(void)
+unsigned long rcu_lazy_get_jiffies_lazy_flush(void)
 {
-	return jiffies_till_flush;
+	return jiffies_lazy_flush;
 }
-EXPORT_SYMBOL(rcu_lazy_get_jiffies_till_flush);
+EXPORT_SYMBOL(rcu_lazy_get_jiffies_lazy_flush);
 #endif

 /*
@@ -299,7 +299,7 @@ static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
 	 */
 	if (waketype == RCU_NOCB_WAKE_LAZY &&
 	    rdp->nocb_defer_wakeup == RCU_NOCB_WAKE_NOT) {
-		mod_timer(&rdp_gp->nocb_timer, jiffies + jiffies_till_flush);
+		mod_timer(&rdp_gp->nocb_timer, jiffies + jiffies_lazy_flush);
 		WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
 	} else if (waketype == RCU_NOCB_WAKE_BYPASS) {
 		mod_timer(&rdp_gp->nocb_timer, jiffies + 2);
@@ -482,7 +482,7 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
 	// flush ->nocb_bypass to ->cblist.
 	if ((ncbs && !bypass_is_lazy && j != READ_ONCE(rdp->nocb_bypass_first)) ||
 	    (ncbs && bypass_is_lazy &&
-	     (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + jiffies_till_flush))) ||
+	     (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + jiffies_lazy_flush))) ||
 	    ncbs >= qhimark) {
 		rcu_nocb_lock(rdp);
 		*was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
@@ -723,7 +723,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 		lazy_ncbs = READ_ONCE(rdp->lazy_len);

 		if (bypass_ncbs && (lazy_ncbs == bypass_ncbs) &&
-		    (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + jiffies_till_flush) ||
+		    (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + jiffies_lazy_flush) ||
 		     bypass_ncbs > 2 * qhimark)) {
 			flush_bypass = true;
 		} else if (bypass_ncbs && (lazy_ncbs != bypass_ncbs) &&
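(Editorial sketch.) Mirroring the rcuscale hunk above, test code is expected to drive the renamed API roughly like this (fragment only, assuming CONFIG_RCU_LAZY=y):

	unsigned long orig_jif = rcu_get_jiffies_lazy_flush();	/* save current flush delay */

	rcu_set_jiffies_lazy_flush(2 * HZ);	/* shorten the delay for the test */
	/* ... queue a lazy callback and measure when it actually runs ... */
	rcu_set_jiffies_lazy_flush(orig_jif);	/* restore the previous value */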
From patchwork Wed May 31 10:17:31 2023
From: Frederic Weisbecker
To: "Paul E. McKenney"
Cc: LKML, Frederic Weisbecker, rcu, Uladzislau Rezki, Neeraj Upadhyay,
    Joel Fernandes, Giovanni Gherdovich
Subject: [PATCH 4/9] rcu: Introduce lazy queue's own qhimark
Date: Wed, 31 May 2023 12:17:31 +0200
Message-Id: <20230531101736.12981-5-frederic@kernel.org>
In-Reply-To: <20230531101736.12981-1-frederic@kernel.org>
References: <20230531101736.12981-1-frederic@kernel.org>

The lazy and the regular bypass queues share the same threshold in terms of
the number of callbacks after which a flush to the main list is performed:
10 000 callbacks. However the lazy and the regular bypass queues don't have
the same purpose, and neither should their respective thresholds:

* The bypass queue stands for relieving the main queue in case of a callback
  storm. It makes sense to allow a high number of callbacks to pile up before
  flushing to the main queue, especially as the life cycle for this queue is
  very short (1 jiffy).

* The lazy queue aims to spare wake-ups and reduce the number of grace
  periods. There it doesn't make sense to allow a huge number of callbacks
  before flushing, so as not to introduce memory pressure, especially as the
  life cycle for this queue is very long (10 seconds).

For those reasons, set the default threshold for the lazy queue to 100.

Signed-off-by: Frederic Weisbecker
---
 kernel/rcu/tree.c      | 2 ++
 kernel/rcu/tree_nocb.h | 9 ++++-----
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index bc4e7c9b51cb..9b98d87fa22e 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -379,6 +379,8 @@ static int rcu_is_cpu_rrupt_from_idle(void)
 static long blimit = DEFAULT_RCU_BLIMIT;
 #define DEFAULT_RCU_QHIMARK 10000 // If this many pending, ignore blimit.
 static long qhimark = DEFAULT_RCU_QHIMARK;
+#define DEFAULT_RCU_QHIMARK_LAZY 100 // If this many pending, flush.
+static long qhimark_lazy = DEFAULT_RCU_QHIMARK_LAZY;
 #define DEFAULT_RCU_QLOMARK 100   // Once only this many pending, use blimit.
 static long qlowmark = DEFAULT_RCU_QLOMARK;
 #define DEFAULT_RCU_QOVLD_MULT 2
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 8320eb77b58b..c08447db5a2e 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -480,10 +480,9 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
 	// If ->nocb_bypass has been used too long or is too full,
 	// flush ->nocb_bypass to ->cblist.
-	if ((ncbs && !bypass_is_lazy && j != READ_ONCE(rdp->nocb_bypass_first)) ||
-	    (ncbs && bypass_is_lazy &&
-	     (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + jiffies_lazy_flush))) ||
-	    ncbs >= qhimark) {
+	if (ncbs &&
+	    ((!bypass_is_lazy && ((j != READ_ONCE(rdp->nocb_bypass_first)) || ncbs >= qhimark)) ||
+	     (bypass_is_lazy && (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + jiffies_lazy_flush) || ncbs >= qhimark_lazy)))) {
 		rcu_nocb_lock(rdp);
 		*was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
@@ -724,7 +723,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)

 		if (bypass_ncbs && (lazy_ncbs == bypass_ncbs) &&
 		    (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + jiffies_lazy_flush) ||
-		     bypass_ncbs > 2 * qhimark)) {
+		     bypass_ncbs > 2 * qhimark_lazy)) {
 			flush_bypass = true;
 		} else if (bypass_ncbs && (lazy_ncbs != bypass_ncbs) &&
 			   (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + 1) ||
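(Editorial recap of the values quoted in the changelog and the hunks above.) After this patch the two queues flush on different count and age thresholds:

	queue    count threshold       age threshold
	bypass   qhimark (10000)       1 jiffy since ->nocb_bypass_first
	lazy     qhimark_lazy (100)    jiffies_lazy_flush (10 * HZ)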
From patchwork Wed May 31 10:17:32 2023
From: Frederic Weisbecker
To: "Paul E. McKenney"
Cc: LKML, Frederic Weisbecker, rcu, Uladzislau Rezki, Neeraj Upadhyay,
    Joel Fernandes, Giovanni Gherdovich
Subject: [PATCH 5/9] rcu: Add rcutree.lazy_enabled boot parameter
Date: Wed, 31 May 2023 12:17:32 +0200
Message-Id: <20230531101736.12981-6-frederic@kernel.org>
In-Reply-To: <20230531101736.12981-1-frederic@kernel.org>
References: <20230531101736.12981-1-frederic@kernel.org>

Allow overriding the arbitrary default lazy callbacks threshold, which is
currently set to 100. This allows for tuning between powersaving, throughput
and memory consumption expectations. As a bonus, setting this value to 0
disables lazy callbacks.

Signed-off-by: Frederic Weisbecker
---
 Documentation/admin-guide/kernel-parameters.txt | 5 +++++
 kernel/rcu/tree.c                               | 3 ++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 505978cfb548..dd2be4249061 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4798,6 +4798,11 @@
 			Set threshold of queued RCU callbacks beyond which
 			batch limiting is disabled.

+	rcutree.qhimark_lazy = [KNL]
+			Set threshold of queued lazy RCU callbacks beyond which
+			batch must be flushed to the main queue. If set to 0,
+			disable lazy queue.
+
 	rcutree.qlowmark= [KNL]
 			Set threshold of queued RCU callbacks below which
 			batch limiting is re-enabled.
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 9b98d87fa22e..e33c0d889216 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -390,6 +390,7 @@ static long qovld_calc = -1;	  // No pre-initialization lock acquisitions!

 module_param(blimit, long, 0444);
 module_param(qhimark, long, 0444);
+module_param(qhimark_lazy, long, 0444);
 module_param(qlowmark, long, 0444);
 module_param(qovld, long, 0444);

@@ -2655,7 +2656,7 @@ __call_rcu_common(struct rcu_head *head, rcu_callback_t func, bool lazy_in)
 	kasan_record_aux_stack_noalloc(head);
 	local_irq_save(flags);
 	rdp = this_cpu_ptr(&rcu_data);
-	lazy = lazy_in && !rcu_async_should_hurry();
+	lazy = lazy_in && qhimark_lazy && !rcu_async_should_hurry();

 	/* Add the callback to our list. */
 	if (unlikely(!rcu_segcblist_is_enabled(&rdp->cblist))) {
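For example (illustrative kernel command-line values; per the documentation hunk above, 0 disables the lazy queue entirely):

	rcutree.qhimark_lazy=0
	rcutree.qhimark_lazy=1000

The first line turns laziness off, the second allows a larger lazy batch to accumulate before a flush.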
From patchwork Wed May 31 10:17:33 2023
From: Frederic Weisbecker
To: "Paul E. McKenney"
Cc: LKML, Frederic Weisbecker, rcu, Uladzislau Rezki, Neeraj Upadhyay,
    Joel Fernandes, Giovanni Gherdovich
Subject: [PATCH 6/9] rcu/nocb: Rename was_alldone to was_pending
Date: Wed, 31 May 2023 12:17:33 +0200
Message-Id: <20230531101736.12981-7-frederic@kernel.org>
In-Reply-To: <20230531101736.12981-1-frederic@kernel.org>
References: <20230531101736.12981-1-frederic@kernel.org>

Upon enqueuing on an offloaded rdp, RCU checks whether the queue was
previously made only of done callbacks and relies on that information to
determine whether the rcuog kthread needs to be woken up.

In order to prepare for moving the lazy callbacks from the bypass queue to
the main queue, track instead whether there are pending callbacks. For now
the meaning of "having pending callbacks" is just the reverse of "having
only done callbacks". However lazy callbacks will be ignored from the
pending queue in a later patch.

Signed-off-by: Frederic Weisbecker
---
 kernel/rcu/tree.c      | 14 +++++++-------
 kernel/rcu/tree.h      |  2 +-
 kernel/rcu/tree_nocb.h | 28 ++++++++++++++--------------
 3 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index e33c0d889216..d71b9915c91e 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2633,7 +2633,7 @@ __call_rcu_common(struct rcu_head *head, rcu_callback_t func, bool lazy_in)
 	unsigned long flags;
 	bool lazy;
 	struct rcu_data *rdp;
-	bool was_alldone;
+	bool was_pending;

 	/* Misaligned rcu_head! */
 	WARN_ON_ONCE((unsigned long)head & (sizeof(void *) - 1));
@@ -2670,7 +2670,7 @@ __call_rcu_common(struct rcu_head *head, rcu_callback_t func, bool lazy_in)
 	}

 	check_cb_ovld(rdp);
-	if (rcu_nocb_try_bypass(rdp, head, &was_alldone, flags, lazy))
+	if (rcu_nocb_try_bypass(rdp, head, &was_pending, flags, lazy))
 		return; // Enqueued onto ->nocb_bypass, so just leave.
 	// If no-CBs CPU gets here, rcu_nocb_try_bypass() acquired ->nocb_lock.
 	rcu_segcblist_enqueue(&rdp->cblist, head);
@@ -2686,7 +2686,7 @@ __call_rcu_common(struct rcu_head *head, rcu_callback_t func, bool lazy_in)

 	/* Go handle any RCU core processing required. */
 	if (unlikely(rcu_rdp_is_offloaded(rdp))) {
-		__call_rcu_nocb_wake(rdp, was_alldone, flags); /* unlocks */
+		__call_rcu_nocb_wake(rdp, was_pending, flags); /* unlocks */
 	} else {
 		__call_rcu_core(rdp, head, flags);
 		local_irq_restore(flags);
@@ -3936,8 +3936,8 @@ static void rcu_barrier_entrain(struct rcu_data *rdp)
 {
 	unsigned long gseq = READ_ONCE(rcu_state.barrier_sequence);
 	unsigned long lseq = READ_ONCE(rdp->barrier_seq_snap);
+	bool nocb_no_pending = false;
 	bool wake_nocb = false;
-	bool was_alldone = false;

 	lockdep_assert_held(&rcu_state.barrier_lock);
 	if (rcu_seq_state(lseq) || !rcu_seq_state(gseq) || rcu_seq_ctr(lseq) != rcu_seq_ctr(gseq))
@@ -3951,9 +3951,9 @@ static void rcu_barrier_entrain(struct rcu_data *rdp)
 	 * queue. This way we don't wait for bypass timer that can reach seconds
 	 * if it's fully lazy.
 	 */
-	was_alldone = rcu_rdp_is_offloaded(rdp) && !rcu_segcblist_pend_cbs(&rdp->cblist);
+	nocb_no_pending = rcu_rdp_is_offloaded(rdp) && !rcu_segcblist_pend_cbs(&rdp->cblist);
 	WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies, false));
-	wake_nocb = was_alldone && rcu_segcblist_pend_cbs(&rdp->cblist);
+	wake_nocb = nocb_no_pending && rcu_segcblist_pend_cbs(&rdp->cblist);
 	if (rcu_segcblist_entrain(&rdp->cblist, &rdp->barrier_head)) {
 		atomic_inc(&rcu_state.barrier_cpu_count);
 	} else {
@@ -4549,7 +4549,7 @@ void rcutree_migrate_callbacks(int cpu)
 	check_cb_ovld_locked(my_rdp, my_rnp);
 	if (rcu_rdp_is_offloaded(my_rdp)) {
 		raw_spin_unlock_rcu_node(my_rnp); /* irqs remain disabled. */
-		__call_rcu_nocb_wake(my_rdp, true, flags);
+		__call_rcu_nocb_wake(my_rdp, false, flags);
 	} else {
 		rcu_nocb_unlock(my_rdp); /* irqs remain disabled. */
 		raw_spin_unlock_irqrestore_rcu_node(my_rnp, flags);
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 192536916f9a..966abe037f57 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -464,7 +464,7 @@ static bool wake_nocb_gp(struct rcu_data *rdp, bool force);
 static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
 				  unsigned long j, bool lazy);
 static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
-				bool *was_alldone, unsigned long flags,
+				bool *was_pending, unsigned long flags,
 				bool lazy);
 static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_empty,
 				 unsigned long flags);
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index c08447db5a2e..d8b17c69110a 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -413,7 +413,7 @@ static void rcu_nocb_try_flush_bypass(struct rcu_data *rdp, unsigned long j)
  * there is only one CPU in operation.
  */
 static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
-				bool *was_alldone, unsigned long flags,
+				bool *was_pending, unsigned long flags,
 				bool lazy)
 {
 	unsigned long c;
@@ -427,7 +427,7 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
 	// Pure softirq/rcuc based processing: no bypassing, no
 	// locking.
 	if (!rcu_rdp_is_offloaded(rdp)) {
-		*was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
+		*was_pending = rcu_segcblist_pend_cbs(&rdp->cblist);
 		return false;
 	}
@@ -435,7 +435,7 @@
 	// locking.
 	if (!rcu_segcblist_completely_offloaded(&rdp->cblist)) {
 		rcu_nocb_lock(rdp);
-		*was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
+		*was_pending = rcu_segcblist_pend_cbs(&rdp->cblist);
 		return false; /* Not offloaded, no bypassing. */
 	}
@@ -443,7 +443,7 @@
 	if (rcu_scheduler_active != RCU_SCHEDULER_RUNNING) {
 		rcu_nocb_lock(rdp);
 		WARN_ON_ONCE(rcu_cblist_n_cbs(&rdp->nocb_bypass));
-		*was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
+		*was_pending = rcu_segcblist_pend_cbs(&rdp->cblist);
 		return false;
 	}
@@ -468,8 +468,8 @@
 	// Lazy CBs throttle this back and do immediate bypass queuing.
 	if (rdp->nocb_nobypass_count < nocb_nobypass_lim_per_jiffy && !lazy) {
 		rcu_nocb_lock(rdp);
-		*was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
-		if (*was_alldone)
+		*was_pending = rcu_segcblist_pend_cbs(&rdp->cblist);
+		if (!*was_pending)
 			trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
 					    TPS("FirstQ"));
@@ -484,10 +484,10 @@
 	    ((!bypass_is_lazy && ((j != READ_ONCE(rdp->nocb_bypass_first)) || ncbs >= qhimark)) ||
 	     (bypass_is_lazy && (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + jiffies_lazy_flush) || ncbs >= qhimark_lazy)))) {
 		rcu_nocb_lock(rdp);
-		*was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
+		*was_pending = rcu_segcblist_pend_cbs(&rdp->cblist);

 		if (!rcu_nocb_flush_bypass(rdp, rhp, j, lazy)) {
-			if (*was_alldone)
+			if (!*was_pending)
 				trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
 						    TPS("FirstQ"));
 			WARN_ON_ONCE(rcu_cblist_n_cbs(&rdp->nocb_bypass));
@@ -503,7 +503,7 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
 		// The flush succeeded and we moved CBs into the regular list.
 		// Don't wait for the wake up timer as it may be too far ahead.
 		// Wake up the GP thread now instead, if the cblist was empty.
-		__call_rcu_nocb_wake(rdp, *was_alldone, flags);
+		__call_rcu_nocb_wake(rdp, *was_pending, flags);

 		return true; // Callback already enqueued.
 	}
@@ -539,7 +539,7 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
 		if (!rcu_segcblist_pend_cbs(&rdp->cblist)) {
 			trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
 					    TPS("FirstBQwake"));
-			__call_rcu_nocb_wake(rdp, true, flags);
+			__call_rcu_nocb_wake(rdp, false, flags);
 		} else {
 			trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
 					    TPS("FirstBQnoWake"));
@@ -555,7 +555,7 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
 *
 * If warranted, also wake up the kthread servicing this CPUs queues.
 */
-static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
+static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_pending,
 				 unsigned long flags)
 				 __releases(rdp->nocb_lock)
 {
@@ -578,7 +578,7 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
 	len = rcu_segcblist_n_cbs(&rdp->cblist);
 	bypass_len = rcu_cblist_n_cbs(&rdp->nocb_bypass);
 	lazy_len = READ_ONCE(rdp->lazy_len);
-	if (was_alldone) {
+	if (!was_pending) {
 		rdp->qlen_last_fqs_check = len;
 		// Only lazy CBs in bypass list
 		if (lazy_len && bypass_len == lazy_len) {
@@ -1767,12 +1767,12 @@
 }

 static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
-				bool *was_alldone, unsigned long flags, bool lazy)
+				bool *was_pending, unsigned long flags, bool lazy)
 {
 	return false;
 }

-static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_empty,
+static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_pending,
 				 unsigned long flags)
 {
 	WARN_ON_ONCE(1);  /* Should be dead code! */
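(Editorial note.) At this point in the series the two notions are exact complements, which is what makes this a pure rename with inverted tests:

	/* Before: */ *was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
	/* After:  */ *was_pending =  rcu_segcblist_pend_cbs(&rdp->cblist);
	/* Hence every "if (*was_alldone)" becomes "if (!*was_pending)" and vice versa. */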
From patchwork Wed May 31 10:17:34 2023
From: Frederic Weisbecker
To: "Paul E. McKenney"
Cc: LKML, Frederic Weisbecker, rcu, Uladzislau Rezki, Neeraj Upadhyay,
    Joel Fernandes, Giovanni Gherdovich
Subject: [PATCH 7/9] rcu: Implement lazyness on the main segcblist level
Date: Wed, 31 May 2023 12:17:34 +0200
Message-Id: <20230531101736.12981-8-frederic@kernel.org>
In-Reply-To: <20230531101736.12981-1-frederic@kernel.org>
References: <20230531101736.12981-1-frederic@kernel.org>

The lazy queue is currently implemented by the bypass list, which only exists
on CONFIG_RCU_NOCB=y with offloaded rdp. Supporting the lazy queue on
non-offloaded rdp will require a different approach based on the main per-CPU
segmented callback list. And ideally most of the lazy infrastructure behind
offloaded and non-offloaded rdp should be made generic and consolidated.

Therefore, in order to prepare for supporting lazy callbacks on non-offloaded
rdp, switch the lazy callbacks infrastructure from the bypass list to the
main segmented callback list. Lazy callbacks are then enqueued like any other
callbacks to the RCU_NEXT_TAIL segment and a SEGCBLIST_NEXT_TAIL_LAZY flag
tells whether that segment is completely lazy or not.

A lazy queue gets ignored by acceleration, unless it can piggyback with the
acceleration of existing callbacks in RCU_NEXT_READY_TAIL or RCU_WAIT_TAIL.
If anything this introduces a tiny optimization as compared to the bypass
list.

As for the offloaded implementation specifics, the rcuog kthread is only
woken up if the RCU_NEXT_TAIL segment is not lazy.
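(Editorial sketch of the intended flag lifecycle, paraphrased from the __call_rcu_lazy() and nocb_gp_wait() hunks below; clear_lazy() is a placeholder for rcu_segcblist_clear_flags(&rdp->cblist, SEGCBLIST_NEXT_TAIL_LAZY).)

	/* Flag set: RCU_NEXT_TAIL currently holds only lazy callbacks. */
	if (!lazy)				/* a non-lazy callback is enqueued */
		clear_lazy();			/* the whole segment stops being lazy */
	else if (lazy_len >= qhimark_lazy ||	/* too many lazy callbacks, or */
		 time_after(jiffies, rdp->lazy_firstq + jiffies_lazy_flush))	/* too old */
		clear_lazy();			/* let normal acceleration pick them up */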
Suggested-by: Paul E. McKenney
Signed-off-by: Frederic Weisbecker
---
 include/linux/rcu_segcblist.h |  13 +--
 kernel/rcu/rcu_segcblist.c    |  42 ++++++++--
 kernel/rcu/rcu_segcblist.h    |  21 ++++-
 kernel/rcu/tree.c             |  98 ++++++++++++++++++++--
 kernel/rcu/tree.h             |   8 +-
 kernel/rcu/tree_nocb.h        | 154 ++++++++++++++--------------------
 6 files changed, 221 insertions(+), 115 deletions(-)

diff --git a/include/linux/rcu_segcblist.h b/include/linux/rcu_segcblist.h
index 659d13a7ddaa..9bc2d556d4d4 100644
--- a/include/linux/rcu_segcblist.h
+++ b/include/linux/rcu_segcblist.h
@@ -196,12 +196,13 @@ struct rcu_cblist {
 *  |  rcuc kthread, without holding nocb_lock.                                 |
 *  ----------------------------------------------------------------------------
 */
-#define SEGCBLIST_ENABLED	BIT(0)
-#define SEGCBLIST_RCU_CORE	BIT(1)
-#define SEGCBLIST_LOCKING	BIT(2)
-#define SEGCBLIST_KTHREAD_CB	BIT(3)
-#define SEGCBLIST_KTHREAD_GP	BIT(4)
-#define SEGCBLIST_OFFLOADED	BIT(5)
+#define SEGCBLIST_ENABLED		BIT(0)
+#define SEGCBLIST_RCU_CORE		BIT(1)
+#define SEGCBLIST_LOCKING		BIT(2)
+#define SEGCBLIST_KTHREAD_CB		BIT(3)
+#define SEGCBLIST_KTHREAD_GP		BIT(4)
+#define SEGCBLIST_OFFLOADED		BIT(5)
+#define SEGCBLIST_NEXT_TAIL_LAZY	BIT(6)

 struct rcu_segcblist {
 	struct rcu_head *head;
diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index 1693ea22ef1b..9f604d721cb9 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -291,6 +291,27 @@ bool rcu_segcblist_pend_cbs(struct rcu_segcblist *rsclp)
 	       !rcu_segcblist_restempty(rsclp, RCU_DONE_TAIL);
 }

+/*
+ * Does the specified segcblist have pending callbacks beyond the
+ * lazy ones?
+ */
+bool rcu_segcblist_pend_cbs_nolazy(struct rcu_segcblist *rsclp)
+{
+	int i;
+
+	if (!rcu_segcblist_pend_cbs(rsclp))
+		return false;
+
+	if (!rcu_segcblist_n_cbs_lazy(rsclp))
+		return true;
+
+	for (i = RCU_WAIT_TAIL; i < RCU_NEXT_TAIL; i++)
+		if (!rcu_segcblist_segempty(rsclp, i))
+			return true;
+
+	return false;
+}
+
 /*
 * Return a pointer to the first callback in the specified rcu_segcblist
 * structure.  This is useful for diagnostics.
@@ -320,9 +341,9 @@ struct rcu_head *rcu_segcblist_first_pend_cb(struct rcu_segcblist *rsclp)
 * Return false if there are no CBs awaiting grace periods, otherwise,
 * return true and store the nearest waited-upon grace period into *lp.
 */
-bool rcu_segcblist_nextgp(struct rcu_segcblist *rsclp, unsigned long *lp)
+bool rcu_segcblist_nextgp_nolazy(struct rcu_segcblist *rsclp, unsigned long *lp)
 {
-	if (!rcu_segcblist_pend_cbs(rsclp))
+	if (!rcu_segcblist_pend_cbs_nolazy(rsclp))
 		return false;
 	*lp = rsclp->gp_seq[RCU_WAIT_TAIL];
 	return true;
@@ -537,6 +558,7 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq)
 bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq)
 {
 	int i, j;
+	bool empty_dest = true;

 	WARN_ON_ONCE(!rcu_segcblist_is_enabled(rsclp));
 	if (rcu_segcblist_restempty(rsclp, RCU_DONE_TAIL))
@@ -550,10 +572,14 @@ bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq)
 	 * callbacks in the RCU_NEXT_TAIL segment, and assigned "seq"
 	 * as their ->gp_seq[] grace-period completion sequence number.
 	 */
-	for (i = RCU_NEXT_READY_TAIL; i > RCU_DONE_TAIL; i--)
-		if (!rcu_segcblist_segempty(rsclp, i) &&
-		    ULONG_CMP_LT(rsclp->gp_seq[i], seq))
-			break;
+	for (i = RCU_NEXT_READY_TAIL; i > RCU_DONE_TAIL; i--) {
+		if (!rcu_segcblist_segempty(rsclp, i)) {
+			if (ULONG_CMP_LT(rsclp->gp_seq[i], seq))
+				break;
+			else
+				empty_dest = false;
+		}
+	}

 	/*
 	 * If all the segments contain callbacks that correspond to
@@ -579,6 +605,10 @@ bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq)
 	if (rcu_segcblist_restempty(rsclp, i) || ++i >= RCU_NEXT_TAIL)
 		return false;

+	/* Ignore lazy callbacks, unless there is a queue they can piggyback in. */
+	if (rcu_segcblist_next_is_lazy(rsclp) && empty_dest)
+		return false;
+
 	/* Accounting: everything below i is about to get merged into i. */
 	for (j = i + 1; j <= RCU_NEXT_TAIL; j++)
 		rcu_segcblist_move_seglen(rsclp, j, i);
diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
index 4fe877f5f654..620ca48e782b 100644
--- a/kernel/rcu/rcu_segcblist.h
+++ b/kernel/rcu/rcu_segcblist.h
@@ -104,6 +104,24 @@ static inline bool rcu_segcblist_completely_offloaded(struct rcu_segcblist *rsclp)
 	return false;
 }

+static inline bool rcu_segcblist_next_is_lazy(struct rcu_segcblist *rsclp)
+{
+	if (IS_ENABLED(CONFIG_RCU_LAZY) &&
+	    rcu_segcblist_test_flags(rsclp, SEGCBLIST_NEXT_TAIL_LAZY))
+		return true;
+
+	return false;
+}
+
+/* Return number of callbacks in segmented callback list. */
+static inline long rcu_segcblist_n_cbs_lazy(struct rcu_segcblist *rsclp)
+{
+	if (rcu_segcblist_next_is_lazy(rsclp))
+		return rcu_segcblist_get_seglen(rsclp, RCU_NEXT_TAIL);
+	else
+		return 0;
+}
+
 /*
 * Are all segments following the specified segment of the specified
 * rcu_segcblist structure empty of callbacks?  (The specified
@@ -132,9 +150,10 @@ void rcu_segcblist_disable(struct rcu_segcblist *rsclp);
 void rcu_segcblist_offload(struct rcu_segcblist *rsclp, bool offload);
 bool rcu_segcblist_ready_cbs(struct rcu_segcblist *rsclp);
 bool rcu_segcblist_pend_cbs(struct rcu_segcblist *rsclp);
+bool rcu_segcblist_pend_cbs_nolazy(struct rcu_segcblist *rsclp);
 struct rcu_head *rcu_segcblist_first_cb(struct rcu_segcblist *rsclp);
 struct rcu_head *rcu_segcblist_first_pend_cb(struct rcu_segcblist *rsclp);
-bool rcu_segcblist_nextgp(struct rcu_segcblist *rsclp, unsigned long *lp);
+bool rcu_segcblist_nextgp_nolazy(struct rcu_segcblist *rsclp, unsigned long *lp);
 void rcu_segcblist_enqueue(struct rcu_segcblist *rsclp,
 			   struct rcu_head *rhp);
 bool rcu_segcblist_entrain(struct rcu_segcblist *rsclp,
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index d71b9915c91e..e48ccbe0f2f6 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -394,6 +394,16 @@ module_param(qhimark_lazy, long, 0444);
 module_param(qlowmark, long, 0444);
 module_param(qovld, long, 0444);

+/*
+ * LAZY_FLUSH_JIFFIES decides the maximum amount of time that
+ * can elapse before lazy callbacks are flushed. Lazy callbacks
+ * could be flushed much earlier for a number of other reasons
+ * however, LAZY_FLUSH_JIFFIES will ensure no lazy callbacks are
+ * left unsubmitted to RCU after those many jiffies.
+ */
+#define LAZY_FLUSH_JIFFIES (10 * HZ)
+static unsigned long jiffies_lazy_flush = LAZY_FLUSH_JIFFIES;
+
 static ulong jiffies_till_first_fqs = IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) ? 0 : ULONG_MAX;
 static ulong jiffies_till_next_fqs = ULONG_MAX;
 static bool rcu_kick_kthreads;
@@ -1074,8 +1084,12 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
 	 * number.
 	 */
 	gp_seq_req = rcu_seq_snap(&rcu_state.gp_seq);
-	if (rcu_segcblist_accelerate(&rdp->cblist, gp_seq_req))
+	if (rcu_segcblist_accelerate(&rdp->cblist, gp_seq_req)) {
+		/* The RCU_NEXT_TAIL has been flushed, reset the lazy bit accordingly */
+		if (IS_ENABLED(CONFIG_RCU_LAZY) && qhimark_lazy && rcu_segcblist_completely_offloaded(&rdp->cblist))
+			rcu_segcblist_set_flags(&rdp->cblist, SEGCBLIST_NEXT_TAIL_LAZY);
 		ret = rcu_start_this_gp(rnp, rdp, gp_seq_req);
+	}

 	/* Trace depending on how much we were able to accelerate. */
 	if (rcu_segcblist_restempty(&rdp->cblist, RCU_WAIT_TAIL))
@@ -1105,7 +1119,11 @@ static void rcu_accelerate_cbs_unlocked(struct rcu_node *rnp,
 	c = rcu_seq_snap(&rcu_state.gp_seq);
 	if (!READ_ONCE(rdp->gpwrap) && ULONG_CMP_GE(rdp->gp_seq_needed, c)) {
 		/* Old request still live, so mark recent callbacks. */
-		(void)rcu_segcblist_accelerate(&rdp->cblist, c);
+		if (rcu_segcblist_accelerate(&rdp->cblist, c)) {
+			/* The RCU_NEXT_TAIL has been flushed, reset the lazy bit accordingly */
+			if (IS_ENABLED(CONFIG_RCU_LAZY) && qhimark_lazy && rcu_segcblist_completely_offloaded(&rdp->cblist))
+				rcu_segcblist_set_flags(&rdp->cblist, SEGCBLIST_NEXT_TAIL_LAZY);
+		}
 		return;
 	}
 	raw_spin_lock_rcu_node(rnp); /* irqs already disabled. */
@@ -2626,6 +2644,56 @@ static void check_cb_ovld(struct rcu_data *rdp)
 	raw_spin_unlock_rcu_node(rnp);
 }

+/*
+ * Handle lazy callbacks. Return true if no further handling is needed (unlocks nocb then).
+ * Return false if further treatment is needed (wake rcuog kthread, set the nocb timer, etc...).
+ */
+static bool __call_rcu_lazy(struct rcu_data *rdp, bool was_pending, bool lazy, unsigned long flags)
+	__releases(rdp->nocb_lock)
+{
+	long lazy_len;
+	unsigned long timeout;
+
+	if (!rcu_segcblist_next_is_lazy(&rdp->cblist))
+		return false;
+
+	/* New callback is not lazy, unlazy the queue */
+	if (!lazy) {
+		rcu_segcblist_clear_flags(&rdp->cblist, SEGCBLIST_NEXT_TAIL_LAZY);
+		return false;
+	}
+
+	lazy_len = rcu_segcblist_get_seglen(&rdp->cblist, RCU_NEXT_TAIL);
+	/* First lazy callback on an empty queue, set the timer if necessary */
+	if (lazy_len == 1) {
+		WRITE_ONCE(rdp->lazy_firstq, jiffies);
+		if (!was_pending)
+			return false;
+		else
+			goto out;
+	}
+
+	/* Too many lazy callbacks, unlazy them */
+	if (lazy_len >= qhimark_lazy) {
+		rcu_segcblist_clear_flags(&rdp->cblist, SEGCBLIST_NEXT_TAIL_LAZY);
+		return false;
+	}
+
+	timeout = rdp->lazy_firstq + jiffies_lazy_flush;
+
+	/* Lazy callbacks are too old, unlazy them */
+	if (time_after(READ_ONCE(jiffies), timeout)) {
+		rcu_segcblist_clear_flags(&rdp->cblist, SEGCBLIST_NEXT_TAIL_LAZY);
+		return false;
+	}
+
+out:
+	/* No further treatment is needed */
+	rcu_nocb_unlock_irqrestore(rdp, flags);
+
+	return true;
+}
+
 static void
 __call_rcu_common(struct rcu_head *head, rcu_callback_t func, bool lazy_in)
 {
@@ -2670,8 +2738,10 @@ __call_rcu_common(struct rcu_head *head, rcu_callback_t func, bool lazy_in)
 	}

 	check_cb_ovld(rdp);
+
 	if (rcu_nocb_try_bypass(rdp, head, &was_pending, flags, lazy))
 		return; // Enqueued onto ->nocb_bypass, so just leave.
+
 	// If no-CBs CPU gets here, rcu_nocb_try_bypass() acquired ->nocb_lock.
 	rcu_segcblist_enqueue(&rdp->cblist, head);
 	if (__is_kvfree_rcu_offset((unsigned long)func))
@@ -2684,6 +2754,9 @@ __call_rcu_common(struct rcu_head *head, rcu_callback_t func, bool lazy_in)

 	trace_rcu_segcb_stats(&rdp->cblist, TPS("SegCBQueued"));

+	if (__call_rcu_lazy(rdp, was_pending, lazy, flags))
+		return;
+
 	/* Go handle any RCU core processing required. */
 	if (unlikely(rcu_rdp_is_offloaded(rdp))) {
 		__call_rcu_nocb_wake(rdp, was_pending, flags); /* unlocks */
@@ -3948,12 +4021,18 @@ static void rcu_barrier_entrain(struct rcu_data *rdp)
 	rcu_nocb_lock(rdp);
 	/*
 	 * Flush bypass and wakeup rcuog if we add callbacks to an empty regular
-	 * queue. This way we don't wait for bypass timer that can reach seconds
-	 * if it's fully lazy.
+	 * queue. This way we don't wait for bypass timer.
 	 */
-	nocb_no_pending = rcu_rdp_is_offloaded(rdp) && !rcu_segcblist_pend_cbs(&rdp->cblist);
-	WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies, false));
+	nocb_no_pending = rcu_rdp_is_offloaded(rdp) && !rcu_segcblist_pend_cbs_nolazy(&rdp->cblist);
+	WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies));
+	/*
+	 * Make sure the entrained callback isn't treated as lazy. This brainlessly
+	 * flush the queue and might even prevent the next lazy callback from being
+	 * treated as lazy if RCU_NEXT_TAIL is empty. But no big deal.
+	 */
+	rcu_segcblist_clear_flags(&rdp->cblist, SEGCBLIST_NEXT_TAIL_LAZY);
 	wake_nocb = nocb_no_pending && rcu_segcblist_pend_cbs(&rdp->cblist);
+
 	if (rcu_segcblist_entrain(&rdp->cblist, &rdp->barrier_head)) {
 		atomic_inc(&rcu_state.barrier_cpu_count);
 	} else {
@@ -4536,7 +4615,12 @@ void rcutree_migrate_callbacks(int cpu)
 	my_rdp = this_cpu_ptr(&rcu_data);
 	my_rnp = my_rdp->mynode;
 	rcu_nocb_lock(my_rdp); /* irqs already disabled. */
-	WARN_ON_ONCE(!rcu_nocb_flush_bypass(my_rdp, NULL, jiffies, false));
+	WARN_ON_ONCE(!rcu_nocb_flush_bypass(my_rdp, NULL, jiffies));
+	/*
+	 * We are going to merge external callbacks, make sure they won't
+	 * be accidentally tagged as lazy.
+	 */
+	rcu_segcblist_clear_flags(&my_rdp->cblist, SEGCBLIST_NEXT_TAIL_LAZY);
 	raw_spin_lock_rcu_node(my_rnp); /* irqs already disabled. */
 	/* Leverage recent GPs and set GP for new callbacks. */
 	needwake = rcu_advance_cbs(my_rnp, rdp) ||
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 966abe037f57..90b39ff8ad70 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -281,8 +281,7 @@ struct rcu_data {
 	unsigned long last_sched_clock;	/* Jiffies of last rcu_sched_clock_irq(). */
 	struct rcu_snap_record snap_record; /* Snapshot of core stats at half of */
 					    /* the first RCU stall timeout */
-
-	long lazy_len;			/* Length of buffered lazy callbacks. */
+	unsigned long lazy_firstq;
 	int cpu;
 };
@@ -462,10 +461,9 @@ static void rcu_nocb_gp_cleanup(struct swait_queue_head *sq);
 static void rcu_init_one_nocb(struct rcu_node *rnp);
 static bool wake_nocb_gp(struct rcu_data *rdp, bool force);
 static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
-				  unsigned long j, bool lazy);
+				  unsigned long j);
 static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
-				bool *was_pending, unsigned long flags,
-				bool lazy);
+				bool *was_pending, unsigned long flags, bool lazy);
 static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_empty,
 				 unsigned long flags);
 static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp, int level);
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index d8b17c69110a..fbd54a2e1f17 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -256,16 +256,6 @@ static bool wake_nocb_gp(struct rcu_data *rdp, bool force)
 	return __wake_nocb_gp(rdp_gp, rdp, force, flags);
 }

-/*
- * LAZY_FLUSH_JIFFIES decides the maximum amount of time that
- * can elapse before lazy callbacks are flushed. Lazy callbacks
- * could be flushed much earlier for a number of other reasons
- * however, LAZY_FLUSH_JIFFIES will ensure no lazy callbacks are
- * left unsubmitted to RCU after those many jiffies.
- */
-#define LAZY_FLUSH_JIFFIES (10 * HZ)
-static unsigned long jiffies_lazy_flush = LAZY_FLUSH_JIFFIES;
-
 #ifdef CONFIG_RCU_LAZY
 // To be called only from test code.
 void rcu_lazy_set_jiffies_lazy_flush(unsigned long jif)
@@ -327,16 +317,16 @@ static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
 *
 * Note that this function always returns true if rhp is NULL.
 */
-static bool rcu_nocb_do_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp_in,
-				     unsigned long j, bool lazy)
+static bool rcu_nocb_do_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
+				     unsigned long j)
 {
 	struct rcu_cblist rcl;
-	struct rcu_head *rhp = rhp_in;
+	long len = rcu_cblist_n_cbs(&rdp->nocb_bypass);

 	WARN_ON_ONCE(!rcu_rdp_is_offloaded(rdp));
 	rcu_lockdep_assert_cblist_protected(rdp);
 	lockdep_assert_held(&rdp->nocb_bypass_lock);
-	if (rhp && !rcu_cblist_n_cbs(&rdp->nocb_bypass)) {
+	if (rhp && !len) {
 		raw_spin_unlock(&rdp->nocb_bypass_lock);
 		return false;
 	}
@@ -344,22 +334,15 @@ static bool rcu_nocb_do_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp_
 	if (rhp)
 		rcu_segcblist_inc_len(&rdp->cblist); /* Must precede enqueue. */

-	/*
-	 * If the new CB requested was a lazy one, queue it onto the main
-	 * ->cblist so that we can take advantage of the grace-period that will
-	 * happen regardless. But queue it onto the bypass list first so that
-	 * the lazy CB is ordered with the existing CBs in the bypass list.
-	 */
-	if (lazy && rhp) {
-		rcu_cblist_enqueue(&rdp->nocb_bypass, rhp);
-		rhp = NULL;
-	}
 	rcu_cblist_flush_enqueue(&rcl, &rdp->nocb_bypass, rhp);
-	WRITE_ONCE(rdp->lazy_len, 0);

 	rcu_segcblist_insert_pend_cbs(&rdp->cblist, &rcl);
 	WRITE_ONCE(rdp->nocb_bypass_first, j);
 	rcu_nocb_bypass_unlock(rdp);
+
+	if (len)
+		rcu_segcblist_clear_flags(&rdp->cblist, SEGCBLIST_NEXT_TAIL_LAZY);
+
 	return true;
 }
@@ -372,13 +355,13 @@
 * Note that this function always returns true if rhp is NULL.
 */
 static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
-				  unsigned long j, bool lazy)
+				  unsigned long j)
 {
 	if (!rcu_rdp_is_offloaded(rdp))
 		return true;
 	rcu_lockdep_assert_cblist_protected(rdp);
 	rcu_nocb_bypass_lock(rdp);
-	return rcu_nocb_do_flush_bypass(rdp, rhp, j, lazy);
+	return rcu_nocb_do_flush_bypass(rdp, rhp, j);
 }

 /*
@@ -391,7 +374,7 @@ static void rcu_nocb_try_flush_bypass(struct rcu_data *rdp, unsigned long j)
 	if (!rcu_rdp_is_offloaded(rdp) ||
 	    !rcu_nocb_bypass_trylock(rdp))
 		return;
-	WARN_ON_ONCE(!rcu_nocb_do_flush_bypass(rdp, NULL, j, false));
+	WARN_ON_ONCE(!rcu_nocb_do_flush_bypass(rdp, NULL, j));
 }

 /*
@@ -413,14 +396,12 @@
 * there is only one CPU in operation.
 */
 static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
-				bool *was_pending, unsigned long flags,
-				bool lazy)
+				bool *was_pending, unsigned long flags, bool lazy)
 {
 	unsigned long c;
 	unsigned long cur_gp_seq;
 	unsigned long j = jiffies;
 	long ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
-	bool bypass_is_lazy = (ncbs == READ_ONCE(rdp->lazy_len));

 	lockdep_assert_irqs_disabled();
@@ -435,7 +416,7 @@
 	// locking.
 	if (!rcu_segcblist_completely_offloaded(&rdp->cblist)) {
 		rcu_nocb_lock(rdp);
-		*was_pending = rcu_segcblist_pend_cbs(&rdp->cblist);
+		*was_pending = rcu_segcblist_pend_cbs_nolazy(&rdp->cblist);
 		return false; /* Not offloaded, no bypassing. */
 	}
@@ -443,7 +424,7 @@
 	if (rcu_scheduler_active != RCU_SCHEDULER_RUNNING) {
 		rcu_nocb_lock(rdp);
 		WARN_ON_ONCE(rcu_cblist_n_cbs(&rdp->nocb_bypass));
-		*was_pending = rcu_segcblist_pend_cbs(&rdp->cblist);
+		*was_pending = rcu_segcblist_pend_cbs_nolazy(&rdp->cblist);
 		return false;
 	}
@@ -460,33 +441,34 @@
 		else if (c > nocb_nobypass_lim_per_jiffy)
 			c = nocb_nobypass_lim_per_jiffy;
 	}
-	WRITE_ONCE(rdp->nocb_nobypass_count, c);

 	// If there hasn't yet been all that many ->cblist enqueues
 	// this jiffy, tell the caller to enqueue onto ->cblist.  But flush
 	// ->nocb_bypass first.
-	// Lazy CBs throttle this back and do immediate bypass queuing.
-	if (rdp->nocb_nobypass_count < nocb_nobypass_lim_per_jiffy && !lazy) {
+	if (c < nocb_nobypass_lim_per_jiffy) {
 		rcu_nocb_lock(rdp);
+		if (!rcu_segcblist_next_is_lazy(&rdp->cblist) || !lazy)
+			WRITE_ONCE(rdp->nocb_nobypass_count, c);
-		*was_pending = rcu_segcblist_pend_cbs(&rdp->cblist);
+		*was_pending = rcu_segcblist_pend_cbs_nolazy(&rdp->cblist);
 		if (!*was_pending)
 			trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
 					    TPS("FirstQ"));
-		WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, j, false));
+		WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, j));
 		WARN_ON_ONCE(rcu_cblist_n_cbs(&rdp->nocb_bypass));
 		return false; // Caller must enqueue the callback.
 	}

+	WRITE_ONCE(rdp->nocb_nobypass_count, c);
+
 	// If ->nocb_bypass has been used too long or is too full,
 	// flush ->nocb_bypass to ->cblist.
 	if (ncbs &&
-	    ((!bypass_is_lazy && ((j != READ_ONCE(rdp->nocb_bypass_first)) || ncbs >= qhimark)) ||
-	     (bypass_is_lazy && (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + jiffies_lazy_flush) || ncbs >= qhimark_lazy)))) {
+	    ((j != READ_ONCE(rdp->nocb_bypass_first)) || ncbs >= qhimark)) {
 		rcu_nocb_lock(rdp);
-		*was_pending = rcu_segcblist_pend_cbs(&rdp->cblist);
+		*was_pending = rcu_segcblist_pend_cbs_nolazy(&rdp->cblist);;

-		if (!rcu_nocb_flush_bypass(rdp, rhp, j, lazy)) {
+		if (!rcu_nocb_flush_bypass(rdp, rhp, j)) {
 			if (!*was_pending)
 				trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
 						    TPS("FirstQ"));
@@ -494,7 +476,7 @@
 			return false; // Caller must enqueue the callback.
 		}
 		if (j != rdp->nocb_gp_adv_time &&
-		    rcu_segcblist_nextgp(&rdp->cblist, &cur_gp_seq) &&
+		    rcu_segcblist_nextgp_nolazy(&rdp->cblist, &cur_gp_seq) &&
 		    rcu_seq_done(&rdp->mynode->gp_seq, cur_gp_seq)) {
 			rcu_advance_cbs_nowake(rdp->mynode, rdp);
 			rdp->nocb_gp_adv_time = j;
@@ -515,9 +497,6 @@
 	rcu_segcblist_inc_len(&rdp->cblist); /* Must precede enqueue. */
 	rcu_cblist_enqueue(&rdp->nocb_bypass, rhp);

-	if (lazy)
-		WRITE_ONCE(rdp->lazy_len, rdp->lazy_len + 1);
-
 	if (!ncbs) {
 		WRITE_ONCE(rdp->nocb_bypass_first, j);
 		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("FirstBQ"));
@@ -525,18 +504,14 @@
 	rcu_nocb_bypass_unlock(rdp);
 	smp_mb(); /* Order enqueue before wake. */
 	// A wake up of the grace period kthread or timer adjustment
-	// needs to be done only if:
-	// 1. Bypass list was fully empty before (this is the first
-	//    bypass list entry), or:
-	// 2. Both of these conditions are met:
-	//    a. The bypass list previously had only lazy CBs, and:
-	//    b. The new CB is non-lazy.
-	if (ncbs && (!bypass_is_lazy || lazy)) {
+	// needs to be done only if bypass list was fully empty before
+	// (this is the first bypass list entry).
+	if (ncbs) {
 		local_irq_restore(flags);
 	} else {
 		// No-CBs GP kthread might be indefinitely asleep, if so, wake.
 		rcu_nocb_lock(rdp); // Rare during call_rcu() flood.
-		if (!rcu_segcblist_pend_cbs(&rdp->cblist)) {
+		if (!rcu_segcblist_pend_cbs_nolazy(&rdp->cblist)) {
 			trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
 					    TPS("FirstBQwake"));
 			__call_rcu_nocb_wake(rdp, false, flags);
@@ -559,10 +534,8 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_pending,
 				 unsigned long flags)
 				 __releases(rdp->nocb_lock)
 {
-	long bypass_len;
 	unsigned long cur_gp_seq;
 	unsigned long j;
-	long lazy_len;
 	long len;
 	struct task_struct *t;
@@ -576,12 +549,11 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_pending,
 	}
 	// Need to actually to a wakeup.
 	len = rcu_segcblist_n_cbs(&rdp->cblist);
-	bypass_len = rcu_cblist_n_cbs(&rdp->nocb_bypass);
-	lazy_len = READ_ONCE(rdp->lazy_len);
 	if (!was_pending) {
 		rdp->qlen_last_fqs_check = len;
-		// Only lazy CBs in bypass list
-		if (lazy_len && bypass_len == lazy_len) {
+		// Only lazy CBs in queue
+		if (rcu_segcblist_n_cbs_lazy(&rdp->cblist) &&
+		    !rcu_cblist_n_cbs(&rdp->nocb_bypass)) {
 			rcu_nocb_unlock_irqrestore(rdp, flags);
 			wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE_LAZY,
 					   TPS("WakeLazy"));
@@ -601,7 +573,7 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_pending,
 		rdp->qlen_last_fqs_check = len;
 		j = jiffies;
 		if (j != rdp->nocb_gp_adv_time &&
-		    rcu_segcblist_nextgp(&rdp->cblist, &cur_gp_seq) &&
+		    rcu_segcblist_nextgp_nolazy(&rdp->cblist, &cur_gp_seq) &&
 		    rcu_seq_done(&rdp->mynode->gp_seq, cur_gp_seq)) {
 			rcu_advance_cbs_nowake(rdp->mynode, rdp);
 			rdp->nocb_gp_adv_time = j;
@@ -712,42 +684,35 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 	 */
 	list_for_each_entry(rdp, &my_rdp->nocb_head_rdp, nocb_entry_rdp) {
 		long bypass_ncbs;
-		bool flush_bypass = false;
 		long lazy_ncbs;

 		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Check"));
 		rcu_nocb_lock_irqsave(rdp, flags);
 		lockdep_assert_held(&rdp->nocb_lock);
 		bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
-		lazy_ncbs = READ_ONCE(rdp->lazy_len);
+		lazy_ncbs = rcu_segcblist_n_cbs_lazy(&rdp->cblist);

-		if (bypass_ncbs && (lazy_ncbs == bypass_ncbs) &&
-		    (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + jiffies_lazy_flush) ||
-		     bypass_ncbs > 2 * qhimark_lazy)) {
-			flush_bypass = true;
-		} else if (bypass_ncbs && (lazy_ncbs != bypass_ncbs) &&
+		if (lazy_ncbs &&
+		    (time_after(j, READ_ONCE(rdp->lazy_firstq) + jiffies_lazy_flush) ||
+		     lazy_ncbs > 2 * qhimark_lazy)) {
+			rcu_segcblist_clear_flags(&rdp->cblist, SEGCBLIST_NEXT_TAIL_LAZY);
+		}
+
+		if (bypass_ncbs &&
 		    (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + 1) ||
 		     bypass_ncbs > 2 * qhimark)) {
-			flush_bypass = true;
-		} else if (!bypass_ncbs && rcu_segcblist_empty(&rdp->cblist)) {
-			rcu_nocb_unlock_irqrestore(rdp, flags);
-			continue; /* No callbacks here, try next. */
-		}
-
-		if (flush_bypass) {
 			// Bypass full or old, so flush it.
 			(void)rcu_nocb_try_flush_bypass(rdp, j);
 			bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
-			lazy_ncbs = READ_ONCE(rdp->lazy_len);
+		} else if (!bypass_ncbs && rcu_segcblist_empty(&rdp->cblist)) {
+			rcu_nocb_unlock_irqrestore(rdp, flags);
+			continue; /* No callbacks here, try next. */
 		}

 		if (bypass_ncbs) {
 			trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
 					    bypass_ncbs == lazy_ncbs ? TPS("Lazy") : TPS("Bypass"));
-			if (bypass_ncbs == lazy_ncbs)
-				lazy = true;
-			else
-				bypass = true;
+			bypass = true;
 		}
 		rnp = rdp->mynode;
@@ -755,7 +720,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 		needwake_gp = false;
 		if (!rcu_segcblist_restempty(&rdp->cblist,
 					     RCU_NEXT_READY_TAIL) ||
-		    (rcu_segcblist_nextgp(&rdp->cblist, &cur_gp_seq) &&
+		    (rcu_segcblist_nextgp_nolazy(&rdp->cblist, &cur_gp_seq) &&
 		     rcu_seq_done(&rnp->gp_seq, cur_gp_seq))) {
 			raw_spin_lock_rcu_node(rnp); /* irqs disabled. */
 			needwake_gp = rcu_advance_cbs(rnp, rdp);
@@ -767,7 +732,14 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 		WARN_ON_ONCE(wasempty &&
 			     !rcu_segcblist_restempty(&rdp->cblist,
 						      RCU_NEXT_READY_TAIL));
-		if (rcu_segcblist_nextgp(&rdp->cblist, &cur_gp_seq)) {
+		/*
+		 * Lazy callbacks haven't expired and haven't been piggybacked within
+		 * the last acceleration.
+		 */
+		if (rcu_segcblist_n_cbs_lazy(&rdp->cblist))
+			lazy = true;
+
+		if (rcu_segcblist_nextgp_nolazy(&rdp->cblist, &cur_gp_seq)) {
 			if (!needwait_gp ||
 			    ULONG_CMP_LT(cur_gp_seq, wait_gp_seq))
 				wait_gp_seq = cur_gp_seq;
@@ -954,7 +926,7 @@ static void nocb_cb_wait(struct rcu_data *rdp)
 	local_bh_enable();
 	lockdep_assert_irqs_enabled();
 	rcu_nocb_lock_irqsave(rdp, flags);
-	if (rcu_segcblist_nextgp(cblist, &cur_gp_seq) &&
+	if (rcu_segcblist_nextgp_nolazy(cblist, &cur_gp_seq) &&
 	    rcu_seq_done(&rnp->gp_seq, cur_gp_seq) &&
 	    raw_spin_trylock_rcu_node(rnp)) { /* irqs already disabled. */
 		needwake_gp = rcu_advance_cbs(rdp->mynode, rdp);
@@ -1134,7 +1106,7 @@ static long rcu_nocb_rdp_deoffload(void *arg)
 	 * return false, which means that future calls to rcu_nocb_try_bypass()
 	 * will refuse to put anything into the bypass.
 	 */
-	WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies, false));
+	WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies));
 	/*
 	 * Start with invoking rcu_core() early. This way if the current thread
 	 * happens to preempt an ongoing call to rcu_core() in the middle,
 	 */
 	rcu_segcblist_set_flags(cblist, SEGCBLIST_RCU_CORE);
 	invoke_rcu_core();
+	/* Deoffloaded doesn't support lazyness yet */
+	rcu_segcblist_clear_flags(&rdp->cblist, SEGCBLIST_NEXT_TAIL_LAZY);
+
 	wake_gp = rdp_offload_toggle(rdp, false, flags);

 	mutex_lock(&rdp_gp->nocb_gp_kthread_mutex);
@@ -1329,7 +1304,7 @@ lazy_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
 	for_each_cpu(cpu, rcu_nocb_mask) {
 		struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);

-		count +=  READ_ONCE(rdp->lazy_len);
+		count += rcu_segcblist_n_cbs_lazy(&rdp->cblist);
 	}

 	mutex_unlock(&rcu_state.barrier_mutex);
@@ -1368,7 +1343,7 @@ lazy_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 		if (WARN_ON_ONCE(!rcu_rdp_is_offloaded(rdp)))
 			continue;

-		if (!READ_ONCE(rdp->lazy_len))
+		if (!rcu_segcblist_n_cbs_lazy(&rdp->cblist))
 			continue;

 		rcu_nocb_lock_irqsave(rdp, flags);
@@ -1377,12 +1352,12 @@ lazy_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
		 * lock we may still race with increments from the enqueuer but still
		 * we know for sure if there is at least one lazy callback.
*/ - _count = READ_ONCE(rdp->lazy_len); + _count = rcu_segcblist_n_cbs_lazy(&rdp->cblist); if (!_count) { rcu_nocb_unlock_irqrestore(rdp, flags); continue; } - WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies, false)); + rcu_segcblist_clear_flags(&rdp->cblist, SEGCBLIST_NEXT_TAIL_LAZY); rcu_nocb_unlock_irqrestore(rdp, flags); wake_nocb_gp(rdp, false); sc->nr_to_scan -= _count; @@ -1474,7 +1449,6 @@ static void __init rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp) raw_spin_lock_init(&rdp->nocb_gp_lock); timer_setup(&rdp->nocb_timer, do_nocb_deferred_wakeup_timer, 0); rcu_cblist_init(&rdp->nocb_bypass); - WRITE_ONCE(rdp->lazy_len, 0); mutex_init(&rdp->nocb_gp_kthread_mutex); } @@ -1761,7 +1735,7 @@ static bool wake_nocb_gp(struct rcu_data *rdp, bool force) } static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp, - unsigned long j, bool lazy) + unsigned long j) { return true; } From patchwork Wed May 31 10:17:35 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Frederic Weisbecker X-Patchwork-Id: 13261919 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6F197C77B7A for ; Wed, 31 May 2023 10:18:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235950AbjEaKSn (ORCPT ); Wed, 31 May 2023 06:18:43 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36734 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235910AbjEaKSV (ORCPT ); Wed, 31 May 2023 06:18:21 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5D5E0E58; Wed, 31 May 2023 03:18:10 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id BC8616395F; Wed, 31 May 2023 10:18:09 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 02BADC433EF; Wed, 31 May 2023 10:18:06 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1685528289; bh=fAqo46X9Mw2V2BkQzIfih4SUfDwqn3L8IoBC3xT459U=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=ePLMv+jMZFPi3Hit+6vSJE2+ji1QEtu/1Aed674mJKNLlzome+1Jg43GPLnjJ2muu gSce7Q0YAsBablSP4vC2TV5+BD03NtbB8u5xHW++Ld80Q0yhYOnIw0zUqRIShaJkf9 d7e5vOF4kABaMNpamGZ8EPOqUJkWg2Fvb2TLHizOgLe7IMQEiBK27isqZDXZF2vjky 07zI8sNXooZAfWZb6TfrV4UVhOHGdcexmRnZPXv5530QkkzQFUcM939ZzZ+O8HV55g zRQttLupK1gTS2yKpzILZClL120vGPG+9wzLK4/rWUTpE7SGCIXwEzIqjj/3N/NlEv A5UthPlU6UryQ== From: Frederic Weisbecker To: "Paul E . McKenney" Cc: LKML , Frederic Weisbecker , rcu , Uladzislau Rezki , Neeraj Upadhyay , Joel Fernandes , Giovanni Gherdovich Subject: [PATCH 8/9] rcu: Make segcblist flags test strict Date: Wed, 31 May 2023 12:17:35 +0200 Message-Id: <20230531101736.12981-9-frederic@kernel.org> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20230531101736.12981-1-frederic@kernel.org> References: <20230531101736.12981-1-frederic@kernel.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: rcu@vger.kernel.org While testing several flags at once, make sure that all of them verify the test. 
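As a minimal illustration (the two helpers below are made up for this note and are not part of the patch), the change turns an "any bit of the mask is set" test into an "every bit of the mask is set" test:

static inline bool segcblist_test_any(unsigned long flags, unsigned long mask)
{
	return flags & mask;		/* old behaviour: true if at least one bit of mask is set */
}

static inline bool segcblist_test_all(unsigned long flags, unsigned long mask)
{
	return (flags & mask) == mask;	/* new behaviour: true only if every bit of mask is set */
}

/*
 * Example with the flags used by this series: if only SEGCBLIST_LOCKING is set,
 * segcblist_test_any(flags, SEGCBLIST_LOCKING | SEGCBLIST_RCU_CORE) is true while
 * segcblist_test_all(flags, SEGCBLIST_LOCKING | SEGCBLIST_RCU_CORE) is false, which
 * is the distinction rcu_segcblist_nocb_transitioning() relies on in the next patch.
 */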
This will be necessary to check if an rdp is (de-)offloading. Signed-off-by: Frederic Weisbecker --- kernel/rcu/rcu_segcblist.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h index 620ca48e782b..36245efdf800 100644 --- a/kernel/rcu/rcu_segcblist.h +++ b/kernel/rcu/rcu_segcblist.h @@ -70,7 +70,7 @@ static inline void rcu_segcblist_clear_flags(struct rcu_segcblist *rsclp, static inline bool rcu_segcblist_test_flags(struct rcu_segcblist *rsclp, int flags) { - return READ_ONCE(rsclp->flags) & flags; + return (READ_ONCE(rsclp->flags) & flags) == flags; } /* From patchwork Wed May 31 10:17:36 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Frederic Weisbecker X-Patchwork-Id: 13261921 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 83F8BC77B7C for ; Wed, 31 May 2023 10:19:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235917AbjEaKTR (ORCPT ); Wed, 31 May 2023 06:19:17 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36506 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235892AbjEaKSd (ORCPT ); Wed, 31 May 2023 06:18:33 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 23FFEE60; Wed, 31 May 2023 03:18:13 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 77C8E62F14; Wed, 31 May 2023 10:18:12 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id B2B01C4339E; Wed, 31 May 2023 10:18:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1685528291; bh=UJ984DEuGUy+c9Gxy5RefY2CK3sksB4978t6B/2fQmI=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=SuWIKGN/RGQe1Q6HP6LLDw1InPXrKGDuH1xs/Tc7ed4n8wUT8q36JHaCQbS7ktbXn es99kCwnSbmEia1ntLac22hUD2P4WG7dK1/PyQ6nbhxKggM7OWxkrjXmmjbf5Al/cj 9hzIXMNMsxtYUgNLPMNEIGQompdm2Q/vbHBGS8B1BCevTtV+dYq/BU31YhsSL/FGXK 6bWOvZorwzc8h6NRyvgUqXFkT9iuN3NVWWXr/wDGPjdHmXz6jC95ViGmxhUjejZvwV Oi840c8TYaAMmt7nh8rpDpuX85cYDUumQfqYEvDNe5Y8IBMigjEGM6JKhPhiYCrHiC TJPjs3K19Ee/Q== From: Frederic Weisbecker To: "Paul E . McKenney" Cc: LKML , Frederic Weisbecker , rcu , Uladzislau Rezki , Neeraj Upadhyay , Joel Fernandes , Giovanni Gherdovich Subject: [PATCH 9/9] rcu: Support lazy callbacks with CONFIG_RCU_NOCB_CPU=n Date: Wed, 31 May 2023 12:17:36 +0200 Message-Id: <20230531101736.12981-10-frederic@kernel.org> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20230531101736.12981-1-frederic@kernel.org> References: <20230531101736.12981-1-frederic@kernel.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: rcu@vger.kernel.org Support the lazy queue with CONFIG_RCU_NOCB_CPU=n or CONFIG_RCU_NOCB=y with non-offloaded rdp. This reuses most of the lazy infrastructure. The major difference is the addition of a dedicated per-CPU timer which runs as long as the queue is lazy to make sure that lazy callbacks eventually expire. It's worth noting that the timer is not cancelled when the lazy queue is accelerated (reset) for performance reasons. 
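As a rough sketch of that expiry policy (condensed from the rcu_lazy_timer() handler added by this patch; the offloaded and empty-queue checks as well as the IRQ protection are omitted here), the handler de-lazifies the queue only once the oldest lazy callback has waited long enough, and otherwise re-arms itself for the remaining delay:

/* Simplified sketch only; see rcu_lazy_timer() in the diff below for the real handler. */
static void lazy_timer_sketch(struct timer_list *timer)
{
	struct rcu_data *rdp = container_of(timer, struct rcu_data, lazy_timer);
	unsigned long delta = jiffies - rdp->lazy_firstq;

	if (delta >= LAZY_FLUSH_JIFFIES)
		/* The oldest lazy callback waited long enough: stop treating the queue as lazy. */
		rcu_segcblist_clear_flags(&rdp->cblist, SEGCBLIST_NEXT_TAIL_LAZY);
	else
		/* The first lazy callback is more recent than the armed delay: wait the remainder. */
		mod_timer(timer, jiffies + (LAZY_FLUSH_JIFFIES - delta));
}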
It may therefore run spuriously though the delay is long enough (10 secs) for it to go mostly unnoticed. Nohz_full CPUs shouldn't suffer from that since they rely on the NOCB implementation. Some interesting numbers have been observed on a mostly idle system. The test runs 100 times "sleep 10" on an 8 CPU machine and computes an average of the idle time spent on all CPUs per C-state before and after this patch. The following displays the improvement: Before the patch: POLL: 0.000006 C1: 0.001064 C1E: 0.000777 C3: 0.000457 C6: 2.711224 C7s: 47.484802 Total: 50.198330 After the patch: POLL: 0.000011 C1: 0.001088 C1E: 0.000874 C3: 0.000545 C6: 3.234707 C7s: 53.101949 Total: 56.339175 Diff: POLL: +0.000005 (+43.73%) C1: +0.000024 (+2.25%) C1E: +0.000097 (+11.11%) C3: +0.000088 (+16.16%) C6: +0.523482 (+16.18%) C7s: +5.617148 (+10.58%) Total +6.140844 (+10.90%) It's worth noting that the above may depend on the idle load (here an idle ssh connection is probably the source of some periodic lazy callbacks queued that get batched, hence the improvement). And more important further testing is mandatory to ensure that this doesn't introduce a performance regression on busy loads. Signed-off-by: Frederic Weisbecker --- kernel/rcu/Kconfig | 2 +- kernel/rcu/rcu_segcblist.h | 9 +++ kernel/rcu/tree.c | 153 +++++++++++++++++++++++++++++++++++-- kernel/rcu/tree.h | 1 + kernel/rcu/tree_nocb.h | 110 +++----------------------- 5 files changed, 167 insertions(+), 108 deletions(-) diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig index bdd7eadb33d8..b5e45c3a77e5 100644 --- a/kernel/rcu/Kconfig +++ b/kernel/rcu/Kconfig @@ -308,7 +308,7 @@ config TASKS_TRACE_RCU_READ_MB config RCU_LAZY bool "RCU callback lazy invocation functionality" - depends on RCU_NOCB_CPU + depends on TREE_RCU default n help To save power, batch RCU callbacks and flush after delay, memory diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h index 36245efdf800..0033da9a42fa 100644 --- a/kernel/rcu/rcu_segcblist.h +++ b/kernel/rcu/rcu_segcblist.h @@ -104,6 +104,15 @@ static inline bool rcu_segcblist_completely_offloaded(struct rcu_segcblist *rscl return false; } +static inline bool rcu_segcblist_nocb_transitioning(struct rcu_segcblist *rsclp) +{ + if (IS_ENABLED(CONFIG_RCU_NOCB_CPU) && + rcu_segcblist_test_flags(rsclp, SEGCBLIST_LOCKING | SEGCBLIST_RCU_CORE)) + return true; + + return false; +} + static inline bool rcu_segcblist_next_is_lazy(struct rcu_segcblist *rsclp) { if (IS_ENABLED(CONFIG_RCU_LAZY) && diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index e48ccbe0f2f6..467a9cda7e71 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -669,8 +669,19 @@ NOKPROBE_SYMBOL(__rcu_irq_enter_check_tick); */ int rcu_needs_cpu(void) { - return !rcu_segcblist_empty(&this_cpu_ptr(&rcu_data)->cblist) && - !rcu_rdp_is_offloaded(this_cpu_ptr(&rcu_data)); + struct rcu_data *rdp = this_cpu_ptr(&rcu_data); + + if (rcu_segcblist_empty(&rdp->cblist)) + return false; + + if (rcu_rdp_is_offloaded(rdp)) + return false; + + if (IS_ENABLED(CONFIG_RCU_LAZY) && + rcu_segcblist_n_cbs_lazy(&rdp->cblist) == rcu_segcblist_n_cbs(&rdp->cblist)) + return false; + + return true; } /* @@ -1086,7 +1097,7 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp) gp_seq_req = rcu_seq_snap(&rcu_state.gp_seq); if (rcu_segcblist_accelerate(&rdp->cblist, gp_seq_req)) { /* The RCU_NEXT_TAIL has been flushed, reset the lazy bit accordingly */ - if (IS_ENABLED(CONFIG_RCU_LAZY) && qhimark_lazy && 
rcu_segcblist_completely_offloaded(&rdp->cblist)) + if (IS_ENABLED(CONFIG_RCU_LAZY) && qhimark_lazy && !rcu_segcblist_nocb_transitioning(&rdp->cblist)) rcu_segcblist_set_flags(&rdp->cblist, SEGCBLIST_NEXT_TAIL_LAZY); ret = rcu_start_this_gp(rnp, rdp, gp_seq_req); } @@ -1121,7 +1132,7 @@ static void rcu_accelerate_cbs_unlocked(struct rcu_node *rnp, /* Old request still live, so mark recent callbacks. */ if (rcu_segcblist_accelerate(&rdp->cblist, c)) { /* The RCU_NEXT_TAIL has been flushed, reset the lazy bit accordingly */ - if (IS_ENABLED(CONFIG_RCU_LAZY) && qhimark_lazy && rcu_segcblist_completely_offloaded(&rdp->cblist)) + if (IS_ENABLED(CONFIG_RCU_LAZY) && qhimark_lazy && !rcu_segcblist_nocb_transitioning(&rdp->cblist)) rcu_segcblist_set_flags(&rdp->cblist, SEGCBLIST_NEXT_TAIL_LAZY); } return; @@ -2556,6 +2567,14 @@ static int __init rcu_spawn_core_kthreads(void) static void __call_rcu_core(struct rcu_data *rdp, struct rcu_head *head, unsigned long flags) { +#ifdef CONFIG_RCU_LAZY + if (rcu_segcblist_n_cbs_lazy(&rdp->cblist) == 1) { + if (!timer_pending(&rdp->lazy_timer)) { + rdp->lazy_timer.expires = jiffies + jiffies_lazy_flush; + add_timer_on(&rdp->lazy_timer, smp_processor_id()); + } + } +#endif /* * If called from an extended quiescent state, invoke the RCU * core in order to force a re-evaluation of RCU's idleness. @@ -2577,6 +2596,7 @@ static void __call_rcu_core(struct rcu_data *rdp, struct rcu_head *head, if (unlikely(rcu_segcblist_n_cbs(&rdp->cblist) > rdp->qlen_last_fqs_check + qhimark)) { + rcu_segcblist_clear_flags(&rdp->cblist, SEGCBLIST_NEXT_TAIL_LAZY); /* Are we ignoring a completed grace period? */ note_gp_changes(rdp); @@ -2644,6 +2664,110 @@ static void check_cb_ovld(struct rcu_data *rdp) raw_spin_unlock_rcu_node(rnp); } +#ifdef CONFIG_RCU_LAZY +static unsigned long +lazy_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc) +{ + int cpu; + unsigned long count = 0; + + /* Snapshot count of all CPUs */ + for_each_possible_cpu(cpu) { + struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu); + + count += rcu_segcblist_n_cbs_lazy(&rdp->cblist); + } + + return count ? count : SHRINK_EMPTY; +} + +static unsigned long +lazy_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc) +{ + int cpu; + unsigned long flags; + unsigned long count = 0; + + /* Protect against concurrent (de-)offloading. */ + if (!mutex_trylock(&rcu_state.barrier_mutex)) { + /* + * But really don't insist if barrier_mutex is contended since we + * can't guarantee that it will never engage in a dependency + * chain involving memory allocation. The lock is seldom contended + * anyway. + */ + return 0; + } + + /* Snapshot count of all CPUs */ + for_each_possible_cpu(cpu) { + struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu); + int _count; + + if (!rcu_segcblist_n_cbs_lazy(&rdp->cblist)) + continue; + + rcu_nocb_lock_irqsave(rdp, flags); + _count = rcu_segcblist_n_cbs_lazy(&rdp->cblist); + if (!_count) { + rcu_nocb_unlock_irqrestore(rdp, flags); + continue; + } + rcu_segcblist_clear_flags(&rdp->cblist, SEGCBLIST_NEXT_TAIL_LAZY); + rcu_nocb_unlock_irqrestore(rdp, flags); + + if (rcu_rdp_is_offloaded(rdp)) + wake_nocb_gp(rdp, false); + sc->nr_to_scan -= _count; + count += _count; + if (sc->nr_to_scan <= 0) + break; + } + + mutex_unlock(&rcu_state.barrier_mutex); + + return count ? 
count : SHRINK_STOP; +} + +static struct shrinker lazy_rcu_shrinker = { + .count_objects = lazy_rcu_shrink_count, + .scan_objects = lazy_rcu_shrink_scan, + .batch = 0, + .seeks = DEFAULT_SEEKS, +}; + +/* Lazy timer expiration callback for non-offloaded rdp */ +static void rcu_lazy_timer(struct timer_list *timer) +{ + unsigned long flags; + struct rcu_data *rdp = container_of(timer, struct rcu_data, lazy_timer); + unsigned long delta; + unsigned long jiff; + + WARN_ON_ONCE(rdp->cpu != smp_processor_id()); + /* + * Protect against concurrent (de-)offloading on -RT where softirqs + * are preemptible. + */ + local_irq_save(flags); + if (rcu_rdp_is_offloaded(rdp)) + goto out; + + if (!rcu_segcblist_n_cbs_lazy(&rdp->cblist)) + goto out; + + jiff = READ_ONCE(jiffies); + delta = jiff - rdp->lazy_firstq; + + if (delta >= LAZY_FLUSH_JIFFIES) + rcu_segcblist_clear_flags(&rdp->cblist, SEGCBLIST_NEXT_TAIL_LAZY); + else + mod_timer(timer, jiff + (LAZY_FLUSH_JIFFIES - delta)); +out: + local_irq_restore(flags); +} +#endif + /* * Handle lazy callbacks. Return true if no further handling is needed (unlocks nocb then). * Return false if further treatment is needed (wake rcuog kthread, set the nocb timer, etc...). @@ -2667,7 +2791,11 @@ static bool __call_rcu_lazy(struct rcu_data *rdp, bool was_pending, bool lazy, u /* First lazy callback on an empty queue, set the timer if necessary */ if (lazy_len == 1) { WRITE_ONCE(rdp->lazy_firstq, jiffies); - if (!was_pending) + /* + * nocb_gp_wait() will set the timer for us if it is already tracking + * pending callbacks. + */ + if (!rcu_rdp_is_offloaded(rdp) || !was_pending) return false; else goto out; @@ -3958,7 +4086,8 @@ static int rcu_pending(int user) /* Has RCU gone idle with this CPU needing another grace period? */ if (!gp_in_progress && rcu_segcblist_is_enabled(&rdp->cblist) && !rcu_rdp_is_offloaded(rdp) && - !rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL)) + !rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL) && + !rcu_segcblist_next_is_lazy(&rdp->cblist)) return 1; /* Have RCU grace period completed or started? 
*/ @@ -4363,6 +4492,9 @@ rcu_boot_init_percpu_data(int cpu) rdp->rcu_onl_gp_flags = RCU_GP_CLEANED; rdp->last_sched_clock = jiffies; rdp->cpu = cpu; +#ifdef CONFIG_RCU_LAZY + timer_setup(&rdp->lazy_timer, rcu_lazy_timer, TIMER_PINNED); +#endif rcu_boot_init_nocb_percpu_data(rdp); } @@ -4588,6 +4720,9 @@ void rcu_report_dead(unsigned int cpu) WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext & ~mask); raw_spin_unlock_irqrestore_rcu_node(rnp, flags); arch_spin_unlock(&rcu_state.ofl_lock); +#ifdef CONFIG_RCU_LAZY + del_timer(&rdp->lazy_timer); +#endif rdp->cpu_started = false; } @@ -5098,6 +5233,12 @@ void __init rcu_init(void) (void)start_poll_synchronize_rcu_expedited(); rcu_test_sync_prims(); + +#ifdef CONFIG_RCU_LAZY + if (register_shrinker(&lazy_rcu_shrinker, "rcu-lazy")) + pr_err("Failed to register lazy_rcu shrinker!\n"); +#endif // #ifdef CONFIG_RCU_LAZY + } #include "tree_stall.h" diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h index 90b39ff8ad70..e21698260fac 100644 --- a/kernel/rcu/tree.h +++ b/kernel/rcu/tree.h @@ -282,6 +282,7 @@ struct rcu_data { struct rcu_snap_record snap_record; /* Snapshot of core stats at half of */ /* the first RCU stall timeout */ unsigned long lazy_firstq; + struct timer_list lazy_timer; int cpu; }; diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h index fbd54a2e1f17..327a480d606c 100644 --- a/kernel/rcu/tree_nocb.h +++ b/kernel/rcu/tree_nocb.h @@ -1055,6 +1055,9 @@ static int rdp_offload_toggle(struct rcu_data *rdp, struct rcu_data *rdp_gp = rdp->nocb_gp_rdp; bool wake_gp = false; + /* Unlazy pending callbacks (don't bother arming the right lazy timer) */ + rcu_segcblist_clear_flags(&rdp->cblist, SEGCBLIST_NEXT_TAIL_LAZY); + rcu_segcblist_offload(cblist, offload); if (rdp->nocb_cb_sleep) @@ -1116,9 +1119,6 @@ static long rcu_nocb_rdp_deoffload(void *arg) */ rcu_segcblist_set_flags(cblist, SEGCBLIST_RCU_CORE); invoke_rcu_core(); - /* Deoffloaded doesn't support lazyness yet */ - rcu_segcblist_clear_flags(&rdp->cblist, SEGCBLIST_NEXT_TAIL_LAZY); - wake_gp = rdp_offload_toggle(rdp, false, flags); mutex_lock(&rdp_gp->nocb_gp_kthread_mutex); @@ -1259,6 +1259,12 @@ static long rcu_nocb_rdp_offload(void *arg) rcu_segcblist_clear_flags(cblist, SEGCBLIST_RCU_CORE); rcu_nocb_unlock_irqrestore(rdp, flags); + /* + * The lazy timer is protected against concurrent (de-)offloading. + * Still, no need to keep it around. + */ + del_timer(&rdp->lazy_timer); + return 0; } @@ -1286,99 +1292,6 @@ int rcu_nocb_cpu_offload(int cpu) } EXPORT_SYMBOL_GPL(rcu_nocb_cpu_offload); -#ifdef CONFIG_RCU_LAZY -static unsigned long -lazy_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc) -{ - int cpu; - unsigned long count = 0; - - if (WARN_ON_ONCE(!cpumask_available(rcu_nocb_mask))) - return 0; - - /* Protect rcu_nocb_mask against concurrent (de-)offloading. */ - if (!mutex_trylock(&rcu_state.barrier_mutex)) - return 0; - - /* Snapshot count of all CPUs */ - for_each_cpu(cpu, rcu_nocb_mask) { - struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu); - - count += rcu_segcblist_n_cbs_lazy(&rdp->cblist); - } - - mutex_unlock(&rcu_state.barrier_mutex); - - return count ? count : SHRINK_EMPTY; -} - -static unsigned long -lazy_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc) -{ - int cpu; - unsigned long flags; - unsigned long count = 0; - - if (WARN_ON_ONCE(!cpumask_available(rcu_nocb_mask))) - return 0; - /* - * Protect against concurrent (de-)offloading. Otherwise nocb locking - * may be ignored or imbalanced. 
- */ - if (!mutex_trylock(&rcu_state.barrier_mutex)) { - /* - * But really don't insist if barrier_mutex is contended since we - * can't guarantee that it will never engage in a dependency - * chain involving memory allocation. The lock is seldom contended - * anyway. - */ - return 0; - } - - /* Snapshot count of all CPUs */ - for_each_cpu(cpu, rcu_nocb_mask) { - struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu); - int _count; - - if (WARN_ON_ONCE(!rcu_rdp_is_offloaded(rdp))) - continue; - - if (!rcu_segcblist_n_cbs_lazy(&rdp->cblist)) - continue; - - rcu_nocb_lock_irqsave(rdp, flags); - /* - * Recheck under the nocb lock. Since we are not holding the bypass - * lock we may still race with increments from the enqueuer but still - * we know for sure if there is at least one lazy callback. - */ - _count = rcu_segcblist_n_cbs_lazy(&rdp->cblist); - if (!_count) { - rcu_nocb_unlock_irqrestore(rdp, flags); - continue; - } - rcu_segcblist_clear_flags(&rdp->cblist, SEGCBLIST_NEXT_TAIL_LAZY); - rcu_nocb_unlock_irqrestore(rdp, flags); - wake_nocb_gp(rdp, false); - sc->nr_to_scan -= _count; - count += _count; - if (sc->nr_to_scan <= 0) - break; - } - - mutex_unlock(&rcu_state.barrier_mutex); - - return count ? count : SHRINK_STOP; -} - -static struct shrinker lazy_rcu_shrinker = { - .count_objects = lazy_rcu_shrink_count, - .scan_objects = lazy_rcu_shrink_scan, - .batch = 0, - .seeks = DEFAULT_SEEKS, -}; -#endif // #ifdef CONFIG_RCU_LAZY - void __init rcu_init_nohz(void) { int cpu; @@ -1409,11 +1322,6 @@ void __init rcu_init_nohz(void) if (!rcu_state.nocb_is_setup) return; -#ifdef CONFIG_RCU_LAZY - if (register_shrinker(&lazy_rcu_shrinker, "rcu-lazy")) - pr_err("Failed to register lazy_rcu shrinker!\n"); -#endif // #ifdef CONFIG_RCU_LAZY - if (!cpumask_subset(rcu_nocb_mask, cpu_possible_mask)) { pr_info("\tNote: kernel parameter 'rcu_nocbs=', 'nohz_full', or 'isolcpus=' contains nonexistent CPUs.\n"); cpumask_and(rcu_nocb_mask, cpu_possible_mask,