From patchwork Thu Oct 19 12:01:57 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Frederic Weisbecker X-Patchwork-Id: 13428737 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 10261CDB465 for ; Thu, 19 Oct 2023 12:02:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1345523AbjJSMCT (ORCPT ); Thu, 19 Oct 2023 08:02:19 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59530 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1345538AbjJSMCO (ORCPT ); Thu, 19 Oct 2023 08:02:14 -0400 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id ACA85193; Thu, 19 Oct 2023 05:02:12 -0700 (PDT) Received: by smtp.kernel.org (Postfix) with ESMTPSA id E16BAC433C9; Thu, 19 Oct 2023 12:02:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1697716932; bh=3y9CeNYUlIJW5JcYaNHy3H8Dnbv10KviDi7g0Akst9A=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=pT1BJ9EYiFJuh1grJANBeWldKA8R12Y2yhHwv6Qd/kmSR2BPN3hP3jRVRAQkGrzmc su74tO3dzxd9/oVp/MpTByCpaGK3yikim86Qpl8A8gdeCfbN2DOdIcfYTJIvD4Vr/m MEUueOLZeMVBxIkMB0WNM1jx/3FU9tOrz8sW6RkEak7WdQcxtOh/EiWqwK28mWdX9v NbAMjBEL9EyPX/wFGr/b0u2229GShll0d7zpKPS8c8Tnf7HAgx+HpsdAljmArcaZ5V kvwyrBLq4nWqIPondsqPUQlJmOULpKp2eea9BeF+38S+Yh45pToDnCoIE3Gm2HwoeH c1hVUW/cwkhnA== From: Frederic Weisbecker To: LKML Cc: Zhen Lei , Boqun Feng , Joel Fernandes , Josh Triplett , Mathieu Desnoyers , Neeraj Upadhyay , "Paul E . McKenney" , Steven Rostedt , Uladzislau Rezki , rcu , Frederic Weisbecker Subject: [PATCH 1/6] rcu: Delete a redundant check in rcu_check_gp_kthread_starvation() Date: Thu, 19 Oct 2023 14:01:57 +0200 Message-Id: <20231019120202.1216228-2-frederic@kernel.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231019120202.1216228-1-frederic@kernel.org> References: <20231019120202.1216228-1-frederic@kernel.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: rcu@vger.kernel.org From: Zhen Lei The rcu_check_gp_kthread_starvation() function uses task_cpu() to sample the last CPU that the grace-period kthread ran on, and task_cpu() samples the thread_info structure's ->cpu field. But this field will always contain a number corresponding to a CPU that was online some time in the past, thus never a negative number. This invariant is checked by a WARN_ON_ONCE() in set_task_cpu(). This means that if the grace-period kthread exists, that is, if the "gpk" local variable is non-NULL, the "cpu" local variable will be non-negative. This in turn means that the existing check for non-negative "cpu" is redundant with the enclosing check for non-NULL "gpk". This commit threefore removes the redundant check of "cpu". Signed-off-by: Zhen Lei Signed-off-by: Paul E. McKenney Signed-off-by: Frederic Weisbecker --- kernel/rcu/tree_stall.h | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h index 6f06dc12904a..b5ce0580074e 100644 --- a/kernel/rcu/tree_stall.h +++ b/kernel/rcu/tree_stall.h @@ -537,13 +537,11 @@ static void rcu_check_gp_kthread_starvation(void) pr_err("\tUnless %s kthread gets sufficient CPU time, OOM is now expected behavior.\n", rcu_state.name); pr_err("RCU grace-period kthread stack dump:\n"); sched_show_task(gpk); - if (cpu >= 0) { - if (cpu_is_offline(cpu)) { - pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu); - } else { - pr_err("Stack dump where RCU GP kthread last ran:\n"); - dump_cpu_task(cpu); - } + if (cpu_is_offline(cpu)) { + pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu); + } else { + pr_err("Stack dump where RCU GP kthread last ran:\n"); + dump_cpu_task(cpu); } wake_up_process(gpk); } From patchwork Thu Oct 19 12:01:58 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Frederic Weisbecker X-Patchwork-Id: 13428738 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B8A5DCDB482 for ; Thu, 19 Oct 2023 12:02:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1345517AbjJSMCU (ORCPT ); Thu, 19 Oct 2023 08:02:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49878 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1345524AbjJSMCR (ORCPT ); Thu, 19 Oct 2023 08:02:17 -0400 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 94893139; Thu, 19 Oct 2023 05:02:15 -0700 (PDT) Received: by smtp.kernel.org (Postfix) with ESMTPSA id C12E1C433CB; Thu, 19 Oct 2023 12:02:12 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1697716935; bh=GEHGTe6rFYiWYuzzhF7E1xB6zZAaH9xdzrwx/Nmj808=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=CmRYaHD8Ss4xqJGRjGttd2Ez3eIm5v7dMgvaQckYHR1EhHiD/bxalxyYOhADE6CYF pv9sC3eH9C4dC7pZbDUByJeoSIxyUajO/iVjWz0oZvJPx2wuTxt9QewagVf1+l92YS Miu9UbJYlH124Yl0Zmsc67DYiODAjiBYCc++FvCm7PuyCQ24hbWxue9Hvjq+hhadmD 3pFGGLtNdaIoi1yPIrT6qrLznaP/e+TJbtx1COqngefXOtJ//YgBj4otkOrGUCbDFR FPVcwgiY5VBo2hI1utzZ/VozYbkaBQSUy6DQAsS/DuMw+P9ze8obPsh/1OLAVgXjg6 cug2En7U8fTGw== From: Frederic Weisbecker To: LKML Cc: Zhen Lei , Boqun Feng , Joel Fernandes , Josh Triplett , Mathieu Desnoyers , Neeraj Upadhyay , "Paul E . McKenney" , Steven Rostedt , Uladzislau Rezki , rcu , Frederic Weisbecker Subject: [PATCH 2/6] rcu: Don't redump the stalled CPU where RCU GP kthread last ran Date: Thu, 19 Oct 2023 14:01:58 +0200 Message-Id: <20231019120202.1216228-3-frederic@kernel.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231019120202.1216228-1-frederic@kernel.org> References: <20231019120202.1216228-1-frederic@kernel.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: rcu@vger.kernel.org From: Zhen Lei The stacks of all stalled CPUs will be dumped in rcu_dump_cpu_stacks(). If the CPU on where RCU GP kthread last ran is stalled, its stack does not need to be dumped again. We can search the corresponding backtrace based on the printed CPU ID. For example: [ 87.328275] rcu: rcu_sched kthread starved for ... ->cpu=3 <--------| ... ... | [ 89.385007] NMI backtrace for cpu 3 <--------| [ 89.385179] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.10.0+ #22 <--| [ 89.385188] Hardware name: linux,dummy-virt (DT) [ 89.385196] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--) [ 89.385204] pc : arch_cpu_idle+0x40/0xc0 [ 89.385211] lr : arch_cpu_idle+0x2c/0xc0 ... ... [ 89.385566] Call trace: [ 89.385574] arch_cpu_idle+0x40/0xc0 [ 89.385581] default_idle_call+0x100/0x450 [ 89.385589] cpuidle_idle_call+0x2f8/0x460 [ 89.385596] do_idle+0x1dc/0x3d0 [ 89.385604] cpu_startup_entry+0x5c/0xb0 [ 89.385613] secondary_start_kernel+0x35c/0x520 Signed-off-by: Zhen Lei Reviewed-by: Joel Fernandes (Google) Signed-off-by: Paul E. McKenney Signed-off-by: Frederic Weisbecker --- kernel/rcu/tree_stall.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h index b5ce0580074e..fc04a8d7ce96 100644 --- a/kernel/rcu/tree_stall.h +++ b/kernel/rcu/tree_stall.h @@ -534,12 +534,14 @@ static void rcu_check_gp_kthread_starvation(void) data_race(READ_ONCE(rcu_state.gp_state)), gpk ? data_race(READ_ONCE(gpk->__state)) : ~0, cpu); if (gpk) { + struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu); + pr_err("\tUnless %s kthread gets sufficient CPU time, OOM is now expected behavior.\n", rcu_state.name); pr_err("RCU grace-period kthread stack dump:\n"); sched_show_task(gpk); if (cpu_is_offline(cpu)) { pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu); - } else { + } else if (!(data_race(READ_ONCE(rdp->mynode->qsmask)) & rdp->grpmask)) { pr_err("Stack dump where RCU GP kthread last ran:\n"); dump_cpu_task(cpu); } From patchwork Thu Oct 19 12:01:59 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Frederic Weisbecker X-Patchwork-Id: 13428739 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 18CDECDB465 for ; Thu, 19 Oct 2023 12:02:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1345565AbjJSMC2 (ORCPT ); Thu, 19 Oct 2023 08:02:28 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59456 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1345522AbjJSMCU (ORCPT ); Thu, 19 Oct 2023 08:02:20 -0400 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6B28B187; Thu, 19 Oct 2023 05:02:18 -0700 (PDT) Received: by smtp.kernel.org (Postfix) with ESMTPSA id A1F72C433C8; Thu, 19 Oct 2023 12:02:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1697716938; bh=Vnj5hN2dPt9MAc5s8TCef1rcSgQpQJDKJ311AZcgTbY=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=oHiWEClJahpQOALP7cH2yadREFbTb1JxW4abwGr8xTP67tB9OFxQYrkXWIO/TEQkN /6naZ9UwnQatD4XgLhF6rWC6UmJ8xaALg0DSBMb26W8CLI8Ff1PP9yyswYP7g+GHfP PtF01MW4KsALhsRkSIz2rs9UmJ6jZIZrzmQUii1Tmf+WZ6JCd+iExLL1Xs8OY1qTcZ 520FMxdydmor7RotTaYeHx+IilBR6CWoq7U0H054JnJ2WAc9k5M/X48vXBJoCNZ5lA KV3NYPlHj0mwG0g/Z7CFWIL+lmy+Lu+2I0m2TNg+pM20pwLZcnmJSmbBHhNH+k8rRv O+wnLthrWvBxg== From: Frederic Weisbecker To: LKML Cc: Zhen Lei , Boqun Feng , Joel Fernandes , Josh Triplett , Mathieu Desnoyers , Neeraj Upadhyay , "Paul E . McKenney" , Steven Rostedt , Uladzislau Rezki , rcu , Frederic Weisbecker Subject: [PATCH 3/6] rcu: Eliminate check_cpu_stall() duplicate code Date: Thu, 19 Oct 2023 14:01:59 +0200 Message-Id: <20231019120202.1216228-4-frederic@kernel.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231019120202.1216228-1-frederic@kernel.org> References: <20231019120202.1216228-1-frederic@kernel.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: rcu@vger.kernel.org From: Zhen Lei The code and comments of self-detected and other-detected RCU CPU stall warnings are identical except the output function. This commit therefore refactors so as to consolidate the duplicate code. Signed-off-by: Zhen Lei Signed-off-by: Paul E. McKenney Signed-off-by: Frederic Weisbecker --- kernel/rcu/tree_stall.h | 42 +++++++++++++++-------------------------- 1 file changed, 15 insertions(+), 27 deletions(-) diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h index fc04a8d7ce96..2443d1d4a6dc 100644 --- a/kernel/rcu/tree_stall.h +++ b/kernel/rcu/tree_stall.h @@ -711,7 +711,7 @@ static void print_cpu_stall(unsigned long gps) static void check_cpu_stall(struct rcu_data *rdp) { - bool didstall = false; + bool self_detected; unsigned long gs1; unsigned long gs2; unsigned long gps; @@ -758,10 +758,10 @@ static void check_cpu_stall(struct rcu_data *rdp) return; /* No stall or GP completed since entering function. */ rnp = rdp->mynode; jn = jiffies + ULONG_MAX / 2; + self_detected = READ_ONCE(rnp->qsmask) & rdp->grpmask; if (rcu_gp_in_progress() && - (READ_ONCE(rnp->qsmask) & rdp->grpmask) && + (self_detected || ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY)) && cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) { - /* * If a virtual machine is stopped by the host it can look to * the watchdog like an RCU stall. Check to see if the host @@ -770,33 +770,21 @@ static void check_cpu_stall(struct rcu_data *rdp) if (kvm_check_and_clear_guest_paused()) return; - /* We haven't checked in, so go dump stack. */ - print_cpu_stall(gps); - if (READ_ONCE(rcu_cpu_stall_ftrace_dump)) - rcu_ftrace_dump(DUMP_ALL); - didstall = true; - - } else if (rcu_gp_in_progress() && - ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY) && - cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) { + if (self_detected) { + /* We haven't checked in, so go dump stack. */ + print_cpu_stall(gps); + } else { + /* They had a few time units to dump stack, so complain. */ + print_other_cpu_stall(gs2, gps); + } - /* - * If a virtual machine is stopped by the host it can look to - * the watchdog like an RCU stall. Check to see if the host - * stopped the vm. - */ - if (kvm_check_and_clear_guest_paused()) - return; - - /* They had a few time units to dump stack, so complain. */ - print_other_cpu_stall(gs2, gps); if (READ_ONCE(rcu_cpu_stall_ftrace_dump)) rcu_ftrace_dump(DUMP_ALL); - didstall = true; - } - if (didstall && READ_ONCE(rcu_state.jiffies_stall) == jn) { - jn = jiffies + 3 * rcu_jiffies_till_stall_check() + 3; - WRITE_ONCE(rcu_state.jiffies_stall, jn); + + if (READ_ONCE(rcu_state.jiffies_stall) == jn) { + jn = jiffies + 3 * rcu_jiffies_till_stall_check() + 3; + WRITE_ONCE(rcu_state.jiffies_stall, jn); + } } } From patchwork Thu Oct 19 12:02:00 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Frederic Weisbecker X-Patchwork-Id: 13428740 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 92967CDB483 for ; Thu, 19 Oct 2023 12:02:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1345546AbjJSMCa (ORCPT ); Thu, 19 Oct 2023 08:02:30 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40024 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1345553AbjJSMCX (ORCPT ); Thu, 19 Oct 2023 08:02:23 -0400 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 232CF185; Thu, 19 Oct 2023 05:02:21 -0700 (PDT) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 823F8C433C7; Thu, 19 Oct 2023 12:02:18 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1697716940; bh=bhK7cpMcCCHf65DdGbg8GB1m9tvgdwB0ladW+FEjA30=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=fNaDKtq/fWbOHQCRBpTHuRLfmW/MkoB4YaR+5e7TvIKkSpO9vMtSbl8LPYrwEC6GU kGh+KVdvoaTwG+I5OB34Q6CUfaM/Vb0u7mJKhbm+xx6YYeKL3+Gjobcv6PdyGNr7hi Ak8ZoNd3jjByotxBUXe+iBAHO+9W0kacIJrFExvo8ghKWRk4qhmcnJNFX0B11u8eTP pxwb/A80ol9BUz4O58JTSbAl5/TAbiEV3JAvfVSziiMKL1d7yhqPPGhQikuejMPpCF N4CyeTeIOPXZg7iUACYUZuRcNTzHLYJXY0GEkB16aEGmRxm223FhE7BFHNukrH21ac uwmPuz1lye1rQ== From: Frederic Weisbecker To: LKML Cc: "Paul E. McKenney" , Boqun Feng , Joel Fernandes , Josh Triplett , Mathieu Desnoyers , Neeraj Upadhyay , Steven Rostedt , Uladzislau Rezki , rcu , Frederic Weisbecker Subject: [PATCH 4/6] rcu: Add RCU CPU stall notifier Date: Thu, 19 Oct 2023 14:02:00 +0200 Message-Id: <20231019120202.1216228-5-frederic@kernel.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231019120202.1216228-1-frederic@kernel.org> References: <20231019120202.1216228-1-frederic@kernel.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: rcu@vger.kernel.org From: "Paul E. McKenney" It is sometimes helpful to have a way for the subsystem causing the stall to dump its state when an RCU CPU stall occurs. This commit therefore bases rcu_stall_chain_notifier_register() and rcu_stall_chain_notifier_unregister() on atomic notifiers in order to provide this functionality. Signed-off-by: Paul E. McKenney Cc: Steven Rostedt Signed-off-by: Frederic Weisbecker --- include/linux/rcu_notifier.h | 32 +++++++++++++++++++ kernel/rcu/rcu.h | 6 ++++ kernel/rcu/tree_exp.h | 6 +++- kernel/rcu/tree_stall.h | 59 +++++++++++++++++++++++++++++++++++- 4 files changed, 101 insertions(+), 2 deletions(-) create mode 100644 include/linux/rcu_notifier.h diff --git a/include/linux/rcu_notifier.h b/include/linux/rcu_notifier.h new file mode 100644 index 000000000000..ebf371364581 --- /dev/null +++ b/include/linux/rcu_notifier.h @@ -0,0 +1,32 @@ +/* SPDX-License-Identifier: GPL-2.0+ */ +/* + * Read-Copy Update notifiers, initially RCU CPU stall notifier. + * Separate from rcupdate.h to avoid #include loops. + * + * Copyright (C) 2023 Paul E. McKenney. + */ + +#ifndef __LINUX_RCU_NOTIFIER_H +#define __LINUX_RCU_NOTIFIER_H + +// Actions for RCU CPU stall notifier calls. +#define RCU_STALL_NOTIFY_NORM 1 +#define RCU_STALL_NOTIFY_EXP 2 + +#ifdef CONFIG_RCU_STALL_COMMON + +#include +#include + +int rcu_stall_chain_notifier_register(struct notifier_block *n); +int rcu_stall_chain_notifier_unregister(struct notifier_block *n); + +#else // #ifdef CONFIG_RCU_STALL_COMMON + +// No RCU CPU stall warnings in Tiny RCU. +static inline int rcu_stall_chain_notifier_register(struct notifier_block *n) { return -EEXIST; } +static inline int rcu_stall_chain_notifier_unregister(struct notifier_block *n) { return -ENOENT; } + +#endif // #else // #ifdef CONFIG_RCU_STALL_COMMON + +#endif /* __LINUX_RCU_NOTIFIER_H */ diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h index 98e13be411af..ef3bab977407 100644 --- a/kernel/rcu/rcu.h +++ b/kernel/rcu/rcu.h @@ -654,4 +654,10 @@ static inline bool rcu_cpu_beenfullyonline(int cpu) { return true; } bool rcu_cpu_beenfullyonline(int cpu); #endif +#ifdef CONFIG_RCU_STALL_COMMON +int rcu_stall_notifier_call_chain(unsigned long val, void *v); +#else // #ifdef CONFIG_RCU_STALL_COMMON +static inline int rcu_stall_notifier_call_chain(unsigned long val, void *v) { return NOTIFY_DONE; } +#endif // #else // #ifdef CONFIG_RCU_STALL_COMMON + #endif /* __LINUX_RCU_H */ diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h index 8239b39d945b..6d7cea5d591f 100644 --- a/kernel/rcu/tree_exp.h +++ b/kernel/rcu/tree_exp.h @@ -621,10 +621,14 @@ static void synchronize_rcu_expedited_wait(void) } for (;;) { + unsigned long j; + if (synchronize_rcu_expedited_wait_once(jiffies_stall)) return; if (rcu_stall_is_suppressed()) continue; + j = jiffies; + rcu_stall_notifier_call_chain(RCU_STALL_NOTIFY_EXP, (void *)(j - jiffies_start)); trace_rcu_stall_warning(rcu_state.name, TPS("ExpeditedStall")); pr_err("INFO: %s detected expedited stalls on CPUs/tasks: {", rcu_state.name); @@ -647,7 +651,7 @@ static void synchronize_rcu_expedited_wait(void) } } pr_cont(" } %lu jiffies s: %lu root: %#lx/%c\n", - jiffies - jiffies_start, rcu_state.expedited_sequence, + j - jiffies_start, rcu_state.expedited_sequence, data_race(rnp_root->expmask), ".T"[!!data_race(rnp_root->exp_tasks)]); if (ndetected) { diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h index 2443d1d4a6dc..49544f932279 100644 --- a/kernel/rcu/tree_stall.h +++ b/kernel/rcu/tree_stall.h @@ -8,6 +8,7 @@ */ #include +#include ////////////////////////////////////////////////////////////////////////////// // @@ -770,6 +771,7 @@ static void check_cpu_stall(struct rcu_data *rdp) if (kvm_check_and_clear_guest_paused()) return; + rcu_stall_notifier_call_chain(RCU_STALL_NOTIFY_NORM, (void *)j - gps); if (self_detected) { /* We haven't checked in, so go dump stack. */ print_cpu_stall(gps); @@ -790,7 +792,7 @@ static void check_cpu_stall(struct rcu_data *rdp) ////////////////////////////////////////////////////////////////////////////// // -// RCU forward-progress mechanisms, including of callback invocation. +// RCU forward-progress mechanisms, including for callback invocation. /* @@ -1042,3 +1044,58 @@ static int __init rcu_sysrq_init(void) return 0; } early_initcall(rcu_sysrq_init); + + +////////////////////////////////////////////////////////////////////////////// +// +// RCU CPU stall-warning notifiers + +static ATOMIC_NOTIFIER_HEAD(rcu_cpu_stall_notifier_list); + +/** + * rcu_stall_chain_notifier_register - Add an RCU CPU stall notifier + * @n: Entry to add. + * + * Adds an RCU CPU stall notifier to an atomic notifier chain. + * The @action passed to a notifier will be @RCU_STALL_NOTIFY_NORM or + * friends. The @data will be the duration of the stalled grace period, + * in jiffies, coerced to a void* pointer. + * + * Returns 0 on success, %-EEXIST on error. + */ +int rcu_stall_chain_notifier_register(struct notifier_block *n) +{ + return atomic_notifier_chain_register(&rcu_cpu_stall_notifier_list, n); +} +EXPORT_SYMBOL_GPL(rcu_stall_chain_notifier_register); + +/** + * rcu_stall_chain_notifier_unregister - Remove an RCU CPU stall notifier + * @n: Entry to add. + * + * Removes an RCU CPU stall notifier from an atomic notifier chain. + * + * Returns zero on success, %-ENOENT on failure. + */ +int rcu_stall_chain_notifier_unregister(struct notifier_block *n) +{ + return atomic_notifier_chain_unregister(&rcu_cpu_stall_notifier_list, n); +} +EXPORT_SYMBOL_GPL(rcu_stall_chain_notifier_unregister); + +/* + * rcu_stall_notifier_call_chain - Call functions in an RCU CPU stall notifier chain + * @val: Value passed unmodified to notifier function + * @v: Pointer passed unmodified to notifier function + * + * Calls each function in the RCU CPU stall notifier chain in turn, which + * is an atomic call chain. See atomic_notifier_call_chain() for more + * information. + * + * This is for use within RCU, hence the omission of the extra asterisk + * to indicate a non-kerneldoc format header comment. + */ +int rcu_stall_notifier_call_chain(unsigned long val, void *v) +{ + return atomic_notifier_call_chain(&rcu_cpu_stall_notifier_list, val, v); +} From patchwork Thu Oct 19 12:02:01 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Frederic Weisbecker X-Patchwork-Id: 13428741 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 857B9CDB484 for ; Thu, 19 Oct 2023 12:02:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1345521AbjJSMCb (ORCPT ); Thu, 19 Oct 2023 08:02:31 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59530 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1345489AbjJSMCZ (ORCPT ); Thu, 19 Oct 2023 08:02:25 -0400 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CC747196; Thu, 19 Oct 2023 05:02:23 -0700 (PDT) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 39F88C433C9; Thu, 19 Oct 2023 12:02:21 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1697716943; bh=ShCXAwxxAgan1qRtdZgfmeLTKWI3nAC2NT8Wlxximew=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=dts7cxLin6moTJ4VS6MB8iwZruk43SsPTZw8Y7xGuQiusfPVRvoSKElmb9+PswaGw HmuXE35MmGzGFvK0MHpqfkROEdfJ4FzPcJjAKixSSKyCn1j6RjzCwzeaRvr1cblqXh zm+3o/B//sdxbV0f1p6/RF/DbJR7AKZe1DP+Jgts8bFI8cdYvlswOAxBdpMeCy9NTP eXaen7P8rM6eNO8B6+TUgrUSgjA/xs0uFqf9L7DHTdslcCOQW+N4DtZZnLq7yk7+6o 6zlbD3ixKQ+pqDRRhltG5Y7vqwdOcn3iHdWCxlTfIA/xKsbi3k9xfnWjLAJijAfrWk wm98opZP+BGig== From: Frederic Weisbecker To: LKML Cc: "Paul E. McKenney" , Boqun Feng , Joel Fernandes , Josh Triplett , Mathieu Desnoyers , Neeraj Upadhyay , Steven Rostedt , Uladzislau Rezki , rcu , Frederic Weisbecker Subject: [PATCH 5/6] rcutorture: Add test of RCU CPU stall notifiers Date: Thu, 19 Oct 2023 14:02:01 +0200 Message-Id: <20231019120202.1216228-6-frederic@kernel.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231019120202.1216228-1-frederic@kernel.org> References: <20231019120202.1216228-1-frederic@kernel.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: rcu@vger.kernel.org From: "Paul E. McKenney" This commit registers an RCU CPU stall notifier when testing RCU CPU stalls. The notifier logs a message similar to the following: rcu_torture_stall_nf: v=1, duration=21001. Signed-off-by: Paul E. McKenney Signed-off-by: Frederic Weisbecker --- kernel/rcu/rcutorture.c | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c index ade42d6a9d9b..1830cacac01d 100644 --- a/kernel/rcu/rcutorture.c +++ b/kernel/rcu/rcutorture.c @@ -21,6 +21,7 @@ #include #include #include +#include #include #include #include @@ -2428,6 +2429,16 @@ static int rcutorture_booster_init(unsigned int cpu) return 0; } +static int rcu_torture_stall_nf(struct notifier_block *nb, unsigned long v, void *ptr) +{ + pr_info("%s: v=%lu, duration=%lu.\n", __func__, v, (unsigned long)ptr); + return NOTIFY_OK; +} + +static struct notifier_block rcu_torture_stall_block = { + .notifier_call = rcu_torture_stall_nf, +}; + /* * CPU-stall kthread. It waits as specified by stall_cpu_holdoff, then * induces a CPU stall for the time specified by stall_cpu. @@ -2435,9 +2446,14 @@ static int rcutorture_booster_init(unsigned int cpu) static int rcu_torture_stall(void *args) { int idx; + int ret; unsigned long stop_at; VERBOSE_TOROUT_STRING("rcu_torture_stall task started"); + ret = rcu_stall_chain_notifier_register(&rcu_torture_stall_block); + if (ret) + pr_info("%s: rcu_stall_chain_notifier_register() returned %d, %sexpected.\n", + __func__, ret, !IS_ENABLED(CONFIG_RCU_STALL_COMMON) ? "un" : ""); if (stall_cpu_holdoff > 0) { VERBOSE_TOROUT_STRING("rcu_torture_stall begin holdoff"); schedule_timeout_interruptible(stall_cpu_holdoff * HZ); @@ -2481,6 +2497,11 @@ static int rcu_torture_stall(void *args) cur_ops->readunlock(idx); } pr_alert("%s end.\n", __func__); + if (!ret) { + ret = rcu_stall_chain_notifier_unregister(&rcu_torture_stall_block); + if (ret) + pr_info("%s: rcu_stall_chain_notifier_unregister() returned %d.\n", __func__, ret); + } torture_shutdown_absorb("rcu_torture_stall"); while (!kthread_should_stop()) schedule_timeout_interruptible(10 * HZ); From patchwork Thu Oct 19 12:02:02 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Frederic Weisbecker X-Patchwork-Id: 13428742 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5D7AACDB482 for ; Thu, 19 Oct 2023 12:02:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1345616AbjJSMCi (ORCPT ); Thu, 19 Oct 2023 08:02:38 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39996 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1345535AbjJSMC3 (ORCPT ); Thu, 19 Oct 2023 08:02:29 -0400 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 74279184; Thu, 19 Oct 2023 05:02:27 -0700 (PDT) Received: by smtp.kernel.org (Postfix) with ESMTPSA id E6747C433C7; Thu, 19 Oct 2023 12:02:23 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1697716947; bh=R+5y2QNUvpt4INL3lQPSCQLkuydVmA5dErPkd3V4WNA=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=ItD6WkAGIh+xK7y9khfcK2WoitkBiQmlb1XYlEbrfsCoQBQnd1fMcde0whMt0guyT T/oL5W8GM3s8qLYTiW/mHomS+NMwQ+V+CxvTHOMJRHK7noXvHpbcQIyMNBVlO+OI1a mUlClE9U0oWNc4ApcnTz/PYb/zZK12oxEcFmHMzfQsDlEtjA0jekjMpZzUl63sDIZb XVgJg4SKfnYEjztzHIYLmunoO1xW/Y62k6q01/41DCerBvm6Pvs7Y1pSTlsuC9KOQT pcpRjWR6AqZaKzKMkxjKm1nMqlUugxvFo8mExZ7Fk3r59lQJ32avEnYHekqWXzYvr9 6sb27otZE3N0Q== From: Frederic Weisbecker To: LKML Cc: "Joel Fernandes (Google)" , Boqun Feng , Josh Triplett , Mathieu Desnoyers , Neeraj Upadhyay , "Paul E . McKenney" , Steven Rostedt , Uladzislau Rezki , rcu , Huacai Chen , Binbin Zhou , Sergey Senozhatsky , Thomas Gleixner , stable@vger.kernel.org, Frederic Weisbecker Subject: [PATCH 6/6] rcu/tree: Defer setting of jiffies during stall reset Date: Thu, 19 Oct 2023 14:02:02 +0200 Message-Id: <20231019120202.1216228-7-frederic@kernel.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231019120202.1216228-1-frederic@kernel.org> References: <20231019120202.1216228-1-frederic@kernel.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: rcu@vger.kernel.org From: "Joel Fernandes (Google)" There are instances where rcu_cpu_stall_reset() is called when jiffies did not get a chance to update for a long time. Before jiffies is updated, the CPU stall detector can go off triggering false-positives where a just-started grace period appears to be ages old. In the past, we disabled stall detection in rcu_cpu_stall_reset() however this got changed [1]. This is resulting in false-positives in KGDB usecase [2]. Fix this by deferring the update of jiffies to the third run of the FQS loop. This is more robust, as, even if rcu_cpu_stall_reset() is called just before jiffies is read, we would end up pushing out the jiffies read by 3 more FQS loops. Meanwhile the CPU stall detection will be delayed and we will not get any false positives. [1] https://lore.kernel.org/all/20210521155624.174524-2-senozhatsky@chromium.org/ [2] https://lore.kernel.org/all/20230814020045.51950-2-chenhuacai@loongson.cn/ Tested with rcutorture.cpu_stall option as well to verify stall behavior with/without patch. Tested-by: Huacai Chen Reported-by: Binbin Zhou Closes: https://lore.kernel.org/all/20230814020045.51950-2-chenhuacai@loongson.cn/ Suggested-by: Paul McKenney Cc: Sergey Senozhatsky Cc: Thomas Gleixner Cc: stable@vger.kernel.org Fixes: a80be428fbc1 ("rcu: Do not disable GP stall detection in rcu_cpu_stall_reset()") Signed-off-by: Joel Fernandes (Google) Signed-off-by: Paul E. McKenney Signed-off-by: Frederic Weisbecker --- kernel/rcu/tree.c | 12 ++++++++++++ kernel/rcu/tree.h | 4 ++++ kernel/rcu/tree_stall.h | 20 ++++++++++++++++++-- 3 files changed, 34 insertions(+), 2 deletions(-) diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index cb1caefa8bd0..d85779f67aea 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -1556,10 +1556,22 @@ static bool rcu_gp_fqs_check_wake(int *gfp) */ static void rcu_gp_fqs(bool first_time) { + int nr_fqs = READ_ONCE(rcu_state.nr_fqs_jiffies_stall); struct rcu_node *rnp = rcu_get_root(); WRITE_ONCE(rcu_state.gp_activity, jiffies); WRITE_ONCE(rcu_state.n_force_qs, rcu_state.n_force_qs + 1); + + WARN_ON_ONCE(nr_fqs > 3); + /* Only countdown nr_fqs for stall purposes if jiffies moves. */ + if (nr_fqs) { + if (nr_fqs == 1) { + WRITE_ONCE(rcu_state.jiffies_stall, + jiffies + rcu_jiffies_till_stall_check()); + } + WRITE_ONCE(rcu_state.nr_fqs_jiffies_stall, --nr_fqs); + } + if (first_time) { /* Collect dyntick-idle snapshots. */ force_qs_rnp(dyntick_save_progress_counter); diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h index 192536916f9a..e9821a8422db 100644 --- a/kernel/rcu/tree.h +++ b/kernel/rcu/tree.h @@ -386,6 +386,10 @@ struct rcu_state { /* in jiffies. */ unsigned long jiffies_stall; /* Time at which to check */ /* for CPU stalls. */ + int nr_fqs_jiffies_stall; /* Number of fqs loops after + * which read jiffies and set + * jiffies_stall. Stall + * warnings disabled if !0. */ unsigned long jiffies_resched; /* Time at which to resched */ /* a reluctant CPU. */ unsigned long n_force_qs_gpstart; /* Snapshot of n_force_qs at */ diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h index 49544f932279..ac8e86babe44 100644 --- a/kernel/rcu/tree_stall.h +++ b/kernel/rcu/tree_stall.h @@ -150,12 +150,17 @@ static void panic_on_rcu_stall(void) /** * rcu_cpu_stall_reset - restart stall-warning timeout for current grace period * + * To perform the reset request from the caller, disable stall detection until + * 3 fqs loops have passed. This is required to ensure a fresh jiffies is + * loaded. It should be safe to do from the fqs loop as enough timer + * interrupts and context switches should have passed. + * * The caller must disable hard irqs. */ void rcu_cpu_stall_reset(void) { - WRITE_ONCE(rcu_state.jiffies_stall, - jiffies + rcu_jiffies_till_stall_check()); + WRITE_ONCE(rcu_state.nr_fqs_jiffies_stall, 3); + WRITE_ONCE(rcu_state.jiffies_stall, ULONG_MAX); } ////////////////////////////////////////////////////////////////////////////// @@ -171,6 +176,7 @@ static void record_gp_stall_check_time(void) WRITE_ONCE(rcu_state.gp_start, j); j1 = rcu_jiffies_till_stall_check(); smp_mb(); // ->gp_start before ->jiffies_stall and caller's ->gp_seq. + WRITE_ONCE(rcu_state.nr_fqs_jiffies_stall, 0); WRITE_ONCE(rcu_state.jiffies_stall, j + j1); rcu_state.jiffies_resched = j + j1 / 2; rcu_state.n_force_qs_gpstart = READ_ONCE(rcu_state.n_force_qs); @@ -726,6 +732,16 @@ static void check_cpu_stall(struct rcu_data *rdp) !rcu_gp_in_progress()) return; rcu_stall_kick_kthreads(); + + /* + * Check if it was requested (via rcu_cpu_stall_reset()) that the FQS + * loop has to set jiffies to ensure a non-stale jiffies value. This + * is required to have good jiffies value after coming out of long + * breaks of jiffies updates. Not doing so can cause false positives. + */ + if (READ_ONCE(rcu_state.nr_fqs_jiffies_stall) > 0) + return; + j = jiffies; /*