From patchwork Thu Jan 5 00:51:21 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paul E. McKenney" X-Patchwork-Id: 13089345 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E75D8C54EBD for ; Thu, 5 Jan 2023 00:56:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229880AbjAEAzg (ORCPT ); Wed, 4 Jan 2023 19:55:36 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60350 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230239AbjAEAyo (ORCPT ); Wed, 4 Jan 2023 19:54:44 -0500 Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4F76647335; Wed, 4 Jan 2023 16:51:30 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 0A5CBB818F0; Thu, 5 Jan 2023 00:51:29 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 996F2C433F1; Thu, 5 Jan 2023 00:51:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672879887; bh=4aIDMnvVK/cTLHasisTKiXTMm58qBLZArVpwD6fnVeo=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=nGnOL3uQlRZc4QbM4HI+eNT1H2o82vAbdbiVxk7xy4nZLxwOjZzo5o4QNn6+qZDoH k4LbN1YcOBhEPdvBD2OvCM+1p18h2xO40Y9YN2RiXu27XdL+Q7h9FUWzR1wENH4h7P L6fULio4gg+0e0cFjdBckAcJRX8WYGrwv9Mda9UdPMThOczSkF47+BBKuImpeQgqQl WxDg3Fvs6chDNP6+qY31WRX/LeVYVTjNI5xuXVhn8agW8FgurUyiZS5xv+GkmkVZWx CAbvgzgYYJid38BvxFB06kOJ62k04e23UADvUnv2udVFiNK0MrRtErd0Kf4r6+nXDh sbJHBhAifMI5g== Received: by paulmck-ThinkPad-P17-Gen-1.home (Postfix, from userid 1000) id 4EE055C05CA; Wed, 4 Jan 2023 16:51:27 -0800 (PST) From: "Paul E. McKenney" To: rcu@vger.kernel.org Cc: linux-kernel@vger.kernel.org, kernel-team@meta.com, rostedt@goodmis.org, Zhen Lei , Elliott@paulmck-ThinkPad-P17-Gen-1.home, Robert , Tejun Heo , "Peter Zijlstra (Intel)" , Josh Don , Andrew Morton , Frederic Weisbecker , "Paul E . McKenney" Subject: [PATCH rcu 1/6] genirq: Fix the return type of kstat_cpu_irqs_sum() Date: Wed, 4 Jan 2023 16:51:21 -0800 Message-Id: <20230105005126.1772294-1-paulmck@kernel.org> X-Mailer: git-send-email 2.31.1.189.g2e36527f23 In-Reply-To: <20230105005105.GA1772125@paulmck-ThinkPad-P17-Gen-1> References: <20230105005105.GA1772125@paulmck-ThinkPad-P17-Gen-1> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: rcu@vger.kernel.org From: Zhen Lei The type of member ->irqs_sum is unsigned long, but kstat_cpu_irqs_sum() returns int, which can result in truncation. Therefore, change the kstat_cpu_irqs_sum() function's return value to unsigned long to avoid truncation. Fixes: f2c66cd8eedd ("/proc/stat: scalability of irq num per cpu") Reported-by: Elliott, Robert (Servers) Signed-off-by: Zhen Lei Cc: Tejun Heo Cc: "Peter Zijlstra (Intel)" Cc: Josh Don Cc: Andrew Morton Reviewed-by: Frederic Weisbecker Signed-off-by: Paul E. McKenney --- include/linux/kernel_stat.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h index ddb5a358fd829..90e2fdc17d79f 100644 --- a/include/linux/kernel_stat.h +++ b/include/linux/kernel_stat.h @@ -75,7 +75,7 @@ extern unsigned int kstat_irqs_usr(unsigned int irq); /* * Number of interrupts per cpu, since bootup */ -static inline unsigned int kstat_cpu_irqs_sum(unsigned int cpu) +static inline unsigned long kstat_cpu_irqs_sum(unsigned int cpu) { return kstat_cpu(cpu).irqs_sum; } From patchwork Thu Jan 5 00:51:22 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paul E. McKenney" X-Patchwork-Id: 13089346 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 26BF4C54EBE for ; Thu, 5 Jan 2023 00:56:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229885AbjAEAzh (ORCPT ); Wed, 4 Jan 2023 19:55:37 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33482 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230246AbjAEAyo (ORCPT ); Wed, 4 Jan 2023 19:54:44 -0500 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A6D054D730; Wed, 4 Jan 2023 16:51:30 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 3F2DCB81983; Thu, 5 Jan 2023 00:51:29 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 9B930C433F2; Thu, 5 Jan 2023 00:51:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672879887; bh=a9j2iXVLoC1TD3lQ/10Gw2Pvt/B4K6hHFOFLrRwW1Dc=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=XzYY8NJbzzoB8B0ZzIvvzNz+KpyoRzc2GqLb2niUTABfa/LQsT+r65gP22WcxsbGM z8gt+860oprHMixmLDBNlW+qYTVAW6ITC0ZMfGzAd5KAfxXjQQKTG2bIGpdBgWRn/+ UjTzjWOR/SntEOgajwvN+hmAaBXIVPZtW03pxWPgAwZJnq8hQ5+mFc89geOGyHYEHm yG9heL833MlrxE3ACZ8eKlGEufE34O2MQXGifrm4yFdNE7Ou1cYSnfVyaCslke8Bn0 SQqM7rO5DpVcFG2IzevyK2SVx2H2X1KuQPrH8yMkeU07PDcCs/0IcZr0gFsTWpfyQK hTaJXjLr9mkhQ== Received: by paulmck-ThinkPad-P17-Gen-1.home (Postfix, from userid 1000) id 513905C086D; Wed, 4 Jan 2023 16:51:27 -0800 (PST) From: "Paul E. McKenney" To: rcu@vger.kernel.org Cc: linux-kernel@vger.kernel.org, kernel-team@meta.com, rostedt@goodmis.org, Zhen Lei , Josh Don , Tejun Heo , Peter Zijlstra , Frederic Weisbecker , "Paul E . McKenney" Subject: [PATCH rcu 2/6] sched: Add helper kstat_cpu_softirqs_sum() Date: Wed, 4 Jan 2023 16:51:22 -0800 Message-Id: <20230105005126.1772294-2-paulmck@kernel.org> X-Mailer: git-send-email 2.31.1.189.g2e36527f23 In-Reply-To: <20230105005105.GA1772125@paulmck-ThinkPad-P17-Gen-1> References: <20230105005105.GA1772125@paulmck-ThinkPad-P17-Gen-1> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: rcu@vger.kernel.org From: Zhen Lei Add a kstat_cpu_softirqs_sum() function that is similar to kstat_cpu_irqs_sum(), but which counts software interrupts since boot on the specified CPU. Signed-off-by: Zhen Lei Cc: Josh Don Cc: Tejun Heo Cc: Peter Zijlstra Reviewed-by: Frederic Weisbecker Signed-off-by: Paul E. McKenney --- include/linux/kernel_stat.h | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h index 90e2fdc17d79f..898076e173a92 100644 --- a/include/linux/kernel_stat.h +++ b/include/linux/kernel_stat.h @@ -67,6 +67,17 @@ static inline unsigned int kstat_softirqs_cpu(unsigned int irq, int cpu) return kstat_cpu(cpu).softirqs[irq]; } +static inline unsigned int kstat_cpu_softirqs_sum(int cpu) +{ + int i; + unsigned int sum = 0; + + for (i = 0; i < NR_SOFTIRQS; i++) + sum += kstat_softirqs_cpu(i, cpu); + + return sum; +} + /* * Number of interrupts per specific IRQ source, since bootup */ From patchwork Thu Jan 5 00:51:23 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paul E. McKenney" X-Patchwork-Id: 13089343 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 46030C46467 for ; Thu, 5 Jan 2023 00:56:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230479AbjAEAz3 (ORCPT ); Wed, 4 Jan 2023 19:55:29 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60316 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230226AbjAEAyn (ORCPT ); Wed, 4 Jan 2023 19:54:43 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9658B4D725; Wed, 4 Jan 2023 16:51:28 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 3592B61883; Thu, 5 Jan 2023 00:51:28 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 94E93C433F0; Thu, 5 Jan 2023 00:51:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672879887; bh=NWSPmqPtnSB+GjAOqFi44AyevCau+gcT+vefcRYY03w=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=M50iFfXGCcAnPZ8l78Kh9X49SXipU0UAVMQiAoCz6DVJ/5bbAMvbnMUjFloCWG99A PWodxkEnTq5cFcsKYdoAnOf/STsamLmqIIuvo99/KUcQ07KHGLM8SqDWqoXYQ4C3qf Ca86GmQLIPXlgSHqGiVGX3hxPzBjB9E9eClyITbBuJEqqwwJM9RloW0wRoN+3+ss9C GTsZFuLom8WXnGJ4JA8Z9MQYZPuCXYTUexMSx1kCoPGu4wnTpWGl3ts1Xv1dojOLso 1UZj4nEAf8vV3u7NFKwzzETkbQNtEprR4kObiFXnivZtFVhyXRdA4GgXexsm0TjR/Q J670JA8N63SJQ== Received: by paulmck-ThinkPad-P17-Gen-1.home (Postfix, from userid 1000) id 5340F5C1456; Wed, 4 Jan 2023 16:51:27 -0800 (PST) From: "Paul E. McKenney" To: rcu@vger.kernel.org Cc: linux-kernel@vger.kernel.org, kernel-team@meta.com, rostedt@goodmis.org, Zhen Lei , Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Ben Segall , Mel Gorman , Daniel Bristot de Oliveira , Valentin Schneider , Frederic Weisbecker , "Paul E . McKenney" Subject: [PATCH rcu 3/6] sched: Add helper nr_context_switches_cpu() Date: Wed, 4 Jan 2023 16:51:23 -0800 Message-Id: <20230105005126.1772294-3-paulmck@kernel.org> X-Mailer: git-send-email 2.31.1.189.g2e36527f23 In-Reply-To: <20230105005105.GA1772125@paulmck-ThinkPad-P17-Gen-1> References: <20230105005105.GA1772125@paulmck-ThinkPad-P17-Gen-1> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: rcu@vger.kernel.org From: Zhen Lei Add a function nr_context_switches_cpu() that returns number of context switches since boot on the specified CPU. This information will be used to diagnose RCU CPU stalls. Signed-off-by: Zhen Lei Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Steven Rostedt Cc: Ben Segall Cc: Mel Gorman Cc: Daniel Bristot de Oliveira Cc: Valentin Schneider Reviewed-by: Frederic Weisbecker Signed-off-by: Paul E. McKenney --- include/linux/kernel_stat.h | 1 + kernel/sched/core.c | 5 +++++ 2 files changed, 6 insertions(+) diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h index 898076e173a92..9935f7ecbfb9e 100644 --- a/include/linux/kernel_stat.h +++ b/include/linux/kernel_stat.h @@ -52,6 +52,7 @@ DECLARE_PER_CPU(struct kernel_cpustat, kernel_cpustat); #define kstat_cpu(cpu) per_cpu(kstat, cpu) #define kcpustat_cpu(cpu) per_cpu(kernel_cpustat, cpu) +extern unsigned long long nr_context_switches_cpu(int cpu); extern unsigned long long nr_context_switches(void); extern unsigned int kstat_irqs_cpu(unsigned int irq, int cpu); diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 25b582b6ee5f7..2e40a6c116e1c 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -5282,6 +5282,11 @@ bool single_task_running(void) } EXPORT_SYMBOL(single_task_running); +unsigned long long nr_context_switches_cpu(int cpu) +{ + return cpu_rq(cpu)->nr_switches; +} + unsigned long long nr_context_switches(void) { int i; From patchwork Thu Jan 5 00:51:24 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paul E. McKenney" X-Patchwork-Id: 13089348 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D80C5C54EBC for ; Thu, 5 Jan 2023 00:56:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229800AbjAEAzc (ORCPT ); Wed, 4 Jan 2023 19:55:32 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36120 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230229AbjAEAyn (ORCPT ); Wed, 4 Jan 2023 19:54:43 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BFC0A4D72D; Wed, 4 Jan 2023 16:51:28 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 4A7FA6189B; Thu, 5 Jan 2023 00:51:28 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id A2FD6C433EF; Thu, 5 Jan 2023 00:51:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672879887; bh=nc8f00m/t1FUKq8wPb19xh+76EgjE2wiw2mRKt316K0=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=kcYfpGV2PeUMyIKyV1NmzVvj+FR7gQq3tRYBJD3AcFqjIinv0btiDKEGmaubq1bZR ZfC8sONHGKCER0o0h2KnYB00t37PmRiISKCcBtBM+VK1Vu9+pk6jZZIiyHv4iLKNdk SbdLv1oeIEjXjjzyyiX6Ir4OCsFJSuCmf23sPkMIuFk2ccTrXD4x2N6tLOJ0aVnjIc vvdGF2K1maWRAP1a/o9VEdN02tO3Xmek+ilcoZY9VqJ4cW3U+VYAvfbOkI+J7s3N7V 3WhqDsQ5xKSlZ7OAOUZihPOaXiczE37j5jcubaqNpHCtSEoz6j6YW8ztrqRS0NRdb2 LME6zP0w5olMQ== Received: by paulmck-ThinkPad-P17-Gen-1.home (Postfix, from userid 1000) id 555905C149B; Wed, 4 Jan 2023 16:51:27 -0800 (PST) From: "Paul E. McKenney" To: rcu@vger.kernel.org Cc: linux-kernel@vger.kernel.org, kernel-team@meta.com, rostedt@goodmis.org, Zhen Lei , Mukesh Ojha , Frederic Weisbecker , "Paul E . McKenney" Subject: [PATCH rcu 4/6] rcu: Add RCU stall diagnosis information Date: Wed, 4 Jan 2023 16:51:24 -0800 Message-Id: <20230105005126.1772294-4-paulmck@kernel.org> X-Mailer: git-send-email 2.31.1.189.g2e36527f23 In-Reply-To: <20230105005105.GA1772125@paulmck-ThinkPad-P17-Gen-1> References: <20230105005105.GA1772125@paulmck-ThinkPad-P17-Gen-1> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: rcu@vger.kernel.org From: Zhen Lei Because RCU CPU stall warnings are driven from the scheduling-clock interrupt handler, a workload consisting of a very large number of short-duration hardware interrupts can result in misleading stall-warning messages. On systems supporting only a single level of interrupts, that is, where interrupts handlers cannot be interrupted, this can produce misleading diagnostics. The stack traces will show the innocent-bystander interrupted task, not the interrupts that are at the very least exacerbating the stall. This situation can be improved by displaying the number of interrupts and the CPU time that they have consumed. Diagnosing other types of stalls can be eased by also providing the count of softirqs and the CPU time that they consumed as well as the number of context switches and the task-level CPU time consumed. Consider the following output given this change: rcu: INFO: rcu_preempt self-detected stall on CPU rcu: 0-....: (1250 ticks this GP) rcu: hardirqs softirqs csw/system rcu: number: 624 45 0 rcu: cputime: 69 1 2425 ==> 2500(ms) This output shows that the number of hard and soft interrupts is small, there are no context switches, and the system takes up a lot of time. This indicates that the current task is looping with preemption disabled. The impact on system performance is negligible because snapshot is recorded only once for all continuous RCU stalls. This added debugging information is suppressed by default and can be enabled by building the kernel with CONFIG_RCU_CPU_STALL_CPUTIME=y or by booting with rcupdate.rcu_cpu_stall_cputime=1. Signed-off-by: Zhen Lei Reviewed-by: Mukesh Ojha Reviewed-by: Frederic Weisbecker Signed-off-by: Paul E. McKenney --- .../admin-guide/kernel-parameters.txt | 6 ++++ kernel/rcu/Kconfig.debug | 13 ++++++++ kernel/rcu/rcu.h | 1 + kernel/rcu/tree.c | 18 +++++++++++ kernel/rcu/tree.h | 19 ++++++++++++ kernel/rcu/tree_stall.h | 31 +++++++++++++++++++ kernel/rcu/update.c | 2 ++ 7 files changed, 90 insertions(+) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 6cfa6e3996cf7..43ca7f3ac96a1 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -5113,6 +5113,12 @@ rcupdate.rcu_cpu_stall_timeout to be used (after conversion from seconds to milliseconds). + rcupdate.rcu_cpu_stall_cputime= [KNL] + Provide statistics on the cputime and count of + interrupts and tasks during the sampling period. For + multiple continuous RCU stalls, all sampling periods + begin at half of the first RCU stall timeout. + rcupdate.rcu_expedited= [KNL] Use expedited grace-period primitives, for example, synchronize_rcu_expedited() instead diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug index 232e29fe3e5ec..49da904df6aa6 100644 --- a/kernel/rcu/Kconfig.debug +++ b/kernel/rcu/Kconfig.debug @@ -92,6 +92,19 @@ config RCU_EXP_CPU_STALL_TIMEOUT says to use the RCU_CPU_STALL_TIMEOUT value converted from seconds to milliseconds. +config RCU_CPU_STALL_CPUTIME + bool "Provide additional RCU stall debug information" + depends on RCU_STALL_COMMON + default n + help + Collect statistics during the sampling period, such as the number of + (hard interrupts, soft interrupts, task switches) and the cputime of + (hard interrupts, soft interrupts, kernel tasks) are added to the + RCU stall report. For multiple continuous RCU stalls, all sampling + periods begin at half of the first RCU stall timeout. + The boot option rcupdate.rcu_cpu_stall_cputime has the same function + as this one, but will override this if it exists. + config RCU_TRACE bool "Enable tracing for RCU" depends on DEBUG_KERNEL diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h index c5aa934de59b0..ff35920e1055e 100644 --- a/kernel/rcu/rcu.h +++ b/kernel/rcu/rcu.h @@ -224,6 +224,7 @@ extern int rcu_cpu_stall_ftrace_dump; extern int rcu_cpu_stall_suppress; extern int rcu_cpu_stall_timeout; extern int rcu_exp_cpu_stall_timeout; +extern int rcu_cpu_stall_cputime; int rcu_jiffies_till_stall_check(void); int rcu_exp_jiffies_till_stall_check(void); diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index cf34a961821ad..65552e6a6a5d2 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -925,6 +925,24 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp) rdp->rcu_iw_gp_seq = rnp->gp_seq; irq_work_queue_on(&rdp->rcu_iw, rdp->cpu); } + + if (rcu_cpu_stall_cputime && rdp->snap_record.gp_seq != rdp->gp_seq) { + int cpu = rdp->cpu; + struct rcu_snap_record *rsrp; + struct kernel_cpustat *kcsp; + + kcsp = &kcpustat_cpu(cpu); + + rsrp = &rdp->snap_record; + rsrp->cputime_irq = kcpustat_field(kcsp, CPUTIME_IRQ, cpu); + rsrp->cputime_softirq = kcpustat_field(kcsp, CPUTIME_SOFTIRQ, cpu); + rsrp->cputime_system = kcpustat_field(kcsp, CPUTIME_SYSTEM, cpu); + rsrp->nr_hardirqs = kstat_cpu_irqs_sum(rdp->cpu); + rsrp->nr_softirqs = kstat_cpu_softirqs_sum(rdp->cpu); + rsrp->nr_csw = nr_context_switches_cpu(rdp->cpu); + rsrp->jiffies = jiffies; + rsrp->gp_seq = rdp->gp_seq; + } } return 0; diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h index fcb5d696eb170..192536916f9a6 100644 --- a/kernel/rcu/tree.h +++ b/kernel/rcu/tree.h @@ -158,6 +158,23 @@ union rcu_noqs { u16 s; /* Set of bits, aggregate OR here. */ }; +/* + * Record the snapshot of the core stats at half of the first RCU stall timeout. + * The member gp_seq is used to ensure that all members are updated only once + * during the sampling period. The snapshot is taken only if this gp_seq is not + * equal to rdp->gp_seq. + */ +struct rcu_snap_record { + unsigned long gp_seq; /* Track rdp->gp_seq counter */ + u64 cputime_irq; /* Accumulated cputime of hard irqs */ + u64 cputime_softirq;/* Accumulated cputime of soft irqs */ + u64 cputime_system; /* Accumulated cputime of kernel tasks */ + unsigned long nr_hardirqs; /* Accumulated number of hard irqs */ + unsigned int nr_softirqs; /* Accumulated number of soft irqs */ + unsigned long long nr_csw; /* Accumulated number of task switches */ + unsigned long jiffies; /* Track jiffies value */ +}; + /* Per-CPU data for read-copy update. */ struct rcu_data { /* 1) quiescent-state and grace-period handling : */ @@ -262,6 +279,8 @@ struct rcu_data { short rcu_onl_gp_flags; /* ->gp_flags at last online. */ unsigned long last_fqs_resched; /* Time of last rcu_resched(). */ unsigned long last_sched_clock; /* Jiffies of last rcu_sched_clock_irq(). */ + struct rcu_snap_record snap_record; /* Snapshot of core stats at half of */ + /* the first RCU stall timeout */ long lazy_len; /* Length of buffered lazy callbacks. */ int cpu; diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h index 5653560573e22..6de15fb10bc47 100644 --- a/kernel/rcu/tree_stall.h +++ b/kernel/rcu/tree_stall.h @@ -428,6 +428,35 @@ static bool rcu_is_rcuc_kthread_starving(struct rcu_data *rdp, unsigned long *jp return j > 2 * HZ; } +static void print_cpu_stat_info(int cpu) +{ + struct rcu_snap_record rsr, *rsrp; + struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu); + struct kernel_cpustat *kcsp = &kcpustat_cpu(cpu); + + if (!rcu_cpu_stall_cputime) + return; + + rsrp = &rdp->snap_record; + if (rsrp->gp_seq != rdp->gp_seq) + return; + + rsr.cputime_irq = kcpustat_field(kcsp, CPUTIME_IRQ, cpu); + rsr.cputime_softirq = kcpustat_field(kcsp, CPUTIME_SOFTIRQ, cpu); + rsr.cputime_system = kcpustat_field(kcsp, CPUTIME_SYSTEM, cpu); + + pr_err("\t hardirqs softirqs csw/system\n"); + pr_err("\t number: %8ld %10d %12lld\n", + kstat_cpu_irqs_sum(cpu) - rsrp->nr_hardirqs, + kstat_cpu_softirqs_sum(cpu) - rsrp->nr_softirqs, + nr_context_switches_cpu(cpu) - rsrp->nr_csw); + pr_err("\tcputime: %8lld %10lld %12lld ==> %d(ms)\n", + div_u64(rsr.cputime_irq - rsrp->cputime_irq, NSEC_PER_MSEC), + div_u64(rsr.cputime_softirq - rsrp->cputime_softirq, NSEC_PER_MSEC), + div_u64(rsr.cputime_system - rsrp->cputime_system, NSEC_PER_MSEC), + jiffies_to_msecs(jiffies - rsrp->jiffies)); +} + /* * Print out diagnostic information for the specified stalled CPU. * @@ -484,6 +513,8 @@ static void print_cpu_stall_info(int cpu) data_race(rcu_state.n_force_qs) - rcu_state.n_force_qs_gpstart, rcuc_starved ? buf : "", falsepositive ? " (false positive?)" : ""); + + print_cpu_stat_info(cpu); } /* Complain about starvation of grace-period kthread. */ diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c index f5e6a2f95a2a0..8d72cb7caeadf 100644 --- a/kernel/rcu/update.c +++ b/kernel/rcu/update.c @@ -508,6 +508,8 @@ int rcu_cpu_stall_timeout __read_mostly = CONFIG_RCU_CPU_STALL_TIMEOUT; module_param(rcu_cpu_stall_timeout, int, 0644); int rcu_exp_cpu_stall_timeout __read_mostly = CONFIG_RCU_EXP_CPU_STALL_TIMEOUT; module_param(rcu_exp_cpu_stall_timeout, int, 0644); +int rcu_cpu_stall_cputime __read_mostly = IS_ENABLED(CONFIG_RCU_CPU_STALL_CPUTIME); +module_param(rcu_cpu_stall_cputime, int, 0644); #endif /* #ifdef CONFIG_RCU_STALL_COMMON */ // Suppress boot-time RCU CPU stall warnings and rcutorture writer stall From patchwork Thu Jan 5 00:51:25 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paul E. McKenney" X-Patchwork-Id: 13089344 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C62F1C53210 for ; Thu, 5 Jan 2023 00:56:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229773AbjAEAzb (ORCPT ); Wed, 4 Jan 2023 19:55:31 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36126 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230227AbjAEAyn (ORCPT ); Wed, 4 Jan 2023 19:54:43 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B6F544D726; Wed, 4 Jan 2023 16:51:28 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 43A8A61892; Thu, 5 Jan 2023 00:51:28 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 9DB9CC43392; Thu, 5 Jan 2023 00:51:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672879887; bh=KvTTOEuf3w3HlqviVybOzjEM5gng/swpX4BJIk+B47g=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=NGyVkmIQee17I1jwvZ64LOwKtzTsgwKuE0my358MPYotRSPRdZhIZ3vtVkzDvbKz5 ZAIIt/oVOVISlenv3a+kWgsc8VbatWUXzybT92E7FQelrR85cuDLjTbjmbdZVxYcUT K6y1Q/8lF1SxKprLJKDuYOxvVWiOXBn4T4S666bnBjBa6XsGEbJFMzOUhkP8WNrJs/ Be+frtvWL4W/NnvgnZFen+zoKiya4EP6ACJPFk98n//QO3s1+uOxyrY1H3ME2+f8Cp rLLno0Fur/WiCmHb96lAI9CH1+KBQZmxQV1l3oM6qc+SXR9rb9HzZRmHatQfRJGkql uP7lTROTxuaEg== Received: by paulmck-ThinkPad-P17-Gen-1.home (Postfix, from userid 1000) id 574A75C1ADF; Wed, 4 Jan 2023 16:51:27 -0800 (PST) From: "Paul E. McKenney" To: rcu@vger.kernel.org Cc: linux-kernel@vger.kernel.org, kernel-team@meta.com, rostedt@goodmis.org, Zhen Lei , Frederic Weisbecker , "Paul E . McKenney" Subject: [PATCH rcu 5/6] rcu: Align the output of RCU CPU stall warning messages Date: Wed, 4 Jan 2023 16:51:25 -0800 Message-Id: <20230105005126.1772294-5-paulmck@kernel.org> X-Mailer: git-send-email 2.31.1.189.g2e36527f23 In-Reply-To: <20230105005105.GA1772125@paulmck-ThinkPad-P17-Gen-1> References: <20230105005105.GA1772125@paulmck-ThinkPad-P17-Gen-1> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: rcu@vger.kernel.org From: Zhen Lei Time stamps are added to the output in kernels built with CONFIG_PRINTK_TIME=y, which causes misaligned output. Therefore, replace pr_cont() with pr_err(), which fixes alignment and gets rid of a couple of despised pr_cont() calls. Before: [ 37.567343] rcu: INFO: rcu_preempt self-detected stall on CPU [ 37.567839] rcu: 0-....: (1500 ticks this GP) idle=*** [ 37.568270] (t=1501 jiffies g=4717 q=28 ncpus=4) [ 37.568668] CPU: 0 PID: 313 Comm: test0 Not tainted 6.1.0-rc4 #8 After: [ 36.762074] rcu: INFO: rcu_preempt self-detected stall on CPU [ 36.762543] rcu: 0-....: (1499 ticks this GP) idle=*** [ 36.763003] rcu: (t=1500 jiffies g=5097 q=27 ncpus=4) [ 36.763522] CPU: 0 PID: 313 Comm: test0 Not tainted 6.1.0-rc4 #9 Signed-off-by: Zhen Lei Reviewed-by: Frederic Weisbecker Signed-off-by: Paul E. McKenney --- kernel/rcu/tree_stall.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h index 6de15fb10bc47..f360894f5599d 100644 --- a/kernel/rcu/tree_stall.h +++ b/kernel/rcu/tree_stall.h @@ -619,7 +619,7 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps) for_each_possible_cpu(cpu) totqlen += rcu_get_n_cbs_cpu(cpu); - pr_cont("\t(detected by %d, t=%ld jiffies, g=%ld, q=%lu ncpus=%d)\n", + pr_err("\t(detected by %d, t=%ld jiffies, g=%ld, q=%lu ncpus=%d)\n", smp_processor_id(), (long)(jiffies - gps), (long)rcu_seq_current(&rcu_state.gp_seq), totqlen, rcu_state.n_online_cpus); if (ndetected) { @@ -680,7 +680,7 @@ static void print_cpu_stall(unsigned long gps) raw_spin_unlock_irqrestore_rcu_node(rdp->mynode, flags); for_each_possible_cpu(cpu) totqlen += rcu_get_n_cbs_cpu(cpu); - pr_cont("\t(t=%lu jiffies g=%ld q=%lu ncpus=%d)\n", + pr_err("\t(t=%lu jiffies g=%ld q=%lu ncpus=%d)\n", jiffies - gps, (long)rcu_seq_current(&rcu_state.gp_seq), totqlen, rcu_state.n_online_cpus); From patchwork Thu Jan 5 00:51:26 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paul E. McKenney" X-Patchwork-Id: 13089347 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 117C2C54EBF for ; Thu, 5 Jan 2023 00:56:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229843AbjAEAze (ORCPT ); Wed, 4 Jan 2023 19:55:34 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33464 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230235AbjAEAyo (ORCPT ); Wed, 4 Jan 2023 19:54:44 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 356214D72F; Wed, 4 Jan 2023 16:51:29 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id C4D996188D; Thu, 5 Jan 2023 00:51:28 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 08C6AC433AF; Thu, 5 Jan 2023 00:51:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672879888; bh=p3/KoXMV3EksWgDrlSxHSSeH/du9IVFe1LVWFfB/aF0=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=Qp55mSALdEETA8doDowXWpt7ob8SUwAcHlAlim/KlsxOjNvGDuJNfozajdIethgyz 2fpENCx28cQLLWyKLqHvEmQ7u3FxFz3xjs7Pqj5SUlbIatcA+ShZSHAqWEjykVexQ0 UFtRz+/D3r456Eod/gg5JCP4ZDbmDGJCgLZ601h6kumyMS0U/vs1HMF1LB77Pf/OGr 9AupvPDY+F8zc0R32DPMZpGk/bte7GR1PdnDoKCBn8TRvMG9IhllKEGhKx4fN2i29h zCHCSAHg/NhkrSXicUP42GGRpiouq8e0qbERddsKryyg3TQb1mqoLj5shg/70nBxXY tDoVc0Len0xxg== Received: by paulmck-ThinkPad-P17-Gen-1.home (Postfix, from userid 1000) id 5958C5C1AE0; Wed, 4 Jan 2023 16:51:27 -0800 (PST) From: "Paul E. McKenney" To: rcu@vger.kernel.org Cc: linux-kernel@vger.kernel.org, kernel-team@meta.com, rostedt@goodmis.org, "Paul E. McKenney" , Dave Chinner , Dmitry Vyukov Subject: [PATCH rcu 6/6] rcu: Allow up to five minutes expedited RCU CPU stall-warning timeouts Date: Wed, 4 Jan 2023 16:51:26 -0800 Message-Id: <20230105005126.1772294-6-paulmck@kernel.org> X-Mailer: git-send-email 2.31.1.189.g2e36527f23 In-Reply-To: <20230105005105.GA1772125@paulmck-ThinkPad-P17-Gen-1> References: <20230105005105.GA1772125@paulmck-ThinkPad-P17-Gen-1> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: rcu@vger.kernel.org The maximum value of RCU CPU stall-warning timeouts has historically been five minutes (300 seconds). However, the recently introduced expedited RCU CPU stall-warning timeout is instead limited to 21 seconds. This causes problems for CI/fuzzing services such as syzkaller by obscuring the issue in question with expedited RCU CPU stall-warning timeout splats. This commit therefore sets the RCU_EXP_CPU_STALL_TIMEOUT Kconfig options upper bound to 300000 milliseconds, which is 300 seconds and 5 minutes. [ paulmck: Apply feedback from Hillf Danton. ] Reported-by: Dave Chinner Reported-by: Dmitry Vyukov Tested-by: Dmitry Vyukov Signed-off-by: Paul E. McKenney --- kernel/rcu/Kconfig.debug | 2 +- kernel/rcu/tree_stall.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug index 49da904df6aa6..2984de629f749 100644 --- a/kernel/rcu/Kconfig.debug +++ b/kernel/rcu/Kconfig.debug @@ -82,7 +82,7 @@ config RCU_CPU_STALL_TIMEOUT config RCU_EXP_CPU_STALL_TIMEOUT int "Expedited RCU CPU stall timeout in milliseconds" depends on RCU_STALL_COMMON - range 0 21000 + range 0 300000 default 0 help If a given expedited RCU grace period extends more than the diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h index f360894f5599d..b10b8349bb2a4 100644 --- a/kernel/rcu/tree_stall.h +++ b/kernel/rcu/tree_stall.h @@ -39,7 +39,7 @@ int rcu_exp_jiffies_till_stall_check(void) // CONFIG_RCU_EXP_CPU_STALL_TIMEOUT, so check the allowed range. // The minimum clamped value is "2UL", because at least one full // tick has to be guaranteed. - till_stall_check = clamp(msecs_to_jiffies(cpu_stall_timeout), 2UL, 21UL * HZ); + till_stall_check = clamp(msecs_to_jiffies(cpu_stall_timeout), 2UL, 300UL * HZ); if (cpu_stall_timeout && jiffies_to_msecs(till_stall_check) != cpu_stall_timeout) WRITE_ONCE(rcu_exp_cpu_stall_timeout, jiffies_to_msecs(till_stall_check));