From patchwork Fri Aug 16 04:39:14 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Neeraj Upadhyay
X-Patchwork-Id: 13765435
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3B36177111;
	Fri, 16 Aug 2024 04:39:56 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1723783198; cv=none;
	b=kCNixKUREk88vpQJVuC4jr800P6zby/UyHiP+O7FyamlD8rnaAHaBJL7jXv6sObNWUMLTELRFaM+GjdP+f8tQYjTdHEVgMTXv8vsFVwzDAPF3XIRzLtzV2O7j5YNmGhMla/4CpYgowMkUYBvZrAANKlyGvdYjFFDj4I6qeJvUsI=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1723783198; c=relaxed/simple;
	bh=EQ7mddlc0hf+YaM1u17yUZvNq42jgh8Dic8l77MpinI=;
	h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References:
	MIME-Version;
	b=jB90XMA4HOA+VVAuMTpq4+cjFdvPY5DmvOybYqjyl2B40Kws3TmJdCh3/pOnQMwOzQto4YxN7yyyU6U/I2PPvOy5UOEZvnEVzzm5tWCkRXaFcbHN/0NjlaC7RQiWOBiGiAS425rof4GY2H28zp1SIshTuzw3wVxEJR1qeIgl9pI=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=A4VWccQx;
	arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="A4VWccQx"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 012DAC32782;
	Fri, 16 Aug 2024 04:39:47 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1723783196;
	bh=EQ7mddlc0hf+YaM1u17yUZvNq42jgh8Dic8l77MpinI=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
	b=A4VWccQxvsZoXHT7JlT10k6GxP55bWAnR5Uf0t0vDxdMUWltkNnkDEL2akWGX+HSn
	 KnGIuLkxUFa1k3Y1GxJ3WVLd/sK/2qbXhiYmy0+cpmnAuxAOGl9CjvG1VyXtRVN5BY
	 HUCo1XXN3HAtVBfy5l9hTUhWR+7D6YG8X8Ayc+BnXaweFa+BwM9Pys7ztwWiHJ69UJ
	 7/n+dtuxbeIe5BiYVUxjK2+2YNukcgPYgIUE+zkkPivRPlhubp0esth4r0UjgbeiFU
	 xWnReSUXlDipjyoHmBRPJTRsky8luwpmX7nuRcZ1XMkKOhFtXLOhPmOzRDzbgSYVfJ
	 6WLVh+D9hYpEg==
From: neeraj.upadhyay@kernel.org
To: rcu@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, kernel-team@meta.com, rostedt@goodmis.org,
	paulmck@kernel.org, neeraj.upadhyay@kernel.org, neeraj.upadhyay@amd.com,
	boqun.feng@gmail.com, joel@joelfernandes.org, urezki@gmail.com,
	frederic@kernel.org, mingo@kernel.org, peterz@infradead.org,
	leobras@redhat.com, imran.f.khan@oracle.com, riel@surriel.com,
	tglx@linutronix.de
Subject: [PATCH rcu 1/4] locking/csd_lock: Print large numbers as negatives
Date: Fri, 16 Aug 2024 10:09:14 +0530
Message-Id: <20240816043917.26537-1-neeraj.upadhyay@kernel.org>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20240816043600.GA25206@neeraj.linux>
References: <20240816043600.GA25206@neeraj.linux>
Precedence: bulk
X-Mailing-List: rcu@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0

From: "Paul E. McKenney"

The CSD-lock-hold diagnostics from CONFIG_CSD_LOCK_WAIT_DEBUG are
printed in nanoseconds as unsigned long longs, which is a bit obtuse for
human readers when timing bugs result in negative CSD-lock hold times.
Yes, there are some people to whom it is immediately obvious that
18446744073709551615 is really -1, but for the rest of us...

Therefore, print these numbers as signed long longs, making the negative
hold times immediately apparent.

Reported-by: Rik van Riel
Signed-off-by: Paul E. McKenney
Cc: Imran Khan
Cc: Ingo Molnar
Cc: Leonardo Bras
Cc: "Peter Zijlstra (Intel)"
Cc: Rik van Riel
Reviewed-by: Rik van Riel
Signed-off-by: Neeraj Upadhyay
---
 kernel/smp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index aaffecdad319..e87953729230 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -249,8 +249,8 @@ static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, in
 	cpu_cur_csd = smp_load_acquire(&per_cpu(cur_csd, cpux)); /* Before func and info. */
 	/* How long since this CSD lock was stuck. */
 	ts_delta = ts2 - ts0;
-	pr_alert("csd: %s non-responsive CSD lock (#%d) on CPU#%d, waiting %llu ns for CPU#%02d %pS(%ps).\n",
-		 firsttime ? "Detected" : "Continued", *bug_id, raw_smp_processor_id(), ts_delta,
+	pr_alert("csd: %s non-responsive CSD lock (#%d) on CPU#%d, waiting %lld ns for CPU#%02d %pS(%ps).\n",
+		 firsttime ? "Detected" : "Continued", *bug_id, raw_smp_processor_id(), (s64)ts_delta,
 		 cpu, csd->func, csd->info);
 	/*
 	 * If the CSD lock is still stuck after 5 minutes, it is unlikely

From patchwork Fri Aug 16 04:39:15 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Neeraj Upadhyay
X-Patchwork-Id: 13765436
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id CAD837D07E;
	Fri, 16 Aug 2024 04:40:03 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1723783203; cv=none;
	b=TkmHmUQC0ucdMhrvBh0oKRKZKER/Sl+ZUrHKoP3LxqIov27dSciz/d9DkpN3HUVBpSSgrPU/Tax9d84QRT43Inhx+kH2RLYj6zp0RooYL3fZrOcyyq4gJ65iwOXVyZZ+1mDEcrh9KNNvz+vEc3celcgE7uw+DiXmu4c8rCLlr9c=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1723783203; c=relaxed/simple;
	bh=J89s7BvJjaodUbVqWIXuLyWtkXirVXq4up0wxHeWgqw=;
	h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References:
	MIME-Version;
	b=NSUTaLlImmNhwPfMCLLUuSsvmmZKQ463+xopUcz15iG3xy5OdLL56ufwL2+rFUiFvFV9sYG/McaINh6pkVyjVeGq2FJFQ/dZPnZoYAaTp9ZppmD7Z7G+mngXiNdKqlu+VDhAsGgVYOEGTe6NQJBe2SGaNaPxONyzNddXa0ib/lw=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=RkZ4HP88;
	arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="RkZ4HP88"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 61278C4AF0E;
	Fri, 16 Aug 2024 04:39:57 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1723783203;
	bh=J89s7BvJjaodUbVqWIXuLyWtkXirVXq4up0wxHeWgqw=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
	b=RkZ4HP88N78dCZAVe35PCZTIyGNM95HZt0VsuJbh2C8s0mK3XxkNdiJcpBSCt22Or
	 J3+w+iKZb7BluE5Cp09vdLhUnzlCN6wERR2ziEiuUX7NWFNY3FdGYO+9Z9/WWn+k98
	 B1kg6IZmQPVP2zB+QprbxcWB1mOBM32vGBlMd56I0Dj6ZW8v8vnZIGb3/YXGosG2mT
	 WylPPpsJefVeXKH/84sLVSwcGCjLWIiR8qeZkrV4UDKNboXc+pC/vQiMMeF5sYyb+U
	 8Grzv4iKrVatpUUO8s+mrxROfjOqG2JKQnDlU9LDsdsAH9ZNWVmAXsmD5AI6MHhjbE
	 Phg+33rqHZn+g==
From: neeraj.upadhyay@kernel.org
To: rcu@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, kernel-team@meta.com, rostedt@goodmis.org,
	paulmck@kernel.org, neeraj.upadhyay@kernel.org, neeraj.upadhyay@amd.com,
	boqun.feng@gmail.com, joel@joelfernandes.org, urezki@gmail.com,
	frederic@kernel.org, mingo@kernel.org, peterz@infradead.org,
	leobras@redhat.com, imran.f.khan@oracle.com, riel@surriel.com,
	tglx@linutronix.de
Subject: [PATCH rcu 2/4] locking/csd_lock: Provide an indication of ongoing CSD-lock stall
Date: Fri, 16 Aug 2024 10:09:15 +0530
Message-Id: <20240816043917.26537-2-neeraj.upadhyay@kernel.org>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20240816043600.GA25206@neeraj.linux>
References: <20240816043600.GA25206@neeraj.linux>
Precedence: bulk
X-Mailing-List: rcu@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0

From: "Paul E. McKenney"

If a CSD-lock stall goes on long enough, it will cause an RCU CPU
stall warning. This additional warning provides much additional
console-log traffic and little additional information. Therefore,
provide a new csd_lock_is_stuck() function that returns true if there
is an ongoing CSD-lock stall. This function will be used by the RCU
CPU stall warnings to provide a one-line indication of the stall when
this function returns true.

[ neeraj.upadhyay: Apply Rik van Riel feedback. ]
[ neeraj.upadhyay: Apply kernel test robot feedback. ]

Signed-off-by: Paul E. McKenney
Cc: Imran Khan
Cc: Ingo Molnar
Cc: Leonardo Bras
Cc: "Peter Zijlstra (Intel)"
Cc: Rik van Riel
Signed-off-by: Neeraj Upadhyay
---
 include/linux/smp.h |  6 ++++++
 kernel/smp.c        | 16 ++++++++++++++++
 lib/Kconfig.debug   |  1 +
 3 files changed, 23 insertions(+)

diff --git a/include/linux/smp.h b/include/linux/smp.h
index fcd61dfe2af3..3871bd32018f 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -294,4 +294,10 @@ int smpcfd_prepare_cpu(unsigned int cpu);
 int smpcfd_dead_cpu(unsigned int cpu);
 int smpcfd_dying_cpu(unsigned int cpu);
 
+#ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
+bool csd_lock_is_stuck(void);
+#else
+static inline bool csd_lock_is_stuck(void) { return false; }
+#endif
+
 #endif /* __LINUX_SMP_H */
diff --git a/kernel/smp.c b/kernel/smp.c
index e87953729230..202cda4d2a55 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -208,6 +208,19 @@ static int csd_lock_wait_getcpu(call_single_data_t *csd)
 	return -1;
 }
 
+static atomic_t n_csd_lock_stuck;
+
+/**
+ * csd_lock_is_stuck - Has a CSD-lock acquisition been stuck too long?
+ *
+ * Returns @true if a CSD-lock acquisition is stuck and has been stuck
+ * long enough for a "non-responsive CSD lock" message to be printed.
+ */
+bool csd_lock_is_stuck(void)
+{
+	return !!atomic_read(&n_csd_lock_stuck);
+}
+
 /*
  * Complain if too much time spent waiting. Note that only
  * the CSD_TYPE_SYNC/ASYNC types provide the destination CPU,
@@ -229,6 +242,7 @@ static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, in
 		cpu = csd_lock_wait_getcpu(csd);
 		pr_alert("csd: CSD lock (#%d) got unstuck on CPU#%02d, CPU#%02d released the lock.\n",
 			 *bug_id, raw_smp_processor_id(), cpu);
+		atomic_dec(&n_csd_lock_stuck);
 		return true;
 	}
 
@@ -252,6 +266,8 @@ static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, in
 	pr_alert("csd: %s non-responsive CSD lock (#%d) on CPU#%d, waiting %lld ns for CPU#%02d %pS(%ps).\n",
 		 firsttime ? "Detected" : "Continued", *bug_id, raw_smp_processor_id(), (s64)ts_delta,
 		 cpu, csd->func, csd->info);
+	if (firsttime)
+		atomic_inc(&n_csd_lock_stuck);
 	/*
 	 * If the CSD lock is still stuck after 5 minutes, it is unlikely
 	 * to become unstuck. Use a signed comparison to avoid triggering
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index a30c03a66172..4e5f61cba8e4 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1614,6 +1614,7 @@ config SCF_TORTURE_TEST
 config CSD_LOCK_WAIT_DEBUG
 	bool "Debugging for csd_lock_wait(), called from smp_call_function*()"
 	depends on DEBUG_KERNEL
+	depends on SMP
 	depends on 64BIT
 	default n
 	help

From patchwork Fri Aug 16 04:39:16 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Neeraj Upadhyay
X-Patchwork-Id: 13765437
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7E6FC33CD2;
	Fri, 16 Aug 2024 04:40:10 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1723783210; cv=none;
	b=P1JPvFb+WGRbYb8pAPSZkjikFEzPpRYoc2HzJ5IS8L2gGesBxC9QZrwVqhN1TepsNVdWghwFRqdYepBzg864StmggLTF9OQMCwnu/95EH3DrYSbMlWgu3v72MF7AabEVcxH8W2xrkybe8EHmKdPEXBUbqklc9r82ft217UKX1b8=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1723783210; c=relaxed/simple;
	bh=mlV2aeJQvODcnjObCuPPtD0W0gOiPC0drbmegr+nbl4=;
	h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References:
	MIME-Version;
	b=tUMoH0Pk+2d0mMQhEFLd6TkNFsKvYYrJWq7KYelMCMerz72/5tXToBQEgKP1ujOmTGs/PB38INQW28X9Rx5MzEX6xWjJWToojNRTAStldpdHTUj414CjMyMxZuE68Aa0nqVfHG0sxTnypBXZLESvAgH36hozFnyFcEtBM/0qBHM=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=Lt1i4s22;
	arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Lt1i4s22"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 12716C4AF0B;
	Fri, 16 Aug 2024 04:40:03 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1723783210;
	bh=mlV2aeJQvODcnjObCuPPtD0W0gOiPC0drbmegr+nbl4=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
	b=Lt1i4s22YAsJqB0QZCGmDithVDGt0yR2qei2DjNEmsqu6U9fuiekb3kU2E6y53ZWi
	 JIF9TqZZBW3ayXD7vFV+HBkwk7uvOa6rOT4WO6ncF8R5ylMSfV+NHGHUXHd+V9P4wS
	 gsv59jsPJd3Pnz6Et8JLPyuCcHjDDImYHDk3kjQicNxUYgJuzzJBw4O/OJYC2M/b/A
	 fczidfRC2wjfSsjIO5n9cTbvOC3AeI/TbTctDUuqHQviHrUZwGcMskbj4meyreKXqV
	 cQRm8w1uauTXk2BQv0CT6FAIo86p27LTGFGeEi9OUld/VUTnwJdDRR9G3Srj7NsCek
	 z8tNpk+1QYTRg==
From: neeraj.upadhyay@kernel.org
To: rcu@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, kernel-team@meta.com, rostedt@goodmis.org,
	paulmck@kernel.org, neeraj.upadhyay@kernel.org, neeraj.upadhyay@amd.com,
	boqun.feng@gmail.com, joel@joelfernandes.org, urezki@gmail.com,
	frederic@kernel.org, mingo@kernel.org, peterz@infradead.org,
	leobras@redhat.com, imran.f.khan@oracle.com, riel@surriel.com,
	tglx@linutronix.de
Subject: [PATCH rcu 3/4] locking/csd-lock: Use backoff for repeated reports of same incident
Date: Fri, 16 Aug 2024 10:09:16 +0530
Message-Id: <20240816043917.26537-3-neeraj.upadhyay@kernel.org>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20240816043600.GA25206@neeraj.linux>
References: <20240816043600.GA25206@neeraj.linux>
Precedence: bulk
X-Mailing-List: rcu@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0

From: "Paul E. McKenney"

Currently, the CSD-lock diagnostics in CONFIG_CSD_LOCK_WAIT_DEBUG=y
kernels are emitted at five-second intervals. Although this has proven
to be a good time interval for the first diagnostic, if the target CPU
keeps interrupts disabled for way longer than five seconds, the ratio
of useful new information to pointless repetition decreases considerably.

Therefore, back off the time period for repeated reports of the same
incident, increasing linearly with the number of reports and
logarithmically with the number of online CPUs.

[ paulmck: Apply Dan Carpenter feedback. ]

Signed-off-by: Paul E. McKenney
Cc: Imran Khan
Cc: Ingo Molnar
Cc: Leonardo Bras
Cc: "Peter Zijlstra (Intel)"
Cc: Rik van Riel
Reviewed-by: Rik van Riel
Signed-off-by: Neeraj Upadhyay
---
 kernel/smp.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 202cda4d2a55..b484ee6dcaf6 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -226,7 +226,7 @@ bool csd_lock_is_stuck(void)
  * the CSD_TYPE_SYNC/ASYNC types provide the destination CPU,
  * so waiting on other types gets much less information.
  */
-static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, int *bug_id)
+static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, int *bug_id, unsigned long *nmessages)
 {
 	int cpu = -1;
 	int cpux;
@@ -249,7 +249,9 @@ static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, in
 	ts2 = sched_clock();
 	/* How long since we last checked for a stuck CSD lock.*/
 	ts_delta = ts2 - *ts1;
-	if (likely(ts_delta <= csd_lock_timeout_ns || csd_lock_timeout_ns == 0))
+	if (likely(ts_delta <= csd_lock_timeout_ns * (*nmessages + 1) *
+			       (!*nmessages ? 1 : (ilog2(num_online_cpus()) / 2 + 1)) ||
+		   csd_lock_timeout_ns == 0))
 		return false;
 
 	firsttime = !*bug_id;
@@ -266,6 +268,7 @@ static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, in
 	pr_alert("csd: %s non-responsive CSD lock (#%d) on CPU#%d, waiting %lld ns for CPU#%02d %pS(%ps).\n",
 		 firsttime ? "Detected" : "Continued", *bug_id, raw_smp_processor_id(), (s64)ts_delta,
 		 cpu, csd->func, csd->info);
+	(*nmessages)++;
 	if (firsttime)
 		atomic_inc(&n_csd_lock_stuck);
 	/*
@@ -306,12 +309,13 @@ static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, in
  */
 static void __csd_lock_wait(call_single_data_t *csd)
 {
+	unsigned long nmessages = 0;
 	int bug_id = 0;
 	u64 ts0, ts1;
 
 	ts1 = ts0 = sched_clock();
 	for (;;) {
-		if (csd_lock_wait_toolong(csd, ts0, &ts1, &bug_id))
+		if (csd_lock_wait_toolong(csd, ts0, &ts1, &bug_id, &nmessages))
 			break;
 		cpu_relax();
 	}

From patchwork Fri Aug 16 04:39:17 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Neeraj Upadhyay
X-Patchwork-Id: 13765438
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id ECB4178C7F;
	Fri, 16 Aug 2024 04:40:16 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1723783217; cv=none;
	b=UElgKUQZ8XYejqfYJONQFz6y7Dv/ikCdSJilmVRz4a+HkCamEJe9NQN175wS8gpeIUHNNCSteV+QqCEi2wAtc/sgQpFZSp2MXmcfLJ5R6Fu6HfDc8Kjb1eET92piNgOJdOr/m+Xz4UrMlyRMXUt2XtYtx/taGKfnZGd8DzzM2rY=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1723783217; c=relaxed/simple;
	bh=Hm+00HyEo0NUhcopTYyewIN1gBN8VeHMjOGtWEtmn+E=;
	h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References:
	MIME-Version;
	b=cpJRtpUeJGNG4PmEqJSBKDpWftKa3+8eq4Oluz4d4sMQNka07AcEpRJltzme2icavuWXH8UqPKVLz6/Qo7jRFwVL2QbqTPhLqNVhLO6bPDzided7TpEy/kw+7XS53XRF+FqhHrqDEKMbCDu9FZxXZNmSryEu7Nnmlx5IuvlxGy8=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=Nk33G0lY;
	arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Nk33G0lY"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id C706FC4AF0F;
	Fri, 16 Aug 2024 04:40:10 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1723783216;
	bh=Hm+00HyEo0NUhcopTYyewIN1gBN8VeHMjOGtWEtmn+E=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
	b=Nk33G0lY9gQjk/vtyj3QlMfl2o3aPhX7CXOBrPpbNloWYwkabivJjm5QFzO+L2ahO
	 Qpt5WShIRKJsUpfxsYEJvrK1EapfX9p65jzbVnuFyGY4+hc+eXirAHa52tH7tVy3Ah
	 Ie9o+ZTAp+oxKlUOt7sXZBeGR0ta2/cUB7OQgZ0xPSc+L6SMB0tyG7m+A3eGLnnxqy
	 Mh9LxtPLxKFA8JojoJyspGyrG7o27nvHT1InFbE1z1cxCzT5XyrOXevIhBzcZ87Cbg
	 v+mBCKAVbF2xKJ3E56pAwV4z6V51FugnF7fm+Y39JdbV4AJcM2LXqbyIH8Mbnp73+g
	 2WOmNRNg/gFuw==
From: neeraj.upadhyay@kernel.org
To: rcu@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, kernel-team@meta.com, rostedt@goodmis.org,
	paulmck@kernel.org, neeraj.upadhyay@kernel.org, neeraj.upadhyay@amd.com,
	boqun.feng@gmail.com, joel@joelfernandes.org, urezki@gmail.com,
	frederic@kernel.org, mingo@kernel.org, peterz@infradead.org,
	leobras@redhat.com, imran.f.khan@oracle.com, riel@surriel.com,
	tglx@linutronix.de
Subject: [PATCH rcu 4/4] smp: print only local CPU info when sched_clock goes backward
Date: Fri, 16 Aug 2024 10:09:17 +0530
Message-Id: <20240816043917.26537-4-neeraj.upadhyay@kernel.org>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20240816043600.GA25206@neeraj.linux>
References: <20240816043600.GA25206@neeraj.linux>
Precedence: bulk
X-Mailing-List: rcu@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0

From: Rik van Riel

About 40% of all csd_lock warnings observed in our fleet appear to
be due to sched_clock() going backward in time (usually only a little
bit), resulting in ts0 being larger than ts2.

When the local CPU is at fault, we should print out a message reflecting
that, rather than trying to get the remote CPU's stack trace.

Signed-off-by: Rik van Riel
Tested-by: "Paul E. McKenney"
Signed-off-by: Neeraj Upadhyay
---
 kernel/smp.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/kernel/smp.c b/kernel/smp.c
index b484ee6dcaf6..f25e20617b7e 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -254,6 +254,14 @@ static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, in
 		   csd_lock_timeout_ns == 0))
 		return false;
 
+	if (ts0 > ts2) {
+		/* Our own sched_clock went backward; don't blame another CPU. */
+		ts_delta = ts0 - ts2;
+		pr_alert("sched_clock on CPU %d went backward by %llu ns\n", raw_smp_processor_id(), ts_delta);
+		*ts1 = ts2;
+		return false;
+	}
+
 	firsttime = !*bug_id;
 	if (firsttime)
 		*bug_id = atomic_inc_return(&csd_bug_count);
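
As a worked illustration of the reporting rules in this series, the following userspace C sketch mirrors the backoff threshold from patch 3/4 and the backward-clock test from patch 4/4. It is not kernel code and is only a sketch under stated assumptions: ilog2_floor(), report_threshold_ns() and clock_went_backward() are hypothetical helper names introduced here, ilog2_floor() is a stand-in for the kernel's ilog2(), and the timeout and CPU-count values mentioned afterwards are illustrative inputs rather than measured data.

```c
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for the kernel's ilog2():
 * floor(log2(n)) for n >= 1 (returns 0 for n == 0 here).
 */
unsigned int ilog2_floor(unsigned int n)
{
	unsigned int r = 0;

	while (n >>= 1)
		r++;
	return r;
}

/*
 * Time (ns) that must elapse since the last check before another report
 * is printed, mirroring the expression patch 3/4 adds:
 *
 *   timeout * (nmessages + 1) * (nmessages ? ilog2(ncpus) / 2 + 1 : 1)
 *
 * The kernel additionally skips reporting entirely when the timeout is
 * zero; that case is not modeled here.
 */
uint64_t report_threshold_ns(uint64_t timeout_ns, unsigned long nmessages,
			     unsigned int ncpus)
{
	uint64_t cpu_factor = nmessages ? ilog2_floor(ncpus) / 2 + 1 : 1;

	return timeout_ns * (nmessages + 1) * cpu_factor;
}

/*
 * Patch 4/4's test: if the wait's start timestamp (ts0) is later than
 * the current timestamp (ts2), the local clock went backward.
 */
int clock_went_backward(uint64_t ts0, uint64_t ts2)
{
	return ts0 > ts2;
}
```

Assuming the five-second interval mentioned in the 3/4 changelog and a hypothetical 64 online CPUs, the CPU factor is ilog2(64) / 2 + 1 = 4. The first report then needs 5 s since the last check, the second 5 s * 2 * 4 = 40 s, and the third 5 s * 3 * 4 = 60 s. With a single online CPU the factor stays at 1, so only the linear term applies.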