Message ID | 20240713165846.216174-2-neeraj.upadhyay@kernel.org (mailing list archive) |
---|---|
State | Superseded |
Series | CSD-lock diagnostics enhancements |
On Sat, 2024-07-13 at 22:28 +0530, neeraj.upadhyay@kernel.org wrote:
>
> @@ -228,6 +241,7 @@ static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, in
>  		cpu = csd_lock_wait_getcpu(csd);
>  		pr_alert("csd: CSD lock (#%d) got unstuck on CPU#%02d, CPU#%02d released the lock.\n",
>  			 *bug_id, raw_smp_processor_id(), cpu);
> +		atomic_dec(&n_csd_lock_stuck);
>  		return true;
>  	}
>

So we decrement it when it gets unstuck. Good.

> @@ -251,6 +265,8 @@ static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, in
>  	pr_alert("csd: %s non-responsive CSD lock (#%d) on CPU#%d, waiting %lld ns for CPU#%02d %pS(%ps).\n",
>  		 firsttime ? "Detected" : "Continued", *bug_id, raw_smp_processor_id(), (s64)ts_delta,
>  		 cpu, csd->func, csd->info);
> +	if (firsttime)
> +		atomic_dec(&n_csd_lock_stuck);
>

However, I don't see any place where it is incremented when things
get stuck, and this line decrements it when a CPU gets stuck for
the first time?

Should this be an atomic_inc?
On Sat, Jul 13, 2024 at 01:16:47PM -0400, Rik van Riel wrote:
> However, I don't see any place where it is incremented when things
> get stuck, and this line decrements it when a CPU gets stuck for
> the first time?
>
> Should this be an atomic_inc?

Good catch, thank you! I will go get that brown paper bag...

Thanx, Paul
```diff
diff --git a/include/linux/smp.h b/include/linux/smp.h
index fcd61dfe2af3..3871bd32018f 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -294,4 +294,10 @@ int smpcfd_prepare_cpu(unsigned int cpu);
 int smpcfd_dead_cpu(unsigned int cpu);
 int smpcfd_dying_cpu(unsigned int cpu);
 
+#ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
+bool csd_lock_is_stuck(void);
+#else
+static inline bool csd_lock_is_stuck(void) { return false; }
+#endif
+
 #endif /* __LINUX_SMP_H */
diff --git a/kernel/smp.c b/kernel/smp.c
index 81f7083a53e2..c3e8241e9cbb 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -207,6 +207,19 @@ static int csd_lock_wait_getcpu(call_single_data_t *csd)
 	return -1;
 }
 
+static atomic_t n_csd_lock_stuck;
+
+/**
+ * csd_lock_is_stuck - Has a CSD-lock acquisition been stuck too long?
+ *
+ * Returns @true if a CSD-lock acquisition is stuck and has been stuck
+ * long enough for a "non-responsive CSD lock" message to be printed.
+ */
+bool csd_lock_is_stuck(void)
+{
+	return !!atomic_read(&n_csd_lock_stuck);
+}
+
 /*
  * Complain if too much time spent waiting. Note that only
  * the CSD_TYPE_SYNC/ASYNC types provide the destination CPU,
@@ -228,6 +241,7 @@ static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, in
 		cpu = csd_lock_wait_getcpu(csd);
 		pr_alert("csd: CSD lock (#%d) got unstuck on CPU#%02d, CPU#%02d released the lock.\n",
 			 *bug_id, raw_smp_processor_id(), cpu);
+		atomic_dec(&n_csd_lock_stuck);
 		return true;
 	}
 
@@ -251,6 +265,8 @@ static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, in
 	pr_alert("csd: %s non-responsive CSD lock (#%d) on CPU#%d, waiting %lld ns for CPU#%02d %pS(%ps).\n",
 		 firsttime ? "Detected" : "Continued", *bug_id, raw_smp_processor_id(), (s64)ts_delta,
 		 cpu, csd->func, csd->info);
+	if (firsttime)
+		atomic_dec(&n_csd_lock_stuck);
 	/*
 	 * If the CSD lock is still stuck after 5 minutes, it is unlikely
 	 * to become unstuck. Use a signed comparison to avoid triggering
```