
[v2,1/4] nmi_backtrace: add more trigger_*_cpu_backtrace() methods

Message ID 1458147733-29338-2-git-send-email-cmetcalf@mellanox.com (mailing list archive)
State New, archived

Commit Message

Chris Metcalf March 16, 2016, 5:02 p.m. UTC
Currently you can only request a backtrace of either all cpus, or
all cpus but yourself.  It can also be helpful to request a remote
backtrace of a single cpu, and since we want that, the logical
extension is to support a cpumask as the underlying primitive.

This change modifies the existing lib/nmi_backtrace.c code to take
a cpumask as its basic primitive, and modifies the linux/nmi.h code
to use either the old "all/all_but_self" arch methods, or the new
"cpumask" method, depending on which is available.

The existing clients of nmi_backtrace (arm and x86) are converted
to using the new cpumask approach in this change.

Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
---
 arch/arm/include/asm/irq.h    |  4 +--
 arch/arm/kernel/smp.c         |  4 +--
 arch/x86/include/asm/irq.h    |  4 +--
 arch/x86/kernel/apic/hw_nmi.c |  6 ++---
 include/linux/nmi.h           | 63 ++++++++++++++++++++++++++++++++++---------
 lib/nmi_backtrace.c           | 15 +++++------
 6 files changed, 65 insertions(+), 31 deletions(-)
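
As a usage sketch (hypothetical caller, not part of this series): with
the cpumask primitive in place, a diagnostic path can ask one remote
cpu for its stack via the trigger_single_cpu_backtrace() helper defined
in the include/linux/nmi.h hunk below, degrading gracefully when the
architecture provides no backtrace method:

#include <linux/nmi.h>
#include <linux/printk.h>

/* Hypothetical diagnostic path: request one remote cpu's stack.
 * trigger_single_cpu_backtrace() returns false when the architecture
 * supplies no backtrace method, so callers can fall back. */
static void report_stuck_cpu(int cpu)
{
        if (!trigger_single_cpu_backtrace(cpu))
                pr_warn("cpu %d: no NMI backtrace support\n", cpu);
}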

Comments

Peter Zijlstra March 17, 2016, 7:36 p.m. UTC | #1
On Wed, Mar 16, 2016 at 01:02:10PM -0400, Chris Metcalf wrote:
> Currently you can only request a backtrace of either all cpus, or
> all cpus but yourself.  It can also be helpful to request a remote
> backtrace of a single cpu, and since we want that, the logical
> extension is to support a cpumask as the underlying primitive.
> 
> This change modifies the existing lib/nmi_backtrace.c code to take
> a cpumask as its basic primitive, and modifies the linux/nmi.h code
> to use either the old "all/all_but_self" arch methods, or the new
> "cpumask" method, depending on which is available.
> 
> The existing clients of nmi_backtrace (arm and x86) are converted
> to using the new cpumask approach in this change.

So for the past few days I've been staring at RCU stall warnings, and
they can use a little of this. Their remote stack unwinds are less than
useful.
Chris Metcalf March 17, 2016, 10:31 p.m. UTC | #2
On 3/17/2016 3:36 PM, Peter Zijlstra wrote:
> On Wed, Mar 16, 2016 at 01:02:10PM -0400, Chris Metcalf wrote:
>> Currently you can only request a backtrace of either all cpus, or
>> all cpus but yourself.  It can also be helpful to request a remote
>> backtrace of a single cpu, and since we want that, the logical
>> extension is to support a cpumask as the underlying primitive.
>>
>> This change modifies the existing lib/nmi_backtrace.c code to take
>> a cpumask as its basic primitive, and modifies the linux/nmi.h code
>> to use either the old "all/all_but_self" arch methods, or the new
>> "cpumask" method, depending on which is available.
>>
>> The existing clients of nmi_backtrace (arm and x86) are converted
>> to using the new cpumask approach in this change.
> So for the past few days I've been staring at RCU stall warnings, and
> they can use a little of this. Their remote stack unwinds are less than
> useful.

Were you suggesting this as an improvement for a possible v3, or just a
kind of implicit ack of the patch series?  Thanks!
Peter Zijlstra March 17, 2016, 10:38 p.m. UTC | #3
On Thu, Mar 17, 2016 at 06:31:44PM -0400, Chris Metcalf wrote:
> On 3/17/2016 3:36 PM, Peter Zijlstra wrote:
> >On Wed, Mar 16, 2016 at 01:02:10PM -0400, Chris Metcalf wrote:
> >>Currently you can only request a backtrace of either all cpus, or
> >>all cpus but yourself.  It can also be helpful to request a remote
> >>backtrace of a single cpu, and since we want that, the logical
> >>extension is to support a cpumask as the underlying primitive.
> >>
> >>This change modifies the existing lib/nmi_backtrace.c code to take
> >>a cpumask as its basic primitive, and modifies the linux/nmi.h code
> >>to use either the old "all/all_but_self" arch methods, or the new
> >>"cpumask" method, depending on which is available.
> >>
> >>The existing clients of nmi_backtrace (arm and x86) are converted
> >>to using the new cpumask approach in this change.
> >So for the past few days I've been staring at RCU stall warnings, and
> >they can use a little of this. Their remote stack unwinds are less than
> >useful.
> 
> Were you suggesting this as an improvement for a possible v3, or just a
> kind of implicit ack of the patch series?  Thanks!

A suggestion more like. I've not actually looked at the 4th patch.

I'll try to fold the patches into the runs I do tomorrow; I'm sure to
trigger lots of fail. Maybe I'll even do that RCU patch.
Paul E. McKenney March 17, 2016, 10:55 p.m. UTC | #4
On Thu, Mar 17, 2016 at 08:36:00PM +0100, Peter Zijlstra wrote:
> On Wed, Mar 16, 2016 at 01:02:10PM -0400, Chris Metcalf wrote:
> > Currently you can only request a backtrace of either all cpus, or
> > all cpus but yourself.  It can also be helpful to request a remote
> > backtrace of a single cpu, and since we want that, the logical
> > extension is to support a cpumask as the underlying primitive.
> > 
> > This change modifies the existing lib/nmi_backtrace.c code to take
> > a cpumask as its basic primitive, and modifies the linux/nmi.h code
> > to use either the old "all/all_but_self" arch methods, or the new
> > "cpumask" method, depending on which is available.
> > 
> > The existing clients of nmi_backtrace (arm and x86) are converted
> > to using the new cpumask approach in this change.
> 
> So for the past few days I've been staring at RCU stall warnings, and
> they can use a little of this. Their remote stack unwinds are less than
> useful.

The RCU stall-warn stack traces can be ugly, agreed.

That said, RCU used to use NMI-based stack traces, but switched to the
current scheme due to the NMIs having the unfortunate habit of locking
things up, which IIRC often meant no stack traces at all.  If I recall
correctly, one of the problems was self-deadlock in printk().

Have these problems been fixed?

							Thanx, Paul
Peter Zijlstra March 17, 2016, 11:09 p.m. UTC | #5
On Thu, Mar 17, 2016 at 03:55:57PM -0700, Paul E. McKenney wrote:
> That said, RCU used to use NMI-based stack traces, but switched to the
> current scheme due to the NMIs having the unfortunate habit of locking
> things up, which IIRC often meant no stack traces at all.  If I recall
> correctly, one of the problems was self-deadlock in printk().
> 
> Have these problems been fixed?

Improved is I think the word.

Although I've butchered my printk() into absolute submission :-)
Peter Zijlstra March 17, 2016, 11:11 p.m. UTC | #6
On Thu, Mar 17, 2016 at 03:55:57PM -0700, Paul E. McKenney wrote:
> The RCU stall-warn stack traces can be ugly, agreed.

Ugly isn't the problem; completely random bollocks that puts you on the
wrong path was more the problem.

It uses a stack pointer saved at some random time in the past to start
unwinding an active stack from. Complete and utter misery.
Chris Metcalf March 18, 2016, 12:17 a.m. UTC | #7
On 3/17/2016 6:55 PM, Paul E. McKenney wrote:
> The RCU stall-warn stack traces can be ugly, agreed.
>
> That said, RCU used to use NMI-based stack traces, but switched to the
> current scheme due to the NMIs having the unfortunate habit of locking
> things up, which IIRC often meant no stack traces at all.  If I recall
> correctly, one of the problems was self-deadlock in printk().

Steven Rostedt enabled the per_cpu printk func support in June 2014, and
the nmi_backtrace code uses it to just capture printk output to percpu
buffers, so I think it's going to be a lot more robust than earlier attempts.
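
Roughly, that mechanism looks like this (a simplified sketch of the
lib/nmi_backtrace.c code of this era, not the verbatim source):

static int nmi_vprintk(const char *fmt, va_list args)
{
        struct nmi_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
        unsigned int len = seq_buf_used(&s->seq);

        /* Append to this cpu's buffer instead of the console. */
        seq_buf_vprintf(&s->seq, fmt, args);
        return seq_buf_used(&s->seq) - len;
}

bool nmi_cpu_backtrace(struct pt_regs *regs)
{
        int cpu = smp_processor_id();

        if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
                printk_func_t saved = this_cpu_read(printk_func);

                /* Divert printk() into the percpu buffer, dump, restore. */
                this_cpu_write(printk_func, nmi_vprintk);
                pr_warn("NMI backtrace for cpu %d\n", cpu);
                if (regs)
                        show_regs(regs);
                else
                        dump_stack();
                this_cpu_write(printk_func, saved);

                cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
                return true;
        }
        return false;
}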
Paul E. McKenney March 18, 2016, 12:28 a.m. UTC | #8
On Fri, Mar 18, 2016 at 12:11:28AM +0100, Peter Zijlstra wrote:
> On Thu, Mar 17, 2016 at 03:55:57PM -0700, Paul E. McKenney wrote:
> > The RCU stall-warn stack traces can be ugly, agreed.
> 
> Ugly isn't the problem; completely random bollocks that puts you on the
> wrong path was more the problem.
> 
> It uses a stack pointer saved at some random time in the past to start
> unwinding an active stack from. Complete and utter misery.

Yep, its accuracy does depend on what is going on, which was also my
experience with the NMI-based approach's reliability.

Perhaps a boot-time parameter enabling the sysadmin to pick the desired
flavor of poison?

							Thanx, Paul
Paul E. McKenney March 18, 2016, 12:33 a.m. UTC | #9
On Thu, Mar 17, 2016 at 08:17:59PM -0400, Chris Metcalf wrote:
> On 3/17/2016 6:55 PM, Paul E. McKenney wrote:
> >The RCU stall-warn stack traces can be ugly, agreed.
> >
> >That said, RCU used to use NMI-based stack traces, but switched to the
> >current scheme due to the NMIs having the unfortunate habit of locking
> >things up, which IIRC often meant no stack traces at all.  If I recall
> >correctly, one of the problems was self-deadlock in printk().
> 
> Steven Rostedt enabled the per_cpu printk func support in June 2014, and
> the nmi_backtrace code uses it to just capture printk output to percpu
> buffers, so I think it's going to be a lot more robust than earlier attempts.

That would be a very good thing, give or take the "I think" qualifier.
And assuming that the target CPU is healthy enough to find its way back
to some place that can dump the per-CPU printk buffer.  I might well
be overly paranoid, but I have to suspect that the probability of that
buffer getting dumped is reduced greatly on a CPU that isn't healthy
enough to respond to RCU, though.

But it seems like enabling the experiment might be useful.

"Try enabling the NMI version.  If that doesn't get you your RCU CPU
stall warning stack trace, try the remote-print variant."

Or I suppose we could just do both in succession, just in case their
console was a serial port.  ;-)

							Thanx, Paul
Daniel Thompson March 18, 2016, 9:40 a.m. UTC | #10
On 18/03/16 00:33, Paul E. McKenney wrote:
> On Thu, Mar 17, 2016 at 08:17:59PM -0400, Chris Metcalf wrote:
>> On 3/17/2016 6:55 PM, Paul E. McKenney wrote:
>>> The RCU stall-warn stack traces can be ugly, agreed.
>>>
>>> That said, RCU used to use NMI-based stack traces, but switched to the
>>> current scheme due to the NMIs having the unfortunate habit of locking
>>> things up, which IIRC often meant no stack traces at all.  If I recall
>>> correctly, one of the problems was self-deadlock in printk().
>>
>> Steven Rostedt enabled the per_cpu printk func support in June 2014, and
>> the nmi_backtrace code uses it to just capture printk output to percpu
>> buffers, so I think it's going to be a lot more robust than earlier attempts.
>
> That would be a very good thing, give or take the "I think" qualifier.
> And assuming that the target CPU is healthy enough to find its way back
> to some place that can dump the per-CPU printk buffer.  I might well
> be overly paranoid, but I have to suspect that the probability of that
> buffer getting dumped is reduced greatly on a CPU that isn't healthy
> enough to respond to RCU, though.

The target CPU doesn't dump the buffer. It "just" fields the NMI, stores 
the backtrace and sets a flag.

The buffer is dumped to console by the requesting CPU, either when all 
backtraces have come back or when a timeout is reached.
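
Concretely, the requesting side has roughly this shape (a simplified
sketch of nmi_trigger_cpumask_backtrace() from the patch below; the
print_all_seq_bufs() name is a stand-in for the flush loop, not a real
helper):

void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
                                   void (*raise)(cpumask_t *mask))
{
        int i;

        cpumask_copy(to_cpumask(backtrace_mask), mask);
        cpumask_copy(&printtrace_mask, to_cpumask(backtrace_mask));

        raise(to_cpumask(backtrace_mask));      /* send the NMIs */

        /* Each target clears its bit once its trace is stored;
         * give up after ~10 seconds on cpus that never respond. */
        for (i = 0; i < 10 * 1000; i++) {
                if (cpumask_empty(to_cpumask(backtrace_mask)))
                        break;
                mdelay(1);
        }

        /* Only now does anything touch the console: the requesting
         * cpu walks printtrace_mask and flushes each percpu buffer. */
        print_all_seq_bufs(&printtrace_mask);
}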


> But it seems like enabling the experiment might be useful.
>
> "Try enabling the NMI version.  If that doesn't get you your RCU CPU
> stall warning stack trace, try the remote-print variant."
>
> Or I suppose we could just do both in succession, just in case their
> console was a serial port.  ;-)

I guess both might be needed, but only when the target CPU is dead enough
to fail to respond to NMI. In principle, we could exploit the timeout in
the NMI backtrace logic and only issue the missing backtraces.
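
A sketch of that idea (hypothetical, not part of this series): once the
timeout expires, any bit still set in backtrace_mask belongs to a cpu
that never fielded its NMI, so fall back to the saved-stack remote dump
for just those cpus:

        int cpu;

        /* Hypothetical fallback: after the NMI timeout, backtrace_mask
         * holds only the cpus that never responded; use the existing
         * remote-print helper for those alone. */
        for_each_cpu(cpu, to_cpumask(backtrace_mask))
                dump_cpu_task(cpu);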


Daniel.
Paul E. McKenney March 18, 2016, 11:54 p.m. UTC | #11
On Fri, Mar 18, 2016 at 09:40:25AM +0000, Daniel Thompson wrote:
> On 18/03/16 00:33, Paul E. McKenney wrote:
> >On Thu, Mar 17, 2016 at 08:17:59PM -0400, Chris Metcalf wrote:
> >>On 3/17/2016 6:55 PM, Paul E. McKenney wrote:
> >>>The RCU stall-warn stack traces can be ugly, agreed.
> >>>
> >>>That said, RCU used to use NMI-based stack traces, but switched to the
> >>>current scheme due to the NMIs having the unfortunate habit of locking
> >>>things up, which IIRC often meant no stack traces at all.  If I recall
> >>>correctly, one of the problems was self-deadlock in printk().
> >>
> >>Steven Rostedt enabled the per_cpu printk func support in June 2014, and
> >>the nmi_backtrace code uses it to just capture printk output to percpu
> >>buffers, so I think it's going to be a lot more robust than earlier attempts.
> >
> >That would be a very good thing, give or take the "I think" qualifier.
> >And assuming that the target CPU is healthy enough to find its way back
> >to some place that can dump the per-CPU printk buffer.  I might well
> >be overly paranoid, but I have to suspect that the probability of that
> >buffer getting dumped is reduced greatly on a CPU that isn't healthy
> >enough to respond to RCU, though.
> 
> The target CPU doesn't dump the buffer. It "just" fields the NMI,
> stores the backtrace and sets a flag.
> 
> The buffer is dumped to console by the requesting CPU, either when
> all backtraces have come back or when a timeout is reached.

That does sound a bit more robust, good!

> >But it seems like enabling the experiment might be useful.
> >
> >"Try enabling the NMI version.  If that doesn't get you your RCU CPU
> >stall warning stack trace, try the remote-print variant."
> >
> >Or I suppose we could just do both in succession, just in case their
> >console was a serial port.  ;-)
> 
> I guess both might be needed but only when the target CPU is dead
> enough to fail to respond to NMI. In principle, we could exploit the
> timeout in the NMI backtrace logic and only issue the missing
> backtraces.

It would be really nice if I could call one function that used the
best strategy for getting information (including stack trace) about a
specified CPU.  Ditto for getting information about a specified task,
which might be running or might be preempted at the time.

							Thanx, Paul

Patch

diff --git a/arch/arm/include/asm/irq.h b/arch/arm/include/asm/irq.h
index 1bd9510de1b9..13f9a9a17eca 100644
--- a/arch/arm/include/asm/irq.h
+++ b/arch/arm/include/asm/irq.h
@@ -36,8 +36,8 @@  extern void set_handle_irq(void (*handle_irq)(struct pt_regs *));
 #endif
 
 #ifdef CONFIG_SMP
-extern void arch_trigger_all_cpu_backtrace(bool);
-#define arch_trigger_all_cpu_backtrace(x) arch_trigger_all_cpu_backtrace(x)
+extern void arch_trigger_cpumask_backtrace(const cpumask_t *mask);
+#define arch_trigger_cpumask_backtrace(x) arch_trigger_cpumask_backtrace(x)
 #endif
 
 static inline int nr_legacy_irqs(void)
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 37312f6749f3..208125658e56 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -758,7 +758,7 @@  static void raise_nmi(cpumask_t *mask)
 	smp_cross_call(mask, IPI_CPU_BACKTRACE);
 }
 
-void arch_trigger_all_cpu_backtrace(bool include_self)
+void arch_trigger_cpumask_backtrace(const cpumask_t *mask)
 {
-	nmi_trigger_all_cpu_backtrace(include_self, raise_nmi);
+	nmi_trigger_cpumask_backtrace(mask, raise_nmi);
 }
diff --git a/arch/x86/include/asm/irq.h b/arch/x86/include/asm/irq.h
index e7de5c9a4fbd..18bdc8cc5c63 100644
--- a/arch/x86/include/asm/irq.h
+++ b/arch/x86/include/asm/irq.h
@@ -50,8 +50,8 @@  extern int vector_used_by_percpu_irq(unsigned int vector);
 extern void init_ISA_irqs(void);
 
 #ifdef CONFIG_X86_LOCAL_APIC
-void arch_trigger_all_cpu_backtrace(bool);
-#define arch_trigger_all_cpu_backtrace arch_trigger_all_cpu_backtrace
+void arch_trigger_cpumask_backtrace(const struct cpumask *mask);
+#define arch_trigger_cpumask_backtrace arch_trigger_cpumask_backtrace
 #endif
 
 #endif /* _ASM_X86_IRQ_H */
diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
index 045e424fb368..63f0b69ad6a6 100644
--- a/arch/x86/kernel/apic/hw_nmi.c
+++ b/arch/x86/kernel/apic/hw_nmi.c
@@ -27,15 +27,15 @@  u64 hw_nmi_get_sample_period(int watchdog_thresh)
 }
 #endif
 
-#ifdef arch_trigger_all_cpu_backtrace
+#ifdef arch_trigger_cpumask_backtrace
 static void nmi_raise_cpu_backtrace(cpumask_t *mask)
 {
 	apic->send_IPI_mask(mask, NMI_VECTOR);
 }
 
-void arch_trigger_all_cpu_backtrace(bool include_self)
+void arch_trigger_cpumask_backtrace(const cpumask_t *mask)
 {
-	nmi_trigger_all_cpu_backtrace(include_self, nmi_raise_cpu_backtrace);
+	nmi_trigger_cpumask_backtrace(mask, nmi_raise_cpu_backtrace);
 }
 
 static int
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 7ec5b86735f3..951875f4f072 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -31,38 +31,75 @@  static inline void hardlockup_detector_disable(void) {}
 #endif
 
 /*
- * Create trigger_all_cpu_backtrace() out of the arch-provided
- * base function. Return whether such support was available,
+ * Create trigger_all_cpu_backtrace() etc out of the arch-provided
+ * base function(s). Return whether such support was available,
  * to allow calling code to fall back to some other mechanism:
  */
-#ifdef arch_trigger_all_cpu_backtrace
 static inline bool trigger_all_cpu_backtrace(void)
 {
+#if defined(arch_trigger_all_cpu_backtrace)
 	arch_trigger_all_cpu_backtrace(true);
-
 	return true;
+#elif defined(arch_trigger_cpumask_backtrace)
+	arch_trigger_cpumask_backtrace(cpu_online_mask);
+	return true;
+#else
+	return false;
+#endif
 }
+
 static inline bool trigger_allbutself_cpu_backtrace(void)
 {
+#if defined(arch_trigger_all_cpu_backtrace)
 	arch_trigger_all_cpu_backtrace(false);
 	return true;
-}
-
-/* generic implementation */
-void nmi_trigger_all_cpu_backtrace(bool include_self,
-				   void (*raise)(cpumask_t *mask));
-bool nmi_cpu_backtrace(struct pt_regs *regs);
+#elif defined(arch_trigger_cpumask_backtrace)
+	cpumask_var_t mask;
+	int cpu = get_cpu();
 
+	if (!alloc_cpumask_var(&mask, GFP_KERNEL))
+		return false;
+	cpumask_copy(mask, cpu_online_mask);
+	cpumask_clear_cpu(cpu, mask);
+	arch_trigger_cpumask_backtrace(mask);
+	put_cpu();
+	free_cpumask_var(mask);
+	return true;
 #else
-static inline bool trigger_all_cpu_backtrace(void)
-{
 	return false;
+#endif
 }
-static inline bool trigger_allbutself_cpu_backtrace(void)
+
+static inline bool trigger_cpumask_backtrace(struct cpumask *mask)
 {
+#if defined(arch_trigger_cpumask_backtrace)
+	arch_trigger_cpumask_backtrace(mask);
+	return true;
+#else
 	return false;
+#endif
 }
+
+static inline bool trigger_single_cpu_backtrace(int cpu)
+{
+#if defined(arch_trigger_cpumask_backtrace)
+	cpumask_var_t mask;
+
+	if (!zalloc_cpumask_var(&mask, GFP_KERNEL))
+		return false;
+	cpumask_set_cpu(cpu, mask);
+	arch_trigger_cpumask_backtrace(mask);
+	free_cpumask_var(mask);
+	return true;
+#else
+	return false;
 #endif
+}
+
+/* generic implementation */
+void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
+				   void (*raise)(cpumask_t *mask));
+bool nmi_cpu_backtrace(struct pt_regs *regs);
 
 #ifdef CONFIG_LOCKUP_DETECTOR
 int hw_nmi_is_cpu_stuck(struct pt_regs *);
diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index 6019c53c669e..db63ac75eba0 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -18,7 +18,7 @@ 
 #include <linux/nmi.h>
 #include <linux/seq_buf.h>
 
-#ifdef arch_trigger_all_cpu_backtrace
+#ifdef arch_trigger_cpumask_backtrace
 /* For reliability, we're prepared to waste bits here. */
 static DECLARE_BITMAP(backtrace_mask, NR_CPUS) __read_mostly;
 static cpumask_t printtrace_mask;
@@ -44,12 +44,12 @@  static void print_seq_line(struct nmi_seq_buf *s, int start, int end)
 }
 
 /*
- * When raise() is called it will be is passed a pointer to the
+ * When raise() is called it will be passed a pointer to the
  * backtrace_mask. Architectures that call nmi_cpu_backtrace()
  * directly from their raise() functions may rely on the mask
  * they are passed being updated as a side effect of this call.
  */
-void nmi_trigger_all_cpu_backtrace(bool include_self,
+void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
 				   void (*raise)(cpumask_t *mask))
 {
 	struct nmi_seq_buf *s;
@@ -64,10 +64,7 @@  void nmi_trigger_all_cpu_backtrace(bool include_self,
 		return;
 	}
 
-	cpumask_copy(to_cpumask(backtrace_mask), cpu_online_mask);
-	if (!include_self)
-		cpumask_clear_cpu(this_cpu, to_cpumask(backtrace_mask));
-
+	cpumask_copy(to_cpumask(backtrace_mask), mask);
 	cpumask_copy(&printtrace_mask, to_cpumask(backtrace_mask));
 
 	/*
@@ -80,8 +77,8 @@  void nmi_trigger_all_cpu_backtrace(bool include_self,
 	}
 
 	if (!cpumask_empty(to_cpumask(backtrace_mask))) {
-		pr_info("Sending NMI to %s CPUs:\n",
-			(include_self ? "all" : "other"));
+		pr_info("Sending NMI from CPU %d to CPUs %*pbl:\n",
+			this_cpu, nr_cpumask_bits, to_cpumask(backtrace_mask));
 		raise(to_cpumask(backtrace_mask));
 	}