diff mbox

[v2,3/5] printk/nmi: Try hard to print Oops message in NMI context

Message ID 1448622572-16900-4-git-send-email-pmladek@suse.com (mailing list archive)
State Awaiting Upstream
Delegated to: Simon Horman
Headers show

Commit Message

Petr Mladek Nov. 27, 2015, 11:09 a.m. UTC
Oops messages are important for debugging. We should try hard to get
them directly into the ring buffer and print them to the console
even in NMI context. This is problematic because the needed locks
might already be taken.

What we can do, though, is to zap all printk locks. We already do this
when a printk recursion is detected. This should be safe because
the system is crashing and there shouldn't be any printk caller
that would cause the deadlock.

We should also flush the printk NMI buffers when Oops begins.
bust_spinlocks() might sound like a bad location but
it already does several other printk-related operations.

Finally, we have to serialize the printing to do not mix backtraces
from different CPUs. A simple spinlock is enough. It has already been
used for this purpose, see the commit a9edc88093287183a ("x86/nmi:
Perform a safe NMI stack trace on all CPUs").

Signed-off-by: Petr Mladek <pmladek@suse.cz>
---
 arch/s390/mm/fault.c   | 1 +
 kernel/printk/nmi.c    | 6 +++++-
 kernel/printk/printk.c | 7 +++++++
 lib/bust_spinlocks.c   | 1 +
 lib/nmi_backtrace.c    | 8 ++++++++
 5 files changed, 22 insertions(+), 1 deletion(-)

Comments

Russell King - ARM Linux Dec. 1, 2015, 11:44 p.m. UTC | #1
On Fri, Nov 27, 2015 at 12:09:30PM +0100, Petr Mladek wrote:
> What we can do, though, is to zap all printk locks. We already do this
> when a printk recursion is detected. This should be safe because
> the system is crashing and there shouldn't be any printk caller
> that would cause the deadlock.

What about serial consoles which may call out to subsystems like the
clk subsystem to enable a clock, which would want to take their own
spinlocks in addition to the serial console driver?

I don't see bust_spinlocks() dealing with any of these locks, so IMHO
trying to make this work in NMI context strikes me as making the
existing solution more unreliable on ARM systems.
Petr Mladek Dec. 4, 2015, 3:27 p.m. UTC | #2
On Tue 2015-12-01 23:44:37, Russell King - ARM Linux wrote:
> On Fri, Nov 27, 2015 at 12:09:30PM +0100, Petr Mladek wrote:
> > What we can do, though, is to zap all printk locks. We already do this
> > when a printk recursion is detected. This should be safe because
> > the system is crashing and there shouldn't be any printk caller
> > that would cause the deadlock.
> 
> What about serial consoles which may call out to subsystems like the
> clk subsystem to enable a clock, which would want to take their own
> spinlocks in addition to the serial console driver?

Yes, there might be more locks used by the serial console but I do
not know how to handle them all easily. IMHO, this patch is just better
than nothing.

> I don't see bust_spinlocks() dealing with any of these locks, so IMHO
> trying to make this work in NMI context strikes me as making the
> existing solution more unreliable on ARM systems.

bust_spinlocks() calls printk_nmi_flush() that would call printk()
that would zap "lockbuf_lock" and "console_sem" when in Oops and NMI.
Yes, there might be more locks blocked but we try to break at least
the first two walls. Also zapping is allowed only once per 30 seconds,
see zap_locks(). Why do you think that it might make things more
unreliable, please?


Thanks for looking,
Petr
--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Russell King - ARM Linux Dec. 4, 2015, 5:12 p.m. UTC | #3
On Fri, Dec 04, 2015 at 04:27:09PM +0100, Petr Mladek wrote:
> On Tue 2015-12-01 23:44:37, Russell King - ARM Linux wrote:
> > On Fri, Nov 27, 2015 at 12:09:30PM +0100, Petr Mladek wrote:
> > > What we can do, though, is to zap all printk locks. We already do this
> > > when a printk recursion is detected. This should be safe because
> > > the system is crashing and there shouldn't be any printk caller
> > > that would cause the deadlock.
> > 
> > What about serial consoles which may call out to subsystems like the
> > clk subsystem to enable a clock, which would want to take their own
> > spinlocks in addition to the serial console driver?
> 
> Yes, there might be more locks used by the serial console but I do
> not know how to handle them all easily. IMHO, this patch is just better
> than nothing.

I have a slightly different view...

> > I don't see bust_spinlocks() dealing with any of these locks, so IMHO
> > trying to make this work in NMI context strikes me as making the
> > existing solution more unreliable on ARM systems.
> 
> bust_spinlocks() calls printk_nmi_flush() that would call printk()
> that would zap "lockbuf_lock" and "console_sem" when in Oops and NMI.
> Yes, there might be more locks blocked but we try to break at least
> the first two walls. Also zapping is allowed only once per 30 seconds,
> see zap_locks(). Why do you think that it might make things more
> unreliable, please?

Take the scenario where CPU1 is in the middle of a printk(), and is
holding its lock.

CPU0 comes along and decides to trigger a NMI backtrace.  This sends
a NMI to CPU1, which takes it in the middle of the serial console
output.

With the existing solution, the NMI output will be written to the
temporary buffer, and CPU1 has finished handling the NMI it resumes
the serial console output, eventually dropping the lock.  That then
allows CPU0 to print the contents of all buffers, and we get NMI
printk output.

With this solution, as I understand it, we'll instead end up with
CPU1's printk trying to output direct to the console, and although
we've busted a couple of locks, we won't have busted the serial
console locks, so CPU1 will deadlock - and that will stop any output
what so ever.

If this is correct, then the net result is that we go from NMI with
serial console producing output to NMI with serial console being
less reliable at producing output.
David Laight Dec. 7, 2015, 3:48 p.m. UTC | #4
From: Russell King - ARM Linux

> Sent: 04 December 2015 17:13

...
> I have a slightly different view...

> 

> > > I don't see bust_spinlocks() dealing with any of these locks, so IMHO

> > > trying to make this work in NMI context strikes me as making the

> > > existing solution more unreliable on ARM systems.

> >

> > bust_spinlocks() calls printk_nmi_flush() that would call printk()

> > that would zap "lockbuf_lock" and "console_sem" when in Oops and NMI.

> > Yes, there might be more locks blocked but we try to break at least

> > the first two walls. Also zapping is allowed only once per 30 seconds,

> > see zap_locks(). Why do you think that it might make things more

> > unreliable, please?

> 

> Take the scenario where CPU1 is in the middle of a printk(), and is

> holding its lock.

> 

> CPU0 comes along and decides to trigger a NMI backtrace.  This sends

> a NMI to CPU1, which takes it in the middle of the serial console

> output.

> 

> With the existing solution, the NMI output will be written to the

> temporary buffer, and CPU1 has finished handling the NMI it resumes

> the serial console output, eventually dropping the lock.  That then

> allows CPU0 to print the contents of all buffers, and we get NMI

> printk output.


Is the traceback from inside printk() or serial console code
likely to be useful?
If not then why not get the stacktrace generated when the relevant
lock is released?
That should save any faffing with a special buffer.

	David
Petr Mladek Dec. 8, 2015, 11:21 a.m. UTC | #5
> Take the scenario where CPU1 is in the middle of a printk(), and is
> holding its lock.
> 
> CPU0 comes along and decides to trigger a NMI backtrace.  This sends
> a NMI to CPU1, which takes it in the middle of the serial console
> output.
> 
> With the existing solution, the NMI output will be written to the
> temporary buffer, and CPU1 has finished handling the NMI it resumes
> the serial console output, eventually dropping the lock.  That then
> allows CPU0 to print the contents of all buffers, and we get NMI
> printk output.
> 
> With this solution, as I understand it, we'll instead end up with
> CPU1's printk trying to output direct to the console, and although
> we've busted a couple of locks, we won't have busted the serial
> console locks, so CPU1 will deadlock - and that will stop any output
> what so ever.
> 
> If this is correct, then the net result is that we go from NMI with
> serial console producing output to NMI with serial console being
> less reliable at producing output.

You are right. I thought about it a lot and I think that the best
solution is to avoid this patch at all. I guess that it will make
Peter Zijlstra happy as well.

Best Regards,
Petr
--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Petr Mladek Dec. 8, 2015, 2:49 p.m. UTC | #6
On Mon 2015-12-07 15:48:33, David Laight wrote:
> From: Russell King - ARM Linux
> > Sent: 04 December 2015 17:13
> ...
> > I have a slightly different view...
> > 
> > > > I don't see bust_spinlocks() dealing with any of these locks, so IMHO
> > > > trying to make this work in NMI context strikes me as making the
> > > > existing solution more unreliable on ARM systems.
> > >
> > > bust_spinlocks() calls printk_nmi_flush() that would call printk()
> > > that would zap "lockbuf_lock" and "console_sem" when in Oops and NMI.
> > > Yes, there might be more locks blocked but we try to break at least
> > > the first two walls. Also zapping is allowed only once per 30 seconds,
> > > see zap_locks(). Why do you think that it might make things more
> > > unreliable, please?
> > 
> > Take the scenario where CPU1 is in the middle of a printk(), and is
> > holding its lock.
> > 
> > CPU0 comes along and decides to trigger a NMI backtrace.  This sends
> > a NMI to CPU1, which takes it in the middle of the serial console
> > output.
> > 
> > With the existing solution, the NMI output will be written to the
> > temporary buffer, and CPU1 has finished handling the NMI it resumes
> > the serial console output, eventually dropping the lock.  That then
> > allows CPU0 to print the contents of all buffers, and we get NMI
> > printk output.
> 
> Is the traceback from inside printk() or serial console code
> likely to be useful?

It is useful if a problem is caused by the printk or serial console
code. For example, a slow serial console might cause a soft lockup
if there are too many messages to print.


> If not then why not get the stacktrace generated when the relevant
> lock is released? That should save any faffing with a special
> buffer.

Another question is how to detect that NMI interrupted printk() code.
We would either need to analyze backtrace. Or we would need to
know which CPU took the printk() or console locks. This check
should be race-safe vs. the NMI context.


Best Regards,
Petr
--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index ec1a30d0d11a..6566344db263 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -80,6 +80,7 @@  void bust_spinlocks(int yes)
 {
 	if (yes) {
 		oops_in_progress = 1;
+		printk_nmi_flush();
 	} else {
 		int loglevel_save = console_loglevel;
 		console_unblank();
diff --git a/kernel/printk/nmi.c b/kernel/printk/nmi.c
index 05b4c09110f9..34ba760ae794 100644
--- a/kernel/printk/nmi.c
+++ b/kernel/printk/nmi.c
@@ -202,7 +202,11 @@  void __init printk_nmi_init(void)
 
 void printk_nmi_enter(void)
 {
-	this_cpu_write(printk_func, vprintk_nmi);
+	/*
+	 * We try hard to print the messages directly when Oops is in progress.
+	 */
+	if (!oops_in_progress)
+		this_cpu_write(printk_func, vprintk_nmi);
 }
 
 void printk_nmi_exit(void)
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 88641c74163d..e970b623ae26 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1705,6 +1705,13 @@  asmlinkage int vprintk_emit(int facility, int level,
 	}
 
 	lockdep_off();
+	/*
+	 * Messages are passed from NMI context using an extra buffer.
+	 * The only exception is when Oops is in progress. In this case
+	 * we try hard to get them out directly.
+	 */
+	if (unlikely(oops_in_progress && in_nmi()))
+		zap_locks();
 	raw_spin_lock(&logbuf_lock);
 	logbuf_cpu = this_cpu;
 
diff --git a/lib/bust_spinlocks.c b/lib/bust_spinlocks.c
index f8e0e5367398..7a98098b0fef 100644
--- a/lib/bust_spinlocks.c
+++ b/lib/bust_spinlocks.c
@@ -20,6 +20,7 @@  void __attribute__((weak)) bust_spinlocks(int yes)
 {
 	if (yes) {
 		++oops_in_progress;
+		printk_nmi_flush();
 	} else {
 #ifdef CONFIG_VT
 		unblank_screen();
diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index 26caf51cc238..026e9f16e742 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -74,14 +74,22 @@  void nmi_trigger_all_cpu_backtrace(bool include_self,
 
 bool nmi_cpu_backtrace(struct pt_regs *regs)
 {
+	static arch_spinlock_t lock = __ARCH_SPIN_LOCK_UNLOCKED;
 	int cpu = smp_processor_id();
 
 	if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
+		/* Serialize backtraces when printing directly. */
+		if (unlikely(oops_in_progress))
+			arch_spin_lock(&lock);
+
 		pr_warn("NMI backtrace for cpu %d\n", cpu);
 		if (regs)
 			show_regs(regs);
 		else
 			dump_stack();
+
+		if (unlikely(oops_in_progress))
+			arch_spin_unlock(&lock);
 		cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
 		return true;
 	}