diff mbox

[v3,4/4] printk/nmi: Increase the size of NMI buffer and make it configurable

Message ID 20151211124159.GB3729@pathway.suse.cz (mailing list archive)
State Superseded
Delegated to: Geert Uytterhoeven
Headers show

Commit Message

Petr Mladek Dec. 11, 2015, 12:41 p.m. UTC
On Fri 2015-12-11 12:10:02, Geert Uytterhoeven wrote:
> On Wed, Dec 9, 2015 at 2:21 PM, Petr Mladek <pmladek@suse.com> wrote:
> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -866,6 +866,28 @@ config LOG_CPU_MAX_BUF_SHIFT
> >                      13 =>   8 KB for each CPU
> >                      12 =>   4 KB for each CPU
> >
> > +config NMI_LOG_BUF_SHIFT
> > +       int "Temporary per-CPU NMI log buffer size (12 => 4KB, 13 => 8KB)"
> > +       range 10 21
> > +       default 13
> > +       depends on PRINTK && HAVE_NMI
> 
> Symbol NMI_LOG_BUF_SHIFT does not exist if its dependencies are not met.

Åh, the NMI buffer is enabled on arm via NEED_PRINTK_NMI.

The buffer is compiled when CONFIG_PRINTK_NMI is defined. I am going
to fix it the following way:




Thanks a lot for report,
Petr
--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Arnd Bergmann Dec. 11, 2015, 12:47 p.m. UTC | #1
On Friday 11 December 2015 13:41:59 Petr Mladek wrote:
> diff --git a/init/Kconfig b/init/Kconfig
> index efcff25a112d..61cfd96a3c96 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -870,7 +870,7 @@ config NMI_LOG_BUF_SHIFT
>         int "Temporary per-CPU NMI log buffer size (12 => 4KB, 13 => 8KB)"
>         range 10 21
>         default 13
> -       depends on PRINTK && HAVE_NMI
> +       depends on PRINTK_NMI
>         help
>           Select the size of a per-CPU buffer where NMI messages are temporary
>           stored. They are copied to the main log buffer in a safe context
> 
> 

Acked-by: Arnd Bergmann <arnd@arndb.de>

I found this on linux-next as well today and came to the same conclusion.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Geert Uytterhoeven Dec. 11, 2015, 12:50 p.m. UTC | #2
On Fri, Dec 11, 2015 at 1:41 PM, Petr Mladek <pmladek@suse.com> wrote:
> On Fri 2015-12-11 12:10:02, Geert Uytterhoeven wrote:
>> On Wed, Dec 9, 2015 at 2:21 PM, Petr Mladek <pmladek@suse.com> wrote:
>> > --- a/init/Kconfig
>> > +++ b/init/Kconfig
>> > @@ -866,6 +866,28 @@ config LOG_CPU_MAX_BUF_SHIFT
>> >                      13 =>   8 KB for each CPU
>> >                      12 =>   4 KB for each CPU
>> >
>> > +config NMI_LOG_BUF_SHIFT
>> > +       int "Temporary per-CPU NMI log buffer size (12 => 4KB, 13 => 8KB)"
>> > +       range 10 21
>> > +       default 13
>> > +       depends on PRINTK && HAVE_NMI
>>
>> Symbol NMI_LOG_BUF_SHIFT does not exist if its dependencies are not met.
>
> Åh, the NMI buffer is enabled on arm via NEED_PRINTK_NMI.
>
> The buffer is compiled when CONFIG_PRINTK_NMI is defined. I am going
> to fix it the following way:
>
>
> diff --git a/init/Kconfig b/init/Kconfig
> index efcff25a112d..61cfd96a3c96 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -870,7 +870,7 @@ config NMI_LOG_BUF_SHIFT
>         int "Temporary per-CPU NMI log buffer size (12 => 4KB, 13 => 8KB)"
>         range 10 21
>         default 13
> -       depends on PRINTK && HAVE_NMI
> +       depends on PRINTK_NMI
>         help
>           Select the size of a per-CPU buffer where NMI messages are temporary
>           stored. They are copied to the main log buffer in a safe context

Makes sense, as kernel/printk/nmi.c is compiled if PRINTK_NMI is set.

Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Morton Dec. 11, 2015, 10:57 p.m. UTC | #3
On Fri, 11 Dec 2015 13:41:59 +0100 Petr Mladek <pmladek@suse.com> wrote:

> On Fri 2015-12-11 12:10:02, Geert Uytterhoeven wrote:
> > On Wed, Dec 9, 2015 at 2:21 PM, Petr Mladek <pmladek@suse.com> wrote:
> > > --- a/init/Kconfig
> > > +++ b/init/Kconfig
> > > @@ -866,6 +866,28 @@ config LOG_CPU_MAX_BUF_SHIFT
> > >                      13 =>   8 KB for each CPU
> > >                      12 =>   4 KB for each CPU
> > >
> > > +config NMI_LOG_BUF_SHIFT
> > > +       int "Temporary per-CPU NMI log buffer size (12 => 4KB, 13 => 8KB)"
> > > +       range 10 21
> > > +       default 13
> > > +       depends on PRINTK && HAVE_NMI
> > 
> > Symbol NMI_LOG_BUF_SHIFT does not exist if its dependencies are not met.
> 
> __h, the NMI buffer is enabled on arm via NEED_PRINTK_NMI.
> 
> The buffer is compiled when CONFIG_PRINTK_NMI is defined. I am going
> to fix it the following way:
> 
> 
> diff --git a/init/Kconfig b/init/Kconfig
> index efcff25a112d..61cfd96a3c96 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -870,7 +870,7 @@ config NMI_LOG_BUF_SHIFT
>  	int "Temporary per-CPU NMI log buffer size (12 => 4KB, 13 => 8KB)"
>  	range 10 21
>  	default 13
> -	depends on PRINTK && HAVE_NMI
> +	depends on PRINTK_NMI
>  	help
>  	  Select the size of a per-CPU buffer where NMI messages are temporary
>  	  stored. They are copied to the main log buffer in a safe context

I'm wondering why we're building kernel/printk/nmi.o if HAVE_NMI is not
set.

	obj-$(CONFIG_PRINTK_NMI)		+= nmi.o

and

	config PRINTK_NMI
		def_bool y
		depends on PRINTK
		depends on HAVE_NMI || NEED_PRINTK_NMI

This is a bit messy.  NEED_PRINTK_NMI is an added-on hack for one
particular arm variant.  From the changelog:

   "One exception is arm where the deferred printing is used for
    printing backtraces even without NMI.  For this purpose, we define
    NEED_PRINTK_NMI Kconfig flag.  The alternative printk_func is
    explicitly set when IPI_CPU_BACKTRACE is handled."


- why does arm needs deferred printing for backtraces?

- why is this specific to CONFIG_CPU_V7M?

- can this Kconfig logic be cleaned up a bit?

--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Russell King - ARM Linux Dec. 11, 2015, 11:21 p.m. UTC | #4
On Fri, Dec 11, 2015 at 02:57:25PM -0800, Andrew Morton wrote:
> This is a bit messy.  NEED_PRINTK_NMI is an added-on hack for one
> particular arm variant.  From the changelog:
> 
>    "One exception is arm where the deferred printing is used for
>     printing backtraces even without NMI.  For this purpose, we define
>     NEED_PRINTK_NMI Kconfig flag.  The alternative printk_func is
>     explicitly set when IPI_CPU_BACKTRACE is handled."
> 
> 
> - why does arm needs deferred printing for backtraces?
> 
> - why is this specific to CONFIG_CPU_V7M?
> 
> - can this Kconfig logic be cleaned up a bit?

I think this comes purely from this attempt to apply another round of
cleanups to the nmi backtrace work I did.

As I explained when I did that work, the vast majority of ARM platforms
are unable to trigger anything like a NMI - the FIQ is something that's
generally a property of the secure monitor, and is not accessible to
Linux.  However, there are platforms where it is accessible.

The work to add the FIQ-based variant never happened (I've no idea what
happened to that part, Daniel seems to have lost interest in working on
it.)  So, what we have is the IRQ-based variant merged in mainline, which
would be the fallback for the "FIQ not available" cases, and I carry a
local hack in my tree which provides the FIQ-based version - but if it
were to trigger, it takes out all interrupts (hence why I've not merged
my hack.)

I think the reason that the FIQ-based variant has never really happened
is that hooking into the interrupt controller code to clear down the FIQ
creates such a horrid layering violation, and also a locking mess that
I suspect it's just been given up with.

However, I've found my "hack" useful - it's turned a number of totally
undebuggable hangs (where one CPU silently hangs leaving the others
running with no way to find out where the hung CPU is) into something
that can be debugged.

Now, when we end up triggering the IRQ-based variant, we could already
be in a situation where IRQs are off for the local CPU, so the IRQ is
never delivered.  Others decided that it wasn't acceptable to wait 10sec
for the local CPU to time out, and (iirc) we'd also loose the local CPUs
backtrace in certain situations.

I'm personally happy with the existing code, and I've been wondering why
there's this effort to apply further cleanups - to me, the changelogs
don't seem to make that much sense, unless we want to start using
printk() extensively in NMI functions - using the generic nmi backtrace
code surely gets us something that works across all architectures...

I've been assuming that I've missed something, which is why I've not
said anything on that point until now.
Jiri Kosina Dec. 11, 2015, 11:26 p.m. UTC | #5
On Fri, 11 Dec 2015, Russell King - ARM Linux wrote:

> I'm personally happy with the existing code, and I've been wondering why
> there's this effort to apply further cleanups - to me, the changelogs
> don't seem to make that much sense, unless we want to start using
> printk() extensively in NMI functions - using the generic nmi backtrace
> code surely gets us something that works across all architectures...

It is already being used extensively, and not only for all-CPU backtraces. 
For starters, please consider

- WARN_ON(in_nmi())
- BUG_ON(in_nmi())
- anything being printed out from MCE handlers
Andrew Morton Dec. 11, 2015, 11:30 p.m. UTC | #6
On Fri, 11 Dec 2015 23:21:13 +0000 Russell King - ARM Linux <linux@arm.linux.org.uk> wrote:

> On Fri, Dec 11, 2015 at 02:57:25PM -0800, Andrew Morton wrote:
> > This is a bit messy.  NEED_PRINTK_NMI is an added-on hack for one
> > particular arm variant.  From the changelog:
> > 
> >    "One exception is arm where the deferred printing is used for
> >     printing backtraces even without NMI.  For this purpose, we define
> >     NEED_PRINTK_NMI Kconfig flag.  The alternative printk_func is
> >     explicitly set when IPI_CPU_BACKTRACE is handled."
> > 
> > 
> > - why does arm needs deferred printing for backtraces?
> > 
> > - why is this specific to CONFIG_CPU_V7M?
> > 
> > - can this Kconfig logic be cleaned up a bit?
> 
> I think this comes purely from this attempt to apply another round of
> cleanups to the nmi backtrace work I did.
> 
> As I explained when I did that work, the vast majority of ARM platforms
> are unable to trigger anything like a NMI - the FIQ is something that's
> generally a property of the secure monitor, and is not accessible to
> Linux.  However, there are platforms where it is accessible.

OK, thanks.  So "not needed at present, might be needed in the future,
useful for out-of-tree debug code"?

> I'm personally happy with the existing code, and I've been wondering why
> there's this effort to apply further cleanups - to me, the changelogs
> don't seem to make that much sense, unless we want to start using
> printk() extensively in NMI functions - using the generic nmi backtrace
> code surely gets us something that works across all architectures...

Yes, I was scratching my head over that.  The patchset takes an nmi-safe
all-cpu-backtrace and generalises that into an nmi-safe printk.  That
*sounds* like a good thing to do but yes, some additional justification
would be helpful.  What real-world value does this patchset really
bring to real-world users?

--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Thompson Dec. 14, 2015, 10:28 a.m. UTC | #7
On 11/12/15 23:21, Russell King - ARM Linux wrote:
> As I explained when I did that work, the vast majority of ARM platforms
> are unable to trigger anything like a NMI - the FIQ is something that's
> generally a property of the secure monitor, and is not accessible to
> Linux.  However, there are platforms where it is accessible.
>
> The work to add the FIQ-based variant never happened (I've no idea what
> happened to that part, Daniel seems to have lost interest in working on
> it.)  So, what we have is the IRQ-based variant merged in mainline, which
> would be the fallback for the "FIQ not available" cases, and I carry a
> local hack in my tree which provides the FIQ-based version - but if it
> were to trigger, it takes out all interrupts (hence why I've not merged
> my hack.)
>
> I think the reason that the FIQ-based variant has never really happened
> is that hooking into the interrupt controller code to clear down the FIQ
> creates such a horrid layering violation, and also a locking mess that
> I suspect it's just been given up with.

I haven't quite given up; I'm still looking into this stuff. However 
you're certainly right that connecting the FIQ handler to the GIC code 
in an elegant way is tough.

I've been working in parallel on an arm64 implementation with the result 
that I'm now two lumps of code that are almost, but not quite, ready.

Right now I hope to share latest arm code fairly late in the this 
devcycle (for review rather than merge) followed up with a new version 
very early in v4.6. Even now I think the code needs a long soak in -next 
just in case there are any lurking regressions on particular platforms.

I don't expect anyone to base decisions on my aspirations above but 
would like to reassure Russell that I haven't given up on it.


Daniel.
--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Petr Mladek Dec. 15, 2015, 2:26 p.m. UTC | #8
On Fri 2015-12-11 15:30:54, Andrew Morton wrote:
> On Fri, 11 Dec 2015 23:21:13 +0000 Russell King - ARM Linux <linux@arm.linux.org.uk> wrote:
> 
> > On Fri, Dec 11, 2015 at 02:57:25PM -0800, Andrew Morton wrote:
> > > This is a bit messy.  NEED_PRINTK_NMI is an added-on hack for one
> > > particular arm variant.  From the changelog:
> > > 
> > >    "One exception is arm where the deferred printing is used for
> > >     printing backtraces even without NMI.  For this purpose, we define
> > >     NEED_PRINTK_NMI Kconfig flag.  The alternative printk_func is
> > >     explicitly set when IPI_CPU_BACKTRACE is handled."
> > > 
> > > 
> > > - why does arm needs deferred printing for backtraces?
> > > 
> > > - why is this specific to CONFIG_CPU_V7M?



> > > - can this Kconfig logic be cleaned up a bit?
> > 
> > I think this comes purely from this attempt to apply another round of
> > cleanups to the nmi backtrace work I did.
> > 
> > As I explained when I did that work, the vast majority of ARM platforms
> > are unable to trigger anything like a NMI - the FIQ is something that's
> > generally a property of the secure monitor, and is not accessible to
> > Linux.  However, there are platforms where it is accessible.
> 
> OK, thanks.  So "not needed at present, might be needed in the future,
> useful for out-of-tree debug code"?

It is possible that I got it a wrong way on arm. The NMI buffer is
usable there on two locations.

First, the temporary is currently used to handle IPI_CPU_BACKTRACE.
It seems that it is not a real NMI. But it seems to be available
(compiled) on all arm system. This is why I introduced NEED_PRINTK_NMI
Kconfig flag to avoid confusion with a real NMI.

Second, there is the FIQ "NMI" handler that is called from
/arch/arm/kernel/entry-armv.S. It is compiled only if _not_
defined $(CONFIG_CPU_V7M). It calls nmi_enter() and nmi_stop().
It looks like a real NMI handler. This is why I defined HAVE_NMI
if (!CPU_V7M).

A solution would be to define HAVE_NMI on all Arm systems and get rid
of NEED_PRINTK_NMI. If you think that it would cause less confusion...


> > there's this effort to apply further cleanups - to me, the changelogs
> > don't seem to make that much sense, unless we want to start using
> > printk() extensively in NMI functions - using the generic nmi backtrace
> > code surely gets us something that works across all architectures...
> 
> Yes, I was scratching my head over that.  The patchset takes an nmi-safe
> all-cpu-backtrace and generalises that into an nmi-safe printk.  That
> *sounds* like a good thing to do but yes, some additional justification
> would be helpful.  What real-world value does this patchset really
> bring to real-world users?

The patchset brings two big advantages. First, it makes the NMI
backtraces safe on all architectures for free. Second, it makes
all NMI messages almost[*] safe on all architectures.

Note that there already are several messages printed in NMI context.
See the mail from Jiri Kosina. They are not easy to avoid.

[*] The temporary buffer is limited. We still should keep
the number of messages in NMI context at minimum.

Best Regards,
Petr
--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Morton Dec. 17, 2015, 10:38 p.m. UTC | #9
On Tue, 15 Dec 2015 15:26:21 +0100 Petr Mladek <pmladek@suse.com> wrote:

> > OK, thanks.  So "not needed at present, might be needed in the future,
> > useful for out-of-tree debug code"?
> 
> It is possible that I got it a wrong way on arm. The NMI buffer is
> usable there on two locations.
> 
> First, the temporary is currently used to handle IPI_CPU_BACKTRACE.
> It seems that it is not a real NMI. But it seems to be available
> (compiled) on all arm system. This is why I introduced NEED_PRINTK_NMI
> Kconfig flag to avoid confusion with a real NMI.
> 
> Second, there is the FIQ "NMI" handler that is called from
> /arch/arm/kernel/entry-armv.S. It is compiled only if _not_
> defined $(CONFIG_CPU_V7M). It calls nmi_enter() and nmi_stop().
> It looks like a real NMI handler. This is why I defined HAVE_NMI
> if (!CPU_V7M).
> 
> A solution would be to define HAVE_NMI on all Arm systems and get rid
> of NEED_PRINTK_NMI. If you think that it would cause less confusion...

So does this mean that the patch will be updated?

> 
> > > there's this effort to apply further cleanups - to me, the changelogs
> > > don't seem to make that much sense, unless we want to start using
> > > printk() extensively in NMI functions - using the generic nmi backtrace
> > > code surely gets us something that works across all architectures...
> > 
> > Yes, I was scratching my head over that.  The patchset takes an nmi-safe
> > all-cpu-backtrace and generalises that into an nmi-safe printk.  That
> > *sounds* like a good thing to do but yes, some additional justification
> > would be helpful.  What real-world value does this patchset really
> > bring to real-world users?
> 
> The patchset brings two big advantages. First, it makes the NMI
> backtraces safe on all architectures for free. Second, it makes
> all NMI messages almost[*] safe on all architectures.
> 
> Note that there already are several messages printed in NMI context.
> See the mail from Jiri Kosina. They are not easy to avoid.
> 
> [*] The temporary buffer is limited. We still should keep
> the number of messages in NMI context at minimum.

This is important info - in fact a paragraph which starts with "The
patchset brings two big advantages" is *the most* important info.  I
added the below text to the [1/n] changelog:

: The patchset brings two big advantages.  First, it makes the NMI
: backtraces safe on all architectures for free.  Second, it makes all NMI
: messages almost safe on all architectures (the temporary buffer is
: limited.  We still should keep the number of messages in NMI context at
: minimum).
: 
: Note that there already are several messages printed in NMI context:
: WARN_ON(in_nmi()), BUG_ON(in_nmi()), anything being printed out from MCE
: handlers.  These are not easy to avoid.

--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Thompson Dec. 18, 2015, 10:18 a.m. UTC | #10
On 11/12/15 23:26, Jiri Kosina wrote:
> On Fri, 11 Dec 2015, Russell King - ARM Linux wrote:
>
>> I'm personally happy with the existing code, and I've been wondering why
>> there's this effort to apply further cleanups - to me, the changelogs
>> don't seem to make that much sense, unless we want to start using
>> printk() extensively in NMI functions - using the generic nmi backtrace
>> code surely gets us something that works across all architectures...
>
> It is already being used extensively, and not only for all-CPU backtraces.
> For starters, please consider
>
> - WARN_ON(in_nmi())
> - BUG_ON(in_nmi())

Sorry to join in so late but...

Today we risk deadlock when we try to issue these diagnostic errors 
directly from NMI context.

After this change we will still risk deadlock, because that's what the 
diagnostic code is trying to tell us, *and* we delay actually reporting 
the error until, and only if, the NMI handler completes.

I'm not entirely sure that this is an improvement.


> - anything being printed out from MCE handlers

The MCE handlers should only call printk() when they decide to panic and 
*after* busting the spinlocks. At this point deferring printk() until it 
is safe is not very helpful.

When we bust the spinlocks we should probably restore the normal 
printk() function to give best chance of the failure messages making it out.


Daniel.
--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peter Zijlstra Dec. 18, 2015, 11:29 a.m. UTC | #11
On Fri, Dec 18, 2015 at 10:18:08AM +0000, Daniel Thompson wrote:
> I'm not entirely sure that this is an improvement.

What I do these days is delete everything in vprintk_emit() and simply
call early_printk().

Kill the useless kmsg buffer crap and locking, just pound bytes to the
UART registers without anything in between.

The other semi usable solution is redirecting to trace_printk() and
recovering the trace buffers from your kdump. But I've found that
typically kdump doesn't work anymore if you properly wedge the machine.
So this is very much a second rate solution.

But this globally locked buffer, calling out to console drivers that do
locking and even scheduling, is an unreliable unfixable trainwreck that
I've given up on.
--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peter Zijlstra Dec. 18, 2015, 12:11 p.m. UTC | #12
On Fri, Dec 18, 2015 at 12:29:02PM +0100, Peter Zijlstra wrote:
> On Fri, Dec 18, 2015 at 10:18:08AM +0000, Daniel Thompson wrote:
> > I'm not entirely sure that this is an improvement.
> 
> What I do these days is delete everything in vprintk_emit() and simply
> call early_printk().

On that, whoever made the device model use vprintk_emit() broke the
debugger (KGDB/KDB) printk intercept, and the whole vprintk_func
redirection scheme.
--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Petr Mladek Dec. 18, 2015, 2:52 p.m. UTC | #13
On Fri 2015-12-18 10:18:08, Daniel Thompson wrote:
> On 11/12/15 23:26, Jiri Kosina wrote:
> >On Fri, 11 Dec 2015, Russell King - ARM Linux wrote:
> >
> >>I'm personally happy with the existing code, and I've been wondering why
> >>there's this effort to apply further cleanups - to me, the changelogs
> >>don't seem to make that much sense, unless we want to start using
> >>printk() extensively in NMI functions - using the generic nmi backtrace
> >>code surely gets us something that works across all architectures...
> >
> >It is already being used extensively, and not only for all-CPU backtraces.
> >For starters, please consider
> >
> >- WARN_ON(in_nmi())
> >- BUG_ON(in_nmi())
> 
> Sorry to join in so late but...
> 
> Today we risk deadlock when we try to issue these diagnostic errors
> directly from NMI context.
> 
> After this change we will still risk deadlock, because that's what
> the diagnostic code is trying to tell us, *and* we delay actually
> reporting the error until, and only if, the NMI handler completes.

I think that NMI messages about a possible deadlock are the ones
from

    kernel/locking/rtmutex.c
    kernel/irq_work.c
    include/linux/hardirq.h

You are right that if the deadlock happens, this patch set lowers the
chance to see the message.

On the other hand, all the other printk's in NMI seems to be non-fatal
warnings. In this case, this patch set increases the chance to see
them.

A compromise might be to explicitly call printk_nmi_flush() in the few
fatal cases. Alternatively we could force the messages on the
early_console when available.


> >- anything being printed out from MCE handlers
> 
> The MCE handlers should only call printk() when they decide to panic
> and *after* busting the spinlocks. At this point deferring printk()
> until it is safe is not very helpful.
> 
> When we bust the spinlocks we should probably restore the normal
> printk() function to give best chance of the failure messages making
> it out.

The problem is that we do not know what locks need to be busted. There
are too many consoles and too many locks involved. Also busting locks
open another can of worms.

Best Regards,
Petr
--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Thompson Dec. 18, 2015, 5 p.m. UTC | #14
On 18/12/15 14:52, Petr Mladek wrote:
> On Fri 2015-12-18 10:18:08, Daniel Thompson wrote:
>> On 11/12/15 23:26, Jiri Kosina wrote:
>>> On Fri, 11 Dec 2015, Russell King - ARM Linux wrote:
>>>
>>>> I'm personally happy with the existing code, and I've been wondering why
>>>> there's this effort to apply further cleanups - to me, the changelogs
>>>> don't seem to make that much sense, unless we want to start using
>>>> printk() extensively in NMI functions - using the generic nmi backtrace
>>>> code surely gets us something that works across all architectures...
>>>
>>> It is already being used extensively, and not only for all-CPU backtraces.
>>> For starters, please consider
>>>
>>> - WARN_ON(in_nmi())
>>> - BUG_ON(in_nmi())
>>
>> Sorry to join in so late but...
>>
>> Today we risk deadlock when we try to issue these diagnostic errors
>> directly from NMI context.
>>
>> After this change we will still risk deadlock, because that's what
>> the diagnostic code is trying to tell us, *and* we delay actually
>> reporting the error until, and only if, the NMI handler completes.
>
> I think that NMI messages about a possible deadlock are the ones
> from
>
>      kernel/locking/rtmutex.c
>      kernel/irq_work.c
>      include/linux/hardirq.h
>
> You are right that if the deadlock happens, this patch set lowers the
> chance to see the message.
>
> On the other hand, all the other printk's in NMI seems to be non-fatal
> warnings. In this case, this patch set increases the chance to see
> them.

Maybe for a WARN_ON() the trade off is worth it but I don't think a 
BUG_ON() trace would ever make it out.


> A compromise might be to explicitly call printk_nmi_flush() in the few
> fatal cases. Alternatively we could force the messages on the
> early_console when available.
>
>
>>> - anything being printed out from MCE handlers
>>
>> The MCE handlers should only call printk() when they decide to panic
>> and *after* busting the spinlocks. At this point deferring printk()
>> until it is safe is not very helpful.
>>
>> When we bust the spinlocks we should probably restore the normal
>> printk() function to give best chance of the failure messages making
>> it out.
>
> The problem is that we do not know what locks need to be busted. There
> are too many consoles and too many locks involved. Also busting locks
> open another can of worms.

Yes, I agree that busting the spinlocks doesn't avoid all risk of deadlock.

Probably I've been placing too much weight on the importance of getting 
messages out when dying. You're right that surviving far enough through 
a panic to trigger kdump or reset is equally (or more) important in many 
scenarios than getting a failure message out.

However on a system with nothing but "while(1) {}" hooked up to panic() 
then its worth risking a lock up. In this case restoring normal printk() 
behavior and dumping the NMI buffers would be worthwhile.


Daniel.
--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Morton Dec. 18, 2015, 11:03 p.m. UTC | #15
On Fri, 18 Dec 2015 13:11:41 +0100 Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, Dec 18, 2015 at 12:29:02PM +0100, Peter Zijlstra wrote:
> > On Fri, Dec 18, 2015 at 10:18:08AM +0000, Daniel Thompson wrote:
> > > I'm not entirely sure that this is an improvement.
> > 
> > What I do these days is delete everything in vprintk_emit() and simply
> > call early_printk().
> 
> On that, whoever made the device model use vprintk_emit() broke the
> debugger (KGDB/KDB) printk intercept, and the whole vprintk_func
> redirection scheme.

crap, we have a whole set of interfaces which are broken this way. 
printk_emit(), vprintk(), vprintk_emit().


commit 7ff9554bb578ba02166071d2d487b7fc7d860d62
Author:     Kay Sievers <kay@vrfy.org>
AuthorDate: Thu May 3 02:29:13 2012 +0200
Commit:     Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CommitDate: Mon May 7 16:53:02 2012 -0700

    printk: convert byte-buffer to variable-length record buffer

--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Thompson March 1, 2016, 2:04 p.m. UTC | #16
On 18/12/15 17:00, Daniel Thompson wrote:
>>> The MCE handlers should only call printk() when they decide to panic
>>> and *after* busting the spinlocks. At this point deferring printk()
>>> until it is safe is not very helpful.
>>>
>>> When we bust the spinlocks we should probably restore the normal
>>> printk() function to give best chance of the failure messages making
>>> it out.
>>
>> The problem is that we do not know what locks need to be busted. There
>> are too many consoles and too many locks involved. Also busting locks
>> open another can of worms.
>
> Yes, I agree that busting the spinlocks doesn't avoid all risk of deadlock.
>
> Probably I've been placing too much weight on the importance of getting
> messages out when dying. You're right that surviving far enough through
> a panic to trigger kdump or reset is equally (or more) important in many
> scenarios than getting a failure message out.
>
> However on a system with nothing but "while(1) {}" hooked up to panic()
> then its worth risking a lock up. In this case restoring normal printk()
> behavior and dumping the NMI buffers would be worthwhile.

An a (much) later thread[1] Andrew Morton described this comment as 
non-committal. Sorry for that.

I don't object to the overall approach taken by Petr, merely that I 
think there are cases where the current patchset doesn't put in quite 
enough effort to issue messages.

Panic triggered during NMI is the only case I can think of and that, I 
think, could be addressed by adding an extra call to printk_nmi_flush() 
during panic(). It should probably only cover cases where we *don't* 
call kdump and the panic handler doesn't restart the machine... so just 
after the pr_emerg("...end kernel panic") would be OK for me.


Daniel.


[1]: http://thread.gmane.org/gmane.linux.ports.arm.kernel/482845

--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/init/Kconfig b/init/Kconfig
index efcff25a112d..61cfd96a3c96 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -870,7 +870,7 @@  config NMI_LOG_BUF_SHIFT
 	int "Temporary per-CPU NMI log buffer size (12 => 4KB, 13 => 8KB)"
 	range 10 21
 	default 13
-	depends on PRINTK && HAVE_NMI
+	depends on PRINTK_NMI
 	help
 	  Select the size of a per-CPU buffer where NMI messages are temporary
 	  stored. They are copied to the main log buffer in a safe context