
arm64: kexec: flush log to console in nmi_panic()

Message ID 20210617125023.7288-1-shijie@os.amperecomputing.com (mailing list archive)
State New, archived

Commit Message

Huang Shijie June 17, 2021, 12:50 p.m. UTC
If kdump is configured, nmi_panic() may run to machine_kexec().

But in NMI context, the log is stored in the per-CPU nmi_print_seq buffer.
So nothing logged after entering NMI context reaches the console,
such as the "Bye!" printed on the line just before the added code.

This patch fixes the issue in two steps:
	1) Use printk_safe_flush_on_panic() to flush the log from
	   nmi_print_seq to the global printk ring buffer.
	2) Then use console_flush_on_panic() to flush it to the console.

With this patch, the "Bye!" message appears on the panic console.
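
The two steps can be illustrated with a small user-space model. This is only a sketch of the buffering idea described here, not kernel code: nmi_buf, ring_buf, console_out and every model_*() function are hypothetical stand-ins, and the real printk buffers are more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define NR_CPUS 2
#define BUF_SZ  256

/* Hypothetical stand-ins for the buffers named in the commit message. */
static char nmi_buf[NR_CPUS][BUF_SZ];      /* per-CPU NMI staging buffers */
static char ring_buf[NR_CPUS * BUF_SZ];    /* global log buffer */
static char console_out[NR_CPUS * BUF_SZ]; /* text that reached the console */
static size_t console_pos;                 /* ring_buf offset already printed */

/* Append src to dst without overflowing a buffer of cap bytes. */
static void append(char *dst, size_t cap, const char *src)
{
	size_t used = strlen(dst);

	if (used < cap - 1)
		strncat(dst, src, cap - 1 - used);
}

/* In NMI context the message only reaches the per-CPU staging buffer. */
static void model_printk(int cpu, bool in_nmi, const char *msg)
{
	if (in_nmi)
		append(nmi_buf[cpu], sizeof(nmi_buf[cpu]), msg);
	else
		append(ring_buf, sizeof(ring_buf), msg);
}

/* Step 1: move staged NMI messages into the global log buffer. */
static void model_flush_nmi_buffers(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		append(ring_buf, sizeof(ring_buf), nmi_buf[cpu]);
		nmi_buf[cpu][0] = '\0';
	}
}

/* Step 2: emit whatever the global buffer holds beyond console_pos. */
static void model_flush_console(void)
{
	append(console_out, sizeof(console_out), ring_buf + console_pos);
	console_pos = strlen(ring_buf);
}
```

In this model, a message logged with in_nmi set stays invisible to model_flush_console() until model_flush_nmi_buffers() has run, which is why the order of the two steps matters.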

Signed-off-by: Huang Shijie <shijie@os.amperecomputing.com>
---
 arch/arm64/kernel/machine_kexec.c | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

Will Deacon June 17, 2021, 5:52 p.m. UTC | #1
On Thu, Jun 17, 2021 at 12:50:23PM +0000, Huang Shijie wrote:
> If kdump is configured, nmi_panic() may run to machine_kexec().
> 
> But in NMI context, the log is put in PER-CPU nmi_print_seq.
> So we can not see any log on the console since we entered the NMI context,
> such as the "Bye!" in previous line.
> 
> This patch fixes this issue by two steps:
> 	1) Uses printk_safe_flush_on_panic() to flush the log from
>              nmi_print_seq to global printk ring buffer,
>         2) Then uses console_flush_on_panic() to flush to console.
> 
> After this patch, we can see the "Bye!" log in the panic console.

Does it matter? I'd be more inclined to remove the print altogether...

Will
Pasha Tatashin June 17, 2021, 5:55 p.m. UTC | #2
On Thu, Jun 17, 2021 at 1:52 PM Will Deacon <will@kernel.org> wrote:
>
> On Thu, Jun 17, 2021 at 12:50:23PM +0000, Huang Shijie wrote:
> > If kdump is configured, nmi_panic() may run to machine_kexec().
> >
> > But in NMI context, the log is put in PER-CPU nmi_print_seq.
> > So we can not see any log on the console since we entered the NMI context,
> > such as the "Bye!" in previous line.
> >
> > This patch fixes this issue by two steps:
> >       1) Uses printk_safe_flush_on_panic() to flush the log from
> >              nmi_print_seq to global printk ring buffer,
> >         2) Then uses console_flush_on_panic() to flush to console.
> >
> > After this patch, we can see the "Bye!" log in the panic console.
>
> Does it matter? I'd be more inclined to remove the print altogether...

I agree, the print could be removed entirely. But my assumption was
that this patch was meant to flush other buffered prints besides this
last "Bye" one.

>
> Will
Will Deacon June 17, 2021, 5:58 p.m. UTC | #3
On Thu, Jun 17, 2021 at 01:55:08PM -0400, Pavel Tatashin wrote:
> On Thu, Jun 17, 2021 at 1:52 PM Will Deacon <will@kernel.org> wrote:
> >
> > On Thu, Jun 17, 2021 at 12:50:23PM +0000, Huang Shijie wrote:
> > > If kdump is configured, nmi_panic() may run to machine_kexec().
> > >
> > > But in NMI context, the log is put in PER-CPU nmi_print_seq.
> > > So we can not see any log on the console since we entered the NMI context,
> > > such as the "Bye!" in previous line.
> > >
> > > This patch fixes this issue by two steps:
> > >       1) Uses printk_safe_flush_on_panic() to flush the log from
> > >              nmi_print_seq to global printk ring buffer,
> > >         2) Then uses console_flush_on_panic() to flush to console.
> > >
> > > After this patch, we can see the "Bye!" log in the panic console.
> >
> > Does it matter? I'd be more inclined to remove the print altogether...
> 
> I agree, the print could be removed entirely. But, my assumption was
> that this patch meant to flush other buffered prints beside this last
> "Bye" one.

That sounds like something which should be done in the core code, rather
than in the architecture backend (and looks like panic() might do this
already?)

Will
Huang Shijie June 18, 2021, 9:03 a.m. UTC | #4
On Thu, Jun 17, 2021 at 06:58:23PM +0100, Will Deacon wrote:
> On Thu, Jun 17, 2021 at 01:55:08PM -0400, Pavel Tatashin wrote:
> > On Thu, Jun 17, 2021 at 1:52 PM Will Deacon <will@kernel.org> wrote:
> > >
> > > On Thu, Jun 17, 2021 at 12:50:23PM +0000, Huang Shijie wrote:
> > > > If kdump is configured, nmi_panic() may run to machine_kexec().
> > > >
> > > > But in NMI context, the log is put in PER-CPU nmi_print_seq.
> > > > So we can not see any log on the console since we entered the NMI context,
> > > > such as the "Bye!" in previous line.
> > > >
> > > > This patch fixes this issue by two steps:
> > > >       1) Uses printk_safe_flush_on_panic() to flush the log from
> > > >              nmi_print_seq to global printk ring buffer,
> > > >         2) Then uses console_flush_on_panic() to flush to console.
> > > >
> > > > After this patch, we can see the "Bye!" log in the panic console.
> > >
> > > Does it matter? I'd be more inclined to remove the print altogether...
We could remove that print from the arm64 code.

But panic() itself still emits many log messages, such as:

	..............
	pr_emerg("Kernel panic - not syncing: %s\n", buf);
	..............
	dump_stack();
	..............
	kdb_printf("PANIC: %s\n", msg);

Without this patch, all of the log messages above will be lost.

> > 
> > I agree, the print could be removed entirely. But, my assumption was
> > that this patch meant to flush other buffered prints beside this last
> > "Bye" one.
> 
> That sounds like something which should be done in the core code, rather
> than the in the architecture backend (and looks like panic() might do this
> already?)
In the non-kdump code path, the core code takes care of it; please see the
code in panic().

But in the kdump code path, the architecture code has to take care of it.

Thanks
Huang Shijie
Will Deacon June 21, 2021, 10:08 a.m. UTC | #5
On Fri, Jun 18, 2021 at 09:03:26AM +0000, Huang Shijie wrote:
> On Thu, Jun 17, 2021 at 06:58:23PM +0100, Will Deacon wrote:
> > On Thu, Jun 17, 2021 at 01:55:08PM -0400, Pavel Tatashin wrote:
> > > On Thu, Jun 17, 2021 at 1:52 PM Will Deacon <will@kernel.org> wrote:
> > > >
> > > > On Thu, Jun 17, 2021 at 12:50:23PM +0000, Huang Shijie wrote:
> > > > > If kdump is configured, nmi_panic() may run to machine_kexec().
> > > > >
> > > > > But in NMI context, the log is put in PER-CPU nmi_print_seq.
> > > > > So we can not see any log on the console since we entered the NMI context,
> > > > > such as the "Bye!" in previous line.
> > > > >
> > > > > This patch fixes this issue by two steps:
> > > > >       1) Uses printk_safe_flush_on_panic() to flush the log from
> > > > >              nmi_print_seq to global printk ring buffer,
> > > > >         2) Then uses console_flush_on_panic() to flush to console.
> > > > >
> > > > > After this patch, we can see the "Bye!" log in the panic console.
> > > >
> > > > Does it matter? I'd be more inclined to remove the print altogether...
> We may remove the log in the arm64 code.
> 
> But in the panic() itself, it still has many log, such as:
> 
> 	..............
> 	pr_emerg("Kernel panic - not syncing: %s\n", buf);
> 	..............
> 	dump_stack();
> 	..............
> 	kdb_printf("PANIC: %s\n", msg);
> 
> Without this patch, all these log above will loss..
> 
> > > 
> > > I agree, the print could be removed entirely. But, my assumption was
> > > that this patch meant to flush other buffered prints beside this last
> > > "Bye" one.
> > 
> > That sounds like something which should be done in the core code, rather
> > than the in the architecture backend (and looks like panic() might do this
> > already?)
> In the non-kdump code path, the core code will take care of it, please read the
> code in panic().
> 
> But in the kdump code path, the architecture code should take care of it.

Why the discrepancy? Wouldn't it make more sense to do this in panic() for
both cases, if the prints that we want to display are coming from panic()
itself?

Will
Huang Shijie June 22, 2021, 10:14 a.m. UTC | #6
Hi Will,
On Mon, Jun 21, 2021 at 11:08:37AM +0100, Will Deacon wrote:
> > > That sounds like something which should be done in the core code, rather
> > > than the in the architecture backend (and looks like panic() might do this
> > > already?)
> > In the non-kdump code path, the core code will take care of it, please read the
> > code in panic().
> > 
> > But in the kdump code path, the architecture code should take care of it.
> 
> Why the discrepancy? Wouldn't it make more sense to do this in panic() for
> both cases, if the prints that we want to display are coming from panic()
> itself?

In the kdump code path, the call chain is:
	panic() --> __crash_kexec() --> machine_kexec();

Once we reach arm64's machine_kexec(), we can __NOT__ return to panic(); we go on
to the kdump kernel via cpu_soft_restart().

So we cannot depend on panic() to print the log. :)
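
A toy user-space model of this control flow is below. It only illustrates the point that nothing placed after the crash path in the caller can run once machine_kexec() has been entered. All model_*() names and the flags are hypothetical, longjmp() merely stands in for the jump into the kdump kernel, and the sketch does not reproduce the real body of panic().

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

static jmp_buf crash_kernel_entry; /* stands in for the kdump kernel entry */
static bool crash_image_loaded;    /* models "kdump is configured" */
static bool later_step_ran;        /* models any work done after the call */

/* Models machine_kexec(): control never returns to the caller. */
static void model_machine_kexec(void)
{
	longjmp(crash_kernel_entry, 1);
}

/* Models __crash_kexec(): only takes the crash path if an image is loaded. */
static void model_crash_kexec(void)
{
	if (crash_image_loaded)
		model_machine_kexec();
}

/* Models the caller: the later step runs only if the call above returns. */
static void model_panic(void)
{
	model_crash_kexec();
	later_step_ran = true;
}

/* Returns true if the step after model_crash_kexec() was reached. */
static bool model_run(bool kdump_configured)
{
	crash_image_loaded = kdump_configured;
	later_step_ran = false;

	if (setjmp(crash_kernel_entry) == 0)
		model_panic();

	return later_step_ran;
}
```

With the model, model_run(false) reaches the later step and model_run(true) does not, so any flush needed on the crash path has to happen before the jump.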
	
By the way, here are some of the arm64 log messages emitted after we enter __crash_kexec() in NMI context:
	1.) the log in machine_crash_shutdown()
	      ..............
		pr_crit("SMP: stopping secondary CPUs\n");
	      ..............
		pr_info("Starting crashdump kernel...\n");
	      ..............

        2.) the log in machine_kexec()

	      ..............
		WARN(in_kexec_crash && (stuck_cpus || smp_crash_stop_failed()),
			"Some CPUs may be stale, kdump will be unreliable.\n");
	      ..............
		the logs in kexec_segment_flush(kimage);
	      ..............
		pr_info("Bye!\n");


We cannot remove them all, so we need to flush all of the logs above to the console in NMI context.


Thanks
Huang Shijie

Patch

diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index 213d56c14f60..0ab841dab9db 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -6,6 +6,7 @@ 
  * Copyright (C) Huawei Futurewei Technologies.
  */
 
+#include <linux/console.h>
 #include <linux/interrupt.h>
 #include <linux/irq.h>
 #include <linux/kernel.h>
@@ -189,6 +190,12 @@  void machine_kexec(struct kimage *kimage)
 
 	pr_info("Bye!\n");
 
+	if (in_nmi()) {
+		/* Flush the log to console if we are in NMI context */
+		printk_safe_flush_on_panic();
+		console_flush_on_panic(CONSOLE_FLUSH_PENDING);
+	}
+
 	local_daif_mask();
 
 	/*