diff mbox series

panic: Ensure preemption is disabled during panic()

Message ID 20191002123538.22609-1-will@kernel.org (mailing list archive)
State Mainlined
Commit 20bb759a66be52cf4a9ddd17fddaf509e11490cd
Headers show
Series panic: Ensure preemption is disabled during panic() | expand

Commit Message

Will Deacon Oct. 2, 2019, 12:35 p.m. UTC
Calling 'panic()' on a kernel with CONFIG_PREEMPT=y can leave the
calling CPU in an infinite loop, but with interrupts and preemption
enabled. From this state, userspace can continue to be scheduled,
despite the system being "dead" as far as the kernel is concerned. This
is easily reproducible on arm64 when booting with "nosmp" on the command
line; a couple of shell scripts print out a periodic "Ping" message
whilst another triggers a crash by writing to /proc/sysrq-trigger:

  | sysrq: Trigger a crash
  | Kernel panic - not syncing: sysrq triggered crash
  | CPU: 0 PID: 1 Comm: init Not tainted 5.2.15 #1
  | Hardware name: linux,dummy-virt (DT)
  | Call trace:
  |  dump_backtrace+0x0/0x148
  |  show_stack+0x14/0x20
  |  dump_stack+0xa0/0xc4
  |  panic+0x140/0x32c
  |  sysrq_handle_reboot+0x0/0x20
  |  __handle_sysrq+0x124/0x190
  |  write_sysrq_trigger+0x64/0x88
  |  proc_reg_write+0x60/0xa8
  |  __vfs_write+0x18/0x40
  |  vfs_write+0xa4/0x1b8
  |  ksys_write+0x64/0xf0
  |  __arm64_sys_write+0x14/0x20
  |  el0_svc_common.constprop.0+0xb0/0x168
  |  el0_svc_handler+0x28/0x78
  |  el0_svc+0x8/0xc
  | Kernel Offset: disabled
  | CPU features: 0x0002,24002004
  | Memory Limit: none
  | ---[ end Kernel panic - not syncing: sysrq triggered crash ]---
  |  Ping 2!
  |  Ping 1!
  |  Ping 1!
  |  Ping 2!

The issue can also be triggered on x86 kernels if CONFIG_SMP=n, otherwise
local interrupts are disabled in 'smp_send_stop()'.

Disable preemption in 'panic()' before re-enabling interrupts.

Cc: Russell King <linux@armlinux.org.uk>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/BX1W47JXPMR8.58IYW53H6M5N@dragonstone
Reported-by: Xogium <contact@xogium.me>
Signed-off-by: Will Deacon <will@kernel.org>
---
 kernel/panic.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Kees Cook Oct. 2, 2019, 8:58 p.m. UTC | #1
On Wed, Oct 02, 2019 at 01:35:38PM +0100, Will Deacon wrote:
> Calling 'panic()' on a kernel with CONFIG_PREEMPT=y can leave the
> calling CPU in an infinite loop, but with interrupts and preemption
> enabled. From this state, userspace can continue to be scheduled,
> despite the system being "dead" as far as the kernel is concerned. This
> is easily reproducible on arm64 when booting with "nosmp" on the command
> line; a couple of shell scripts print out a periodic "Ping" message
> whilst another triggers a crash by writing to /proc/sysrq-trigger:
> 
>   | sysrq: Trigger a crash
>   | Kernel panic - not syncing: sysrq triggered crash
>   | CPU: 0 PID: 1 Comm: init Not tainted 5.2.15 #1
>   | Hardware name: linux,dummy-virt (DT)
>   | Call trace:
>   |  dump_backtrace+0x0/0x148
>   |  show_stack+0x14/0x20
>   |  dump_stack+0xa0/0xc4
>   |  panic+0x140/0x32c
>   |  sysrq_handle_reboot+0x0/0x20
>   |  __handle_sysrq+0x124/0x190
>   |  write_sysrq_trigger+0x64/0x88
>   |  proc_reg_write+0x60/0xa8
>   |  __vfs_write+0x18/0x40
>   |  vfs_write+0xa4/0x1b8
>   |  ksys_write+0x64/0xf0
>   |  __arm64_sys_write+0x14/0x20
>   |  el0_svc_common.constprop.0+0xb0/0x168
>   |  el0_svc_handler+0x28/0x78
>   |  el0_svc+0x8/0xc
>   | Kernel Offset: disabled
>   | CPU features: 0x0002,24002004
>   | Memory Limit: none
>   | ---[ end Kernel panic - not syncing: sysrq triggered crash ]---
>   |  Ping 2!
>   |  Ping 1!
>   |  Ping 1!
>   |  Ping 2!
> 
> The issue can also be triggered on x86 kernels if CONFIG_SMP=n, otherwise
> local interrupts are disabled in 'smp_send_stop()'.
> 
> Disable preemption in 'panic()' before re-enabling interrupts.

Is this perhaps the correct solution for what commit c39ea0b9dd24 ("panic:
avoid the extra noise dmesg") was trying to fix?

Either way:

Reviewed-by: Kees Cook <keescook@chromium.org>

-Kees

> 
> Cc: Russell King <linux@armlinux.org.uk>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: <stable@vger.kernel.org>
> Link: https://lore.kernel.org/r/BX1W47JXPMR8.58IYW53H6M5N@dragonstone
> Reported-by: Xogium <contact@xogium.me>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>  kernel/panic.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/panic.c b/kernel/panic.c
> index 47e8ebccc22b..f470a038b05b 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -180,6 +180,7 @@ void panic(const char *fmt, ...)
>  	 * after setting panic_cpu) from invoking panic() again.
>  	 */
>  	local_irq_disable();
> +	preempt_disable_notrace();
>  
>  	/*
>  	 * It's possible to come here directly from a panic-assertion and
> -- 
> 2.23.0.444.g18eeb5a265-goog
>
Andrew Morton Oct. 2, 2019, 9:45 p.m. UTC | #2
On Wed,  2 Oct 2019 13:35:38 +0100 Will Deacon <will@kernel.org> wrote:

> Calling 'panic()' on a kernel with CONFIG_PREEMPT=y can leave the
> calling CPU in an infinite loop, but with interrupts and preemption
> enabled. From this state, userspace can continue to be scheduled,
> despite the system being "dead" as far as the kernel is concerned. This
> is easily reproducible on arm64 when booting with "nosmp" on the command
> line; a couple of shell scripts print out a periodic "Ping" message
> whilst another triggers a crash by writing to /proc/sysrq-trigger:
> 
>   | sysrq: Trigger a crash
>   | Kernel panic - not syncing: sysrq triggered crash
>   | CPU: 0 PID: 1 Comm: init Not tainted 5.2.15 #1
>   | Hardware name: linux,dummy-virt (DT)
>   | Call trace:
>   |  dump_backtrace+0x0/0x148
>   |  show_stack+0x14/0x20
>   |  dump_stack+0xa0/0xc4
>   |  panic+0x140/0x32c
>   |  sysrq_handle_reboot+0x0/0x20
>   |  __handle_sysrq+0x124/0x190
>   |  write_sysrq_trigger+0x64/0x88
>   |  proc_reg_write+0x60/0xa8
>   |  __vfs_write+0x18/0x40
>   |  vfs_write+0xa4/0x1b8
>   |  ksys_write+0x64/0xf0
>   |  __arm64_sys_write+0x14/0x20
>   |  el0_svc_common.constprop.0+0xb0/0x168
>   |  el0_svc_handler+0x28/0x78
>   |  el0_svc+0x8/0xc
>   | Kernel Offset: disabled
>   | CPU features: 0x0002,24002004
>   | Memory Limit: none
>   | ---[ end Kernel panic - not syncing: sysrq triggered crash ]---
>   |  Ping 2!
>   |  Ping 1!
>   |  Ping 1!
>   |  Ping 2!
> 
> The issue can also be triggered on x86 kernels if CONFIG_SMP=n, otherwise
> local interrupts are disabled in 'smp_send_stop()'.
> 
> Disable preemption in 'panic()' before re-enabling interrupts.
> 
> ...
>
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -180,6 +180,7 @@ void panic(const char *fmt, ...)
>  	 * after setting panic_cpu) from invoking panic() again.
>  	 */
>  	local_irq_disable();
> +	preempt_disable_notrace();
>  
>  	/*
>  	 * It's possible to come here directly from a panic-assertion and

We still do a lot of stuff (kexec, kgdb, etc) after this
preempt_disable() and I worry that something in there will now trigger
a might_sleep() warning as a result?
Will Deacon Oct. 3, 2019, 8:53 p.m. UTC | #3
Hi Andrew,

Thanks for having a look.

On Wed, Oct 02, 2019 at 02:45:58PM -0700, Andrew Morton wrote:
> On Wed,  2 Oct 2019 13:35:38 +0100 Will Deacon <will@kernel.org> wrote:
> > Disable preemption in 'panic()' before re-enabling interrupts.
> > 
> > ...
> >
> > --- a/kernel/panic.c
> > +++ b/kernel/panic.c
> > @@ -180,6 +180,7 @@ void panic(const char *fmt, ...)
> >  	 * after setting panic_cpu) from invoking panic() again.
> >  	 */
> >  	local_irq_disable();
> > +	preempt_disable_notrace();
> >  
> >  	/*
> >  	 * It's possible to come here directly from a panic-assertion and
> 
> We still do a lot of stuff (kexec, kgdb, etc) after this
> preempt_disable() and I worry that something in there will now trigger
> a might_sleep() warning as a result?

Given that interrupts are already disabled at this point, I don't think
we'll get any additional warnings here by disabling preemption as well.

Will
Will Deacon Oct. 3, 2019, 8:56 p.m. UTC | #4
Hi Kees,

On Wed, Oct 02, 2019 at 01:58:46PM -0700, Kees Cook wrote:
> On Wed, Oct 02, 2019 at 01:35:38PM +0100, Will Deacon wrote:
> > Calling 'panic()' on a kernel with CONFIG_PREEMPT=y can leave the
> > calling CPU in an infinite loop, but with interrupts and preemption
> > enabled. From this state, userspace can continue to be scheduled,
> > despite the system being "dead" as far as the kernel is concerned. This
> > is easily reproducible on arm64 when booting with "nosmp" on the command
> > line; a couple of shell scripts print out a periodic "Ping" message
> > whilst another triggers a crash by writing to /proc/sysrq-trigger:
> > 
> >   | sysrq: Trigger a crash
> >   | Kernel panic - not syncing: sysrq triggered crash
> >   | CPU: 0 PID: 1 Comm: init Not tainted 5.2.15 #1
> >   | Hardware name: linux,dummy-virt (DT)
> >   | Call trace:
> >   |  dump_backtrace+0x0/0x148
> >   |  show_stack+0x14/0x20
> >   |  dump_stack+0xa0/0xc4
> >   |  panic+0x140/0x32c
> >   |  sysrq_handle_reboot+0x0/0x20
> >   |  __handle_sysrq+0x124/0x190
> >   |  write_sysrq_trigger+0x64/0x88
> >   |  proc_reg_write+0x60/0xa8
> >   |  __vfs_write+0x18/0x40
> >   |  vfs_write+0xa4/0x1b8
> >   |  ksys_write+0x64/0xf0
> >   |  __arm64_sys_write+0x14/0x20
> >   |  el0_svc_common.constprop.0+0xb0/0x168
> >   |  el0_svc_handler+0x28/0x78
> >   |  el0_svc+0x8/0xc
> >   | Kernel Offset: disabled
> >   | CPU features: 0x0002,24002004
> >   | Memory Limit: none
> >   | ---[ end Kernel panic - not syncing: sysrq triggered crash ]---
> >   |  Ping 2!
> >   |  Ping 1!
> >   |  Ping 1!
> >   |  Ping 2!
> > 
> > The issue can also be triggered on x86 kernels if CONFIG_SMP=n, otherwise
> > local interrupts are disabled in 'smp_send_stop()'.
> > 
> > Disable preemption in 'panic()' before re-enabling interrupts.
> 
> Is this perhaps the correct solution for what commit c39ea0b9dd24 ("panic:
> avoid the extra noise dmesg") was trying to fix?

Hmm, maybe, although that looks like it's focussed more on irq handling
than preemption. I've deliberately left the irq part alone, since I think
having magic sysrq work via the keyboard interrupt is desirable from the
panic loop.

> Reviewed-by: Kees Cook <keescook@chromium.org>

Thanks!

Will
Petr Mladek Oct. 4, 2019, 9:11 a.m. UTC | #5
On Thu 2019-10-03 21:56:34, Will Deacon wrote:
> Hi Kees,
> 
> On Wed, Oct 02, 2019 at 01:58:46PM -0700, Kees Cook wrote:
> > On Wed, Oct 02, 2019 at 01:35:38PM +0100, Will Deacon wrote:
> > > Calling 'panic()' on a kernel with CONFIG_PREEMPT=y can leave the
> > > calling CPU in an infinite loop, but with interrupts and preemption
> > > enabled. From this state, userspace can continue to be scheduled,
> > > despite the system being "dead" as far as the kernel is concerned. This
> > > is easily reproducible on arm64 when booting with "nosmp" on the command
> > > line; a couple of shell scripts print out a periodic "Ping" message
> > > whilst another triggers a crash by writing to /proc/sysrq-trigger:
> > > 
> > >   | sysrq: Trigger a crash
> > >   | Kernel panic - not syncing: sysrq triggered crash
> > >   | CPU: 0 PID: 1 Comm: init Not tainted 5.2.15 #1
> > >   | Hardware name: linux,dummy-virt (DT)
> > >   | Call trace:
> > >   |  dump_backtrace+0x0/0x148
> > >   |  show_stack+0x14/0x20
> > >   |  dump_stack+0xa0/0xc4
> > >   |  panic+0x140/0x32c
> > >   |  sysrq_handle_reboot+0x0/0x20
> > >   |  __handle_sysrq+0x124/0x190
> > >   |  write_sysrq_trigger+0x64/0x88
> > >   |  proc_reg_write+0x60/0xa8
> > >   |  __vfs_write+0x18/0x40
> > >   |  vfs_write+0xa4/0x1b8
> > >   |  ksys_write+0x64/0xf0
> > >   |  __arm64_sys_write+0x14/0x20
> > >   |  el0_svc_common.constprop.0+0xb0/0x168
> > >   |  el0_svc_handler+0x28/0x78
> > >   |  el0_svc+0x8/0xc
> > >   | Kernel Offset: disabled
> > >   | CPU features: 0x0002,24002004
> > >   | Memory Limit: none
> > >   | ---[ end Kernel panic - not syncing: sysrq triggered crash ]---
> > >   |  Ping 2!
> > >   |  Ping 1!
> > >   |  Ping 1!
> > >   |  Ping 2!
> > > 
> > > The issue can also be triggered on x86 kernels if CONFIG_SMP=n, otherwise
> > > local interrupts are disabled in 'smp_send_stop()'.
> > > 
> > > Disable preemption in 'panic()' before re-enabling interrupts.
> > 
> > Is this perhaps the correct solution for what commit c39ea0b9dd24 ("panic:
> > avoid the extra noise dmesg") was trying to fix?
> 
> Hmm, maybe, although that looks like it's focussed more on irq handling
> than preemption.

Exactly, the backtrace mentioned in commit c39ea0b9dd24 ("panic: avoid
the extra noise dmesg") is printed by wake_up() called from
wake_up_klogd_work_func(). It is irq_work. Therefore disabling
preemption would not prevent this.


> I've deliberately left the irq part alone, since I think
> having magic sysrq work via the keyboard interrupt is desirable from the
> panic loop.

I agree that we should keep sysrq working.

One pity thing is that led_panic_blink() in
leds/drivers/trigger/ledtrig-panic.c uses workqueues:

  + led_panic_blink()
    + led_trigger_event()
      + led_set_brightness()
	+ schedule_work()

It means that it depends on the scheduler. I guess that it
does not work in many panic situations. But this patch
will always block it.

I agree that it is strange that userspace still works at
this stage. But does it cause any real problems?

Best Regards,
Petr
Russell King (Oracle) Oct. 4, 2019, 9:29 a.m. UTC | #6
On Fri, Oct 04, 2019 at 11:11:42AM +0200, Petr Mladek wrote:
> On Thu 2019-10-03 21:56:34, Will Deacon wrote:
> > Hi Kees,
> > 
> > On Wed, Oct 02, 2019 at 01:58:46PM -0700, Kees Cook wrote:
> > > On Wed, Oct 02, 2019 at 01:35:38PM +0100, Will Deacon wrote:
> > > > Calling 'panic()' on a kernel with CONFIG_PREEMPT=y can leave the
> > > > calling CPU in an infinite loop, but with interrupts and preemption
> > > > enabled. From this state, userspace can continue to be scheduled,
> > > > despite the system being "dead" as far as the kernel is concerned. This
> > > > is easily reproducible on arm64 when booting with "nosmp" on the command
> > > > line; a couple of shell scripts print out a periodic "Ping" message
> > > > whilst another triggers a crash by writing to /proc/sysrq-trigger:
> > > > 
> > > >   | sysrq: Trigger a crash
> > > >   | Kernel panic - not syncing: sysrq triggered crash
> > > >   | CPU: 0 PID: 1 Comm: init Not tainted 5.2.15 #1
> > > >   | Hardware name: linux,dummy-virt (DT)
> > > >   | Call trace:
> > > >   |  dump_backtrace+0x0/0x148
> > > >   |  show_stack+0x14/0x20
> > > >   |  dump_stack+0xa0/0xc4
> > > >   |  panic+0x140/0x32c
> > > >   |  sysrq_handle_reboot+0x0/0x20
> > > >   |  __handle_sysrq+0x124/0x190
> > > >   |  write_sysrq_trigger+0x64/0x88
> > > >   |  proc_reg_write+0x60/0xa8
> > > >   |  __vfs_write+0x18/0x40
> > > >   |  vfs_write+0xa4/0x1b8
> > > >   |  ksys_write+0x64/0xf0
> > > >   |  __arm64_sys_write+0x14/0x20
> > > >   |  el0_svc_common.constprop.0+0xb0/0x168
> > > >   |  el0_svc_handler+0x28/0x78
> > > >   |  el0_svc+0x8/0xc
> > > >   | Kernel Offset: disabled
> > > >   | CPU features: 0x0002,24002004
> > > >   | Memory Limit: none
> > > >   | ---[ end Kernel panic - not syncing: sysrq triggered crash ]---
> > > >   |  Ping 2!
> > > >   |  Ping 1!
> > > >   |  Ping 1!
> > > >   |  Ping 2!
> > > > 
> > > > The issue can also be triggered on x86 kernels if CONFIG_SMP=n, otherwise
> > > > local interrupts are disabled in 'smp_send_stop()'.
> > > > 
> > > > Disable preemption in 'panic()' before re-enabling interrupts.
> > > 
> > > Is this perhaps the correct solution for what commit c39ea0b9dd24 ("panic:
> > > avoid the extra noise dmesg") was trying to fix?
> > 
> > Hmm, maybe, although that looks like it's focussed more on irq handling
> > than preemption.
> 
> Exactly, the backtrace mentioned in commit c39ea0b9dd24 ("panic: avoid
> the extra noise dmesg") is printed by wake_up() called from
> wake_up_klogd_work_func(). It is irq_work. Therefore disabling
> preemption would not prevent this.
> 
> 
> > I've deliberately left the irq part alone, since I think
> > having magic sysrq work via the keyboard interrupt is desirable from the
> > panic loop.
> 
> I agree that we should keep sysrq working.
> 
> One pity thing is that led_panic_blink() in
> leds/drivers/trigger/ledtrig-panic.c uses workqueues:
> 
>   + led_panic_blink()
>     + led_trigger_event()
>       + led_set_brightness()
> 	+ schedule_work()
> 
> It means that it depends on the scheduler. I guess that it
> does not work in many panic situations. But this patch
> will always block it.
> 
> I agree that it is strange that userspace still works at
> this stage. But does it cause any real problems?

Yes, there are watchdog drivers that continue to pat their watchdog
after the kernel has panic'd.  It makes watchdogs useless (which is
exactly how this problem was discovered.)
Will Deacon Oct. 4, 2019, 10:49 a.m. UTC | #7
On Fri, Oct 04, 2019 at 10:29:17AM +0100, Russell King - ARM Linux admin wrote:
> On Fri, Oct 04, 2019 at 11:11:42AM +0200, Petr Mladek wrote:
> > On Thu 2019-10-03 21:56:34, Will Deacon wrote:
> > > I've deliberately left the irq part alone, since I think
> > > having magic sysrq work via the keyboard interrupt is desirable from the
> > > panic loop.
> > 
> > I agree that we should keep sysrq working.
> > 
> > One pity thing is that led_panic_blink() in
> > leds/drivers/trigger/ledtrig-panic.c uses workqueues:
> > 
> >   + led_panic_blink()
> >     + led_trigger_event()
> >       + led_set_brightness()
> > 	+ schedule_work()
> > 
> > It means that it depends on the scheduler. I guess that it
> > does not work in many panic situations. But this patch
> > will always block it.
> > 
> > I agree that it is strange that userspace still works at
> > this stage. But does it cause any real problems?
> 
> Yes, there are watchdog drivers that continue to pat their watchdog
> after the kernel has panic'd.  It makes watchdogs useless (which is
> exactly how this problem was discovered.)

Indeed, and I think the LED blinking is already unreliable if the
brightness operation needs to sleep. For example, if the kernel isn't
preemptible or the work gets queued up on a different CPU which is
sitting in panic_smp_self_stop().

Will
Petr Mladek Oct. 4, 2019, 11:15 a.m. UTC | #8
On Fri 2019-10-04 11:49:48, Will Deacon wrote:
> On Fri, Oct 04, 2019 at 10:29:17AM +0100, Russell King - ARM Linux admin wrote:
> > On Fri, Oct 04, 2019 at 11:11:42AM +0200, Petr Mladek wrote:
> > > On Thu 2019-10-03 21:56:34, Will Deacon wrote:
> > > > I've deliberately left the irq part alone, since I think
> > > > having magic sysrq work via the keyboard interrupt is desirable from the
> > > > panic loop.
> > > 
> > > I agree that we should keep sysrq working.
> > > 
> > > One pity thing is that led_panic_blink() in
> > > leds/drivers/trigger/ledtrig-panic.c uses workqueues:
> > > 
> > >   + led_panic_blink()
> > >     + led_trigger_event()
> > >       + led_set_brightness()
> > > 	+ schedule_work()
> > > 
> > > It means that it depends on the scheduler. I guess that it
> > > does not work in many panic situations. But this patch
> > > will always block it.
> > > 
> > > I agree that it is strange that userspace still works at
> > > this stage. But does it cause any real problems?
> > 
> > Yes, there are watchdog drivers that continue to pat their watchdog
> > after the kernel has panic'd.  It makes watchdogs useless (which is
> > exactly how this problem was discovered.)
> 
> Indeed, and I think the LED blinking is already unreliable if the
> brightness operation needs to sleep. For example, if the kernel isn't
> preemptible or the work gets queued up on a different CPU which is
> sitting in panic_smp_self_stop().

To make it clear. I do not want to block this patch. I just wanted
to point out the problem. I am not sure how the blinking is important
these days. Well, I could imagine that it might be useful on some
embedded devices.

Another question is how many people want to end up with dead system
these days. The watchdogs are likely used in data centers. I guess
that automatic reboot in panic() is a better choice there.

Anyway, it might make sense to remove the panic blinking code when
it will not have a chance to work.

Best Regards,
Petr
Feng Tang Oct. 4, 2019, 1:51 p.m. UTC | #9
On Fri, Oct 04, 2019 at 01:15:21PM +0200, Petr Mladek wrote:
> On Fri 2019-10-04 11:49:48, Will Deacon wrote:
> > On Fri, Oct 04, 2019 at 10:29:17AM +0100, Russell King - ARM Linux admin wrote:
> > > On Fri, Oct 04, 2019 at 11:11:42AM +0200, Petr Mladek wrote:
> > > > On Thu 2019-10-03 21:56:34, Will Deacon wrote:
> > > > > I've deliberately left the irq part alone, since I think
> > > > > having magic sysrq work via the keyboard interrupt is desirable from the
> > > > > panic loop.
> > > > 
> > > > I agree that we should keep sysrq working.
> > > > 
> > > > One pity thing is that led_panic_blink() in
> > > > leds/drivers/trigger/ledtrig-panic.c uses workqueues:
> > > > 
> > > >   + led_panic_blink()
> > > >     + led_trigger_event()
> > > >       + led_set_brightness()
> > > > 	+ schedule_work()
> > > > 
> > > > It means that it depends on the scheduler. I guess that it
> > > > does not work in many panic situations. But this patch
> > > > will always block it.
> > > > 
> > > > I agree that it is strange that userspace still works at
> > > > this stage. But does it cause any real problems?
> > > 
> > > Yes, there are watchdog drivers that continue to pat their watchdog
> > > after the kernel has panic'd.  It makes watchdogs useless (which is
> > > exactly how this problem was discovered.)
> > 
> > Indeed, and I think the LED blinking is already unreliable if the
> > brightness operation needs to sleep. For example, if the kernel isn't
> > preemptible or the work gets queued up on a different CPU which is
> > sitting in panic_smp_self_stop().
> 
> To make it clear. I do not want to block this patch. I just wanted
> to point out the problem. I am not sure how the blinking is important
> these days. Well, I could imagine that it might be useful on some
> embedded devices.

When reviewing the c39ea0b9dd24 ("panic: avoid the extra noise dmesg"),
there was similar discussion about what's the expectation for kernel
when panic happens, as the earlier version patch is simply keeping the
the local irq disabled, which may break the sysrq and the panic blink
code, so we turned to handling it together with printk.

> 
> Another question is how many people want to end up with dead system
> these days. The watchdogs are likely used in data centers. I guess
> that automatic reboot in panic() is a better choice there.
> 
> Anyway, it might make sense to remove the panic blinking code when
> it will not have a chance to work.

I was also wondering if the panic blinking code still really works
on any platforms.

Thanks,
Feng

> 
> Best Regards,
> Petr
Jiri Kosina Oct. 7, 2019, 8:02 a.m. UTC | #10
On Fri, 4 Oct 2019, Will Deacon wrote:

> Indeed, and I think the LED blinking is already unreliable if the
> brightness operation needs to sleep. 

One thing is that led_set_brightness() can probably be forced to avoid the 
workqueue scheduling, by setting LED_BLINK_SW on the device (e.g. by 
issuing led_set_software_blink() during panic).

But I am afraid this still won't solve the issue completely, as USB 
keyboards need workqueues for blinking the LEDs in for URB management.
diff mbox series

Patch

diff --git a/kernel/panic.c b/kernel/panic.c
index 47e8ebccc22b..f470a038b05b 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -180,6 +180,7 @@  void panic(const char *fmt, ...)
 	 * after setting panic_cpu) from invoking panic() again.
 	 */
 	local_irq_disable();
+	preempt_disable_notrace();
 
 	/*
 	 * It's possible to come here directly from a panic-assertion and