diff mbox series

[RFC] exit: Skip panic in do_exit() during poweroff

Message ID 20250410143937.1829272-1-Tze-nan.Wu@mediatek.com (mailing list archive)
State New
Headers show
Series [RFC] exit: Skip panic in do_exit() during poweroff | expand

Commit Message

Tze-nan Wu (吳澤南) April 10, 2025, 2:39 p.m. UTC
When kernel_power_off() is invoked by a process other than the global
init (PID 1) on a specific core, other CPUs are still allowed to execute
processes, even though the userspace becomes unreliable.

If PID 1 exits due to the unreliable userspace after kernel_power_off()
invoked, the panic follow by the last thread of global init exited in
do_exit() will stop the kernel_power_off() procedure, turn a shutdown
behavior into panic flow(reboot).

Add a condition check to ensure that the panic triggered by the last
thread of the global init exiting, only occurs while:
( system_state != SYSTEM_POWER_OFF and system_state != SYSTEM_RESTART).
Otherwise, WARN() instead.

[On Android 16 with arm64 arch]
Here's a scenario where the global init exits during kernel_power_off:
If PID 1 encounters a page fault after kernel_power_off() has been
invoked, the kernel will fail to handle the page fault because the
disk(UFS) has already shut down.
Consequently, the kernel will send a SIGBUS to PID 1 to indicate the
page fault failure, and ultimately, the panic will occur after PID 1
exits due to receiving the SIGBUS.

            cpu1                           cpu2
          ----------                     ----------
    kernel_power_off() start
        UFS shutdown
            ...                	       PID 1 page fault
            ...                    page fault handle failure
            ...			             PID 1 received SIGBUS
            ...                             panic
   kernel_power_off() not done

Backtrace while PID 1 received signal 7:
   init-1 [007] d..1 41239.922385: \
      signal_generate: sig=7 errno=0 code=2 comm=init pid=1 grp=0 res=0
   init-1 [007] d..1 41239.922389: kernel_stack: <stack trace>
   => __send_signal_locked
   => send_signal_locked
   => force_sig_info_to_task
   => force_sig_fault
   => arm64_force_sig_fault
   => do_page_fault
   => do_translation_fault
   => do_mem_abort
   => el0_ia
   => el0t_64_sync_handler

Simplified kernel log:
kernel_power_off() invoked by pt_notify_thread.
[41239.526109] pt_notify_threa: reboot set flag, old value 0x********,
*.
[41239.526114] pt_notify_threa: reboot set flag new value 0x********.
UFS reject I/O after kerenl_power_off.
[41239.686411]  scsi +scsi******** apexd: sd* ******** rejecting I/O to
offline device.
Lots of I/O error & erofs error happened after kernel_power_off().
[41239.690312] apexd: I/O error, dev sdc, sector ******* op ***:(READ)
flags 0x**** phys_seg ** prio class 0.
[41239.690465] apexd: I/O error, dev sdc, sector ******* op ***:(READ)
flags 0x**** phys_seg ** prio class 0.
...
...
[41239.922265] init: erofs: (device ****): z_erofs_read_folio: read
error * @ *** of nid ********.
[41239.922341] init: erofs: (device ****): z_erofs_read_folio: read
error * @ *** of nid ********.
Finally device panic due to PID 1 received SIGBUS.
[41239.923789] init: Kernel panic - not syncing: Attempted to kill init!
exitcode=0x00000007

Fixes: 43cf75d96409 ("exit: panic before exit_mm() on global init exit")
Link: https://lore.kernel.org/all/20191219104223.xvk6ppfogoxrgmw6@wittgenstein/
Signed-off-by: Tze-nan Wu <Tze-nan.Wu@mediatek.com>
---

I am also wondering if this patch is reasonable?

From my perspective, there are two reasons not to trigger such panic
during kernel_power_off() or kernel_restart():
  1. It is not worthwhile to interrupt kernel_power_off() by a panic
     resulted from userspace instability.
  2. The panic in do_exit() was originally designed to ensure a usable
     coredump if the last thread of the global init process exited.
	 However, capture a coredump triggered by userspace crash after
     kernel_power_off() seems not particularly useful, in my opinion.
	 
In certain scenarios, a kernel module may need to directly power off
from kernel space to protect hardware (e.g., thermal protection).
In my opinion, rather than causing a panic during kernel_power_off(),
it sounds better to allow the device to complete its power-off process.

Appreciate for any comment on this, if there's any better way to
handle this panic, please point me out.

---
 kernel/exit.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

Comments

Oleg Nesterov April 10, 2025, 9:05 p.m. UTC | #1
Well...

Let me repeat. I don't understand the kernel/reboot.c paths, you can
safely ignore me.

But I still think that you target the wrong goal. Quite possibly I am
wrong.

On 04/10, Tze-nan Wu wrote:
>
> If PID 1 exits due to the unreliable userspace after kernel_power_off()
> invoked,

Why. Why the global init does do_exit()? It should not, that is all.
It doesn't matter if it is single threaded or not.

As for sys_reboot(), I think that kernel_power_off() must be __noreturn,
and sys_reboot() should use BUG() after LINUX_REBOOT_CMD_POWER_OFF/_HALT
instead of do_exit().

If nothing else. do_exit() also does debug_check_no_locks_held() and
sys_reboot() calls do_exit() with system_transition_mutex held.

IOW. IMO, it is not that do_exit() needs some changes. The very fact
that the global init does do_exit() is wrong, this should be fixed.

But again, again, I can't really comment.

Oleg.

> the panic follow by the last thread of global init exited in
> do_exit() will stop the kernel_power_off() procedure, turn a shutdown
> behavior into panic flow(reboot).
>
> Add a condition check to ensure that the panic triggered by the last
> thread of the global init exiting, only occurs while:
> ( system_state != SYSTEM_POWER_OFF and system_state != SYSTEM_RESTART).
> Otherwise, WARN() instead.
>
> [On Android 16 with arm64 arch]
> Here's a scenario where the global init exits during kernel_power_off:
> If PID 1 encounters a page fault after kernel_power_off() has been
> invoked, the kernel will fail to handle the page fault because the
> disk(UFS) has already shut down.
> Consequently, the kernel will send a SIGBUS to PID 1 to indicate the
> page fault failure, and ultimately, the panic will occur after PID 1
> exits due to receiving the SIGBUS.
>
>             cpu1                           cpu2
>           ----------                     ----------
>     kernel_power_off() start
>         UFS shutdown
>             ...                	       PID 1 page fault
>             ...                    page fault handle failure
>             ...			             PID 1 received SIGBUS
>             ...                             panic
>    kernel_power_off() not done
>
> Backtrace while PID 1 received signal 7:
>    init-1 [007] d..1 41239.922385: \
>       signal_generate: sig=7 errno=0 code=2 comm=init pid=1 grp=0 res=0
>    init-1 [007] d..1 41239.922389: kernel_stack: <stack trace>
>    => __send_signal_locked
>    => send_signal_locked
>    => force_sig_info_to_task
>    => force_sig_fault
>    => arm64_force_sig_fault
>    => do_page_fault
>    => do_translation_fault
>    => do_mem_abort
>    => el0_ia
>    => el0t_64_sync_handler
>
> Simplified kernel log:
> kernel_power_off() invoked by pt_notify_thread.
> [41239.526109] pt_notify_threa: reboot set flag, old value 0x********,
> *.
> [41239.526114] pt_notify_threa: reboot set flag new value 0x********.
> UFS reject I/O after kerenl_power_off.
> [41239.686411]  scsi +scsi******** apexd: sd* ******** rejecting I/O to
> offline device.
> Lots of I/O error & erofs error happened after kernel_power_off().
> [41239.690312] apexd: I/O error, dev sdc, sector ******* op ***:(READ)
> flags 0x**** phys_seg ** prio class 0.
> [41239.690465] apexd: I/O error, dev sdc, sector ******* op ***:(READ)
> flags 0x**** phys_seg ** prio class 0.
> ...
> ...
> [41239.922265] init: erofs: (device ****): z_erofs_read_folio: read
> error * @ *** of nid ********.
> [41239.922341] init: erofs: (device ****): z_erofs_read_folio: read
> error * @ *** of nid ********.
> Finally device panic due to PID 1 received SIGBUS.
> [41239.923789] init: Kernel panic - not syncing: Attempted to kill init!
> exitcode=0x00000007
>
> Fixes: 43cf75d96409 ("exit: panic before exit_mm() on global init exit")
> Link: https://lore.kernel.org/all/20191219104223.xvk6ppfogoxrgmw6@wittgenstein/
> Signed-off-by: Tze-nan Wu <Tze-nan.Wu@mediatek.com>
> ---
>
> I am also wondering if this patch is reasonable?
>
> From my perspective, there are two reasons not to trigger such panic
> during kernel_power_off() or kernel_restart():
>   1. It is not worthwhile to interrupt kernel_power_off() by a panic
>      resulted from userspace instability.
>   2. The panic in do_exit() was originally designed to ensure a usable
>      coredump if the last thread of the global init process exited.
> 	 However, capture a coredump triggered by userspace crash after
>      kernel_power_off() seems not particularly useful, in my opinion.
>
> In certain scenarios, a kernel module may need to directly power off
> from kernel space to protect hardware (e.g., thermal protection).
> In my opinion, rather than causing a panic during kernel_power_off(),
> it sounds better to allow the device to complete its power-off process.
>
> Appreciate for any comment on this, if there's any better way to
> handle this panic, please point me out.
>
> ---
>  kernel/exit.c | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/exit.c b/kernel/exit.c
> index 1dcddfe537ee..23cb6b42a1f1 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -901,11 +901,17 @@ void __noreturn do_exit(long code)
>  	if (group_dead) {
>  		/*
>  		 * If the last thread of global init has exited, panic
> -		 * immediately to get a useable coredump.
> +		 * immediately to get a usable coredump, except when the
> +		 * device is currently powering off or restarting.
>  		 */
> -		if (unlikely(is_global_init(tsk)))
> -			panic("Attempted to kill init! exitcode=0x%08x\n",
> -				tsk->signal->group_exit_code ?: (int)code);
> +		if (unlikely(is_global_init(tsk))) {
> +			if (system_state != SYSTEM_POWER_OFF &&
> +			    system_state != SYSTEM_RESTART)
> +				panic("Attempted to kill init! exitcode=0x%08x\n",
> +				      tsk->signal->group_exit_code ?: (int)code);
> +			WARN(1, "Attempted to kill init! exitcode=0x%08x\n",
> +			     tsk->signal->group_exit_code ?: (int)code);
> +		}
>
>  #ifdef CONFIG_POSIX_TIMERS
>  		hrtimer_cancel(&tsk->signal->real_timer);
> --
> 2.45.2
>
Oleg Nesterov April 11, 2025, 2:06 p.m. UTC | #2
Add cc'es.

A similar problem was recently reported,
https://lore.kernel.org/all/20250403-exit-v1-1-8e9266bfc4b7@debian.org/
and I didn't realize this is another thread.

On 04/10, Oleg Nesterov wrote:
>
> Well...
>
> Let me repeat. I don't understand the kernel/reboot.c paths, you can
> safely ignore me.
>
> But I still think that you target the wrong goal. Quite possibly I am
> wrong.
>
> On 04/10, Tze-nan Wu wrote:
> >
> > If PID 1 exits due to the unreliable userspace after kernel_power_off()
> > invoked,
>
> Why. Why the global init does do_exit()? It should not, that is all.
> It doesn't matter if it is single threaded or not.
>
> As for sys_reboot(), I think that kernel_power_off() must be __noreturn,
> and sys_reboot() should use BUG() after LINUX_REBOOT_CMD_POWER_OFF/_HALT
> instead of do_exit().
>
> If nothing else. do_exit() also does debug_check_no_locks_held() and
> sys_reboot() calls do_exit() with system_transition_mutex held.
>
> IOW. IMO, it is not that do_exit() needs some changes. The very fact
> that the global init does do_exit() is wrong, this should be fixed.
>
> But again, again, I can't really comment.
>
> Oleg.
>
> > the panic follow by the last thread of global init exited in
> > do_exit() will stop the kernel_power_off() procedure, turn a shutdown
> > behavior into panic flow(reboot).
> >
> > Add a condition check to ensure that the panic triggered by the last
> > thread of the global init exiting, only occurs while:
> > ( system_state != SYSTEM_POWER_OFF and system_state != SYSTEM_RESTART).
> > Otherwise, WARN() instead.
> >
> > [On Android 16 with arm64 arch]
> > Here's a scenario where the global init exits during kernel_power_off:
> > If PID 1 encounters a page fault after kernel_power_off() has been
> > invoked, the kernel will fail to handle the page fault because the
> > disk(UFS) has already shut down.
> > Consequently, the kernel will send a SIGBUS to PID 1 to indicate the
> > page fault failure, and ultimately, the panic will occur after PID 1
> > exits due to receiving the SIGBUS.
> >
> >             cpu1                           cpu2
> >           ----------                     ----------
> >     kernel_power_off() start
> >         UFS shutdown
> >             ...                	       PID 1 page fault
> >             ...                    page fault handle failure
> >             ...			             PID 1 received SIGBUS
> >             ...                             panic
> >    kernel_power_off() not done
> >
> > Backtrace while PID 1 received signal 7:
> >    init-1 [007] d..1 41239.922385: \
> >       signal_generate: sig=7 errno=0 code=2 comm=init pid=1 grp=0 res=0
> >    init-1 [007] d..1 41239.922389: kernel_stack: <stack trace>
> >    => __send_signal_locked
> >    => send_signal_locked
> >    => force_sig_info_to_task
> >    => force_sig_fault
> >    => arm64_force_sig_fault
> >    => do_page_fault
> >    => do_translation_fault
> >    => do_mem_abort
> >    => el0_ia
> >    => el0t_64_sync_handler
> >
> > Simplified kernel log:
> > kernel_power_off() invoked by pt_notify_thread.
> > [41239.526109] pt_notify_threa: reboot set flag, old value 0x********,
> > *.
> > [41239.526114] pt_notify_threa: reboot set flag new value 0x********.
> > UFS reject I/O after kerenl_power_off.
> > [41239.686411]  scsi +scsi******** apexd: sd* ******** rejecting I/O to
> > offline device.
> > Lots of I/O error & erofs error happened after kernel_power_off().
> > [41239.690312] apexd: I/O error, dev sdc, sector ******* op ***:(READ)
> > flags 0x**** phys_seg ** prio class 0.
> > [41239.690465] apexd: I/O error, dev sdc, sector ******* op ***:(READ)
> > flags 0x**** phys_seg ** prio class 0.
> > ...
> > ...
> > [41239.922265] init: erofs: (device ****): z_erofs_read_folio: read
> > error * @ *** of nid ********.
> > [41239.922341] init: erofs: (device ****): z_erofs_read_folio: read
> > error * @ *** of nid ********.
> > Finally device panic due to PID 1 received SIGBUS.
> > [41239.923789] init: Kernel panic - not syncing: Attempted to kill init!
> > exitcode=0x00000007
> >
> > Fixes: 43cf75d96409 ("exit: panic before exit_mm() on global init exit")
> > Link: https://lore.kernel.org/all/20191219104223.xvk6ppfogoxrgmw6@wittgenstein/
> > Signed-off-by: Tze-nan Wu <Tze-nan.Wu@mediatek.com>
> > ---
> >
> > I am also wondering if this patch is reasonable?
> >
> > From my perspective, there are two reasons not to trigger such panic
> > during kernel_power_off() or kernel_restart():
> >   1. It is not worthwhile to interrupt kernel_power_off() by a panic
> >      resulted from userspace instability.
> >   2. The panic in do_exit() was originally designed to ensure a usable
> >      coredump if the last thread of the global init process exited.
> > 	 However, capture a coredump triggered by userspace crash after
> >      kernel_power_off() seems not particularly useful, in my opinion.
> >
> > In certain scenarios, a kernel module may need to directly power off
> > from kernel space to protect hardware (e.g., thermal protection).
> > In my opinion, rather than causing a panic during kernel_power_off(),
> > it sounds better to allow the device to complete its power-off process.
> >
> > Appreciate for any comment on this, if there's any better way to
> > handle this panic, please point me out.
> >
> > ---
> >  kernel/exit.c | 14 ++++++++++----
> >  1 file changed, 10 insertions(+), 4 deletions(-)
> >
> > diff --git a/kernel/exit.c b/kernel/exit.c
> > index 1dcddfe537ee..23cb6b42a1f1 100644
> > --- a/kernel/exit.c
> > +++ b/kernel/exit.c
> > @@ -901,11 +901,17 @@ void __noreturn do_exit(long code)
> >  	if (group_dead) {
> >  		/*
> >  		 * If the last thread of global init has exited, panic
> > -		 * immediately to get a useable coredump.
> > +		 * immediately to get a usable coredump, except when the
> > +		 * device is currently powering off or restarting.
> >  		 */
> > -		if (unlikely(is_global_init(tsk)))
> > -			panic("Attempted to kill init! exitcode=0x%08x\n",
> > -				tsk->signal->group_exit_code ?: (int)code);
> > +		if (unlikely(is_global_init(tsk))) {
> > +			if (system_state != SYSTEM_POWER_OFF &&
> > +			    system_state != SYSTEM_RESTART)
> > +				panic("Attempted to kill init! exitcode=0x%08x\n",
> > +				      tsk->signal->group_exit_code ?: (int)code);
> > +			WARN(1, "Attempted to kill init! exitcode=0x%08x\n",
> > +			     tsk->signal->group_exit_code ?: (int)code);
> > +		}
> >
> >  #ifdef CONFIG_POSIX_TIMERS
> >  		hrtimer_cancel(&tsk->signal->real_timer);
> > --
> > 2.45.2
> >
Tze-nan Wu (吳澤南) April 14, 2025, 1:02 p.m. UTC | #3
On Thu, 2025-04-10 at 23:05 +0200, Oleg Nesterov wrote:
> 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
> 
> 
> Well...
> 
> Let me repeat. I don't understand the kernel/reboot.c paths, you can
> safely ignore me.
> 
> But I still think that you target the wrong goal. Quite possibly I am
> wrong.
> 
> On 04/10, Tze-nan Wu wrote:
> > 
> > If PID 1 exits due to the unreliable userspace after
> > kernel_power_off()
> > invoked,
> 
> Why. Why the global init does do_exit()? It should not, that is all.
> It doesn't matter if it is single threaded or not.
> 
> As for sys_reboot(), I think that kernel_power_off() must be
> __noreturn,
> and sys_reboot() should use BUG() after
> LINUX_REBOOT_CMD_POWER_OFF/_HALT
> instead of do_exit().
> 

Yes, kernel_power_off() should not return, but this is the case only if
kernel_power_off() is invoked by PID 1 through sys_reboot().
If kernel_power_off() is invoked by a kernel thread (e.g., the thermal
kernel module) other than PID 1, then do_exit() could possibly be
invoked by PID 1 after kernel_power_off() on another CPU. (shown as
below)

cpu 1 (thermal ko)                cpu 2 (PID 1)
-----------------                 ---------------
kernel_power_off                      ...
->ufshcd_wl_shutdown(UFS down)       ...
...                                PID 1 page fault
...                                fail to handle page fault (UFS down)
...                                send SIGBUS to PID 1
...                                PID 1 trap to do_exit()
...                                panic()
->machine_power_off()
  -> smp_send_stop() //stop other CPUs

We have encounter this scenario several times in a low rate on kernel-
6.12.

> If nothing else. do_exit() also does debug_check_no_locks_held() and
> sys_reboot() calls do_exit() with system_transition_mutex held.
> 
> IOW. IMO, it is not that do_exit() needs some changes. The very fact
> that the global init does do_exit() is wrong, this should be fixed.
> 
I'm not an expert on UFS, but if we want to prevent entering do_exit()
after kernel_power_off(), perhaps moving ufshcd_wl_shutdown() after 
smp_send_stop() could help.
Since the userspace process running on the other CPUs before
smp_send_stop() could still access the UFS.
But not sure if that's possible...

Tze-nan
> But again, again, I can't really comment.
> 
> Oleg.
> 
> > the panic follow by the last thread of global init exited in
> > do_exit() will stop the kernel_power_off() procedure, turn a
> > shutdown
> > behavior into panic flow(reboot).
> > 
> > Add a condition check to ensure that the panic triggered by the
> > last
> > thread of the global init exiting, only occurs while:
> > ( system_state != SYSTEM_POWER_OFF and system_state !=
> > SYSTEM_RESTART).
> > Otherwise, WARN() instead.
> > 
> > [On Android 16 with arm64 arch]
> > Here's a scenario where the global init exits during
> > kernel_power_off:
> > If PID 1 encounters a page fault after kernel_power_off() has been
> > invoked, the kernel will fail to handle the page fault because the
> > disk(UFS) has already shut down.
> > Consequently, the kernel will send a SIGBUS to PID 1 to indicate
> > the
> > page fault failure, and ultimately, the panic will occur after PID
> > 1
> > exits due to receiving the SIGBUS.
> > 
> >             cpu1                           cpu2
> >           ----------                     ----------
> >     kernel_power_off() start
> >         UFS shutdown
> >             ...                              PID 1 page fault
> >             ...                    page fault handle failure
> >             ...                                    PID 1 received
> > SIGBUS
> >             ...                             panic
> >    kernel_power_off() not done
> > 
> > Backtrace while PID 1 received signal 7:
> >    init-1 [007] d..1 41239.922385: \
> >       signal_generate: sig=7 errno=0 code=2 comm=init pid=1 grp=0
> > res=0
> >    init-1 [007] d..1 41239.922389: kernel_stack: <stack trace>
> >    => __send_signal_locked
> >    => send_signal_locked
> >    => force_sig_info_to_task
> >    => force_sig_fault
> >    => arm64_force_sig_fault
> >    => do_page_fault
> >    => do_translation_fault
> >    => do_mem_abort
> >    => el0_ia
> >    => el0t_64_sync_handler
> > 
> > Simplified kernel log:
> > kernel_power_off() invoked by pt_notify_thread.
> > [41239.526109] pt_notify_threa: reboot set flag, old value
> > 0x********,
> > *.
> > [41239.526114] pt_notify_threa: reboot set flag new value
> > 0x********.
> > UFS reject I/O after kerenl_power_off.
> > [41239.686411]  scsi +scsi******** apexd: sd* ******** rejecting
> > I/O to
> > offline device.
> > Lots of I/O error & erofs error happened after kernel_power_off().
> > [41239.690312] apexd: I/O error, dev sdc, sector ******* op
> > ***:(READ)
> > flags 0x**** phys_seg ** prio class 0.
> > [41239.690465] apexd: I/O error, dev sdc, sector ******* op
> > ***:(READ)
> > flags 0x**** phys_seg ** prio class 0.
> > ...
> > ...
> > [41239.922265] init: erofs: (device ****): z_erofs_read_folio: read
> > error * @ *** of nid ********.
> > [41239.922341] init: erofs: (device ****): z_erofs_read_folio: read
> > error * @ *** of nid ********.
> > Finally device panic due to PID 1 received SIGBUS.
> > [41239.923789] init: Kernel panic - not syncing: Attempted to kill
> > init!
> > exitcode=0x00000007
> > 
> > Fixes: 43cf75d96409 ("exit: panic before exit_mm() on global init
> > exit")
> > Link:
> > https://lore.kernel.org/all/20191219104223.xvk6ppfogoxrgmw6@wittgenstein/
> > Signed-off-by: Tze-nan Wu <Tze-nan.Wu@mediatek.com>
> > ---
> > 
> > I am also wondering if this patch is reasonable?
> > 
> > From my perspective, there are two reasons not to trigger such
> > panic
> > during kernel_power_off() or kernel_restart():
> >   1. It is not worthwhile to interrupt kernel_power_off() by a
> > panic
> >      resulted from userspace instability.
> >   2. The panic in do_exit() was originally designed to ensure a
> > usable
> >      coredump if the last thread of the global init process exited.
> >        However, capture a coredump triggered by userspace crash
> > after
> >      kernel_power_off() seems not particularly useful, in my
> > opinion.
> > 
> > In certain scenarios, a kernel module may need to directly power
> > off
> > from kernel space to protect hardware (e.g., thermal protection).
> > In my opinion, rather than causing a panic during
> > kernel_power_off(),
> > it sounds better to allow the device to complete its power-off
> > process.
> > 
> > Appreciate for any comment on this, if there's any better way to
> > handle this panic, please point me out.
> > 
> > ---
> >  kernel/exit.c | 14 ++++++++++----
> >  1 file changed, 10 insertions(+), 4 deletions(-)
> > 
> > diff --git a/kernel/exit.c b/kernel/exit.c
> > index 1dcddfe537ee..23cb6b42a1f1 100644
> > --- a/kernel/exit.c
> > +++ b/kernel/exit.c
> > @@ -901,11 +901,17 @@ void __noreturn do_exit(long code)
> >       if (group_dead) {
> >               /*
> >                * If the last thread of global init has exited,
> > panic
> > -              * immediately to get a useable coredump.
> > +              * immediately to get a usable coredump, except when
> > the
> > +              * device is currently powering off or restarting.
> >                */
> > -             if (unlikely(is_global_init(tsk)))
> > -                     panic("Attempted to kill init!
> > exitcode=0x%08x\n",
> > -                             tsk->signal->group_exit_code ?:
> > (int)code);
> > +             if (unlikely(is_global_init(tsk))) {
> > +                     if (system_state != SYSTEM_POWER_OFF &&
> > +                         system_state != SYSTEM_RESTART)
> > +                             panic("Attempted to kill init!
> > exitcode=0x%08x\n",
> > +                                   tsk->signal->group_exit_code ?:
> > (int)code);
> > +                     WARN(1, "Attempted to kill init!
> > exitcode=0x%08x\n",
> > +                          tsk->signal->group_exit_code ?:
> > (int)code);
> > +             }
> > 
> >  #ifdef CONFIG_POSIX_TIMERS
> >               hrtimer_cancel(&tsk->signal->real_timer);
> > --
> > 2.45.2
> > 
>
Oleg Nesterov April 14, 2025, 4:50 p.m. UTC | #4
On 04/14, Tze-nan Wu (吳澤南) wrote:
>
> On Thu, 2025-04-10 at 23:05 +0200, Oleg Nesterov wrote:
> > 
> > As for sys_reboot(), I think that kernel_power_off() must be
> > __noreturn,
> > and sys_reboot() should use BUG() after
> > LINUX_REBOOT_CMD_POWER_OFF/_HALT
> > instead of do_exit().
> > 
> 
> Yes, kernel_power_off() should not return, but this is the case only if
> kernel_power_off() is invoked by PID 1 through sys_reboot().
> If kernel_power_off() is invoked by a kernel thread (e.g., the thermal
> kernel module) other than PID 1, then do_exit() could possibly be
> invoked by PID 1 after kernel_power_off() on another CPU.

Yes sure, this is clear.

I have mentioned sys_reboot() because (unless I am totally confused)
this connects to the previous report from Breno.

And I agree that we should do stop_other_cpus() first, but let me
say this again: I can't help ;)

But in any case, rightly or not I still think that the init process
should not exit/crash due to POWER_OFF/HALT. We should not mask this
problem in do_exit().

Oleg.
Christian Brauner April 15, 2025, 8:41 a.m. UTC | #5
On Mon, Apr 14, 2025 at 06:50:44PM +0200, Oleg Nesterov wrote:
> On 04/14, Tze-nan Wu (吳澤南) wrote:
> >
> > On Thu, 2025-04-10 at 23:05 +0200, Oleg Nesterov wrote:
> > > 
> > > As for sys_reboot(), I think that kernel_power_off() must be
> > > __noreturn,
> > > and sys_reboot() should use BUG() after
> > > LINUX_REBOOT_CMD_POWER_OFF/_HALT
> > > instead of do_exit().
> > > 
> > 
> > Yes, kernel_power_off() should not return, but this is the case only if
> > kernel_power_off() is invoked by PID 1 through sys_reboot().
> > If kernel_power_off() is invoked by a kernel thread (e.g., the thermal
> > kernel module) other than PID 1, then do_exit() could possibly be
> > invoked by PID 1 after kernel_power_off() on another CPU.
> 
> Yes sure, this is clear.
> 
> I have mentioned sys_reboot() because (unless I am totally confused)
> this connects to the previous report from Breno.
> 
> And I agree that we should do stop_other_cpus() first, but let me
> say this again: I can't help ;)
> 
> But in any case, rightly or not I still think that the init process
> should not exit/crash due to POWER_OFF/HALT. We should not mask this
> problem in do_exit().

I agree.
diff mbox series

Patch

diff --git a/kernel/exit.c b/kernel/exit.c
index 1dcddfe537ee..23cb6b42a1f1 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -901,11 +901,17 @@  void __noreturn do_exit(long code)
 	if (group_dead) {
 		/*
 		 * If the last thread of global init has exited, panic
-		 * immediately to get a useable coredump.
+		 * immediately to get a usable coredump, except when the
+		 * device is currently powering off or restarting.
 		 */
-		if (unlikely(is_global_init(tsk)))
-			panic("Attempted to kill init! exitcode=0x%08x\n",
-				tsk->signal->group_exit_code ?: (int)code);
+		if (unlikely(is_global_init(tsk))) {
+			if (system_state != SYSTEM_POWER_OFF &&
+			    system_state != SYSTEM_RESTART)
+				panic("Attempted to kill init! exitcode=0x%08x\n",
+				      tsk->signal->group_exit_code ?: (int)code);
+			WARN(1, "Attempted to kill init! exitcode=0x%08x\n",
+			     tsk->signal->group_exit_code ?: (int)code);
+		}
 
 #ifdef CONFIG_POSIX_TIMERS
 		hrtimer_cancel(&tsk->signal->real_timer);