Message ID | 20200115063410.131692-1-hsinyi@chromium.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v5] reboot: support offline CPUs before reboot | expand |
On Wed, Jan 15, 2020 at 7:35 AM Hsin-Yi Wang <hsinyi@chromium.org> wrote: > > Currently system reboots uses architecture specific codes (smp_send_stop) > to offline non reboot CPUs. Most architecture's implementation is looping > through all non reboot online CPUs and call ipi function to each of them. Some > architecture like arm64, arm, and x86... would set offline masks to cpu without > really offline them. This causes some race condition and kernel warning comes > out sometimes when system reboots. > > This patch adds a config ARCH_OFFLINE_CPUS_ON_REBOOT, which would offline cpus in > migrate_to_reboot_cpu(). If non reboot cpus are all offlined here, the loop for > checking online cpus would be an empty loop. If architecture don't enable this > config, or some cpus somehow fails to offline, it would fallback to ipi > function. > > Opt in this config for architectures that support CONFIG_HOTPLUG_CPU. > > Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org> > --- > Change from v4: > * fix a few nits: naming, comments, remove Kconfig text... > > Change from v3: > * Opt in config for architectures that support CONFIG_HOTPLUG_CPU > * Merge function offline_secondary_cpus() and freeze_secondary_cpus() > with an additional flag. This does not seem to be a very good idea, since freeze_secondary_cpus() does much more than you need for reboot. For reboot, you basically only need to do something like this AFAICS: cpu_maps_update_begin(); for_each_online_cpu(i) { if (i != cpu) _cpu_down(i, 1, CPUHP_OFFLINE); } cpu_hotplug_disabled++; cpu_maps_update_done(); And you may put this into a function defined outside of CONFIG_PM_SLEEP. > > Change from v2: > * Add another config instead of configed by CONFIG_HOTPLUG_CPU So why exactly is this new Kconfig option needed? Everybody supporting CPU hotplug seems to opt in anyway. [cut] > > -int freeze_secondary_cpus(int primary) > +int freeze_secondary_cpus(int primary, bool reboot) > { > int cpu, error = 0; > > @@ -1237,11 +1237,13 @@ int freeze_secondary_cpus(int primary) > if (cpu == primary) > continue; > > - if (pm_wakeup_pending()) { > +#ifdef CONFIG_PM_SLEEP > + if (!reboot && pm_wakeup_pending()) { > pr_info("Wakeup pending. Abort CPU freeze\n"); > error = -EBUSY; > break; > } > +#endif Please avoid using #ifdefs in function bodies. This makes the code hard to maintain in the long term. > > trace_suspend_resume(TPS("CPU_OFF"), cpu, true); > error = _cpu_down(cpu, 1, CPUHP_OFFLINE); > @@ -1250,7 +1252,9 @@ int freeze_secondary_cpus(int primary) > cpumask_set_cpu(cpu, frozen_cpus); > else { > pr_err("Error taking CPU%d down: %d\n", cpu, error); > - break; > + /* When rebooting, offline as many CPUs as possible. */ > + if (!reboot) > + break; > } > } > > diff --git a/kernel/reboot.c b/kernel/reboot.c > index c4d472b7f1b4..12f643b66e57 100644 > --- a/kernel/reboot.c > +++ b/kernel/reboot.c > @@ -7,6 +7,7 @@ > > #define pr_fmt(fmt) "reboot: " fmt > > +#include <linux/cpu.h> > #include <linux/ctype.h> > #include <linux/export.h> > #include <linux/kexec.h> > @@ -220,7 +221,9 @@ void migrate_to_reboot_cpu(void) > /* The boot cpu is always logical cpu 0 */ > int cpu = reboot_cpu; > > +#if !IS_ENABLED(CONFIG_ARCH_OFFLINE_CPUS_ON_REBOOT) > cpu_hotplug_disable(); > +#endif You can write this as if (!IS_ENABLED(CONFIG_ARCH_OFFLINE_CPUS_ON_REBOOT)) cpu_hotplug_disable(); That's what IS_ENABLED() is there for. > > /* Make certain the cpu I'm about to reboot on is online */ > if (!cpu_online(cpu)) > @@ -231,6 +234,11 @@ void migrate_to_reboot_cpu(void) > > /* Make certain I only run on the appropriate processor */ > set_cpus_allowed_ptr(current, cpumask_of(cpu)); > + > +#if IS_ENABLED(CONFIG_ARCH_OFFLINE_CPUS_ON_REBOOT) > + /* Offline other cpus if possible */ > + freeze_secondary_cpus(cpu, true); > +#endif The above comment applies here too. > } > > /** > --
On Wed, Jan 15, 2020 at 5:49 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > > On Wed, Jan 15, 2020 at 7:35 AM Hsin-Yi Wang <hsinyi@chromium.org> wrote: > > > > Currently system reboots uses architecture specific codes (smp_send_stop) > > to offline non reboot CPUs. Most architecture's implementation is looping > > through all non reboot online CPUs and call ipi function to each of them. Some > > architecture like arm64, arm, and x86... would set offline masks to cpu without > > really offline them. This causes some race condition and kernel warning comes > > out sometimes when system reboots. > > > > This patch adds a config ARCH_OFFLINE_CPUS_ON_REBOOT, which would offline cpus in > > migrate_to_reboot_cpu(). If non reboot cpus are all offlined here, the loop for > > checking online cpus would be an empty loop. If architecture don't enable this > > config, or some cpus somehow fails to offline, it would fallback to ipi > > function. > > > > Opt in this config for architectures that support CONFIG_HOTPLUG_CPU. > > > > Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org> > > --- > > Change from v4: > > * fix a few nits: naming, comments, remove Kconfig text... > > > > Change from v3: > > * Opt in config for architectures that support CONFIG_HOTPLUG_CPU > > * Merge function offline_secondary_cpus() and freeze_secondary_cpus() > > with an additional flag. > > This does not seem to be a very good idea, since > freeze_secondary_cpus() does much more than you need for reboot. > > For reboot, you basically only need to do something like this AFAICS: > > cpu_maps_update_begin(); > > for_each_online_cpu(i) { > if (i != cpu) > _cpu_down(i, 1, CPUHP_OFFLINE); > } > cpu_hotplug_disabled++; > > cpu_maps_update_done(); > > And you may put this into a function defined outside of CONFIG_PM_SLEEP. > v2's implementation is similar to this. The conclusion in v2[1] is that since these 2 functions are similar, we should merge them. I'm fine both ways but slightly prefer v2's. Maybe wait for others to comment? [1] https://lore.kernel.org/lkml/87muarpcwm.fsf@vitty.brq.redhat.com/ > > > > Change from v2: > > * Add another config instead of configed by CONFIG_HOTPLUG_CPU > > So why exactly is this new Kconfig option needed? > > Everybody supporting CPU hotplug seems to opt in anyway. > Currently we opt-in for all arch that supports HOTPLUG_CPU, but if some arch decides that this would make reboot slow (or maybe other reasons), they can choose to opt-out. I have only tested on arm64 and x86 for now. > [cut] > > > > > -int freeze_secondary_cpus(int primary) > > +int freeze_secondary_cpus(int primary, bool reboot) > > { > > int cpu, error = 0; > > > > @@ -1237,11 +1237,13 @@ int freeze_secondary_cpus(int primary) > > if (cpu == primary) > > continue; > > > > - if (pm_wakeup_pending()) { > > +#ifdef CONFIG_PM_SLEEP > > + if (!reboot && pm_wakeup_pending()) { > > pr_info("Wakeup pending. Abort CPU freeze\n"); > > error = -EBUSY; > > break; > > } > > +#endif > > Please avoid using #ifdefs in function bodies. This makes the code > hard to maintain in the long term. > > > > > trace_suspend_resume(TPS("CPU_OFF"), cpu, true); > > error = _cpu_down(cpu, 1, CPUHP_OFFLINE); > > @@ -1250,7 +1252,9 @@ int freeze_secondary_cpus(int primary) > > cpumask_set_cpu(cpu, frozen_cpus); > > else { > > pr_err("Error taking CPU%d down: %d\n", cpu, error); > > - break; > > + /* When rebooting, offline as many CPUs as possible. */ > > + if (!reboot) > > + break; > > } > > } > > > > diff --git a/kernel/reboot.c b/kernel/reboot.c > > index c4d472b7f1b4..12f643b66e57 100644 > > --- a/kernel/reboot.c > > +++ b/kernel/reboot.c > > @@ -7,6 +7,7 @@ > > > > #define pr_fmt(fmt) "reboot: " fmt > > > > +#include <linux/cpu.h> > > #include <linux/ctype.h> > > #include <linux/export.h> > > #include <linux/kexec.h> > > @@ -220,7 +221,9 @@ void migrate_to_reboot_cpu(void) > > /* The boot cpu is always logical cpu 0 */ > > int cpu = reboot_cpu; > > > > +#if !IS_ENABLED(CONFIG_ARCH_OFFLINE_CPUS_ON_REBOOT) > > cpu_hotplug_disable(); > > +#endif > > You can write this as > > if (!IS_ENABLED(CONFIG_ARCH_OFFLINE_CPUS_ON_REBOOT)) > cpu_hotplug_disable(); > > That's what IS_ENABLED() is there for. > > > > > /* Make certain the cpu I'm about to reboot on is online */ > > if (!cpu_online(cpu)) > > @@ -231,6 +234,11 @@ void migrate_to_reboot_cpu(void) > > > > /* Make certain I only run on the appropriate processor */ > > set_cpus_allowed_ptr(current, cpumask_of(cpu)); > > + > > +#if IS_ENABLED(CONFIG_ARCH_OFFLINE_CPUS_ON_REBOOT) > > + /* Offline other cpus if possible */ > > + freeze_secondary_cpus(cpu, true); > > +#endif > > The above comment applies here too. > > > } > > > > /** > > --
On Wed, Jan 15, 2020 at 02:34:10PM +0800, Hsin-Yi Wang wrote: > Currently system reboots uses architecture specific codes (smp_send_stop) > to offline non reboot CPUs. Most architecture's implementation is looping > through all non reboot online CPUs and call ipi function to each of them. Some > architecture like arm64, arm, and x86... would set offline masks to cpu without > really offline them. This causes some race condition and kernel warning comes > out sometimes when system reboots. > > This patch adds a config ARCH_OFFLINE_CPUS_ON_REBOOT, which would offline cpus in > migrate_to_reboot_cpu(). If non reboot cpus are all offlined here, the loop for > checking online cpus would be an empty loop. If architecture don't enable this > config, or some cpus somehow fails to offline, it would fallback to ipi > function. > What's the timing impact on systems with large number of CPUs(say 256 or more) ? I remember we added some change to reduce the wait times for offlining CPUs in system suspend path on arm64, still not negligible. -- Regards, Sudeep
Hsin-Yi Wang <hsinyi@chromium.org> writes: > Currently system reboots uses architecture specific codes (smp_send_stop) > to offline non reboot CPUs. Most architecture's implementation is looping > through all non reboot online CPUs and call ipi function to each of them. Some > architecture like arm64, arm, and x86... would set offline masks to cpu without > really offline them. This causes some race condition and kernel warning comes > out sometimes when system reboots. 'some race condition and kernel warning' is pretty useless information. Please describe exactly which kind of issues are caused by the current mechanism. Especially the race conditions are the interesting part (the warnings are just a consequence). > This patch adds a config ARCH_OFFLINE_CPUS_ON_REBOOT, which would > offline cpus in Please read Documentation/process/submitting-patches.rst and search for 'This patch'. > migrate_to_reboot_cpu(). If non reboot cpus are all offlined here, the loop for > checking online cpus would be an empty loop. This does not make any sense. The issues which you are trying to solve are going to be still there when CONFIG_HOTPLUG_CPU is disabled. > If architecture don't enable this config, or some cpus somehow fails > to offline, it would fallback to ipi function. This is really a half baken solution which keeps the various pointlessly different pseudo reboot/kexec offlining implementations around. So with this we have yet more code which only works depending on kernel configuration and has the issue of potentially not being able to offline a CPU. IOW this is going to fail completely in cases where a system is in a state which prevents regular hotplug. The existing pseudo-offline functions have timeouts and eventually a fallback, e.g. the NMI fallback on x86. With this proposed regular offline solution this will just get stuck w/o a chance to force recovery. While I like the idea and surely agree that the ideal solution is to properly shutdown the CPUs on reboot, we need to take a step back and look at the minimum requirements for a regular shutdown/reboot and at the same time have a look at the requirements for emergency shutdown and kexec/kcrash. Having proper information about the race conditions and warnings you mentioned would be a good starting point. > Opt in this config for architectures that support CONFIG_HOTPLUG_CPU. This is not opt-in. You force that on all architectures which support CONFIG_HOTPLUG_CPU. The way we do this normally is to provide the infrastructure first and then have separate patches (one per architecture) enabling this, which allows the architecture maintainers to decide individually. Thanks, tglx
On Thu, Jan 16, 2020 at 8:30 AM Thomas Gleixner <tglx@linutronix.de> wrote: > > Hsin-Yi Wang <hsinyi@chromium.org> writes: > > > Currently system reboots uses architecture specific codes (smp_send_stop) > > to offline non reboot CPUs. Most architecture's implementation is looping > > through all non reboot online CPUs and call ipi function to each of them. Some > > architecture like arm64, arm, and x86... would set offline masks to cpu without > > really offline them. This causes some race condition and kernel warning comes > > out sometimes when system reboots. > > 'some race condition and kernel warning' is pretty useless information. > Please describe exactly which kind of issues are caused by the current > mechanism. Especially the race conditions are the interesting part (the > warnings are just a consequence). > > > This patch adds a config ARCH_OFFLINE_CPUS_ON_REBOOT, which would > > offline cpus in > > Please read Documentation/process/submitting-patches.rst and search for > 'This patch'. > > > migrate_to_reboot_cpu(). If non reboot cpus are all offlined here, the loop for > > checking online cpus would be an empty loop. > > This does not make any sense. The issues which you are trying to solve > are going to be still there when CONFIG_HOTPLUG_CPU is disabled. > > > If architecture don't enable this config, or some cpus somehow fails > > to offline, it would fallback to ipi function. > > This is really a half baken solution which keeps the various pointlessly > different pseudo reboot/kexec offlining implementations around. So with > this we have yet more code which only works depending on kernel > configuration and has the issue of potentially not being able to offline > a CPU. IOW this is going to fail completely in cases where a system is > in a state which prevents regular hotplug. > > The existing pseudo-offline functions have timeouts and eventually a > fallback, e.g. the NMI fallback on x86. With this proposed regular > offline solution this will just get stuck w/o a chance to force > recovery. > > While I like the idea and surely agree that the ideal solution is to > properly shutdown the CPUs on reboot, we need to take a step back and > look at the minimum requirements for a regular shutdown/reboot and at > the same time have a look at the requirements for emergency shutdown and > kexec/kcrash. Having proper information about the race conditions and > warnings you mentioned would be a good starting point. > We saw this issue on regular reboot (not panic) on arm64: If tick broadcast and smp_send_stop() happen together and the first broadcast arrives to some idled CPU that hasn't already executed reboot ipi to run in spinloop, it would try to broadcast to another CPU, but that target CPU is already marked as offline by set_cpu_online() in reboot ipi, and a warning comes out since tick_handle_oneshot_broadcast() would check if it tries to broadcast to offline cpus. Most of the time the CPU getting the broadcast interrupt is already in the spinloop and thus isn't going to receive interrupts from the broadcast timer. [ 27.032080] Set kernel.core_pattern before fs.suid_dumpable. [ 27.978628] reboot: Restarting system [ 27.978919] WARNING: CPU: 3 PID: 0 at /mnt/host/source/src/third_party/kernel/v4.19/kernel/time/tick-broadcast.c:652 tick_handle_oneshot_broadcast+0x1f8/0x214 [ 27.978932] Modules linked in: rfcomm uinput bridge stp llc nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp esp6 ah6 xfrm6_mode_tunnel xfrm6_mode_transport xfrm4_mode_tunnel xfrm4_mode_transport ip6t_REJECT ip6t_ipv6header hci_uart btqca bluetooth ipt_MASQUERADE ecdh_generic lzo_rle lzo_compress zram fuse cros_ec_sensors_sync cros_ec_sensors_ring cros_ec_light_prox cros_ec_sensors industrialio_triggered_buffer kfifo_buf cros_ec_sensors_core ath10k_sdio ath10k_core ath mac80211 cfg80211 asix usbnet mii joydev [ 27.979102] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G S 4.19.56 #79 [ 27.979113] Hardware name: MediaTek krane sku176 board (DT) [ 27.979127] pstate: 00000085 (nzcv daIf -PAN -UAO) [ 27.979140] pc : tick_handle_oneshot_broadcast+0x1f8/0x214 [ 27.979154] lr : tick_handle_oneshot_broadcast+0x108/0x214 [ 27.979162] sp : fffffff85c851610 [ 27.979171] x29: fffffff85c851670 x28: ffffff9082785510 [ 27.979187] x27: ffffff90822da700 x26: ffffff90826169c8 [ 27.979202] x25: ffffff9082616000 x24: 0000000000000001 [ 27.979217] x23: ffffff90827854f8 x22: 000000067a6599b8 [ 27.979232] x21: 0000000000000000 x20: 7fffffffffffffff [ 27.979248] x19: ffffff9082785508 x18: 0000000000000000 [ 27.979263] x17: 0000000000000000 x16: 0000000000000000 [ 27.979278] x15: 0000000000000000 x14: fffffff85bf08040 [ 27.979293] x13: ffffff9082785000 x12: 0000000000000069 [ 27.979308] x11: ffffff9082785000 x10: 0000000000000018 [ 27.979323] x9 : 0000000000000010 x8 : ffffff9082615000 [ 27.979338] x7 : 0000000000000000 x6 : 000000000000003f [ 27.979353] x5 : 0000000000000040 x4 : 0000000000000102 [ 27.979367] x3 : 0000000000000000 x2 : 0000000000000007 [ 27.979383] x1 : 0000000000000008 x0 : 0000000000000008 [ 27.979399] Call trace: [ 27.979413] tick_handle_oneshot_broadcast+0x1f8/0x214 [ 27.979429] mtk_syst_handler+0x34/0x44 [ 27.979443] __handle_irq_event_percpu+0x134/0x254 [ 27.979454] handle_irq_event_percpu+0x34/0x8c [ 27.979465] handle_irq_event+0x48/0x78 [ 27.979478] handle_fasteoi_irq+0xd0/0x1a4 [ 27.979492] __handle_domain_irq+0x84/0xc4 [ 27.979505] gic_handle_irq+0x154/0x1a4 [ 27.979516] el1_irq+0xb0/0x128 [ 27.979531] cpuidle_enter_state+0x298/0x328 [ 27.979543] cpuidle_enter+0x30/0x40 [ 27.979557] do_idle+0x154/0x270 [ 27.979569] cpu_startup_entry+0x24/0x28 [ 27.979584] secondary_start_kernel+0x15c/0x168 [ 27.979594] ---[ end trace 57ed1d1fade60372 ]--- If system supports hotplug, _cpu_down() would properly handle tasks termination such as remove CPU from timer broadcasting by tick_offline_cpu()...etc, as well as some interrupt handling. https://lore.kernel.org/patchwork/patch/1117201/ is a previous attempt to solve this issue by introducing another mask in reboot ipi function to avoid the cpu_online_mask being looked up in too many different places. > > Opt in this config for architectures that support CONFIG_HOTPLUG_CPU. > > This is not opt-in. You force that on all architectures which support > CONFIG_HOTPLUG_CPU. The way we do this normally is to provide the > infrastructure first and then have separate patches (one per > architecture) enabling this, which allows the architecture maintainers > to decide individually. Acked, thanks. > > Thanks, > > tglx
On Wed, Jan 15, 2020 at 7:41 PM Sudeep Holla <sudeep.holla@arm.com> wrote: > > On Wed, Jan 15, 2020 at 02:34:10PM +0800, Hsin-Yi Wang wrote: > > Currently system reboots uses architecture specific codes (smp_send_stop) > > to offline non reboot CPUs. Most architecture's implementation is looping > > through all non reboot online CPUs and call ipi function to each of them. Some > > architecture like arm64, arm, and x86... would set offline masks to cpu without > > really offline them. This causes some race condition and kernel warning comes > > out sometimes when system reboots. > > > > This patch adds a config ARCH_OFFLINE_CPUS_ON_REBOOT, which would offline cpus in > > migrate_to_reboot_cpu(). If non reboot cpus are all offlined here, the loop for > > checking online cpus would be an empty loop. If architecture don't enable this > > config, or some cpus somehow fails to offline, it would fallback to ipi > > function. > > > > What's the timing impact on systems with large number of CPUs(say 256 or > more) ? I remember we added some change to reduce the wait times for > offlining CPUs in system suspend path on arm64, still not negligible. > This is not the final solution, but I would still provided some data points here: Tested on my arm64 with 4 cpu: 2 a53 and 2 a72. Offlining 3 cpu takes about 60~65 ms Offlining 2 cpu(a53+a72 or a72+a72) takes about 42~47 ms Offlining 1 cpu(a53 or a72) takes about 23~25 ms. It would take longer time for systems with large number of CPUs.
Hsin-Yi Wang <hsinyi@chromium.org> writes: > On Thu, Jan 16, 2020 at 8:30 AM Thomas Gleixner <tglx@linutronix.de> wrote: > We saw this issue on regular reboot (not panic) on arm64: If tick > broadcast and smp_send_stop() happen together and the first broadcast > arrives to some idled CPU that hasn't already executed reboot ipi to > run in spinloop, it would try to broadcast to another CPU, but that > target CPU is already marked as offline by set_cpu_online() in reboot > ipi, and a warning comes out since tick_handle_oneshot_broadcast() > would check if it tries to broadcast to offline cpus. Most of the time > the CPU getting the broadcast interrupt is already in the spinloop and > thus isn't going to receive interrupts from the broadcast timer. The timer broadcasting is obviously broken by the existing reboot unplug mechanism as the outgoing CPU should remove itself from the broadcast. Just addressing the broadcast issue is not sufficient as there are tons of other places which rely on consistency of the various cpu masks. > If system supports hotplug, _cpu_down() would properly handle tasks > termination such as remove CPU from timer broadcasting by > tick_offline_cpu()...etc, as well as some interrupt handling. Well, emphasis on 'if system supports hotplug'. If not, then you are back to square one. On ARM64 hotplug is selectable by a config option. So either we mandate HOTPLUG_CPU for SMP and get rid of all the ifdeffery or we need to have a mechanism which works on !HOTPLUG_CPU as well. That whole reboot/shutdown stuff is an unpenetrable mess of notifiers and architecture hackery, so something generic and understandable is really required. Thanks, tglx
diff --git a/arch/Kconfig b/arch/Kconfig index 48b5e103bdb0..a8db7999cd4c 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -255,6 +255,11 @@ config ARCH_HAS_UNCACHED_SEGMENT select ARCH_HAS_DMA_PREP_COHERENT bool +# Select to do a full offline on secondary CPUs before reboot. +config ARCH_OFFLINE_CPUS_ON_REBOOT + bool + depends on HOTPLUG_CPU + # Select if arch init_task must go in the __init_task_data section config ARCH_TASK_STRUCT_ON_STACK bool diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 69950fb5be64..d53cc8cb47e3 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -28,6 +28,7 @@ config ARM select ARCH_KEEP_MEMBLOCK if HAVE_ARCH_PFN_VALID || KEXEC select ARCH_MIGHT_HAVE_PC_PARPORT select ARCH_NO_SG_CHAIN if !ARM_HAS_SG_CHAIN + select ARCH_OFFLINE_CPUS_ON_REBOOT if HOTPLUG_CPU select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX select ARCH_OPTIONAL_KERNEL_RWX_DEFAULT if CPU_V7 select ARCH_SUPPORTS_ATOMIC_RMW diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 9af26ac75d19..9f913bc5c1f6 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -61,6 +61,7 @@ config ARM64 select ARCH_INLINE_SPIN_UNLOCK_IRQ if !PREEMPTION select ARCH_INLINE_SPIN_UNLOCK_IRQRESTORE if !PREEMPTION select ARCH_KEEP_MEMBLOCK + select ARCH_OFFLINE_CPUS_ON_REBOOT if HOTPLUG_CPU select ARCH_USE_CMPXCHG_LOCKREF select ARCH_USE_QUEUED_RWLOCKS select ARCH_USE_QUEUED_SPINLOCKS diff --git a/arch/arm64/kernel/hibernate.c b/arch/arm64/kernel/hibernate.c index 590963c9c609..f7245dfa09d9 100644 --- a/arch/arm64/kernel/hibernate.c +++ b/arch/arm64/kernel/hibernate.c @@ -581,5 +581,5 @@ int hibernate_resume_nonboot_cpu_disable(void) return -ENODEV; } - return freeze_secondary_cpus(sleep_cpu); + return freeze_secondary_cpus(sleep_cpu, false); } diff --git a/arch/csky/Kconfig b/arch/csky/Kconfig index 4acef4088de7..0f03e5c3f2fc 100644 --- a/arch/csky/Kconfig +++ b/arch/csky/Kconfig @@ -5,6 +5,7 @@ config CSKY select ARCH_HAS_DMA_PREP_COHERENT select ARCH_HAS_SYNC_DMA_FOR_CPU select ARCH_HAS_SYNC_DMA_FOR_DEVICE + select ARCH_OFFLINE_CPUS_ON_REBOOT if HOTPLUG_CPU select ARCH_USE_BUILTIN_BSWAP select ARCH_USE_QUEUED_RWLOCKS if NR_CPUS>2 select COMMON_CLK diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig index bab7cd878464..f12b4b11ee98 100644 --- a/arch/ia64/Kconfig +++ b/arch/ia64/Kconfig @@ -10,6 +10,7 @@ config IA64 bool select ARCH_MIGHT_HAVE_PC_PARPORT select ARCH_MIGHT_HAVE_PC_SERIO + select ARCH_OFFLINE_CPUS_ON_REBOOT if HOTPLUG_CPU select ACPI select ACPI_NUMA if NUMA select ARCH_SUPPORTS_ACPI diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig index b6b5f83af169..9bb2556d21fc 100644 --- a/arch/mips/Kconfig +++ b/arch/mips/Kconfig @@ -8,6 +8,7 @@ config MIPS select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST select ARCH_HAS_UBSAN_SANITIZE_ALL select ARCH_HAS_FORTIFY_SOURCE + select ARCH_OFFLINE_CPUS_ON_REBOOT if HOTPLUG_CPU select ARCH_SUPPORTS_UPROBES select ARCH_USE_BUILTIN_BSWAP select ARCH_USE_CMPXCHG_LOCKREF if 64BIT diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig index 71034b54d74e..41609f00b057 100644 --- a/arch/parisc/Kconfig +++ b/arch/parisc/Kconfig @@ -13,6 +13,7 @@ config PARISC select ARCH_HAS_STRICT_KERNEL_RWX select ARCH_HAS_UBSAN_SANITIZE_ALL select ARCH_NO_SG_CHAIN + select ARCH_OFFLINE_CPUS_ON_REBOOT if HOTPLUG_CPU select ARCH_SUPPORTS_MEMORY_FAILURE select RTC_CLASS select RTC_DRV_GENERIC diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 658e0324d256..a6b76dd82a2d 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -142,6 +142,7 @@ config PPC select ARCH_KEEP_MEMBLOCK select ARCH_MIGHT_HAVE_PC_PARPORT select ARCH_MIGHT_HAVE_PC_SERIO + select ARCH_OFFLINE_CPUS_ON_REBOOT if HOTPLUG_CPU select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX select ARCH_SUPPORTS_ATOMIC_RMW select ARCH_USE_BUILTIN_BSWAP diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig index 287714d51b47..19eec37b1682 100644 --- a/arch/s390/Kconfig +++ b/arch/s390/Kconfig @@ -102,6 +102,7 @@ config S390 select ARCH_INLINE_WRITE_UNLOCK_IRQ select ARCH_INLINE_WRITE_UNLOCK_IRQRESTORE select ARCH_KEEP_MEMBLOCK + select ARCH_OFFLINE_CPUS_ON_REBOOT if HOTPLUG_CPU select ARCH_SAVE_PAGE_KEYS if HIBERNATION select ARCH_STACKWALK select ARCH_SUPPORTS_ATOMIC_RMW diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig index 9ece111b0254..4ed1e0ca83a2 100644 --- a/arch/sh/Kconfig +++ b/arch/sh/Kconfig @@ -18,6 +18,7 @@ config SUPERH select ARCH_HAVE_CUSTOM_GPIO_H select ARCH_HAVE_NMI_SAFE_CMPXCHG if (GUSA_RB || CPU_SH4A) select ARCH_HAS_GCOV_PROFILE_ALL + select ARCH_OFFLINE_CPUS_ON_REBOOT if HOTPLUG_CPU select PERF_USE_VMALLOC select HAVE_DEBUG_KMEMLEAK select HAVE_KERNEL_GZIP diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig index e8c3ea01c12f..f31700309621 100644 --- a/arch/sparc/Kconfig +++ b/arch/sparc/Kconfig @@ -30,6 +30,7 @@ config SPARC select RTC_SYSTOHC select HAVE_ARCH_JUMP_LABEL if SPARC64 select GENERIC_IRQ_SHOW + select ARCH_OFFLINE_CPUS_ON_REBOOT if HOTPLUG_CPU select ARCH_WANT_IPC_PARSE_VERSION select GENERIC_PCI_IOMAP select HAVE_NMI_WATCHDOG if SPARC64 diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index b595ecb21a0f..e8edab974f67 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -85,6 +85,7 @@ config X86 select ARCH_MIGHT_HAVE_ACPI_PDC if ACPI select ARCH_MIGHT_HAVE_PC_PARPORT select ARCH_MIGHT_HAVE_PC_SERIO + select ARCH_OFFLINE_CPUS_ON_REBOOT if HOTPLUG_CPU select ARCH_STACKWALK select ARCH_SUPPORTS_ACPI select ARCH_SUPPORTS_ATOMIC_RMW diff --git a/arch/xtensa/Kconfig b/arch/xtensa/Kconfig index 1c645172b4b5..c862dfa69ed9 100644 --- a/arch/xtensa/Kconfig +++ b/arch/xtensa/Kconfig @@ -7,6 +7,7 @@ config XTENSA select ARCH_HAS_SYNC_DMA_FOR_CPU if MMU select ARCH_HAS_SYNC_DMA_FOR_DEVICE if MMU select ARCH_HAS_UNCACHED_SEGMENT if MMU + select ARCH_OFFLINE_CPUS_ON_REBOOT if HOTPLUG_CPU select ARCH_USE_QUEUED_RWLOCKS select ARCH_USE_QUEUED_SPINLOCKS select ARCH_WANT_FRAME_POINTERS diff --git a/drivers/power/reset/sc27xx-poweroff.c b/drivers/power/reset/sc27xx-poweroff.c index 29fb08b8faa0..d6cdf837235c 100644 --- a/drivers/power/reset/sc27xx-poweroff.c +++ b/drivers/power/reset/sc27xx-poweroff.c @@ -30,7 +30,7 @@ static void sc27xx_poweroff_shutdown(void) #ifdef CONFIG_PM_SLEEP_SMP int cpu = smp_processor_id(); - freeze_secondary_cpus(cpu); + freeze_secondary_cpus(cpu, false); #endif } diff --git a/include/linux/cpu.h b/include/linux/cpu.h index 1ca2baf817ed..9c62274a4db9 100644 --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -137,11 +137,14 @@ static inline void cpu_hotplug_done(void) { cpus_write_unlock(); } static inline void get_online_cpus(void) { cpus_read_lock(); } static inline void put_online_cpus(void) { cpus_read_unlock(); } +#if defined(CONFIG_PM_SLEEP_SMP) || defined(CONFIG_ARCH_OFFLINE_CPUS_ON_REBOOT) +extern int freeze_secondary_cpus(int primary, bool reboot); +#endif + #ifdef CONFIG_PM_SLEEP_SMP -extern int freeze_secondary_cpus(int primary); static inline int disable_nonboot_cpus(void) { - return freeze_secondary_cpus(0); + return freeze_secondary_cpus(0, false); } extern void enable_nonboot_cpus(void); @@ -152,7 +155,7 @@ static inline int suspend_disable_secondary_cpus(void) if (IS_ENABLED(CONFIG_PM_SLEEP_SMP_NONZERO_CPU)) cpu = -1; - return freeze_secondary_cpus(cpu); + return freeze_secondary_cpus(cpu, false); } static inline void suspend_enable_secondary_cpus(void) { diff --git a/kernel/cpu.c b/kernel/cpu.c index 9c706af713fb..52d04e4e1aab 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -1209,10 +1209,10 @@ int cpu_up(unsigned int cpu) } EXPORT_SYMBOL_GPL(cpu_up); -#ifdef CONFIG_PM_SLEEP_SMP +#if defined(CONFIG_PM_SLEEP_SMP) || defined(CONFIG_ARCH_OFFLINE_CPUS_ON_REBOOT) static cpumask_var_t frozen_cpus; -int freeze_secondary_cpus(int primary) +int freeze_secondary_cpus(int primary, bool reboot) { int cpu, error = 0; @@ -1237,11 +1237,13 @@ int freeze_secondary_cpus(int primary) if (cpu == primary) continue; - if (pm_wakeup_pending()) { +#ifdef CONFIG_PM_SLEEP + if (!reboot && pm_wakeup_pending()) { pr_info("Wakeup pending. Abort CPU freeze\n"); error = -EBUSY; break; } +#endif trace_suspend_resume(TPS("CPU_OFF"), cpu, true); error = _cpu_down(cpu, 1, CPUHP_OFFLINE); @@ -1250,7 +1252,9 @@ int freeze_secondary_cpus(int primary) cpumask_set_cpu(cpu, frozen_cpus); else { pr_err("Error taking CPU%d down: %d\n", cpu, error); - break; + /* When rebooting, offline as many CPUs as possible. */ + if (!reboot) + break; } } diff --git a/kernel/reboot.c b/kernel/reboot.c index c4d472b7f1b4..12f643b66e57 100644 --- a/kernel/reboot.c +++ b/kernel/reboot.c @@ -7,6 +7,7 @@ #define pr_fmt(fmt) "reboot: " fmt +#include <linux/cpu.h> #include <linux/ctype.h> #include <linux/export.h> #include <linux/kexec.h> @@ -220,7 +221,9 @@ void migrate_to_reboot_cpu(void) /* The boot cpu is always logical cpu 0 */ int cpu = reboot_cpu; +#if !IS_ENABLED(CONFIG_ARCH_OFFLINE_CPUS_ON_REBOOT) cpu_hotplug_disable(); +#endif /* Make certain the cpu I'm about to reboot on is online */ if (!cpu_online(cpu)) @@ -231,6 +234,11 @@ void migrate_to_reboot_cpu(void) /* Make certain I only run on the appropriate processor */ set_cpus_allowed_ptr(current, cpumask_of(cpu)); + +#if IS_ENABLED(CONFIG_ARCH_OFFLINE_CPUS_ON_REBOOT) + /* Offline other cpus if possible */ + freeze_secondary_cpus(cpu, true); +#endif } /**
Currently system reboots uses architecture specific codes (smp_send_stop) to offline non reboot CPUs. Most architecture's implementation is looping through all non reboot online CPUs and call ipi function to each of them. Some architecture like arm64, arm, and x86... would set offline masks to cpu without really offline them. This causes some race condition and kernel warning comes out sometimes when system reboots. This patch adds a config ARCH_OFFLINE_CPUS_ON_REBOOT, which would offline cpus in migrate_to_reboot_cpu(). If non reboot cpus are all offlined here, the loop for checking online cpus would be an empty loop. If architecture don't enable this config, or some cpus somehow fails to offline, it would fallback to ipi function. Opt in this config for architectures that support CONFIG_HOTPLUG_CPU. Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org> --- Change from v4: * fix a few nits: naming, comments, remove Kconfig text... Change from v3: * Opt in config for architectures that support CONFIG_HOTPLUG_CPU * Merge function offline_secondary_cpus() and freeze_secondary_cpus() with an additional flag. Change from v2: * Add another config instead of configed by CONFIG_HOTPLUG_CPU Previous related discussion on list: https://lore.kernel.org/lkml/20190727164450.GA11726@roeck-us.net/ https://lore.kernel.org/patchwork/patch/1117201/ --- arch/Kconfig | 5 +++++ arch/arm/Kconfig | 1 + arch/arm64/Kconfig | 1 + arch/arm64/kernel/hibernate.c | 2 +- arch/csky/Kconfig | 1 + arch/ia64/Kconfig | 1 + arch/mips/Kconfig | 1 + arch/parisc/Kconfig | 1 + arch/powerpc/Kconfig | 1 + arch/s390/Kconfig | 1 + arch/sh/Kconfig | 1 + arch/sparc/Kconfig | 1 + arch/x86/Kconfig | 1 + arch/xtensa/Kconfig | 1 + drivers/power/reset/sc27xx-poweroff.c | 2 +- include/linux/cpu.h | 9 ++++++--- kernel/cpu.c | 12 ++++++++---- kernel/reboot.c | 8 ++++++++ 18 files changed, 41 insertions(+), 9 deletions(-)