Message ID | 20200527233457.2531118-1-nobuhiro1.iwamatsu@toshiba.co.jp (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [PATCH/RFC] arm64/kernel: Fix return value when cpu_online() fails in __cpu_up() | expand |
On Thu, May 28, 2020 at 08:34:57AM +0900, Nobuhiro Iwamatsu wrote: > If boot_secondary() was successful, and cpu_online() was an error in > __cpu_up(), -EIO was returned, but 0 is returned by commit d22b115cbfbb7 > ("arm64/kernel: Simplify __cpu_up() by bailing out early"). > Therefore, bringup_wait_for_ap() causes the primary core to wait for a > long time, which may cause boot failure. > This commit sets -EIO to return code under the same conditions. > > Fixes: d22b115cbfbb7 ("arm64/kernel: Simplify __cpu_up() by bailing out early") > CC: Signed-off-by: Gavin Shan <gshan@redhat.com> > CC: Catalin Marinas <catalin.marinas@arm.com> > CC: Mark Rutland <mark.rutland@arm.com> > Tested-by: Yuji Ishikawa <yuji2.ishikawa@toshiba.co.jp> > Signed-off-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp> > --- > arch/arm64/kernel/smp.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c > index 061f60fe452f7..ea677680e4277 100644 > --- a/arch/arm64/kernel/smp.c > +++ b/arch/arm64/kernel/smp.c > @@ -137,6 +137,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle) > if (cpu_online(cpu)) > return 0; > > + ret = -EIO; > pr_crit("CPU%u: failed to come online\n", cpu); > secondary_data.task = NULL; > secondary_data.stack = NULL; This appears to restore the old behaviour, so looks good to me. I'd probably just replace the final 'return ret' with 'return -EIO' since it's never going to be anything else. Also -- aren't you in a pretty serious mess if the CPU starts but doesn't come online? I think the patch is fine, but this really shouldn't happen, right? Could you share your dmesg? Either way: Acked-by: Will Deacon <will@kernel.org> Catalin -- do you want to take this (the problematic change was introduced during the last merge window afaict)? Will
On Thu, 28 May 2020 08:34:57 +0900, Nobuhiro Iwamatsu wrote: > If boot_secondary() was successful, and cpu_online() was an error in > __cpu_up(), -EIO was returned, but 0 is returned by commit d22b115cbfbb7 > ("arm64/kernel: Simplify __cpu_up() by bailing out early"). > Therefore, bringup_wait_for_ap() causes the primary core to wait for a > long time, which may cause boot failure. > This commit sets -EIO to return code under the same conditions. Applied to arm64 (for-next/fixes), thanks! [1/1] arm64/kernel: Fix return value when cpu_online() fails in __cpu_up() https://git.kernel.org/arm64/c/ba051f097fc3
Hi, Thanks for your review. The patch has already been applied... > -----Original Message----- > From: Will Deacon [mailto:will@kernel.org] > Sent: Thursday, May 28, 2020 5:12 PM > To: iwamatsu nobuhiro(岩松 信洋 □SWC◯ACT) <nobuhiro1.iwamatsu@toshiba.co.jp> > Cc: linux-arm-kernel@lists.infradead.org; Mark Rutland <mark.rutland@arm.com>; catalin.marinas@arm.com; > Signed-off-by : Gavin Shan <gshan@redhat.com>; ishikawa yuji(石川 悠司 TDSC ○DSRC□ET開○ET3) > <yuji2.ishikawa@toshiba.co.jp> > Subject: Re: [PATCH/RFC] arm64/kernel: Fix return value when cpu_online() fails in __cpu_up() > > On Thu, May 28, 2020 at 08:34:57AM +0900, Nobuhiro Iwamatsu wrote: > > If boot_secondary() was successful, and cpu_online() was an error in > > __cpu_up(), -EIO was returned, but 0 is returned by commit d22b115cbfbb7 > > ("arm64/kernel: Simplify __cpu_up() by bailing out early"). > > Therefore, bringup_wait_for_ap() causes the primary core to wait for a > > long time, which may cause boot failure. > > This commit sets -EIO to return code under the same conditions. > > > > Fixes: d22b115cbfbb7 ("arm64/kernel: Simplify __cpu_up() by bailing out early") > > CC: Signed-off-by: Gavin Shan <gshan@redhat.com> > > CC: Catalin Marinas <catalin.marinas@arm.com> > > CC: Mark Rutland <mark.rutland@arm.com> > > Tested-by: Yuji Ishikawa <yuji2.ishikawa@toshiba.co.jp> > > Signed-off-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp> > > --- > > arch/arm64/kernel/smp.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c > > index 061f60fe452f7..ea677680e4277 100644 > > --- a/arch/arm64/kernel/smp.c > > +++ b/arch/arm64/kernel/smp.c > > @@ -137,6 +137,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle) > > if (cpu_online(cpu)) > > return 0; > > > > + ret = -EIO; > > pr_crit("CPU%u: failed to come online\n", cpu); > > secondary_data.task = NULL; > > secondary_data.stack = NULL; > > This appears to restore the old behaviour, so looks good to me. I'd probably > just replace the final 'return ret' with 'return -EIO' since it's never > going to be anything else. Yeah, your suggestion is better. > > Also -- aren't you in a pretty serious mess if the CPU starts but doesn't > come online? I think the patch is fine, but this really shouldn't happen, > right? Could you share your dmesg? > I paste the dmesg below. It stops with "printk: bootconsole [pl11] disabled" in my environment. After checking the bisect and the code, I realized that there was a problem with the target commit. And SoC enable-method that causes the problem uses spin-table instead of PCSI. ---- Starting kernel ... Booting Linux on physical CPU 0x0000000000 [0x410fd034] Linux version 5.7.0-rc7-00032-g565e4e976d6e (iwamatsu@ryzen7) (gcc version 8.3.0 (Debian 8.3.0-21), GNU ld (GNU Binutils for Debian) 2.34) #8 SMP PREEMPT Thu May 28 20:48:58 JST 2020 Machine model: Toshiba TMPV7700 EVB earlycon: pl11 at MMIO 0x0000000028200000 (options '') printk: bootconsole [pl11] enabled cma: Reserved 16 MiB at 0x00000000b7000000 NUMA: No NUMA configuration found NUMA: Faking a node at [mem 0x0000000080000000-0x00000000b7ffffff] NUMA: NODE_DATA [mem 0xb6e34100-0xb6e35fff] Zone ranges: DMA [mem 0x0000000080000000-0x00000000b7ffffff] DMA32 empty Normal empty Movable zone start for each node Early memory node ranges node 0: [mem 0x0000000080000000-0x00000000b7ffffff] Initmem setup node 0 [mem 0x0000000080000000-0x00000000b7ffffff] percpu: Embedded 21 pages/cpu s46168 r8192 d31656 u86016 Detected VIPT I-cache on CPU0 CPU features: detected: ARM erratum 845719 Built 1 zonelists, mobility grouping on. Total pages: 225792 Policy zone: DMA Kernel command line: earlycon=pl011,0x28200000 Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes, linear) Inode-cache hash table entries: 65536 (order: 7, 524288 bytes, linear) mem auto-init: stack:off, heap alloc:off, heap free:off Memory: 765600K/917504K available (4670K kernel code, 318K rwdata, 1236K rodata, 320K init, 327K bss, 135520K reserved, 16384K cma-reserved) SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1 rcu: Preemptible hierarchical RCU implementation. rcu: RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=8. Tasks RCU enabled. rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies. rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=8 NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0 GIC: Using split EOI/Deactivate mode random: get_random_bytes called from start_kernel팝ﴱ with crng_init=0 arch_timer: cp15 timer(s) running at 150.00MHz (phys). clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x2298375bd0, max_idle_ns: 440795208267 ns sched_clock: 56 bits at 150MHz, resolution 6ns, wraps every 2199023255551ns Console: colour dummy device 80x25 printk: console [tty0] enabled printk: bootconsole [pl11] disabled --- > Either way: > > Acked-by: Will Deacon <will@kernel.org> > Thanks! > Catalin -- do you want to take this (the problematic change was introduced > during the last merge window afaict)? > > Will Best regards, Nobuhiro
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c index 061f60fe452f7..ea677680e4277 100644 --- a/arch/arm64/kernel/smp.c +++ b/arch/arm64/kernel/smp.c @@ -137,6 +137,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle) if (cpu_online(cpu)) return 0; + ret = -EIO; pr_crit("CPU%u: failed to come online\n", cpu); secondary_data.task = NULL; secondary_data.stack = NULL;