diff mbox series

[PATCH/RFC] arm64/kernel: Fix return value when cpu_online() fails in __cpu_up()

Message ID 20200527233457.2531118-1-nobuhiro1.iwamatsu@toshiba.co.jp (mailing list archive)
State New, archived
Headers show
Series [PATCH/RFC] arm64/kernel: Fix return value when cpu_online() fails in __cpu_up() | expand

Commit Message

Nobuhiro Iwamatsu May 27, 2020, 11:34 p.m. UTC
If boot_secondary() was successful, and cpu_online() was an error in
__cpu_up(), -EIO was returned, but 0 is returned by commit d22b115cbfbb7
("arm64/kernel: Simplify __cpu_up() by bailing out early").
Therefore, bringup_wait_for_ap() causes the primary core to wait for a
long time, which may cause boot failure.
This commit sets -EIO to return code under the same conditions.

Fixes: d22b115cbfbb7 ("arm64/kernel: Simplify __cpu_up() by bailing out early")
CC: Signed-off-by: Gavin Shan <gshan@redhat.com>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Mark Rutland <mark.rutland@arm.com>
Tested-by: Yuji Ishikawa <yuji2.ishikawa@toshiba.co.jp>
Signed-off-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
---
 arch/arm64/kernel/smp.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Will Deacon May 28, 2020, 8:12 a.m. UTC | #1
On Thu, May 28, 2020 at 08:34:57AM +0900, Nobuhiro Iwamatsu wrote:
> If boot_secondary() was successful, and cpu_online() was an error in
> __cpu_up(), -EIO was returned, but 0 is returned by commit d22b115cbfbb7
> ("arm64/kernel: Simplify __cpu_up() by bailing out early").
> Therefore, bringup_wait_for_ap() causes the primary core to wait for a
> long time, which may cause boot failure.
> This commit sets -EIO to return code under the same conditions.
> 
> Fixes: d22b115cbfbb7 ("arm64/kernel: Simplify __cpu_up() by bailing out early")
> CC: Signed-off-by: Gavin Shan <gshan@redhat.com>
> CC: Catalin Marinas <catalin.marinas@arm.com>
> CC: Mark Rutland <mark.rutland@arm.com>
> Tested-by: Yuji Ishikawa <yuji2.ishikawa@toshiba.co.jp>
> Signed-off-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
> ---
>  arch/arm64/kernel/smp.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 061f60fe452f7..ea677680e4277 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -137,6 +137,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
>  	if (cpu_online(cpu))
>  		return 0;
>  
> +	ret = -EIO;
>  	pr_crit("CPU%u: failed to come online\n", cpu);
>  	secondary_data.task = NULL;
>  	secondary_data.stack = NULL;

This appears to restore the old behaviour, so looks good to me. I'd probably
just replace the final 'return ret' with 'return -EIO' since it's never
going to be anything else.

Also -- aren't you in a pretty serious mess if the CPU starts but doesn't
come online? I think the patch is fine, but this really shouldn't happen,
right? Could you share your dmesg?

Either way:

Acked-by: Will Deacon <will@kernel.org>

Catalin -- do you want to take this (the problematic change was introduced
during the last merge window afaict)?

Will
Catalin Marinas May 28, 2020, 11:53 a.m. UTC | #2
On Thu, 28 May 2020 08:34:57 +0900, Nobuhiro Iwamatsu wrote:
> If boot_secondary() was successful, and cpu_online() was an error in
> __cpu_up(), -EIO was returned, but 0 is returned by commit d22b115cbfbb7
> ("arm64/kernel: Simplify __cpu_up() by bailing out early").
> Therefore, bringup_wait_for_ap() causes the primary core to wait for a
> long time, which may cause boot failure.
> This commit sets -EIO to return code under the same conditions.

Applied to arm64 (for-next/fixes), thanks!

[1/1] arm64/kernel: Fix return value when cpu_online() fails in __cpu_up()
      https://git.kernel.org/arm64/c/ba051f097fc3
Nobuhiro Iwamatsu May 28, 2020, 11:31 p.m. UTC | #3
Hi,

Thanks for your review.
The patch has already been applied...

> -----Original Message-----
> From: Will Deacon [mailto:will@kernel.org]
> Sent: Thursday, May 28, 2020 5:12 PM
> To: iwamatsu nobuhiro(岩松 信洋 □SWC◯ACT) <nobuhiro1.iwamatsu@toshiba.co.jp>
> Cc: linux-arm-kernel@lists.infradead.org; Mark Rutland <mark.rutland@arm.com>; catalin.marinas@arm.com;
> Signed-off-by : Gavin Shan <gshan@redhat.com>; ishikawa yuji(石川 悠司 TDSC ○DSRC□ET開○ET3)
> <yuji2.ishikawa@toshiba.co.jp>
> Subject: Re: [PATCH/RFC] arm64/kernel: Fix return value when cpu_online() fails in __cpu_up()
> 
> On Thu, May 28, 2020 at 08:34:57AM +0900, Nobuhiro Iwamatsu wrote:
> > If boot_secondary() was successful, and cpu_online() was an error in
> > __cpu_up(), -EIO was returned, but 0 is returned by commit d22b115cbfbb7
> > ("arm64/kernel: Simplify __cpu_up() by bailing out early").
> > Therefore, bringup_wait_for_ap() causes the primary core to wait for a
> > long time, which may cause boot failure.
> > This commit sets -EIO to return code under the same conditions.
> >
> > Fixes: d22b115cbfbb7 ("arm64/kernel: Simplify __cpu_up() by bailing out early")
> > CC: Signed-off-by: Gavin Shan <gshan@redhat.com>
> > CC: Catalin Marinas <catalin.marinas@arm.com>
> > CC: Mark Rutland <mark.rutland@arm.com>
> > Tested-by: Yuji Ishikawa <yuji2.ishikawa@toshiba.co.jp>
> > Signed-off-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
> > ---
> >  arch/arm64/kernel/smp.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> > index 061f60fe452f7..ea677680e4277 100644
> > --- a/arch/arm64/kernel/smp.c
> > +++ b/arch/arm64/kernel/smp.c
> > @@ -137,6 +137,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
> >  	if (cpu_online(cpu))
> >  		return 0;
> >
> > +	ret = -EIO;
> >  	pr_crit("CPU%u: failed to come online\n", cpu);
> >  	secondary_data.task = NULL;
> >  	secondary_data.stack = NULL;
> 
> This appears to restore the old behaviour, so looks good to me. I'd probably
> just replace the final 'return ret' with 'return -EIO' since it's never
> going to be anything else.

Yeah, your suggestion is better.

> 
> Also -- aren't you in a pretty serious mess if the CPU starts but doesn't
> come online? I think the patch is fine, but this really shouldn't happen,
> right? Could you share your dmesg?
>

I paste the dmesg below. It stops with "printk: bootconsole [pl11] disabled" in my environment.
After checking the bisect and the code, I realized that there was a problem with the target commit.
And SoC enable-method that causes the problem uses spin-table instead of PCSI.

----
Starting kernel ...

Booting Linux on physical CPU 0x0000000000 [0x410fd034]
Linux version 5.7.0-rc7-00032-g565e4e976d6e (iwamatsu@ryzen7) (gcc version 8.3.0 (Debian 8.3.0-21), GNU ld (GNU Binutils for Debian) 2.34) #8 SMP PREEMPT Thu May 28 20:48:58 JST 2020
Machine model: Toshiba TMPV7700 EVB
earlycon: pl11 at MMIO 0x0000000028200000 (options '')
printk: bootconsole [pl11] enabled
cma: Reserved 16 MiB at 0x00000000b7000000
NUMA: No NUMA configuration found
NUMA: Faking a node at [mem 0x0000000080000000-0x00000000b7ffffff]
NUMA: NODE_DATA [mem 0xb6e34100-0xb6e35fff]
Zone ranges:
  DMA      [mem 0x0000000080000000-0x00000000b7ffffff]
  DMA32    empty
  Normal   empty
Movable zone start for each node
Early memory node ranges
  node   0: [mem 0x0000000080000000-0x00000000b7ffffff]
Initmem setup node 0 [mem 0x0000000080000000-0x00000000b7ffffff]
percpu: Embedded 21 pages/cpu s46168 r8192 d31656 u86016
Detected VIPT I-cache on CPU0
CPU features: detected: ARM erratum 845719
Built 1 zonelists, mobility grouping on.  Total pages: 225792
Policy zone: DMA
Kernel command line: earlycon=pl011,0x28200000
Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
Inode-cache hash table entries: 65536 (order: 7, 524288 bytes, linear)
mem auto-init: stack:off, heap alloc:off, heap free:off
Memory: 765600K/917504K available (4670K kernel code, 318K rwdata, 1236K rodata, 320K init, 327K bss, 135520K reserved, 16384K cma-reserved)
SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1
rcu: Preemptible hierarchical RCU implementation.
rcu:    RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=8.
        Tasks RCU enabled.
rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=8
NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
GIC: Using split EOI/Deactivate mode
random: get_random_bytes called from start_kernel팝ﴱ with crng_init=0
arch_timer: cp15 timer(s) running at 150.00MHz (phys).
clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x2298375bd0, max_idle_ns: 440795208267 ns
sched_clock: 56 bits at 150MHz, resolution 6ns, wraps every 2199023255551ns
Console: colour dummy device 80x25
printk: console [tty0] enabled
printk: bootconsole [pl11] disabled
---
 
> Either way:
> 
> Acked-by: Will Deacon <will@kernel.org>
> 
Thanks!

> Catalin -- do you want to take this (the problematic change was introduced
> during the last merge window afaict)?
> 
> Will


Best regards,
  Nobuhiro
diff mbox series

Patch

diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 061f60fe452f7..ea677680e4277 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -137,6 +137,7 @@  int __cpu_up(unsigned int cpu, struct task_struct *idle)
 	if (cpu_online(cpu))
 		return 0;
 
+	ret = -EIO;
 	pr_crit("CPU%u: failed to come online\n", cpu);
 	secondary_data.task = NULL;
 	secondary_data.stack = NULL;